|
|
|
|
移动端

嗦一嗦MySQL 8.0的新特性(二)

MySQL8.0 GA版本发布了,展现了众多新特性,本系列译文将整理为3篇,本文为第二篇,为大家介绍升级的部分新特性。

作者:张锐志来源:老叶茶馆|2018-05-31 12:52

技术沙龙 | 邀您于8月25日与国美/AWS/转转三位专家共同探讨小程序电商实战

 MySQL 8.0的新特性(二)

MySQL8.0 GA版本发布了,展现了众多新特性,本系列译文将整理为3篇,本文为第二篇,为大家介绍升级的部分新特性。

第一篇MySQL 8.0的新特性(一)

一、地理信息系统 GIS

8.0 版本提供对地形的支持,其中包括了对空间参照系的数据源信息的支持,SRS aware spatial数据类型,空间索引,空间函数。总而言之,8.0版本可以理解地球表面的经纬度信息,而且可以在任意受支持的5000个空间参照系中计算地球上任意两点之间的距离。

MySQL 8.0 delivers geography support. This includes meta-data support for Spatial Reference System (SRS), as well as SRS aware spatial datatypes, spatial indexes, and spatial functions. In short, MySQL 8.0 understands latitude and longitude coordinates on the earth’s surface and can, for example, correctly calculate the distances between two points on the earths surface in any of the about 5000 supported spatial reference systems.

  • 空间参照系 Spatial Reference System (SRS)

ST_SPATIAL_REFERENCE_SYSTEMS 存在于information schema视图库中,提供了可供使用的SRS坐标系统的名称。

每个SRS坐标系统都有一个SRID编号。8.0版本支持EPSG Geodetic Parameter Dataseset中的5千多个坐标系统(包括立体模和2D平面地球模型)

The ST_SPATIAL_REFERENCE_SYSTEMS information schema view provides information about available spatial reference systems for spatial data. This view is based on the SQL/MM (ISO/IEC 13249-3) standard. 

Each spatial reference system is identified by an SRID number. MySQL 8.0 ships with about 5000 SRIDs from the EPSG Geodetic Parameter Dataset, covering georeferenced ellipsoids and 2d projections (i.e. all 2D spatial reference systems).

  • SRID 地理数据类型 SRID aware spatial datatypes

空间类的数据类型可以直接从SRS坐标系统的定义中获取,例如:使用SRID 4326定义进行建表: CREATE TABLE t1 (g GEOMETRY SRID 4326); 。

SRID是适用于地理类型的数据类型。只有同一SRID的的数据才会被插入到行中。与当前SRID数据类型的数据尝试插入时,会报错。未定义SRID编号的表将可以接受所有SRID编号的数据。

Spatial datatypes can be attributed with the spatial reference system definition, for example with SRID 4326 like this: CREATE TABLE t1 (g GEOMETRY SRID 4326); 

The SRID is here a SQL type modifier for the GEOMETRY datatype. Values inserted into a column with an SRID property must be in that SRID. Attempts to insert values with other SRIDs results in an exception condition being raised. Unmodified types, i.e., types with no SRID specification, will continue to accept all SRIDs, as before.

8.0版本增加了 INFORMATION_SCHEMA.ST_GEOMETRY_COLUMNS 视图,可以显示当前实例中所有地理信息的数据行及其对应的SRS名称,编号,地理类型名称。

MySQL 8.0 adds 

the INFORMATION_SCHEMA.ST_GEOMETRY_COLUMNS view as specified in SQL/MM Part 3, Sect. 19.2. This view will list all GEOMETRY columns in the MySQL instance and for each column it will list the standard SRS_NAME , SRS_ID , and GEOMETRY_TYPE_NAME.

  • SRID 空间索引 SRID aware spatial indexes

在空间数据类型上可以创建空间索引,创建空间索引的列必须非空,例如: CREATE TABLE t1 (g GEOMETRY SRID 4326 NOT NULL, SPATIAL INDEX(g));

Spatial indexes can be created on spatial datatypes. Columns in spatial indexes must be declared NOT NULL. For example like this: CREATE TABLE t1 (g GEOMETRY SRID 4326 NOT NULL, SPATIAL INDEX(g));

创建空间索引的列必须具有SRID数据标识以用于优化器使用,如果将空间索引建在没有SRID数据标识的列上,将输出waring信息。

Columns with a spatial index should have an SRID type modifier to allow the optimizer to use the index. If a spatial index is created on a column that doesn’t have an SRID type modifier, a warning is issued.

  • SRID 空间函数 

8.0 增加了诸如 ST_Distance() 和 ST_Length() 等用于判断数据的参数是否在SRS中,并计算其空间上的距离。到目前为止,ST_Distance和其他的空间关系型函数诸如ST_Within,ST_Intersects,ST_Contains,ST_Crosses都支持地理计算。其运算逻辑与行为参见 SQL/MM Part 3 Spatial

MySQL 8.0 extends spatial functions such as ST_Distance() and ST_Length() to detect that its parameters are in a geographic (ellipsoidal) SRS and to compute the distance on the ellipsoid. So far, ST_Distance and spatial relations such as ST_Within, ST_Intersects, ST_Contains, ST_Crosses, etc. support geographic computations. The behavior of each ST function is as defined in SQL/MM Part 3 Spatial.

二、字符集 Character Sets

8.0版本默认使用UTF8MB4作为默认字符集。相比较5.7版本,SQL性能(诸如排序UTF8MB4字符串)得到了很大的提升。UTF8MB4类型在网页编码上正占据着举足轻重的地位,将其设为默认数据类型后,将会给绝大多数的MySQL用户带来便利。

MySQL 8.0 makes UTF8MB4 the default character set. SQL performance – such as sorting UTF8MB4 strings – has been improved by a factor of 20 in 8.0 as compared to 5.7. UTF8MB4 is the dominating character encoding for the web, and this move will make life easier for the vast majority of MySQL users.

  • 默认的字符集从latin1变为 utf8mb4 ,默认排序校对规则从 latin1_swedish_ci 变为utf8mb4_800_ci_ai。

The default character set has changed from latin1 to utf8mb4 and the default collation has changed from latin1_swedish_ci to utf8mb4_800_ci_ai.

  • utf8mb4同样也成为libmysql,服务端命令行工具,server层的默认编码

The changes in defaults applies to libmysql and server command tools as well as the server itself.

  • utf8mb4同样也成为MySQL测试框架的默认编码

The changes are also reflected in MTR tests, running with new default charset.

  • 排序校对规则的权重与大小写基于Unicode委员会16年公布的Unicode 9.0.0版本。

The collation weight and case mapping are based on Unicode 9.0.0 , announced by the Unicode committee on Jun 21, 2016.

  • 在以往的MySQL版本中,latin1编码中的21种语言的特殊大小写和排序校对规则被引入了 utf8mb4 排序校对规则。例如:捷克语的排序校对规则变成了utf8mb4_cs_800_ai_ci。

The 21 language specific case insensitive collations available for latin1 (MySQL legacy) have been implemented forutf8mb4 collations, for example the Czech collation becomes utf8mb4_cs_800_ai_ci. See complete list in WL#9108 . See blog post by Xing Zhang here .

  • 增加了对特殊语境和重音敏感的排序校对规则的支持。8.0版本支持 DUCET (Default Unicode Collation Entry Table)全部三级排序校对规则。

Added support for case and accent sensitive collations. MySQL 8.0 supports all 3 levels of collation weight defined by DUCET (Default Unicode Collation Entry Table). See blog post by Xing Zhang here.

  • utf8mb4 的 utf8mb4_ja_0900_as_cs 排序校验规则对日语字符支持三级权重的排序。

Japanese utf8mb4_ja_0900_as_cs collation for utf8mb4 which sorts characters by using three levels’ weight. This gives the correct sorting order for Japanese. See blog post by Xing Zhang here.

  • 对日语有额外的假名支持特性, utf8mb4_ja_0900_as_cs_ks中的ks表示假名区分。

Japanese with additional kana sensitive feature, utf8mb4_ja_0900_as_cs_ks, where ‘ks’ stands for ‘kana sensitive’. See blog post by Xing Zhang here.

  • 把 Unicode 9.0.0之前所有排序校验规则中的不填补变成填补字符,此举有利于提升字符串的一致性和性能。例如把字符串末尾的空格按照其他字符对待。之前的排序校验规则在处理这种情况时保留字符串原样。

Changed all new collations, from Unicode 9.0.0 forward, to be NO PAD instead of PAD STRING, ie., treat spaces at the end of a string like any other character. This is done to improve consistency and performance. Older collations are left in place.

See also blog posts by Bernt Marius Johnsen here, here and here.

 

三、数据类型 Datatypes

  • 二进制数据类型的Bit-wise操作

8.0版本扩展了 bit-wise操作(如bit-wise AND等)的使用范围,使得其在所有 BINARY 数据类型上都适用。在此之前只支持整型数据,若强行在二进制数据类型上使用Bit-wise操作,将会隐式转换为64位的BITINT类型,并可能丢失若干位的数据。

从8.0版本之后,bit-wise操作可以在 BINARY 和BLOB类型上使用,且不用担心精确度下降的问题。

MySQL 8.0 extends the bit-wise operations (‘bit-wise AND’, etc) to also work with [VAR]BINARY/[TINY|MEDIUM|LONG]BLOB. Prior to 8.0 bit-wise operations were only supported for integers. If you used bit-wise operations on binaries the arguments were implicitly cast to BIGINT (64 bit) before the operation, thus possibly losing bits. 

From 8.0 and onward bit-wise operations work for all BINARY and BLOB data types, casting arguments such that bits are not lost.

  • IPV6操作

8.0版本通过支持 BINARY 上的Bit-wise操作提升了IPv6数据的可操作性。5.6版本中引入了支持IPv6地址和16位二进制数据的互相转换的INET6_ATON() 和 INET6_NTOA() 函数。

但是直到8.0之前,由于上一段中的问题我们都无法讲IPv6转换函数和bit-wise操作结合起来。由于 INET6_ATON() 可以正确的返回128bit的VARBINARY(16),如果我们想要将一个IPv6地址与网关地址进行比对,现在就可以使用 INET6_ATON(address)& INET6_ATON(network) 操作。

MySQL 8.0 improves the usability of IPv6 manipulation supporting bit-wise operations on BINARY data types. In MySQL 5.6 we introduced the INET6_ATON() and INET6_NTOA() functions which convert IPv6 addresses between text form like 'fe80::226:b9ff:fe77:eb17' and VARBINARY(16). 

However, until now we could not combine these IPv6 functions with bit-wise operations since such operations would – wrongly – convert output to BIGINT. For example, if we have an IPv6 address and want to test it against a network mask, we can now use INET6_ATON(address)& INET6_ATON(network) because INET6_ATON() correctly returns the VARBINARY(16)datatype (128 bits). See blog post by Catalin Besleaga here.

  • UUID 操作

8.0版本通过增加了三个新的函数(UUID_TO_BIN(), BIN_TO_UUID(), 和 IS_UUID())提升了UUID的可用性。UUID_TO_BIN()可以将UUID格式的文本转换成VARBINARY(16), BIN_TO_UUID()则与之相反, IS_UUID()用来校验UUID的有效性。将UUID以 VARBINARY(16) 的方式存储后,就可以使用实用的索引了。

 UUID_TO_BIN() 函数可以原本转换后的二进制数值中的时间相关位(UUID生成时有时间关联)移到数据的开头,这样对索引来说更加友好而且可以减少在B树中的随机插入,从而减少了插入耗时。

MySQL 8.0 improves the usability of UUID manipulations by implementing three new SQL functions: UUID_TO_BIN(), BIN_TO_UUID(), and IS_UUID(). The first one converts from UUID formatted text to VARBINARY(16), the second one from VARBINARY(16) to UUID formatted text, and the last one checks the validity of an UUID formatted text. The UUID stored as a VARBINARY(16) can be indexed using functional indexes. 

The functions UUID_TO_BIN() and UUID_TO_BIN() can also shuffle the time-related bits and move them at the beginning making it index friendly and avoiding the random inserts in the B-tree, this way reducing the insert time. The lack of such functionality has been mentioned as one of the drawbacks of using UUID’s. See blog post by Catalin Besleaga here.

四、消耗敏感的模型 

  • 查询优化器将会照顾到数据缓冲的状况

8.0版本自动地根据数据是否存在于内存中而选择查询计划,在以往的版本中,消耗敏感的模型始终假设数据在磁盘上。

正因为现在查询内存数据和查询硬盘数据的消耗常数不同,因此优化器会根据数据的位置选择更加优化的读取数据方式。

MySQL 8.0 chooses query plans based on knowledge about whether data resides in-memory or on-disk. This happens automatically, as seen from the end user there is no configuration involved. Historically, the MySQL cost model has assumed data to reside on spinning disks. 

The cost constants associated with looking up data in-memory and on-disk are now different, thus, the optimizer will choose more optimal access methods for the two cases, based on knowledge of the location of data. See blog post by Øystein Grøvlen here.

  • 查询优化器的直方图 

8.0版本加入了直方图统计数据。用户可以根据直方图针对表中的某列(一般为非索引列)生成数据分布统计信息,这样优化器就可以利用这些信息去寻觅更加优化的查询计划。

直方图最常见的使用场景就是计算字段的选择性。

MySQL 8.0 implements histogram statistics. With Histograms, the user can create statistics on the data distribution for a column in a table, typically done for non-indexed columns, which then will be used by the query optimizer in finding the optimal query plan. 

The primary use case for histogram statistics is for calculating the selectivity (filter effect) of predicates of the form “COLUMN operator CONSTANT”.

用以创建直方图的 ANALYZE TABLE 语法现已被扩展了两个新子句: UPDATE HISTOGRAM ON column [, column] [WITH n BUCKETS]和DROP HISTOGRAM ON column [, column]。

直方图的总计总数(桶)是可以选的,默认100。直方图的统计信息被存储在词典表column_statistics中,并可以使用

information_schema.COLUMN_STATISTICS进行查看。由于JSON数据格式的灵活性,直方图现在以JSON对象存储。

根据表的大小,ANALYZE TABLE命令会自动的判断是否要表进行采样,甚至会根据表中数据的分布情况和统计总量来决定创建等频或者等高的直方图。

The user creates a histogram by means of the ANALYZE TABLE syntax which has been extended to accept two new clauses: UPDATE HISTOGRAM ON column [, column] [WITH n BUCKETS] and DROP HISTOGRAM ON column [, column]. 

The number of buckets is optional, the default is 100. The histogram statistics are stored in the dictionary table “column_statistics” and accessible through the view information_schema.COLUMN_STATISTICS. The histogram is stored as a JSON object due to the flexibility of the JSON datatype. 

ANALYZE TABLE will automatically decide whether to sample the base table or not, based on table size. It will also decide whether to build a singleton or a equi-height histogram based on the data distribution and the number of buckets specified. See blog post by Erik Frøseth here.

五、正则表达式 

与UTF8MB4的正则支持一同,8.0版本也增加了诸如 REGEXP_INSTR(), REGEXP_LIKE(), REGEXP_REPLACE(), 和REGEXP_SUBSTR()等新函数。

另外,系统中还增加了用以控制正则表达式致性的 regexp_stack_limit (默认8000000比特) 和 regexp_time_limit (默认32步) 参数。REGEXP_REPLACE()也是社区中受呼声比较高的特性。

MySQL 8.0 supports regular expressions for UTF8MB4 as well as new functions like REGEXP_INSTR(), REGEXP_LIKE(), REGEXP_REPLACE(), and REGEXP_SUBSTR(). 

The system variables regexp_stack_limit (default 8000000 bytes) and regexp_time_limit (default 32 steps) have been added to control the execution. The REGEXP_REPLACE() function is one of the most requested features by the MySQL community, for example see feature request reported as BUG #27389 by Hans Ginzel. See also blog posts by Martin Hansson here and Bernt Marius Johnsen here.

六、运维自动化特性

开发向的运维关心数据库实例的可操作型,通常即可靠性,可用性,性能,安全,可观测性,可管理性。关于InnoDB Cluster和MGR的可靠性我们将会另起新篇单独介绍,接下来的段落将会介绍关于8.0版本针对表在其他可操作性上的改变。

Dev Ops care about operational aspects of the database, typically about reliability, availability, performance, security, observability, and manageability. High Availability comes with MySQL InnoDB Cluster and MySQL Group Replication which will be covered by a separate blog post. Here follows what 8.0 brings to the table in the other categories.

七、可靠性

8.0版本在整体上 增加了可靠性,原因如下:

MySQL 8.0 increases the overall reliability of MySQL because :

1、8.0版本将元信息存储与久经考验的事务性存储引擎InnoDB中。诸如用户权限表,数据字典表,现在都使用 InnoDB进行存储。

MySQL 8.0 stores its meta-data into InnoDB, a proven transactional storage engine. System tables such as Users and Privileges as well as Data Dictionary tables now reside in InnoDB.

2、8.0版本消除了会导致非一致性的一处隐患。在5.7及以前的版本中,存在着服务层和引擎层两份数据字典,因而可能导致在故障情况下的数据字典间的同步失败。在8.0版本中,只有一份数据字典。

MySQL 8.0 eliminates one source of potential inconsistency. In 5.7 and earlier versions there are essentially two data dictionaries, one for the Server layer and one for the InnoDB layer, and these can get out of sync in some crashing scenarios. In 8.0 there is only one data dictionary.

3、8.0版本实现了原子化,无惧宕机的DDL。根据这个特性,DDL语句要么被全部执行,要么全部未执行。对于复制环境来说这是至关重要的,否则会导致主从之间因为表结构不一致,数据漂移的情况。

MySQL 8.0 ensures atomic, crash safe DDL. With this the user is guaranteed that any DDL statement will either be executed fully or not at all. This is particularly important in a replicated environment, otherwise there can be scenarios where masters and slaves (nodes) get out of sync, causing data-drift.

基于新的事务型数据字典,可靠性得到了提高。

This work is done in the context of the new, transactional data dictionary. See blog posts by Staale Deraas here and here.

【编辑推荐】

  1. 选择云计算数据库的正确方法
  2. 如何从命令行管理MySQL数据库和用户
  3. 解救 DBA——数据库分库分表思路及案例分析
  4. 嗦一嗦MySQL 8.0的新特性(一)
  5. 区块链代表的数据库和传统数据库有何区别
【责任编辑:庞桂玉 TEL:(010)68476606】

点赞 0
分享:
大家都在看
猜你喜欢

读 书 +更多

一目了然——Web软件显性设计之路

本书阐述了为什么以及如何设计出简单易用的基于Web的软件,让用户单凭常识即可有效地使用它。主要内容包括:显性设计的概念、如何理解用户...

订阅51CTO邮刊

点击这里查看样刊

订阅51CTO邮刊