MySQL 5.7 Official Documentation Translation: InnoDB Full-Text Stopwords

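The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of character_set_server and collation_server). If the stopword file or the columns used for full-text indexing or searching have a character set or collation different from character_set_server or collation_server, stopword lookups may produce false hits or misses.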

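Whether stopword lookups are case-sensitive depends on the server collation. For example, lookups are case-insensitive if the collation is latin1_swedish_ci, and case-sensitive if the collation is latin1_general_cs or latin1_bin.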
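
To see which character set and collation stopword lookups will use on a given server, the two variables can be read directly. This is only a quick check and is not part of the original documentation example:

mysql> SELECT @@character_set_server, @@collation_server;

A collation name ending in _ci compares case-insensitively, while one ending in _cs or _bin compares case-sensitively.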

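InnoDB has a relatively short default stopword list, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for "to be or not to be" and expect a sensible result, rather than having all of those words ignored.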

To view the default InnoDB stopword list, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

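To define your own stopword list for all InnoDB tables, create a table with the same structure as INNODB_FT_DEFAULT_STOPWORD, populate it with stopwords, and set the innodb_ft_server_stopword_table option to a value of the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example creates and configures a new stopword list for InnoDB: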

-- Create a new stopword table
mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
Query OK, 0 rows affected (0.01 sec)

-- Insert stopwords (for simplicity, a single stopword is used in this example)
mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)

-- Create the table
mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

-- Insert data into the table
mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0

-- Set the innodb_ft_server_stopword_table option to the new stopword table
mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)

-- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)
mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17 sec)
Records: 0 Duplicates: 0 Warnings: 1

Verify that the specified stopword ('Ishmael') does not appear by querying the words in INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE.

mysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15;
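
A narrower check is also possible. The following sketch assumes the table, index, and innodb_ft_aux_table setting from the example above. If the stopword was applied when the index was built, both statements should come back empty:

-- Look for the stopword directly among the indexed tokens
mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE WHERE word = 'ishmael';

-- Search for the stopword through the full-text index itself
mysql> SELECT id, opening_line FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');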

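Note: By default, words shorter than 3 characters or longer than 84 characters do not appear in an InnoDB full-text search index. The maximum and minimum word lengths are configured with innodb_ft_max_token_size and innodb_ft_min_token_size. This default behavior does not apply to the ngram parser; the ngram token size is set with ngram_token_size.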
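
To check the limits in effect on a server, the three variables named in the note can be listed together. This is a minimal sketch; ngram_token_size only matters when the ngram parser is used:

mysql> SHOW VARIABLES WHERE Variable_name IN ('innodb_ft_min_token_size', 'innodb_ft_max_token_size', 'ngram_token_size');

Because these limits decide which tokens get indexed, an existing full-text index generally has to be rebuilt before a changed value affects data that is already indexed.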

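To create stopword lists on a table-by-table basis, use the innodb_ft_user_stopword_table option to specify the stopword table to use. (Translator's note: I have some doubts here. In principle this option should assign a stopword list to a specific InnoDB table, but in my tests I found no difference between it and innodb_ft_server_stopword_table, except that innodb_ft_user_stopword_table has higher priority and is used first.)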
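
As a rough sketch of the per-table workflow, one possibility is to set the option in the current session right before building the index for the table in question. The names my_user_stopwords, quotes, and idx_body are hypothetical, the test database is carried over from the earlier example, and the sketch assumes the variable can be set at session scope, as the MySQL 5.7 reference lists it:

-- Hypothetical second stopword table, same single-column structure as before
mysql> CREATE TABLE my_user_stopwords(value VARCHAR(30)) ENGINE = INNODB;
mysql> INSERT INTO my_user_stopwords(value) VALUES ('screaming');

-- Hypothetical table that should use this list
mysql> CREATE TABLE quotes (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
body TEXT
) ENGINE=InnoDB;

-- Point the current session at the list, then build the index
mysql> SET SESSION innodb_ft_user_stopword_table = 'test/my_user_stopwords';
mysql> CREATE FULLTEXT INDEX idx_body ON quotes(body);

Setting the value per session, rather than globally, may be what lets different tables end up with different lists, since the list is chosen at the moment each index is created.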

For how InnoDB full-text indexes work, see: MySQL 5.7 Official Documentation Translation: InnoDB FULLTEXT indexes