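Installing the IK Chinese Analysis Plugin for Elasticsearch (Part 4)

I. IK overview

IK Analyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IK Analyzer has gone through four major versions. It began as a Chinese segmentation component built around the open-source project Lucene, combining dictionary-based segmentation with grammar analysis. Starting with version 3.0, IK became a general-purpose segmentation component for Java, independent of the Lucene project, while still providing a default implementation optimized for Lucene. The 2012 version added a simple algorithm for resolving segmentation ambiguity, marking IK's shift from purely dictionary-based segmentation toward segmentation that approximates semantic analysis.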

IK Analyzer 2012 features:

II. Setting up the build environment

The IK analyzer downloaded from GitHub is a source package, so it has to be built with Maven.

1. Download Maven

# wget http://mirrors.hust.edu.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz

2. Extract

# tar zxf apache-maven-3.3.9-bin.tar.gz -C /usr/local/

3. Configure the environment variables

# vi /etc/profile

    export MAVEN_HOME=/usr/local/apache-maven-3.3.9

    export PATH=$PATH:$MAVEN_HOME/bin

# source /etc/profile
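
After reloading the profile, it can be worth confirming that the shell actually resolves mvn before moving on to the build. The following is a minimal sketch rather than part of the original steps: the helper name check_tool is hypothetical, and it assumes nothing beyond a POSIX shell.

```shell
# Report whether a command can be resolved on the current PATH.
# check_tool is a hypothetical helper name used only for this sketch.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

# Print the Maven version only when mvn is actually reachable.
if check_tool mvn; then
    mvn -v || echo "mvn is on PATH but failed to run (is a JDK installed?)" >&2
fi
```

If mvn is reported as not found, the usual cause is that /etc/profile was not re-read in the current shell or that MAVEN_HOME does not point at the extracted directory.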

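III. Installing the IK analysis plugin

1. Download

Download the IK release that matches your Elasticsearch version from GitHub: https://github.com/medcl/elasticsearch-analysis-ik. Alternatively, fetch the analyzer source with git clone https://github.com/medcl/elasticsearch-analysis-ik.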
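
Choosing the matching IK release means knowing which Elasticsearch version is running. Elasticsearch reports it in the JSON served at its root URL, in the number field of the version object. Here is a minimal sketch of reading that field, assuming a POSIX shell with a grep that supports -o; es_version is a hypothetical helper name, and the sample JSON is a hand-written placeholder, not real server output.

```shell
# Extract the version "number" field from the JSON that Elasticsearch
# serves at its root URL. The JSON is read from standard input.
# es_version is a hypothetical helper name used only for this sketch.
es_version() {
    grep -o '"number"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | sed 's/.*"\([^"]*\)"$/\1/'
}

# Offline demonstration with a placeholder response (0.0.0 is not a real version).
printf '{ "version" : { "number" : "0.0.0" } }\n' | es_version
```

Against a live node the same helper could be fed with curl -s http://localhost:9200 | es_version, and the printed version then compared with the compatibility information in the IK repository.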

2. Extract and build

# unzip elasticsearch-analysis-ik-master.zip

# cd elasticsearch-analysis-ik-master/

# mvn clean package

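3. Copy the built IK plugin into the Elasticsearch plugins directory ($elasticsearch stands for the Elasticsearch installation directory)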

# mkdir $elasticsearch/plugins/ik

# cp target/releases/elasticsearch-analysis-ik-1.9..zip $elasticsearch/plugins/ik/

# cd $elasticsearch/plugins/ik/

# unzip elasticsearch-analysis-ik-1.9..zip
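
The version embedded in the zip file name depends on the source that was checked out, so it can be convenient to locate the archive under target/releases with a glob instead of typing the name by hand. This is only a sketch under that assumption: find_release_zip is a hypothetical helper, and it deliberately refuses to guess when zero or several archives are present.

```shell
# Print the path of the plugin zip produced by the build, but only if
# exactly one matching archive exists in the given directory.
# find_release_zip is a hypothetical helper name used only for this sketch.
find_release_zip() {
    dir=${1:-target/releases}
    # Expand the glob into the positional parameters of this function.
    set -- "$dir"/elasticsearch-analysis-ik-*.zip
    if [ "$#" -eq 1 ] && [ -f "$1" ]; then
        echo "$1"
    else
        echo "expected exactly one plugin zip in $dir" >&2
        return 1
    fi
}
```

Run from the source directory after the build, zip=$(find_release_zip) && cp "$zip" "$elasticsearch/plugins/ik/" would copy whatever archive the build produced.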

4. Restart Elasticsearch so that the IK plugin takes effect

# /etc/init.d/elasticsearch restart

IV. Testing IK analysis

1. Create an index named "index"

# curl -XPUT http://localhost:9200/index

2. Create a mapping for "index"

# curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
        "_all": {
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "include_in_all": "true",
                "boost":
            }
        }
    }
}'

3. Test

# curl 'http://10.10.10.26:9200/index/_analyze?analyzer=ik&pretty=true' -d '{"text":"中华人民共和国国歌"}'

The output is as follows (showing the token and type fields only):

{
  "tokens" : [
    { "token" : "中华人民共和国", "type" : "CN_WORD" },
    { "token" : "中华人民", "type" : "CN_WORD" },
    { "token" : "中华", "type" : "CN_WORD" },
    { "token" : "华人", "type" : "CN_WORD" },
    { "token" : "人民共和国", "type" : "CN_WORD" },
    { "token" : "人民", "type" : "CN_WORD" },
    { "token" : "共和国", "type" : "CN_WORD" },
    { "token" : "共和", "type" : "CN_WORD" },
    { "token" : "国", "type" : "CN_CHAR" },
    { "token" : "国歌", "type" : "CN_WORD" }
  ]
}
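
When comparing analyzers it is often easier to look at the bare token list than at the full JSON. The sketch below pulls the token values out of an _analyze response read from standard input. It assumes pretty-printed output with at most one "token" field per line, as produced by pretty=true, and extract_tokens is a hypothetical helper name; the two sample lines reuse tokens from the response shown above.

```shell
# Print one token per line from an _analyze response read on standard input.
# Assumes pretty-printed JSON with at most one "token" field per line.
# extract_tokens is a hypothetical helper name used only for this sketch.
extract_tokens() {
    sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Offline demonstration on two lines shaped like the response above.
printf '%s\n' \
    '{ "token" : "中华", "type" : "CN_WORD" },' \
    '{ "token" : "国歌", "type" : "CN_WORD" }' \
    | extract_tokens
```

Piping the curl command from the test step into extract_tokens would list the tokens the analyzer produced for the sample text.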

GitHub repository for elasticsearch-analysis-ik: https://github.com/medcl/elasticsearch-analysis-ik
