
Configuring IK as the Default Analyzer in Elasticsearch and Using the Java AnalyzeRequestBuilder

Time: 2023-03-14 23:15:47, 科技观察 (Tech Watch)

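Outline

1. What is Elasticsearch-analysis-ik
2. Configuring IK as the default analyzer
3. Getting analysis results with AnalyzeRequestBuilder
4. Summary

Preface

In the article "Elasticsearch 和插件 elasticsearch-head 安装详解" (a detailed installation guide for Elasticsearch and the elasticsearch-head plugin), I used Elasticsearch 5.3.x. Here I switch to Elasticsearch 2.3.2 because of the version matrix at https://github.com/spring-projects/spring-data-elasticsearch/wiki/Spring-Data-Elasticsearch---Spring-Boot---version-matrix:

    Spring Boot Version (x)    Spring Data Elasticsearch Version (y)    Elasticsearch Version (z)
    x <= 1.3.5                 y <= 1.3.4                               z <= 1.7.2*
    x >= 1.4.x                 2.0.0 <= y < 5.0.0**                     2.0.0 <= z < 5.0.0**

    *  - Only the version number in the corresponding pom file needs to be changed.
    ** - The next ES release will bring major changes.

Elasticsearch 5.3.x falls outside the range in the second row, so this article describes how to configure IK as the default analyzer on Elasticsearch 2.3.2.

1. What is Elasticsearch-analysis-ik

To understand Elasticsearch-analysis-ik, first look at IK Analyzer. IK Analyzer is an open-source word segmentation framework built on Lucene. Official address: https://code.google.com/p/ik-analyzer/.

Elasticsearch-analysis-ik is a plugin that integrates IK Analyzer into Elasticsearch and supports custom dictionaries. GitHub address: https://github.com/medcl/elasticsearch-analysis-ik. Supported features:

    Analyzer:  ik_smart, ik_max_word
    Tokenizer: ik_smart, ik_max_word

2. Configuring IK as the default analyzer

The plugin's GitHub page lists which IK version goes with which ES version. The two must match:

    IK version    ES version
    master        5.x -> master
    5.3.2         5.3.2
    5.2.2         5.2.2
    5.1.2         5.1.2
    1.10.1        2.4.1
    1.9.5         2.3.5
    1.8.1         2.2.1
    1.7.0         2.1.1
    1.5.0         2.0.0
    1.2.6         1.0.0
    1.2.5         0.90.x
    1.1.3         0.20.x
    1.0.0         0.16.2 -> 0.19.0

This article uses Elasticsearch-analysis-ik 1.9.2, which supports Elasticsearch 2.3.2. Download address: https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v1.9.2/elasticsearch-analysis-ik-1.9.2.zip

After downloading, install the plugin: unzip the file and copy its contents into elasticsearch-2.3.2/plugins/ik.

    cd elasticsearch-2.3.2/plugins
    mkdir ik
    cp ...

Add the following to elasticsearch-2.3.2/config/elasticsearch.yml:

    index.analysis.analyzer.default.tokenizer: "ik_max_word"
    index.analysis.analyzer.default.type: "ik"

This sets the default analyzer to ik and its tokenizer to ik_max_word. Then restart ES.

To verify that IK is installed correctly, open localhost:9200/_analyze?analyzer=ik&pretty=true&text=泥瓦匠的博客是bysocket.com. It should return a result like this:

    {
      "tokens": [
        {"token": "泥瓦匠", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0},
        {"token": "泥", "start_offset": 0, "end_offset": 1, "type": "CN_WORD", "position": 1},
        {"token": "瓦匠", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 2},
        {"token": "匠", "start_offset": 2, "end_offset": 3, "type": "CN_WORD", "position": 3},
        {"token": "博客", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 4},
        {"token": "bysocket.com", "start_offset": 8, "end_offset": 20, "type": "LETTER", "position": 5},
        {"token": "bysocket", "start_offset": 8, "end_offset": 16, "type": "ENGLISH", "position": 6},
        {"token": "com", "start_offset": 17, "end_offset": 20, "type": "ENGLISH", "position": 7}
      ]
    }

If ES is installed in a Docker container, remember to open the corresponding ports.

3. Getting analysis results with AnalyzeRequestBuilder

Once IK is the default analyzer in ES, the analysis result is available over REST/HTTP. To get the same result from Spring Boot, use the client provided by the spring-data-elasticsearch dependency.

Add the dependency to pom.xml:

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>

Configure the ES address in application.properties:

    # ES
    spring.data.elasticsearch.repositories.enabled = true
    spring.data.elasticsearch.cluster-nodes = 127.0.0.1:9300

Then create a method that takes the search content as input and returns the list of terms:

    import java.util.ArrayList;
    import java.util.List;
    import org.elasticsearch.action.admin.indices.analyze.AnalyzeAction;
    import org.elasticsearch.action.admin.indices.analyze.AnalyzeRequestBuilder;
    import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    /**
     * Calls ES and returns the terms IK produces for the given text.
     *
     * @param searchContent the text to analyze
     * @return the list of terms
     */
    private List<String> getIkAnalyzeSearchTerms(String searchContent) {
        // Build an analyze request against the index and run it with the IK tokenizer
        AnalyzeRequestBuilder ikRequest = new AnalyzeRequestBuilder(
                elasticsearchTemplate.getClient(), AnalyzeAction.INSTANCE, "indexName", searchContent);
        ikRequest.setTokenizer("ik");
        List<AnalyzeResponse.AnalyzeToken> ikTokenList = ikRequest.execute().actionGet().getTokens();

        // Collect the term of each token
        List<String> searchTermList = new ArrayList<>();
        ikTokenList.forEach(ikToken -> searchTermList.add(ikToken.getTerm()));
        return searchTermList;
    }

Here indexName is the name of the index configured in ES. The method obtains the Client from the ElasticsearchTemplate bean injected by the container, then uses AnalyzeRequestBuilder, the builder for analyze requests, to run the analysis. The result is a list of AnalyzeResponse.AnalyzeToken.

4. Summary

With IK configured as the default analyzer, DSL queries against ES invoke IK automatically. If you need a custom dictionary, for example for a specialized domain, the plugin supports one; see its GitHub page listed above.

[This article is an original contribution by columnist "李强强". To reprint it, please contact the author for authorization.]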
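
The _analyze check from section 2 can also be scripted with the JDK alone, which is handy when the Spring client is not available. The following is only a minimal sketch under stated assumptions: the class name IkAnalyzeRest and both helper methods are hypothetical names introduced here, and the regex-based extraction is a shortcut that assumes token values contain no escaped quotes (a real application should use a JSON parser). It shows how the request URL might be built with the text percent-encoded, which matters for Chinese input, and how the token values could be pulled out of a response body shaped like the one shown earlier.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elasticsearch or Spring Data.
final class IkAnalyzeRest {

    // Matches "token":"value" pairs. The plural key "tokens" is not matched
    // because the pattern requires the closing quote right after "token".
    private static final Pattern TOKEN =
            Pattern.compile("\"token\"\\s*:\\s*\"([^\"]*)\"");

    // Builds the _analyze URL used in section 2, with query values URL-encoded as UTF-8.
    static String buildAnalyzeUrl(String hostAndPort, String analyzer, String text) {
        try {
            return "http://" + hostAndPort + "/_analyze"
                    + "?analyzer=" + URLEncoder.encode(analyzer, "UTF-8")
                    + "&pretty=true"
                    + "&text=" + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Extracts the "token" values, in order, from an _analyze response body.
    static List<String> extractTokens(String responseJson) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(responseJson);
        while (matcher.find()) {
            tokens.add(matcher.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Text to analyze: first command-line argument, or an ASCII sample.
        String text = args.length > 0 ? args[0] : "bysocket.com blog";
        System.out.println(buildAnalyzeUrl("localhost:9200", "ik", text));

        // A shortened sample response in the shape shown in section 2.
        String sample = "{\"tokens\":[{\"token\":\"bysocket\",\"start_offset\":8},"
                + "{\"token\":\"com\",\"start_offset\":17}]}";
        System.out.println(extractTokens(sample));
    }
}
```

The sketch leaves out the HTTP call itself. Fetching the built URL with any HTTP client and passing the body to extractTokens would complete the round trip, assuming ES is reachable on the REST port (9200 in this article) and not the transport port 9300 used by the Spring configuration.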