Elasticsearch aggregations: the bucket terms aggregation

Posted: 2023-04-02 00:44:50 (Java)

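1. Background
This post records the terms aggregation, one of the bucket aggregations: its various usages and the points to watch out for, so that I don't forget them later.

2. Prerequisites
2.1 Create the index
The address field uses the ik_max_word analyzer, which comes from the IK analysis plugin; the plugin must be installed for this request to succeed.
PUT /index_person
{
  "settings": {"number_of_shards": 1},
  "mappings": {
    "properties": {
      "id": {"type": "long"},
      "name": {"type": "keyword"},
      "sex": {"type": "keyword"},
      "age": {"type": "integer"},
      "province": {"type": "keyword"},
      "address": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {
          "keyword": {"type": "keyword", "ignore_above": 256}
        }
      }
    }
  }
}

2.2 Insert the test data
PUT /_bulk
{"create":{"_index":"index_person","_id":1}}
{"id":1,"name":"张三","sex":"男","age":20,"province":"湖北","address":"黄冈市罗田县矿河湖北省城镇"}
{"create":{"_index":"index_person","_id":2}}
{"id":2,"name":"李四","sex":"男","age":19,"province":"江苏","address":"江苏省南京市"}
{"create":{"_index":"index_person","_id":3}}
{"id":3,"name":"王舞","sex":"女","age":25,"province":"湖北","address":"湖北省武汉市江汉区"}
{"create":{"_index":"index_person","_id":4}}
{"id":4,"name":"赵刘","sex":"女","age":30,"province":"北京","address":"北京市东城区"}
{"create":{"_index":"index_person","_id":5}}
{"id":5,"name":"倩琪","sex":"女","age":16,"province":"北京","address":"北京市西城区"}
{"create":{"_index":"index_person","_id":6}}
{"id":6,"name":"female","sex":"女","age":45,"province":"北京","address":"北京市朝阳区"}

3. Aggregations
3.1 The 2 provinces with the most documents
3.1.1 DSL
GET /index_person/_search
{
  "size": 0,
  "aggs": {
    "agg_sex": {
      "terms": {"field": "province", "size": 2}
    }
  }
}
3.1.2 Result

3.2 The 2 provinces with the fewest documents
3.2.1 DSL
GET /index_person/_search
{
  "size": 0,
  "aggs": {
    "agg_sex": {
      "terms": {"field": "province", "size": 2, "order": {"_count": "asc"}}
    }
  }
}
Note: ordering by "_count": "asc" is not recommended, because it can produce inaccurate counts; see the summary in section 4 below.
3.2.2 Result

3.3 Ordering by term value: aggregate by age and return the 2 youngest age buckets
3.3.1 DSL
GET /index_person/_search
{
  "size": 0,
  "aggs": {
    "agg_sex": {
      "terms": {"field": "age", "size": 2, "order": {"_key": "asc"}}
    }
  }
}
Note: ordering by the term value like this produces accurate results.
3.3.2 Result

3.4 Ordering by a sub-aggregation: aggregate by province, then order the buckets by the minimum age in each bucket
3.4.1 DSL
GET /index_person/_search
{
  "size": 0,
  "aggs": {
    "agg_sex": {
      "terms": {"field": "province", "order": {"min_age": "asc"}},
      "aggs": {
        "min_age": {"min": {"field": "age"}}
      }
    }
  }
}

GET /index_person/_search
{
  "size": 0,
  "aggs": {
    "agg_sex": {
      "terms": {"field": "province", "order": {"min_age.min": "asc"}},
      "aggs": {
        "min_age": {"stats": {"field": "age"}}
      }
    }
  }
}
Note: ordering by a sub-aggregation is generally discouraged because the results can be inaccurate. The exceptions are ordering by a max sub-aggregation in descending order or by a min sub-aggregation in ascending order, which are accurate. The second request shows how to order by a single value (min) of a multi-value stats sub-aggregation.
3.4.2 Result

3.5 Script aggregation: aggregate by province, but put every document whose address contains 黄冈市 into a separate 黄冈市 bucket
3.5.1 DSL
GET /index_person/_search
{
  "size": 0,
  "runtime_mappings": {
    "province_sex": {
      "type": "keyword",
      "script": """
        String province = doc['province'].value;
        String address = doc['address.keyword'].value;
        if (address.contains('黄冈市')) {
          emit('黄冈市');
        } else {
          emit(province);
        }
      """
    }
  },
  "aggs": {
    "agg_sex": {
      "terms": {"field": "province_sex"}
    }
  }
}
3.5.2 Result

3.6 Filtering: group by province, keeping only provinces whose name contains 北 ("north") but excluding 湖北
3.6.1 DSL
GET /index_person/_search
{
  "size": 0,
  "aggs": {
    "agg_province": {
      "terms": {
        "field": "province",
        "include": ".*北.*",
        "exclude": ["湖北"]
      }
    }
  }
}
Note: when include or exclude is a string, it is treated as a regular expression that must match the whole term; when it is an array, it must list exact values.

3.7 Multi-field aggregation: group by province and sex, then order the buckets by maximum age in descending order
3.7.1 DSL
GET /index_person/_search
{
  "size": 0,
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "size": 10,
        "shard_size": 25,
        "order": {"max_age": "desc"},
        "terms": [
          {"field": "province", "missing": "默认省"},
          {"field": "sex"}
        ]
      },
      "aggs": {
        "max_age": {"max": {"field": "age"}}
      }
    }
  }
}
Note: the terms aggregation does not support aggregating on multiple fields, so another approach is needed. Here the multi_terms aggregation is used; "missing" supplies the value (默认省, "default province") used for documents that have no province.
3.7.2 Result

3.8 Handling missing values

3.9 Multiple aggregations: return the aggregation by province and the aggregation by sex in the same request
3.9.1 DSL
GET /index_person/_search
{
  "size": 0,
  "aggs": {
    "agg_province": {
      "terms": {"field": "province"}
    },
    "agg_sex": {
      "terms": {"field": "sex", "size": 10}
    }
  }
}
3.9.2 Result

4. Summary
4.1 Which fields can be aggregated
In general, only keyword, numeric, ip, boolean and binary fields can be aggregated. text fields cannot be aggregated by default; to aggregate one, you need to enable fielddata on it.

4.2 Returning all terms
If we only need 100 or 1000 unique terms, we can increase the size parameter. To return all of them, the composite aggregation is recommended instead.

4.3 Accuracy of the counts
The counts returned by a terms aggregation are approximate and not guaranteed to be exact. Why? Suppose the index has 3 shards and we want the 5 terms with the highest counts, i.e. size=5. Ignoring the shard_size parameter for now, each shard returns its own top 5 terms, and the coordinating node merges these lists and returns the final top 5. That looks fine, but the data is spread across the shards: a term (say, the number of users in 北京) may be in the top 5 on shard A but not in the top 5 on shard B. Shard B's documents for that term are then never counted, so the final count is too low, or the term may be missing from the result altogether.
Solution: have each shard return more candidates than size. By default, with size=5, each shard returns size * 1.5 + 10 terms, which reduces the error accordingly. This size * 1.5 + 10 is the default value of shard_size. shard_size can also be set manually, but it should be larger than size; if it is smaller, Elasticsearch resets it to size.

4.4 Ordering
4.4.1 Ordering by _count
Buckets are ordered by _count in descending order by default. Ascending order can be requested, but it is not recommended because it can produce wrong results. If you need the rarest terms, use the rare_terms aggregation instead.
4.4.2 Ordering by term value
Ordering by term value is accurate in both ascending and descending order.
4.4.3 Ordering by a sub-aggregation
4.5 Multi-terms aggregation

5. Source code
https://gitee.com/huan1993/spring-cloud-parent/blob/master/es/es8-api/src/main/java/com/huan/es8/aggregations/bucket/TermsAggs.java

6. References
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-multi-terms-aggregation.html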
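Appendix A: include/exclude in plain Java. To make the include/exclude rules from section 3.6 concrete, here is a toy, in-memory model written for this post (it is not Elasticsearch code). It assumes two documented behaviours: a string include is a regular expression that must match the whole term, and exclude takes precedence when a term matches both. Java's `Matcher.matches()` is used as a stand-in for Lucene's regexp engine; the two syntaxes agree for a simple pattern like `.*北.*` but differ in other features. The province list is hypothetical, modeled on the test data.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Toy model of include/exclude filtering in a terms aggregation; not Elasticsearch code.
class IncludeExcludeSketch {

    // Count each term, keeping it only if the include pattern matches the WHOLE term
    // (matches() is anchored, like Lucene regexps) and the term is not in exclude.
    static Map<String, Long> termsWithFilter(List<String> values, String include, Set<String> exclude) {
        Pattern includePattern = Pattern.compile(include);
        Map<String, Long> counts = new TreeMap<>();
        for (String value : values) {
            boolean included = includePattern.matcher(value).matches();
            boolean excluded = exclude.contains(value);
            if (included && !excluded) { // exclude wins when a term matches both
                counts.merge(value, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical province values, modeled on the six test documents.
        List<String> provinces = List.of("湖北", "江苏", "湖北", "北京", "北京", "北京");
        System.out.println(termsWithFilter(provinces, ".*北.*", Set.of("湖北"))); // {北京=3}
        // A bare "北" keeps nothing, because the pattern must cover the whole term.
        System.out.println(termsWithFilter(provinces, "北", Set.of())); // {}
    }
}
```

The second call shows why the request wraps 北 in `.*`: without it, no province name is matched in full.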
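Appendix B: the multi_terms request from section 3.7 as an in-memory grouping. The sketch below is a toy model written for this post (not Elasticsearch code, and not the author's Java source): each bucket key is the pair (province, sex), a missing province is replaced by 默认省 as the "missing" option does, and buckets are ordered by the max age in each bucket, descending. The `Person` record and its data are hypothetical; real buckets also carry a doc_count, which is omitted here. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the multi_terms request in section 3.7; not Elasticsearch code.
class MultiTermsSketch {

    // Hypothetical document shape; a null province models a document without that field.
    record Person(String province, String sex, int age) {}

    // Bucket by the composite key (province, sex), substituting missingProvince for a
    // null province, then order buckets by their max age, descending.
    static List<Map.Entry<List<String>, Integer>> maxAgeByProvinceAndSex(
            List<Person> people, String missingProvince) {
        Map<List<String>, Integer> maxAge = new HashMap<>();
        for (Person p : people) {
            String province = p.province() != null ? p.province() : missingProvince;
            maxAge.merge(List.of(province, p.sex()), p.age(), Math::max);
        }
        List<Map.Entry<List<String>, Integer>> buckets = new ArrayList<>(maxAge.entrySet());
        buckets.sort(Map.Entry.<List<String>, Integer>comparingByValue().reversed());
        return buckets;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("湖北", "男", 20), new Person("江苏", "男", 19),
                new Person("湖北", "女", 25), new Person("北京", "女", 30),
                new Person("北京", "女", 16), new Person(null, "女", 45));
        for (Map.Entry<List<String>, Integer> bucket : maxAgeByProvinceAndSex(people, "默认省")) {
            System.out.println(bucket.getKey() + " max_age=" + bucket.getValue());
        }
    }
}
```

The first bucket printed is [默认省, 女] with max_age=45, which shows how documents without a province still form their own bucket.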
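Appendix C: simulating the shard-level error from section 4.3. The toy model below is written for this post and is not Elasticsearch code. Each shard sends only its top `shardSize` terms, and the coordinating node sums what it receives and keeps the top `size`. The per-shard counts are hypothetical. The default shard_size follows the documented formula size * 1.5 + 10; truncating it to an int is an assumption of this sketch. Note that the index in section 2.1 has a single shard, so its counts are exact; the problem only appears with several shards.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy in-memory model of how per-shard top-N lists are merged; not Elasticsearch code.
class ShardTermsSimulation {

    // The n terms with the highest counts, ties broken alphabetically by term.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    // Each shard sends only its top shardSize terms; counts it does not send are lost.
    // The coordinating node sums the received counts and keeps the top size terms.
    static LinkedHashMap<String, Integer> termsAgg(List<Map<String, Integer>> shards, int size, int shardSize) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : topN(shard, shardSize)) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : topN(merged, size)) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    // Documented default: shard_size = size * 1.5 + 10 (truncated to an int in this sketch).
    static int defaultShardSize(int size) {
        return (int) (size * 1.5 + 10);
    }

    public static void main(String[] args) {
        // Hypothetical per-shard counts; true totals are a=11, b=17, c=25, d=19,
        // so the correct top 2 is c, d.
        List<Map<String, Integer>> shards = List.of(
                Map.of("a", 10, "b", 9, "c", 8),
                Map.of("c", 10, "d", 9, "a", 1),
                Map.of("d", 10, "b", 8, "c", 7));
        int size = 2;
        System.out.println("shard_size = size:  " + termsAgg(shards, size, size));
        System.out.println("default shard_size: " + termsAgg(shards, size, defaultShardSize(size)));
    }
}
```

With shard_size equal to size, the first line prints {d=19, b=17}: term c, whose true total is 25, never makes any shard's top 2 often enough and drops out entirely. With the default shard_size (13 for size=2), every shard sends all of its terms and the result is the exact {c=25, d=19}. Real terms aggregation responses also report doc_count_error_upper_bound to signal this kind of error.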