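For more information, visit the HarmonyOS Technology Community: https://harmonyos.51cto.com

1. Introduction to the NLU (Natural Language Understanding) engine service

The engine service provides APIs for word segmentation, part-of-speech tagging, entity recognition, intent recognition, and keyword extraction, each available in synchronous and asynchronous forms. This article covers word segmentation, part-of-speech tagging, keyword extraction, and entity recognition. The accompanying code includes comments and debug logs to make it easier to follow.

2. Demo

3. Setting up the environment

Install DevEco Studio; see the DevEco Studio download page for details. Then set up the DevEco Studio development environment. DevEco Studio depends on network access, so it must be online to work properly. Configure the environment according to one of these two cases:

- If you have direct Internet access, you only need to download the HarmonyOS SDK.
- If you cannot access the Internet directly, go through a proxy server; see "Configuring the Development Environment".

After downloading the source code, open the project in DevEco Studio and run it on the simulator. To run it on a real device, change bundleName in config.json to your own. If you do not have one, configure it in AGC; see "Debugging with the Simulator".

4. Project structure

5. Code walkthrough

5.1 Word segmentation (getWordSegment)

The word segmentation API splits a sequence of Chinese characters into individual words, with configurable granularity. Typical scenarios:

1. Search engine development, where results are ranked by relevance.
2. Text selection, for example when a double-click selects text along word boundaries.

5.1.1 Core classes

    import ohos.ai.nlu.NluClient;       // Provides methods for calling the NLU engine service.
    import ohos.ai.nlu.NluRequestType;  // Defines the request types for calling NLU engine functions.
    import ohos.ai.nlu.ResponseResult;  // Provides NLU recognition results in JSON format.

5.1.2 Usage flow

1. Initialize NluClient

    NluClient.getInstance().init(Context context, OnResultListener listener, boolean isLoadModel)

2. Get the segmentation result

    // 1. Synchronous API
    ResponseResult responseResult = NluClient.getInstance().getWordSegment(requestData, NluRequestType.REQUEST_TYPE_LOCAL);

    // 2. Asynchronous API
    NluClient.getInstance().getWordSegment(requestData, NluRequestType.REQUEST_TYPE_LOCAL, asyncResult -> {
        // Send the segmentation result
        sendResult(asyncResult.getResponseResult(), 0);
        release();
    });

requestData: a JSON string whose parameter names include {text, type, callPkg, callType, callVersion, callState}, where:

- text: the text to analyze; required.
- type: segmentation granularity, an enumerated value; the default is 0.
  0: basic words.
  1: entities merged on top of basic words.
  9223372036854775807: on top of type 1, time, location, and similar entities are merged into whole units, and some common phrases are merged as well.

requestType: an enumerated value; NluRequestType.REQUEST_TYPE_LOCAL calls the local engine.

3. Parse the returned result

ResponseResult returns a JSON string whose parameter names include {code, message, words}.

    // {"code":0,"message":"success","words":["我","明天","下午","三点钟",
    // "想","走","江宁万达广场","看","速度","和","激情"]}
    // Convert the segmentation result to a list
    if (result.contains("\"message\":\"success\"")) {
        switch (operateType) {
            // Word segmentation
            case 0:
                String words = result.substring(result.indexOf(WORDS) + STEP,
                        result.lastIndexOf("]")).replaceAll("\"", "");
                if ((words == null) || ("".equals(words))) {
                    // No recognizable segmentation result; return "nokeywords"
                    lists = new ArrayList<>(1);
                    lists.add("nokeywords");
                } else {
                    lists = Arrays.asList(words.split(","));
                }
                // Build the event
                event = InnerEvent.get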
                        (TWO, ZERO, lists);
        }
    }

4. Release resources

    NluClient.getInstance().destroy(slice);

5.1.3 Segmentation granularity tests

type = 0

    requestData: {"text":"我要明天下午3点去江宁万达广场观看速度与激情","type":0}
    Segmentation result: {"code":0,"message":"success","words":["I","tomorrow","afternoon","三点","want","Go","江宁万达广场","看","速度","和","激情"]}

type = 1

    requestData: {"text":"我3点去江宁万达广场明天下午看速度与激情","type":1}
    Segmentation result: {"code":0,"message":"success","words":["I","tomorrow","afternoon","三点钟","去","去","江宁万达广场","看","速度与激情"]}

type = 9223372036854775807

    requestData: {"text":"我要去江宁万达广场看速度与激情明天下午3:00","type":9223372036854775807}
    Segmentation result: {"code":0,"message":"success","words":["I","明天3点afternoon","going","JiangningWandaPlaza","look","FastandFurious"]}

5.2 Part-of-speech tagging (getWordPos)

Part-of-speech tagging is exposed through the getWordPos() API, which assigns a part of speech to each word in the segmentation result at the chosen granularity. The requestData parameter and the returned ResponseResult object are the same as for word segmentation.

    ResponseResult responseResult = NluClient.getInstance().getWordPos(requestData, NluRequestType.REQUEST_TYPE_LOCAL);

5.2.2 Part-of-speech tagging result

    requestData: {"text":"我明天下午3点去江宁万达广场看《速度与激情》","type":0}
    responseResult: {"code":0,"message":"success","pos":[
        {"word":"I","tag":"rr"},
        {"word":"明天","tag":"t"},
        {"word":"下午","tag":"t"},
        {"word":"三点'clock","tag":"t"},
        {"word":"To","tag":"v"},
        {"word":"Go","tag":"vf"},
        {"word":"江宁万达广场","tag":"n"},
        {"word":"watch","tag":"v"},
        {"word":"speed","tag":"n"},
        {"word":"and","tag":"cc"},
        {"word":"passion","tag":"n"}]}

Parts of speech: rr: personal pronoun; t: time word; v: verb; vf: directional verb; n: noun; cc: coordinating conjunction. For the full list of tag values, see https://developer.harmonyos.com/cn/docs/documentation/doc-guides/ai-pos-tagging-guidelines-0000001050732512

5.3 Keyword extraction (getKeywords)

The keyword extraction API pulls out the core content that a text is meant to convey from a large amount of information. Keywords can be entities with a specific meaning, such as person names, place names, or movies, or they can be basic words that are nonetheless key to the text. Extracted keywords are sorted by their weight in the text from high to low; the higher a keyword ranks, the more accurately it captures the core content of the text.

5.3.1 Usage

Usage is similar to word segmentation. The API is getKeywords(), and the JSON parameters in requestData differ: {body, number, title}

- body: the text to analyze, such as a news item, an email, or an article; required.
- number: the number of keywords to extract; required.
- title: the title of the content; optional.

    ResponseResult responseResult = NluClient.getInstance().getKeywords(requestData, NluRequestType.REQUEST_TYPE_LOCAL);

5.3.2 Keyword extraction result

    requestData: {"body":"对接各资源服务中心,接入医疗、医保、人社、民政等横向单位数据,逐步完善和丰富退役军人信息'健康档案","number":5,"title":"退役军人"}
    responseResult: {"keywords":[
        "退役","军人","健康","医保","docking"],"code":0,"message":"success"}

5.4 Entity recognition (getEntity)

Entity recognition extracts entities with a specific meaning from natural language, and serves as the basis for follow-up operations and features such as search.

5.4.1 Usage

Usage is similar to word segmentation. The API is getEntity(), and the JSON parameters in requestData differ: {text, module, callPkg, callType, callVersion, callState}

- text: the text to analyze, such as a news item or an email; required.
- module: the entities to analyze; optional. By default all entities are analyzed. To analyze a single entity, pass its key; for example, pass "time" to analyze only time entities. To analyze several, separate the keys with commas; for example, pass "time,location" to analyze time and location.
  Allowed values: name, time, location, phoneNum, email, url, movie, tv, anime, league, team, trainNo, flightNo, expressNo, idNo, verificationCode, app, carNo

    ResponseResult responseResult = NluClient.getInstance().getEntity(requestData, NluRequestType.REQUEST_TYPE_LOCAL);

5.4.2 Entity recognition result

    requestData: {"text":"我明天下午3点去江宁万达广场看速度与激情"}
    responseResult: {"entity":{
        "movie":[{"oriText":"速度与激情","sequence":1,"origin":"1637301307158","heat":0,
            "standardName":"速度与激情","charOffset":16,"normalValue":"速度与激情",
            "user.extend":false,"isCorrected":false}],
        "location":[{"sequence":1,"origin":"1637301307158","oriText":"江宁万达广场",
            "key":"江宁万达广场","type":"nspHB",
            "coreLocation":{"value":"江宁万达广场","location":{"value":"江宁万达广场"}},
            "isAbstract":"0","cost":"29","charOffset":9,"normalValue":"江宁万达广场",
            "user.extend":false,"isCorrected":false}],
        "time":[{"normalTime":{"start":{"timestamp":1637391600000,"section":"P",
            "standardTime":"2021年11月20日15:00:00"}},
            "oriText":"明天下午三点","sequence":1,"origin":"1637301307158","charOffset":1,
            "normalValue":"明天下午三点","user.extend":false,"isCorrected":false}],
        "varietyshow":[{"oriText":"速度与激情","sequence":1,"origin":"1637301307158","heat":0,
            "standardName":"速度与激情","charOffset":16,"normalValue":"速度与激情"}]}}

6. Summary

1. None of the AI capabilities above require requesting permissions.
2. These AI capabilities are easy to use: they work out of the box and can be applied flexibly in app development.

The attachments for this article can be downloaded from the link below.
https://harmonyos.51cto.com/resource/1514
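
As a supplement to the parsing step in section 5.1.2, the following is a minimal sketch of how the "words" array might be pulled out of the response string in plain Java, without the HarmonyOS SDK. The class name NluResponseParser, the method parseWords, and the constant WORDS_KEY are hypothetical names introduced only for this illustration and are not part of the article's project or the NLU API. The sketch assumes the response is compact JSON exactly as in the samples above (no whitespace after the colon) and that no word contains a comma or a closing bracket; a real application would more likely use a JSON library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the article's source code or the NLU SDK.
class NluResponseParser {
    // Assumes compact JSON as in the article's samples: "words":[...]
    private static final String WORDS_KEY = "\"words\":[";

    // Returns the segmented words, or an empty list if the response
    // is null, not successful, or has no "words" array.
    static List<String> parseWords(String result) {
        List<String> words = new ArrayList<>();
        if (result == null || !result.contains("\"message\":\"success\"")) {
            return words;
        }
        int start = result.indexOf(WORDS_KEY);
        if (start < 0) {
            return words;
        }
        start += WORDS_KEY.length();
        int end = result.indexOf(']', start);
        if (end < 0) {
            return words;
        }
        // Split the array body on commas and strip the surrounding quotes.
        for (String token : result.substring(start, end).split(",")) {
            String word = token.trim().replace("\"", "");
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "{\"code\":0,\"message\":\"success\","
                + "\"words\":[\"我\",\"明天\",\"下午\"]}";
        System.out.println(parseWords(sample));
    }
}
```

Unlike the substring-based snippet in section 5.1.2, this version returns an empty list instead of a placeholder entry when nothing is recognized, so the caller decides how to present that case.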