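Decision Tree Algorithm

[toc]

1. What is the decision tree algorithm?

A decision tree is a predictive model; it represents a mapping between object attributes and object values. Each node in the tree represents an object, each branch represents a possible attribute value, and each leaf node corresponds to the value of the object described by the path from the root node to that leaf. A decision tree has only one output. If you want several outputs, you can build independent decision trees, one for each output. Decision trees are a frequently used technique in data mining. They can be used to analyze data and also to make predictions. The machine learning technique that generates a decision tree from data is called decision tree learning, or, colloquially, just "decision tree".

Wikipedia: https://zh.wikipedia.org/zh-cn/%E5%86%B3%E7%AD%96%E6%A0%91#%E...

Didn't follow? Then let's borrow the example from Professor Zhou Zhihua's "watermelon book", which gives a rough idea right away. Say you take your cousin to a watermelon stall. As a veteran melon picker, you can always spot the tastiest, sweetest melon at a glance, while your cousin always ends up with a disappointing one. He asks you how to pick a good melon. You say: look at the price! No, no, no. To pick a watermelon, first ask "what color is it?"; if it is "green", then ask "what does its root look like?"; if it is "curled", then ask "what does it sound like when you knock on it?"; and finally we reach the conclusion: this is a good melon, and very sweet!

By now you should have a rough idea. We choose a goal (the label we want to classify) and then use a series of features to reach that goal; once we know which features to use, we know how to pick a "good melon".

But! Let me pour some cold water on that first: how do we start the first step? Isn't it simple, just pick "color"! But then why not start from "root"? This is the question of how to split, that is, the splitting criterion.

2. Splitting criteria

2.1 Information gain (splitting criterion of the ID3 decision tree algorithm)

First we need the concept of information entropy. Wikipedia defines information entropy as the average amount of information contained in each message received. Here, a "message" stands for an event, sample, or feature drawn from a distribution or data stream. (Entropy is best understood as a measure of uncertainty rather than certainty, because the more random the source, the larger the entropy.) Another characteristic of the source is the probability distribution of the samples. The idea is that an unlikely event provides more information when it actually happens. For this and a number of other reasons, it makes sense to define information as the negative logarithm of the probability. The probability distribution of the events, together with the amount of information of each event, forms a random variable, and the mean (that is, the expectation) of this random variable is the average amount of information produced by the distribution (that is, the entropy).

The concept of entropy originated in physics, where it measures the degree of disorder of a thermodynamic system. In information theory, entropy is a measure of uncertainty. In the world of information, the higher the entropy, the more information can be transmitted; the lower the entropy, the less information can be transmitted.

Still unclear? Then just remember this: information entropy is a measure of the amount of information.

Wikipedia: https://zh.wikipedia.org/zh-cn/%E7%86%B5_(%E4%BF%A1%E6%81%AF%E8%AE%BA)

Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3): 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x

Generally speaking, the principle for splitting a data set is to make disordered data more ordered. The change in information before and after splitting the data set is called the information gain. Once we know how to compute it, we can compute the information gain obtained by splitting the data set on each feature. The feature with the highest information gain is the best choice. In other words, we can use information gain to choose the splitting attribute of the decision tree.

The formulas are as follows.

Information entropy:

$$Ent(D)=-\sum_{k=1}^{|y|}p_k\log_2 p_k$$

The leading minus sign ensures that the entropy is non-negative. The smaller the value of $Ent(D)$, the lower the uncertainty, that is, the purer the data set $D$.

Information gain:

$$Gain(D,a)=Ent(D)-\sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent(D^v)$$

Here $V$ is the number of possible values of the discrete attribute $a$, and $D^v$ is the subset of $D$ in which $a$ takes its $v$-th value.

Borrowing the example from Professor Zhou Zhihua's book again, let's tell good melons from bad melons. Note: this article uses the ID3 decision tree algorithm.

Because our goal is to tell good melons from bad ones, we first compute the entropy of the whole data set (17 melons, 8 good and 9 bad):

$$Ent(D)=-\sum_{k=1}^{2}p_k\log_2 p_k=-\left(\frac{8}{17}\log_2\frac{8}{17}+\frac{9}{17}\log_2\frac{9}{17}\right)=0.998$$

In the same way, suppose we split by color. There are three possible colors, {green, black, light white}, and for each color we compute the proportion of good and bad melons ($p_1$: good, $p_2$: bad):

green: $p_1=0.5,\ p_2=0.5$;
black: $p_1=\frac{4}{6},\ p_2=\frac{2}{6}$;
light white: $p_1=\frac{1}{5},\ p_2=\frac{4}{5}$.

From these we can compute the entropy of each subset:

$$Ent(green)=1,\quad Ent(black)=0.918,\quad Ent(lightwhite)=0.722$$

and then the information gain:

$$Gain(D,color)=Ent(D)-\sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent(D^v)=0.998-\left(\frac{6}{17}\times1+\frac{6}{17}\times0.918+\frac{5}{17}\times0.722\right)=0.109$$

The information gain of the other attributes is obtained in the same way:

$$Gain(D,root)=0.143;\quad Gain(D,knock)=0.141;\quad Gain(D,texture)=0.381;\quad Gain(D,navel)=0.289;\quad Gain(D,touch)=0.006$$

texture has the largest information gain, so we take texture as the first splitting attribute. Then, within each branch of texture, we compute the information gain of the remaining attributes in the same way, and so on, until every attribute has been used for splitting.

2.2 Gini index (splitting criterion of the CART decision tree algorithm)

3. Evaluation of decision trees

Advantages of decision trees

1. Decision trees are easy to understand and implement. After a short explanation, people are able to understand what a decision tree expresses.
2. For decision trees, data preparation is often simple or even unnecessary. Other techniques often require the data to be generalized first, for example by removing redundant or blank attributes. Other techniques often require attributes of a single data type.
3. It is a white-box model. Given an observed model, it is easy to derive the corresponding logical expression from the generated decision tree.
4. The model is easy to evaluate with statistical tests.
5. This means that the reliability of the model can be measured.
6. It can produce feasible and effective results on large data sources in a relatively short time.

Disadvantages of decision trees

For data in which the number of samples differs between classes, the information gain in a decision tree is biased toward features that take more values.

If there are any mistakes in this article, corrections are welcome. 💪💪

4. Code

    # Calculate information entropy
    from math import log
    import operator


    def ent(data):
        data_length = len(data)
        dic = {}
        # Count how many samples fall into each class (the label is the last column)
        for i in data:
            data_leable = i[-1]
            if data_leable not in dic:
                dic[data_leable] = 0
            dic[data_leable] += 1
        # Compute the information entropy from the class counts
        ent_number = 0.0
        for j in dic:
            num1 = float(dic[j]) / data_length
            ent_number = ent_number - num1 * log(num1, 2)
        return ent_number


    # Split the data set: keep the rows whose column `axis` equals `value`,
    # and drop that column from the returned rows
    def data_split(data, axis, value):
        data_list = []
        for i in data:
            if i[axis] == value:
                feat = i[:axis]
                feat.extend(i[axis + 1:])
                data_list.append(feat)
        return data_list

    # Difference between extend and append:
    # a.extend(b) adds every element of b to a;
    # a.append(b) adds the list b itself to a as a single element
    # a = [1, 2]
    # b = [3, 4]
    # c = [3, 4]
    # a.append(b)
    # c.extend(b)
    # print(a, c)
    # [1, 2, [3, 4]] [3, 4, 3, 4]


    # Compute the information gain and choose the best feature
    def gain_chose(data):
        """
        ent_base: entropy of the whole data set
        feat_list: all values in one feature column
        feat_value: the distinct values in that column
        data_gain: information gain
        ent_number: weighted entropy after splitting on the column
        prop: proportion of samples that fall into one subset
        best_feature: index of the best splitting feature
        """
        data_base_length = len(data[0]) - 1
        ent_base = ent(data)
        gain_max = -1  # any value below 0 works
        # The outer loop walks over the feature columns,
        # the inner loop over the distinct values in each column
        for i in range(data_base_length):
            feat_list = [a[i] for a in data]
            feat_value = set(feat_list)
            ent_number = 0.0
            for j in feat_value:
                data_split_use = data_split(data, i, j)
                # weight of the subset: |D^v| / |D|
                prop = len(data_split_use) / float(len(data))
                ent_number = ent_number + prop * ent(data_split_use)
            data_gain = ent_base - ent_number
            if data_gain > gain_max:
                gain_max = data_gain
                best_feature = i
        return best_feature


    # Build the decision tree
    def majorityCnt(classList):
        # Return the class label that occurs most often
        classCount = {}
        for vote in classList:
            if vote not in classCount.keys():
                classCount[vote] = 0
            classCount[vote] += 1
        sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
        return sortedClassCount[0][0]


    def createTree(dataSet, labels):
        """
        classList: the class labels in the data set
        bestFeatLabel: name of the feature used at the root of this (sub)tree
        myTree: nested dict that stores the tree structure
        """
        classList = [example[-1] for example in dataSet]
        # If every sample has the same class, return that class
        if classList.count(classList[0]) == len(classList):
            return classList[0]
        # If only the label column is left, return the majority class
        if len(dataSet[0]) == 1:
            return majorityCnt(classList)
        bestFeat = gain_chose(dataSet)
        bestFeatLabel = labels[bestFeat]
        # Use the best feature as the root of this (sub)tree
        myTree = {bestFeatLabel: {}}
        del(labels[bestFeat])  # remove it so it is not chosen again (this mutates the caller's list)
        featValues = [example[bestFeat] for example in dataSet]
        uniqueVals = set(featValues)
        for value in uniqueVals:
            subLabels = labels[:]
            myTree[bestFeatLabel][value] = createTree(data_split(dataSet, bestFeat, value), subLabels)
        return myTree

5. Decision tree algorithm process

Step 1: Compute the information entropy. Use a hash table (a dict) to count how many samples belong to each class label, then compute the entropy from those counts.

Step 2: Compute the information gain and choose the best feature, in three sub-steps.

Sub-step 1: Split the data set. Taking the watermelon as an example, once the class label ('yes' or 'no') is fixed, we go back to one value of a feature (color = a) and, within the subset where color = a, compute the entropy over 'yes' and 'no'. The entropy for texture and the other features is obtained in the same way.

Sub-step 2: Compute the information gain from the entropies obtained in the previous sub-step.

Sub-step 3: Choose the feature with the largest information gain.

Step 3: After the best feature has been chosen, start building the decision tree.

References: Zhou Zhihua, "Watermelon Book"; "Machine Learning in Action".</p>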
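
As a quick check of the worked example in section 2.1, the entropy and the information gain for color can be recomputed from the class counts alone. This is a minimal sketch, not the article's own code: the helper name `entropy` and the `by_color` dictionary are introduced here for illustration, and the counts are read off the probabilities given in the text (green 3 good / 3 bad, black 4 / 2, light white 1 / 4).

```python
from math import log2


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    # Classes with zero count contribute nothing to the sum.
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


# (good, bad) counts taken from the worked example in section 2.1.
whole = [8, 9]
by_color = {"green": [3, 3], "black": [4, 2], "light white": [1, 4]}

ent_d = entropy(whole)
# Weighted entropy after the split: sum over v of |D^v| / |D| * Ent(D^v)
weighted = sum(sum(c) / sum(whole) * entropy(c) for c in by_color.values())
gain_color = ent_d - weighted

print(round(ent_d, 3))       # 0.998
print(round(gain_color, 3))  # 0.108
```

With unrounded arithmetic the gain comes out at about 0.108; the 0.109 in the text results from plugging in the intermediate entropies after rounding them to three decimals.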

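
The tree built in section 4 is a nested dict of the form `{feature: {value: subtree_or_label}}`. One way such a tree could be used to classify a new sample is sketched below. The function `classify`, the list `feature_names`, and the hand-written `toy_tree` are all hypothetical and only illustrate the shape of the data; the toy tree is not the tree learned from the watermelon data.

```python
def classify(tree, feature_names, sample):
    """Walk a nested-dict tree of the form {feature: {value: subtree_or_label}}."""
    # Anything that is not a dict is a leaf, i.e. the predicted class label.
    if not isinstance(tree, dict):
        return tree
    feature = next(iter(tree))  # the single key names the feature tested at this node
    value = sample[feature_names.index(feature)]
    branches = tree[feature]
    if value not in branches:   # value that has no branch in the tree
        return None
    return classify(branches[value], feature_names, sample)


# Hypothetical toy tree in the same nested-dict shape (assumption, for illustration only).
toy_tree = {"texture": {"clear": {"root": {"curled": "yes", "stiff": "no"}},
                        "blurry": "no"}}
feature_names = ["texture", "root"]

print(classify(toy_tree, feature_names, ["clear", "curled"]))   # yes
print(classify(toy_tree, feature_names, ["blurry", "curled"]))  # no
```

Returning `None` for a value without a branch is a design choice of this sketch; falling back to the majority class of the node would be another reasonable option.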