
A Brief Introduction to spaCy

Date: 2023-03-25 22:54:06 Python

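Installing spaCy

    pip install spacy

Importing the toolkit and the English model

    # python -m spacy download en

Document processing

    import spacy

    # 'en' is a shortcut name that spaCy v3 and later no longer accept;
    # on those versions, download and load a full model name such as 'en_core_web_sm'.
    nlp = spacy.load('en')

    # Tokenization
    doc = nlp('Weather is good, very windy and sunny. We have no classes in the afternoon.')
    for token in doc:
        print(token)

    # Sentence segmentation
    for sent in doc.sents:
        print(sent)

Output:

    Weather
    is
    good
    ,
    very
    windy
    and
    sunny
    .
    We
    have
    no
    classes
    in
    the
    afternoon
    .
    Weather is good, very windy and sunny.
    We have no classes in the afternoon.

Part-of-speech tags

    # Print each token together with its part-of-speech tag
    for token in doc:
        print('{}-{}'.format(token, token.pos_))

Output:

    Weather-NOUN
    is-AUX
    good-ADJ
    ,-PUNCT
    very-ADV
    windy-ADJ
    and-CCONJ
    sunny-ADJ
    .-PUNCT
    We-PRON
    have-AUX
    no-DET
    classes-NOUN
    in-ADP
    the-DET
    afternoon-NOUN
    .-PUNCT

Named entity recognition

    doc = nlp('I went to beijing where I met my old friend Jack from uni.')
    for ent in doc.ents:
        print('{}-{}'.format(ent, ent.label_))

    # Highlight the entities inline (jupyter=True renders inside a notebook)
    from spacy import displacy
    displacy.render(doc, style='ent', jupyter=True)

Output:

    beijing-GPE
    Jack-PERSON

displaCy rendering, with the entity label shown next to each highlighted span:

    I went to [beijing GPE] where I met my old friend [Jack PERSON] from uni.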
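As a small follow-up to the part-of-speech listing in this section, the tags can be tallied to see which categories dominate a text. The sketch below uses only the standard library and works on (token, tag) pairs copied from the output shown here. The names TAGGED and count_tags are made up for illustration; with a loaded pipeline the pairs would presumably come from (token.text, token.pos_) instead.

```python
from collections import Counter

# (token, tag) pairs copied from the part-of-speech output in this article.
TAGGED = [
    ('Weather', 'NOUN'), ('is', 'AUX'), ('good', 'ADJ'), (',', 'PUNCT'),
    ('very', 'ADV'), ('windy', 'ADJ'), ('and', 'CCONJ'), ('sunny', 'ADJ'),
    ('.', 'PUNCT'), ('We', 'PRON'), ('have', 'AUX'), ('no', 'DET'),
    ('classes', 'NOUN'), ('in', 'ADP'), ('the', 'DET'),
    ('afternoon', 'NOUN'), ('.', 'PUNCT'),
]


def count_tags(pairs):
    """Return a Counter mapping each tag to the number of tokens carrying it."""
    return Counter(tag for _, tag in pairs)


tag_counts = count_tags(TAGGED)

# Print the tags from most to least frequent, in the article's 'x-y' style.
for tag, n in tag_counts.most_common():
    print('{}-{}'.format(tag, n))
```

For these two sentences, nouns, adjectives and punctuation each occur three times, which is about what one would expect from short descriptive text.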

## Finding all person names in the text

    def getFileContent(path):
        with open(path, 'r') as f:
            return f.read()

    doc = nlp(getFileContent('./data/pride_and_prejudice.txt'))
    sents = [s for s in doc.sents]
    print(len(sents))

    from collections import Counter, defaultdict

    def find_person(doc):
        # Count how often each PERSON entity occurs, keyed by its lemma
        c = Counter()
        for ent in doc.ents:
            if ent.label_ == 'PERSON':
                c[ent.lemma_] += 1
        return c.most_common(10)

    print(find_person(doc))

Output:

    7153
    [('Elizabeth', 600), ('Darcy', 355), ('Jane', 277), ('Bingley', 260), ('Bennet', 258), ('Collins', 166), ('Wickham', 108), ('Lizzy', 94), ('Gardiner', 90), ('Lady Catherine', 76)]

Terrorist attack analysis

    def read_lines(path):
        with open(path, 'r') as f:
            return f.readlines()

    text = read_lines('./data/rand-terrorism-dataset.txt')
    # Run the pipeline on every line; each line is treated as one article
    nlp_list = [nlp(line) for line in text]

    common_terrorist_groups = ['taliban', 'al-qaeda', 'hamas', 'fatah', 'plo', 'bilad al-rafidayn']
    common_locations = ['iraq', 'baghdad', 'kirkuk', 'mosul', 'afghanistan', 'kabul', 'basra', 'palestine', 'gaza', 'israel', 'istanbul', 'beirut', 'pakistan']

    # Maps group -> Counter of locations mentioned in the same article
    location_entity_dict = defaultdict(Counter)

    for article in nlp_list:
        # People or organizations
        article_terrorist_groups = [ent.lemma_ for ent in article.ents if ent.label_ == 'PERSON' or ent.label_ == 'ORG']
        article_locations = [ent.lemma_ for ent in article.ents if ent.label_ == 'GPE']
        # Keep only the groups and locations on the two lists above (case-insensitive match)
        terrorism_common = [ent for ent in article_terrorist_groups if ent.lower() in common_terrorist_groups]
        locations_common = [ent for ent in article_locations if ent.lower() in common_locations]

        for found_entity in terrorism_common:
            for found_location in locations_common:
                location_entity_dict[found_entity][found_location] += 1

    location_entity_dict

Output:

    defaultdict(collections.Counter,
                {'PLO': Counter({'Beirut': 9, 'ISRAEL': 17, 'Israel': 21, 'Iraq': 8, 'Palestine': 1}),
                 'Fatah': Counter({'Israel': 18, 'Beirut': 1, 'Iraq': 1, 'ISRAEL': 4, 'Gaza': 11}),
                 'Hamas': Counter({'ISRAEL': 7, 'Israel': 19, 'Beirut': 1, 'Gaza': 70}),
                 'Taliban': Counter({'AFGHANISTAN': 3, 'Kabul': 45, 'Pakistan': 17, 'Afghanistan': 263}),
                 'HAMAS': Counter({'ISRAEL': 1}),
                 'Al-Qaeda': Counter({'Kabul': 1, 'Iraq': 4, 'Israel': 1, 'Baghdad': 5, 'Pakistan': 1, 'Mosul': 16, 'Kirkuk': 2}),
                 'al-Qaeda': Counter({'Iraq': 46, 'Afghanistan': 6, 'Kabul': 2, 'Istanbul': 3, 'Baghdad': 14, 'Palestine': 3, 'Mosul': 1, 'Kirkuk': 3, 'Pakistan': 5}),
                 'Bilad al-Rafidayn': Counter({'Iraq': 21, 'Baghdad': 32, 'Basra': 4, 'Mosul': 4, 'Palestine': 6}),
                 'taliban': Counter({'Kabul': 1})})

Converting the counts to a DataFrame, with groups as columns and locations as rows:

    import pandas as pd

    df = pd.DataFrame.from_dict(dict(location_entity_dict), dtype=int)
    # Pairs that never co-occur come out as NaN; replace them with 0
    df = df.fillna(value=0).astype(int)
    df
                 PLO  Fatah  Hamas  Taliban  HAMAS  Al-Qaeda  al-Qaeda  Bilad al-Rafidayn  taliban
    Beirut         9      1      1        0      0         0         0                  0        0
    ISRAEL        17      4      7        0      1         0         0                  0        0
    Israel        21     18     19        0      0         1         0                  0        0
    Iraq           8      1      0        0      0         4        46                 21        0
    Palestine      1      0      0        0      0         0         3                  6        0
    Gaza           0     11     70        0      0         0         0                  0        0
    AFGHANISTAN    0      0      0        3      0         0         0                  0        0
    Kabul          0      0      0       45      0         1         2                  0        1
    Pakistan       0      0      0       17      0         1         5                  0        0
    Afghanistan    0      0      0      263      0         0         6                  0        0
    Baghdad        0      0      0        0      0         5        14                 32        0
    Mosul          0      0      0        0      0        16         1                  4        0
    Kirkuk         0      0      0        0      0         2         3                  0        0
    Istanbul       0      0      0        0      0         0         3                  0        0
    Basra          0      0      0        0      0         0         0                  4        0
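The table splits some entities across several rows and columns purely because of letter case ('ISRAEL' vs 'Israel', 'HAMAS' vs 'Hamas', 'Al-Qaeda' vs 'al-Qaeda'). One possible remedy, which is not part of the original analysis, is to fold every name to a single canonical spelling before counting. The sketch below uses only the standard library and hand-written entity lists in place of spaCy output; the function count_cooccurrences, the two lookup dicts and the sample articles are all hypothetical.

```python
from collections import Counter, defaultdict


def count_cooccurrences(articles, known_groups, known_locations):
    """Count (group, location) pairs per article, ignoring letter case.

    articles: iterable of (group_names, location_names), one pair per article.
    known_groups / known_locations: dicts mapping a lowercase name to the
    canonical spelling that should appear in the result.
    """
    table = defaultdict(Counter)
    for groups, locations in articles:
        # Keep only known names and map every case variant to one spelling.
        found_groups = [known_groups[g.lower()] for g in groups
                        if g.lower() in known_groups]
        found_locations = [known_locations[loc.lower()] for loc in locations
                           if loc.lower() in known_locations]
        for group in found_groups:
            for location in found_locations:
                table[group][location] += 1
    return table


# Hypothetical sample: entity names as they might come out of three articles.
articles = [
    (['Hamas'], ['Israel', 'Gaza']),
    (['HAMAS'], ['ISRAEL']),
    (['taliban', 'Reuters'], ['Kabul', 'Paris']),
]
known_groups = {'hamas': 'Hamas', 'taliban': 'Taliban'}
known_locations = {'israel': 'Israel', 'gaza': 'Gaza', 'kabul': 'Kabul'}

merged = count_cooccurrences(articles, known_groups, known_locations)
print(dict(merged))
```

With this approach 'Hamas' and 'HAMAS' end up in one column and 'ISRAEL' and 'Israel' in one row. The trade-off is that the canonical spellings have to be maintained by hand alongside the lookup lists.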
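Plotting the table as a heatmap:

    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.figure(figsize=(12, 10))
    # annot=True writes the count into each cell; fmt='d' formats it as an integer
    hmap = sns.heatmap(df, annot=True, fmt='d', cmap='YlGnBu', cbar=False)
    plt.title('trror')
    # Rotate the x-axis labels by 30 degrees
    plt.xticks(rotation=30)
    plt.show()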