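Structured data is the easiest kind to handle. It usually arrives as a JSON-formatted string, which can be parsed directly to extract the key fields.

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format, well suited to data exchange scenarios such as communication between a website's front end and back end. Python 3.x ships with a JSON module; `import json` is all it takes to use it. The json module provides four functions, dumps, dump, loads and load, for converting between strings and Python data types.

The standard-library API for working with JSON is documented at https://docs.python.org/zh-cn... ; see also ...://tool.oschina.net/codeformat/json

1. json.loads()

Converts a JSON string into a Python object and returns it. JSON types map to Python types as follows: object to dict, array to list, string to str, number to int or float, true/false to True/False, and null to None.

    import json

    a = "[1,2,3,4]"
    # For a JSON object, wrap the string in single quotes and use
    # double quotes inside the {}: JSON strings must be double-quoted.
    b = '{"k1":1,"k2":2}'

    print(json.loads(a))
    # [1, 2, 3, 4]
    print(json.loads(b))
    # {'k1': 1, 'k2': 2}

Example: fetching Douban's popular movies

    import json
    import urllib.request

    # Douban's latest popular movies
    url = 'https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&page_limit=50&page_start=0'
    # Request headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
        'Referer': 'https://movie.douban.com',
        'Connection': 'keep-alive',
    }
    req = urllib.request.Request(url, headers=headers)  # attach the request headers
    response = urllib.request.urlopen(req)              # send the request and get the response
    hjson = json.loads(response.read())                 # parse the JSON into a dict

    # Iterate over the movies; item holds the information for one movie
    for item in hjson["subjects"]:
        print(item["rate"], item["title"])  # print each movie's rating and title

Output:

    6.9 神弃之地
    7.2 救我脱离邪恶
    6.1 福尔摩斯小姐:失踪的侯爵
    6.2 杀戮隧道
    6.3 OK Boss Lady
    7.3 我要结束这一切
    8.3 鸣鸟不飞:乌云
    7.7 1/2魔法
    7.8 树上有个好地方
    6.3 苗先生
    5.1 开往釜山的火车2:半岛
    ...

2. json.dumps()

Converts a Python object into a JSON string and returns a str. Python types map to JSON types as follows: dict to object, list and tuple to array, str to string, int and float to number, True/False to true/false, and None to null.

    >>> import json
    >>> a = [1, 2, 3, 4]
    >>> b = {"k1": 1, "k2": 2}
    >>> c = (1, 2, 3, 4)
    >>> json.dumps(a)
    '[1, 2, 3, 4]'
    >>> json.dumps(b)
    '{"k1": 1, "k2": 2}'
    >>> json.dumps(c)
    '[1, 2, 3, 4]'

The Chinese encoding issue with json.dumps

If a Python dict contains Chinese text, json.dumps escapes it to ASCII by default:

    >>> import chardet
    >>> import json
    >>> b = {"name": "中国"}
    >>> json.dumps(b)
    '{"name": "\\u4e2d\\u56fd"}'
    >>> print(json.dumps(b))
    {"name": "\u4e2d\u56fd"}
    >>> chardet.detect(json.dumps(b).encode("utf-8"))  # chardet expects bytes in Python 3
    {'confidence': 1.0, 'encoding': 'ascii'}

The result is the ASCII escape sequence for "中国", not the Chinese characters themselves. To output real Chinese characters, pass ensure_ascii=False:

    >>> json.dumps(b, ensure_ascii=False)
    '{"name": "中国"}'
    >>> print(json.dumps(b, ensure_ascii=False))
    {"name": "中国"}
    >>> chardet.detect(json.dumps(b, ensure_ascii=False).encode("utf-8"))
    {'confidence': 0.7525, 'encoding': 'utf-8'}

3. json.dump()

Serializes a Python object to JSON and writes it to a file.

    import json

    a = [1, 2, 3, 4]
    with open("digital.json", "w") as f:
        json.dump(a, f)

    b = {"name": "我"}
    with open("name.json", "w", encoding="utf-8") as f:
        json.dump(b, f, ensure_ascii=False)  # written as the literal character
    with open("name2.json", "w") as f:
        json.dump(b, f, ensure_ascii=True)   # written as the escape \u6211

4. json.load()

Reads JSON from a file and converts it to Python types.

    import json

    with open("digital.json") as f:
        number = json.load(f)
    print(number)

    with open("name.json", encoding="utf-8") as f:
        b = json.load(f)
    print(b)
    print(b.keys())
    print(b['name'])

A practical example: fetching Lagou's city list

    import json
    import urllib.request

    url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json?'  # Lagou city list
    # Request headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
        'Referer': 'http://www.lagou.com',
        'Connection': 'keep-alive',
    }
    req = urllib.request.Request(url, headers=headers)  # attach the request headers
    response = urllib.request.urlopen(req)              # send the request and get the response
    hjson = json.loads(response.read())                 # parse the JSON into a dict
    # print(hjson)

    # Iterate over the cities whose names start with A
    for item in hjson["content"]["data"]["allCitySearchLabels"]["A"]:
        print(item["name"], item["code"])  # print the city name and code

Output (cities starting with A):

    Anyang 171500000
    Anqing 131800000
    Anshan 081600000
    Anshun 240400000
    Ankang 270400000
    Aksu 311800000
    Alxa League 070300000
    Altay 310400000
    Aba Tibetan and Qiang Autonomous Prefecture 23070000

JSONPath

JSONPath is a tool for extracting specified information from JSON. It is not XPath, but it plays the same role for JSON that XPath plays for XML.

Download: https://pypi.python.org/pypi/...
Install: pip install jsonpath

Reference: how XPath and JSONPath expressions compare

| XPath | JSONPath | Result |
| --- | --- | --- |
| `/store/book/author` | `$.store.book[*].author` | the authors of all books in the store |
| `//author` | `$..author` | all authors |
| `/store/*` | `$.store.*` | all things in the store, which are some books and a red bicycle |
| `/store//price` | `$.store..price` | the price of everything in the store |
| `//book[3]` | `$..book[2]` | the third book (JSONPath indices start at 0) |
| `//book[last()]` | `$..book[(@.length-1)]` or `$..book[-1:]` | the last book |
| `//book[position()<3]` | `$..book[0,1]` or `$..book[:2]` | the first two books |
| `//book[isbn]` | `$..book[?(@.isbn)]` | all books that have an isbn attribute |
| `//book[price<10]` | `$..book[?(@.price<10)]` | all books priced below 10 |
| `//*` | `$..*` | all elements |

Example: using http://www.lagou.com/lbs/getA... again, get all the cities.

    import json
    import urllib.request

    import jsonpath

    url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'  # Lagou city list
    # Request headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
        'Referer': 'http://www.lagou.com',
        'Connection': 'keep-alive',
    }
    req = urllib.request.Request(url, headers=headers)  # attach the request headers
    response = urllib.request.urlopen(req)              # send the request and get the response
    hjson = json.loads(response.read())                 # parse the response into a Python object
    citylist = jsonpath.jsonpath(hjson, '$..name')      # get all city names
    # print(type(citylist))  #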
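
To see what the recursive-descent expression `$..name` selects without touching the network or installing anything, here is a minimal sketch that uses only the standard library. The `find_all` helper and the inline `SAMPLE` string are hypothetical, written purely for illustration: the sample borrows its nesting and three of its cities from the Lagou output shown earlier, and the helper only approximates `$..name`. It is not how the jsonpath package is implemented.

```python
import json

# Hypothetical sample that mirrors the nesting used in the Lagou example:
# content -> data -> allCitySearchLabels -> letter -> list of cities.
SAMPLE = """
{"content": {"data": {"allCitySearchLabels": {
    "A": [
        {"name": "Anyang", "code": "171500000"},
        {"name": "Anqing", "code": "131800000"},
        {"name": "Anshan", "code": "081600000"}
    ]
}}}}
"""


def find_all(node, key):
    """Collect every value stored under `key` at any depth, in document order."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: the same key may appear deeper in the tree.
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


hjson = json.loads(SAMPLE)           # JSON text -> nested dicts and lists
citylist = find_all(hjson, "name")   # rough stand-in for '$..name'
print(citylist)
# ['Anyang', 'Anqing', 'Anshan']
```

The point of the sketch is that `$..name` does not care how deeply a key is nested, which is why it can replace the long `hjson["content"]["data"]["allCitySearchLabels"]["A"]` chain when only the names are needed.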
