[Web Scraping] Getting the current node's HTML with lxml and displaying Chinese correctly

Date: 2023-03-26 11:41:09 Python

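Getting the current node: etree.tostring

Displaying Chinese correctly

By default, etree.tostring serializes to ASCII bytes, so Chinese characters come out as numeric character references of the form &#digits; instead of readable text. There are two ways to fix this.

Method 1: use the unescape function from the html library (html.unescape)

    from lxml import etree
    import html

    # Read the saved page as UTF-8 text
    with open('list.html', 'r', encoding='utf-8') as f:
        text = f.read()

    tree = etree.HTML(text)
    # Serialize the first matching node, then turn &#digits; back into characters
    node = tree.xpath('//*[@id="scroll_marquee"]')[0]
    r = html.unescape(etree.tostring(node).decode('utf-8'))
    print(r)
    print(type(r))

Reference: Fixing garbled Chinese (&#digits;) when calling tostring() while scraping web pages

Method 2: pass an encoding to lxml's etree.tostring

    from lxml import etree
    import requests

    response = requests.get('https://www.baidu.com/').text
    tree = etree.HTML(response)
    strs = tree.xpath("//body")
    strs = strs[0]
    # Serialize as UTF-8 bytes, then decode those bytes into a str
    strs = str(etree.tostring(strs, encoding="utf-8"), encoding='utf-8')
    print(strs)

Reference: Extracting HTML tag content with lxml: fixing tostring() not displaying Chinese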
