Python scraping library BeautifulSoup: getting object names, attributes, content, and comments

Date: 2023-03-26 14:40:02 Python

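This article shows how to read object names, attributes, content, and comments with the Python scraping library BeautifulSoup.

1. A Tag object corresponds to a tag in the original XML or HTML document.

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
    tag = soup.b
    type(tag)
    # <class 'bs4.element.Tag'>

2. The name of a Tag

Every tag has a name, which is available as .name:

    tag.name
    # 'b'

Assigning to .name changes the tag in the document itself:

    tag.name = "blockquote"  # modifies the tag in the original document
    tag
    # <blockquote class="boldest">Extremely bold</blockquote>

3. The attributes of a Tag

Get a single attribute:

    tag['class']
    # ['boldest']

Get all attributes:

    tag.attrs
    # {'class': ['boldest']}

A tag can be treated like a dictionary to add or change attributes. The tag prints as blockquote here because its name was changed in step 2:

    tag['class'] = 'verybold'
    tag['id'] = 1
    print(tag)
    # <blockquote class="verybold" id="1">Extremely bold</blockquote>

Delete attributes:

    del tag['class']
    del tag['id']
    tag
    # <blockquote>Extremely bold</blockquote>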
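The name and attribute operations above can be gathered into a small helper. This is a minimal sketch rather than anything from the original article: the function name describe_tag is hypothetical, and it uses Python's built-in html.parser instead of lxml so that it runs without the lxml package. It also relies on tag.get(), which returns None for a missing attribute, whereas tag['id'] would raise KeyError.

```python
from bs4 import BeautifulSoup


def describe_tag(tag):
    """Return the tag's name, a copy of its attributes, and its id (or None)."""
    return {
        "name": tag.name,
        "attrs": dict(tag.attrs),
        # .get() works like dict.get(): no KeyError when the attribute is absent
        "id": tag.get("id"),
    }


# Sample markup mirrors the article's example; html.parser is an assumption
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
info = describe_tag(soup.b)
print(info)
```

Copying tag.attrs into a new dict means that later edits to the tag do not change the returned summary.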

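4. Multi-valued attributes of a Tag

A multi-valued attribute such as class is returned as a list:

    css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'lxml')
    print(css_soup.p['class'])
    # ['body', 'strikeout']

The rel attribute is also multi-valued. When a list is assigned to it, the values are joined with spaces in the output:

    rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>', 'lxml')
    print(rel_soup.a['rel'])
    # ['index']
    rel_soup.a['rel'] = ['index', 'contents']
    print(rel_soup.p)
    # <p>Back to the <a rel="index contents">homepage</a></p>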
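Because an attribute may come back as either a list or a string, code that handles arbitrary attributes sometimes wants one uniform shape. The following is a minimal sketch under stated assumptions: the sample markup is not from the original article, and the built-in html.parser is used in place of lxml so the example has no extra dependency. get_attribute_list() always returns a list, and passing multi_valued_attributes=None to the constructor turns off the splitting into lists.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one multi-valued and one single-valued attribute
markup = '<p class="body strikeout" id="intro"></p>'

soup = BeautifulSoup(markup, "html.parser")
classes = soup.p["class"]                   # multi-valued: a list
raw_id = soup.p["id"]                       # single-valued: a plain string
id_list = soup.p.get_attribute_list("id")   # always a list

# Opt out of splitting so every attribute value stays a single string
plain_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
plain_class = plain_soup.p["class"]

print(classes, raw_id, id_list, plain_class)
```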

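If the document is parsed as XML, tags do not have multi-valued attributes, and the value is returned as a single string:

```python
xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
xml_soup.p['class']
# 'body strikeout'
```

II. NavigableString

1. Strings are usually contained in tags. BeautifulSoup wraps the string inside a tag in the NavigableString class:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
tag = soup.b
print(tag.string)
# Extremely bold
print(type(tag.string))
# <class 'bs4.element.NavigableString'>
```

2. A NavigableString behaves like a Python str, and it can be converted directly to a str:

```python
unicode_string = str(tag.string)
print(unicode_string)
# Extremely bold
print(type(unicode_string))
# <class 'str'>
```

3. The string contained in a tag cannot be edited in place, but it can be replaced with another string using the replace_with() method:

```python
tag.string.replace_with("No longer bold")
tag
# <b class="boldest">No longer bold</b>
```

III. The BeautifulSoup object

The BeautifulSoup object represents the entire content of the document. Most of the time it can be treated as a Tag object, and it supports most of the methods for navigating and searching the document tree.

IV. Comments and special strings (Comment)

```python
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, 'lxml')
comment = soup.b.string
type(comment)
# <class 'bs4.element.Comment'>
```

A Comment object is a special type of NavigableString:

```python
comment
# 'Hey, buddy. Want to buy a used parser?'
```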
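Because Comment is a subclass of NavigableString, a scraper that collects text can pick up comments without meaning to. The sketch below shows one way to tell the two apart with isinstance. The markup and variable names are illustrative assumptions, and html.parser is used instead of lxml so the example runs without the lxml package.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical markup: one visible string and one comment inside <p>
markup = "<p>Visible text<!-- hidden note --></p>"
soup = BeautifulSoup(markup, "html.parser")

texts = []
comments = []
# Iterate over a snapshot of the direct children of <p>
for node in list(soup.p.children):
    if isinstance(node, Comment):
        # Check Comment first: every Comment is also a NavigableString
        comments.append(str(node))
    elif isinstance(node, NavigableString):
        texts.append(str(node))

print(texts)
print(comments)
```

The order of the checks matters: testing for NavigableString first would also match the comment.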
