Step 1: Install requests-html

Upgrade pip:

    pip install --upgrade pip

Upgrade urllib3:

    sudo python3 -m pip install urllib3 --upgrade

Install requests-html:

    sudo python3 -m pip install requests-html

Step 1.1: Install requests-html for the project

Edit setup.py and add:

    install_requires=['requests-html'],

Edit launch.json and add:

    "pythonPath": "/usr/bin/python3"

Install from the command line:

    sudo python3 -m setup install

In the Python file, import it with:

    from requests_html import HTMLSession

Step 2: Continue with youtube-dl

Create a new information extractor class:

    class XxxIE(InfoExtractor):

Define the URL matching rule:

    _VALID_URL = r'https?://(?:www\.|m\.)?xxx\.com.+posts?.+'

To see where this rule is used in the source, start at extract_info in YoutubeDL.py:

    def extract_info(self, url, download=True, ie_key=None, extra_info={},
                     process=True, force_generic_extractor=False):
        # ...
        for ie in ies:
            if not ie.suitable(url):
                continue
        # ...

Then go to suitable in common.py, in the extractor folder:

    @classmethod
    def suitable(cls, url):
        if '_VALID_URL_RE' not in cls.__dict__:
            cls._VALID_URL_RE = re.compile(cls._VALID_URL)
        # ...

2.1 The rest is handled by class XxxIE(InfoExtractor)

First register the class in extractors.py in the extractor folder. Downloading and scraping happen inside XxxIE:

    from requests_html import HTML

    class XxxIE(InfoExtractor):
        _GEO_COUNTRIES = ['CN']
        IE_NAME = 'xxx:blog'
        IE_DESC = '我去'
        _VALID_URL = r'https?://(?:www\.|m\.)?xxx\.com.+posts?.+'
        _TEMPLATE_URL = '%s://www.xxx.com/%s/posts/%s/'
        _LIST_VIDEO_RE = r']+?href="(?P
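
The URL-matching step traced above (extract_info calling ie.suitable(url), which compiles _VALID_URL on first use) can be tried in isolation. The following is a minimal sketch that uses only the standard library. DemoIE is a hypothetical stand-in, not youtube-dl's InfoExtractor, and the sample URLs are made up for illustration. The final match test inside suitable is an assumption about what follows the elided "# ..." in the original.

```python
import re


class DemoIE:
    """Hypothetical stand-in for an extractor class (not youtube-dl's InfoExtractor)."""

    # Matching rule copied from the notes above.
    _VALID_URL = r'https?://(?:www\.|m\.)?xxx\.com.+posts?.+'

    @classmethod
    def suitable(cls, url):
        # Compile the pattern once and cache it on the class itself.
        # Checking cls.__dict__ (rather than hasattr) keeps a subclass
        # from reusing a pattern compiled for its parent.
        if '_VALID_URL_RE' not in cls.__dict__:
            cls._VALID_URL_RE = re.compile(cls._VALID_URL)
        # Assumed final step: the URL is suitable if the pattern matches
        # from the start of the string.
        return cls._VALID_URL_RE.match(url) is not None


if __name__ == '__main__':
    # Made-up sample URLs, for illustration only.
    samples = [
        'https://www.xxx.com/someuser/posts/123/',
        'http://m.xxx.com/someuser/post/456',
        'https://www.xxx.com/about',
        'https://example.com/someuser/posts/123/',
    ]
    for url in samples:
        print(url, DemoIE.suitable(url))
```

Because the pattern ends in ".+posts?.+", a URL is accepted only when some text precedes and follows "post" or "posts", which is why a bare page such as /about is skipped by the loop in extract_info.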
