Python crawler: I was a bit bored, so I scraped some anime screenshots

Time: 2023-03-26 17:14:17 Python

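I had gotten tired of sitting around at home, so I went to Bilibili (B站) and watched some Python crawler videos. I skipped the basic theory and went straight to hands-on practice. Following the same recipe every time works well enough; at least I can scrape something, hhh. Today I am sharing one of my crawler scripts. Without further ado, here is the complete code.

PS: this code has a problem. Every time it reaches the "Fate" images (命运之图) it throws an error, so I had to try to skip them. If any expert spots the mistake, please point it out.

    import requests as r
    import re
    import os
    import time

    # Folder that will hold the downloaded screenshots
    file_name = "animescreenshot"
    if not os.path.exists(file_name):
        os.mkdir(file_name)

    for p in range(1, 34):
        print("--------------------Fetching page {}------------------".format(p))
        # The page URL is abbreviated with "..." here
        url = 'https://www.acgimage.com/shot...{}'.format(p)
        headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36"}
        resp = r.get(url, headers=headers)
        html = resp.text
        # Image URLs and titles, matched in page order
        images = re.findall('data-original="(.*?)"', html)
        names = re.findall('title="(.*?)"', html)
        print(images)
        print(names)
        dic = dict(zip(images, names))
        for image in images:
            time.sleep(1)
            print(image, dic[image])
            name = dic[image]
            # This overwrites the title: the file is named after the last URL segment
            name = image.split('/')[-1]
            i = r.get(image, headers=headers).content
            try:
                with open(file_name + '/' + name + '.jpg', 'wb') as f:
                    f.write(i)
            except FileNotFoundError:
                # Skip images whose file cannot be created
                continue

First, import the libraries to be used:

    import requests as r
    import re
    import os
    import time

Then analyze the site to be scraped: the anime screenshot site (动漫截图网). Once the URL is settled and the headers are found, the first step is done. Here is the code:

    url = 'https://www.acgimage.com/shot...{}'.format(p)
    headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36"}

Next, get the content of the images to scrape. Inspecting the page shows that the image location is the value after data-original=, and the image name is the value after title=. A regular expression search with re picks both out:

    images = re.findall('data-original="(.*?)"', html)
    names = re.findall('title="(.*?)"', html)

Finally, save the image:

    i = r.get(image, headers=headers).content
    with open(file_name + '/' + name + '.jpg', 'wb') as f:
        f.write(i)

There are a few more details, such as paging. Changing the number at the end of the URL jumps to the corresponding page, so a loop over the page numbers solves it:

    for p in range(1, 34):
        url = 'https://www.acgimage.com/shot...{}'.format(p)

The scraped images go into a folder named by file_name, which is created with the os library:

    file_name = "animescreenshot"
    if not os.path.exists(file_name):
        os.mkdir(file_name)

The sleep function is there so that the crawler does not put a load on the site being scraped. Crawling is slower this way, but it is the ethical thing to do:

    time.sleep(1)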
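The post does not say what error the "Fate" images actually raise, so the following is only a guess at one possible cause, not a diagnosis. If the scraped title is used as the file name and it contains a character such as "/", open() treats part of the title as a directory that does not exist and raises FileNotFoundError. The sketch below uses the same two regular expressions as the post and adds a small clean-up step. The helper names extract_pairs and safe_filename, the sample HTML, and the title "Fate/stay night" are all made up for illustration; nothing here contacts the real site.

```python
import re

# Characters that are invalid in Windows file names ("/" is invalid on every OS).
_UNSAFE = re.compile(r'[\\/:*?"<>|]')


def extract_pairs(html):
    """Return (image_url, title) pairs using the same two regexes as the post.

    Pairing by position assumes the page has exactly one title="..." per
    data-original="..." and that both appear in the same order.
    """
    images = re.findall(r'data-original="(.*?)"', html)
    names = re.findall(r'title="(.*?)"', html)
    return list(zip(images, names))


def safe_filename(title, fallback="untitled"):
    """Replace path-breaking characters so the title can be used as a file name."""
    cleaned = _UNSAFE.sub("_", title).strip(" .")
    return cleaned or fallback


# Hypothetical HTML fragment, invented for this example only.
sample_html = (
    '<img data-original="https://example.com/a/001.jpg" title="Fate/stay night">'
    '<img data-original="https://example.com/a/002.jpg" title="K-On!">'
)

pairs = extract_pairs(sample_html)
file_names = [safe_filename(title) + ".jpg" for _, title in pairs]
print(file_names)
```

In the original loop, this would mean passing the title through safe_filename before building the path, instead of skipping the image in the except branch. If the error turns out to have a different cause, the helper is still harmless.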
