
Synchronous and asynchronous crawling of a video site with Python

Posted: 2023-03-26 14:31:39 · Python

What is this video site? I expect most people have some idea. When I first came across it, many of its videos moved me, and they still do. As I keep saying, "no amount of technique outweighs a single true story." That kind of emotional pull really helps you understand the world, and people are born to perceive the world, adapt to it, and change it. From that point of view I am quite fond of the site: it broadens our horizons and shows us things we did not know were happening. It also reminds us that the world is full of hardship, which makes us realize that we are actually fortunate.

That is a brief introduction to today's target: crawling the videos on the site (the code below requests pearvideo.com). The post covers a synchronous crawler and an asynchronous one and compares their run times. The code is not difficult, so I will not walk through it in detail. The asynchronous version uses asyncio; earlier articles had related exercises (~异步爬取某人的荣耀~ and ~为什么要用异步来写爬虫~), which are worth a look if you want some practice.

The synchronous code is as follows:

    # coding: utf-8
    # __auth__ = "maiz"
    import os
    import re
    import random
    import requests
    from datetime import datetime
    from lxml import etree


    class Sync(object):
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
        }
        download_folder = "./videos"

        def run(self):
            url = 'https://www.pearvideo.com/category_5'
            if os.path.exists(self.download_folder):  # check whether the folder exists
                print("Folder already exists")
            else:
                os.mkdir(self.download_folder)  # create it if it does not
                print("Folder created")
            resp = requests.get(url, headers=self.headers)
            if resp.status_code == 200:
                tree = etree.HTML(resp.text)
                lis = tree.xpath('//ul[@id="categoryList"]/li')
            else:
                raise requests.RequestException
            # Download the videos one after another
            for li in lis:
                filename, download_url = self.parse_video_url(li)
                print(f"==> Start downloading {filename}")
                self.download(filename, download_url)

        def parse_video_url(self, li) -> tuple:
            # Use the video title as the file name, without quotes, "!", "?" and "|"
            title = li.xpath('./div/a/div[2]/text()')[0].strip('“”!?').replace("|", "")
            page = str(li.xpath('./div/a/@href')[0]).split('_')[1]
            ajax_url = 'https://www.pearvideo.com/videoStatus.jsp?'
            params = {'contId': page, 'mrd': random.random()}
            headers = self.headers.copy()
            headers.update({'Referer': 'https://www.pearvideo.com/video_' + page})
            resp = requests.get(ajax_url, headers=headers, params=params)
            ajax_text = resp.json()
            download_url = ajax_text["videoInfo"]['videos']["srcUrl"]
            # Replace the 13-digit number in srcUrl with "cont-<page id>"
            download_url = re.sub(r"\d{13}", f"cont-{page}", download_url)
            return title + ".mp4", download_url

        def download(self, filename: str, url: str):
            resp = requests.get(url, headers=self.headers)
            if resp.status_code == 200:
                content = resp.content
                with open(os.path.join(self.download_folder, filename), "wb") as fb:
                    fb.write(content)
                print(f"Downloaded: {filename}")
                print("-" * 60)
            else:
                raise requests.RequestException


    if __name__ == '__main__':
        start = datetime.now()
        s = Sync()
        s.run()
        end = datetime.now()
        print((end - start).total_seconds(), "seconds")

The asynchronous code is as follows:

    # coding: utf-8
    # __auth__ = "maiz"
    import os
    import re
    import random
    import asyncio
    import aiofiles
    import aiohttp
    from datetime import datetime
    from lxml import etree


    class Spider(object):
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
        }
        download_folder = "./videos"
        urls = []

        async def main(self):
            await self._get_video_urls()
            # One task per video, so the downloads run concurrently
            downloader = [asyncio.create_task(self._download_video(filename, url))
                          for filename, url in self.urls]
            await asyncio.gather(*downloader)

        async def _get_video_urls(self):
            url = 'https://www.pearvideo.com/category_5'
            async with aiohttp.ClientSession(headers=self.headers) as session:
                async with session.get(url) as response:
                    # Raises aiohttp.ClientResponseError for 4xx/5xx responses
                    response.raise_for_status()
                    text = await response.text()
                    tree = etree.HTML(text)
                    lis = tree.xpath('//ul[@id="categoryList"]/li')
            spider = [self._parse_video_url(li) for li in lis]
            # gather() accepts bare coroutines; recent Python versions no longer
            # allow passing them to asyncio.wait()
            await asyncio.gather(*spider)

        async def _parse_video_url(self, li):
            title = li.xpath('./div/a/div[2]/text()')[0].strip('“”!?').replace("|", "")
            page = str(li.xpath('./div/a/@href')[0]).split('_')[1]
            ajax_url = 'https://www.pearvideo.com/videoStatus.jsp?'
            params = {'contId': page, 'mrd': random.random()}
            headers = self.headers.copy()
            headers.update({'Referer': 'https://www.pearvideo.com/video_' + page})
            async with aiohttp.ClientSession(headers=headers) as session:
                async with session.get(ajax_url, params=params) as response:
                    ajax_text = await response.json()
                    download_url = ajax_text["videoInfo"]['videos']["srcUrl"]
                    download_url = re.sub(r"\d{13}", f"cont-{page}", download_url)
                    self.urls.append((title + ".mp4", download_url))

        async def _download_video(self, filename: str, url: str):
            async with aiohttp.ClientSession(headers=self.headers) as session:
                print(f"start download => {filename}")
                async with session.get(url, headers=self.headers) as response:
                    content = await response.read()
                    async with aiofiles.open(os.path.join(self.download_folder, filename), "wb") as fb:
                        await fb.write(content)
            # filename already ends with ".mp4"
            print(f"Downloaded => {filename}")

        def run(self):
            if os.path.exists(self.download_folder):  # check whether the folder exists
                print("The folder already exists")
            else:
                os.mkdir(self.download_folder)  # create it if it does not
                print("Folder created")
            asyncio.run(self.main())


    if __name__ == '__main__':
        start = datetime.now()
        s = Spider()
        s.run()
        end = datetime.now()
        print("=" * 40)
        print((end - start).total_seconds(), "seconds")

Right-click and run the code: it creates a videos folder in the current directory and downloads the video files into it, so you can go and watch whatever interests you. To get the code, reply "某视频下载" in the account backend.

That is all I have to share today.
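The two crawlers above are compared by run time, and the difference is easier to see without the network in the way. The following is only a minimal sketch, assuming each download can be modelled as a fixed delay. `fake_download_sync`, `fake_download_async`, `run_sync`, `run_async`, `DELAY` and `N_TASKS` are hypothetical names introduced for illustration; none of them belongs to the original crawlers, and no real request is sent.

```python
import asyncio
import time

DELAY = 0.05   # simulated time per download in seconds (assumed value)
N_TASKS = 5    # number of simulated videos (assumed value)


def fake_download_sync(name: str) -> str:
    """Hypothetical stand-in for a blocking requests.get call."""
    time.sleep(DELAY)
    return name


async def fake_download_async(name: str) -> str:
    """Hypothetical stand-in for an awaited aiohttp request."""
    await asyncio.sleep(DELAY)
    return name


def run_sync(names):
    """Run the simulated downloads one after another and time them."""
    start = time.perf_counter()
    results = [fake_download_sync(n) for n in names]
    return results, time.perf_counter() - start


def run_async(names):
    """Run the simulated downloads concurrently and time them."""
    async def main():
        # gather() schedules all coroutines at once and keeps input order
        return await asyncio.gather(*(fake_download_async(n) for n in names))

    start = time.perf_counter()
    results = asyncio.run(main())
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    names = [f"video_{i}.mp4" for i in range(N_TASKS)]
    _, sync_elapsed = run_sync(names)
    _, async_elapsed = run_async(names)
    print(f"sync:  {sync_elapsed:.2f} seconds")
    print(f"async: {async_elapsed:.2f} seconds")
```

In this model the sequential run takes about DELAY multiplied by N_TASKS, while the concurrent run takes about one DELAY, because the waits overlap. Real downloads will not scale this cleanly, since bandwidth and the server's own limits also matter.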