
Getting started with scrapy-splash

Posted: 2023-03-26 00:09:47 · Python

1. Create a Scrapy project:

```shell
scrapy startproject jingdong
```

2. Create the spider (the spider name cannot be the same as the project name):

```shell
scrapy genspider jd jd.com
```

3. Start the Splash service with Docker:

```shell
sudo docker run -p 8050:8050 scrapinghub/splash
```

4. Install the scrapy-splash package:

```shell
pip install scrapy-splash
```

5. Add the scrapy-splash configuration to settings.py:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPLASH_URL = 'http://localhost:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

6. Override Scrapy's start_requests method so every start URL is fetched through Splash:

```python
def start_requests(self):
    for url in self.start_urls:
        yield SplashRequest(url, self.parse, args={'wait': '0.5'})
```

Complete example:

```python
import scrapy
from scrapy_splash import SplashRequest


class JdSpider(scrapy.Spider):
    name = 'jd'
    # allowed_domains = ['jd.com', 'book.jd.com']
    start_urls = ['https://book.jd.com/']

    def start_requests(self):
        # Route every start URL through Splash, waiting 0.5s for JS to render
        for url in self.start_urls:
            yield SplashRequest(url, self.parse, args={'wait': '0.5'})

    def parse(self, response):
        div_list = response.xpath('//div[@class="book_nav_body"]/div')
        for div in div_list:
            title = div.xpath('./div//h3[@class="item_header_title"]/a/text()')
            print(title)
```
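Under the hood, SplashRequest is essentially an HTTP call to the Splash service started in step 3: Splash exposes a `render.html` endpoint that loads the page, runs its JavaScript, waits the requested number of seconds, and returns the rendered HTML. As a rough sketch of that mechanics (the helper name `splash_render_url` is made up for illustration), the equivalent request URL could be built like this:

```python
from urllib.parse import urlencode

SPLASH_URL = 'http://localhost:8050'  # same value as in settings.py

def splash_render_url(target_url, wait=0.5):
    # Splash's HTTP API: GET /render.html?url=...&wait=... returns the
    # page's HTML after JavaScript has executed and `wait` seconds passed.
    qs = urlencode({'url': target_url, 'wait': wait})
    return f'{SPLASH_URL}/render.html?{qs}'

print(splash_render_url('https://book.jd.com/'))
# → http://localhost:8050/render.html?url=https%3A%2F%2Fbook.jd.com%2F&wait=0.5
```

Fetching such a URL in a browser (while the Docker container is running) is a quick way to check that Splash itself works before debugging the spider.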
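Note that `parse` above prints Selector objects; call `.get()` or `.getall()` on the XPath result to obtain the actual title strings. The extraction logic itself can be sanity-checked offline without running the spider; here is a sketch using the standard library's ElementTree (the HTML snippet is invented for illustration, and Scrapy's `response.xpath` supports a much richer XPath dialect than ElementTree does):

```python
import xml.etree.ElementTree as ET

# Invented, well-formed snippet mimicking the book_nav_body markup.
html = (
    '<div class="book_nav_body">'
    '<div><div><h3 class="item_header_title"><a>Python Book</a></h3></div></div>'
    '<div><div><h3 class="item_header_title"><a>Scrapy Book</a></h3></div></div>'
    '</div>'
)

root = ET.fromstring(html)
# Same idea as //h3[@class="item_header_title"]/a/text() in the spider.
titles = [a.text for a in root.findall(".//h3[@class='item_header_title']/a")]
print(titles)  # → ['Python Book', 'Scrapy Book']
```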