25. Rapid Development of a Distributed Search Engine in Python, Scrapy in Depth: Introduction to Requests and Responses

Time: 2023-03-25 19:32:38 Python

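Requests

A request is what we create in the spider file by calling Request(): it submits a URL to be fetched. Constructing the Request ourselves lets us customize how the request is sent.

Parameters:

    url = the URL, as a string
    callback = the callback function that handles the response
    method = the request method as a string, for example GET or POST
    headers = a dict of request headers, for example the browser User-Agent
    cookies = the cookies to send
    meta = a dict of key-value pairs, passed through to the callback
    encoding = the encoding used for the request (utf-8 by default)
    priority = defaults to 0; requests with a higher value are scheduled earlier
    dont_filter = defaults to False; if set to True, the request is not dropped by the duplicate filter, so the same URL can be requested again

    # -*- coding: utf-8 -*-
    import scrapy
    from scrapy.http import Request, FormRequest
    import re


    class PachSpider(scrapy.Spider):  # spider definition; must inherit from scrapy.Spider
        name = 'PACH'  # spider name
        allowed_domains = ['www.luyin.org']  # domain to crawl (domain only, no trailing slash)
        # start_urls = ['']
        # start_urls only suits requests that need no login,
        # because cookies and similar settings cannot be set there

        # browser User-Agent
        header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}

        def start_requests(self):  # produces the initial requests; replaces start_urls
            """First request to the login page: enable cookie handling to obtain cookies, and set the callback."""
            return [Request(url='http://www.luyin.org/',
                            headers=self.header,
                            meta={'cookiejar': 1},  # turn on cookie tracking; the key is passed on to the callback
                            callback=self.parse)]

        def parse(self, response):
            title = response.xpath('/html/head/title/text()').extract()
            print(title)

Response

The response is what the downloader returns; it is passed to the callback as the response argument.

Response attributes:

    headers = the response headers
    status = the HTTP status code
    body = the page content, as bytes
    url = the URL that was fetched

    # -*- coding: utf-8 -*-
    import scrapy
    from scrapy.http import Request, FormRequest
    import re


    class PachSpider(scrapy.Spider):  # spider definition; must inherit from scrapy.Spider
        name = 'PACH'  # spider name
        allowed_domains = ['www.luyin.org']  # domain to crawl (domain only, no trailing slash)
        # start_urls = ['']
        # start_urls only suits requests that need no login,
        # because cookies and similar settings cannot be set there

        # browser User-Agent
        header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}

        def start_requests(self):  # produces the initial requests; replaces start_urls
            """First request to the login page: enable cookie handling to obtain cookies, and set the callback."""
            return [Request(url='http://www.luyin.org/',
                            headers=self.header,
                            meta={'cookiejar': 1},  # turn on cookie tracking; the key is passed on to the callback
                            callback=self.parse)]

        def parse(self, response):
            title = response.xpath('/html/head/title/text()').extract()
            print(title)
            print(response.headers)
            print(response.status)
            # print(response.body)
            print(response.url)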
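How priority and dont_filter affect scheduling can be illustrated with a small toy model. This is only a sketch using the standard library, not Scrapy's actual scheduler: ToyRequest, ToyScheduler and demo are hypothetical names, the '/a' path is made up for the example, and duplicates are detected by URL alone, whereas Scrapy's duplicate filter works on a request fingerprint.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a request; not Scrapy's Request class."""
    url: str
    priority: int = 0          # higher values are scheduled earlier
    dont_filter: bool = False  # True = skip the duplicate check


class ToyScheduler:
    """Toy model: a priority queue combined with a duplicate filter."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._order = count()  # tie-breaker: keeps insertion order within one priority

    def enqueue(self, request):
        """Return True if the request was queued, False if dropped as a duplicate."""
        if not request.dont_filter and request.url in self._seen:
            return False
        self._seen.add(request.url)
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-request.priority, next(self._order), request))
        return True

    def drain(self):
        """Pop every queued request in scheduling order and return the URLs."""
        urls = []
        while self._heap:
            urls.append(heapq.heappop(self._heap)[2].url)
        return urls


def demo():
    scheduler = ToyScheduler()
    results = [
        scheduler.enqueue(ToyRequest('http://www.luyin.org/')),
        # Same URL again: dropped by the duplicate filter.
        scheduler.enqueue(ToyRequest('http://www.luyin.org/')),
        # Same URL with dont_filter=True: bypasses the filter and is queued.
        scheduler.enqueue(ToyRequest('http://www.luyin.org/', dont_filter=True)),
        # Hypothetical URL with a higher priority: queued last, scheduled first.
        scheduler.enqueue(ToyRequest('http://www.luyin.org/a', priority=10)),
    ]
    return results, scheduler.drain()


if __name__ == '__main__':
    print(demo())
```

In this model the second request is rejected, the third is accepted only because of dont_filter=True, and the request with priority=10 leaves the queue before the two that were added earlier.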
