
From Principles to Practice: A Detailed Scrapy Crawler Tutorial Worth Bookmarking

Date: 2023-03-25 23:46:04  Category: Python

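I have previously shared many articles on writing Python crawlers with requests and selenium. This article goes from principles to practice and introduces another powerful framework, Scrapy. If you are interested in Scrapy, follow along and try it yourself.

1. Introduction to the Scrapy framework

Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python. It is used to crawl websites and extract structured data from their pages, and it needs only a small amount of code to get a crawl running.

2. How it works

[Figure: Scrapy architecture diagram]

The diagram gives the overall picture. The internals are fairly complex and cannot be explained in a few sentences, so this article does not go into further detail.

Scrapy mainly consists of the following components:

- Engine (Scrapy Engine)
- Scheduler
- Downloader
- Spiders
- Item Pipeline
- Downloader Middlewares
- Spider Middlewares
- Scheduler Middlewares

3. Getting started

3.1 Installation

Option 1: install with pip from the command line:

    $ pip install scrapy

Option 2: download first, then install:

    $ pip download scrapy -d ./
    # download through a mirror inside China
    $ pip download -i https://pypi.tuna.tsinghua.edu.cn/simple scrapy -d ./

Change into the download directory and install the downloaded wheel (the file name depends on the version that was downloaded), for example:

    $ pip install Scrapy-1.5.0-py2.py3-none-any.whl

3.2 Usage

Using Scrapy breaks down into roughly four steps:

1. Create a Scrapy project:

    scrapy startproject mySpider

2. Generate a spider:

    scrapy genspider demo "demo.cn"

3. Extract the data and refine the spider, using XPath and similar tools.
4. Save the data in a pipeline.

3.3 Running the program

Run the spider from the command line (qb is the name of the spider):

    scrapy crawl qb

Run the spider from PyCharm:

    from scrapy import cmdline

    cmdline.execute("scrapy crawl qb".split())

4. Basic steps

The concrete steps for using the Scrapy framework are:

- Choose the target website.
- Define the data to scrape (by filling in Scrapy Items).
- Write a spider that extracts the data.
- Run the spider to fetch the data.
- Store the data.

5. Directory and file overview

After creating a Scrapy project and then generating a spider, the directory structure looks like this:

[Figure: project directory tree]

A brief description of the main files:

- scrapy.cfg: the project configuration file
- mySpider/: the project's Python module; code is imported from here
- mySpider/items.py: the project's item definitions
- mySpider/pipelines.py: the project's pipelines
- mySpider/settings.py: the project's settings
- mySpider/spiders/: the directory that holds the spider code

5.1 scrapy.cfg: the project configuration file

The content of the file:

    # Automatically created by: scrapy startproject
    #
    # For more information about the [deploy] section see:
    # https://scrapyd.readthedocs.io/en/latest/deploy.html

    [settings]
    default = mySpider.settings

    [deploy]
    #url = http://localhost:6800/
    project = mySpider

5.2 mySpider/: the project's Python module

Code is imported from this package.

5.3 mySpider/items.py: the project's item definitions

    # Define here the models for your scraped items
    #
    # See documentation in:
    # https://docs.scrapy.org/en/latest/topics/items.html

    import scrapy


    class MyspiderItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()
        pass

This module defines the Scrapy items. A field is declared as, for example, name = scrapy.Field().

5.4 mySpider/pipelines.py: the project's pipelines

    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

    # useful for handling different item types with a single interface
    from itemadapter import ItemAdapter


    class MyspiderPipeline:
        def process_item(self, item, spider):
            return item

This file is what we call the pipeline. After an Item has been collected in the Spider, it is passed to the Item Pipeline, whose components process each Item in the order in which they are defined. An Item Pipeline is a Python class that implements a few simple methods, for example to decide whether an Item is dropped or stored. Typical uses of an item pipeline:

- Validate the scraped data (check that the Item contains certain fields, such as a name field).
- Check for duplicates (and drop them).
- Save the scraped results to a file or a database.

5.5 mySpider/settings.py: the project's settings

    # Scrapy settings for mySpider project
    ...
    BOT_NAME = 'mySpider'  # Scrapy project name

    SPIDER_MODULES = ['mySpider.spiders']
    NEWSPIDER_MODULE = 'mySpider.spiders'
    .......
    # Obey robots.txt rules
    # The generated project sets this to True; this tutorial changes it to False.
    ROBOTSTXT_OBEY = False

    # Configure maximum concurrent requests performed by Scrapy (default: 16)
    #CONCURRENT_REQUESTS = 32
    ....
    #DOWNLOAD_DELAY = 3  # download delay of 3 seconds

    # Override the default request headers (uncommented here so they take effect):
    DEFAULT_REQUEST_HEADERS = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en',
    }

    # Spider middleware
    #SPIDER_MIDDLEWARES = {
    #    'mySpider.middlewares.MyspiderSpiderMiddleware': 543,
    #}

    # Downloader middleware
    #DOWNLOADER_MIDDLEWARES = {
    #    'mySpider.middlewares.MyspiderDownloaderMiddleware': 543,
    #}
    ......
    # Configure item pipelines
    # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    #ITEM_PIPELINES = {
    #    'mySpider.pipelines.MyspiderPipeline': 300,  # pipeline
    #}
    ......

The ellipses stand for omitted code. The important settings carry comments.

5.6 mySpider/spiders/: the directory that holds the spider code

    import scrapy


    class DbSpider(scrapy.Spider):
        name = 'db'
        allowed_domains = ['douban.com']  # can be modified
        start_urls = ['http://douban.com/']  # the start URL can also be modified

        def parse(self, response):
            pass

6. Scrapy shell

The Scrapy shell is an interactive terminal. It lets us try out and debug code without starting a spider, and it can be used to test XPath or CSS expressions to see how they behave, which makes it easier to extract data from crawled pages. In practice it is not used very often. If you are interested, see the official documentation:

http://scrapy-chs.readthedocs.io/zh_CN/latest/topics/shell.html

Based on the downloaded page, the Scrapy shell automatically creates some convenient objects, such as the Response object and Selector objects (for HTML and XML content).

- When the shell loads, you get a local variable named response that contains the response data. Entering response.body prints the response body, and response.headers shows the response headers.
- Entering response.selector gives you a Selector object initialized from the response. You can then query the response with response.selector.xpath() or response.selector.css().
- Scrapy also provides shortcuts: response.xpath() and response.css() work as well.

Selectors

Scrapy Selectors have built-in support for XPath and CSS Selector expressions. A Selector has four basic methods, of which xpath() is the most commonly used:

- xpath(): takes an XPath expression and returns a list of selectors for all nodes matching the expression
- extract(): serializes the nodes to strings and returns them as a list
- css(): takes a CSS expression and returns a list of selectors for all nodes matching the expression; the syntax is the same as in BeautifulSoup4
- re(): extracts data with the given regular expression and returns a list of strings

7. Hands-on example

This section uses crawling data from ZCOOL (zcool.com.cn) with Scrapy as the example.

7.1 Case description

Now that we have a basic understanding of Scrapy's workflow and principles, let's build a small introductory example: crawling the information of the works recommended on the ZCOOL home page.

[Figure: ZCOOL home page, where each small box is one item]

For each item we need to extract six fields: imgLink (cover image link), title (title), types (type), vistor (popularity), comment (number of comments) and likes (number of recommendations).

That covers the items on one page. We also need to turn pages to collect data in bulk.

7.2 File configuration

Directory structure

Creating a new Scrapy project (zcool) and a spider (zc) was covered earlier, so it is not repeated here. The resulting directory structure:

[Figure: zcool project directory tree]

start.py

For convenience, create a launcher file in the zcool directory and initialize it as follows:

    from scrapy import cmdline

    cmdline.execute('scrapy crawl zc'.split())

settings.py

A few settings are needed in this file.

Avoid printing log messages while the program runs:

    LOG_LEVEL = 'WARNING'
    ROBOTSTXT_OBEY = False

Add the request headers:

[Figure: request header settings]

Enable the pipeline:

[Figure: ITEM_PIPELINES setting]

items.py

    import scrapy


    class ZcoolItem(scrapy.Item):
        # define the fields for your item here like:
        imgLink = scrapy.Field()  # cover image link
        title = scrapy.Field()    # title
        types = scrapy.Field()    # type
        vistor = scrapy.Field()   # popularity
        comment = scrapy.Field()  # number of comments
        likes = scrapy.Field()    # number of recommendations

7.3 Extracting page data

First, test the expression on the ZCOOL page with the xpath-helper browser extension:

[Figure: xpath-helper test]

Then run a first test in zc.py:

    def parse(self, response):
        divList = response.xpath('//div[@class="work-list-box"]/div')
        print(len(divList))

[Figure: console output of the test]

That works, so next we parse and extract each piece of information:

    def parse(self, response):
        divList = response.xpath('//div[@class="work-list-box"]/div')
        for div in divList:
            imgLink = div.xpath("./div[1]/a/img/@src").extract()[0]  # 1. cover image link
            # ... 2. title; 3. types (type); 4. vistor (popularity); 5. comment (number of comments) ...
            likes = div.xpath("./div[2]/p[3]/span[3]/@title").extract_first()  # 6. likes (number of recommendations)
            item = ZcoolItem(imgLink=imgLink, title=title, types=types,
                             vistor=vistor, comment=comment, likes=likes)
            yield item

Explanation: ways to extract data from an XPath result

    No.  Method           Description
    1    extract()        returns all matching data as a list
    2    extract_first()  returns the first element of that list
    3    get()            returns the same value as extract_first(), the first element
    4    getall()         same as extract(), returns all matching data in a list

Note: get() and getall() are the newer names; extract() and extract_first() are the older ones, and they behave the same way. When nothing matches, get() and extract_first() return None, while getall() and extract() return an empty list. Indexing an empty result, as in extract()[0], raises an IndexError.

Creating the Item instance (the line above yield)

The item class was already defined in items.py during file configuration. To store the data, import this class at the top of the spider file:

    from zcool.items import ZcoolItem

Then use yield to hand the data back.

Why yield and not return?

return cannot be used here because we still need to turn pages, and return would exit the function immediately. A function that contains yield behaves differently: calling it does not run the body straight away but returns a generator object. The body starts executing when the generator is iterated, each yield returns the current value, and execution resumes from that point on the next iteration until there are no more values.

7.4 Turning pages to collect data in bulk

The code above collects data, but only from the first page:

[Figure: first-page results]

Our goal is to collect 100 pages of data, so the code needs further changes. Two ways of turning pages follow.

Method 1: locate the next-page button in the page.

[Figure: next-page button in the page source]

Then add the following code after the for loop:

    next_href = response.xpath("//a[@class='laypage_next']/@href").extract_first()
    if next_href:
        next_url = response.urljoin(next_href)
        print('*' * 60)
        print(next_url)
        print('*' * 60)
        request = scrapy.Request(next_url)
        yield request

scrapy.Request(): passing the URL of the next page to Request sends the new response back through parse, so the spider keeps collecting data page after page. Reference for scrapy.Request():

https://www.cnblogs.com/heymonkey/p/11818495.html

Note: method 1 applies when the href attribute of the next-page button matches the URL of the next page.

Method 2: define a global variable count (starting at 1), increase it by 1 for every page crawled, build a new URL from it, and issue the request with scrapy.Request():

    import scrapy

    count = 1


    class ZcSpider(scrapy.Spider):
        name = 'zc'
        allowed_domains = ['zcool.com.cn']
        start_urls = ['https://www.zcool.com.cn/home?p=1#tab_anchor']  # URL of the first page

        def parse(self, response):
            global count
            count += 1
            divList = response.xpath('//div[@class="work-list-box"]/div')
            for div in divList:
                # ...xxx...
                yield item
            # Assumes later pages follow the same ?p=N pattern as the start URL;
            # a URL outside allowed_domains would be filtered as offsite.
            if count <= 100:  # stop after the 100 pages targeted above
                next_url = 'https://www.zcool.com.cn/home?p={}#tab_anchor'.format(count)
                yield scrapy.Request(next_url)

Both methods are used in real projects.

7.5 Data storage

Data storage is handled in pipelines.py. The code:

    from itemadapter import ItemAdapter
    import csv


    class ZcoolPipeline:
        def __init__(self):
            self.f = open('Zcool.csv', 'w', encoding='utf-8', newline='')  # line1
            self.file_name = ['imgLink', 'title', 'types', 'vistor', 'comment', 'likes']  # line2
            self.writer = csv.DictWriter(self.f, fieldnames=self.file_name)  # line3
            self.writer.writeheader()  # line4

        def process_item(self, item, spider):
            self.writer.writerow(dict(item))  # line5
            print(item)
            return item  # line6

        def close_spider(self, spider):
            self.f.close()

Explanation:

- line1: open the file in write mode; the newline='' argument removes the blank lines that would otherwise appear when csv writes data.
- line2: set the field names for the file's columns; they must match the keys of the item passed over by the spider.
- line3: write the file with the csv dictionary writer; the first argument is the file, the second is the field names.
- line4: write the header row of field names; it only needs to be written once, so it goes in __init__.
- line5: write the values passed over from the spider. The item yielded in the spider file is an instance of a class, while the writer expects a dictionary, so a conversion is needed here.
- line6: return the item once it has been written.

7.6 Running the program

Because start.py was created and initialized earlier, the spider can now be run without typing the command scrapy crawl zc (zc is the spider name) in the console. Run start.py instead:

[Figure: console output of the run]

The corresponding page:

[Figure: the crawled ZCOOL page]

Opening the CSV file:

[Figure: Zcool.csv opened in Notepad++]

(The CSV file shows garbled characters in Word, so it is opened in Notepad++ here.) Everything checks out, and the data collection is complete.

7.7 Summary

An introductory case needs to be worked through carefully. Its main purpose is to consolidate the fundamentals and prepare for more advanced study.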
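As a recap of sections 7.3 to 7.5, the following standard-library sketch imitates the flow without Scrapy: a generator yields items and then a follow-up page marker, and a small pipeline writes each item with csv.DictWriter. The names parse_page and CsvPipeline, the ("next_page", n) tuple, and the sample rows are all hypothetical stand-ins chosen for illustration. They are not part of Scrapy or of the original project.

```python
import csv
import io


def parse_page(rows, page, last_page):
    """Hypothetical stand-in for a spider's parse(): yield items, then a follow-up."""
    for row in rows:
        # Each yield hands one item to the caller and pauses here.
        yield {"title": row[0], "likes": row[1]}
    if page < last_page:
        # A return inside the loop would have ended the function early;
        # with yield the function can still announce the next page.
        yield ("next_page", page + 1)


class CsvPipeline:
    """Same DictWriter pattern as in section 7.5, but writing to memory."""

    def __init__(self, fieldnames):
        self.buffer = io.StringIO(newline="")
        self.writer = csv.DictWriter(self.buffer, fieldnames=fieldnames)
        self.writer.writeheader()  # the header row is written only once

    def process_item(self, item):
        self.writer.writerow(dict(item))
        return item


pipeline = CsvPipeline(["title", "likes"])
follow_ups = []

# Made-up sample rows; a real spider would extract these from the response.
sample_rows = [("Poster A", "12"), ("Poster B", "7")]

for result in parse_page(sample_rows, page=1, last_page=100):
    if isinstance(result, dict):
        pipeline.process_item(result)  # items go to the pipeline
    else:
        follow_ups.append(result)      # anything else is a follow-up marker

csv_text = pipeline.buffer.getvalue()
print(csv_text)
print(follow_ups)
```

In real Scrapy code the engine plays the role of the for loop: it sends yielded items to the enabled pipelines and schedules yielded Request objects for download.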