
Scraping my favorite anime resources with puppeteer + nodejs

Date: 2023-04-03 15:05:41 Node.js

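Yuanyuan recently got the sudden urge to try video editing, and wanted to start with anime. To edit anime clips you need the original videos first. Where do you find them? There are many sources; I usually watch anime on LiuDM, where the picture quality is decent, so I started thinking about writing a crawler to download the shows I like directly. I am a front-end developer, after all, and downloading everything by hand would be a bit embarrassing.

Final result after downloading

What if the downloaded file has no .mp4 extension? The copy of My Neighbor Totoro I downloaded, for example, came out in "yum" format. I simply renamed the extension to .mp4 and it played fine.

As a front-end developer I know a little nodejs and next to nothing about the Windows system.

Puppeteer version: 14.3.0
Node version: 16.1.0

Analyzing the website

Search for any anime you like, open its introduction page, click one of the playback links, and press F12 to inspect the page. The site is a bit playful: opening the developer tools drops you into an endless debugger loop that you have to skip. Luckily the loop only lives inside an iframe, which lowers the difficulty; otherwise I would have had to debug someone else's code. 😄😄

Analyzing where the playback address comes from

Idea one: inspect the network requests and look for a common pattern. After capturing the traffic, I found that the resource paths for episode one and episode two have nothing in common, so I gave up on this one. o(╥﹏╥)o

Idea two: read the player's source code and find the logic that assembles the URL. Use the developer tools to list all the js files the page loads, then look through them.

Analyzing the player source code

Reading the player source shows that the site stores a global variable in the page. The global variable player_aaaa is assigned first, and then a js file is loaded. File name: /static/player/parse.js

Open the developer tools and look at the contents of /static/player/parse.js. Then enter this in the browser console:

    MacPlayer.Parse + MacPlayer.PlayUrl

The player source is a self-executing function, and parse.js shows how the resource address is concatenated, so the address can be assembled directly in the browser console. Open the resulting address in the browser and it turns out to be exactly the resource we want to download. At this point we could just right-click and save. But how could a self-respecting front-end developer right-click and save? Next we bring out the big guns: puppeteer together with nodejs to download the resources automatically.

How to automate it?

The address above leads to a parsing page. To download automatically we must find the actual video source link on that page, otherwise nothing works. o(╥﹏╥)o Inspect the page elements and locate the final resource address.

Use puppeteer to parse the page and get the video resource address, then use nodejs to download the video automatically.

Idea one: walk the playlist, start a task that opens each page in turn and finds the resource address, collect all the addresses, and then download them locally with nodejs. Why not use this idea? I finished the code and tested it, and found that the site's server could not handle the load. Better to play it safe and do one operation at a time.

Idea two: have puppeteer trigger the right-click download automatically and save the file where we want it (I have not tried this method).

Idea three: walk the playlist and start a task that begins with the first episode: open the page, find the resource address, download it locally with nodejs, and only move on to the next episode once the download has finished.

Idea three: the tricky parts

How does puppeteer read an element's attribute? Don't ask me, I don't really understand it either; the answer came from the experts on StackOverflow.

    // get the attribute from a single element
    await page.evaluate('document.querySelector("span.styleNumber").getAttribute("data-Color")')

    // get the attribute from multiple elements
    const attr = await page.$$eval("span.styleNumber", el => el.map(x => x.getAttribute("data-Color")));

Downloading a remote video with nodejs and showing progress

    const fs = require('fs');
    // my demo downloads with axios
    const axiosRequest = require('./utils/request'); // this is an axios instance

    axiosRequest.get('https://media.w3.org/2010/05/sintel/trailer.mp4', { responseType: 'stream' }).then(response => {
      // the content-length response header tells us how large the video is (in bytes)
      const totalLength = response.headers['content-length']
      // number of bytes received so far
      let totalChunkLength = 0
      // the stream currently being read
      const readSteam = response.data

      // fired for every chunk that is read
      readSteam.on('data', (chunk) => {
        totalChunkLength += chunk.length
        console.log('Transferring data, current progress =>', ((totalChunkLength / totalLength) * 100).toFixed(2) + '%')
      });
      // fired when reading is complete
      readSteam.on('end', () => {
        console.log('Finished fetching remote data')
      });
      // fired when reading fails
      readSteam.on('error', (err) => {
        console.log('Failed to fetch remote data, error =>', err)
      });

      // local file name to write to (must be a string)
      const fileName = '67.mp4'
      // pipe the read stream into a nodejs file write stream
      const writeFile = readSteam.pipe(fs.createWriteStream(fileName))
      // fired when writing is complete
      writeFile.on("finish", () => {
        writeFile.close();
        console.log("Congratulations, the local file has been written");
      });
      // fired when writing fails
      writeFile.on("error", (err) => {
        console.log("Sorry, an error occurred while writing the local file, error =>", err);
      });
    });

The axios code (./utils/request) is as follows:

    const axios = require('axios')

    // create an axios instance
    const service = axios.create({
      baseURL: '', // base_url of the api
      // effectively never times out, that is how persistent a real man is 😄😄
      timeout: 90000000 // request timeout in milliseconds
    })

    // request interceptor
    service.interceptors.request.use(config => {
      return config
    }, error => {
      // handle request errors
      console.log(error) // for debugging
      return Promise.reject(error)
    })

    // response interceptor
    service.interceptors.response.use(response => {
      return response
    }, error => {
      return Promise.reject(error)
    })

    module.exports = service

Disclaimer

First of all, many thanks to the site for letting an anime fan like me find the sources I love. 😄😄😄 This project is for learning purposes only. I have no intention of crawling the site at scale. Anyone who wants to use it is welcome to play with it on their own. If this attracts too much attention and raises infringement concerns, please contact me right away and I will delete it.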
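The step "use puppeteer to parse the page and get the video resource address" can be sketched roughly as follows. This is only a sketch under assumptions: the function name findVideoSrc, the default selector 'video', and the idea that the address sits in a src attribute are hypothetical, not taken from the site. It searches every frame because the article mentions that the player lives inside an iframe.

```javascript
// Hypothetical helper: open an episode page and look for a video address.
// `page` is expected to be a Puppeteer Page; the selector is an assumption.
async function findVideoSrc(page, pageUrl, selector = 'video') {
  // Wait until the network is mostly idle so the player iframe has loaded.
  await page.goto(pageUrl, { waitUntil: 'networkidle2' });

  // page.frames() includes the main frame and every child iframe.
  for (const frame of page.frames()) {
    const handle = await frame.$(selector);
    if (handle) {
      // Read the attribute inside the frame that actually holds the element.
      return frame.$eval(selector, (el) => el.getAttribute('src'));
    }
  }
  return null; // nothing matched in any frame
}
```

With a real browser, the page would come from `puppeteer.launch()` followed by `browser.newPage()`; the helper returns null when no frame contains a matching element, so the caller can decide whether to skip that episode.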
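Idea three (one episode at a time, moving on only after the previous download has finished) might look like the loop below. It is a minimal sketch: downloadEpisodesInOrder, resolveVideoUrl and downloadToFile are hypothetical names, and the numbered file names are an assumption. The two callbacks are passed in so the loop stays independent of puppeteer and axios.

```javascript
// Hypothetical sequential runner for "idea three".
// resolveVideoUrl(pageUrl) -> Promise<string>   (e.g. a puppeteer lookup)
// downloadToFile(videoUrl, fileName) -> Promise (e.g. an axios stream download)
async function downloadEpisodesInOrder(episodePages, resolveVideoUrl, downloadToFile) {
  const results = [];
  for (const [index, pageUrl] of episodePages.entries()) {
    const fileName = `${index + 1}.mp4`; // assumed naming scheme: 1.mp4, 2.mp4, ...
    try {
      // Each await finishes before the next episode starts,
      // so the remote server only ever sees one request at a time.
      const videoUrl = await resolveVideoUrl(pageUrl);
      await downloadToFile(videoUrl, fileName);
      results.push({ pageUrl, fileName, ok: true });
    } catch (err) {
      // Record the failure and carry on with the next episode.
      results.push({ pageUrl, fileName, ok: false, error: String(err) });
    }
  }
  return results;
}
```

Using a plain for loop with await, rather than Promise.all, is what keeps the load on the site low, which is the reason the article gives for rejecting idea one.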
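One caveat about the progress calculation in the download code: the content-length header arrives as a string and may be missing altogether, in which case the division yields NaN. A small helper along these lines could guard against that; formatProgress is a hypothetical name and the fallback wording is my own.

```javascript
// Hypothetical helper: turn byte counts into a progress label.
function formatProgress(receivedBytes, contentLengthHeader) {
  // Headers are strings (or undefined when the server omits them).
  const total = Number.parseInt(contentLengthHeader, 10);
  if (!Number.isFinite(total) || total <= 0) {
    return `${receivedBytes} bytes (total size unknown)`;
  }
  // Clamp at 100 in case more bytes arrive than the header announced.
  const percent = Math.min(100, (receivedBytes / total) * 100);
  return `${percent.toFixed(2)}%`;
}
```

Inside the 'data' handler this would replace the inline arithmetic, for example `console.log(formatProgress(totalChunkLength, response.headers['content-length']))`.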