当前位置: 首页 > 科技观察

UsingHeadlessChromeforPageRendering

时间:2023-03-12 11:32:36 科技观察

UsingHeadlessChromeforPageRendering是属于作者web开发基础和工程实践的系列文章。主要介绍使用Node.js使用ChromeRemoteProtocol远程控制HeadlessChrome渲染界面的基本用法。本文所涉及的参考文献和引用资料均列于此处。最近笔者在写declarative-crawler的动态页面爬虫,即使用declarative-crawler爬知乎美图介绍的HeadlessChromeSpider时,需要选择没有界面的浏览器执行JavaScript代码动态生成页面。以前笔者经常使用PhantomJS或者Selenium来进行动态页面渲染,但是在Chrome59之后,Chrome提供了headless模式,可以在命令上使用Chromium和Blink渲染引擎提供的完整的现代Web平台特性线。需要注意的是,HeadlessChrome仍然有一定的局限性。与Nightmare或Phantom等工具相比,Chrome的远程接口仍然无法提供更好的开发者体验。在下面介绍的代码示例中我们也会发现,我们还需要大量的模板代码来进行控制。安装和启动Chrome安装完成后,我们可以使用自带的命令行工具启动:$chrome--headless--remote-debugging-port=9222https://chromium.org为了方便部署,作者使用Docker镜像进行快速部署,如果你本地有Docker环境,可以使用如下命令快速启动:dockerrun-d-p9222:9222justinribeiro/chrome-headless如果你在Mac下本地使用,我们也可以创建一个命令别名:aliaschrome="/Applications/Google\Chrome.app/Contents/MacOS/Google\Chrome"aliaschrome-canary="/Applications/Google\Chrome\Canary.app/Contents/MacOS/Google\Chrome\Canary"aliaschrome="/Applications/Chromium.app/Contents/MacOS/Chromium"如果是Ubuntu环境,我们可以使用deb安装:#InstallGoogleChrome#https://askubuntu.com/questions/79280/how-to-install-chrome-browser-properly-via-command-linesudoapt-getinstalllibxss1libappindicator1libindicator7wgethttps://dl.google.com/linux/direct/google-chrome-stable_current_amd64.debsudodpkg-igoogle-chrome*.deb#Mightshow"errors",fixedbynextlinesudoapt-getinstall-fchrome命令行也支持丰富的命令行参数,--dump-dom参数可以打印document.body.innerHTML到标准输出:铬-headless--disable-gpu--dump-domhttps://www.chromestatus.com/和--print-to-pdf标志将页面输出为PDF:chrome--headless--disable-gpu--print-to-pdfhttps://www.chromestatus.com/除了第一次,我们还可以通过--screenshot参数获取页面截图:chrome--headless--disable-gpu--screenshothttps:///www.chromestatus.com/#Sizeofastandardletterhead.chrome--headless--disable-gpu--screenshot--window-size=1280,1696https://www.chromestatus.com/#Nexus5xchrome--headless--disable-gpu--screenshot--window-size=412,732https://www.chromestatus.com/如果我们需要更复杂的截图策略,比如全页截图,就需要使用代码进行远程控制代码控制启动在上一篇文章中,我们介绍了如何使用命令行手动启动Chrome。这里我们尝试使用Node.js来启动Chrome。最简单的方法是使用child_process启动:constexec=require('child_process').exec;functionlaunchHeadlessChrome(url,callback){//AssumingMacOSx.constCHROME='/Applications/Google\Chrome.app/Contents/MacOS/Google\Chrome';exec(`${CHROME}--headless--disable-gpu--remote-debugging-port=9222${url}`,callback);}launchHeadlessChrome('https://www.chromestatus.com',(err,stdout,stderr)=>{...});远程控制这里我们使用chrome-remote-interface来远程控制Chrome。实际上,chrome-remote-interface是ChromeDevToolsProtocol的远程包。详细的功能和参数可以参考协议文档。使用npm安装后,我们可以使用如下代码片段进行简单控制:constCDP=require('chrome-remote-interface');CDP((client)=>{//extractdomainsconst{Network,Page}=client;//setuphandlersNetwork.requestWillBeSent((params)=>{console.log(params.request.url);});Page.loadEventFired(()=>{client.close();});//enableeventsthenstart!Promise.all([Network.enable(),Page.enable()]).then(()=>{returnPage.navigate({url:'https://github.com'});}).catch((err)=>{console.error(err);client.close();});}).on('error',(err)=>{//cannotconnecttotheremoteendpointconsole.error(err);});我们还可以使用chrome-remote-interface提供的命令行功能,比如我们可以在命令行访问一个界面,记录所有的网络请求:$chrome-remote-interfaceinspect>>>Network.enable(){result:{}}>>>Network.requestWillBeSent(params=>params.request.url){'Network.requestWillBeSent':'params=>params.request.url'}>>>Page.navigate({url:'https://www.wikipedia.org'}){'Network.requestWillBeSent':'https://www.wikipedia.org/'}{result:{frameId:'5530.1'}}{'Network.requestWillBeSent':'https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia_wordmark.png'}{'Network.requestWillBeSent':'https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2.png'}{'Network.requestWillBeSent':'https://www.wikipedia.org/portal/wikipedia.org/assets/js/index-3b68787aa6。js'}{'Network.requestWillBeSent':'https://www.wikipedia.org/portal/wikipedia.org/assets/js/gt-ie9-c84bf66d33.js'}{'Network.requestWillBeSent':'https:16ed124e8ca7c5ce9d463e8f99b2064427366360'}{'Network.requestWillBeSent':'https://www.wikipedia.org/portal/wikipedia.org/assets/img/sprite-project-logos.png?9afc01c5efe0a8fb6512c776958e2fbad'如果你想创建}某种标志:>>>Page.unmeterscomcategory{[,'mandgory:'{type:'string',description:'URLtonamethepageto.'}},returns:[{name:'frameId','$ref':'FrameId',隐藏:true,描述:'Frameidthatwillbenavigated.'}],description:'NavigatescurrentpagetothegivenURL.',handlers:['browser','renderer']}>>>Page.navigate{[Function]category:'command',parameters:{url:{type:'string',description:'URLtonavigatethepageto.'}},returns:[{name:'frameId','$ref':'FrameId',hidden:true,description:'Frameidthatwillbenavigated.'}],description:'NavigatescurrentpagetothegivenURL。',handlers:['browser','renderer']}上面我们也提到了需要用代码来控制浏览器进行全屏截图。这里我们需要使用Emulation模块来控制页面viewport的缩放:constCDP=require('chrome-remote-interface');constargv=require('minimist')(process.argv.slice(2));constfile=require('fs');//CLIArgsconsturl=argv.url||'https://www.google.com';constformat=argv.format==='jpeg'?'jpeg':'png';constviewportWidth=argv.viewportWidth||1440;constviewportHeight=argv.viewportHeight||900;constdelay=argv.delay||0;constuserAgent=argv.userAgent;constfullPage=argv.full;//启动ChromeDebuggingProtocolCDP(asyncfunction(client){//ExtractusedDevToolsdomains.const{DOM,Emulation,Network,Page,Runtime}=client;//Enableeventsondomainsweareinterestedin.awaitPage.enable();awaitDOM.enable();awaitNetwork.enable();//如果指定了useragentoverride,passtoNetworkdomainif(userAgent){awaitNetwork.setUserAgentOverride({userAgent});}//Setupviewportresolution,etc.constdeviceMetrics={width:viewportWidth,height:viewportHeight,deviceScaleFactor:0,mobile:false,fitWindow:false,};awaitEmulation.setDeviceMetricsOverride(deviceMetrics);awaitEmulation.setVisibleSize({width:viewportWidth,height:viewportHeight});//NavigatetotargetpageawaitPage.navigate({url});//WaitforpageloadeventtokakescreenshotPage.loadEventFired(async()=>{//如果`full`CLI选项被通过,我们需要测量//渲染页面的高度并使用Emulation.setVisibleSizeif(fullPage){const{root:{nodeId:documentNodeId}}=awaitDOM.getDocument();const{nodeId:bodyNodeId}=awaitDOM.querySelector({selector:'body',nodeId:documentNodeId,});const{model:{height}}=awaitDOM.getBoxModel({nodeId:bodyNodeId});awaitEmulation.setVisibleSize({width:viewportWidth,height:height});//ThisforceViewportcallensuresthatcontentoutsidetheviewportis//rendered,otherwiseitshowsupasgrey.Possiblyabug?awaitEmulation.forceViewport({x:0,y:0,scale:1});}setTimeout(asyncfunction(){constscreenshot=awaitPage.captureScreenshot({format});constbuffer=newBuffer(screenshot.data,'base64');file.writeFile('output.png',buffer,'base64',function(err){if(err){console.error(err);}else{console.log('Screenshotsaved');}client.close();});},delay);});}).on('错误',err=>{console.error('Cannotconnecttobrowser:',err);});【本文为专栏作家“张子雄”原创文章,如需转载请联系作者】点此查看作者更多好文章