Preface

You have probably been there: you want to convert a PDF to Word, and the message staring back at you is, in effect, "Want it for free without paying? Dream on~". But this blogger does not back down; rising to a challenge is a traditional virtue, after all. So today's topic is: write a small PDF-to-Word tool in Python (based on a certain website's interface).

1. Approach

A quick web search turns up plenty of PDF conversion tools, including many online conversion sites. The idea is to take the test interface such a site provides and simulate the conversion with a crawler. Yes, the approach really is that simple.

Today's protagonist is: https://app.xunjiepdf.com

Packet capture shows that the site works through POST requests, which we then simulate with the requests library. Note that this interface is for trial use only, so the number of pages it can convert is limited. If you need the full feature set, please support the original product.

2. My code

As the saying goes, ten thousand programmers write ten thousand kinds of code. Here is mine, for reference only.

Import the relevant libraries:

    import time
    import requests

Define the PDF2Word class:

    class PDF2Word():
        def __init__(self):
            self.machineid = 'ccc052ee5200088b92342303c4ea9399'
            self.token = ''
            self.guid = ''
            self.keytag = ''

        def produceToken(self):
            # Request a token and queue guid for this machineid.
            url = 'https://app.xunjiepdf.com/api/productoken'
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
                'Accept': 'application/json, text/javascript, */*; q=0.01',
                'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
                'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
                'X-Requested-With': 'XMLHttpRequest',
                'Origin': 'https://app.xunjiepdf.com',
                'Connection': 'keep-alive',
                'Referer': 'https://app.xunjiepdf.com/pdf2word/',
            }
            data = {'machineid': self.machineid}
            res = requests.post(url, headers=headers, data=data)
            res_json = res.json()
            if res_json['code'] == 10000:
                self.token = res_json['token']
                self.guid = res_json['guid']
                print('Token obtained')
                return True
            else:
                return False

        def uploadPDF(self, filepath):
            # Upload the PDF; the returned keytag identifies the conversion task.
            filename = filepath.split('/')[-1]
            url = 'https://app.xunjiepdf.com/api/Upload'
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
                'Accept': '*/*',
                'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
                'Content-Type': 'application/pdf',
                'Origin': 'https://app.xunjiepdf.com',
                'Connection': 'keep-alive',
                'Referer': 'https://app.xunjiepdf.com/pdf2word/',
            }
            params = (
                ('tasktype', 'pdf2word'),
                ('phonenumber', ''),
                ('loginkey', ''),
                ('machineid', self.machineid),
                ('token', self.token),
                ('limitsize', '2048'),
                ('pdfname', filename),
                ('queuekey', self.guid),
                ('uploadtime', ''),
                ('filecount', '1'),
                ('fileindex', '1'),
                ('pagerange', 'all'),
                ('picturequality', ''),
                ('outputfileextension', 'docx'),
                ('picturerotate', '0,undefined'),
                ('filesequence', '0,undefined'),
                ('filepwd', ''),
                ('iconsize', ''),
                ('picturetoonepdf', ''),
                ('isshare', '0'),
                ('softname', 'pdfonlineconverter'),
                ('softversion', 'V5.0'),
                ('validpagescount', '20'),
                ('limituse', '1'),
                ('filespwdlist', ''),
                ('fileCountwater', '1'),
                ('languagefrom', ''),
                ('languageto', ''),
                ('cadverchose', ''),
                ('pictureforecolor', ''),
                ('picturebackcolor', ''),
                ('id', 'WU_FILE_1'),
                ('name', filename),
                ('type', 'application/pdf'),
                ('lastModifiedDate', ''),
                ('size', ''),
            )
            # Open the PDF in a with-block so the file handle is closed after the upload.
            with open(filepath, 'rb') as f:
                res = requests.post(url, headers=headers, params=params, files={'file': f})
            res_json = res.json()
            # '上传成功' is the server's "upload succeeded" message.
            if res_json['message'] == '上传成功':
                self.keytag = res_json['keytag']
                print('PDF uploaded')
                return True
            else:
                return False

        def progress(self):
            # Ask the server whether the conversion task has finished.
            url = 'https://app.xunjiepdf.com/api/Progress'
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
                'Accept': 'text/plain, */*; q=0.01',
                'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
                'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
                'X-Requested-With': 'XMLHttpRequest',
                'Origin': 'https://app.xunjiepdf.com',
                'Connection': 'keep-alive',
                'Referer': 'https://app.xunjiepdf.com/pdf2word/',
            }
            data = {
                'tasktag': self.keytag,
                'phonenumber': '',
                'loginkey': '',
                'limituse': '1',
            }
            res = requests.post(url, headers=headers, data=data)
            res_json = res.json()
            # '处理成功' is the server's "processing succeeded" message.
            if res_json['message'] == '处理成功':
                print('PDF processing finished')
                return True
            else:
                print('PDF is being processed')
                return False

        def downloadWord(self, output):
            # Download the converted file and write it to the output path.
            url = 'https://app.xunjiepdf.com/download/fileid/%s' % self.keytag
            res = requests.get(url)
            with open(output, 'wb') as f:
                f.write(res.content)
            print('Download succeeded ("%s")' % output)

        def convertPDF(self, filepath, outpath):
            filename = filepath.split('/')[-1]
            filename = filename.split('.')[0] + '.docx'
            # Stop early if the token request or the upload fails.
            if not self.produceToken() or not self.uploadPDF(filepath):
                return False
            # Poll once per second until the server reports the conversion is done.
            while not self.progress():
                time.sleep(1)
            self.downloadWord(outpath + filename)
            return True

Run the main function:

    if __name__ == '__main__':
        pdf2word = PDF2Word()
        pdf2word.convertPDF('001.pdf', '')

Note: the convertPDF function takes two parameters. The first is the PDF to convert, and the second is the directory for the converted file (it is concatenated directly with the file name, so a non-empty directory needs a trailing separator). Run it, and the ".docx" file is already lying in my directory. Feels good~
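
convertPDF builds the output name by splitting the input path on '/' and on the first '.', then concatenating it onto the output directory. That works for '001.pdf' but would shorten a name like 'report.v2.pdf' to 'report'. A minimal sketch of an alternative using the standard library's pathlib follows; the helper name docx_output_path is hypothetical and not part of the original class.

```python
from pathlib import Path


def docx_output_path(pdf_path, out_dir=""):
    """Return <out_dir>/<pdf name without its last suffix>.docx.

    Hypothetical helper for illustration; not part of the original PDF2Word class.
    """
    # .stem drops the directory part and only the final suffix (".pdf").
    stem = Path(pdf_path).stem
    # The "/" operator joins path components, so out_dir needs no trailing separator.
    return Path(out_dir) / (stem + ".docx")


print(docx_output_path("001.pdf"))
print(docx_output_path("docs/report.v2.pdf", "out"))
```

If adopted, the result could be passed to downloadWord in place of outpath + filename, since open() accepts Path objects.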

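
convertPDF polls progress once per second with no upper bound, so a task that never finishes would keep the script running forever. One possible way to bound the wait is sketched below. The helper wait_until and its default timeout and interval are assumptions for illustration, and fake_progress merely stands in for PDF2Word.progress so the sketch runs without network access.

```python
import time


def wait_until(check, timeout=60.0, interval=1.0):
    """Call check() repeatedly until it returns a truthy value.

    Returns True on success, or False once `timeout` seconds have passed.
    Hypothetical helper; the default values are assumptions, not site limits.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for PDF2Word.progress: reports completion on the third call.
calls = {"n": 0}


def fake_progress():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_progress, timeout=5, interval=0.01))
```

With the real class, the call would look like wait_until(pdf2word.progress, timeout=120), followed by the download only when it returns True.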