
Learning Python Web Scraping: Scraping Stock Information

Posted 2023-03-26 19:04:10 · Python

Analysis

Open Chrome's developer tools and pick out the stock codes one by one. We can store all of the codes in a list; what's left is to find a site that serves per-stock data and loop over the list to fetch each stock. On 同花顺 (10jqka), the sharp-eyed will have noticed it already: 000001 is a stock code. From there, we only need to splice the code onto a link of the form http://www.kaifx.cn/question/... to pull the data we want, stock after stock.

First, the request and parsing libraries used in this walkthrough: Requests and PyQuery. The data ultimately lands in MySQL.

Getting the stock code list

The first step is, naturally, to build the list of stock codes. We define a method for that:

```python
def get_stock_list(stockListURL):
    r = requests.get(stockListURL, headers=headers)
    doc = PyQuery(r.text)
    code_list = []
    # iterate over every <a> node in the stock table section
    for i in doc('.stockTable a').items():
        try:
            href = i.attr.href
            code_list.append(re.findall(r"\d{6}", href)[0])
        except:
            continue
    # normalize the scraped codes to lowercase and return the list
    return [item.lower() for item in code_list]
```

Getting the detail data

The detail data looks as if it lives on the page, but it actually doesn't. The data ultimately comes not from the page itself but from a data interface:

http://qd.10jqka.com.cn/quote...

As for how to track an interface like this down, I won't cover it this time. I'd encourage anyone learning to scrape to dig in themselves: search around, try a few times, and the method will turn up.

With the interface in hand, let's look at what it returns:

```
showStockDate({"info":{"000001":{"name":"\u5e73\u5b89\u94f6\u884c"}},"data":{"000001":{"10":"16.13","8":"16.14","9":"15.87","13":"78795234.00","19":"1262802470.00","7":"16.12","15":"40225508.00","14":"37528826.00","69":"17.73","70":"14.51","12":"5","17":"945400.00","264648":"0.010","199112":"0.062","1968584":"0.406","2034120":"9.939","1378761":"16.026","526792":"1.675","395720":"-948073.000","461256":"-39.763","3475914":"313014790000.000","1771976":"1.100","6":"16.12","11":""}}})
```

Clearly this is not standard JSON but a standard JSONP response: the JSON sits inside a callback. We first trim the head (`showStockDate(`) and the tail (`)`) to turn it into standard JSON, then map the numeric field keys above to the values we want, and finally write the parsed values to the database.
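Before writing the parser, one note: slicing off a fixed 14-character head (the length of `showStockDate(`) is brittle if the callback name ever changes. As a minimal sketch (not from the original post), a regex can strip any JSONP wrapper; `jsonp_to_json` is a hypothetical helper name, and the test reuses the payload above trimmed to two fields:

```python
import json
import re

def jsonp_to_json(text):
    """Strip a JSONP wrapper like callback({...}) and return the parsed JSON."""
    match = re.search(r'^\s*\w+\((.*)\)\s*;?\s*$', text, re.S)
    if match is None:
        raise ValueError('response is not a JSONP payload')
    return json.loads(match.group(1))

# a trimmed-down version of the interface's sample response
sample = ('showStockDate({"info":{"000001":{"name":"\\u5e73\\u5b89\\u94f6\\u884c"}},'
          '"data":{"000001":{"7":"16.12","6":"16.12"}}})')
print(jsonp_to_json(sample)['info']['000001']['name'])  # -> 平安银行
```

The parsing method below could call this helper instead of hardcoding slice indices; the fixed-slice version shown next is what the walkthrough actually uses.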
Here is the method that walks the code list, hits the interface for each stock, and writes one row per stock:

```python
def getStockInfo(code_list, stockInfoURL):
    for stock in code_list:
        try:
            url = stockInfoURL + stock
            r = requests.get(url, headers=headers)
            # strip the 'showStockDate(' head and ')' tail, then load the JSON into a dict
            dict1 = json.loads(r.text[14:-1])
            print(dict1)
            # pull the fields we need out of the dict and build the insert template
            insert_data = {
                "code": stock,
                "name": dict1['info'][stock]['name'],
                "jinkai": dict1['data'][stock]['7'],              # opening price
                "chengjiaoliang": dict1['data'][stock]['13'],     # volume
                "zhenfu": dict1['data'][stock]['526792'],         # amplitude
                "zuigao": dict1['data'][stock]['8'],              # day high
                "chengjiaoe": dict1['data'][stock]['19'],         # turnover
                "huanshou": dict1['data'][stock]['1968584'],      # turnover rate
                "zuidi": dict1['data'][stock]['9'],               # day low
                "zuoshou": dict1['data'][stock]['6'],             # previous close
                "liutongshizhi": dict1['data'][stock]['3475914']  # circulating market cap
            }
            cursor.execute(sql_insert, insert_data)
            conn.commit()
            print(stock, ': write complete')
        except:
            print('write exception')
            # on any error, move on to the next stock
            continue
```

Complete code

Wrapping the pieces above up a little, the full script for this exercise:

```python
import requests
import re
import json
from pyquery import PyQuery
import pymysql


# database connection
def connect():
    conn = pymysql.connect(host='localhost', port=3306, user='root',
                           password='password', database='test',
                           charset='utf8mb4')
    # grab a cursor for executing statements
    cursor = conn.cursor()
    return {"conn": conn, "cursor": cursor}


connection = connect()
conn, cursor = connection['conn'], connection['cursor']

sql_insert = ("insert into stock (code, name, jinkai, chengjiaoliang, zhenfu, "
              "zuigao, chengjiaoe, huanshou, zuidi, zuoshou, liutongshizhi, "
              "create_date) values (%(code)s, %(name)s, %(jinkai)s, "
              "%(chengjiaoliang)s, %(zhenfu)s, %(zuigao)s, %(chengjiaoe)s, "
              "%(huanshou)s, %(zuidi)s, %(zuoshou)s, %(liutongshizhi)s, now())")

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/78.0.3904.108 Safari/537.36'}


def get_stock_list(stockListURL):
    r = requests.get(stockListURL, headers=headers)
    doc = PyQuery(r.text)
    code_list = []
    # iterate over every <a> node in the stock table section
    for i in doc('.stockTable a').items():
        try:
            href = i.attr.href
            code_list.append(re.findall(r"\d{6}", href)[0])
        except:
            continue
    # normalize the scraped codes to lowercase and return the list
    return [item.lower() for item in code_list]


def getStockInfo(code_list, stockInfoURL):
    for stock in code_list:
        try:
            url = stockInfoURL + stock
            r = requests.get(url, headers=headers)
            # strip the JSONP wrapper, then load the JSON into a dict
            dict1 = json.loads(r.text[14:-1])
            print(dict1)
            # pull the fields we need and build the insert template
            insert_data = {
                "code": stock,
                "name": dict1['info'][stock]['name'],
                "jinkai": dict1['data'][stock]['7'],
                "chengjiaoliang": dict1['data'][stock]['13'],
                "zhenfu": dict1['data'][stock]['526792'],
                "zuigao": dict1['data'][stock]['8'],
                "chengjiaoe": dict1['data'][stock]['19'],
                "huanshou": dict1['data'][stock]['1968584'],
                "zuidi": dict1['data'][stock]['9'],
                "zuoshou": dict1['data'][stock]['6'],
                "liutongshizhi": dict1['data'][stock]['3475914']
            }
            cursor.execute(sql_insert, insert_data)
            conn.commit()
            print(stock, ': write complete')
        except:
            print('write exception')
            # on any error, move on to the next stock
            continue


def main():
    stock_list_url = 'https://hq.gucheng.com/gpdmyl...'
    stock_info_url = 'http://qd.10jqka.com.cn/quote...'
    code_list = get_stock_list(stock_list_url)
    # code_list = ['601766']
    getStockInfo(code_list, stock_info_url)


if __name__ == '__main__':
    main()
```
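One prerequisite the post leaves implicit: sql_insert targets a stock table in the test database, which has to exist before the script runs. Here is a minimal sketch of a plausible schema, derived only from the column names in sql_insert; every column type is an assumption, since the original never shows the DDL (VARCHAR mirrors the string values the interface returns, but DECIMAL would work too):

```python
import pymysql

# Hypothetical DDL for the `stock` table that sql_insert targets;
# the original post never shows the schema, so all types are guesses.
CREATE_STOCK_TABLE = """
CREATE TABLE IF NOT EXISTS stock (
    id              INT AUTO_INCREMENT PRIMARY KEY,
    code            VARCHAR(6) NOT NULL,  -- stock code
    name            VARCHAR(32),          -- stock name
    jinkai          VARCHAR(20),          -- opening price
    chengjiaoliang  VARCHAR(20),          -- volume
    zhenfu          VARCHAR(20),          -- amplitude
    zuigao          VARCHAR(20),          -- day high
    chengjiaoe      VARCHAR(20),          -- turnover
    huanshou        VARCHAR(20),          -- turnover rate
    zuidi           VARCHAR(20),          -- day low
    zuoshou         VARCHAR(20),          -- previous close
    liutongshizhi   VARCHAR(20),          -- circulating market cap
    create_date     DATETIME              -- insert timestamp
)
"""

conn = pymysql.connect(host='localhost', port=3306, user='root',
                       password='password', database='test', charset='utf8mb4')
with conn.cursor() as cursor:
    cursor.execute(CREATE_STOCK_TABLE)
conn.commit()
```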
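Finally, two judgment calls in the original are worth flagging: the bare except: in getStockInfo swallows every failure silently, and the loop hits the interface as fast as it can. The sketch below is a hardened variant of the per-stock request, not the post's code; the function name fetch_stock_dict, the timeout, and the retry count are all assumptions:

```python
import json
import re
import time

import requests

def fetch_stock_dict(stock, stockInfoURL, headers, retries=3):
    """Fetch one stock's JSONP payload and return it as a dict, or None.

    A hardened variant of the request in getStockInfo: explicit timeout,
    narrow exception handling, and a short pause between attempts.
    """
    for attempt in range(1, retries + 1):
        try:
            r = requests.get(stockInfoURL + stock, headers=headers, timeout=10)
            r.raise_for_status()
            # strip the JSONP wrapper regardless of the callback name
            inner = re.search(r'\w+\((.*)\)\s*$', r.text, re.S)
            return json.loads(inner.group(1))
        except (requests.RequestException, AttributeError,
                json.JSONDecodeError) as e:
            print(stock, 'attempt', attempt, 'failed:', e)
            time.sleep(1)  # back off briefly before retrying
    return None
```

getStockInfo could call this and simply skip any stock for which it returns None, instead of relying on the bare except to paper over network and parsing errors alike.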