
Scraping the Shenwan (SW) Sector Indices

Posted: 2023-03-26 13:16:50 · Python

Previously I used a crawler from a Zhihu post to scrape exchange stock data; here is the link: https://zhuanlan.zhihu.com/p/... Now I want to classify individual stocks by sector, so I plan to scrape the 28 Shenwan (SW) sector indices. If you are new to web scraping, the method above is the one for you.

In the browser developer tools, click Initiator to locate the request that loads the table. It leads to this jQuery call (reformatted from the page source; a commented-out duplicate branch is omitted):

```javascript
$.ajax({
    type: "POST",         // HTTP method: POST
    dataType: "json",     // data format: JSON
    url: 'handler.aspx',  // target endpoint
    data: "tablename=swzs&key=L1&p=" + (pageindx + 1) +
        "&where=L1 in ('801010','801020','801030','801040','801050','801080'," +
        "'801110','801120','801130','801140','801150','801160','801170','801180'," +
        "'801200','801210','801230','801710','801720','801730','801740','801750'," +
        "'801760','801770','801780','801790','801880','801890')" +
        "&orderby=" + orderby +
        "&fieldlist=L1,L2,L3,L4,L5,L6,L7,L8,L11&pagecount=28&timed=" + new Date().getTime(),
    beforeSend: function () {  // before the request is sent
        $("#divload").show();
        $("#Pagination").hide();
    },
    complete: function () {    // after the request completes
        $("#divload").hide();
        $("#Pagination").show();
    },
    success: function (json) { // on success, rebuild the table
        $("#productTable tr:gt(0)").remove();
        var productData = json.root;
        var tbody = "";
        if (productData != "") {
            $.each(productData, function (i, n) {
                var trs = "";
                trs += "" + n.L1 + "" + n.L2 + "" + n.L3 + "" + n.L4 +
                    "" + changeTwoDecimal_f(parseFloat(n.L5) / 1000000) +
                    "" + n.L6 + "" + n.L7 + "" + n.L8 +
                    "" + changeTwoDecimal_f(parseFloat(n.L11) / 1000000) + "";
                tbody += trs;
            });
        } else {
            tbody = "No data found";
        }
        $("#productTable").append(tbody);
        $("#productTable tr:gt(0):odd").attr("class", "odd");
        $("#productTable tr:gt(0):even").attr("class", "enen");
        $("#productTable tr:gt(0)").hover(
            function () { $(this).addClass('mouseover'); },
            function () { $(this).removeClass('mouseover'); });
    }
});
```

In this snippet we can see `type`, `url`, and the comments next to them. Now think about how the full URL is composed: the `data` property holds the request parameters, and it ends with a `getTime()` timestamp. I also noticed the page has no opening data for today yet, so I used the pre-holiday closing time, 2020-04-30 15:00, as the timestamp for the request. Assembled:

```
url = "http://www.swsindex.com/handler.aspx?tablename=swzs&key=L1&p=1&where=L1 in ('801010','801020','801030','801040','801050','801080','801110','801120','801130','801140','801150','801160','801170','801180','801200','801210','801230','801710','801720','801730','801740','801750','801760','801770','801780','801790','801880','801890')&orderby=&fieldlist=L1,L2,L3,L4,L5,L6,L7,L8,L11&pagecount=28&timed=1588230000000"
```

Success! The data can now be scraped following the method above.

------ Update ------

I ran into some problems while scraping, brushed up on regular expressions, and am recording the steps here.

1. Extract all the rows and strip the leading `'root'` label:

   ```python
   data = re.compile("'root':\[(.*?)\]", re.S).findall(r.text)
   ```

2. Split the rows apart from each other:

   ```python
   datas = data[0].split('},{')
   ```

3. Strip the leading `'{'` and extract each column's value:

   ```python
   stock = datas[i].replace('{', "").split(",")
   stocks = re.compile(":'(.*?)'", re.S).findall("".join(stock))
   ```

Done.
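The assembled URL above can be rebuilt programmatically instead of pasted by hand. A minimal sketch, assuming only the endpoint and parameter names visible in the captured request; `build_payload` and `fetch_page` are my own helper names, and the `requests` library is an assumption (not used in the original post):

```python
import time

# The 28 SW (Shenwan) first-level sector index codes from the captured request.
SW_CODES = [
    '801010', '801020', '801030', '801040', '801050', '801080', '801110',
    '801120', '801130', '801140', '801150', '801160', '801170', '801180',
    '801200', '801210', '801230', '801710', '801720', '801730', '801740',
    '801750', '801760', '801770', '801780', '801790', '801880', '801890',
]

def build_payload(page=1, timed=None):
    """Rebuild the 'data' string sent by the page's $.ajax call."""
    if timed is None:
        timed = int(time.time() * 1000)  # same role as new Date().getTime()
    where = "L1 in (%s)" % ",".join("'%s'" % c for c in SW_CODES)
    return ("tablename=swzs&key=L1&p=%d&where=%s&orderby="
            "&fieldlist=L1,L2,L3,L4,L5,L6,L7,L8,L11&pagecount=28&timed=%d"
            % (page, where, timed))

def fetch_page(page=1, timed=1588230000000):
    """Fetch one page of the index table (needs network; not run here).

    The post pasted the full query string into the URL, so a plain GET
    on the assembled address is used rather than the page's POST.
    """
    import requests  # third-party: pip install requests
    url = "http://www.swsindex.com/handler.aspx?" + build_payload(page, timed)
    return requests.get(url).text

# Show the first part of the rebuilt query string.
print(build_payload(page=1, timed=1588230000000)[:60])
```

The fixed `timed=1588230000000` reproduces the 2020-04-30 15:00 close used in the post; for live data you would leave `timed` unset so the current time is sent, as the page itself does.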
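The three regex steps from the update can be wrapped into one small parser. A sketch run against a hand-made sample response (the field values in `sample` are made up for illustration; the real body carries L1..L11 per index):

```python
import re

def parse_rows(text):
    """Apply the update's three steps: grab the 'root' array,
    split it into rows, then pull each quoted column value."""
    # 1. Extract the contents of 'root':[...] and drop the 'root' label.
    data = re.compile(r"'root':\[(.*?)\]", re.S).findall(text)
    if not data:
        return []
    # 2. Split the concatenated row dicts apart.
    datas = data[0].split('},{')
    rows = []
    for row in datas:
        # 3. Strip the leading '{' and extract every value between quotes.
        stock = row.replace('{', "").split(",")
        rows.append(re.compile(r":'(.*?)'", re.S).findall("".join(stock)))
    return rows

# A made-up two-row body in the same shape the handler returns.
sample = "{'root':[{'L1':'801010','L2':'AgriIndex'},{'L1':'801020','L2':'MiningIndex'}]}"
print(parse_rows(sample))  # → [['801010', 'AgriIndex'], ['801020', 'MiningIndex']]
```

If the response really is a single-quoted dict literal like the sample, `ast.literal_eval` on the whole body would be a sturdier alternative to these regexes, since it tolerates commas or braces inside field values.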