Suppose you have a file containing 100,000 URLs, and you need to send an HTTP request to each one and print the status code of the response. How do you write the code to get this done as quickly as possible?

Python offers many approaches to concurrent programming: the multithreading standard library `threading`, the higher-level `concurrent.futures`, coroutines with `asyncio`, and of course the asynchronous library `grequests`. Any of them can meet the requirement above. Below we implement it with each in turn; all of the code in this post runs as-is, so it can serve as a reference for your future concurrent programming.

## Queue + multithreading

Define a queue with a capacity of 400, then start 200 threads, each of which repeatedly takes a URL from the queue and requests it. The main thread reads the URLs from the file and puts them into the queue, then waits until every item in the queue has been taken and processed. The code is as follows:

```python
from threading import Thread
import sys
from queue import Queue
import requests

concurrent = 200

def doWork():
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    try:
        res = requests.get(ourl)
        return res.status_code, ourl
    except Exception:
        return "error", ourl

def doSomethingWithResult(status, url):
    print(status, url)

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()

try:
    for url in open("urllist.txt"):
        q.put(url.strip())
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
```

The output looks like this (screenshot omitted). Did you pick up a new skill?

## Thread pool

If you want a thread pool, the higher-level `concurrent.futures` library is recommended:

```python
import concurrent.futures
import requests

out = []
CONNECTIONS = 100
TIMEOUT = 5

urls = []
with open("urllist.txt") as reader:
    for url in reader:
        urls.append(url.strip())

def load_url(url, timeout):
    ans = requests.get(url, timeout=timeout)
    return ans.status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
    future_to_url = (executor.submit(load_url, url, TIMEOUT) for url in urls)
    for future in concurrent.futures.as_completed(future_to_url):
        try:
            data = future.result()
        except Exception as exc:
            data = str(type(exc))
        finally:
            out.append(data)
            print(data)
```

## Coroutines + aiohttp

Coroutines are also a very common tool for concurrency:

```python
import asyncio
from aiohttp import ClientSession, ClientConnectorError

async def fetch_html(url: str, session: ClientSession, **kwargs) -> tuple:
    try:
        resp = await session.request(method="GET", url=url, **kwargs)
    except ClientConnectorError:
        return (url, 404)
    return (url, resp.status)

async def make_requests(urls: set, **kwargs) -> None:
    async with ClientSession() as session:
        tasks = []
        for url in urls:
            tasks.append(fetch_html(url=url, session=session, **kwargs))
        results = await asyncio.gather(*tasks)

    for result in results:
        print(f'{result[1]} - {str(result[0])}')

if __name__ == "__main__":
    import sys
    assert sys.version_info >= (3, 7), "Script requires Python 3.7+."
    with open("urllist.txt") as infile:
        urls = set(map(str.strip, infile))
    asyncio.run(make_requests(urls=urls))
```

## grequests[1]

This is a third-party library, currently at 3.8K stars. It is simply Requests + Gevent[2], making asynchronous HTTP requests even easier. Gevent is essentially coroutine-based.

Install it first:

```
pip install grequests
```

It is quite simple to use:

```python
import grequests

urls = []
with open("urllist.txt") as reader:
    for url in reader:
        urls.append(url.strip())

rs = (grequests.get(u) for u in urls)
for result in grequests.map(rs):
    print(result.status_code, result.url)
```

Note that `grequests.map(rs)` executes the requests concurrently. The output looks like this (screenshot omitted).

You can also add exception handling:

```python
>>> def exception_handler(request, exception):
...     print("Request failed")

>>> reqs = [
...     grequests.get('http://httpbin.org/delay/1', timeout=0.001),
...     grequests.get('http://fakedomain/'),
...     grequests.get('http://httpbin.org/status/500')]
>>> grequests.map(reqs, exception_handler=exception_handler)
Request failed
Request failed
[None, None, <Response [500]>]
```
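One caveat worth adding to the coroutines + aiohttp version above: `asyncio.gather` launches all 100,000 requests at once, whereas the thread-based versions were capped at 200 and 100 workers. Below is a minimal sketch of bounding the number of in-flight requests with `asyncio.Semaphore`; the limit of 200 and the `fetch_bounded` helper are assumptions of mine, not part of the original examples.

```python
import asyncio
from aiohttp import ClientSession, ClientConnectorError

LIMIT = 200  # assumed cap, mirroring the 200 threads used earlier

async def fetch_bounded(url: str, session: ClientSession, sem: asyncio.Semaphore) -> tuple:
    # The semaphore lets at most LIMIT coroutines hold a connection at once;
    # the rest wait here until a slot frees up.
    async with sem:
        try:
            resp = await session.request(method="GET", url=url)
        except ClientConnectorError:
            return (url, "error")
        return (url, resp.status)

async def main() -> None:
    sem = asyncio.Semaphore(LIMIT)
    with open("urllist.txt") as infile:
        urls = [line.strip() for line in infile]
    async with ClientSession() as session:
        results = await asyncio.gather(*(fetch_bounded(u, session, sem) for u in urls))
    for url, status in results:
        print(status, url)

if __name__ == "__main__":
    asyncio.run(main())
```

The tasks are still all created up front, but only `LIMIT` of them talk to the network at any moment, which keeps file descriptors and the load on remote servers within reasonable bounds.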
