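This article is reprinted from the WeChat official account 《一线码农聊科技》, written by 一线码农聊科技. To reprint this article, please contact that official account.

I. Background

1. The story

A few days ago, while collecting my .NET advanced debugging articles into https://github.com/ctripxchuang/dotnetfly, I came across an interesting comment. The screenshot is as follows:

[Screenshot: the comment]

Roughly, it says that calling Task.Result on the WinForms main thread causes a deadlock. I also read the reference link shown in the screenshot. Stephen is an undisputed authority, but this article focuses on the cause of the deadlock. Seeing is believing, so this article analyzes it from the WinDbg side.

II. WinDbg analysis

1. Does it really deadlock?

Judging from the article and the screenshot, it does seem to deadlock. Admittedly I have not touched WinForms for many years, so I did not know whether it would actually happen; at least it does not in a Console application. So let's start with a piece of test code.

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.Windows.Forms;

    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // Blocks the UI thread until the task completes.
            var jsonTask = GetJsonAsync("http://cnblogs.com").Result;
            textBox1.Text = jsonTask;
        }

        public async static Task<string> GetJsonAsync(string uri)
        {
            using (var client = new HttpClient())
            {
                // The continuation after this await is posted back to the UI thread.
                var jsonString = await client.GetStringAsync(uri);
                return jsonString;
            }
        }
    }

The code is very simple. Run the program, click the button, and the UI freezes. I could hardly believe it.

2. Finding the cause of the deadlock

Next, I turned to WinDbg and attached it to the process to find out what was going on.

1) The main thread

The UI is unresponsive, so naturally the main thread is stuck. What is the main thread doing at this moment? The commands ~0s and !clrstack are enough to find out.

    0:000> !clrstack
    OS Thread Id: 0x5a10 (0)
            Child SP               IP Call Site
    0000004d10dfde00 00007ffb889a10e4 [GCFrame: 0000004d10dfde00]
    0000004d10dfdf28 00007ffb889a10e4 [HelperMethodFrame_1OBJ: 0000004d10dfdf28] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
    0000004d10dfe040 00007ffb66920d64 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
    0000004d10dfe0d0 00007ffb6691b4bb System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
    0000004d10dfe140 00007ffb672601d1 System.Threading.Tasks.Task.InternalWait(Int32, System.Threading.CancellationToken)
    0000004d10dfe210 00007ffb6725cfa7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].GetResultCore(Boolean)
    0000004d10dfe250 00007ffb18172a1b WindowsFormsApp4.Form1.button1_Click(System.Object, System.EventArgs)
    …                …                System.Windows.Forms.Control.OnClick(System.EventArgs)
    0000004d10dfe2f0 00007ffb3a027b83 System.Windows.Forms.Button.OnClick(System.EventArgs)
    0000004d10dfe340 00007ffb3a837231 System.Windows.Forms.Button.OnMouseUp(System.Windows.Forms.MouseEventArgs)
    0000004d10dfe400 00007ffb3a7e097d System.Windows.Forms.Control.WmMouseUp(System.Windows.Forms.Message ByRef, System.Windows.Forms.MouseButtons, Int32)
    0000004d10dfe480 00007ffb3a0311cc System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)
    0000004d10dfe540 00007ffb3a0b0c97 System.Windows.Forms.ButtonBase.WndProc(System.Windows.Forms.Message ByRef)
    0000004d10dfe5c0 00007ffb3a0b0be5 System.Windows.Forms.Button.WndProc(System.Windows.Forms.Message ByRef)
    0000004d10dfe5f0 00007ffb3a030082 System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)
    0000004d10dfe690 00007ffb3a765a02 DomainBoundILStubClass.IL_STUB_ReversePInvoke(Int64, Int32, Int64, Int64)
    0000004d10dfe9d0 00007ffb776d221e [InlinedCallFrame: 0000004d10dfe9d0] System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
    0000004d10dfe9d0 00007ffb3a0b9489 [InlinedCallFrame: 0000004d10dfe9d0] System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
    0000004d10dfe9a0 00007ffb3a0b9489 DomainBoundILStubClass.IL_STUB_PInvoke(MSG ByRef)
    0000004d10dfea60 00007ffb3a046661 System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
    0000004d10dfeb50 00007ffb3a045fc7 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
    0000004d10dfebf0 00007ffb3a045dc2 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
    0000004d10dfec50 00007ffb181708e2 WindowsFormsApp4.Program.Main() [E:\net5\ConsoleApp1\WindowsFormsApp4\Program.cs @ 19]
    0000004d10dfee78 00007ffb776d6923 [GCFrame: 0000004d10dfee78]

From the stack output, the main thread is ultimately stuck in Monitor.ObjWait underneath Task.Result. In other words, it still has not received the final jsonString, which is odd: several minutes have already passed. Is it a network problem? My connection runs at a full 100M.

2) Where did jsonString go?

A good way to tell whether the network is to blame is to brute-force search the managed heap. If the jsonString is found on the managed heap, then something in the program is keeping Result from returning. Use the commands !dumpheap -type String -min 8500 and !do 000001f19002fcf0 to check, as shown in the figure below:

[Figure: output of !dumpheap and !do showing the HTML string]

The figure clearly shows that the HTML has come back. Since it has come back, why does Task.Result still not finish? The next step is to see who holds this HTML, which only takes !gcroot.

    0:000> !gcroot 000001f19002fcf0
    Thread 5a10:
        0000004d10dfe250 00007ffb18172a1b WindowsFormsApp4.Form1.button1_Click(System.Object, System.EventArgs) [E:\net5\ConsoleApp1\WindowsFormsApp4\Form1.cs @ 26]
            rbp+10: 0000004d10dfe2b0
                ->  000001f180007f78 WindowsFormsApp4.Form1
                ->  000001f180070d68 System.ComponentModel.EventHandlerList
                ->  000001f180071718 System.ComponentModel.EventHandlerList+ListEntry
                ->  000001f1800716d8 System.EventHandler
                ->  000001f1800716b0 System.Windows.Forms.ApplicationContext
                ->  000001f180071780 System.EventHandler
                ->  000001f18006ab38 System.Windows.Forms.Application+ThreadContext
                ->  000001f18006b140 System.Windows.Forms.Application+MarshalingControl
                ->  000001f18016c9c8 System.Collections.Queue
                ->  000001f18016ca00 System.Object[]
                ->  000001f18016c948 System.Windows.Forms.Control+ThreadMethodEntry
                ->  000001f18016c8b8 System.Object[]
                ->  000001f1800e6f80 System.Action
                ->  000001f1800e6f60 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
                ->  000001f1800a77d0 WindowsFormsApp4.Form1+<GetJsonAsync>d__2
                ->  000001f1800b4e50 System.Threading.Tasks.Task`1[[System.String, mscorlib]]
                ->  000001f19002fcf0 System.String

    Found 1 unique roots (run '!GCRoot -all' to see all roots).

So this System.String is ultimately held by WindowsFormsApp4.Form1 on thread 5a10. We can use !t to verify which thread 5a10 actually is.

    0:000> !t
                                                                                                 Lock
           ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
       0    1 5a10 000001f1f1b01200  2026020 Preemptive  000001F1800E70E8:000001F1800E7FD0 000001f1f1ad5b90 0     STA
       2    2 712c 000001f1f1b2a270    2b220 Preemptive  0000000000000000:0000000000000000 000001f1f1ad5b90 0     MTA (Finalizer)

Well, 5a10 turns out to be the main thread. That is a bit confusing: the main thread is stuck, yet the string is held by the main thread, which at first seems completely inexplicable.

3) Looking for a breakthrough

I went back and calmly thought about this reference chain, and noticed that it contains a Queue:

    ->  000001f18016c9c8 System.Collections.Queue

That gave me an idea: set a breakpoint at the place where items are put into the Queue and debug from there. I got the source code, used dnSpy as the tool, and got straight to it.

[Figure: dnSpy stopped at the breakpoint where the Queue is filled]

As the figure shows, the item is put into the Queue on thread 10, which means the string was not yet held by the main thread at that point. Analyze the call stack carefully and you should be able to work it out. In any case, after going through it I had the following picture in my head.

[Figure: diagram of how the continuation reaches the Queue]

As the figure shows, the Task's continuation is ultimately dispatched by WindowsFormsSynchronizationContext.Post into the Queue under Control, and the items in that Queue must be executed by the UI thread. Hence the following dialogue:

Main thread: Brother Task, what are you doing? I am waiting for your signal.

Task: Bro, I am already at your place. When are you coming to pick me up?

In short: the continuation needs the main thread to execute it, but the main thread is blocked waiting for the task to reach its completed state, so the continuation never runs and the task never completes. Each side is waiting for the other, which is the deadlock.

3. How to fix it

The fix is simple, and there are generally two approaches.

1. Do not put the continuation into the Queue, which cuts off this path. The implication is that the thread pool finishes the task by itself, the UI thread then sees that the task has completed, and finally the UI thread gets the HTML. The way to do it is to append ConfigureAwait(false) to the awaited call, that is, await client.GetStringAsync(uri).ConfigureAwait(false).

2. Do not block the main thread. If the main thread is not blocked, it is free to pick up and run the work waiting in Control's Queue. The change is also simple: put await in front of the GetJsonAsync call instead of reading .Result, which also means marking button1_Click as async.

4. Conclusion

Practice it yourself. Theoretical knowledge is something other people push into your head, and you do not actually know whether it is correct. What you have verified in practice is what truly belongs to you, and it is hard to forget, because you really experienced it, practiced it, and verified it.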
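
As a rough sketch of the two fixes from section 3, the test form could be rewritten along the following lines. It assumes a .NET Framework WinForms project that references System.Net.Http. To keep the sample self-contained, the controls are created in code rather than by the designer, and a Main entry point is included; both are illustrative additions, not part of the original test project. Either fix on its own should be enough to avoid the deadlock in this example; they are combined here only to show both in one place.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.Windows.Forms;

public class Form1 : Form
{
    // Controls built in code (assumption: replaces the designer-generated part).
    private readonly Button button1 = new Button { Text = "click", Dock = DockStyle.Top };
    private readonly TextBox textBox1 = new TextBox { Multiline = true, Dock = DockStyle.Fill };

    public Form1()
    {
        Controls.Add(textBox1);
        Controls.Add(button1);
        button1.Click += button1_Click;
    }

    // Fix 2: do not block the UI thread. The handler awaits the task instead of
    // reading .Result, so the UI thread stays free to run queued continuations.
    private async void button1_Click(object sender, EventArgs e)
    {
        // No ConfigureAwait(false) here: the code after this await touches
        // textBox1, so it has to resume on the UI thread.
        textBox1.Text = await GetJsonAsync("http://cnblogs.com");
    }

    // Fix 1: ConfigureAwait(false) tells the awaiter not to capture the current
    // SynchronizationContext, so the continuation is not posted to the UI
    // thread's queue and can finish on a thread pool thread.
    public static async Task<string> GetJsonAsync(string uri)
    {
        using (var client = new HttpClient())
        {
            return await client.GetStringAsync(uri).ConfigureAwait(false);
        }
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new Form1());
    }
}
```

Note that ConfigureAwait(false) belongs in library-style code such as GetJsonAsync, which never touches a control. In the click handler the continuation updates textBox1, so there it makes sense to keep the default behavior and resume on the UI thread.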