当前位置: 首页 > 科技观察

记得一个.NET企业OA后台服务卡死分析

时间:2023-03-19 13:01:26 科技观察

一、背景1、讲故事前段时间朋友微信找到我说他生产机上的Console服务好像卡死了,然后不是日志产生了,对方收不到我的httpclient请求。不知道程序怎么了,特来求助。哈哈,一般来说,窗体程序(WinForm,WPF)卡住的情况比较多,Console和WebApi相对较少。既然你找到我了,那我们就在WinDbg上分析一下吧。二、WinDbg分析1、程序是否真的卡住了?列表。0:000>!tthreadCount:124unStartEdThread:0backgroundThread:105pendingThread:0deadThread:0deadThread:18hostedRuntime:nolockdbgDBGDBGDBGIDosidtheadthreacgcModeGCAlloc上下文域名gcAlloc上下文域名APTAPTAPTAPTAPTAPTAPTAPTAPTAPTAPTAPTAPTAPTAPTAD异常0165ec000002097A4BF3902AE00000000000000000000000000000000.00000000来000002097A50BA102b220Preemptive00000209060BE8F8:00000209060BEBE8000002097a4bea20-00001MTA(Finalizer)4311d20000002097A524EC0102a220Preemptive0000000000000000:0000000000000000000002097a4bea20-00001MTA(ThreadpoolWorker)59b8a4000002097DE102903029220Preemptive0000000000000000:0000000000000000000002097a4bea20-00001MTA(ThreadpoolWorker)6131b22c000002097DE0ADB01029220抢先0000000000000000:0000000000000000000002097a4bea20-00001MTA(ThreadpoolWorker)7161b0e8000002097DE1A030202b220Preemptive00000209063CB630:00000209063CC1F8000002097a4bea20-00001MTA817138c8000002097DE175C0202b220Preemptive00000209063F71C0:00000209063F7BC8000002097a4bea20-00001MTA9181afd0000002097DE181E0202b220Preemptive00000209064627E0:0000020906463598000002097a4bea20-00001MTA10191ac48000002097DE13310202b220Preemptive000002090632D6C8:000002090632E0D8000002097a4bea20-00001MTA112018704000002097DE16390202b220Preemptive00000209063FB5A8:00000209063FBBC8000002097a4bea20-00001MTA12211ade4000002097DE187F0202b220Preemptive00000209062EA138:00000209062EA708000002097a4bea20-00001MTA132213164000002097DE13920202b220Preemptive0000020906392108:0000020906392A38000002097a4bea20-00001MTA14231b334000002097DE169A0202b220Preemptive00000209063CD9E8:00000209063CE1F8000002097a4bea20-00001MTA...10616819e180000020927066770a029220Preemptive0000000000000000:0000000000000000000002097a4bea20-00001MTA(ThreadpoolCompletionPort)108136af7400000209285902908029220Preemptive00000209063B60E0:00000209063B6408000002097a4bea20-00001MTA(ThreadpoolCompletionPort)10784e7f0000002097AD6B6008029220Preemptive00000209063E3300:00000209063E3DD8000002097a4bea20-00001MTA(ThreadpoolCompletionPort)一般来说,卦中的LockCount列表示当前线程持有的托管锁的数量,现在应该是-000。..不过没关系,我们也可以通过!syncblk来查看锁的情况,毕竟95%的锁场景都会用到。0:000>!syncblkIndexSyncBlockMonitorHeldRecursionOwningThreadInfoSyncBlockOwner176000002097E0DA1D817100000209216A3500132c87100000209002c6a68System.Object191000002097E0DAC88271000002097ADC54F01ae905300000209000185e8System.IO.TextWriter+SyncTextWriter----------------------------Total270CCW0RCW0ComClassFactory0Free32从卦中的MonitorHeld=27来看,说明当前有13个线程在等待这个SyncTextWriter对象,而有1个线程持有它,那么为什么这个线程没有退出呢?接下来,您可以切换到线程53并检查其线程堆栈。0:053>~~[1ae90]sntdll!NtWriteFile+0x14:00007ffd`a70df774c3ret0:053>!clrstackOSThreadId:0x1ae90(53)ChildSPIPCallSite00000070505BDE8800007ffda70df774[InlinedCallFrame:00000070505bde88]Interop+Kernel32.WriteFile(IntPtr,Byte*,Int32,Int32ByRef,IntPtr)00000070505BDE8800007ffd32a7cbe2[InlinedCallFrame:00000070505bde88]Interop+Kernel32.WriteFile(IntPtr,Byte*,Int32,Int32ByRef,IntPtr)00000070505BDE5000007ffd32a7cbe2ILStubClass.IL_STUB_PInvoke(IntPtr,Byte*,Int32,INT32BYREF,INTPTR)000070505BDF2000007FFD3A2D6971SYSTEM.CONSOLEPAL+WINDOWSCONSOLESTERAM.WRITEFILENATIVE(INTPTR,BYTE[],INT32,INT32,INT32,BOOLEAN)/libraries/System.Console/src/System/ConsolePal.Windows.cs@1131]00000070505BDFD000007ffd377c05d9System.IO.StreamWriter.Flush(Boolean,Boolean)[/_/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs@260]00000070505BE05000007ffd3a2d6687System.IO.StreamWriter.WriteLine(System.String)00000070505BE0D000007ffd3a2d6472System.IO.TextWriter+SyncTextWriter/System.WriteLine(System.WriteLine)src/libraries/System.Private.CoreLib/src/System/IO/TextWriter.cs@880]00000070505BE13000007ffd3a2d640dSystem.Console.WriteLine(System.String)[/_/src/libraries/System.Console/src/System/Console.cs@716]00000070505BE16000007ffd3a2375abHCloud.xxx+d__2.MoveNext()00000070505BEAE000007ffd37e19365System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon,System.Private].Core__Canon](Ref.__Canon)[[系统.__佳能]./src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs@63]00000070505BEB4000007ffd3a235cc3HCloud.xxxx.Execute(Quartz.IJobExecutionContext)00000070505BEBB000007ffd3a231264Quartz.xxx+d__9.MoveNext()00000070505BEE9000007ffd3a230183系统.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Quartz.Core.JobRunShell+d__9,Quartz]](d__9ByRef)[/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs@63]00000070505BEF1000007ffd3a2300dbSystem.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Quartz.Core.JobRunShell+d__9,Quartz]](d__9ByRef)[/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilder.cs@33]00000070505BEF4000007ffd3a230082Quartz.xxxxl.Run(System.Threading.CancellationToken)00000070505BF02000007ffd3a22ffc3Quartz.Core.QuartzSchedulerThread+c__DisplayClass28_0.b__0()00000070505BF06000007ffd3939b71aSystem.Threading.Tasks.Task`1[[System.__Canon,System.Private.CoreLib].InnerInvoke()[/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Future.cs@507]00000070505BF0C000007ffd37d54431System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(系统.线程.Thre广告,System.Threading.ExecutionContext,System.Threading.ContextCallback,System.Object)[/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs@300]00000070505BF11000007ffd37d53657System.Threading。Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.TaskByRef,System.Threading.Thread)[/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs@2352]00000070505BF1C000007FFD37D50E04系统.threading.threadpoolworkqueue.dispatch()很奇怪,我们仍然会卡在System.Console.WriteLine方法上。为了探究原因,我们使用k命令查看非托管栈。0:053>k5#Child-SPRetAddrCallSite0000000070`505bddd800007ffd`a32febdantdll!NtWriteFile+0x140100000070`505bdde000007ffd`32a7cbe2KERNELBASE!WriteFile+0x7a0200000070`505bde5000007ffd`3a2d69710x00007ffd`32a7cbe20300000070`505bdf2000007ffd`3a2d672fSystem_Console!System.ConsolePal.WindowsConsoleStream.WriteFileNative+0x610400000070`505bdf8000007ffd`377c05d9System_Console!System.ConsolePal.WindowsConsoleStream.Write+0x3f0:053>ubntdll!NtWriteFile+0x14ntdll!NtDeviceIoControlFile+0x15:00007ffd`a70df755cd2eint2Eh00007ffd`a70df757c3ret00007ffd`a70df7580f1f840000000000nopdwordptr[rax+rax]ntdll!NtWriteFile:00007ffd`a70df7604c8bd1movr10,rcx00007ffd`a70df763b808000000moveax,800007ffd`a70df768f604250803fe7f01testbyteptr[SharedUserData+0x308(00000000`7ffe0308)],100007ffd`a70df7707503jnentdll!NtWriteFile+0x15(00007ffd`a70df775)00007ffd`a70df7720f05syscall从上面的syscall系统调用关键字,代码是切换用户态ntdll!NtWriteFile进入内核态nt!NtWriteFile方法,进入内核状态为什么没有返回?这又是一个值得深思的问题。2、为什么进入内核态后没有return。其实win32api方法ntdll!NtWriteFile的第一个参数是句柄文件句柄,签名如下。__kernel_entryNTSYSCALLAPINTSTATUSNtWriteFile([in]HANDLEFileHandle,[in,optional]HANDLEEvent,[in,optional]PIO_APC_ROUTINEApcRoutine,[in,optional]PVOIDApcContext,[out]PIO_STATUS_BLOCKIoStatusBlock,[in]PVOIDBuffer,ULONGLength,[in,optional]PLARGE_INTEGERByteOffset,[in,optional]PULONGKey);也许句柄被内核中的其他人占用了,你可以使用!handle查看rcx寄存器。0:053>rrax=0000000000000008rbx=00000070505bdf50rcx=0000000000000418rdx=0000000000000000rsi=0000000000000000rdi=0000000000000418rip=00007ffda70df774rsp=00000070505bddd8rbp=00000070505bdf10r8=0000000002000805r9=0000000000000000r10=0000020906191620r11=00000070505bc8f8r12=0000000000000100r13=0000000000000053r14=0000000000000077r15=000002097ADC54F0IOPL=0NVUPEIPLZRNAPONCCS=0033SS=002BDS=002BES=002BFS=0053GS=002B=002BEFL=00000246NTDLL!FileAttributes0GrantedAccess0x120196:ReadControl,SynchWrite/Add,Append/SubDir/CreatePipe,WriteEA,ReadAttr,WriteAttrHandleCount3PointerCount65483Noobjectspecificinformationavailable哈哈,其实我什么都看不到,我也看不到进入内核模式,于是下一步就是上网搜索了。其实有经验的朋友一定猜到了。应该是控制台开启了快速编辑窗口。截图如下:快速编辑窗口。窗口,编辑一些内容,但是一旦被用户独占,程序端无法输出内容在控制台窗口,只能在内核态傻等,应该是形成的原因。画出如下图。把信息告诉朋友后,朋友说他用的是Windows服务部署,但是不管是什么部署方式,注释掉Console.WriteLine肯定是对的。三:总结卡住的事故,主要是开发者使用了大量的Console.WriteLine来输出日志。在某个时刻,输出窗口由于各种原因被别人独占,导致程序端无法向该窗口输出内容,一直被阻塞。等待后,朋友将日志输出切换为文件方式,问题解决。其实很多新手朋友都会犯这个问题,特此记录。