
Analysis of a Hang in a .NET Pharmaceutical Warehouse Management System

Time: 2023-03-20 20:59:26  Tech Watch

I: Background

1. The story

Early this month a friend contacted me on WeChat. After his service had been running for a while, his API would accept requests but never return responses.

From his description, the program appeared to be stuck on something. Hangs of this kind are usually fairly easy to track down, so let's analyze the dump with WinDbg.

II: WinDbg analysis

1. What are the requests doing?

The friend says the API receives requests but sends no responses, so how do we verify that? ASP.NET represents each request with an HttpContext, which means we can list the HttpContext objects in the dump and see how long each has been running. The NetExt extension has a !whttp command that does exactly this.

    0:000> !whttp
    HttpContext      Thread Time Out Running  Status Verb Url
    000000563bf803b0     42 00:01:50 00:01:24    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x-HN
    000000563bf84660     -- 00:01:50 Finished    200 GET  http://xxx.com:30003/
    000000563c4a0470     51 00:01:50 00:00:12    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2C
    00000056bbf63590     30 00:01:50 00:02:41    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
    00000056bc82a038     -- 00:01:50 Finished    200 GET  http://localhost:30003/
    00000056bc84a3e8     44 00:01:50 00:00:51    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
    00000056bc8671c8     46 00:01:50 00:00:45    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
    000000573bf44698     35 00:01:50 00:02:39    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
    000000573bf483c0     33 00:01:50 00:02:41    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x-HN
    000000573bf97e80     40 00:01:50 00:02:32    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=ZJB2C
    000000573c583b08     -- 00:01:50 Finished    200 GET  http://localhost:30003/
    000000573c589ec8     -- 00:01:50 Finished    200 GET  http://xxx.com:30003/Wms/xxx/xxx/xxx
    000000573c760e28     -- 00:01:50 Finished    200 POST http://xxx.com:30003/Wms/xxx/xxx/xxx
    000000573c95f990     48 00:01:50 00:00:31    200 POST http://xxx.com:30003/Wms/Common/xxx?xxx=xxx&xxx=x-HN
    00000057bbf4f8e8     31 00:01:50 00:02:12    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
    00000057bc080340     50 00:01:50 00:00:19    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
    000000583c4aee80     43 00:01:50 00:01:11    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
    000000583c4d0c50     53 00:01:50 00:00:01    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
    00000058bbf8f1a0     34 00:01:50 00:02:22    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
    000000593bfe1758     41 00:01:50 00:01:22    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2C
    000000593c892160     -- 00:01:50 Finished    200 GET  http://xxx.com:30003/Wms/xxx/xxx/xxxJob
    000000593ca813b0     45 00:01:50 00:00:30    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-HN
    000000593caa45d8     -- 00:01:50 Finished    200 GET  http://xxx.com:30003/
    00000059bc1ad808     32 00:01:50 00:01:45    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
    00000059bc1c3d70     36 00:01:50 00:01:29    200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x

    25 HttpContext object(s) found matching criteria

The Running column shows that most in-flight requests have been running for more than a minute, which confirms the hang the friend described. From experience, a good starting point is the thread that owns the HttpContext with the largest Running value. Here that is threads 30 and 33, both at 00:02:41, so let's see what they are doing.

2. Exploring the longest-running threads

Switch to threads 30 and 33 and look at their managed stacks.

    0:000> ~30s
    ntdll!NtWaitForSingleObject+0xa:
    00007ffd`b81f024a c3              ret
    0:030> !clrstack
    OS Thread Id: 0x29d0 (30)
            Child SP               IP Call Site
    0000005acc3ac590 00007ffdb81f024a [PrestubMethodFrame: 0000005acc3ac590] xxx.xxx.RedisConnectionHelp.get_Instance()
    0000005acc3ac850 00007ffd4dd78911 xxx.xxx.RedisCache..ctor(Int32, System.String)
    0000005acc3ac8c0 00007ffd4dd78038 xxx.xxx.CacheByRedis.HashGet[[System.__Canon, mscorlib]](System.String, System.String, Int32)
    0000005acc3ac968 00007ffdabef1f7c [StubHelperFrame: 0000005acc3ac968]
    0000005acc3ac9c0 00007ffd4dd77f18 xxx.xxx.Cache.xxx.GetCacheNotAreaDataEntity[[System.__Canon, mscorlib]](System.String, System.String, System.String)
    ...

    0:030> ~33s
    ntdll!NtWaitForMultipleObjects+0xa:
    00007ffd`b81f07ba c3              ret
    0:033> !clrstack
    ...
    0000005accabafb8 00007ffdb81f07ba [HelperMethodFrame_1OBJ: 0000005accabafb8] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
    0000005accabb0d0 00007ffdaac60d64 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
    0000005accabb160 00007ffdaac5b4bb System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
    0000005accabb1d0 00007ffdab5a01d1 System.Threading.Tasks.Task.InternalWait(Int32, System.Threading.CancellationToken)
    0000005accabb2a0 00007ffdab59cfa7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].GetResultCore(Boolean)
    0000005accabb2e0 00007ffd4d8d338f xxx.Config.xxx.Config`1[[System.__Canon, mscorlib]].GetConfig(xxx.Config.Model.ConfigListener, System.Func`2<...>)
    0000005accabb340 00007ffd4d8d2f40 xxx.Config.xxx.Config`1[[System.__Canon, mscorlib]].get_Item(System.String, System.String)
    0000005accabb3c0 00007ffd4dd78f7f xxx.Util.BaseConfig.get_GetRedisConn()
    0000005accabb440 00007ffd4dd78e9c xxx.xxx.RedisConnectionHelp.GetConnectionString()
    0000005accabb4a0 00007ffd4dd789cb xxx.xxx.RedisConnectionHelp..cctor()
    0000005accabb940 00007ffdabef6953 [GCFrame: 0000005accabb940]
    0000005accabc5b0 00007ffdabef6953 [PrestubMethodFrame: 0000005accabc5b0] xxx.xxx.RedisConnectionHelp.get_Instance()
    0000005accabc870 00007ffd4dd78911 xxx.xxx.RedisCache..ctor(Int32, System.String)
    0000005accabc8e0 00007ffd4dd78038 xxx.xxx.CacheByRedis.HashGet[[System.__Canon, mscorlib]](System.String, System.String, Int32)
    0000005accabc988 00007ffdabef1f7c [StubHelperFrame: 0000005accabc988]
    0000005accabc9e0 00007ffd4dd77f18 xxx.Core.Cache.xxx.GetCacheNotAreaDataEntity[[System.__Canon, mscorlib]](System.String, System.String, System.String)
    ...

From these stacks it is easy to see that thread 30 is stuck at RedisConnectionHelp.get_Instance(), while thread 33 has already entered RedisConnectionHelp.get_Instance() and ends up in GetConfig(), blocked waiting for the Result of a task. From experience, thread 30 looks like it is waiting on a lock and thread 33 is waiting on an asynchronous result. The next place to dig is RedisConnectionHelp.Instance.

3. Finding the problem code

Next, use the ILSpy decompiler to locate the code in question.

    public static class RedisConnectionHelp
    {
        public static ConnectionMultiplexer Instance
        {
            get
            {
                if (_instance == null)
                {
                    lock (Locker)
                    {
                        if (_instance == null || !_instance.IsConnected)
                        {
                            _instance = GetManager();
                        }
                    }
                }
                return _instance;
            }
        }
    }

Sure enough, thread 30 is stuck at the lock. Next, dig into the GetManager() path that thread 33 is executing. The simplified code is as follows (the beginning of the listing is missing in this copy):

            ...
            return JsonConvert.DeserializeObject<T>(config);
        }
        catch (Exception ex)
        {
            return default(T);
        }
    }

    private string GetConfig(ConfigListener listener, Func<GetConfigRequest, Task<string>> action)
    {
        var text2 = action(new GetConfigRequest
        {
            DataId = listener.DataId,
            Group = listener.Group,
            Tenant = text
        }).Result;
        return text2;
    }

    internal async Task<string> DoGetConfigAsync(GetConfigRequest request)
    {
        IRestResponse restResponse = await HttpUtil.Request(currentServerAddr, Method.GET, request.ParamValues(), xxx);
        return restResponse.Content;
    }

The code blocks indefinitely on .Result, which immediately brings the synchronization context to mind. His program is ASP.NET MVC on .NET 4.8, where, as I recall, the context is AspNetSynchronizationContext. For the detailed cause of the deadlock, see my earlier article, whose title translates roughly as "One Task.Result and you deadlock: how is this code supposed to be written?". There are roughly four solutions:

1. Use .ConfigureAwait(false).
2. Change the call chain to be fully asynchronous.
3. Wrap the call in another Task.
4. Change the call chain to be fully synchronous.

III: Summary

This incident comes down to a deadlock caused by calling .Result on asynchronous code from synchronous code. Plenty of articles criticize this pattern, and ASP.NET Core has removed the synchronization context, which takes this pitfall away. My advice to the friend was to make the code fully synchronous, after which the deadlock disappeared. Haha, I'm really happy for him!

This article is reprinted from the WeChat public account "一线码农聊技术". To reprint it, please contact that account.
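To make the deadlock mechanism discussed in the analysis more concrete, here is a minimal console sketch. It is not the friend's code and it does not use the real AspNetSynchronizationContext. `SingleThreadContext`, `Demo`, `GetConfigAsync` and `BlockOnResult` are hypothetical names introduced only for illustration, `Task.Delay` stands in for the HTTP request to the config server, and a timeout replaces the bare `.Result` so the demo terminates instead of hanging. The assumption is that a context which only runs continuations on its single owning thread is close enough to show why blocking on a task starves the continuation, and why `ConfigureAwait(false)` avoids it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-threaded context: posted continuations are queued and
// only run when the owning thread pumps the queue. It imitates the
// "one thread at a time" behaviour; it is NOT the real ASP.NET type.
public sealed class SingleThreadContext : SynchronizationContext
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    public override void Post(SendOrPostCallback d, object state)
    {
        _queue.Add(() => d(state));
    }

    // Runs queued work on the calling thread until Complete() is called
    // and the queue has been drained.
    public void Pump()
    {
        foreach (Action work in _queue.GetConsumingEnumerable())
        {
            work();
        }
    }

    public void Complete()
    {
        _queue.CompleteAdding();
    }
}

public static class Demo
{
    // Stand-in for DoGetConfigAsync: Task.Delay replaces the HTTP call.
    public static async Task<string> GetConfigAsync(bool captureContext)
    {
        await Task.Delay(100).ConfigureAwait(captureContext);
        return "redis-connection-string";
    }

    // Blocks on the task from inside the context, like action(...).Result,
    // but with a timeout. Returns true if the task finished in time.
    public static bool BlockOnResult(bool captureContext)
    {
        bool completed = false;
        var ctx = new SingleThreadContext();
        var thread = new Thread(() =>
        {
            SynchronizationContext.SetSynchronizationContext(ctx);
            ctx.Post(_ =>
            {
                Task<string> task = GetConfigAsync(captureContext);
                // The owning thread is blocked here, so a continuation posted
                // back to ctx cannot run until this wait gives up.
                completed = task.Wait(TimeSpan.FromSeconds(2));
                ctx.Complete();
            }, null);
            ctx.Pump();
        });
        thread.Start();
        thread.Join();
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine("ConfigureAwait(true):  completed = " + BlockOnResult(true));
        Console.WriteLine("ConfigureAwait(false): completed = " + BlockOnResult(false));
    }
}
```

On a typical run the first call should time out (completed = False), because the continuation is queued behind the very thread that is blocked, and the second should finish (completed = True), because the continuation runs on a thread-pool thread instead. Note that `ConfigureAwait(false)` only helps if every await along the call chain uses it, which is one reason going fully synchronous or fully asynchronous is often the sturdier fix.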