Tuning a Production Freeze: Parallel Rollback of Large Transactions

Posted: 2023-03-19 22:13:08 | Tech Observation

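Overview: We recently saw the following behavior in production. An ordinary order-scheduling run normally returns results in about 2 s. When several users run scheduling at the same time, however, it hangs: for 15 minutes or more not a single result comes back, and sometimes the run fails and leaves the data inaccurate. This post records how I troubleshot the freeze.

1. Generate the ASH report

    SQL> @?/rdbms/admin/ashrpt.sql
    -- To specify absolute begin time:
    --   [MM/DD[/YY]] HH24:MI[:SS]
    -- 08/09/19 08:40:00

2. ASH analysis

The report sections examined were:
(1) Top User Events
(2) Related SQL: Top SQL with Top Events, plus the SQL details
(3) Stored procedures
(4) Top Sessions

The analysis shows two prominent wait events: "wait for stopper event to be increased" and "wait for a undo record". They were most likely caused by the many large transactions generated during batch scheduling: some of those transactions were rolled back, and the rollbacks consumed a large amount of resources.

3. Handling the parallel rollback of large transactions

The "wait for stopper event to be increased" wait event usually appears together with "wait for a undo record". My Oracle Support (Metalink) has a note on this, Doc ID 464246.1:

Sometimes Parallel Rollback of Large Transaction may become very slow. After killing a large running transaction (either by killing the shadow process or aborting the database) then database seems to hang, or smon and parallel query servers taking all the available cpu. In fast-start parallel rollback, the background process Smon acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes. Fast start parallel rollback is mainly useful when a system has transactions that run a long time before committing, especially parallel Inserts, Updates, Deletes operations. When Smon discovers that the amount of recovery work is above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes. There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the pq slaves are interfering with each other. It looks like the changes made by this transaction cannot be recovered in parallel without causing a performance problem. The parallel rollback slaves are likely contending for the same resources, which results in worse rollback performance than a serial rollback.

Solution: turn off parallel rollback and fall back to serial rollback (the blunt fix would be a restart).

Checking the v$px_session view showed that every parallel process had been started by SMON (the QCSID column held SMON's session ID), and SMON's wait event was "wait for stopper event to be increased"; in other words, SMON was rolling back a large transaction. The parameter fast_start_parallel_rollback defaults to LOW, which lets rollback use up to 2 * CPU_COUNT parallel processes. Because those processes run in parallel, they can contend for shared resources, which slows the rollback down.

Because this was production and could not simply be restarted, I changed the parameter online as follows:

(1) Find SMON's process ID:
    select pid, spid, pname, username, tracefile from v$process where pname='SMON';
(2) Disable SMON transaction cleanup:
    oradebug setorapid <SMON's Oracle PID>
    oradebug event 10513 trace name context forever, level 2
(3) Query the V$FAST_START_SERVERS view and kill all the parallel processes started by SMON.
(4) Change the fast_start_parallel_rollback parameter:
    alter system set fast_start_parallel_rollback=false;
(5) Re-enable SMON transaction cleanup (transaction recovery):
    oradebug setorapid <SMON's Oracle PID>
    oradebug event 10513 trace name context off
(6) Get the trace file name:
    oradebug tracefile_name
(7) Verify.

4. Business verification

After the change we moved on to business verification. There was still some lag at peak times, but it happened far less often, and no more errors occurred. A new ASH report also showed that the parallel-rollback wait events were gone.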
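To make the LOW default concrete, the documented limits of fast_start_parallel_rollback can be written as a small Python function. This is a sketch under stated assumptions: the function name is hypothetical, cpu_count is passed in rather than read from a live instance, FALSE is reported as a degree of 1 (SMON rolling back alone), and the result is only an upper bound, since SMON decides how many slaves to start based on the amount of recovery work.

```python
# Sketch: upper bound on rollback parallelism for each documented value of
# FAST_START_PARALLEL_ROLLBACK. Hypothetical helper; cpu_count is supplied by
# the caller (e.g. copied from the CPU_COUNT parameter), nothing is queried.


def max_rollback_degree(setting: str, cpu_count: int) -> int:
    """Return the maximum rollback degree of parallelism.

    FALSE -> parallel rollback disabled, SMON rolls back serially (returned as 1)
    LOW   -> at most 2 * CPU_COUNT (the default)
    HIGH  -> at most 4 * CPU_COUNT
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be a positive integer")
    multipliers = {"FALSE": None, "LOW": 2, "HIGH": 4}
    key = setting.strip().upper()
    if key not in multipliers:
        raise ValueError(f"unknown FAST_START_PARALLEL_ROLLBACK value: {setting!r}")
    factor = multipliers[key]
    # None marks serial rollback: only SMON itself does the work.
    return 1 if factor is None else factor * cpu_count


if __name__ == "__main__":
    # Example with a hypothetical CPU_COUNT of 16.
    for value in ("LOW", "HIGH", "FALSE"):
        print(value, max_rollback_degree(value, 16))
    # prints:
    # LOW 32
    # HIGH 64
    # FALSE 1
```

With CPU_COUNT = 16, the default LOW already permits up to 32 rollback slaves, which fits the symptom described in the note of SMON and its parallel servers taking all the available CPU.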
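The order of the online procedure matters: SMON's transaction cleanup is switched off before the slaves are killed and the parameter is changed, and switched back on afterwards so that recovery resumes serially. A hypothetical Python helper that assembles the commands from the steps above into a SQL*Plus script for a given SMON Oracle PID makes that ordering explicit. It is a sketch only: it does not connect to a database, the helper name is made up, and killing the slaves listed in V$FAST_START_SERVERS is left as a manual step because it depends on how those processes are identified and killed on the host.

```python
# Sketch: build the no-restart command sequence as an ordered SQL*Plus script.
# Hypothetical helper; the caller looks up SMON's Oracle PID beforehand with:
#   select pid, spid, pname from v$process where pname='SMON';


def online_fix_script(smon_oracle_pid: int) -> list[str]:
    if smon_oracle_pid < 1:
        raise ValueError("Oracle PID must be a positive integer")
    return [
        # (2) stop SMON from performing transaction cleanup
        f"oradebug setorapid {smon_oracle_pid}",
        "oradebug event 10513 trace name context forever, level 2",
        # (3) list the recovery slaves; killing them is done manually
        "select * from v$fast_start_servers;",
        # (4) switch transaction recovery to serial
        "alter system set fast_start_parallel_rollback=false;",
        # (5) let SMON resume transaction recovery, now serially
        f"oradebug setorapid {smon_oracle_pid}",
        "oradebug event 10513 trace name context off",
        # (6) show the trace file of this oradebug session
        "oradebug tracefile_name",
    ]


if __name__ == "__main__":
    # 13 is a placeholder PID, not a real value from the article.
    print("\n".join(online_fix_script(13)))
```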
