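Introduction to deadlocks

A deadlock occurs when two or more processes (or threads) block one another while competing for resources. Each one waits for something another one holds, and without outside intervention none of them can make progress. The system is then said to be deadlocked, and the processes (threads) stuck waiting on one another are called deadlocked processes (threads). Resources are held exclusively. So once the processes involved are waiting on each other, none of them will ever be given the resources it needs, and none can continue running unless something outside intervenes.

The most common form is the cross-held lock deadlock. Two or more threads in a program block completely, and each one waits for a resource that another has locked. For example, suppose thread 1 locks record A and waits for record B, while thread 2 locks record B and waits for record A. The two threads are then deadlocked. Deadlocks can come from an unsuitable resource allocation policy in the system. More often they come from bugs in application code that make processes compete for resources in an unsafe order.

The four necessary conditions for deadlock

(1) Mutual exclusion: a resource can be used by only one process (thread) at a time.
(2) Hold and wait: a process (thread) that blocks while requesting a resource does not release the resources it already holds.
(3) No preemption: a resource that a process (thread) has acquired cannot be forcibly taken away before the process has finished with it.
(4) Circular wait: several processes (threads) form a closed chain, and each waits for a resource held by the next.

Figure 1. Deadlock caused by cross-held locks

Note: after func2 and func4 in Figure 1 have run, child thread 1 holds lock A and is trying to acquire lock B. Child thread 2 already holds lock B and is trying to acquire lock A. Each thread holds the lock the other needs and will never release it, so neither can proceed and the two threads are deadlocked.

Analyzing a deadlocked program with pstack and gdb

pstack on Linux

pstack is a very useful tool on Linux (for example Red Hat Linux and Ubuntu Linux). It prints the stack of a running process, including the call stack of every thread.

gdb on Linux

GDB is a powerful UNIX debugger released by the GNU project. Linux systems include the GNU debugger gdb for debugging C and C++ programs. It lets developers observe a program's internal state and memory use while the program runs. Its main capabilities are:

1. Run a program, and set the arguments and environment that affect how it runs;
2. Make the program stop at specified points;
3. Inspect the program's state when it stops;
4. Examine the core file after the program crashes;
5. Change the program to correct an error, then run it again;
6. Watch the values of variables dynamically;
7. Step through the code and observe how the program runs.

gdb debugs an executable or a running process, not the source files. Not every executable can be debugged usefully, though. To produce an executable suitable for debugging, pass the -g option to g++ (gcc) so that the compiler includes debugging information. That information records each variable's type, its address mapping in the executable, and the source line numbers. gdb uses it to relate source code to machine code.

gdb has many basic commands, and they are not covered in detail here. See the gdb manual for more information.

Listing 1. Test program

#include <unistd.h>
#include <pthread.h>
#include <string.h>

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex3 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex4 = PTHREAD_MUTEX_INITIALIZER;

static int sequence1 = 0;
static int sequence2 = 0;

int func1()
{
    pthread_mutex_lock(&mutex1);
    ++sequence1;
    sleep(1);
    pthread_mutex_lock(&mutex2);
    ++sequence2;
    pthread_mutex_unlock(&mutex2);
    pthread_mutex_unlock(&mutex1);

    return sequence1;
}

int func2()
{
    pthread_mutex_lock(&mutex2);
    ++sequence2;
    sleep(1);
    pthread_mutex_lock(&mutex1);
    ++sequence1;
    pthread_mutex_unlock(&mutex1);
    pthread_mutex_unlock(&mutex2);

    return sequence2;
}

void* thread1(void* arg)
{
    while (1)
    {
        int iRetValue = func1();

        if (iRetValue == 100000)
        {
            pthread_exit(NULL);
        }
    }
}

void* thread2(void* arg)
{
    while (1)
    {
        int iRetValue = func2();

        if (iRetValue == 100000)
        {
            pthread_exit(NULL);
        }
    }
}

void* thread3(void* arg)
{
    while (1)
    {
        sleep(1);
        char szBuf[128];
        memset(szBuf, 0, sizeof(szBuf));
        strcpy(szBuf, "thread3");
    }
}

void* thread4(void* arg)
{
    while (1)
    {
        sleep(1);
        char szBuf[128];
        memset(szBuf, 0, sizeof(szBuf));
        strcpy(szBuf, "thread4");
    }
}

int main()
{
    pthread_t tid[4];

    if (pthread_create(&tid[0], NULL, &thread1, NULL) != 0)
    {
        _exit(1);
    }
    if (pthread_create(&tid[1], NULL, &thread2, NULL) != 0)
    {
        _exit(1);
    }
    if (pthread_create(&tid[2], NULL, &thread3, NULL) != 0)
    {
        _exit(1);
    }
    if (pthread_create(&tid[3], NULL, &thread4, NULL) != 0)
    {
        _exit(1);
    }

    sleep(5);
    //pthread_cancel(tid[0]);

    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    pthread_join(tid[2], NULL);
    pthread_join(tid[3], NULL);

    pthread_mutex_destroy(&mutex1);
    pthread_mutex_destroy(&mutex2);
    pthread_mutex_destroy(&mutex3);
    pthread_mutex_destroy(&mutex4);

    return 0;
}

Listing 2. Compiling the test program

[dyu@xilinuxbldsrv purify]$ g++ -g lock.cpp -o lock -lpthread

Listing 3. Finding the process ID of the test program

[dyu@xilinuxbldsrv purify]$ ps -ef | grep lock
dyu       6721  5751  0 15:21 pts/3    00:00:00 ./lock

Listing 4. Output of pstack <pid>

[dyu@xilinuxbldsrv purify]$ pstack 6721
Thread 5 (Thread 0x41e37940 (LWP 6722)):
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a9b in func1() ()
#4  0x0000000000400ad7 in thread1(void*) ()
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x42838940 (LWP 6723)):
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a17 in func2() ()
#4  0x0000000000400a53 in thread2(void*) ()
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x43239940 (LWP 6724)):
#0  0x0000003d19c9a541 in nanosleep () from /lib64/libc.so.6
#1  0x0000003d19c9a364 in sleep () from /lib64/libc.so.6
#2  0x00000000004009bc in thread3(void*) ()
#3  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#4  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x43c3a940 (LWP 6725)):
#0  0x0000003d19c9a541 in nanosleep () from /lib64/libc.so.6
#1  0x0000003d19c9a364 in sleep () from /lib64/libc.so.6
#2  0x0000000000400976 in thread4(void*) ()
#3  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#4  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b984ecabd90 (LWP 6721)):
#0  0x0000003d1a807b35 in pthread_join () from /lib64/libpthread.so.0

Listing 5. Output of running pstack <pid> again

[dyu@xilinuxbldsrv purify]$ pstack 6721
Thread 5 (Thread 0x40bd6940 (LWP 6722)):
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a87 in func1() ()
#4  0x0000000000400ac3 in thread1(void*) ()
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x415d7940 (LWP 6723)):
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a03 in func2() ()
#4  0x0000000000400a3f in thread2(void*) ()
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x41fd8940 (LWP 6724)):
#0  0x0000003d19c7aec2 in memset () from /lib64/libc.so.6
#1  0x00000000004009be in thread3(void*) ()
#2  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#3  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x429d9940 (LWP 6725)):
#0  0x0000003d19c7ae0d in memset () from /lib64/libc.so.6
#1  0x... in thread4(void*) ()
#2  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#3  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2af906fd9d90 (LWP 6721)):
#0  0x0000003d1a807b35 in pthread_join () from /lib64/libpthread.so.0

Call stack analysis: when the process hangs, run pstack on it several times and compare the call stacks. A deadlocked thread stays in the lock-waiting state the whole time. So look for the threads (typically two) whose stacks never change between outputs and that remain blocked waiting for a lock.

Output analysis: compare the two outputs above. Thread 3 and Thread 2, which run thread3() and thread4(), were in sleep() in the first pstack output and in memset() in the second. Thread 4 and Thread 5, however, stayed in the lock-waiting state (pthread_mutex_lock), and their stacks did not change between the two consecutive outputs. We can therefore infer that Thread 4 and Thread 5 are deadlocked.

gdb info thread output

Listing 6. Attaching gdb to the deadlocked process

(gdb) info thread
  5 Thread 0x41e37940 (LWP 6722)  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
  4 Thread 0x42838940 (LWP 6723)  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
  3 Thread 0x43239940 (LWP 6724)  0x0000003d19c9a541 in nanosleep () from /lib64/libc.so.6
  2 Thread 0x43c3a940 (LWP 6725)  0x0000003d19c9a541 in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x2b984ecabd90 (LWP 6721)  0x0000003d1a807b35 in pthread_join () from /lib64/libpthread.so.0

Listing 7. Switching to thread 5

(gdb) thread 5
[Switching to thread 5 (Thread 0x41e37940 (LWP 6722))]#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a9b in func1 () at lock.cpp:18
#4  0x0000000000400ad7 in thread1 (arg=0x0) at lock.cpp:43
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6

Listing 8. Output for threads 4 and 5

(gdb) f 3
#3  0x0000000000400a9b in func1 () at lock.cpp:18
18          pthread_mutex_lock(&mutex2);
(gdb) thread 4
[Switching to thread 4 (Thread 0x42838940 (LWP 6723))]#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 3
#3  0x0000000000400a17 in func2 () at lock.cpp:31
31          pthread_mutex_lock(&mutex1);
(gdb) p mutex1
$1 = {__data = {__lock = 2, __count = 0, __owner = 6722, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000B\032\000\000\001", '\000', __align = 2}
(gdb) p mutex3
$2 = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000', __align = 0}
(gdb) p mutex2
$3 = {__data = {__lock = 2, __count = 0, __owner = 6723, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000C\032\000\000\001", '\000', __align = 2}
(gdb)

The output above shows that thread 4 is trying to acquire mutex1 (lock.cpp:31). mutex1 is already held by the thread whose LWP is 6722 (__owner = 6722). Thread 5 is trying to acquire mutex2 (lock.cpp:18), which is held by LWP 6723 (__owner = 6723). For comparison, mutex3, which no thread uses, shows __lock = 0 and __owner = 0. The pstack output shows that LWP 6722 is thread 5 and LWP 6723 is thread 4. Each thread therefore holds the mutex the other is waiting for, and we can conclude that threads 4 and 5 are deadlocked on cross-held locks. The source code confirms this. Both threads use mutex1 and mutex2 but request them in opposite orders: thread 5 runs func1(), which locks mutex1 and then mutex2, while thread 4 runs func2(), which locks mutex2 and then mutex1.

Summary

This article has briefly introduced one method for analyzing deadlocks on Linux, and it should help with diagnosing some deadlock problems. Once you understand why deadlocks occur, and in particular the four necessary conditions, you can avoid, prevent, and resolve them as far as possible. In system design and process scheduling, take care that these four conditions cannot all hold at once. Choose a sound resource allocation algorithm, and prevent processes from holding system resources indefinitely. A process should also not hold resources while it is waiting. While the system runs, it can check each resource request dynamically and use the result to decide whether to allocate the resource: if granting the request could lead to deadlock, it is refused; otherwise it is granted. Resource allocation should therefore be planned carefully. Ordered resource allocation and the Banker's algorithm are both effective ways to avoid deadlock.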