Background

On a CentOS 7 host, monitoring raised an alert: disk utilization reached 99% during a certain period. The monitoring data is aggregated and keeps no snapshots (which would show the I/O and CPU consumption of individual processes), so we have to log in to the server and run the statistics commands periodically to capture snapshots ourselves.

The commands needed:

    iostat -dx -k    check avgqu-sz, await, svctm, %util
    sar -u           check %iowait, %user
    pidstat -d       per-process I/O read/write snapshots

Steps to generate the statistics files

The start of the heredoc, including the pidstat command line, was lost from the original listing. The "pidstat -d 2" below is an assumption chosen to match the sar interval.

    cat > /tmp/at_task.sh << EOF
    pidstat -d 2 > /tmp/pidstat_\`date +%F_%T\`.log 2>&1 &
    sar -u 2 > /tmp/sar_\`date +%F_%T\`.log 2>&1 &
    while [ 1 ]; do echo -n \`date +%T\` >> /tmp/iostat_\`date +%F\` 2>&1 && iostat -dx -k 1 1 >> /tmp/iostat_\`date +%F\` 2>&1; sleep 2; done &
    EOF

iostat runs inside a while loop so that each sample is prefixed with the date +%T time. Without it the file holds only data and no timestamps, which makes it useless.

Run it with the at command

    at 15:14 today -f /tmp/at_task.sh

This fails with:

    Can't open /var/run/atd.pid to signal atd. No atd running?

Restart the atd service:

    service atd restart

Submit the at job again:

    at 15:14 today -f /tmp/at_task.sh
    job 2 at Wed Mar 13 15:14:00 2019

The resulting snapshots

iostat (the scd0 row is unrecoverable in the source and is elided):

    15:13:35Linux 3.10.0-862.14.4.el7.x86_64 (ip-xxxxx)  03/13/2019  _x86_64_  (4 CPU)

    Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
    vda        0.12    0.07  17.31  19.41  580.79  90.52     36.57      0.09   2.39     4.42     0.57   0.72   2.63
    scd0       ...

sar:

    03:14:00 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    03:14:02 PM  all   0.25   0.00     0.38     0.00    0.00  99.37
    03:14:04 PM  all   1.25   0.13     0.63     0.00    0.00  97.99
    03:14:06 PM  all   0.25   0.13     0.50     0.00    0.00  99.12

pidstat (the remaining rows are garbled in the source):

                  UID   PID  kB_rd/s  kB_wr/s  kB_ccwr/s  Command
    03:14:02 PM  5700  9089     0.00     6.00       0.00  uxxx
    03:14:02 PM  5700  9140     0.00     6.00       0.00  uxxx
    ...

Kill the collection commands

    ps -ef | egrep 'iostat|sar|pidstat|while' | grep -v grep | awk '{print $2}' | xargs -l kill

However, ps -ef | egrep cannot find the PID of the while loop, presumably because the loop runs inside a shell process whose command line does not contain the string "while". The loop is therefore not killed, and data keeps being written to /tmp/iostat_2019-03-13 -_-

Can lsof locate the process that has the file open?

    lsof /tmp/iostat_2019-03-13
    [root@ip-10-186-60-117 ~]#
    [root@ip-10-186-60-117 ~]#

It returns nothing. By contrast, lsof can locate the process that has mysql-error.log open:

    lsof /opt/mysql/data/5690/mysql-error.log
    COMMAND    PID                 USER  FD  TYPE  DEVICE  SIZE/OFF      NODE  NAME
    mysqld   12858  actiontech-universe  1w   REG   253,1      6345  20083533  /opt/mysql/data/5690/mysql-error.log
    mysqld   12858  actiontech-universe  2w   REG   253,1      6345  20083533  /opt/mysql/data/5690/mysql-error.log

This shows that lsof can report which processes are using a file only while a process keeps the file's inode open. Each iostat run in the loop opens the file, appends, and exits, so there is usually nothing for lsof to see.

Get the PID of the process writing the file

Install systemtap:

    yum -y install systemtap

SystemTap is a monitoring and tracing tool for the Linux kernel. Use its inodewatch.stp example script to find the process that writes the file.

Get the inode:

    stat -c '%i' /tmp/iostat_2019-03-13
    4210339

Get the major and minor numbers of the device the file lives on:

    ls -al /dev/vda1
    brw-rw---- 1 root disk 253, 1 Jan 30 13:57 /dev/vda1

Get the PID that writes the file:

    stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
    Checking "/lib/modules/3.10.0-862.14.4.el7.x86_64/build/.config" failed with error: No such file or directory
    Incorrect version or missing kernel-devel package, use: yum install kernel-devel-3.10.0-862.14.4.el7.x86_64

Download the kernel-devel package that matches the running kernel version (RPM build for: Scientific Linux 7) and install it:

    wget ftp://ftp.pbone.net/mirror/ftp.scientificlinux.org/linux/scientific/7.2/x86_64/updates/security/kernel-devel-3.10.0-862.14.4.el7.x86_64.rpm
    rpm -ivh kernel-devel-3.10.0-862.14.4.el7.x86_64.rpm

Run stap again:

    stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
    ......
    Missing separate debuginfos, use: debuginfo-install kernel-3.10.0-862.14.4.el7.x86_64
    Pass 2: analysis failed.  [man error::pass2]
    Number of similar error messages suppressed: 2.

Install the kernel debuginfo:

    debuginfo-install kernel-3.10.0-862.14.4.el7.x86_64
      Verifying  : kernel-debuginfo-common-x86_64-3.10.0-862.14.4.el7.x86_64    1/3
      Verifying  : yum-plugin-auto-update-debug-info-1.1.31-50.el7.noarch       2/3
      Verifying  : kernel-debuginfo-3.10.0-862.14.4.el7.x86_64                  3/3

    Installed:
      kernel-debuginfo.x86_64 0:3.10.0-862.14.4.el7
      yum-plugin-auto-update-debug-info.noarch 0:1.1.31-50.el7

    Dependency Installed:
      kernel-debuginfo-common-x86_64.x86_64 0:3.10.0-862.14.4.el7

Run stap again:

    stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
    ERROR: module version mismatch (#1 SMP Tue Sep 25 14:32:52 CDT 2018 vs #1 SMP Wed Sep 26 15:12:11 UTC 2018), release 3.10.0-862.14.4.el7.x86_64
    WARNING: /usr/bin/staprun exited with status: 1

Add -v to see the detailed error report:

    stap -v /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
    Pass 1: parsed user script and 471 library scripts using 240276virt/41896res/3368shr/38600data kb, in 300usr/20sys/320real ms.
    Pass 2: analyzed script: 2 probes, 12 functions, 8 embeds, 0 globals using 399436virt/196284res/4744shr/197760data kb, in 1540usr/560sys/2106real ms.
    Pass 3: using cached /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030.c
    Pass 4: using cached /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030.ko
    Pass 5: starting run.
    ERROR: module version mismatch (#1 SMP Tue Sep 25 14:32:52 CDT 2018 vs #1 SMP Wed Sep 26 15:12:11 UTC 2018), release 3.10.0-862.14.4.el7.x86_64
    WARNING: /usr/bin/staprun exited with status: 1
    Pass 5: run completed in 0usr/10sys/38real ms.
    Pass 5: run failed.  [man error::pass5]

The build string in the installed kernel-devel package (built for Scientific Linux) differs from that of the running kernel. Edit the generated header so the two match:

    vim /usr/src/kernels/3.10.0-862.14.4.el7.x86_64/include/generated/compile.h

Change

    #define UTS_VERSION "#1 SMP Tue Sep 25 14:32:52 CDT 2018"

to

    #define UTS_VERSION "#1 SMP Wed Sep 26 15:12:11 UTC 2018"

Then remove the cached module so that it gets rebuilt:

    rm -rf /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030*

Run it again:

    stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
    iostat(4671) vfs_write 0xfd00001/4210339
    iostat(4671) vfs_write 0xfd00001/4210339
    iostat(4671) vfs_write 0xfd00001/4210339
    ...
    iostat(4677) vfs_write 0xfd00001/4210339
    iostat(4677) vfs_write 0xfd00001/4210339
    iostat(4677) vfs_write 0xfd00001/4210339
    ...
    iostat(4683) vfs_write 0xfd00001/4210339
    ............

We now have the PID of the process writing the /tmp/iostat_`date +%F` file, but new PIDs keep being printed. The background iostat -dx -m sits inside the while loop, so after every sleep 2 the loop starts a fresh iostat with a new PID.

How do we make iostat -dx -m stop writing to the /tmp/iostat_`date +%F` file, short of the reboot trick $_$ ?

rm -rf does not stop the background loop. Once the file is deleted, the while loop simply creates a new one:

    rm -rf /tmp/iostat_2019-03-1*
    stat /tmp/iostat_2019-03-1*
      File: '/tmp/iostat_2019-03-13'
      Size: 146700     Blocks: 512        IO Block: 4096   regular file
    Device: fd01h/64769d    Inode: 4210339     Links: 1
    Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2019-03-14 16:07:26.211888899 +0800
    Modify: 2019-03-14 16:18:17.854019793 +0800
    Change: 2019-03-14 16:18:17.854019793 +0800

The correct approach

Put the loop in a script of its own:

    cat > /tmp/iostat.sh << EOF
    while [ 1 ]; do echo -n \`date +%T\` >> /tmp/iostat_\`date +%F\` 2>&1 && iostat -dx -m 1 1 >> /tmp/iostat_\`date +%F\` 2>&1; sleep 2; done &
    EOF

    at now + 1 minute today
    bash /tmp/iostat.sh

Started like this, the loop shows up under the script's name, so its PID is easy to find:

    ps -ef | grep iostat
    root      8593     1  0 16:16 pts/2    00:00:00 bash /tmp/iostat.sh
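
Another way to avoid hunting for the loop's PID afterwards is to have the loop record its own PID when it starts. The following is only a minimal sketch of that idea, not the method used above: the function names start_sampler and stop_sampler, the pidfile argument, and the argument order are all hypothetical. It also writes a space after the timestamp, which the original echo -n does not.

```shell
#!/bin/bash
# Sketch only: start_sampler and stop_sampler are hypothetical helper names.

# start_sampler PIDFILE OUT_PREFIX INTERVAL COMMAND [ARGS...]
# Runs COMMAND every INTERVAL seconds in a background subshell, prefixing each
# sample with the current time, and appends to OUT_PREFIX_<YYYY-MM-DD>.
# The PID of the background loop is written to PIDFILE.
start_sampler() {
    local pidfile=$1 prefix=$2 interval=$3
    shift 3
    (
        while true; do
            out="${prefix}_$(date +%F)"          # re-evaluated so the file rolls over daily
            printf '%s ' "$(date +%T)" >> "$out" # timestamp in front of the sample
            "$@" >> "$out" 2>&1
            sleep "$interval"
        done
    ) &
    echo $! > "$pidfile"                         # $! is the PID of the loop subshell
}

# stop_sampler PIDFILE
# Kills the loop recorded in PIDFILE and removes the pidfile.
stop_sampler() {
    local pidfile=$1
    kill "$(cat "$pidfile")" && rm -f "$pidfile"
}
```

Under these assumptions, the script handed to at would call something like start_sampler /tmp/iostat.pid /tmp/iostat 2 iostat -dx -m 1 1, and the loop could later be stopped from any shell with kill "$(cat /tmp/iostat.pid)". An iostat or sleep that is already running when the loop is killed finishes on its own a moment later.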