For a Linux system administrator, one of the most important tasks is making sure the systems he or she manages are in good shape. Many tools are available to help monitor and display the processes on a system, such as top and htop, but few of them cover as much ground as collectl.

collectl: A Linux Performance Monitoring Tool

collectl is an excellent, feature-rich command-line utility for gathering performance data that describes the current state of a system. Unlike most other monitoring tools, collectl does not restrict itself to a limited set of system metrics. It can gather information about many different types of system resources, such as processor, disk, memory, network, sockets, TCP, inodes, InfiniBand, Lustre, NFS, processes, Quadrics, slabs and buddy info.

A very nice aspect of collectl is that it can also play the role of utilities that were each designed for one specific purpose, such as top, ps and iotop.

So what makes collectl such a useful tool? After a lot of research, I put together a list of the most important features of the collectl command-line utility.

Features of collectl

- It can run interactively, as a daemon, or both.
- It can display output in many formats.
- It can monitor almost any subsystem.
- It can play the role of many other utilities, such as ps, top, iotop or vmstat.
- It can record and play back captured data.
- It can export data in various file formats, which is useful if you want to analyze the data with external tools.
- It can run as a service to monitor remote machines or an entire server cluster.
- It can display data in the terminal and write it to a file or a socket.

How to install collectl on Linux

The collectl utility works on all Linux distributions. The only thing it needs in order to run is Perl, so make sure Perl is installed on your machine before installing collectl.

On Debian/Ubuntu/Linux Mint

On Debian-based machines such as Ubuntu, install the collectl utility with the following command.

$ sudo apt-get install collectl

On RHEL/CentOS/Fedora

If you are using a Red Hat based distribution, you can easily get it from the software repository with the yum command.

# yum install collectl

Some practical examples of the collectl utility

Once the collectl tool is installed, you can run it from the terminal, even without any options. The following command displays processor, disk and network statistics in a short, easy-to-read format.

# collectl
waiting for 1 second sample...
#cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
...

As the output shows, the metrics are easy to work with because each sample is displayed on a single line.

When the collectl utility is executed without any options, it displays information about the following subsystems:

- Processor
- Disk
- Network

Note: In our examples, a subsystem is any system resource that can be measured.

You can also display statistics for all subsystems except slabs by adding the --all option, as shown below.

# collectl --all
waiting for 1 second sample...
#cpu sys inter ctxsw Cpu0 Cpu1 Free Buff Cach Inac Slab Map Fragments KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut IP Tcp Udp Icmp Tcp Udp Raw Frag Handle Inodes Reads Writes Meta Comm
...

But how do you monitor processor usage with this utility? The "-s" option controls which subsystem data is collected or played back.

For example, the following command monitors a summary of processor usage.

# collectl -sc
waiting for 1 second sample...
#cpu sys inter ctxsw
...

The best way to learn command-line tools is to use them as much as possible, so run the following command in your terminal and see what happens.

# collectl -scdn
waiting for 1 second sample...
#cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
...

As you can easily tell, the default option is "cdn", which stands for processor, disk and network data. The result of this command is therefore the same as the output of collectl run without any options.

If you want to collect data about memory, use the following command.

# collectl -sm
waiting for 1 second sample...
#Free Buff Cach Inac Slab Map
1G 177M 1G 684M 193M 1G
1G 177M 1G 684M 193M 1G
1G 177M 1G 684M 193M 1G
1G 177M 1G 684M 193M 1G
1G 177M 1G 684M 193M 1G
1G 177M 1G 684M 193M 1G
1G 177M 1G 684M 193M 1G
1G 177M 1G 684M 193M 1G

This output is very useful when you want detailed information about memory usage, free memory and other things that matter for the performance of your system.

Want some data about TCP? Use the following command.

# collectl -st
waiting for 1 second sample...
#IP Tcp Udp Icmp
0 0 0 0
0 0 0 0
0 0 0 0
...

Once you have some experience, it is easy to combine options to get the result you want. For example, you can combine "t" for TCP with "c" for processor. The following command does this.

# collectl -stc
waiting for 1 second sample...
#cpu sys inter ctxsw IP Tcp Udp Icmp
...

It is hard for us humans to remember all the available options, so here is a summary list of the subsystems the tool supports.

- b - buddy info (memory fragmentation)
- c - processor
- d - disk
- f - NFS V3 data
- i - inode and filesystem
- j - interrupts
- l - Lustre
- m - memory
- n - network
- s - sockets
- t - TCP
- x - interconnect
- y - slabs (system object cache)

Disk usage data is very important to system administrators and Linux users alike. The following command helps you monitor disk usage.

# collectl -sd
waiting for 1 second sample...
#KBRead Reads KBWrit Writes
...

You can also use the "-sD" option to collect data on individual disks, but be aware that totals across all disks will not be reported.

# collectl -sD
waiting for 1 second sample...
# DISK STATISTICS (/sec)
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Pct Util
sda ...

You can use the other detail subsystems to collect detailed data as well. Each detail subsystem is listed below.

- C - processor
- D - disk
- E - environmental data (fan, power and temperature) via ipmitool
- F - NFS data
- J - interrupts
- L - Lustre OST details, or client filesystem details
- N - network
- T - 65 TCP counters, available only in plot format
- X - interconnect
- Y - slabs (system object cache)
- Z - processes

The collectl utility has many options, and a single article has neither the time nor the space to cover them all. Still, it is worth knowing how to use the utility as top and as ps.

Using collectl as a top utility is easy. Run the following command in your terminal and you will see output similar to what the top utility provides on a Linux system.

# collectl --top
# TOP PROCESSES sorted by time (counters are /sec) 13:11:02
# PID User PR PPID THRD S VSZ RSS CP SysT UsrT Pct AccuTime RKB WKB MajF MinF Command
...

Last but not least, to use collectl as a ps tool, run the following command. You get information about the processes on your system the same way you do when you run the "ps" command in your terminal.

# collectl -c1 -sZ -i:1
waiting for 1 second sample...
### RECORD 1 >>> tecmint-vgn-z13gn <<< (1397979716.001) (Sun Apr 20 13:11:56 2014) ###
# PROCESS SUMMARY (counters are /sec)
# PID User PR PPID THRD S VSZ RSS CP SysT UsrT Pct AccuTime RKB WKB MajF MinF Command
1 root 20 0 0 S 3M 2M 0 0.00 0.00 0 00:02.57 0 0 0 0 /sbin/init
2 root 20 0 0 S 0 0 1 0.00 0.00 0 00:00.00 0 0 0 0 kthreadd
3 root 20 2 0 S 0 0 0 0.00 0.00 0 00:00.60 0 0 0 0 ksoftirqd/0
...

I am sure many Linux system administrators will love this tool. If you want to dig deeper into collectl, read its man page and keep practicing for a while. Type the following command in your terminal and start reading.

# man collectl

Reference links

collectl home page: http://collectl.sourceforge.net/index.html
English original: http://www.tecmint.com/linux-performance-monitoring-with-collectl-tool/
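
Since the single-letter subsystem codes for "-s" are hard to memorize, a small shell helper can build the option string from readable names. The following is only a sketch: the function names (subsys_letter, subsys_flags, collectl_cmd) and the long names such as "cpu" or "buddy" are hypothetical and not part of collectl. Only the letters themselves come from the summary subsystem list in this article. The script prints the command line and does not run collectl.

```shell
#!/bin/sh
# Hypothetical helper: translate a readable subsystem name into the
# single-letter code used with collectl's -s option. The long names
# are this sketch's own choice, not anything collectl defines.
subsys_letter() {
    case "$1" in
        buddy)        printf 'b' ;;
        cpu)          printf 'c' ;;
        disk)         printf 'd' ;;
        nfs)          printf 'f' ;;
        inode)        printf 'i' ;;
        interrupt)    printf 'j' ;;
        lustre)       printf 'l' ;;
        memory)       printf 'm' ;;
        network)      printf 'n' ;;
        socket)       printf 's' ;;
        tcp)          printf 't' ;;
        interconnect) printf 'x' ;;
        slab)         printf 'y' ;;
        *)            echo "unknown subsystem: $1" >&2; return 1 ;;
    esac
}

# Concatenate the letters for every name given, in the order given.
subsys_flags() {
    flags=''
    for name in "$@"; do
        letter=$(subsys_letter "$name") || return 1
        flags="${flags}${letter}"
    done
    printf '%s\n' "$flags"
}

# Print the collectl command line for the chosen subsystems.
# It is only printed, never executed.
collectl_cmd() {
    flags=$(subsys_flags "$@") || return 1
    printf 'collectl -s%s\n' "$flags"
}

collectl_cmd cpu disk network   # prints: collectl -scdn
```

Printing the command instead of running it keeps the sketch usable on machines where collectl is not installed. Once the output looks right, you can copy the printed line into a terminal.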
