ProblemDescriptionThememoryusageoftheLinuxserverexceedsthethresholdandanalarmistriggered.问题排查首先,通过free命令观察系统的内存使用情况,显示如下:totalusedfreesharedbufferscachedMem:24675796245871448865203570121612488-/+buffers/cache:226176442058152Swap:20964721082241988248其中,可以看出内存总量为24675796KB,已使用22617644KB,只剩余2058152KB。Then,afterusingthetopcommand,shift+Mtosortbymemory,observetheprocessesthatusethemostmemoryinthesystem,andfindthatonly18GBofmemoryisoccupied,andotherprocessesaresmallandcanbeignored.Therefore,whereisnearly4GBofmemory(22617644KB-18GB,about4GB)used?Further,throughcat/proc/meminfo,itisfoundthatthereisnearly4GB(3688732KB)ofSlabmemory:......Mapped:25212kBSlab:3688732kBPageTables:43524kB......Slabisusedtostorethekerneldatastructurecache,再通过slabtop命令查看这部分内存的使用情况:OBJSACTIVEUSEOBJSIZESLABSOBJ/SLABCACHESIZENAME1392634813926348100%0.21K773686183494744Kdentry_cache33404026205678%0.09K83514033404Kbuffer_head15104015053799%0.74K302085120832Kext3_inode_cache发现其中大部分(大约3.5GB)都是用于了dentry_cache。问题解决1、修改/proc/sys/vm/drop_caches,释放Slab占用的cache内存空间(参考drop_caches官方文档):写入thiswillcausethekerneltodropcleancaches,dentriesandinodesfrommemory,causingthatmemorytobecomefree.Tofreepagecache:*echo1>/proc/sys/vm/drop_cachesTofreedes*echo2>/proc/sys/vm/drop_cachesTofrepagecache,dentriesandinodes:*echo3>/proc/sys/vm/drop_caches由于这是非破坏性操作,并且脏对象不可释放,因此用户应首先运行“同步”以确保释放所有缓存对象。此方法无法实现。如果用户1.2,方法1.2需要root权限。不是root,但有sudo权限,可以通过sysctl命令设置:$sync$sudosysctl-wvm.drop_caches=3$sudosysctl-wvm.drop_caches=0#recoverydrop_caches运行后,可以用sudosysctl-a|grepdrop_caches查看是否生效。3.修改/proc/sys/vm/vfs_cache_pressure,调整清理inode/dentrycaches的优先级(默认为100),LinuxInsight中有相关的解释:Atthedefaultvalueofvfs_cache_pressure=100thekernelwillattempttoreclaimdentriesandinodesata“fair”ratewithrespecttopagecacheandswapcachereclaim.Decreasingvfs_cache_pressurecausesthekerneltoprefertoretaindentryandinodecaches.Increasingvfs_cache_pressurebeyond100causesthekerneltoprefertoreclaimdentriesandinodes.具体的设置方法,youcanrefertomethod1ormethod2.Referenceshttps://www.kernel.org/doc/Documentation/sysctl/vm.txthttp://major.io/2008/12/03/reducing-inode-and-dentry-caches-to-keep-oom-killer-at-bay/http://linux-mm.org/Drop_CachesThefollowingrecordsaretheprogressoffurtherinvestigation.Deeperreasons.Theaboveinvestigationfoundthattherearealotofdentry_cacheintheLinuxsystemoccupyingmemory.Whyaretheresomanydentry_cache?1.First,clarifytheconceptandfunctionofdentry_cache:directoryentrycacheisdesignedbyLinuxtoimprovetheprocessingefficiencyofdirectoryentryobjects;itrecordsthemappingrelationshipbetweendirectoryentriesandinodes.Therefore,whentheapplicationinitiatesthestatsystemcall,thecorrespondingdentry_cacheitemwillbecreated(further,ifthefileofeachstatdoesnotexist,thentherewillalwaysbealargenumberofnewdentry_cacheitemscreated).2、当前服务器是storm集群的一个节点。首先,我想到了与风暴相关的工作流程。我strace了storm的worker进程,发现stat系统调用非常频繁,stat文件一直是新的文件名:sudostrace-fp
