为什么CMS GC时出现Concurrent Mode Failure

如题所述

并发收集器(concurrentcollector)指的是回收年老代和持久代时,采用多个线程和应用线程并发执行,减少应用停顿时间,但如果参数设置不当,容易出现Concurrent ModeFailure现象,此时JVM将采用停顿的方式进行full gc,整个gc时间相当可观,完全违背了采用CMS GC的初衷。
出现此现象的原因主要有两个:一个是在年老代被用完之前不能完成对无引用对象的回收;一个是当新空间分配请求在年老代的剩余空间中得到满足。原文如(if theconcurrent collector is unable to finish reclaiming the unreachable objectsbefore the tenured generation fills up, or if an allocation cannot be satisfiedwith the available free space blocks in the tenured generation, then theapplication is paused and the collection is completed with all the applicationthreads stopped)。
出现此现象的具体gc日志如下:
90003.167: [GC 90003.167: [ParNew: 261760K->0K(261952K), 0.0204310secs] 778897K->520196K(1310528K), 0.0207190 secs]
90006.049: [GC 90006.050: [ParNew: 261760K->0K(261952K), 0.0136380 secs]781956K->521446K(1310528K), 0.0138720 secs]
90010.628: [GC 90010.628: [ParNew: 261760K->261760K(261952K), 0.0000350secs]90010.628: [CMS (concurrent mode failure)[Unloadingclass sun.reflect.GeneratedSerializationConstructorAccessor1818]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1816]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1819]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1821]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1817]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1822]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1820]
: 521446K->415777K(1048576K), 2.2550270 secs] 783206K->415777K(1310528K),2.2553820 secs]
90015.099: [GC 90015.099: [ParNew: 261760K->0K(261952K), 0.0198180 secs]677537K->418003K(1310528K), 0.0200650 secs]
90018.670: [GC 90018.670: [ParNew: 261760K->0K(261952K), 0.0131610 secs]679763K->419115K(1310528K), 0.0133750 secs]
90022.254: [GC 90022.254: [ParNew: 261760K->0K(261952K), 0.0151240 secs]680875K->420505K(1310528K), 0.0154180 secs]
当时的JVM参数如下:
-server -Xms1280m -Xmx1280m -Xmn256m -Xss256k -XX:PermSize=128m-XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC-XX:CMSFullGCsBeforeCompaction=5 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled-XX:+DisableExplicitGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
因为配置了+CMSClassUnloadingEnabled参数,所以出现Unloading classsun.reflect.GeneratedSerializationConstructorAccessor的日志,这是个好习惯,如果空间不够时可以卸载类来释放空间,以进行FULL GC,相反,如果gc日志中出现了此日志,应该检查各代的大小设置是否合理。这里应用从启动到上述现象出现时还没有进行过CMS GC,出现concurrent modefailure现象的原因是年轻代GC(ParNew),年老代所剩下的空间不足以满足年轻代,也就是开头提到的原因二。要避免此现象,方法一是降低触发CMS的阀值,即参数-XX:CMSInitiatingOccupancyFraction的值,默认值是68,所以这里调低到50,让CMS GC尽早执行,以保证有足够的空间,如下:
-server -Xms1280m -Xmx1280m -Xmn256m -Xss256k -XX:PermSize=128m-XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC-XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50-XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -verbose:gc-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
调完之后发现还是一样的现象(这里有点不是很明白,年老代空间为1024m(1280m-256m),50%时触发CMS GC,也就是在年老代512m的时候,剩下的堆空间有512m,就算年轻代全部装进去应该也是够的),所以怀疑是年轻代太大,有大的对象,在年老代有碎片的情况下将很难分配,所以有了第二个解决办法,即减少年轻代大小,避免放入年老代时需要分配大的空间,同时调整full gc时压缩碎片的频次,减少持久代大小,以及将触发CMS GC的阀值适当增大(因为年轻代小了,这个调大点没关系,后面可以再调试这个参数),参数如下:
-server -Xms1280m -Xmx1280m -Xmn128m -Xss256k -XX:PermSize=96m -XX:MaxPermSize=96m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSCompactAtFullCollection -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
调整完后没有那个现象了,这里主要起作用的就是调小年轻代大小。在年老代到达826m(触发CMS阀值(1280-128)*0.7=806m)时出现了CMS GC,用时27ms,日志如下:
705.444: [GC 705.445: [ParNew: 130944K->0K(131008K), 0.0197680 secs]954628K->826284K(1310656K), 0.0199720 secs]
705.467:[GC [1 CMS-initial-mark: 826284K(1179648K)] 826744K(1310656K), 0.0271540 secs]
705.494:[CMS-concurrent-mark-start]
706.717:[CMS-concurrent-mark: 1.223/1.223 secs]
706.717:[CMS-concurrent-preclean-start]
706.717:[CMS-concurrent-preclean: 0.000/0.000 secs]
706.742:[CMS-concurrent-abortable-preclean-start]
706.742:[CMS-concurrent-abortable-preclean: 0.000/0.000 secs]
707.067: [GC 707.067: [ParNew: 130944K->0K(131008K), 0.0160200 secs]957228K->827348K(1310656K), 0.0162110 secs]
707.796: [GC[YG occupancy: 66671 K (131008 K)]707.796: [Rescan (parallel) ,0.0278280 secs]707.824: [weak refs processing, 0.0420770 secs] [1 CMS-remark:827348K(1179648K)] 894019K(1310656K), 0.0711970 secs]
707.877: [CMS-concurrent-sweep-start]
708.453: [GC 708.454: [ParNew: 130944K->0K(131008K), 0.0203760 secs]848439K->718796K(1310656K), 0.0205780 secs]
709.833: [GC 709.833: [ParNew: 130944K->0K(131008K), 0.0160170 secs]430484K->301411K(1310656K), 0.0161840 secs]
709.916: [CMS-concurrent-sweep: 1.974/2.040 secs]
709.916: [CMS-concurrent-reset-start]
709.951: [CMS-concurrent-reset: 0.034/0.034 secs]
711.187: [GC 711.187: [ParNew: 130944K->0K(131008K), 0.0130890 secs]413136K->283326K(1310656K), 0.0132600 secs]
观察一段时间的gc情况,gc效率也很高,单次YGCT<20ms,FGCT <40ms:
$ jstat -gcutil 31935 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.00 64.29 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.33 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.41 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.45 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.49 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.58 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.63 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.69 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.72 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.75 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.79 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.84 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.90 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.95 36.47 73.15 1293 19.514 6 0.211 19.725
0.00 0.00 64.99 36.47 73.15 1293 19.514 6 0.211 19.725
这时,想再测试下-XX:CMSInitiatingOccupancyFraction的值,调到80时又出现了上述现象Concurrent ModeFailure,启动后还没进行过CMS GC,在年老代914m时就出现了:
759.994: [GC 759.994: [ParNew: 130944K->0K(131008K), 0.0172910 secs]1040896K->911480K(1310656K), 0.0174730 secs]
760.879: [GC 760.879: [ParNew: 130944K->0K(131008K), 0.0300920 secs]1042424K->914190K(1310656K), 0.0302950 secs]
761.768: [GC 761.769: [ParNew: 130944K->130944K(131008K), 0.0000340secs]761.769: [CMS (concurrent mode failure)[Unloading classsun.reflect.GeneratedMethodAccessor342]edMethodAccessor348]
[Unloading class sun.reflect.GeneratedMethodAccessor411]
[Unloading class sun.reflect.GeneratedMethodAccessor407]
[Unloading class sun.reflect.GeneratedMethodAccessor541]
最后总结下,出现Concurrent ModeFailure现象时,解决办法就是要让年老代留有足够的空间,以保证新对象空间的分配。另外在JVM BUG中有提到,JDK1.5_09版本之前,JVM参数-XX:CMSInitiatingOccupancyFraction是无效的,我这里应用环境的版本是JDK1.5_08,从gc日志来看是可以生效的。
GC时还有一个常见的错误PromotionFailed,解决办法类似,也是调整年轻代和年老代的比例,还有CMSGC的时机。
温馨提示:答案为网友推荐,仅供参考