Dec 20 21:23:45 vgfs001 kernel: tiotest_AMD_x86 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
Dec 20 21:23:45 vgfs001 kernel: tiotest_AMD_x86 cpuset=/ mems_allowed=0
Dec 20 21:23:45 vgfs001 kernel: Pid: 1937, comm: tiotest_AMD_x86 Not tainted 2.6.32-431.29.2.lustre.el6.x86_64 #1
Dec 20 21:23:45 vgfs001 kernel: Call Trace:
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff810d07b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81122b80>] ? dump_header+0x90/0x1b0
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff8122894c>] ? security_real_capable_noaudit+0x3c/0x70
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81123002>] ? oom_kill_process+0x82/0x2a0
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81122f41>] ? select_bad_process+0xe1/0x120
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81123440>] ? out_of_memory+0x220/0x3c0
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff8112fd5f>] ? __alloc_pages_nodemask+0x89f/0x8d0
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81167cea>] ? alloc_pages_current+0xaa/0x110
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff8111ff77>] ? __page_cache_alloc+0x87/0x90
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81120c8e>] ? grab_cache_page_write_begin+0x8e/0xc0
Dec 20 21:23:45 vgfs001 kernel: [<ffffffffa0a8f228>] ? ll_write_begin+0x58/0x1a0 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff811204f3>] ? generic_file_buffered_write+0x123/0x2e0
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81078fd7>] ? current_fs_time+0x27/0x30
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81121f50>] ? __generic_file_aio_write+0x260/0x490
Dec 20 21:23:45 vgfs001 kernel: [<ffffffffa05211a5>] ? cl_env_info+0x15/0x20 [obdclass]
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81122208>] ? generic_file_aio_write+0x88/0x100
Dec 20 21:23:45 vgfs001 kernel: [<ffffffffa0aa3907>] ? vvp_io_write_start+0x137/0x2a0 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [<ffffffffa05301da>] ? cl_io_start+0x6a/0x140 [obdclass]
Dec 20 21:23:45 vgfs001 kernel: [<ffffffffa05348e4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
Dec 20 21:23:45 vgfs001 kernel: [<ffffffffa0a46306>] ? ll_file_io_generic+0x2a6/0x610 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [<ffffffffa0a47192>] ? ll_file_aio_write+0x142/0x2c0 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [<ffffffffa0a4747c>] ? ll_file_write+0x16c/0x2a0 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81189298>] ? vfs_write+0xb8/0x1a0
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff81189c61>] ? sys_write+0x51/0x90
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff810e204e>] ? __audit_syscall_exit+0x25e/0x290
Dec 20 21:23:45 vgfs001 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Dec 20 21:23:45 vgfs001 kernel: Mem-Info:
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA per-cpu:
Dec 20 21:23:45 vgfs001 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 4: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 5: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 6: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 7: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 8: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 9: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 10: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 11: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA32 per-cpu:
Dec 20 21:23:45 vgfs001 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 7: hi: 186, btch: 31 usd: 11
Dec 20 21:23:45 vgfs001 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 9: hi: 186, btch: 31 usd: 46
Dec 20 21:23:45 vgfs001 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: Node 0 Normal per-cpu:
Dec 20 21:23:45 vgfs001 kernel: CPU 0: hi: 186, btch: 31 usd: 2
Dec 20 21:23:45 vgfs001 kernel: CPU 1: hi: 186, btch: 31 usd: 7
Dec 20 21:23:45 vgfs001 kernel: CPU 2: hi: 186, btch: 31 usd: 27
Dec 20 21:23:45 vgfs001 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 5: hi: 186, btch: 31 usd: 39
Dec 20 21:23:45 vgfs001 kernel: CPU 6: hi: 186, btch: 31 usd: 33
Dec 20 21:23:45 vgfs001 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 8: hi: 186, btch: 31 usd: 1
Dec 20 21:23:45 vgfs001 kernel: CPU 9: hi: 186, btch: 31 usd: 35
Dec 20 21:23:45 vgfs001 kernel: CPU 10: hi: 186, btch: 31 usd: 29
Dec 20 21:23:45 vgfs001 kernel: CPU 11: hi: 186, btch: 31 usd: 2
Dec 20 21:23:45 vgfs001 kernel: active_anon:1198006 inactive_anon:171400 isolated_anon:96
Dec 20 21:23:45 vgfs001 kernel: active_file:548228 inactive_file:548497 isolated_file:0
Dec 20 21:23:45 vgfs001 kernel: unevictable:0 dirty:899 writeback:2342 unstable:0
Dec 20 21:23:45 vgfs001 kernel: free:29297 slab_reclaimable:10639 slab_unreclaimable:376601
Dec 20 21:23:45 vgfs001 kernel: mapped:1032 shmem:0 pagetables:5613 bounce:0
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA free:15708kB min:80kB low:100kB high:120kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15320kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 20 21:23:45 vgfs001 kernel: lowmem_reserve[]: 0 3512 12097 12097
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA32 free:53892kB min:19596kB low:24492kB high:29392kB active_anon:4kB inactive_anon:44kB active_file:1249260kB inactive_file:1249288kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596496kB mlocked:0kB dirty:3436kB writeback:4180kB mapped:0kB shmem:0kB slab_reclaimable:24608kB slab_unreclaimable:689432kB kernel_stack:8kB pagetables:196kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:4212142 all_unreclaimable? no
Dec 20 21:23:45 vgfs001 kernel: lowmem_reserve[]: 0 0 8585 8585
Dec 20 21:23:45 vgfs001 kernel: Node 0 Normal free:47588kB min:47900kB low:59872kB high:71848kB active_anon:4792020kB inactive_anon:685556kB active_file:943652kB inactive_file:944700kB unevictable:0kB isolated(anon):384kB isolated(file):0kB present:8791040kB mlocked:0kB dirty:160kB writeback:5188kB mapped:4128kB shmem:0kB slab_reclaimable:17948kB slab_unreclaimable:816972kB kernel_stack:5040kB pagetables:22256kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2346101 all_unreclaimable? no
Dec 20 21:23:45 vgfs001 kernel: lowmem_reserve[]: 0 0 0 0
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA: 3*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15708kB
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA32: 183*4kB 19*8kB 19*16kB 19*32kB 24*64kB 17*128kB 7*256kB 5*512kB 27*1024kB 8*2048kB 0*4096kB = 53892kB
Dec 20 21:23:45 vgfs001 kernel: Node 0 Normal: 109*4kB 185*8kB 121*16kB 43*32kB 8*64kB 117*128kB 43*256kB 8*512kB 1*1024kB 1*2048kB 2*4096kB = 47084kB
Dec 20 21:23:45 vgfs001 kernel: 1269461 total pagecache pages
Dec 20 21:23:45 vgfs001 kernel: 172616 pages in swap cache
Dec 20 21:23:45 vgfs001 kernel: Swap cache stats: add 1017139, delete 844523, find 444300/457367
Dec 20 21:23:45 vgfs001 kernel: Free swap = 3377416kB
Dec 20 21:23:45 vgfs001 kernel: Total swap = 4194300kB
Dec 20 21:23:45 vgfs001 kernel: 3145727 pages RAM
Dec 20 21:23:45 vgfs001 kernel: 96633 pages reserved
Dec 20 21:23:45 vgfs001 kernel: 9844603 pages shared
Dec 20 21:23:45 vgfs001 kernel: 528776 pages non-shared
Dec 20 21:23:45 vgfs001 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Dec 20 21:23:45 vgfs001 kernel: [ 591] 0 591 2817 4 9 -17 -1000 udevd
Dec 20 21:23:45 vgfs001 kernel: [ 2028] 0 2028 6899 30 0 -17 -1000 auditd
Dec 20 21:23:45 vgfs001 kernel: [ 2058] 0 2058 63875 54 2 0 0 rsyslogd
Dec 20 21:23:45 vgfs001 kernel: [ 2088] 0 2088 2740 38 7 0 0 irqbalance
Dec 20 21:23:45 vgfs001 kernel: [ 2110] 32 2110 4744 22 1 0 0 rpcbind
Dec 20 21:23:45 vgfs001 kernel: [ 2229] 81 2229 8028 9 3 0 0 dbus-daemon
Dec 20 21:23:45 vgfs001 kernel: [ 2251] 29 2251 5837 10 2 0 0 rpc.statd
Dec 20 21:23:45 vgfs001 kernel: [ 2281] 0 2281 47351 11 7 0 0 cupsd
Dec 20 21:23:45 vgfs001 kernel: [ 2317] 0 2317 1020 8 0 0 0 acpid
Dec 20 21:23:45 vgfs001 kernel: [ 2327] 68 2327 9771 123 9 0 0 hald
Dec 20 21:23:45 vgfs001 kernel: [ 2328] 0 2328 5100 9 10 0 0 hald-runner
Dec 20 21:23:45 vgfs001 kernel: [ 2370] 0 2370 5630 8 7 0 0 hald-addon-inpu
Dec 20 21:23:45 vgfs001 kernel: [ 2376] 68 2376 4502 9 0 0 0 hald-addon-acpi
Dec 20 21:23:45 vgfs001 kernel: [ 2396] 0 2396 96535 42 11 0 0 automount
Dec 20 21:23:45 vgfs001 kernel: [ 2425] 0 2425 16671 8 4 -17 -1000 sshd
Dec 20 21:23:45 vgfs001 kernel: [ 2534] 0 2534 20331 28 4 0 0 master
Dec 20 21:23:45 vgfs001 kernel: [ 2549] 89 2549 20397 29 10 0 0 qmgr
Dec 20 21:23:45 vgfs001 kernel: [ 2562] 0 2562 28661 7 1 0 0 abrtd
Dec 20 21:23:45 vgfs001 kernel: [ 2577] 0 2577 27116 77 6 0 0 ksmtuned
Dec 20 21:23:45 vgfs001 kernel: [ 2589] 0 2589 29332 21 6 0 0 crond
Dec 20 21:23:45 vgfs001 kernel: [ 2638] 0 2638 5394 5 4 0 0 atd
Dec 20 21:23:45 vgfs001 kernel: [ 2649] 0 2649 104692 1712 3 0 0 python
Dec 20 21:23:45 vgfs001 kernel: [ 2666] 0 2666 257137 979 3 0 0 libvirtd
Dec 20 21:23:45 vgfs001 kernel: [ 2695] 0 2695 27085 6 5 0 0 rhsmcertd
Dec 20 21:23:45 vgfs001 kernel: [ 2796] 99 2796 3223 9 7 0 0 dnsmasq
Dec 20 21:23:45 vgfs001 kernel: [ 2802] 0 2802 16175 7 1 0 0 certmonger
Dec 20 21:23:45 vgfs001 kernel: [ 2824] 0 2824 33502 11 1 0 0 gdm-binary
Dec 20 21:23:45 vgfs001 kernel: [ 2840] 0 2840 1016 6 3 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2842] 0 2842 1016 6 7 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2844] 0 2844 1016 6 4 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2846] 0 2846 1016 6 4 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2850] 0 2850 1016 6 4 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2862] 0 2862 3212 4 9 -17 -1000 udevd
Dec 20 21:23:45 vgfs001 kernel: [ 2863] 0 2863 3212 4 9 -17 -1000 udevd
Dec 20 21:23:45 vgfs001 kernel: [ 2911] 0 2911 41157 11 6 0 0 gdm-simple-slav
Dec 20 21:23:45 vgfs001 kernel: [ 2929] 0 2929 35211 911 2 0 0 Xorg
Dec 20 21:23:45 vgfs001 kernel: [ 2970] 0 2970 1029163 10 1 0 0 console-kit-dae
Dec 20 21:23:45 vgfs001 kernel: [ 3040] 42 3040 5010 5 9 0 0 dbus-launch
Dec 20 21:23:45 vgfs001 kernel: [ 3041] 42 3041 7951 10 0 0 0 dbus-daemon
Dec 20 21:23:45 vgfs001 kernel: [ 3043] 42 3043 67404 11 8 0 0 gnome-session
Dec 20 21:23:45 vgfs001 kernel: [ 3046] 0 3046 12497 11 3 0 0 devkit-power-da
Dec 20 21:23:45 vgfs001 kernel: [ 3052] 42 3052 33326 64 0 0 0 gconfd-2
Dec 20 21:23:45 vgfs001 kernel: [ 3069] 42 3069 91526 3293 8 0 0 gnome-settings-
Dec 20 21:23:45 vgfs001 kernel: [ 3070] 42 3070 30178 56 0 0 0 at-spi-registry
Dec 20 21:23:45 vgfs001 kernel: [ 3072] 42 3072 89614 11 6 0 0 bonobo-activati
Dec 20 21:23:45 vgfs001 kernel: [ 3080] 42 3080 33821 11 8 0 0 gvfsd
Dec 20 21:23:45 vgfs001 kernel: [ 3081] 42 3081 72400 92 0 0 0 metacity
Dec 20 21:23:45 vgfs001 kernel: [ 3084] 42 3084 68544 64 2 0 0 gnome-power-man
Dec 20 21:23:45 vgfs001 kernel: [ 3085] 42 3085 62195 10 6 0 0 polkit-gnome-au
Dec 20 21:23:45 vgfs001 kernel: [ 3087] 42 3087 96302 288 0 0 0 gdm-simple-gree
Dec 20 21:23:45 vgfs001 kernel: [ 3094] 0 3094 13186 10 9 0 0 polkitd
Dec 20 21:23:45 vgfs001 kernel: [ 3107] 42 3107 86550 9 5 0 0 pulseaudio
Dec 20 21:23:45 vgfs001 kernel: [ 3109] 499 3109 42114 25 10 0 0 rtkit-daemon
Dec 20 21:23:45 vgfs001 kernel: [ 3114] 0 3114 35562 11 6 0 0 gdm-session-wor
Dec 20 21:23:45 vgfs001 kernel: [27425] 0 27425 25109 40 3 0 0 sshd
Dec 20 21:23:45 vgfs001 kernel: [27430] 0 27430 27123 80 6 0 0 bash
Dec 20 21:23:45 vgfs001 kernel: [ 1567] 0 1567 1711609 1190642 1 0 0 lwfsd
Dec 20 21:23:45 vgfs001 kernel: [ 1691] 89 1691 20351 20 5 0 0 pickup
Dec 20 21:23:45 vgfs001 kernel: [ 1926] 0 1926 25227 25 8 0 0 sleep
Dec 20 21:23:45 vgfs001 kernel: [ 1927] 0 1927 46749 4269 7 0 0 tiotest_AMD_x86
Dec 20 21:23:45 vgfs001 kernel: Out of memory: Kill process 1567 (lwfsd) score 306 or sacrifice child
Dec 20 21:23:45 vgfs001 kernel: Killed process 1567, UID 0, (lwfsd) total-vm:6846436kB, anon-rss:4742528kB, file-rss:20040kB
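The trace above was taken from the guest's syslog. As a quick way to check whether other nodes have hit the same problem, a simple search of the usual RHEL 6 log location works (path is an assumption; adjust if syslog is configured differently):

grep -E "invoked oom-killer|Out of memory" /var/log/messages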
In this case the OOM killer was triggered from the Lustre write path, but other entry points, such as the KVM management processes, can trigger it just as well; any code path that allocates memory is a potential trigger.
From the analysis, the Lustre client cache was indeed holding a large amount of memory, leaving too little free memory for new allocations.
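One way to see (and, if desired, cap) how much page cache the Lustre client may hold is the llite max_cached_mb tunable. A minimal sketch, assuming the lctl utility from the Lustre client packages is available; the 4096 MB cap is only an illustrative value, not a recommendation from this test:

# Show the current per-mount client cache limit
lctl get_param llite.*.max_cached_mb
# Optionally cap the client cache (value in MB)
lctl set_param llite.*.max_cached_mb=4096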
Three measures were tried.
1. Increase the guest's memory
Grow the VM's memory from 12 GB to 16 GB:
virsh setmaxmem vgfsxxx 16GB --config
After the guest is started, apply the new size to the running domain:
virsh setmem vgfsxxx 16GB
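To confirm the new size actually took effect, the host and guest views can be checked, for example:

# On the KVM host: show the domain's maximum and current memory
virsh dominfo vgfsxxx
# Inside the guest: confirm the kernel sees the new RAM
free -m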
This did not help: after a few more test runs the service still went down (lwfsd was killed again).
2. Lower lwfsd's OOM-killer priority
Set lwfsd's oom_adj to -17 (OOM_DISABLE on this 2.6.32 kernel), so the OOM killer will not select it:
# Find lwfsd's PID (plain "ps" only lists processes on the current terminal, so use ps -e)
PID=$(ps -e | grep lwfsd | grep -v grep | awk '{print $1}')
# -17 exempts the process (and its main task) from the OOM killer
echo -17 > /proc/$PID/oom_adj
echo -17 > /proc/$PID/task/$PID/oom_adj
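Note that oom_adj is lost whenever lwfsd restarts, so the setting has to be reapplied. One hypothetical way (not part of the original test) is to run a small loop from the lwfsd start script or a cron job:

# Reapply the exemption to every running lwfsd process
for PID in $(pgrep lwfsd); do
    echo -17 > /proc/$PID/oom_adj
done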
This appears to help.
3. Change the memory overcommit policy
In addition, set overcommit_memory to 2 (echo "2" > /proc/sys/vm/overcommit_memory), so the kernel only grants an allocation when enough backing space (swap plus overcommit_ratio percent of RAM) exists for the mapping.
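To make the policy persistent across reboots and to watch its effect, standard sysctl mechanics apply (this was not part of the original test run):

# Persist the setting and reload it
echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
sysctl -p
# Watch how close the system is to the commit limit
grep -E "CommitLimit|Committed_AS" /proc/meminfo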
This also seems to help to some extent; more test runs are needed to confirm.