http://blog.chinaunix.net/uid-25909722-id-3011815.html
While implementing a thread pool with the pthread library, I ran into a few small problems:
(2) SIGSEGV (segmentation fault) caused by misusing pthread_cancel
Here is what happened:
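The pool has two kinds of threads: work_thread and manager_thread. The former are the worker threads; the latter is the manager thread, of which there is exactly one.
The manager thread calls a function, pool_delete_thread(), to periodically clean up idle threads in the pool, that is, to call pthread_cancel() on the surplus idle threads. This cleanup usually runs when the pool's load drops from high to low.
The pool also has a function that shuts it down, close_pool(), which is normally called only when the program exits.
close_pool() works like this: it wakes every idle thread blocked in pthread_cond_wait, then calls pthread_cancel() to kill them.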
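A minimal sketch of the two operations just described, assuming a fixed array of joinable workers behind a single mutex. struct pool, the slot states, MAX_WORKERS, worker(), pool_create() and pool_count() are hypothetical; pool_delete_thread() and close_pool() reuse the article's names, but their bodies are my assumptions, not the original code. Two details matter here. First, pthread_cond_wait() is a cancellation point, and a cancelled waiter holds the mutex again when its cleanup handlers run, so the worker pushes a handler that unlocks it. Second, each victim's slot is cleared under the lock and the thread is joined exactly once, so a pthread_t is never handed to pthread_cancel() again after pthread_join() has ended its lifetime.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 8

/* Hypothetical pool layout for this sketch, not the original code. */
enum slot_state { SLOT_EMPTY, SLOT_IDLE };

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  cond;                /* idle workers wait here */
    pthread_t       tid[MAX_WORKERS];
    enum slot_state state[MAX_WORKERS];  /* SLOT_EMPTY: never use tid[i] */
};

/* A waiter cancelled inside pthread_cond_wait() gets the mutex back
 * before its cleanup handlers run, so the handler must release it. */
static void unlock_pool(void *arg)
{
    struct pool *p = arg;
    pthread_mutex_unlock(&p->lock);
}

static void *worker(void *arg)
{
    struct pool *p = arg;
    pthread_mutex_lock(&p->lock);
    pthread_cleanup_push(unlock_pool, p);
    for (;;)                             /* task handling omitted */
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_cleanup_pop(1);              /* unreachable; pairs with push */
    return NULL;
}

/* Call before any other thread uses the pool. */
int pool_create(struct pool *p, int n)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    for (int i = 0; i < MAX_WORKERS; i++)
        p->state[i] = SLOT_EMPTY;
    for (int i = 0; i < n && i < MAX_WORKERS; i++) {
        if (pthread_create(&p->tid[i], NULL, worker, p) != 0)
            return -1;
        p->state[i] = SLOT_IDLE;
    }
    return 0;
}

int pool_count(struct pool *p)
{
    int n = 0;
    pthread_mutex_lock(&p->lock);
    for (int i = 0; i < MAX_WORKERS; i++)
        n += (p->state[i] != SLOT_EMPTY);
    pthread_mutex_unlock(&p->lock);
    return n;
}

/* Keep at most `keep` idle workers. Victim slots are cleared under the
 * lock, so a concurrent caller can never pick the same thread ID. */
void pool_delete_thread(struct pool *p, int keep)
{
    pthread_t victims[MAX_WORKERS];
    int nv = 0, kept = 0;

    pthread_mutex_lock(&p->lock);
    for (int i = 0; i < MAX_WORKERS; i++) {
        if (p->state[i] != SLOT_IDLE)
            continue;
        if (kept < keep) {
            kept++;
        } else {
            victims[nv++] = p->tid[i];
            p->state[i] = SLOT_EMPTY;
        }
    }
    pthread_mutex_unlock(&p->lock);

    /* Outside the lock: each victim's cleanup handler needs the mutex. */
    for (int i = 0; i < nv; i++) {
        pthread_cancel(victims[i]);
        int rc = pthread_join(victims[i], NULL);  /* ends the ID's lifetime */
        assert(rc == 0);
        (void)rc;
    }
}

/* Wake all waiters, then cancel and join whatever is left. The manager
 * thread must already have stopped calling pool_delete_thread(). */
void close_pool(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
    pool_delete_thread(p, 0);
    pthread_cond_destroy(&p->cond);
    pthread_mutex_destroy(&p->lock);
}
```

Keeping the workers joinable (rather than detached) means every ID stays valid until the pool itself calls pthread_join(); clearing the slot at that moment is what stops a second caller from reusing it.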
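This is where the problem came in:
every time close_pool() had been called and the manager thread then called pool_delete_thread(), the program crashed with SIGSEGV. Opening the core dump in gdb shows: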
digdeep@ubuntu:~/pthread/threadpool$ gdb -c core ./threadPoolTest
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/digdeep/pthread/threadpool/threadPoolTest...(no debugging symbols found)...done.
[New Thread 6499]
[New Thread 6500]
[New Thread 6492]
[New Thread 6501]
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/lib/i386-linux-gnu/libpthread-2.13.so...done.
done.
Loaded symbols for /lib/i386-linux-gnu/libpthread.so.0
Reading symbols from /lib/i386-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug/lib/i386-linux-gnu/libc-2.13.so...done.
done.
Loaded symbols for /lib/i386-linux-gnu/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Core was generated by `./threadPoolTest'.
Program terminated with signal 11, Segmentation fault.
#0 0x00d276f0 in pthread_cancel (th=3077909360) at pthread_cancel.c:35
35 pthread_cancel.c: No such file or directory.
    in pthread_cancel.c
(gdb) bt
#0 0x00d276f0 in pthread_cancel (th=3077909360) at pthread_cancel.c:35
#1 0x08048ffc in pool_delete_thread ()
#2 0x08049223 in manage_thread ()
#3 0x00d21e99 in start_thread (arg=0xb474cb70) at pthread_create.c:304
#4 0x001e073e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
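The fix I eventually settled on:
Once close_pool() has been called, set a stop_flag, and have the manager thread check whether stop_flag has been set; if it has, the manager does not call pool_delete_thread(). In fact, once the main thread has called close_pool(), there is no reason for the manager thread to call pool_delete_thread() at all.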
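The stop_flag fix can be sketched as follows, assuming the flag lives next to the manager thread and is protected by a mutex. struct manager, manager_start(), manager_stop() and trim_idle_workers() (a stand-in for pool_delete_thread()) are hypothetical names, and the 10 ms period is arbitrary. A condition variable lets close_pool() wake the manager immediately instead of waiting out its sleep, and joining the manager before touching any worker guarantees that no pool_delete_thread() call can still be in flight.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical manager state for this sketch. */
struct manager {
    pthread_mutex_t lock;
    pthread_cond_t  wake;        /* signalled when stop_flag is set */
    bool            stop_flag;
    int             trims;       /* counts stand-in trimming calls */
    pthread_t       tid;
};

/* Stand-in for the real pool_delete_thread(); runs with m->lock held,
 * so it can never overlap with manager_stop(). */
static void trim_idle_workers(struct manager *m)
{
    m->trims++;
}

static void *manage_thread(void *arg)
{
    struct manager *m = arg;
    pthread_mutex_lock(&m->lock);
    while (!m->stop_flag) {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 10L * 1000 * 1000;    /* 10 ms period */
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        int rc = pthread_cond_timedwait(&m->wake, &m->lock, &deadline);
        if (m->stop_flag)        /* close_pool() has started: no more trims */
            break;
        if (rc == ETIMEDOUT)
            trim_idle_workers(m);
    }
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

int manager_start(struct manager *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->wake, NULL);
    m->stop_flag = false;
    m->trims = 0;
    return pthread_create(&m->tid, NULL, manage_thread, m);
}

/* First step of close_pool(): raise stop_flag and wait for the manager
 * to exit, before any worker is cancelled. */
void manager_stop(struct manager *m)
{
    pthread_mutex_lock(&m->lock);
    m->stop_flag = true;
    pthread_cond_signal(&m->wake);
    pthread_mutex_unlock(&m->lock);
    int rc = pthread_join(m->tid, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_cond_destroy(&m->wake);
    pthread_mutex_destroy(&m->lock);
}
```

Reading stop_flag under the same mutex that manager_stop() uses to set it means a trim has either already finished when close_pool() proceeds, or never starts; a plain unsynchronized flag would not give that guarantee.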
Related material I found by searching online:
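The Linux Native POSIX Thread Library (NPTL) implementation has a race condition: calling pthread_cancel() on a thread that is just about to exit randomly produces a SIGSEGV. The problem does not appear on other UNIX systems such as Solaris, HP-UX, or AIX. Trying a different approach, one could use pthread_kill() to test whether the thread still exists before acting on it, which ought to avoid calling pthread_cancel() on a thread that is in the middle of exiting. But pthread_kill() crashed with SIGSEGV as well, which was really frustrating; the cause is the same.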
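Solutions:
1. Synchronize with the pthread_mutex and pthread_cond families of functions so that this race condition in Linux NPTL is avoided.
2. Add a state mechanism: keep a global table holding each thread's state; when a thread ends, its entry is changed from RUNNING to DEAD, and the main thread simply keeps checking that table. A bit dirty ;-)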
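Solutions 1 and 2 combine naturally: the state table is only trustworthy if it is read and written under the same mutex. For context, POSIX specifies that a thread ID's lifetime ends once the thread has been joined, or once a detached thread has terminated, and that using the ID after that point, with pthread_cancel() or pthread_kill() alike, is undefined behavior; that would explain the crash above. The sketch assumes detached workers. The table layout, enum wstate, start_workers(), cancel_running() and the odd/even worker behaviour are all hypothetical, chosen only so the demo has both threads that exit on their own and threads that must be cancelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NWORKERS 4

/* Hypothetical global state table (solution 2), guarded by one mutex
 * and condition variable (solution 1). */
enum wstate { W_RUNNING, W_CANCELLING, W_DEAD };

static pthread_mutex_t table_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  table_changed = PTHREAD_COND_INITIALIZER;
static pthread_t       tids[NWORKERS];
static enum wstate     states[NWORKERS];

/* Last thing a worker does, whether it returns or is cancelled; always
 * entered with table_lock held. */
static void mark_dead_and_unlock(void *arg)
{
    int id = (int)(intptr_t)arg;
    states[id] = W_DEAD;
    pthread_cond_broadcast(&table_changed);
    pthread_mutex_unlock(&table_lock);
}

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    pthread_mutex_lock(&table_lock);
    pthread_cleanup_push(mark_dead_and_unlock, arg);
    if (id % 2 == 1) {
        for (;;)                 /* odd ids: idle until cancelled */
            pthread_cond_wait(&table_changed, &table_lock);
    }
    pthread_cleanup_pop(1);      /* even ids: finish on their own */
    return NULL;
}

/* Detached workers, so nobody joins them: the table is the only record
 * of whether an ID is still usable. */
int start_workers(void)
{
    pthread_attr_t attr;
    int rc = 0;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    /* Hold the lock so no worker can finish before its ID is stored. */
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < NWORKERS; i++) {
        states[i] = W_RUNNING;
        if (pthread_create(&tids[i], &attr, worker, (void *)(intptr_t)i) != 0) {
            states[i] = W_DEAD;
            rc = -1;
        }
    }
    pthread_mutex_unlock(&table_lock);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Cancel every RUNNING worker and wait until all are DEAD. A worker can
 * only become DEAD while holding table_lock, so an ID seen as RUNNING
 * here belongs to a live thread for as long as we hold the lock. */
int cancel_running(void)
{
    int issued = 0;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < NWORKERS; i++) {
        if (states[i] == W_RUNNING) {
            int rc = pthread_cancel(tids[i]);
            assert(rc == 0);
            (void)rc;
            states[i] = W_CANCELLING;     /* never cancel it a second time */
            issued++;
        }
    }
    for (int i = 0; i < NWORKERS; i++)
        while (states[i] != W_DEAD)
            pthread_cond_wait(&table_changed, &table_lock);
    pthread_mutex_unlock(&table_lock);
    return issued;
}
```

The key design choice is that pthread_cancel() is called while table_lock is still held; checking the table, releasing the lock and only then cancelling would reopen exactly the window that a pthread_kill() probe leaves open.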
References:
http://blog.chinaunix.net/u/13667/showart_222280.html
http://linux.derkeiler.com/Newsgroups/comp.os.linux.development.apps/2004-04/0632.html
http://www.9php.com/FAQ/cxsjl/c/2008/01/6564294109336.html