strace简介
strace是Linux环境下的一款程序调试工具,用来监察一个应用程序所使用的系统调用。
Strace是一个简单的跟踪系统调用执行的工具。在其最简单的形式中,它可以从开始到结束跟踪二进制的执行,并在进程的生命周期中输出一行具有系统调用名称,每个系统调用的参数和返回值的文本行。
常用选项
-T: strace输出显示时间
-t: 可以在每行的输出之前添加时间戳,-tt可以精确到微妙级别,-ttt也可以精确到微妙级,但是它并不是打印当前时间,而是显示自从epoch以来的所经过的秒数
-s: 指定输出的字符串的最大长度,默认为32,如果输出到文件的话会全部输出
-o: 指定将strace输出到文件
-ff:如果提供-o filename,则所有进程的跟踪结果输出到相应的filename.pid中,pid是各进程的进程号.
-e: 指定跟踪某个行为,例如-e trace=open指定只跟踪open行为
-y: 将文件句柄用文件路径代替显示
操作:
strace -T -tt -o /tmp/strace.log CMD
strace -T -tt CMD 2>&1 |tee /tmp/strace.log
strace -T -tt -s 100 -o /tmp/strace.log CMD
strace -T -tt -s 100 -ff -o /tmp/strace.log CMD
strace -T -tt -s 100 -e trace=XXXX -o /tmp/strace.log CMD
利用strace观察"客户端client"执行SQL:
#通过sys.processlist表中pid可以知道客户端连接pid
mysql> select thd_id,conn_id,user,pid,program_name,command,current_statement from processlist where conn_id>0 and pid>0;
+----------+---------+----------------+-------+--------------+---------+-------------------------------------------------------------------+
| thd_id | conn_id | user | pid | program_name | command | current_statement |
+----------+---------+----------------+-------+--------------+---------+-------------------------------------------------------------------+
| 10474777 | 5237734 | root@localhost | 30795 | mysql | Query | select thd_id,conn_id,user,pid ... list where conn_id>0 and pid>0 |
+----------+---------+----------------+-------+--------------+---------+-------------------------------------------------------------------+
#session1
[root@VM_0_11_centos ~]# strace -T -tt -s 100 -o /data/backup/strace.log -p 30795
strace: Process 5841 attached
^Cstrace: Process 5841 detached
#session进行数据查询
MySQL [(none)]> use yoon
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MySQL [yoon]> select * from yoon.hank limit 100;
+--------+------+-------+-----------+
| id | name | name1 | concent |
+--------+------+-------+-----------+
| 262126 | a | b | 123aabbcc |
| 262127 | a | b | 123aabbcc |
| 262128 | a | b | 123aabbcc |
| 262129 | a | b | 123aabbcc |
| 262130 | a | b | 123aabbcc |
| 262131 | a | b | 123aabbcc |
| 262132 | a | b | 123aabbcc |
| 262133 | a | b | 123aabbcc |
| 262134 | a | b | 123aabbcc |
| 262135 | a | b | 123aabbcc |
| 262136 | a | b | 123aabbcc |
| 262137 | a | b | 123aabbcc |
| 262138 | a | b | 123aabbcc |
...
...
1 row in set (0.00 sec)
查看strace.log文件
16:00:30.097466 socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) = 4 <0.000019>
16:00:30.097521 sendmsg(4, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={len=51, type=0x464 /* NLMSG_??? */, flags=NLM_F_REQUEST, seq=0, pid=0}, iov_len=16}, {iov_base={{len=1701602675, type=0x7463 /* NLMSG_??? */, flags=NLM_F_DUMP_FILTERED|0x2a00, seq=1869768224, pid=1870209133}, "\x6f\x6e\x2e\x68\x61\x6e\x6b\x20\x6c\x69\x6d\x69\x74\x20\x31\x30\x30\x3b\x00"}, iov_len=35}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 51 <0.000013>
16:00:30.097762 close(4) = 0 <0.000007>
16:00:30.097840 times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 3869677778 <0.000005>
16:00:30.097902 sendto(3, "\"\0\0\0\3select * from yoon.hank limit 100", 38, 0, NULL, 0) = 38 <0.000033>
16:00:30.098017 recvfrom(3, "\1\0\0\1\4&\0\0\2\3def\4yoon\4hank\4hank\2id\2id\f?\0\24\0\0\0\10#B\0\0\0*\0\0\3\3def\4yoon\4hank\4hank\4name\4name\f!\0<\0\0\0\375\10@\0\0\0,\0\0\4\3de"..., 16384, 0, NULL, NULL) = 2704 <0.002894>
16:00:30.101030 times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 3869677778 <0.000006>
16:00:30.101095 write(1, "+--------+------+-------+-----------+\n", 38) = 38 <0.000019>
16:00:30.101137 write(1, "| id | name | name1 | concent |\n", 38) = 38 <0.000013>
16:00:30.101187 write(1, "+--------+------+-------+-----------+\n", 38) = 38 <0.000028>
16:00:30.101264 write(1, "| 262126 | a | b | 123aabbcc |\n", 38) = 38 <0.000010>
16:00:30.101291 write(1, "| 262127 | a | b | 123aabbcc |\n", 38) = 38 <0.000009>
16:00:30.101315 write(1, "| 262128 | a | b | 123aabbcc |\n", 38) = 38 <0.000023>
16:00:30.101354 write(1, "| 262129 | a | b | 123aabbcc |\n", 38) = 38 <0.000010>
16:00:30.101476 write(1, "| 262130 | a | b | 123aabbcc |\n", 38) = 38 <0.000067>
16:00:30.101596 write(1, "| 262131 | a | b | 123aabbcc |\n", 38) = 38 <0.000011>
16:00:30.101640 write(1, "| 262132 | a | b | 123aabbcc |\n", 38) = 38 <0.000009>
16:00:30.101672 write(1, "| 262133 | a | b | 123aabbcc |\n", 38) = 38 <0.000007>
16:00:30.101699 write(1, "| 262134 | a | b | 123aabbcc |\n", 38) = 38 <0.000037>
16:00:30.101758 write(1, "| 262135 | a | b | 123aabbcc |\n", 38) = 38 <0.000008>
16:00:30.101858 write(1, "| 262136 | a | b | 123aabbcc |\n", 38) = 38 <0.000034>
16:00:30.101956 write(1, "| 262137 | a | b | 123aabbcc |\n", 38) = 38 <0.000008>
16:00:30.102003 write(1, "| 262138 | a | b | 123aabbcc |\n", 38) = 38 <0.000007>
16:00:30.102032 write(1, "| 262139 | a | b | 123aabbcc |\n", 38) = 38 <0.000007>
16:00:30.102058 write(1, "| 262140 | a | b | 123aabbcc |\n", 38) = 38 <0.000017>
16:00:30.102096 write(1, "| 262141 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102118 write(1, "| 262142 | a | b | 123aabbcc |\n", 38) = 38 <0.000007>
16:00:30.102145 write(1, "| 262143 | a | b | 123aabbcc |\n", 38) = 38 <0.000007>
16:00:30.102171 write(1, "| 262144 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102190 write(1, "| 262145 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102208 write(1, "| 262146 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102226 write(1, "| 262147 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102244 write(1, "| 262148 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102261 write(1, "| 262149 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102279 write(1, "| 262150 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102297 write(1, "| 262151 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102314 write(1, "| 262152 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102332 write(1, "| 262153 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102350 write(1, "| 262154 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102367 write(1, "| 262155 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102385 write(1, "| 262156 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102403 write(1, "| 262157 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102421 write(1, "| 262158 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.102438 write(1, "| 262159 | a | b | 123aabbcc |\n", 38) = 38 <0.000005>
16:00:30.104371 write(1, "| 262223 | a | b | 123aabbcc |\n", 38) = 38 <0.000009>
16:00:30.104393 write(1, "| 262224 | a | b | 123aabbcc |\n", 38) = 38 <0.000008>
16:00:30.104415 write(1, "| 262225 | a | b | 123aabbcc |\n", 38) = 38 <0.000009>
16:00:30.104438 write(1, "+--------+------+-------+-----------+\n", 38) = 38 <0.000008>
.....
.....
16:00:30.104466 write(1, "100 rows in set (0.00 sec)\n", 27) = 27 <0.000009>
16:00:30.104488 write(1, "\n", 1) = 1 <0.000009>
16:00:30.104518 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=528, ...}) = 0 <0.000008>
16:00:30.104646 rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0 <0.000004>
16:00:30.104671 ioctl(0, TIOCGWINSZ, {ws_row=48, ws_col=210, ws_xpixel=1680, ws_ypixel=384}) = 0 <0.000005>
16:00:30.104693 ioctl(0, TIOCSWINSZ, {ws_row=48, ws_col=210, ws_xpixel=1680, ws_ypixel=384}) = 0 <0.000006>
16:00:30.104715 ioctl(0, TCGETS, {B9600 opost isig icanon echo ...}) = 0 <0.000006>
利用strace观察"server端"执行SQL:
#命令
strace -o /tmp/strace.log -T -tt -f -p `pidof mysqld`
#假设你想看跟IO相关的行为,可以用-e trace来指定
# mysql相关的IO操作:
# mysql: read, write, open
# innodb: pread64, pwrite64
strace -o /tmp/strace.log -T -tt -f -e trace=read,open,write,pwrite64,pread64 -p `pidof mysqld`
示例:
strace -o /data/backup/strace.log -T -tt -ff -p 16137
#然后在本地通过socket登录执行命令:
mysql> select * from yoon.hank limit 1;
+--------+------+-------+-----------+
| id | name | name1 | concent |
+--------+------+-------+-----------+
| 262126 | a | b | 123aabbcc |
+--------+------+-------+-----------+
mysql> show processlist;
+---------+-----------------+-----------+------+---------+---------+------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+---------+-----------------+-----------+------+---------+---------+------------------------+------------------+-----------+---------------+
| 1 | event_scheduler | localhost | NULL | Daemon | 8287501 | Waiting on empty queue | NULL | 0 | 0 |
| 5238127 | root | localhost | yoon | Query | 0 | starting | show processlist | 0 | 0 |
+---------+-----------------+-----------+------+---------+---------+------------------------+------------------+-----------+---------------+
2 rows in set (0.00 sec)
mysql> select * from performance_schema.threads where processlist_id=5238127\G
*************************** 1. row ***************************
THREAD_ID: 10475559
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 5238127
PROCESSLIST_USER: root
PROCESSLIST_HOST: localhost
PROCESSLIST_DB: yoon
PROCESSLIST_COMMAND: Query
PROCESSLIST_TIME: 0
PROCESSLIST_STATE: Sending data
PROCESSLIST_INFO: select * from performance_schema.threads where processlist_id=5238127
PARENT_THREAD_ID: NULL
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: Socket
THREAD_OS_ID: NULL
1 row in set (0.00 sec)
若THREAD_OS_ID: NULL,用以下命令去查找:
# find /data/backup -type f | xargs grep 'limit'
/data/backup/strace.log.2979:16:07:39.573166 recvfrom(71, "\3select @@version_comment limit "..., 33, MSG_DONTWAIT, NULL, NULL) = 33 <0.000017>
/data/backup/strace.log.2979:16:07:47.499130 recvfrom(71, "\3select * from yoon.hank limit 1", 32, MSG_DONTWAIT, NULL, NULL) = 32 <0.000056>
查看strace.log.2979:(可以清晰的看见对应SQL输出)
16:07:30.983782 epoll_wait(10, [{EPOLLIN, {u32=916046432, u64=140635325186656}}], 1024, -1) = 1 <8.589057>
16:07:39.572915 futex(0x1db9000, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <0.000078>
16:07:39.573051 futex(0x1db9000, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000021>
16:07:39.573100 recvfrom(71, "!\0\0\0", 4, MSG_DONTWAIT, NULL, NULL) = 4 <0.000012>
16:07:39.573166 recvfrom(71, "\3select @@version_comment limit "..., 33, MSG_DONTWAIT, NULL, NULL) = 33 <0.000017>
16:07:39.573352 sendto(71, "\1\0\0\1\1'\0\0\2\3def\0\0\0\21@@version_comme"..., 114, MSG_DONTWAIT, NULL, 0) = 114 <0.000531>
16:07:39.573947 lseek(61, 0, SEEK_CUR) = 681024550 <0.000018>
16:07:39.574009 write(61, "20220601 16:07:39,db-test-tools-"..., 135) = 135 <0.000040>
16:07:39.574112 epoll_ctl(10, EPOLL_CTL_MOD, 71, {EPOLLIN|EPOLLERR|EPOLLRDHUP|EPOLLONESHOT|EPOLLET, {u32=916046432, u64=140635325186656}}) = 0 <0.000114>
16:07:39.574292 epoll_wait(10, [{EPOLLIN, {u32=916046432, u64=140635325186656}}], 1024, -1) = 1 <7.924350>
16:07:47.498880 recvfrom(71, " \0\0\0", 4, MSG_DONTWAIT, NULL, NULL) = 4 <0.000143>
16:07:47.499130 recvfrom(71, "\3select * from yoon.hank limit 1", 32, MSG_DONTWAIT, NULL, NULL) = 32 <0.000056>
16:07:47.499495 stat("./yoon/hank.frm", {st_mode=S_IFREG|0640, st_size=8654, ...}) = 0 <0.000017>
16:07:47.500128 access("./yoon/hank.TRG", F_OK) = -1 ENOENT (No such file or directory) <0.000027>
16:07:47.500432 sendto(71, "\1\0\0\1\4&\0\0\2\3def\4yoon\4hank\4hank\2id\2"..., 229, MSG_DONTWAIT, NULL, 0) = 229 <0.000179>
16:07:47.500681 lseek(61, 0, SEEK_CUR) = 681033697 <0.000012>
16:07:47.500811 epoll_ctl(10, EPOLL_CTL_MOD, 71, {EPOLLIN|EPOLLERR|EPOLLRDHUP|EPOLLONESHOT|EPOLLET, {u32=916046432, u64=140635325186656}}) = 0 <0.000014>
16:07:47.500870 epoll_wait(10, [{EPOLLIN, {u32=916046432, u64=140635325186656}}], 1024, -1) = 1 <17.389393>
16:08:04.890405 recvfrom(71, "\21\0\0\0", 4, MSG_DONTWAIT, NULL, NULL) = 4 <0.000024>
16:08:04.890548 recvfrom(71, "\3show processlsit", 17, MSG_DONTWAIT, NULL, NULL) = 17 <0.000028>
16:08:04.890711 sendto(71, "\246\0\0\1\377(\4#42000You have an error i"..., 170, MSG_DONTWAIT, NULL, 0) = 170 <0.000191>
16:08:04.891046 lseek(61, 0, SEEK_CUR) = 681058614 <0.000020>
16:08:04.891230 epoll_ctl(10, EPOLL_CTL_MOD, 71, {EPOLLIN|EPOLLERR|EPOLLRDHUP|EPOLLONESHOT|EPOLLET, {u32=916046432, u64=140635325186656}}) = 0 <0.000013>
16:08:04.891270 epoll_wait(10, [{EPOLLIN, {u32=916046432, u64=140635325186656}}], 1024, -1) = 1 <2.361343>
16:08:07.252746 recvfrom(71, "\21\0\0\0", 4, MSG_DONTWAIT, NULL, NULL) = 4 <0.000045>
16:08:07.252929 recvfrom(71, "\3show processlist", 17, MSG_DONTWAIT, NULL, NULL) = 17 <0.000017>
16:08:07.253113 sendto(71, "\1\0\0\1\n\30\0\0\2\3def\0\0\0\2Id\0\f?\0\25\0\0\0\10\201\0\0\0"..., 472, MSG_DONTWAIT, NULL, 0) = 472 <0.000173>
16:08:07.253436 lseek(61, 0, SEEK_CUR) = 681060989 <0.000034>
16:08:07.253691 epoll_ctl(10, EPOLL_CTL_MOD, 71, {EPOLLIN|EPOLLERR|EPOLLRDHUP|EPOLLONESHOT|EPOLLET, {u32=916046432, u64=140635325186656}}) = 0 <0.000040>
16:08:07.253807 epoll_wait(10, [{EPOLLIN, {u32=916046432, u64=140635325186656}}], 1024, -1) = 1 <15.977585>
16:08:23.231545 recvfrom(71, "F\0\0\0", 4, MSG_DONTWAIT, NULL, NULL) = 4 <0.000049>
16:08:23.231729 recvfrom(71, "\3select * from performance_schem"..., 70, MSG_DONTWAIT, NULL, NULL) = 70 <0.000030>
16:08:23.232048 access("./performance_schema/threads.TRG", F_OK) = -1 ENOENT (No such file or directory) <0.000021>
16:08:23.232444 sendto(71, "\1\0\0\1\21H\0\0\2\3def\22performance_schema"..., 1607, MSG_DONTWAIT, NULL, 0) = 1607 <0.000204>
16:08:23.232730 lseek(61, 0, SEEK_CUR) = 681085891 <0.000038>
16:08:23.232948 epoll_ctl(10, EPOLL_CTL_MOD, 71, {EPOLLIN|EPOLLERR|EPOLLRDHUP|EPOLLONESHOT|EPOLLET, {u32=916046432, u64=140635325186656}}) = 0 <0.000027>
16:08:23.233051 epoll_wait(10, <detached ...>
strace可以用来跟踪某个线程的调用情况,例如可以适用strace跟踪客户端SQL执行情况,如果开发说有很多慢SQL,而MySQL却没有任何记录,那么就可以用strace来跟踪。
pstack简介
pstack用来跟踪进程栈,这个命令在排查进程问题时非常有用,比如我们发现一个服务一直处于work状态(如假死状态,好似死循环),使用这个命令就能轻松定位问题所在;
可以在一段时间内,多执行几次pstack,若发现代码栈总是停在同一个位置,那个位置就需要重点关注,很可能就是出问题的地方。
操作:
pstack使用起来非常简单,直接pstack -pid即可
netstat -lntp | grep 3306
tcp 0 0 0.0.0.0:13306 0.0.0.0:* LISTEN 16137/mysqld
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 16137/mysqld
ps -Lf 16137
UID PID PPID LWP C NLWP STIME TTY STAT TIME CMD
mysql 16137 14218 16137 0 151 Feb25 ? Sl 19:54 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/3306/etc/my_3306.conf --basedir=/usr/local/mysql --datadir=/data/mysql/3306/data
mysql 16137 14218 16139 0 151 Feb25 ? Sl 50:36 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/3306/etc/my_3306.conf --basedir=/usr/local/mysql --datadir=/data/mysql/3306/data
mysql 16137 14218 16140 0 151 Feb25 ? Sl 0:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/3306/etc/my_3306.conf --basedir=/usr/local/mysql --datadir=/data/mysql/3306/data
mysql 16137 14218 16146 0 151 Feb25 ? Sl 2:59 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/3306/etc/my_3306.conf --basedir=/usr/local/mysql --datadir=/data/mysql/3306/data
mysql 16137 14218 16147 0 151 Feb25 ? Sl 3:24 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/3306/etc/my_3306.conf --basedir=/usr/local/mysql --datadir=/data/mysql/3306/data
mysql 16137 14218 16148 0 151 Feb25 ? Sl 2:59 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/3306/etc/my_3306.conf --basedir=/usr/local/mysql --datadir=/data/mysql/3306/data
mysql的pid为:"16137" ,fork了几十个进程,有的是用户连接,有的是mysql自己的线程,例如:
read_thread、write_thread等,LWP对应的是performance_schema.threads表的"THREAD_OS_ID"
全部抓取:
假如发生了mysqld hang住的情况,可以直接pstack mysqldpid来抓取所有的线程堆栈,这也是通常使用的方式,例如:
pstack 16137 >> /tmp/pstack.log
单个抓取:
假设你运气好,你现在可以通过performance_schema.threads可以查到对应的线程的thread_os_id,你可以指定单个线程的lwp进行抓取:
MySQL [performance_schema]> show processlist;
+----+-----------------+-----------+--------------------+---------+------+------------------------+---------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+-----------+--------------------+---------+------+------------------------+---------------------------+
| 4 | event_scheduler | localhost | NULL | Daemon | 4319 | Waiting on empty queue | NULL |
| 23 | root | localhost | performance_schema | Query | 0 | starting | show processlist |
| 24 | root | localhost | yoon | Query | 22 | User sleep | select sleep(600) from t1 |
+----+-----------------+-----------+--------------------+---------+------+------------------------+---------------------------+
3 rows in set (0.00 sec)
MySQL [performance_schema]> select THREAD_OS_ID from threads where processlist_id=24\G
*************************** 1. row ***************************
THREAD_OS_ID: 15020
1 row in set (0.00 sec)
pstack 15020 > /data/backup/single_pstack.log
pstack是平时用的比较多的工具,尤其是诊断MySQL hang住的情况,例如主从延迟特别高等。
gdb
GDB 是 linux 环境下的一般功能强大的调试器,用来调试 C 或 C++ 写的程序。
常用命令:
info threads:查看全部线程
thread n: 指定某个线程
b: 在某处打断点
c: 继续往下走
s: 执行一行代码,如果代码函数调用,则进入函数
n: 执行一行代码,函数调用不进入
p: 打印某个变量值
list: 打印代码的文本信息
bt: 查看某个线程的栈帧
info b: 查看当前所有断点信息
进入gdb界面:
gdb /usr/local/mysql/bin/mysqld
进入调试:
(gdb) run --defaults-file=/data/mysql/3306/etc/my_3306.conf --user=mysql --gdb
(gdb) 操作:
(gdb) thread 41 # 线程41
(gdb) info thread
(gdb) b do_command
(gdb) n
(gdb) bt
通过gdb来修改MySQL的连接数,例如
gdb -p $(pidof mysqld) -ex "set max_connections=1500" -batch