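To deploy an experimental OpenStack environment, I had to install NTP, and this step alone cost me a whole afternoon of fiddling. Here are the problems I ran into.
Because of our company's internal network policy, many websites cannot be reached, including the public time servers: none of the ones I tried was accessible. So I made the OpenStack controller node, node0, the time server and the other nodes its clients.
The base servers of my OpenStack setup run CentOS 7. Following the official OpenStack documentation, I first installed chrony 2.1.1, which is simple to configure. As the documentation recommends, the time server runs on the controller and all other nodes are clients. However, the clients never managed to synchronize with the time server. Here is the chronyc output on the time server: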
```
chronyc> sourcestats -v    # run on the time server (node0)
210 Number of sources = 1
                             .- Number of sample points in measurement set.
                            /    .- Number of residual runs with same sign.
                           |    /    .- Length of measurement set (time).
                           |   |    /      .- Est. clock freq error (ppm).
                           |   |   |      /           .- Est. error in freq.
                           |   |   |     |           /         .- Est. offset.
                           |   |   |     |          |          |   On the -.
                           |   |   |     |          |          |   samples. \
                           |   |   |     |          |          |             |
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
node0                       0   0     0     +0.000   2000.000     +0ns  4000ms
chronyc>
```
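On a client, run chronyc. Note the `?` in the second column of the `sources` output below: it means the client has not synchronized with the time server, and I could not tell what was misconfigured. (In hindsight, the firewall and self-referencing server problems described below probably affected chrony too; chrony's counterparts to the ntpd fixes are the `allow` directive, which lets a server answer clients, and `local stratum 10`, which lets it serve time without an upstream source.)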
```
[root@node1 tools]# chronyc -a
chrony version 2.1.1
Copyright (C) 1997-2003, 2007, 2009-2015 Richard P. Curnow and others
chrony comes with ABSOLUTELY NO WARRANTY.  This is free software, and
you are welcome to redistribute it under certain conditions.  See the
GNU General Public License version 2 for details.

200 OK
chronyc> sourcestats -v
210 Number of sources = 1
                             .- Number of sample points in measurement set.
                            /    .- Number of residual runs with same sign.
                           |    /    .- Length of measurement set (time).
                           |   |    /      .- Est. clock freq error (ppm).
                           |   |   |      /           .- Est. error in freq.
                           |   |   |     |           /         .- Est. offset.
                           |   |   |     |          |          |   On the -.
                           |   |   |     |          |          |   samples. \
                           |   |   |     |          |          |             |
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
node0                       0   0     0     +0.000   2000.000     +0ns  4000ms
chronyc> sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? node0                         0   7     0   10y     +0ns[   +0ns] +/-    0ns
```
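To make this output easier to read, here is a small Python sketch that decodes one line of `chronyc sources`. The meanings of the mode and state characters, of the Poll column (log2 of the polling interval in seconds) and of the Reach column (an octal 8-bit register recording whether each of the last eight polls got a reply) follow the chrony documentation; the function name and the simple whitespace split are my own assumptions and only handle lines shaped like the one above.

```python
# Meaning of the two-character prefix of a `chronyc sources` line (per chrony docs).
MODES = {"^": "server", "=": "peer", "#": "local reference clock"}
STATES = {
    "*": "currently synchronized to this source",
    "+": "acceptable, combined with the selected source",
    "-": "acceptable, excluded by the combining algorithm",
    "?": "connectivity lost or packets fail the tests",
    "x": "falseticker",
    "~": "time too variable",
}


def decode_sources_line(line: str) -> dict:
    """Decode e.g. '^? node0  0  7  0  10y  +0ns[ +0ns] +/- 0ns'."""
    fields = line.split()
    prefix, name = fields[0], fields[1]
    stratum, poll = int(fields[2]), int(fields[3])
    reach = int(fields[4], 8)  # Reach is printed in octal
    return {
        "name": name,
        "mode": MODES.get(prefix[0], "unknown"),
        "state": STATES.get(prefix[1], "unknown"),
        "stratum": stratum,
        "poll_interval_s": 2 ** poll,  # Poll is log2 of the interval in seconds
        # Each set bit in the 8-bit Reach register is one answered poll.
        "replies_in_last_8_polls": bin(reach).count("1"),
    }


if __name__ == "__main__":
    line = "^? node0  0   7     0   10y     +0ns[   +0ns] +/-    0ns"
    for key, value in decode_sources_line(line).items():
        print(f"{key}: {value}")
```

For the node0 line above it reports a server source whose connectivity is lost and 0 replies in the last 8 polls, i.e. the client never heard back from node0 at all.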
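Later I went back to plain NTP and dropped chrony. NTP did not go smoothly either: I ran into the main problems below, which surfaced one at a time. The machines are PowerEdge R610 servers that had not been used for a long time; some were running and some were powered off, so their hardware clocks (hwclock) differed by several hours.
First, the original /etc/ntp.conf on my time master (everything not shown was left at the defaults). 192.168.1.100 is node0's IP; the other nodes are on the same subnet as node0.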
```
# Hosts on local network are less restricted.
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server 192.168.1.100 iburst
```
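The clients' ntp.conf is basically the same as the master's, except that it lacks the line `restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap`.
The first problem I hit can no longer be reproduced here: tcpdump reported `bad udp cksum` errors. According to what I found online, running the two commands below on both the server and the clients is enough; they turn off the NIC's checksum offloading and generic segmentation offload. (With transmit checksum offloading, the NIC fills in the checksum only after tcpdump has captured an outgoing packet, so `bad udp cksum` on locally sent packets is often a capture artifact rather than real corruption.)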
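```
# Turn off RX/TX checksum offloading on em1
ethtool --offload em1 rx off tx off
# Turn off generic segmentation offload (GSO) on em1
ethtool -K em1 gso off
```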
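For context, the check behind `bad udp cksum` is the standard Internet checksum (RFC 1071), which UDP computes over a pseudo-header (source and destination IPv4 addresses, protocol 17, UDP length) followed by the UDP header and payload (RFC 768). The Python sketch below illustrates that algorithm; it is not tcpdump's code, and the function names are my own.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field is still zero."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                 # reserved
        17,                # IP protocol number for UDP
        len(udp_segment),  # UDP length (header + payload)
    )
    checksum = internet_checksum(pseudo_header + udp_segment)
    return checksum or 0xFFFF  # RFC 768: a computed 0 is sent as all ones
```

A receiver (or tcpdump) repeats the sum over the same bytes including the filled-in checksum field and expects 0; with TX offloading, the captured copy of an outgoing packet does not yet carry the final value, so this check can fail locally.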
While debugging, I used tcpdump on node0 to follow the packets while running `ntpdate -d node0` on node3 (a client, 192.168.1.130). First, the tcpdump output on the master:
```
[root@node0 tools]# tcpdump -vvv -i em1 host 192.168.1.130 -n
tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes
08:44:03.717606 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.130.42275 > 192.168.1.100.ntp: [udp sum ok] NTPv4, length 48
        Client, Leap indicator: clock unsynchronized (192), Stratum 0 (unspecified), poll 3 (8s), precision -6
        Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 0.000000000
          Receive Timestamp:    0.000000000
          Transmit Timestamp:   3663362643.719085752 (2016/02/02 08:44:03)
            Originator - Receive Timestamp:  0.000000000
            Originator - Transmit Timestamp: 3663362643.719085752 (2016/02/02 08:44:03)
08:44:03.717667 IP (tos 0xc0, ttl 64, id 10861, offset 0, flags [none], proto ICMP (1), length 104)
    192.168.1.100 > 192.168.1.130: ICMP host 192.168.1.100 unreachable - admin prohibited, length 84
    # ^ the problem: node0 rejects the request with ICMP "admin prohibited" instead of sending an NTP reply
        IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.130.42275 > 192.168.1.100.ntp: [udp sum ok] NTPv4, length 48
        Client, Leap indicator: clock unsynchronized (192), Stratum 0 (unspecified), poll 3 (8s), precision -6
        Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 0.000000000
          Receive Timestamp:    0.000000000
          Transmit Timestamp:   3663362643.719085752 (2016/02/02 08:44:03)
            Originator - Receive Timestamp:  0.000000000
......
```
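Meanwhile, `ntpdate -d node0` on node3 produced the log below. `Server dropped: no data` means no NTP reply ever came back: the two machines could not exchange NTP packets at all.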
```
[root@node3 tools]# ntpdate -d node0
 2 Feb 08:44:03 ntpdate[1306]: ntpdate 4.2.6p5@1.2349-o Mon Jan 25 14:27:35 UTC 2016 (1)
Looking for host node0 and service ntp
host found : node2
transmit(192.168.1.100)
transmit(192.168.1.100)
transmit(192.168.1.100)
transmit(192.168.1.100)
transmit(192.168.1.100)
192.168.1.100: Server dropped: no data
server 192.168.1.100, port 123
stratum 0, precision 0, leap 00, trust 000
refid [192.168.1.100], delay 0.00000, dispersion 64.00000
transmitted 4, in filter 4
reference time:      00000000.00000000  Mon, Jan  1 1900  8:05:43.000
originate timestamp: 00000000.00000000  Mon, Jan  1 1900  8:05:43.000
transmit timestamp:  da5a7a59.b8115280  Tue, Feb  2 2016  8:44:09.719
filter delay:  0.00000  0.00000  0.00000  0.00000
               0.00000  0.00000  0.00000  0.00000
filter offset: 0.000000 0.000000 0.000000 0.000000
               0.000000 0.000000 0.000000 0.000000
delay 0.00000, dispersion 64.00000
offset 0.000000

 2 Feb 08:44:11 ntpdate[1306]: no server suitable for synchronization found
```
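The request captured on node0 above is a 48-byte NTP header. The Python sketch below packs and unpacks that header with the standard `struct` module, following the layout in RFC 5905; the field values mirror the capture (leap indicator 3, version 4, client mode 3, poll 3, precision -6, root delay and dispersion 1.0). It does no networking, and the function names are illustrative. It also shows why tcpdump prints `(192)`: the leap indicator occupies the top two bits of the first byte, and 3 << 6 = 192.

```python
import struct

# RFC 5905 header (48 bytes): LI/VN/Mode, stratum, poll, precision,
# root delay, root dispersion, reference ID, then four 64-bit timestamps.
NTP_HEADER = struct.Struct("!BBbbIII4Q")


def build_client_request(transmit_ts: float, poll: int = 3, precision: int = -6) -> bytes:
    """Pack a client request shaped like the one in the capture."""
    leap, version, mode = 3, 4, 3          # unsynchronized, NTPv4, client
    first_byte = (leap << 6) | (version << 3) | mode
    one_second = 1 << 16                   # delay/dispersion are 16.16 fixed point
    xmit = int(transmit_ts * 2**32)        # timestamps are 32.32 fixed point
    return NTP_HEADER.pack(first_byte, 0, poll, precision,
                           one_second, one_second, 0,  # refid 0 = unspec
                           0, 0, 0, xmit)              # ref, orig, recv, xmit


def parse_header(packet: bytes) -> dict:
    """Unpack the fields that tcpdump -vvv prints for an NTP packet."""
    (first, stratum, poll, precision, root_delay, root_disp, _refid,
     _ref_ts, _orig_ts, _recv_ts, xmit_ts) = NTP_HEADER.unpack(packet[:48])
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": stratum,
        "poll": poll,
        "precision": precision,
        "root_delay": root_delay / 2**16,
        "root_dispersion": root_disp / 2**16,
        "transmit": xmit_ts / 2**32,
    }


if __name__ == "__main__":
    request = build_client_request(3663362643.719085752)
    print(len(request), hex(request[0]), parse_header(request))
```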
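During debugging I had already turned off iptables, but still could not find the cause. After a lot of googling I finally found a clue: CentOS 7 uses firewalld as its firewall, and the ICMP `admin prohibited` reply seen above is what a firewall REJECT rule sends back. (Instead of turning firewalld off entirely, allowing NTP through it with `firewall-cmd --permanent --add-service=ntp` followed by `firewall-cmd --reload` should also work.) Once I stopped firewalld as well (for example with `systemctl stop firewalld`, plus `systemctl disable firewalld` to keep it off after a reboot), that error disappeared, but a new kind of error showed up: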
```
[root@node1 tools]# ntpdate -d 192.168.1.100
 2 Feb 09:11:25 ntpdate[2120]: ntpdate 4.2.6p5@1.2349-o Mon Jan 25 14:27:35 UTC 2016 (1)
Looking for host 192.168.1.100 and service ntp
host found : node0
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
192.168.1.100: Server dropped: strata too high
server 192.168.1.100, port 123
stratum 16, precision -23, leap 11, trust 000
refid [192.168.1.100], delay 0.02582, dispersion 0.00000
transmitted 4, in filter 4
reference time:      00000000.00000000  Mon, Jan  1 1900  8:05:43.000
originate timestamp: da5a80c3.f225c875  Tue, Feb  2 2016  9:11:31.945
transmit timestamp:  da5a80c3.fef87c8f  Tue, Feb  2 2016  9:11:31.995
filter delay:  0.02585  0.02583  0.02582  0.02583
               0.00000  0.00000  0.00000  0.00000
filter offset: -0.05033 -0.05031 -0.05030 -0.05030
               0.000000 0.000000 0.000000 0.000000
delay 0.02582, dispersion 0.00000
offset -0.050306

 2 Feb 09:11:31 ntpdate[2120]: no server suitable for synchronization found
```
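The key line is `stratum 16, precision -23, leap 11`. In NTP, stratum 16 means "unsynchronized", and a leap indicator of binary 11 (3) is the alarm state meaning the server's own clock is not synchronized, so a client will not use it. The Python sketch below encodes those rules; it is based on the field meanings in RFC 5905, not on ntpdate's source code, which may apply further checks, and the function name is hypothetical.

```python
from typing import Tuple


def server_usable(leap: int, stratum: int) -> Tuple[bool, str]:
    """Rough check of a server reply using RFC 5905 field meanings."""
    if stratum == 0:
        return False, "stratum 0: unspecified or invalid"
    if stratum >= 16:
        return False, "stratum 16: server is unsynchronized"
    if leap == 3:
        return False, "leap indicator 3 (binary 11): server clock not synchronized"
    return True, f"ok: server at stratum {stratum}"


if __name__ == "__main__":
    print(server_usable(leap=3, stratum=16))  # node0 pointing at its own IP
    print(server_usable(leap=0, stratum=6))   # node0 using the local clock driver
```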
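I kept searching and eventually found the cause. A standalone time server (one without an upstream source) must not list its own NIC address in its `server` line: ntpd cannot synchronize to itself, so it stays unsynchronized (stratum 16, leap 11, as in the output above). Instead, it should use the local clock driver. Its address, 127.127.1.1, is not a real loopback IP but a pseudo-address of the form 127.127.t.u that selects reference clock driver type t, unit u; type 1 is the undisciplined local clock.
Concretely, I changed `server 192.168.1.100 iburst` to `server 127.127.1.1 iburst`,
restarted ntpd, and the test passed: node1 now synchronizes to node0: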
```
[root@node1 tools]# ntpdate -d 192.168.1.100
 2 Feb 09:19:15 ntpdate[2190]: ntpdate 4.2.6p5@1.2349-o Mon Jan 25 14:27:35 UTC 2016 (1)
Looking for host 192.168.1.100 and service ntp
host found : node0
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
server 192.168.1.100, port 123
stratum 6, precision -23, leap 00, trust 000
refid [192.168.1.100], delay 0.02580, dispersion 0.00000
transmitted 4, in filter 4
reference time:      da5a828d.a102d9f2  Tue, Feb  2 2016  9:19:09.628
originate timestamp: da5a8299.e4141d4c  Tue, Feb  2 2016  9:19:21.890
transmit timestamp:  da5a8299.f02c1325  Tue, Feb  2 2016  9:19:21.938
filter delay:  0.02583  0.02583  0.02580  0.02583
               0.00000  0.00000  0.00000  0.00000
filter offset: -0.04751 -0.04749 -0.04747 -0.04745
               0.000000 0.000000 0.000000 0.000000
delay 0.02580, dispersion 0.00000
offset -0.047470

 2 Feb 09:19:21 ntpdate[2190]: adjust time server 192.168.1.100 offset -0.047470 sec
```
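ntpdate's `offset` and `delay` come from the four timestamps of each request/reply exchange. With T1 = client transmit, T2 = server receive, T3 = server transmit and T4 = client receive, the standard formulas (RFC 5905) are offset = ((T2 - T1) + (T3 - T4)) / 2 and delay = (T4 - T1) - (T3 - T2). A negative offset, as here, means the client clock is ahead of the server. The Python sketch below applies them; the sample timestamps are made up for illustration and are not taken from the capture.

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> tuple:
    """Offset and round-trip delay of one NTP exchange (RFC 5905), in seconds.

    t1: client transmit, t2: server receive, t3: server transmit,
    t4: client receive; t1 and t4 use the client clock, t2 and t3 the server's.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)         # round trip minus server hold time
    return offset, delay


if __name__ == "__main__":
    # Hypothetical exchange: client clock 50 ms ahead, 10 ms each way,
    # 1 ms spent inside the server.
    offset, delay = ntp_offset_delay(100.050, 100.010, 100.011, 100.071)
    print(f"offset {offset:+.3f} s, delay {delay:.3f} s")
```

Averaging the two one-way differences cancels the network delay as long as the path is symmetric, which is why the result recovers the 50 ms error exactly in this example.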
Meanwhile, tcpdump on node0 (capturing node1's traffic, 192.168.1.110) showed the following:
```
[root@node0 etc]# tcpdump -vvv -i em1 host 192.168.1.110 -n
tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes
09:19:15.890792 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.110.42130 > 192.168.1.100.ntp: [udp sum ok] NTPv4, length 48
        Client, Leap indicator: clock unsynchronized (192), Stratum 0 (unspecified), poll 3 (8s), precision -6
        Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 0.000000000
          Receive Timestamp:    0.000000000
          Transmit Timestamp:   3663364755.938188076 (2016/02/02 09:19:15)
            Originator - Receive Timestamp:  0.000000000
            Originator - Transmit Timestamp: 3663364755.938188076 (2016/02/02 09:19:15)
09:19:15.890922 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.100.ntp > 192.168.1.110.42130: [udp sum ok] NTPv4, length 48
        Server, Leap indicator: (0), Stratum 6 (secondary reference), poll 3 (8s), precision -23
        Root Delay: 0.000000, Root dispersion: 7.947586, Reference-ID: 127.127.1.1
          Reference Timestamp:  3663364749.628949761 (2016/02/02 09:19:09)
          Originator Timestamp: 3663364755.938188076 (2016/02/02 09:19:15)
          Receive Timestamp:    3663364755.890792727 (2016/02/02 09:19:15)
          Transmit Timestamp:   3663364755.890906155 (2016/02/02 09:19:15)
            Originator - Receive Timestamp:  -0.047395322
            Originator - Transmit Timestamp: -0.047281891
09:19:17.890788 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.110.42130 > 192.168.1.100.ntp: [udp sum ok] NTPv4, length 48
        Client, Leap indicator: clock unsynchronized (192), Stratum 0 (unspecified), poll 3 (8s), precision -6
        Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 0.000000000
          Receive Timestamp:    0.000000000
          Transmit Timestamp:   3663364757.938173294 (2016/02/02 09:19:17)
            Originator - Receive Timestamp:  0.000000000
            Originator - Transmit Timestamp: 3663364757.938173294 (2016/02/02 09:19:17)
09:19:17.890901 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.100.ntp > 192.168.1.110.42130: [udp sum ok] NTPv4, length 48
        Server, Leap indicator: (0), Stratum 6 (secondary reference), poll 3 (8s), precision -23
        Root Delay: 0.000000, Root dispersion: 7.947616, Reference-ID: 127.127.1.1
          Reference Timestamp:  3663364749.628949761 (2016/02/02 09:19:09)
          Originator Timestamp: 3663364757.938173294 (2016/02/02 09:19:17)
          Receive Timestamp:    3663364757.890788793 (2016/02/02 09:19:17)
          Transmit Timestamp:   3663364757.890883028 (2016/02/02 09:19:17)
            Originator - Receive Timestamp:  -0.047384545
            Originator - Transmit Timestamp: -0.047290302
09:19:19.890806 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.110.42130 > 192.168.1.100.ntp: [udp sum ok] NTPv4, length 48
        Client, Leap indicator: clock unsynchronized (192), Stratum 0 (unspecified), poll 3 (8s), precision -6
        Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 0.000000000
          Receive Timestamp:    0.000000000
          Transmit Timestamp:   3663364759.938178777 (2016/02/02 09:19:19)
            Originator - Receive Timestamp:  0.000000000
            Originator - Transmit Timestamp: 3663364759.938178777 (2016/02/02 09:19:19)
09:19:19.890935 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.100.ntp > 192.168.1.110.42130: [udp sum ok] NTPv4, length 48
        Server, Leap indicator: (0), Stratum 6 (secondary reference), poll 3 (8s), precision -23
        Root Delay: 0.000000, Root dispersion: 7.947647, Reference-ID: 127.127.1.1
          Reference Timestamp:  3663364749.628949761 (2016/02/02 09:19:09)
          Originator Timestamp: 3663364759.938178777 (2016/02/02 09:19:19)
          Receive Timestamp:    3663364759.890806376 (2016/02/02 09:19:19)
          Transmit Timestamp:   3663364759.890924751 (2016/02/02 09:19:19)
            Originator - Receive Timestamp:  -0.047372374
            Originator - Transmit Timestamp: -0.047254003
09:19:20.895650 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.100 tell 192.168.1.110, length 46
09:19:20.895665 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.100 is-at 18:03:73:f0:c3:98, length 28
09:19:21.890822 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.110.42130 > 192.168.1.100.ntp: [udp sum ok] NTPv4, length 48
        Client, Leap indicator: clock unsynchronized (192), Stratum 0 (unspecified), poll 3 (8s), precision -6
        Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 0.000000000
          Receive Timestamp:    0.000000000
          Transmit Timestamp:   3663364761.938172519 (2016/02/02 09:19:21)
            Originator - Receive Timestamp:  0.000000000
            Originator - Transmit Timestamp: 3663364761.938172519 (2016/02/02 09:19:21)
09:19:21.890948 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.100.ntp > 192.168.1.110.42130: [udp sum ok] NTPv4, length 48
        Server, Leap indicator: (0), Stratum 6 (secondary reference), poll 3 (8s), precision -23
        Root Delay: 0.000000, Root dispersion: 7.947677, Reference-ID: 127.127.1.1
          Reference Timestamp:  3663364749.628949761 (2016/02/02 09:19:09)
          Originator Timestamp: 3663364761.938172519 (2016/02/02 09:19:21)
          Receive Timestamp:    3663364761.890823006 (2016/02/02 09:19:21)
          Transmit Timestamp:   3663364761.890931904 (2016/02/02 09:19:21)
            Originator - Receive Timestamp:  -0.047349534
            Originator - Transmit Timestamp: -0.047240607
```
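tcpdump prints NTP timestamps as seconds since 1900-01-01 00:00 UTC (the NTP epoch) and shows them in local time, which is UTC+8 on these machines. The Python sketch below converts one of the captured values; 2208988800 is the number of seconds between the NTP epoch and the Unix epoch (1970-01-01). The fixed UTC+8 zone is an assumption matching this capture, and the helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone

NTP_UNIX_EPOCH_DELTA = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_datetime(ntp_seconds: float, tz: timezone = timezone.utc) -> datetime:
    """Convert an NTP timestamp (seconds since 1900) to an aware datetime."""
    unix_seconds = ntp_seconds - NTP_UNIX_EPOCH_DELTA
    return datetime.fromtimestamp(unix_seconds, tz)


if __name__ == "__main__":
    cst = timezone(timedelta(hours=8))  # assumed local zone of the lab machines
    print(ntp_to_datetime(3663364755.938188076, cst))  # first client transmit
    print(ntp_to_datetime(3663364755.938188076))       # same instant in UTC
```

The first line matches tcpdump's `(2016/02/02 09:19:15)`; the second shows the same instant as 01:19:15 UTC.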
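Finally, to control the stratum that node0 advertises, I added a `fudge 127.127.1.0 stratum 10` line. Be aware that `fudge` only applies to a reference clock configured with exactly the same address: paired with `server 127.127.1.1`, this fudge has no effect, which is consistent with node0 still reporting stratum 6 in the last ntpdate output below. Use the same address on both lines (conventionally 127.127.1.0); node0 then serves stratum 11. The final server-side ntp.conf: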
```
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

#server 127.127.1.1 iburst
#fudge 127.127.1.0 stratum 10
#multicastclient
#broadcastdelay 0.008
#authenticate no

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
# server and fudge must name the same refclock address (127.127.1.0 = local
# clock driver, unit 0); with server 127.127.1.1 the fudge has no effect.
server 127.127.1.0 iburst
fudge 127.127.1.0 stratum 10

#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
```
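Because a `fudge` line has no effect when its address does not match a configured reference clock, it can be worth checking an ntp.conf for this mismatch. The Python sketch below is a hypothetical helper, not part of the ntp distribution: it collects `server 127.127.t.u` reference clock addresses and reports, for each `fudge` line, either a mismatch or the stratum the host would then serve (the fudged refclock stratum plus one). It only understands the simple one-directive-per-line syntax used above.

```python
import re

# Reference clocks use pseudo-addresses 127.127.<driver type>.<unit>.
REFCLOCK_ADDR = re.compile(r"^127\.127\.\d+\.\d+$")


def review_fudge_lines(conf_text: str) -> list:
    """Report whether each fudge line matches a configured refclock server line."""
    # Strip comments and split every line into words.
    lines = [raw.split("#", 1)[0].split() for raw in conf_text.splitlines()]
    refclocks = {w[1] for w in lines
                 if len(w) >= 2 and w[0] == "server" and REFCLOCK_ADDR.match(w[1])}
    messages = []
    for words in lines:
        if len(words) < 2 or words[0] != "fudge":
            continue
        addr = words[1]
        if addr not in refclocks:
            messages.append(f"mismatch: fudge {addr} has no matching server line "
                            f"(refclocks: {', '.join(sorted(refclocks)) or 'none'})")
        elif "stratum" in words[2:-1]:
            stratum = int(words[words.index("stratum") + 1])
            # A server synchronized to this refclock advertises its stratum + 1.
            messages.append(f"ok: fudge {addr} stratum {stratum} "
                            f"-> this host serves stratum {stratum + 1}")
        else:
            messages.append(f"ok: fudge {addr} (no stratum override)")
    return messages


if __name__ == "__main__":
    print(review_fudge_lines("server 127.127.1.1 iburst\nfudge 127.127.1.0 stratum 10\n"))
    print(review_fudge_lines("server 127.127.1.0 iburst\nfudge 127.127.1.0 stratum 10\n"))
```

The first call (`server 127.127.1.1` with `fudge 127.127.1.0`) is reported as a mismatch; the second reports that the host would serve stratum 11.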
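Finally, a note on large time differences. At one point in my tests the clocks were a whole year apart (a difference of a few hours is enough to see the same thing). ntpdate then reports a huge offset, here 31566952.6 s or roughly 365.4 days; this is not an error. Note, however, that ntpdate does not correct such an offset gradually: the last line says `step time server`, meaning the clock was set in a single jump. Gradual correction (slewing, reported as `adjust time server`, as in the -0.047 s run above) happens only when the offset is below 0.5 s. If a sudden jump could disturb running applications, ntpd's `-x` option raises its step threshold from 128 ms to 600 s so that offsets up to that size are slewed instead.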
```
[root@node3 tools]# ntpdate -d node0
 2 Feb 01:05:38 ntpdate[3091]: ntpdate 4.2.6p5@1.2349-o Mon Jan 25 14:27:35 UTC 2016 (1)
Looking for host node0 and service ntp
host found : node0
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
server 192.168.1.100, port 123
stratum 6, precision -23, leap 00, trust 000
refid [192.168.1.100], delay 0.02582, dispersion 0.00000
transmitted 4, in filter 4
reference time:      da5a87a5.da45128d  Tue, Feb  2 2016  9:40:53.852
originate timestamp: da5a87d1.6c4aead5  Tue, Feb  2 2016  9:41:37.423
transmit timestamp:  d878db68.d01598ae  Mon, Feb  2 2015  1:05:44.812
filter delay:  0.02591  0.02585  0.02582  0.02583
               0.00000  0.00000  0.00000  0.00000
filter offset: 31566952 31566952 31566952 31566952
               0.000000 0.000000 0.000000 0.000000
delay 0.02582, dispersion 0.00000
offset 31566952.609973

 2 Feb 01:05:44 ntpdate[3091]: step time server 192.168.1.100 offset 31566952.609973 sec
```
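In `ntpdate -d` output, timestamps are printed as two hexadecimal 32-bit words: seconds since 1900 and a binary fraction of a second. The Python sketch below decodes the two timestamps from the run above, one from node0's clock (2016) and one from node3's (2015), and applies the rule documented in the ntpdate man page: step the clock when the error exceeds 0.5 s, slew it otherwise. The helper names are illustrative; this is not ntpdate's code.

```python
def parse_ntpdate_hex(stamp: str) -> float:
    """Turn 'da5a87d1.6c4aead5' (hex seconds.fraction) into NTP seconds."""
    seconds_hex, fraction_hex = stamp.split(".")
    return int(seconds_hex, 16) + int(fraction_hex, 16) / 2**32


def ntpdate_action(offset: float) -> str:
    """ntpdate steps the clock above 0.5 s of error and slews it below."""
    return "step" if abs(offset) > 0.5 else "slew"


if __name__ == "__main__":
    server = parse_ntpdate_hex("da5a87d1.6c4aead5")  # node0's clock, 2016
    client = parse_ntpdate_hex("d878db68.d01598ae")  # node3's clock, 2015
    diff = server - client
    print(f"{diff:.3f} s = {diff / 86400:.2f} days -> {ntpdate_action(diff)}")
    print(f"-0.047470 s -> {ntpdate_action(-0.047470)}")
```

The difference comes out at about 31566952.61 s (about 365.36 days), close to the offset ntpdate reported, and is classified as a step, while the -0.047470 s offset from the earlier successful run is classified as a slew.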