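On the accuracy of DPDK timers when scheduling crosses CPU cores
First, the DPDK timer interface compares time in CPU cycles. According to the earlier post
[dpdk] dpdk --lcores parameter
when one EAL thread is mapped to multiple processors, the CPU cycle count may be read on different CPU cores,
and because the cycle count is obtained with the rdtsc instruction, the values read may not be consistent with each other.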
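To make the concern concrete, here is a minimal sketch in plain C (not DPDK code) of a cycle-based expiry check. The names `fake_core`, `read_cycles`, and `timer_expired` are hypothetical, and the per-core counter offset is an assumed value that only simulates unsynchronized TSCs.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of a core whose cycle counter is shifted by a fixed
 * offset relative to a shared reference clock (simulated skew). */
struct fake_core {
    int64_t tsc_offset;   /* cycles this core's counter is ahead (+) or behind (-) */
};

/* Counter value this core would report at reference time `ref_cycles`. */
static inline uint64_t read_cycles(const struct fake_core *c, uint64_t ref_cycles)
{
    return ref_cycles + (uint64_t)c->tsc_offset;
}

/* Cycle-based expiry test: the timer is due once now >= expire. */
static inline bool timer_expired(uint64_t now, uint64_t expire)
{
    return (int64_t)(now - expire) >= 0;
}
```

If a timer is armed on a core with offset 0 at reference time 1000 for 1000 cycles, a check made at reference time 1500 on a core whose counter runs 500 cycles ahead already reports expiry, i.e. the timer fires 500 cycles early.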
First, look into the rdtsc instruction:
https://stackoverflow.com/questions/3388134/rdtsc-accuracy-across-cpu-cores?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
Invariant TSC
X86_FEATURE_CONSTANT_TSC + X86_FEATURE_NONSTOP_TSC
"16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for
invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving
forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or
HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or
access to a platform resource."
[root@D128 ~]# cat /proc/cpuinfo |grep tsc
constant_tsc nonstop_tsc
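The flag that /proc/cpuinfo reports can also be queried from CPUID.80000007H:EDX[8] directly. The following is a minimal sketch assuming GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`; the function names `edx_has_invariant_tsc` and `cpu_has_invariant_tsc` are hypothetical.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* Bit 8 of EDX from CPUID leaf 0x80000007 is the invariant-TSC flag. */
static inline int edx_has_invariant_tsc(uint32_t edx)
{
    return (int)((edx >> 8) & 1u);
}

/* Returns 1 if invariant TSC is reported, 0 if not, -1 if it cannot be queried. */
static inline int cpu_has_invariant_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_has_invariant_tsc(edx);
#else
    return -1;   /* not an x86 build */
#endif
}
```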
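Invariant TSC only guarantees that the TSC stays accurate on a single core when that core changes frequency or is halted; it does not guarantee that the TSCs of different CPU cores are synchronized.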
https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/388964
Hello Samuel, The 'Invariant TSC' means that the TSC runs at a fixed frequency and doesn't stop when the cpu halts. The TSCs are not guaranteed to be synchronized although the OS usually does try to synchronize the TSC at boot time. This is one reason
for the rdtscp instruction. On Nehalem and later cpus, the rdtscp instruction returns the TSC and an identifier indicating on which cpu
you read the TSC. RDTSCP is a serializing instruction... unlike the regular rdtsc instruction. Pat
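A minimal sketch of how rdtscp can be used to reject a measurement that straddles two CPUs, assuming GCC or Clang on x86 (`__rdtscp` from `<x86intrin.h>`). The names `tsc_sample`, `tsc_sample_now`, and `tsc_delta_same_cpu` are hypothetical. The auxiliary value is whatever the OS stored in IA32_TSC_AUX; on Linux it normally encodes the CPU number, but that is an OS convention, so the sketch only compares it for equality.

```c
#include <stdint.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

struct tsc_sample {
    uint64_t tsc;   /* time stamp counter value */
    uint32_t aux;   /* IA32_TSC_AUX as returned by RDTSCP */
};

#if defined(__x86_64__) || defined(__i386__)
/* Read the TSC together with the CPU identifier in one instruction. */
static inline struct tsc_sample tsc_sample_now(void)
{
    struct tsc_sample s;
    unsigned int aux = 0;
    s.tsc = __rdtscp(&aux);
    s.aux = aux;
    return s;
}
#endif

/* Store end - start in *delta only if both samples carry the same CPU
 * identifier and the counter did not go backwards; otherwise fail. */
static inline bool tsc_delta_same_cpu(struct tsc_sample start,
                                      struct tsc_sample end,
                                      uint64_t *delta)
{
    if (start.aux != end.aux || end.tsc < start.tsc)
        return false;
    *delta = end.tsc - start.tsc;
    return true;
}
```

A caller would take one sample before and one after the measured region and simply retry when `tsc_delta_same_cpu` returns false.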
HPET
https://en.wikipedia.org/wiki/High_Precision_Event_Timer
An HPET chip consists of a 64-bit up-counter (main counter) counting at a frequency of at least 10 MHz,
and a set of (at least three, up to 256) comparators. These comparators are 32- or 64-bit-wide. The HPET
is programmed via a memory mapped I/O window that is discoverable via Advanced Configuration and Power
Interface (ACPI). The HPET circuit in modern PCs is integrated into the southbridge chip.
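HPET is a chip-wide (global) counter. Its main counter runs at 10 MHz or faster, so one tick is at most 100 ns. It is usually integrated into the southbridge.
HPET provides at least 3 and at most 256 independent comparators.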
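The frequency bound translates directly into a tick period: 10 MHz corresponds to 100 ns per tick. The HPET hardware reports its counter period in femtoseconds, so a tick count can be converted to nanoseconds as in the sketch below. The helper name `hpet_ticks_to_ns` and the macro `HPET_MAX_PERIOD_FS` are hypothetical, and the period is assumed to have been read from the device already.

```c
#include <stdint.h>

/* 100 ns expressed in femtoseconds: the longest tick a >= 10 MHz counter can have. */
#define HPET_MAX_PERIOD_FS 100000000ULL

/* Convert a main-counter tick count to nanoseconds, given the tick period in
 * femtoseconds. Splitting the tick count keeps intermediates within 64 bits. */
static inline uint64_t hpet_ticks_to_ns(uint64_t ticks, uint64_t period_fs)
{
    const uint64_t fs_per_ns = 1000000ULL;
    uint64_t whole = ticks / fs_per_ns;   /* groups of 1e6 ticks */
    uint64_t rest  = ticks % fs_per_ns;   /* remaining ticks, < 1e6 */
    return whole * period_fs + (rest * period_fs) / fs_per_ns;
}
```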
The Linux kernel can also use HPET as its clock source. The documentation of Red Hat MRG version 2 states that TSC is the preferred
clock source due to its much lower overhead, but it uses HPET as a fallback. A benchmark in that environment for 10 million event
counts found that TSC took about 0.6 seconds, HPET took slightly over 12 seconds, and ACPI Power Management Timer took around 24 seconds.
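Despite its high precision, HPET carries a performance cost, so the Linux kernel still prefers TSC as its first-choice clock source, with HPET as the fallback.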
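A rough way to see this cost on a given machine is to time many back-to-back clock reads. The sketch below uses POSIX `clock_gettime(CLOCK_MONOTONIC)` and therefore measures whichever clock source the kernel is currently using; it is not the benchmark quoted above, and the helper name `avg_clock_read_ns` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Average cost in nanoseconds of one clock_gettime(CLOCK_MONOTONIC) call,
 * measured over `iters` back-to-back calls. Returns -1.0 on error. */
static inline double avg_clock_read_ns(long iters)
{
    struct timespec start, end, tmp;

    if (iters <= 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (long i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &tmp);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
                   + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)iters;
}
```

Running it once per clock source (after switching current_clocksource) gives a per-call figure that can be compared between TSC, HPET, and acpi_pm.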
Check whether HPET is enabled:
[root@D129 cli]# grep hpet /proc/timer_list
Clock Event Device: hpet
 set_next_event: hpet_legacy_next_event
 set_mode: hpet_legacy_set_mode
[root@D129 haha-walawala]# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
kvm-clock hpet acpi_pm
[root@D129 haha-walawala]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock
[root@D129 haha-walawala]# ll /dev/hpet
crw-------. 1 root root 10, 228 May 3 16:23 /dev/hpet
[root@D129 haha-walawala]#
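The same sysfs files can be read from a program, for example to warn at startup when hpet is not among the available clock sources. This is a minimal sketch using only the C standard library; the helper names `read_first_line` and `list_has_word` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs-style text file into buf, trimming the
 * trailing newline. Returns 0 on success, -1 if it cannot be read. */
static inline int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (len == 0 || fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if `name` appears as a whole space-separated word in `list`. */
static inline int list_has_word(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    if (n == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return 1;
        p += 1;   /* keep searching after this partial match */
    }
    return 0;
}
```

For the session above, reading available_clocksource would yield "kvm-clock hpet acpi_pm", and `list_has_word(buf, "hpet")` would return 1.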
How to enable it in DPDK:
https://dpdk.org/doc/guides/linux_gsg/enable_func.html#high-precision-event-timer-hpet-functionality
rdtscp
ACPI
Omitted.
Event Timer Adapter Library
https://dpdk.org/doc/guides/prog_guide/event_timer_adapter.html#id1
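After reading the documents above, read the code to settle two questions:
1. When is RDTSC called?
2. What hardware backs the Event Timer?
There is no official Event Timer example, so look at how the Event Device library is used: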
http://dpdk.org/doc/guides/prog_guide/eventdev.html
Further reading:
https://www.ibm.com/developerworks/cn/linux/l-cn-timerm/