• C++ optimization notes: -O2/-O3/-ffast-math/SIMD


    1. References

    gcc optimization options: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
    Optimizing C++: https://pashminacameron.github.io/cpp/
    What the gcc/g++ optimization flags -O1 -O2 -O3 -Os -Ofast -Og do: https://blog.csdn.net/liang_baikai/article/details/110137374
    The floating-point optimization option -ffast-math, which can greatly speed up floating-point arithmetic: https://www.cnblogs.com/sky-heaven/p/6742610.html

    2. Some questions

    Would it make sense to enable ffast-math for simd types? https://github.com/hsivonen/simd/issues/19
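    -ffast-math is very useful for speeding up floating-point operations, in particular because it makes vectorization easier. The issue author reports a run-time reduction of about 30% for matrix multiplication in clang with -ffast-math, in a benchmark referenced in the issue.
    As mentioned in the Rust issue, intrinsics already allow part of this, and wrapper types for f32/f64 can already be implemented. Since the SIMD types are already aimed at vectorization and the cost of wrapping/unwrapping is already there, would it make sense to simply enable -ffast-math for them anyway? Or, if there are cases where that does not make sense, would it be useful to duplicate all the types into slow and fast versions for convenience?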

    Pre-RFC: What’s the best way to implement -ffast-math? https://internals.rust-lang.org/t/pre-rfc-whats-the-best-way-to-implement-ffast-math/5740
    gcc, simd intrinsics and fast-math concepts https://stackoverflow.com/questions/4966489/gcc-simd-intrinsics-and-fast-math-concepts
    What does gcc's ffast-math actually do? https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do?noredirect=1&lq=1
    Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? https://stackoverflow.com/questions/6430448/why-doesnt-gcc-optimize-aaaaaa-to-aaaaaa
    What kind of optimizations are included in -funsafe-math-optimizations? https://stackoverflow.com/questions/28134064/what-kind-of-optimizations-are-included-in-funsafe-math-optimizations
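
    The vectorization point in the first question, and the reassociation point in the a*a*a*a*a*a question, both come down to floating-point addition and multiplication not being associative. A minimal sketch in plain C++ (the function names are made up for illustration): the first function sums strictly left to right, the second uses two independent accumulators, which is the kind of reordering a vectorizer needs permission to do.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right sum. Every addition depends on the previous one,
// so without permission to reassociate the compiler must keep this order.
double sum_sequential(const std::vector<double>& v) {
    double acc = 0.0;
    for (double x : v) acc += x;
    return acc;
}

// Hand-reassociated sum with two independent accumulators ("lanes").
// Even-indexed elements go to lane0, odd-indexed elements to lane1.
// This mimics, in scalar code, what a 2-wide SIMD reduction computes.
double sum_two_lanes(const std::vector<double>& v) {
    double lane0 = 0.0;
    double lane1 = 0.0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        lane0 += v[i];
        lane1 += v[i + 1];
    }
    if (i < v.size()) lane0 += v[i];  // leftover element for odd sizes
    return lane0 + lane1;
}
```

    For the input {1e20, 1.0, -1e20, 1.0}, compiled without any fast-math flags, the sequential sum is 1.0 (the first 1.0 is absorbed by 1e20) while the two-lane sum is 2.0. Neither is "wrong"; they are different roundings of the same mathematical sum, which is why the compiler will not switch between them on its own.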

    3. Notes on Optimizing C++:

    Compiler options can make quite a difference in the speed (as well as size and behaviour) of the code.
    -O2 is the highest level of optimization you can request without sacrificing safety of the code.
    Going from -O2 to -O3 shows very little gain in speed, but adding -ffast-math (for example through -Ofast, which turns on both -O3 and -ffast-math) does improve the speed noticeably.
    However, this comes at a cost.

    -ffast-math essentially turns on unsafe math optimizations and the changes due to this compiler option can propagate to the code that may link against your code in future (see Note of -ffast-math in References).
    While this option does make your code faster, it is very important that you understand the implications of turning it on and if possible, mitigate against it.
    A safer option that gives similar performance is to use either function-specific optimization (see Selective optimizations) or write some intrinsics or assembly to optimize just the bottlenecks rather than letting the compiler wreak havoc on all of your code (and your downstream dependencies).
    The graph in the original article shows that AVX code performs better than the -ffast-math code and is also safer.
    This is definitely a case in which the effort of writing SIMD intrinsics is worth it.
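
    The article's own AVX code and graph are not reproduced in these notes. As a rough illustration of what "writing SIMD intrinsics for just the bottleneck" can look like, here is a hedged sketch of a float sum using AVX intrinsics from <immintrin.h>. The function name sum_floats is hypothetical, and the code falls back to a scalar loop when the translation unit is not compiled with AVX enabled (for example without -mavx).

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Sums n floats. With AVX enabled, eight partial sums are kept in one
// 256-bit register; the reordering is explicit and local to this function,
// so the rest of the program keeps strict floating-point semantics.
float sum_floats(const float* data, std::size_t n) {
    std::size_t i = 0;
    float total = 0.0f;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();  // eight partial sums, all zero
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));  // unaligned load
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);      // spill the partial sums
    for (float lane : lanes) total += lane;
#endif
    // Scalar tail (or the whole array when AVX is not enabled).
    for (; i < n; ++i) total += data[i];
    return total;
}
```

    The design point is that the programmer, not a global compiler flag, decides where the summation order may change. The result can still differ in the last bits from a strict left-to-right sum, but only inside this one function.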

    4. How the gcc optimization options relate

    -Ofast = -O3 + -ffast-math + -fallow-store-data-races

    -ffast-math
    Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast

    -funsafe-math-optimizations
    Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.
    Of these, -fassociative-math only takes effect when -fno-signed-zeros and -fno-trapping-math are also in effect.
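
    Of the four sub-options, -freciprocal-math is the easiest to show in isolation: it lets the compiler replace x / y with x * (1 / y). A small sketch (hypothetical function names) that writes both forms out by hand, so the difference is visible without changing any flags:

```cpp
// Division exactly as written in the source.
double divide(double x, double y) {
    return x / y;
}

// The rewritten form that -freciprocal-math permits: multiply by the
// reciprocal. 1.0 / y is rounded once, and the product is rounded again,
// so the result can differ from x / y in the last bit.
double divide_via_reciprocal(double x, double y) {
    return x * (1.0 / y);
}
```

    With IEEE double arithmetic and no fast-math flags, divide(49.0, 49.0) is exactly 1.0, while divide_via_reciprocal(49.0, 49.0) is slightly less than 1.0, because 1.0 / 49.0 is not exactly representable.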

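    Explanation
    -funsafe-math-optimizations
    Allows optimizations for floating-point arithmetic that
    (a) assume that arguments and results are valid, and
    (b) may violate IEEE or ANSI standards.
    When used at link time, it may include libraries or startup files that change the default FPU control word or apply other similar optimizations.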
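
    The link-time remark is the part that can leak into code you did not compile with the flag. On x86 GCC it is commonly described as a startup object (crtfastmath.o) that switches on flush-to-zero and denormals-are-zero; treat that detail as an assumption about the toolchain rather than something these notes establish. A hedged sketch of a runtime probe (hypothetical function name) that checks whether subnormal results still survive in the current process:

```cpp
#include <cmath>
#include <limits>

// Returns true when halving the smallest normal double produces a nonzero
// subnormal value, i.e. gradual underflow is still in effect.
// The volatile variables force the division to happen at run time, under
// whatever FPU control settings the process actually started with.
bool subnormals_preserved() {
    volatile double smallest_normal = std::numeric_limits<double>::min();
    volatile double halved = smallest_normal / 2.0;
    const double h = halved;  // plain copy for classification
    return h != 0.0 && std::fpclassify(h) == FP_SUBNORMAL;
}
```

    In a program built and linked without fast-math options on a typical IEEE platform this should return true. If it returns false, something in the link has changed the floating-point mode.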

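    It in turn enables several options: -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.
    The options that affect run time are:
    -fno-signed-zeros, -fno-trapping-math, -fassociative-math
    Testing showed that none of the three can be left out, mainly because -fassociative-math depends on -fno-signed-zeros and -fno-trapping-math.
    Explanation: -fassociative-math
    Allows re-association of operands in a series of floating-point operations. This may change the result of a computation and so violates the ISO C and C++ language standards. Reordering may change the sign of zero (IEEE arithmetic specifies distinct behavior for +0.0 and -0.0, which prohibits simplifying expressions such as x+0.0 or 0.0*x), ignore NaNs, and inhibit or create underflow or overflow, so it cannot be used on code that relies on rounding behavior such as (x + 2**52) - 2**52. It may also reorder floating-point comparisons, so it should not be used when ordered comparisons are required.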
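
    Two of the behaviors mentioned in this explanation can be checked directly. The sketch below (hypothetical function names) shows why x+0.0 cannot be simplified to x under IEEE rules, and what the (x + 2**52) - 2**52 idiom does when the compiler leaves it alone.

```cpp
#include <cmath>

// Under IEEE round-to-nearest, (-0.0) + 0.0 is +0.0, so "x + 0.0" does not
// always have the same sign bit as x. Returns whether the sign survived.
bool add_zero_keeps_sign(double x) {
    return std::signbit(x + 0.0) == std::signbit(x);
}

// Rounding idiom for 0 <= x < 2^52: doubles in [2^52, 2^53) are spaced
// exactly 1 apart, so adding 2^52 rounds x to an integer (ties to even),
// and subtracting 2^52 recovers that integer. If the compiler is allowed
// to reassociate, it may simplify the whole expression to x.
double round_via_magic(double x) {
    const double magic = 4503599627370496.0;  // 2^52
    return (x + magic) - magic;
}
```

    Compiled without fast-math flags, add_zero_keeps_sign(-0.0) is false, round_via_magic(2.5) is 2.0 and round_via_magic(3.5) is 4.0. With -fno-signed-zeros or -fassociative-math in effect these results are no longer guaranteed.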

    The official documentation says of -fno-trapping-math: This option should never be turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
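
    What -fno-trapping-math gives up is the guarantee that floating-point exceptions (division by zero, overflow, invalid operation and so on) stay observable. A hedged sketch using the standard <cfenv> interface (the function name is hypothetical): it performs a division by zero at run time and then checks the exception flag. Strictly speaking, portable code should also enable FENV_ACCESS, which not every compiler supports; the volatile variables are used here as a practical stand-in, and that is an assumption about compiler behavior.

```cpp
#include <cfenv>

// Returns true if a run-time division by zero raised FE_DIVBYZERO.
// volatile keeps the compiler from folding the division at compile time
// or moving it across the flag-handling calls.
bool division_by_zero_sets_flag() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double one = 1.0;
    volatile double zero = 0.0;
    volatile double result = one / zero;  // +infinity, raises the flag
    (void)result;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

    Code that inspects flags like this, or that installs trap handlers, is the kind of code the documentation warning is about: once the compiler assumes traps cannot be observed, it is free to reorder or remove the operations that would have raised them.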

  • Original article: https://www.cnblogs.com/gnivor/p/15220558.html