Analyzing the Call Flow of HAL Functions in OpenCV

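OpenCV contains a number of so-called HAL (Hardware Acceleration Layer) implementations. Despite the name, they are not necessarily tied to specific hardware; it is simplest to think of them as faster versions of the regular OpenCV (OCV) implementation. The goal of this article is to find where these implementations live, or how they are hooked in, and to trace the complete call chain. We use two functions, resize and GaussianBlur, as examples.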

resize

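First, open imgproc.hpp in the imgproc module and locate the declaration CV_EXPORTS_W void resize( InputArray src, OutputArray dst, Size dsize, double fx = 0, double fy = 0, int interpolation = INTER_LINEAR );. External code uses OpenCV through its headers, so the functions declared there are our entry points. The OCV implementation has many branches that are hard to sort out at a glance, so starting from the entry point is the most convenient approach. Next, jump to the function's implementation. If your IDE cannot do this, search resize.cpp for a function with the same signature; that is the implementation: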

void cv::resize( InputArray _src, OutputArray _dst, Size dsize,
                 double inv_scale_x, double inv_scale_y, int interpolation )
{
    CV_INSTRUMENT_REGION();

    Size ssize = _src.size();

    CV_Assert( !ssize.empty() );
    if( dsize.empty() )
    {
        CV_Assert(inv_scale_x > 0); CV_Assert(inv_scale_y > 0);
        dsize = Size(saturate_cast<int>(ssize.width*inv_scale_x),
                     saturate_cast<int>(ssize.height*inv_scale_y));
        CV_Assert( !dsize.empty() );
    }
    else
    {
        inv_scale_x = (double)dsize.width/ssize.width;
        inv_scale_y = (double)dsize.height/ssize.height;
        CV_Assert(inv_scale_x > 0); CV_Assert(inv_scale_y > 0);
    }

    if (interpolation == INTER_LINEAR_EXACT && (_src.depth() == CV_32F || _src.depth() == CV_64F))
        interpolation = INTER_LINEAR; // If depth isn't supported fallback to generic resize

    CV_OCL_RUN(_src.dims() <= 2 && _dst.isUMat() && _src.cols() > 10 && _src.rows() > 10,
               ocl_resize(_src, _dst, dsize, inv_scale_x, inv_scale_y, interpolation))

    Mat src = _src.getMat();
    _dst.create(dsize, src.type());
    Mat dst = _dst.getMat();

    if (dsize == ssize)
    {
        // Source and destination are of same size. Use simple copy.
        src.copyTo(dst);
        return;
    }

    hal::resize(src.type(), src.data, src.step, src.cols, src.rows, dst.data, dst.step, dst.cols, dst.rows, inv_scale_x, inv_scale_y, interpolation);
}

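The implementation does three things:

Checks the parameters and derives whichever of dsize or the scale factors was not supplied
Checks whether OpenCL is available and enabled, and if so runs ocl_resize through CV_OCL_RUN
Otherwise (after a shortcut that simply copies when source and destination sizes are equal) delegates to resize in the hal namespace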
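The size handling at the top of cv::resize can be sketched on its own. This is a simplified standalone sketch, not OpenCV code: ResizeParams and computeResizeParams are hypothetical names, assert stands in for CV_Assert, and std::lround stands in for saturate_cast<int> (both round to nearest, but tie-breaking may differ and saturate_cast also clamps to the int range).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: the values cv::resize derives before calling hal::resize.
struct ResizeParams {
    int dst_width;
    int dst_height;
    double inv_scale_x;
    double inv_scale_y;
};

// Mirrors the two branches at the top of cv::resize:
// either dsize is empty and is derived from the scale factors,
// or dsize is given and the scale factors are derived from it.
ResizeParams computeResizeParams(int src_w, int src_h, int dsize_w, int dsize_h,
                                 double inv_scale_x, double inv_scale_y) {
    assert(src_w > 0 && src_h > 0);                  // CV_Assert(!ssize.empty())
    ResizeParams p{dsize_w, dsize_h, inv_scale_x, inv_scale_y};
    if (dsize_w <= 0 || dsize_h <= 0) {              // dsize.empty()
        assert(inv_scale_x > 0 && inv_scale_y > 0);
        p.dst_width = static_cast<int>(std::lround(src_w * inv_scale_x));
        p.dst_height = static_cast<int>(std::lround(src_h * inv_scale_y));
        assert(p.dst_width > 0 && p.dst_height > 0); // CV_Assert(!dsize.empty())
    } else {
        p.inv_scale_x = static_cast<double>(dsize_w) / src_w;
        p.inv_scale_y = static_cast<double>(dsize_h) / src_h;
    }
    return p;
}
```

For example, a 640x480 source with an empty dsize and fx = fy = 0.5 yields a 320x240 destination, while passing dsize = 320x240 yields scale factors of 0.5.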

Jumping to the hal implementation, which is also in resize.cpp (excerpt):

namespace hal {

void resize(int src_type,
            const uchar * src_data, size_t src_step, int src_width, int src_height,
            uchar * dst_data, size_t dst_step, int dst_width, int dst_height,
            double inv_scale_x, double inv_scale_y, int interpolation)
{
    CV_INSTRUMENT_REGION();

    CV_Assert((dst_width > 0 && dst_height > 0) || (inv_scale_x > 0 && inv_scale_y > 0));
    if (inv_scale_x < DBL_EPSILON || inv_scale_y < DBL_EPSILON)
    {
        inv_scale_x = static_cast<double>(dst_width) / src_width;
        inv_scale_y = static_cast<double>(dst_height) / src_height;
    }

    CALL_HAL(resize, cv_hal_resize, src_type, src_data, src_step, src_width, src_height, dst_data, dst_step, dst_width, dst_height, inv_scale_x, inv_scale_y, interpolation);
    // ... the rest of the function is the regular (non-HAL) implementation

Here we see a macro called CALL_HAL. Its definition is in hal_replacement.hpp:

#define CALL_HAL(name, fun, ...) \
    int res = __CV_EXPAND(fun(__VA_ARGS__)); \
    if (res == CV_HAL_ERROR_OK) \
        return; \
    else if (res != CV_HAL_ERROR_NOT_IMPLEMENTED) \
        CV_Error_(cv::Error::StsInternal, \
            ("HAL implementation " CVAUX_STR(name) " ==> " CVAUX_STR(fun) " returned %d (0x%08x)", res, res));

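So the macro simply calls fun and inspects the result. If it returns CV_HAL_ERROR_OK, the macro executes return, so hal::resize itself returns without running the regular code. If it returns CV_HAL_ERROR_NOT_IMPLEMENTED, neither branch fires and hal::resize continues with the regular implementation that follows; the macro does not return in this case. Any other value is treated as a HAL failure: CV_Error_ throws a cv::Exception, which aborts hal::resize instead of falling back to the regular code.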
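The three outcomes of CALL_HAL can be reproduced in a standalone sketch. Everything here is a hypothetical stand-in: DEMO_CALL_HAL mimics CALL_HAL without OpenCV's __CV_EXPAND and CVAUX_STR helpers, HalStatus mimics the CV_HAL_ERROR_* codes, std::runtime_error plays the role of cv::Exception, and g_trace only records which path ran.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Status codes modeled on CV_HAL_ERROR_OK / CV_HAL_ERROR_NOT_IMPLEMENTED.
enum HalStatus { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1, HAL_UNKNOWN_ERROR = -1 };

// Simplified stand-in for CALL_HAL: return on success, fall through on
// "not implemented", throw on any other code (as CV_Error_ would).
#define DEMO_CALL_HAL(fun, ...)                                             \
    int res = fun(__VA_ARGS__);                                             \
    if (res == HAL_OK)                                                      \
        return;                                                             \
    else if (res != HAL_NOT_IMPLEMENTED)                                    \
        throw std::runtime_error("HAL " #fun " returned " + std::to_string(res));

std::string g_trace;  // records which path actually ran

int hal_ni_demo(int) { return HAL_NOT_IMPLEMENTED; }            // default stub, like hal_ni_resize
int hal_fast_demo(int) { g_trace = "fast"; return HAL_OK; }     // a working HAL replacement
int hal_broken_demo(int) { return HAL_UNKNOWN_ERROR; }          // a failing HAL replacement

void resize_with_stub(int x) {
    DEMO_CALL_HAL(hal_ni_demo, x);
    g_trace = "generic";  // the regular implementation would run here
}

void resize_with_fast(int x) {
    DEMO_CALL_HAL(hal_fast_demo, x);
    g_trace = "generic";
}

void resize_with_broken(int x) {
    DEMO_CALL_HAL(hal_broken_demo, x);
    g_trace = "generic";
}
```

With the default stub the generic code runs; with a working replacement the function returns early; with a failing replacement an exception propagates out and nothing falls back.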
Next, in hal_replacement.hpp we find that cv_hal_resize is defined as

#define cv_hal_resize hal_ni_resize

and hal_ni_resize is implemented as

inline int hal_ni_resize(int src_type, const uchar *src_data, size_t src_step, int src_width, int src_height, uchar *dst_data, size_t dst_step, int dst_width, int dst_height, double inv_scale_x, double inv_scale_y, int interpolation) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }

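This function simply returns CV_HAL_ERROR_NOT_IMPLEMENTED, so, per the analysis above, hal::resize continues with its regular code. How, then, does a HAL implementation get plugged in?
Notice that just above the CALL_HAL macro in hal_replacement.hpp there is a line #include "custom_hal.hpp". Odd, since #include directives usually go at the top of a file. Looking at custom_hal.hpp, we find it contains only #include "carotene/tegra_hal.hpp", so we follow that file. Searching it for hal_ni_resize, the function found above, gives no results. Searching for cv_hal_resize instead turns up: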

#undef cv_hal_resize
#define cv_hal_resize TEGRA_RESIZE

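This is where the chain comes together: cv_hal_resize is #undef'd here. We know hal_replacement.hpp contains #define cv_hal_resize hal_ni_resize, and because custom_hal.hpp is included after those default definitions, this #undef removes the default and cv_hal_resize is redefined as TEGRA_RESIZE. Searching for TEGRA_RESIZE gives its definition: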

#define TEGRA_RESIZE(src_type, src_data, src_step, src_width, src_height, dst_data, dst_step, dst_width, dst_height, inv_scale_x, inv_scale_y, interpolation) \
( \
    interpolation == CV_HAL_INTER_LINEAR ? \
        CV_MAT_DEPTH(src_type) == CV_8U && CAROTENE_NS::isResizeLinearOpenCVSupported(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), ((src_type >> CV_CN_SHIFT) + 1)) && \
        inv_scale_x > 0 && inv_scale_y > 0 && \
        (dst_width - 0.5)/inv_scale_x - 0.5 < src_width && (dst_height - 0.5)/inv_scale_y - 0.5 < src_height && \
        (dst_width + 0.5)/inv_scale_x + 0.5 >= src_width && (dst_height + 0.5)/inv_scale_y + 0.5 >= src_height && \
        std::abs(dst_width / inv_scale_x - src_width) < 0.1 && std::abs(dst_height / inv_scale_y - src_height) < 0.1 ? \
            CAROTENE_NS::resizeLinearOpenCV(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                            src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, ((src_type >> CV_CN_SHIFT) + 1)), \
            CV_HAL_ERROR_OK : CV_HAL_ERROR_NOT_IMPLEMENTED : \
    interpolation == CV_HAL_INTER_AREA ? \
        CV_MAT_DEPTH(src_type) == CV_8U && CAROTENE_NS::isResizeAreaSupported(1.0/inv_scale_x, 1.0/inv_scale_y, ((src_type >> CV_CN_SHIFT) + 1)) && \
        std::abs(dst_width / inv_scale_x - src_width) < 0.1 && std::abs(dst_height / inv_scale_y - src_height) < 0.1 ? \
            CAROTENE_NS::resizeAreaOpenCV(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                          src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, ((src_type >> CV_CN_SHIFT) + 1)), \
            CV_HAL_ERROR_OK : CV_HAL_ERROR_NOT_IMPLEMENTED : \
    /*nearest neighbour interpolation disabled due to rounding accuracy issues*/ \
    /*interpolation == CV_HAL_INTER_NEAREST ? \
        (src_type == CV_8UC1 || src_type == CV_8SC1) && CAROTENE_NS::isResizeNearestNeighborSupported(CAROTENE_NS::Size2D(src_width, src_height), 1) ? \
            CAROTENE_NS::resizeNearestNeighbor(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                               src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, 1), \
            CV_HAL_ERROR_OK : \
        (src_type == CV_8UC3 || src_type == CV_8SC3) && CAROTENE_NS::isResizeNearestNeighborSupported(CAROTENE_NS::Size2D(src_width, src_height), 3) ? \
            CAROTENE_NS::resizeNearestNeighbor(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                               src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, 3), \
            CV_HAL_ERROR_OK : \
        (src_type == CV_8UC4 || src_type == CV_8SC4 || src_type == CV_16UC2 || src_type == CV_16SC2 || src_type == CV_32SC1) && \
        CAROTENE_NS::isResizeNearestNeighborSupported(CAROTENE_NS::Size2D(src_width, src_height), 4) ? \
            CAROTENE_NS::resizeNearestNeighbor(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                               src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, 4), \
            CV_HAL_ERROR_OK : CV_HAL_ERROR_NOT_IMPLEMENTED :*/ \
    CV_HAL_ERROR_NOT_IMPLEMENTED \
)

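Roughly, this macro does the following:

For bilinear interpolation (CV_HAL_INTER_LINEAR): if the depth is CV_8U and the sizes, channel count and scale factors meet certain requirements, resizeLinearOpenCV performs the actual resize and the macro evaluates to CV_HAL_ERROR_OK. Bilinear cases that fail these checks are unsupported and yield CV_HAL_ERROR_NOT_IMPLEMENTED, so hal::resize falls through to its regular implementation.
AREA interpolation (CV_HAL_INTER_AREA) is handled the same way, using isResizeAreaSupported and resizeAreaOpenCV.
All other interpolation methods are currently unsupported. The commented-out block shows that a nearest-neighbor path was written, but according to its comment it was disabled due to rounding accuracy issues.
The macro relies on the easily overlooked comma operator to produce CV_HAL_ERROR_OK: the comma operator evaluates its left operand, discards the result, and yields the value of its rightmost operand.
From here we can jump to resizeLinearOpenCV and resizeAreaOpenCV to trace the actual fast implementations.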
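The shape of TEGRA_RESIZE, nested ?: for dispatch plus the comma operator to turn a side-effecting call into a status value, can be shown in miniature. This is a sketch with hypothetical names (DEMO_RESIZE, fastLinearResize, the DEMO_* constants); it keeps only the interpolation test and one of the scale-consistency checks, not carotene's full support checks.

```cpp
#include <cassert>
#include <cmath>

enum { DEMO_OK = 0, DEMO_NOT_IMPLEMENTED = 1 };
enum { DEMO_INTER_LINEAR = 1, DEMO_INTER_AREA = 3 };

static int g_calls = 0;

// Hypothetical fast kernel; its return value (if any) is discarded by the comma operator.
void fastLinearResize(int /*src_w*/, int /*dst_w*/) { ++g_calls; }

// Nested ?: picks a branch; (call, DEMO_OK) runs the kernel for its side effect
// and then yields DEMO_OK as the value of the whole expression.
#define DEMO_RESIZE(interp, src_w, dst_w, inv_scale_x)                              \
    ((interp) == DEMO_INTER_LINEAR                                                  \
        ? ((inv_scale_x) > 0 && std::abs((dst_w) / (inv_scale_x) - (src_w)) < 0.1   \
              ? (fastLinearResize((src_w), (dst_w)), DEMO_OK)                       \
              : DEMO_NOT_IMPLEMENTED)                                               \
        : DEMO_NOT_IMPLEMENTED)
```

Because the whole dispatch is a single expression, the macro can be dropped directly into the fun(__VA_ARGS__) slot of CALL_HAL and its value assigned to res.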

In short, the key to hooking in a HAL implementation is that #undef / #define pair.
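The #undef / #define override can be condensed into one file. In OpenCV the three parts below live in hal_replacement.hpp, custom_hal.hpp (pulling in carotene/tegra_hal.hpp) and resize.cpp, stitched together by include order; here all names (demo_hal_resize, demo_ni_resize, demo_fast_resize, callResize) are hypothetical.

```cpp
#include <cassert>
#include <string>

static std::string g_path;  // records which implementation ran

// --- defaults, in the role of hal_replacement.hpp ---
inline int demo_ni_resize(int) { return 1; }   // "not implemented"
#define demo_hal_resize demo_ni_resize

// --- override, in the role of custom_hal.hpp -> tegra_hal.hpp (included after the defaults) ---
inline int demo_fast_resize(int) { g_path = "fast"; return 0; }
#undef demo_hal_resize
#define demo_hal_resize demo_fast_resize

// --- call site, in the role of hal::resize: written once against the macro name ---
// The macro is expanded here, so whichever definition is active at this point wins.
inline int callResize(int x) { return demo_hal_resize(x); }
```

The call site never changes; only the macro definition visible when it is compiled decides whether the stub or the accelerated function is called.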

GaussianBlur

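Likewise, in smooth.cpp we find the GaussianBlur implementation, whose HAL macro is cv_hal_gaussianBlur. Searching tegra_hal.hpp for cv_hal_gaussianBlur gives no results, which means Gaussian blur has no fast HAL version there. The carotene library does contain Gaussian-blur-related code, so one might expect an implementation. However, by writing a demo and adding log statements to the source, we confirmed that these functions are never called: CALL_HAL always gets CV_HAL_ERROR_NOT_IMPLEMENTED and falls through. Presumably these implementations are not yet mature enough; for example, I noticed that some of the existing code does not yet take the Gaussian kernel coefficients into account, which would explain why it has not been hooked in. For now, we will have to wait.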
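To make "taking the kernel coefficients into account" concrete, here is a sketch of the 1-D Gaussian weights following the formula documented for cv::getGaussianKernel. It is not OpenCV's code: gaussianWeights is a hypothetical name, and OpenCV additionally uses fixed tables for small odd ksize when sigma <= 0, which this sketch omits. It shows that a kernel hard-coded to something like the binomial [1 4 6 4 1]/16 could only match for particular ksize/sigma combinations.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// G_i = alpha * exp(-(i - (ksize-1)/2)^2 / (2*sigma^2)), with alpha chosen so the weights sum to 1.
// If sigma <= 0, it is derived from ksize as in the getGaussianKernel documentation.
std::vector<double> gaussianWeights(int ksize, double sigma) {
    assert(ksize > 0 && ksize % 2 == 1);   // ksize must be odd and positive
    if (sigma <= 0)
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8;
    std::vector<double> w(ksize);
    const double center = (ksize - 1) * 0.5;
    double sum = 0.0;
    for (int i = 0; i < ksize; ++i) {
        const double d = i - center;
        w[i] = std::exp(-d * d / (2.0 * sigma * sigma));
        sum += w[i];
    }
    for (double& v : w) v /= sum;          // normalize: alpha = 1 / sum
    return w;
}
```

With ksize = 5 and sigma = 2 the center weight is roughly 0.25, noticeably flatter than the 0.375 center of [1 4 6 4 1]/16.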

Original article: https://www.cnblogs.com/willhua/p/12521581.html