• Implementing a Skin-Smoothing (Beauty) Filter with an OpenCL Bilateral Filter


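        OpenCL is a framework for parallel, heterogeneous computing. Many vendors support it, including Intel, AMD and NVIDIA, although at the time of writing NVIDIA only supports it up to version 1.2 and mainly concentrates on its own CUDA. I have not used CUDA, but my impression is that CUDA is somewhat nicer than OpenCL. OpenCL, on the other hand, covers more hardware: CPUs, GPUs, DSPs, FPGAs and other kinds of chips can all support it. An OpenCL kernel is very similar to a pixel shader in D3D.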

    1. How the bilateral filter works

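        For the theory behind the bilateral filter, see the post "双边滤波器的原理及实现" (Principle and Implementation of the Bilateral Filter) on the blog of the much-admired Rachel-Zhang. Quoting from that post, the principle is as follows: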

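    The bilateral filter is a filter that removes noise while preserving edges. It can do this because it is built from two functions: one determines the filter coefficients from the geometric (spatial) distance, the other determines them from the difference in pixel values. Two filters it can be compared with are the Gaussian low-pass filter (http://en.wikipedia.org/wiki/Gaussian_filter) and the α-trimmed mean filter (which discards a fraction α of the smallest and largest values and uses the mean of the remaining pixels as the output).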

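    In a bilateral filter, the value of an output pixel is a weighted combination of the values of the pixels in its neighbourhood:

    $$g(i,j) = \frac{\sum_{k,l} f(k,l)\, w(i,j,k,l)}{\sum_{k,l} w(i,j,k,l)}$$

    The weight w(i,j,k,l) is the product of a domain (spatial) kernel d and a range (value) kernel r:

    $$d(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}\right), \qquad r(i,j,k,l) = \exp\left(-\frac{\lVert f(i,j) - f(k,l) \rVert^2}{2\sigma_r^2}\right)$$

    $$w(i,j,k,l) = d(i,j,k,l)\, r(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2} - \frac{\lVert f(i,j) - f(k,l) \rVert^2}{2\sigma_r^2}\right)$$

    The bilateral filter therefore takes differences in both the spatial domain and the value range into account, whereas the Gaussian filter considers only the spatial domain and the α-trimmed mean filter considers only the value range.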
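The weight described above, a domain kernel multiplied by a range kernel, can be sketched in a few lines of C++. The helper names `domainExponent`, `rangeExponent` and `bilateralWeight` are hypothetical; `sigma_d` and `sigma_r` are the same parameter names the kernel in section 2 uses. This is an illustration for a single-channel pixel value, not the code the article runs.

```cpp
#include <cmath>

// Exponent of the domain (spatial) kernel: depends only on the pixel
// offset (dk, dl) between the centre pixel and the neighbour.
float domainExponent(int dk, int dl, float sigma_d)
{
    return -static_cast<float>(dk * dk + dl * dl) / (2.0f * sigma_d * sigma_d);
}

// Exponent of the range kernel: depends only on the difference between
// the centre value fij and the neighbour value fkl.
float rangeExponent(float fij, float fkl, float sigma_r)
{
    const float diff = fij - fkl;
    return -(diff * diff) / (2.0f * sigma_r * sigma_r);
}

// w = exp(domain) * exp(range) = exp(domain + range)
float bilateralWeight(int dk, int dl, float fij, float fkl,
                      float sigma_d, float sigma_r)
{
    return std::exp(domainExponent(dk, dl, sigma_d) +
                    rangeExponent(fij, fkl, sigma_r));
}
```

With sigma_d = 3 and sigma_r = 0.1, a direct neighbour of equal value gets a weight close to 1, while a direct neighbour whose value differs by 0.6 gets a weight that is practically zero. That is what keeps edges from being blurred.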

    This article implements the bilateral filter in OpenCL, based on this formula, and uses it for skin smoothing.

    2. The kernel function

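        The skin-smoothing algorithm is based on http://www.zealfilter.com/portal.php?mod=view&aid=138. The skin-colour detection described there did not work well, so I removed it. I had also planned a sharpening pass, but the result looks quite good without one, so I have skipped it for now and will add sharpening when I study the next part of OpenCL.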

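    // Integer pixel coordinates, clamp reads outside the image to the edge, no interpolation.
    const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;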
    
    kernel void bilateralBlur(read_only image2d_t src,write_only image2d_t dst)  
    {
        int x = (int)get_global_id(0);  
        int y = (int)get_global_id(1);  
        if (x >= get_image_width(src) || y >= get_image_height(src))  
            return;  
    
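        int ksize = 11;          // window is ksize x ksize pixels
        float sigma_d = 3.0f;    // spatial (domain) sigma, in pixels
        float sigma_r = 0.1f;    // range sigma, in normalized [0, 1] intensity
    
        float4 fij = read_imagef(src, sampler, (int2)(x, y));
        float alpha = 0.2f;      // share of the original pixel kept in the final blend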
    
        float4 fkl;
        float dkl;
        float4 rkl;
        float4 wkl;
    
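        float4 numerator = (float4)(0.0f,0.0f,0.0f,0.0f);
        // Start the weight sum at zero so the weights normalize to one;
        // the centre tap always contributes exp(0) = 1, so it never stays zero.
        float4 denominator = (float4)(0.0f, 0.0f, 0.0f, 0.0f);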
        for (int K = -ksize / 2; K <= ksize / 2; K++)
        {
            for (int L = -ksize / 2; L <= ksize / 2; L++)
            {
                fkl = read_imagef(src, sampler, (int2)(x + K, y + L));
    
                dkl = -(K*K + L*L) / (2 * sigma_d*sigma_d);
                rkl.x = -(fij.x - fkl.x)*(fij.x - fkl.x) / (2 * sigma_r*sigma_r);
                rkl.y = -(fij.y - fkl.y)*(fij.y - fkl.y) / (2 * sigma_r*sigma_r);
                rkl.z = -(fij.z - fkl.z)*(fij.z - fkl.z) / (2 * sigma_r*sigma_r);
    
                wkl.x = exp(dkl + rkl.x);
                wkl.y = exp(dkl + rkl.y);
                wkl.z = exp(dkl + rkl.z);
    
                numerator.x += fkl.x * wkl.x;
                numerator.y += fkl.y * wkl.y;
                numerator.z += fkl.z * wkl.z;
    
                denominator.x += wkl.x;
                denominator.y += wkl.y;
                denominator.z += wkl.z;
            }
        }
        
        float4 gij = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
        if (denominator.x > 0 && denominator.y > 0 && denominator.z > 0)
        {
            gij.x = numerator.x / denominator.x;
            gij.y = numerator.y / denominator.y;
            gij.z = numerator.z / denominator.z;
    
            // After the bilateral filter, blend the result with the original pixel
            gij.x = fij.x*alpha + gij.x*(1.0f - alpha);
            gij.y = fij.y*alpha + gij.y*(1.0f - alpha);
            gij.z = fij.z*alpha + gij.z*(1.0f - alpha);
        }
    
        write_imagef(dst, (int2)(x, y), gij);
    }

    The kernel function is essentially just the mathematical formula written out as code, so it is very simple.
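When checking the kernel's output it can help to have a CPU reference to compare against. The following is a minimal sketch of the same per-pixel computation for a single-channel float image. The function name `bilateralPixelRef` and the row-major `std::vector<float>` layout are assumptions of this sketch. It accumulates the weight sum from zero, following the normalized weighted-average formula, and imitates `CLK_ADDRESS_CLAMP_TO_EDGE` by clamping coordinates.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// CPU reference for one output pixel (hypothetical helper).
// `src` holds a width x height single-channel image, row-major, values in [0, 1].
float bilateralPixelRef(const std::vector<float>& src, int width, int height,
                        int x, int y, int ksize,
                        float sigma_d, float sigma_r, float alpha)
{
    const float fij = src[y * width + x];
    float numerator = 0.0f;
    float denominator = 0.0f;
    const int r = ksize / 2;

    for (int K = -r; K <= r; ++K) {
        for (int L = -r; L <= r; ++L) {
            // Clamp to the image border, like CLK_ADDRESS_CLAMP_TO_EDGE.
            const int sx = std::min(std::max(x + K, 0), width - 1);
            const int sy = std::min(std::max(y + L, 0), height - 1);
            const float fkl = src[sy * width + sx];

            // Domain and range exponents, as in the kernel.
            const float dkl = -static_cast<float>(K * K + L * L) / (2.0f * sigma_d * sigma_d);
            const float diff = fij - fkl;
            const float rkl = -(diff * diff) / (2.0f * sigma_r * sigma_r);

            const float w = std::exp(dkl + rkl);
            numerator += fkl * w;
            denominator += w;
        }
    }

    // The centre tap has weight 1, so the denominator is at least 1.
    const float gij = numerator / denominator;

    // Blend the filtered value with the original pixel.
    return fij * alpha + gij * (1.0f - alpha);
}
```

On a flat image the function returns the input value unchanged, and on a hard 0/1 step each pixel stays on its own side of the edge. Both are quick sanity checks for the GPU result as well.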

    3. Host-side code

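        OpenCL code is split into host code and device code. Kernels run on the parallel device. The host is generally suited to serial, logic-heavy code, while the device is better suited to computation such as convolution. On a typical computer the CPU acts as the host and the GPU as the device. In practice the CPU can also serve as a device, since Intel supports OpenCL as well. In this article the CPU is the host and the GPU is the device.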
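The division of labour can be illustrated without any OpenCL calls. The sketch below is a plain C++ emulation, not real OpenCL: `runOnHost` plays the host, which picks a launch grid rounded up to a multiple of the work-group size, and `invertKernel` plays the device kernel, which is called once per work-item. All three names are hypothetical, and the serial loops stand in for work-items that a real device would run in parallel.

```cpp
#include <cstddef>
#include <vector>

// Round globalSize up to the next multiple of groupSize.
std::size_t roundUpToMultiple(std::size_t groupSize, std::size_t globalSize)
{
    const std::size_t r = globalSize % groupSize;
    return r == 0 ? globalSize : globalSize + groupSize - r;
}

// Stand-in for a device kernel: one call per work-item (gx, gy).
void invertKernel(const std::vector<float>& src, std::vector<float>& dst,
                  int width, int height, std::size_t gx, std::size_t gy)
{
    // Same kind of guard as in bilateralBlur: work-items that exist only
    // because the grid was rounded up must not touch the image.
    if (gx >= static_cast<std::size_t>(width) || gy >= static_cast<std::size_t>(height))
        return;
    dst[gy * width + gx] = 1.0f - src[gy * width + gx];
}

// Stand-in for the host: chooses the grid and launches the kernel.
// Returns the number of work-items launched, including the padding ones.
std::size_t runOnHost(const std::vector<float>& src, std::vector<float>& dst,
                      int width, int height, std::size_t localSize)
{
    const std::size_t globalX = roundUpToMultiple(localSize, static_cast<std::size_t>(width));
    const std::size_t globalY = roundUpToMultiple(localSize, static_cast<std::size_t>(height));

    for (std::size_t gy = 0; gy < globalY; ++gy)
        for (std::size_t gx = 0; gx < globalX; ++gx)
            invertKernel(src, dst, width, height, gx, gy);

    return globalX * globalY;
}
```

For a 5 x 3 image and a work-group size of 4, the grid becomes 8 x 4, so 32 work-items are launched but only 15 write a pixel. This is why the kernel in section 2 begins with a bounds check on x and y.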

    #include "stdafx.h"
    
    #include <iostream>  
    #include <fstream>  
    #include <sstream>  
    #include <malloc.h> 
    #include <string.h>  
    #include <opencv2/opencv.hpp>  
    
    #include <CL/cl.h>  
     
     
    //---------- Query OpenCL platform and device information ----------
    
    void DisplayPlatformInfo(
        cl_platform_id id,
        cl_platform_info name,
        std::string str)
    {
        cl_int errNum;
        std::size_t paramValueSize;
    
        errNum = clGetPlatformInfo(
            id,
            name,
            0,
            NULL,
            &paramValueSize);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
            return;
        }
    
        char * info = (char *)alloca(sizeof(char) * paramValueSize);
        errNum = clGetPlatformInfo(
            id,
            name,
            paramValueSize,
            info,
            NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
            return;
        }
    
        std::cout << "	" << str << ":	" << info << std::endl;
    }
    
    template<typename T>
    void appendBitfield(T info, T value, std::string name, std::string & str)
    {
        if (info & value)
        {
            if (str.length() > 0)
            {
                str.append(" | ");
            }
            str.append(name);
        }
    }
    
    ///
    // Display information for a particular device.
    // As different calls to clGetDeviceInfo may return
    // values of different types a template is used. 
    // As some values returned are arrays of values, a templated class is
    // used so it can be specialized for this case, see below.
    //
    template <typename T>
    class InfoDevice
    {
    public:
        static void display(
            cl_device_id id,
            cl_device_info name,
            std::string str)
        {
            cl_int errNum;
            std::size_t paramValueSize;
    
            errNum = clGetDeviceInfo(
                id,
                name,
                0,
                NULL,
                &paramValueSize);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
                return;
            }
    
            T * info = (T *)alloca(sizeof(T) * paramValueSize);
            errNum = clGetDeviceInfo(
                id,
                name,
                paramValueSize,
                info,
                NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
                return;
            }
    
            // Handle a few special cases
            switch (name)
            {
            case CL_DEVICE_TYPE:
            {
                std::string deviceType;
    
                appendBitfield<cl_device_type>(
                    *(reinterpret_cast<cl_device_type*>(info)),
                    CL_DEVICE_TYPE_CPU,
                    "CL_DEVICE_TYPE_CPU",
                    deviceType);
    
                appendBitfield<cl_device_type>(
                    *(reinterpret_cast<cl_device_type*>(info)),
                    CL_DEVICE_TYPE_GPU,
                    "CL_DEVICE_TYPE_GPU",
                    deviceType);
    
                appendBitfield<cl_device_type>(
                    *(reinterpret_cast<cl_device_type*>(info)),
                    CL_DEVICE_TYPE_ACCELERATOR,
                    "CL_DEVICE_TYPE_ACCELERATOR",
                    deviceType);
    
                appendBitfield<cl_device_type>(
                    *(reinterpret_cast<cl_device_type*>(info)),
                    CL_DEVICE_TYPE_DEFAULT,
                    "CL_DEVICE_TYPE_DEFAULT",
                    deviceType);
    
                std::cout << "		" << str << ":	" << deviceType << std::endl;
            }
                break;
            case CL_DEVICE_SINGLE_FP_CONFIG:
            {
                std::string fpType;
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_DENORM,
                    "CL_FP_DENORM",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_INF_NAN,
                    "CL_FP_INF_NAN",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_ROUND_TO_NEAREST,
                    "CL_FP_ROUND_TO_NEAREST",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_ROUND_TO_ZERO,
                    "CL_FP_ROUND_TO_ZERO",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_ROUND_TO_INF,
                    "CL_FP_ROUND_TO_INF",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_FMA,
                    "CL_FP_FMA",
                    fpType);
    
    #ifdef CL_FP_SOFT_FLOAT
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_SOFT_FLOAT,
                    "CL_FP_SOFT_FLOAT",
                    fpType);
    #endif
    
                std::cout << "		" << str << ":	" << fpType << std::endl;
            }
                break;
            case CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:
            {
                std::string memType;
    
                appendBitfield<cl_device_mem_cache_type>(
                    *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                    CL_NONE,
                    "CL_NONE",
                    memType);
                appendBitfield<cl_device_mem_cache_type>(
                    *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                    CL_READ_ONLY_CACHE,
                    "CL_READ_ONLY_CACHE",
                    memType);
    
                appendBitfield<cl_device_mem_cache_type>(
                    *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                    CL_READ_WRITE_CACHE,
                    "CL_READ_WRITE_CACHE",
                    memType);
    
                std::cout << "		" << str << ":	" << memType << std::endl;
            }
                break;
            case CL_DEVICE_LOCAL_MEM_TYPE:
            {
                std::string memType;
    
                appendBitfield<cl_device_local_mem_type>(
                    *(reinterpret_cast<cl_device_local_mem_type*>(info)),
                    CL_LOCAL,
                    "CL_LOCAL",
                    memType);
    
                appendBitfield<cl_device_local_mem_type>(
                    *(reinterpret_cast<cl_device_local_mem_type*>(info)),
                    CL_GLOBAL,
                    "CL_GLOBAL",
                    memType);
    
                std::cout << "		" << str << ":	" << memType << std::endl;
            }
                break;
            case CL_DEVICE_EXECUTION_CAPABILITIES:
            {
                std::string memType;
    
                appendBitfield<cl_device_exec_capabilities>(
                    *(reinterpret_cast<cl_device_exec_capabilities*>(info)),
                    CL_EXEC_KERNEL,
                    "CL_EXEC_KERNEL",
                    memType);
    
                appendBitfield<cl_device_exec_capabilities>(
                    *(reinterpret_cast<cl_device_exec_capabilities*>(info)),
                    CL_EXEC_NATIVE_KERNEL,
                    "CL_EXEC_NATIVE_KERNEL",
                    memType);
    
                std::cout << "		" << str << ":	" << memType << std::endl;
            }
                break;
            case CL_DEVICE_QUEUE_PROPERTIES:
            {
                std::string memType;
    
                appendBitfield<cl_command_queue_properties>(
                    *(reinterpret_cast<cl_command_queue_properties*>(info)),
                    CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
                    "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE",
                    memType);
    
                appendBitfield<cl_command_queue_properties>(
                    *(reinterpret_cast<cl_command_queue_properties*>(info)),
                    CL_QUEUE_PROFILING_ENABLE,
                    "CL_QUEUE_PROFILING_ENABLE",
                    memType);
    
                std::cout << "		" << str << ":	" << memType << std::endl;
            }
                break;
            default:
                std::cout << "		" << str << ":	" << *info << std::endl;
                break;
            }
        }
    };
    
    ///
    // Simple trait class used to wrap base types.
    //
    template <typename T>
    class ArrayType
    {
    public:
        static bool isChar() { return false; }
    };
    
    ///
    // Specialized for the char (i.e. null terminated string case).
    //
    template<>
    class ArrayType<char>
    {
    public:
        static bool isChar() { return true; }
    };
    
    ///
    // Specialized instance of class InfoDevice for array types.
    //
    template <typename T>
    class InfoDevice<ArrayType<T> >
    {
    public:
        static void display(
            cl_device_id id,
            cl_device_info name,
            std::string str)
        {
            cl_int errNum;
            std::size_t paramValueSize;
    
            errNum = clGetDeviceInfo(
                id,
                name,
                0,
                NULL,
                &paramValueSize);
            if (errNum != CL_SUCCESS)
            {
                std::cerr
                    << "Failed to find OpenCL device info "
                    << str
                    << "."
                    << std::endl;
                return;
            }
    
            T * info = (T *)alloca(sizeof(T) * paramValueSize);
            errNum = clGetDeviceInfo(
                id,
                name,
                paramValueSize,
                info,
                NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr
                    << "Failed to find OpenCL device info "
                    << str
                    << "."
                    << std::endl;
                return;
            }
    
            if (ArrayType<T>::isChar())
            {
                std::cout << "	" << str << ":	" << info << std::endl;
            }
            else if (name == CL_DEVICE_MAX_WORK_ITEM_SIZES)
            {
                cl_uint maxWorkItemDimensions;
    
                errNum = clGetDeviceInfo(
                    id,
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
                    sizeof(cl_uint),
                    &maxWorkItemDimensions,
                    NULL);
                if (errNum != CL_SUCCESS)
                {
                    std::cerr
                        << "Failed to find OpenCL device info "
                        << "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS."
                        << std::endl;
                    return;
                }
    
                std::cout << "	" << str << ":	";
                for (cl_uint i = 0; i < maxWorkItemDimensions; i++)
                {
                    std::cout << info[i] << " ";
                }
                std::cout << std::endl;
            }
        }
    };
    
    ///
    //  Enumerate platforms and display information about them 
    //  and their associated devices.
    //
    void displayInfo(void)
    {
        cl_int errNum;
        cl_uint numPlatforms;
        cl_platform_id * platformIds;
        cl_context context = NULL;
    
        // First, query the total number of platforms
        errNum = clGetPlatformIDs(0, NULL, &numPlatforms);
        if (errNum != CL_SUCCESS || numPlatforms <= 0)
        {
            std::cerr << "Failed to find any OpenCL platform." << std::endl;
            return;
        }
    
        // Next, allocate memory for the installed platforms, and query
        // to get the list.
        platformIds = (cl_platform_id *)alloca(sizeof(cl_platform_id) * numPlatforms);
        // Now retrieve the platform IDs themselves
        errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find any OpenCL platforms." << std::endl;
            return;
        }
    
        std::cout << "Number of platforms: 	" << numPlatforms << std::endl;
        // Iterate through the list of platforms displaying associated information
        for (cl_uint i = 0; i < numPlatforms; i++) {
            // First we display information associated with the platform
            DisplayPlatformInfo(
                platformIds[i],
                CL_PLATFORM_PROFILE,
                "CL_PLATFORM_PROFILE");
            DisplayPlatformInfo(
                platformIds[i],
                CL_PLATFORM_VERSION,
                "CL_PLATFORM_VERSION");
            DisplayPlatformInfo(
                platformIds[i],
                CL_PLATFORM_VENDOR,
                "CL_PLATFORM_VENDOR");
            DisplayPlatformInfo(
                platformIds[i],
                CL_PLATFORM_EXTENSIONS,
                "CL_PLATFORM_EXTENSIONS");
    
            // Now query the set of devices associated with the platform
            cl_uint numDevices;
            errNum = clGetDeviceIDs(
                platformIds[i],
                CL_DEVICE_TYPE_ALL,
                0,
                NULL,
                &numDevices);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL devices." << std::endl;
                return;
            }
    
            cl_device_id * devices = (cl_device_id *)alloca(sizeof(cl_device_id) * numDevices);
            errNum = clGetDeviceIDs(
                platformIds[i],
                CL_DEVICE_TYPE_ALL,
                numDevices,
                devices,
                NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL devices." << std::endl;
                return;
            }
    
            std::cout << "	Number of devices: 	" << numDevices << std::endl;
            // Iterate through each device, displaying associated information
            for (cl_uint j = 0; j < numDevices; j++)
            {
                InfoDevice<cl_device_type>::display(
                    devices[j],
                    CL_DEVICE_TYPE,
                    "CL_DEVICE_TYPE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_VENDOR_ID,
                    "CL_DEVICE_VENDOR_ID");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_COMPUTE_UNITS,
                    "CL_DEVICE_MAX_COMPUTE_UNITS");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
                    "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS");
    
                InfoDevice<ArrayType<size_t> >::display(
                    devices[j],
                    CL_DEVICE_MAX_WORK_ITEM_SIZES,
                    "CL_DEVICE_MAX_WORK_ITEM_SIZES");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    "CL_DEVICE_MAX_WORK_GROUP_SIZE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE");
    
    #ifdef CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_INT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF");
    #endif
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_CLOCK_FREQUENCY,
                    "CL_DEVICE_MAX_CLOCK_FREQUENCY");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_ADDRESS_BITS,
                    "CL_DEVICE_ADDRESS_BITS");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    "CL_DEVICE_MAX_MEM_ALLOC_SIZE");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_IMAGE_SUPPORT,
                    "CL_DEVICE_IMAGE_SUPPORT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_READ_IMAGE_ARGS,
                    "CL_DEVICE_MAX_READ_IMAGE_ARGS");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_WRITE_IMAGE_ARGS,
                    "CL_DEVICE_MAX_WRITE_IMAGE_ARGS");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE2D_MAX_WIDTH,
                    "CL_DEVICE_IMAGE2D_MAX_WIDTH");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE2D_MAX_HEIGHT,
                    "CL_DEVICE_IMAGE2D_MAX_HEIGHT");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE3D_MAX_WIDTH,
                    "CL_DEVICE_IMAGE3D_MAX_WIDTH");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE3D_MAX_HEIGHT,
                    "CL_DEVICE_IMAGE3D_MAX_HEIGHT");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE3D_MAX_DEPTH,
                    "CL_DEVICE_IMAGE3D_MAX_DEPTH");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_SAMPLERS,
                    "CL_DEVICE_MAX_SAMPLERS");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_MAX_PARAMETER_SIZE,
                    "CL_DEVICE_MAX_PARAMETER_SIZE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MEM_BASE_ADDR_ALIGN,
                    "CL_DEVICE_MEM_BASE_ADDR_ALIGN");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE,
                    "CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE");
    
                InfoDevice<cl_device_fp_config>::display(
                    devices[j],
                    CL_DEVICE_SINGLE_FP_CONFIG,
                    "CL_DEVICE_SINGLE_FP_CONFIG");
    
                InfoDevice<cl_device_mem_cache_type>::display(
                    devices[j],
                    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE,
                    "CL_DEVICE_GLOBAL_MEM_CACHE_TYPE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE,
                    "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
                    "CL_DEVICE_GLOBAL_MEM_CACHE_SIZE");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_GLOBAL_MEM_SIZE,
                    "CL_DEVICE_GLOBAL_MEM_SIZE");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE,
                    "CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_CONSTANT_ARGS,
                    "CL_DEVICE_MAX_CONSTANT_ARGS");
    
                InfoDevice<cl_device_local_mem_type>::display(
                    devices[j],
                    CL_DEVICE_LOCAL_MEM_TYPE,
                    "CL_DEVICE_LOCAL_MEM_TYPE");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_LOCAL_MEM_SIZE,
                    "CL_DEVICE_LOCAL_MEM_SIZE");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_ERROR_CORRECTION_SUPPORT,
                    "CL_DEVICE_ERROR_CORRECTION_SUPPORT");
    
    #ifdef CL_DEVICE_HOST_UNIFIED_MEMORY
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_HOST_UNIFIED_MEMORY,
                    "CL_DEVICE_HOST_UNIFIED_MEMORY");
    #endif
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_PROFILING_TIMER_RESOLUTION,
                    "CL_DEVICE_PROFILING_TIMER_RESOLUTION");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_ENDIAN_LITTLE,
                    "CL_DEVICE_ENDIAN_LITTLE");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_AVAILABLE,
                    "CL_DEVICE_AVAILABLE");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_COMPILER_AVAILABLE,
                    "CL_DEVICE_COMPILER_AVAILABLE");
    
                InfoDevice<cl_device_exec_capabilities>::display(
                    devices[j],
                    CL_DEVICE_EXECUTION_CAPABILITIES,
                    "CL_DEVICE_EXECUTION_CAPABILITIES");
    
                InfoDevice<cl_command_queue_properties>::display(
                    devices[j],
                    CL_DEVICE_QUEUE_PROPERTIES,
                    "CL_DEVICE_QUEUE_PROPERTIES");
    
                InfoDevice<cl_platform_id>::display(
                    devices[j],
                    CL_DEVICE_PLATFORM,
                    "CL_DEVICE_PLATFORM");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_NAME,
                    "CL_DEVICE_NAME");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_VENDOR,
                    "CL_DEVICE_VENDOR");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DRIVER_VERSION,
                    "CL_DRIVER_VERSION");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_PROFILE,
                    "CL_DEVICE_PROFILE");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_VERSION,
                    "CL_DEVICE_VERSION");
    
    #ifdef CL_DEVICE_OPENCL_C_VERSION
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_OPENCL_C_VERSION,
                    "CL_DEVICE_OPENCL_C_VERSION");
    #endif
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_EXTENSIONS,
                    "CL_DEVICE_EXTENSIONS");
    
    
                std::cout << std::endl << std::endl;
            }
        }
    }
    
    //----------- The code above queries and displays OpenCL device information ------------------
    
    cl_program CreateProgram(cl_context context, cl_device_id device, const char* fileName)  
    {  
        cl_int errNum;  
        cl_program program;  
    
        std::ifstream kernelFile(fileName, std::ios::in);  
        if (!kernelFile.is_open())  
        {  
            std::cerr << "Failed to open file for reading: " << fileName << std::endl;  
            return NULL;  
        }  
    
        std::ostringstream oss;  
        oss << kernelFile.rdbuf();  
    
        std::string srcStdStr = oss.str();  
        const char *srcStr = srcStdStr.c_str();  
        program = clCreateProgramWithSource(context, 1,  
            (const char**)&srcStr,  
            NULL, NULL);  
        if (program == NULL)  
        {  
            std::cerr << "Failed to create CL program from source." << std::endl;  
            return NULL;  
        }  
    
        errNum = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);  
        if (errNum != CL_SUCCESS)  
        {  
            // Determine the reason for the error  
            char buildLog[16384];  
            clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,  
                sizeof(buildLog), buildLog, NULL);  
    
            std::cerr << "Error in kernel: " << std::endl;  
            std::cerr << buildLog;  
            clReleaseProgram(program);  
            return NULL;  
        }  
    
        return program;  
    }  
    
    
    void Cleanup(cl_context context, cl_command_queue commandQueue,  
                 cl_program program, cl_kernel kernel, cl_mem imageObjects[2])  
    {  
        for (int i = 0; i < 2; i++)  
        {  
            if (imageObjects[i] != 0)  
                clReleaseMemObject(imageObjects[i]);  
        }  
        if (commandQueue != 0)  
            clReleaseCommandQueue(commandQueue);  
    
        if (kernel != 0)  
            clReleaseKernel(kernel);  
    
        if (program != 0)  
            clReleaseProgram(program);  
    
        if (context != 0)  
            clReleaseContext(context);  
    
    }  
      
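    cl_mem LoadImage(cl_context context, const char *fileName, int &width, int &height)  
    {  
        cv::Mat image1 = cv::imread(fileName);  
        if (image1.empty())  
        {  
            std::cerr << "Failed to read image: " << fileName << std::endl;  
            return 0;  
        }  
        width = image1.cols;  
        height = image1.rows;  
    
        // Pack the 3-channel OpenCV pixels into 4 bytes per pixel, bottom row first.  
        // OpenCV stores B,G,R; the order is kept as-is because main() unpacks the  
        // result the same way.  
        char *buffer = new char[width * height * 4];  
        int w = 0;  
        for (int v = height - 1; v >= 0; v--)  
        {  
            for (int u = 0; u < width; u++)  
            {  
                buffer[w++] = image1.at<cv::Vec3b>(v, u)[0];  
                buffer[w++] = image1.at<cv::Vec3b>(v, u)[1];  
                buffer[w++] = image1.at<cv::Vec3b>(v, u)[2];  
                buffer[w++] = (char)255;  // opaque alpha instead of an uninitialized byte  
            }  
        }  
    
        // Create OpenCL image  
        cl_image_format clImageFormat;  
        clImageFormat.image_channel_order = CL_RGBA;  
        clImageFormat.image_channel_data_type = CL_UNORM_INT8;  
    
        cl_int errNum;  
        cl_mem clImage;  
        clImage = clCreateImage2D(context,  
            CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,  
            &clImageFormat,  
            width,  
            height,  
            0,  
            buffer,  
            &errNum);  
    
        // CL_MEM_COPY_HOST_PTR copies the pixels during creation, so the staging  
        // buffer can be released right away.  
        delete[] buffer;  
    
        if (errNum != CL_SUCCESS)  
        {  
            std::cerr << "Error creating CL image object" << std::endl;  
            return 0;  
        }  
    
        return clImage;  
    }  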
    
    size_t RoundUp(int groupSize, int globalSize)  
    {  
        int r = globalSize % groupSize;  
        if (r == 0)  
        {  
            return globalSize;  
        }  
        else  
        {  
            return globalSize + groupSize - r;  
        }  
    }  
    
    int main(int argc, char** argv)  
    {  
        cl_context context = 0;  
        cl_command_queue commandQueue = 0;  
        cl_program program = 0;  
        cl_device_id device = 0;  
        cl_kernel kernel = 0;  
        cl_mem imageObjects[2] = { 0, 0 };  
        cl_int errNum;  
    
        // Print information about every OpenCL platform and device
        displayInfo();
    
        cl_uint numplatforms;
        errNum = clGetPlatformIDs(0, NULL, &numplatforms);
        if (errNum != CL_SUCCESS || numplatforms <= 0){
            printf("No OpenCL platform found\n");
            return 1;
        }
    
        cl_platform_id * platformIds;
        platformIds = (cl_platform_id*)alloca(sizeof(cl_platform_id)*numplatforms);
        errNum = clGetPlatformIDs(numplatforms, platformIds, NULL);
        if (errNum != CL_SUCCESS){
            printf("No OpenCL platform found\n");
            return 1;
        }
        printf("Platform count: %d\n", numplatforms);
    
        // Pick the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE
        cl_uint index_platform = 0;
        bool deviceFound = false;
        std::size_t paramValueSize = 0;
        for (cl_uint i = 0; i < numplatforms; i++){
            cl_uint numDevices = 0;
            errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
            if (errNum != CL_SUCCESS || numDevices == 0){
                printf("No GPU device found on platform %u\n", i);
                continue;
            }
            cl_device_id *devicesIds = (cl_device_id*)alloca(sizeof(cl_device_id)*numDevices);
            errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, numDevices, devicesIds, NULL);
            if (errNum != CL_SUCCESS ){
                printf("Failed to get device IDs on platform %u\n", i);
                continue;
            }
    
            for (cl_uint j = 0; j < numDevices; j++){
                std::size_t tmpSize = 0;
                errNum = clGetDeviceInfo(devicesIds[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &tmpSize, NULL);
                if (errNum != CL_SUCCESS){
                    std::cerr << "Failed to find OpenCL device info " << std::endl;
                    continue;
                }
    
                // Keep track of the best device seen so far, together with its platform
                if (!deviceFound || tmpSize > paramValueSize){
                    paramValueSize = tmpSize;
                    index_platform = i;
                    device = devicesIds[j];
                    deviceFound = true;
                }
            }
        }
        if (!deviceFound){
            std::cerr << "No usable OpenCL GPU device found." << std::endl;
            return 1;
        }
    
        // Create the context on the selected platform, containing only the selected device
        cl_context_properties contextProperties[] ={
            CL_CONTEXT_PLATFORM,
            (cl_context_properties)platformIds[index_platform],
            0
        };
        context = clCreateContext(contextProperties, 1, &device, NULL, NULL, &errNum);
        if (errNum != CL_SUCCESS){
            std::cerr << "Failed to Create Context " << std::endl;
            return 1;
        }
    
        // Create a command-queue on the first device available  
        // on the created context  
        commandQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &errNum);
        if (commandQueue == NULL)  {  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Make sure the device supports images, otherwise exit  
        cl_bool imageSupport = CL_FALSE;  
        clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool), &imageSupport, NULL);  
        if (imageSupport != CL_TRUE)  {  
            std::cerr << "OpenCL device does not support images." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Load input image from file and load it into  
        // an OpenCL image object  
        int width, height;  
        char src0[] = "test.png";
        imageObjects[0] = LoadImage(context, src0, width, height);  
        if (imageObjects[0] == 0)  {  
            std::cerr << "Error loading: " << std::string(src0) << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Create output image object  
        cl_image_format clImageFormat;  
        clImageFormat.image_channel_order = CL_RGBA;  
        clImageFormat.image_channel_data_type = CL_UNORM_INT8;  
        imageObjects[1] = clCreateImage2D(context,  
            CL_MEM_WRITE_ONLY,  
            &clImageFormat,  
            width,  
            height,  
            0,  
            NULL,  
            &errNum);  
    
        if (errNum != CL_SUCCESS){  
            std::cerr << "Error creating CL output image object." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Create OpenCL program  
        program = CreateProgram(context, device, "bilateralBlur.cl");  
        if (program == NULL)  {  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
        // Create OpenCL kernel  
        kernel = clCreateKernel(program, "bilateralBlur", NULL);  
        if (kernel == NULL)  {  
            std::cerr << "Failed to create kernel" << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Set the kernel arguments  
        errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &imageObjects[0]);  
        errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &imageObjects[1]);  
        if (errNum != CL_SUCCESS)  {  
            std::cerr << "Error setting kernel arguments." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
            system("pause") ; return 1; 
        }  
    
        size_t localWorkSize[2] = { 32, 32 };  
        size_t globalWorkSize[2] = { RoundUp(localWorkSize[0], width),  
            RoundUp(localWorkSize[1], height) };  
    
        cl_event prof_event;
    
        // Queue the kernel up for execution  
        errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL,  
            globalWorkSize, localWorkSize,  
            0, NULL, &prof_event);
        if (errNum != CL_SUCCESS)  
        {  
            std::cerr << "Error queuing kernel for execution." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }
    
        clFinish(commandQueue);
        errNum = clWaitForEvents(1, &prof_event);
        if (errNum)
        {
            printf("clWaitForEvents() failed for bilateralBlur kernel. (%d)\n", errNum);
            return EXIT_FAILURE;
        }
    
        cl_ulong ev_start_time = (cl_ulong)0;
        cl_ulong ev_end_time = (cl_ulong)0;
        size_t return_bytes;
    
        // Profiling timestamps are in nanoseconds. Measuring from
        // CL_PROFILING_COMMAND_QUEUED includes queueing and submission latency;
        // use CL_PROFILING_COMMAND_START to time kernel execution alone.
        errNum = clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_QUEUED,sizeof(cl_ulong), &ev_start_time, &return_bytes);
        errNum |= clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END,sizeof(cl_ulong), &ev_end_time, &return_bytes);
        if (errNum)
        {
            printf("clGetEventProfilingInfo() failed for kernel. (%d)\n", errNum);
            return EXIT_FAILURE;
        }
    
        double run_time = (double)(ev_end_time - ev_start_time);
    
        printf("Image dimensions: %d x %d pixels, Image type = CL_RGBA, CL_UNORM_INT8\n", width, height);
        printf("Work Timer:%lfms\n", run_time / 1000000);
    
        clReleaseEvent(prof_event);
    
        // Read the output buffer back to the Host  
        char *buffer = new char[width * height * 4];  
        size_t origin[3] = { 0, 0, 0 };  
        size_t region[3] = { (size_t)width, (size_t)height, 1 };  
        errNum = clEnqueueReadImage(commandQueue, imageObjects[1], CL_TRUE,  
            origin, region, 0, 0, buffer,  
            0, NULL, NULL);  
        if (errNum != CL_SUCCESS)  {  
            std::cerr << "Error reading result buffer." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        std::cout << std::endl;  
        std::cout << "Executed program successfully." << std::endl;  
    
        // Save the image out to disk  
        const char *saveImage = "output.jpg";
        //std::cout << buffer << std::endl;  
        cv::Mat imageColor = cv::imread(src0);  
        cv::Mat imageColor2;  
        imageColor2.create(imageColor.rows, imageColor.cols, imageColor.type());  
        int w = 0;  
        for (int v = imageColor2.rows-1; v >=0; v--)  {  
            for (int u =0 ; u <imageColor2.cols; u++)  {  
                imageColor2.at<cv::Vec3b>(v, u)[0] = buffer[w++];  
                imageColor2.at<cv::Vec3b>(v, u)[1] = buffer[w++];  
                imageColor2.at<cv::Vec3b>(v, u)[2] = buffer[w++];  
                w++;  
            }  
        }
    
        cv::imshow("Original", imageColor);
        cv::imshow("Smoothed", imageColor2);  
        cv::imwrite(saveImage, imageColor2);  
        cv::waitKey(0);  
    
        delete[] buffer;  
    
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
    
        return 0;  
    }

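        This host program uses a little OpenCV, mainly to read the image; any other image loader would work just as well. In fact, OpenCV has its own ocl module, apparently an OpenCL extension contributed to OpenCV by AMD. It contains OpenCL implementations of many commonly used OpenCV functions, and these already include bilateral filtering and adaptive bilateral filtering.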

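        The program picks the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE, but choosing the best OpenCL device really calls for weighing several factors. On my machine, judging by CL_DEVICE_MAX_WORK_GROUP_SIZE, the CPU would appear to be the best OpenCL device, and many of the CPU's reported parameters do beat the GPU's. Yet the actual run time on the CPU is several times that of the GPU. So I am not sure which parameters should be used to decide which OpenCL device is best; pointers from anyone who knows are welcome.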
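Since the reported device parameters do not predict the run time well, one option is to measure instead of guess: run the kernel a few times on each candidate device and keep the fastest. The following is only a minimal sketch of that idea. `Candidate`, `runOnceMs` and `PickFastest` are hypothetical names that do not appear in the original program; in a real host program each `runOnceMs` would wrap the enqueue and profiling calls for one device.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// One candidate device: a label plus a callable that runs the kernel once on
// that device and returns the elapsed time in milliseconds (hypothetical type).
struct Candidate {
    std::string name;
    std::function<double()> runOnceMs;
};

// Returns the index of the candidate with the lowest best-of-N time,
// or -1 if the list is empty.
int PickFastest(const std::vector<Candidate>& candidates, int repeats = 3)
{
    assert(repeats > 0);
    int best = -1;
    double bestMs = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        // Best-of-N reduces the influence of one-off delays such as lazy
        // kernel compilation or cold caches on the first run.
        double ms = std::numeric_limits<double>::infinity();
        for (int r = 0; r < repeats; ++r) {
            const double t = candidates[i].runOnceMs();
            if (t < ms) ms = t;
        }
        if (ms < bestMs) {
            bestMs = ms;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

The cost is one extra warm-up pass per device at startup, which is usually acceptable when the same kernel is then applied to many images.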

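        Also, the program is actually quite simple: only a little over 300 lines do real work. The device-information code is there just to show which OpenCL devices the machine has and what their properties are; the displayInfo(); call in main can safely be commented out.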

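        As for the OpenCL library files, they can be obtained from Intel, NVIDIA, AMD and others. The OpenCL headers and lib files I use were copied out of NVIDIA's CUDA; you can also simply use mine.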

    4. Results

    (1) Hardware information

    [Images: two screenshots of the hardware information]

    (2) OpenCL device information printed to the console

    Number of platforms:    2
            CL_PLATFORM_PROFILE:    FULL_PROFILE
            CL_PLATFORM_VERSION:    OpenCL 2.0
            CL_PLATFORM_VENDOR:     Intel(R) Corporation
            CL_PLATFORM_EXTENSIONS: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
            Number of devices:      2
                    CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
                    CL_DEVICE_VENDOR_ID:    32902
                    CL_DEVICE_MAX_COMPUTE_UNITS:    24
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
            CL_DEVICE_MAX_WORK_ITEM_SIZES:  256 256 256
                    CL_DEVICE_MAX_WORK_GROUP_SIZE:  256
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        0
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   0
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     1
                    CL_DEVICE_MAX_CLOCK_FREQUENCY:  1050
                    CL_DEVICE_ADDRESS_BITS: 32
                    CL_DEVICE_MAX_MEM_ALLOC_SIZE:   390280806
                    CL_DEVICE_IMAGE_SUPPORT:        1
                    CL_DEVICE_MAX_READ_IMAGE_ARGS:  128
                    CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 128
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                    CL_DEVICE_IMAGE3D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE3D_MAX_HEIGHT:   16384
                    CL_DEVICE_IMAGE3D_MAX_DEPTH:    2048
                    CL_DEVICE_MAX_SAMPLERS: 16
                    CL_DEVICE_MAX_PARAMETER_SIZE:   1024
                    CL_DEVICE_MEM_BASE_ADDR_ALIGN:  1024
                    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:        CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    64
                    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        524288
                    CL_DEVICE_GLOBAL_MEM_SIZE:      1561123226
                    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     65536
                    CL_DEVICE_MAX_CONSTANT_ARGS:    8
                    CL_DEVICE_LOCAL_MEM_TYPE:
                    CL_DEVICE_LOCAL_MEM_SIZE:       65536
                    CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                    CL_DEVICE_HOST_UNIFIED_MEMORY:  1
                    CL_DEVICE_PROFILING_TIMER_RESOLUTION:   83
                    CL_DEVICE_ENDIAN_LITTLE:        1
                    CL_DEVICE_AVAILABLE:    1
                    CL_DEVICE_COMPILER_AVAILABLE:   1
                    CL_DEVICE_EXECUTION_CAPABILITIES:       CL_EXEC_KERNEL
                    CL_DEVICE_QUEUE_PROPERTIES:     CL_QUEUE_PROFILING_ENABLE
                    CL_DEVICE_PLATFORM:     00DEC488
            CL_DEVICE_NAME: Intel(R) HD Graphics 520
            CL_DEVICE_VENDOR:       Intel(R) Corporation
            CL_DRIVER_VERSION:      20.19.15.4364
            CL_DEVICE_PROFILE:      FULL_PROFILE
            CL_DEVICE_VERSION:      OpenCL 2.0
            CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 2.0
            CL_DEVICE_EXTENSIONS:   cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_ctz cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing cl_intel_motion_estimation cl_intel_simultaneous_sharing cl_intel_subgroups cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_fp16 cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir


                    CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
                    CL_DEVICE_VENDOR_ID:    32902
                    CL_DEVICE_MAX_COMPUTE_UNITS:    4
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
            CL_DEVICE_MAX_WORK_ITEM_SIZES:  8192 8192 8192
                    CL_DEVICE_MAX_WORK_GROUP_SIZE:  8192
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  0
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     32
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    16
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      8
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     4
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    8
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   4
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     0
                    CL_DEVICE_MAX_CLOCK_FREQUENCY:  2500
                    CL_DEVICE_ADDRESS_BITS: 32
                    CL_DEVICE_MAX_MEM_ALLOC_SIZE:   536838144
                    CL_DEVICE_IMAGE_SUPPORT:        1
                    CL_DEVICE_MAX_READ_IMAGE_ARGS:  480
                    CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 480
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                    CL_DEVICE_IMAGE3D_MAX_WIDTH:    2048
                    CL_DEVICE_IMAGE3D_MAX_HEIGHT:   2048
                    CL_DEVICE_IMAGE3D_MAX_DEPTH:    2048
                    CL_DEVICE_MAX_SAMPLERS: 480
                    CL_DEVICE_MAX_PARAMETER_SIZE:   3840
                    CL_DEVICE_MEM_BASE_ADDR_ALIGN:  1024
                    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:        CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    64
                    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        262144
                    CL_DEVICE_GLOBAL_MEM_SIZE:      2147352576
                    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     131072
                    CL_DEVICE_MAX_CONSTANT_ARGS:    480
                    CL_DEVICE_LOCAL_MEM_TYPE:       CL_LOCAL | CL_GLOBAL
                    CL_DEVICE_LOCAL_MEM_SIZE:       32768
                    CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                    CL_DEVICE_HOST_UNIFIED_MEMORY:  1
                    CL_DEVICE_PROFILING_TIMER_RESOLUTION:   395
                    CL_DEVICE_ENDIAN_LITTLE:        1
                    CL_DEVICE_AVAILABLE:    1
                    CL_DEVICE_COMPILER_AVAILABLE:   1
                    CL_DEVICE_EXECUTION_CAPABILITIES:       CL_EXEC_KERNEL | CL_EXEC_NATIVE_KERNEL
                    CL_DEVICE_QUEUE_PROPERTIES:     CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
                    CL_DEVICE_PLATFORM:     00DEC488
            CL_DEVICE_NAME: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
            CL_DEVICE_VENDOR:       Intel(R) Corporation
            CL_DRIVER_VERSION:      5.2.0.10094
            CL_DEVICE_PROFILE:      FULL_PROFILE
            CL_DEVICE_VERSION:      OpenCL 2.0 (Build 10094)
            CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 2.0
            CL_DEVICE_EXTENSIONS:   cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer


            CL_PLATFORM_PROFILE:    FULL_PROFILE
            CL_PLATFORM_VERSION:    OpenCL 1.2 CUDA 8.0.44
            CL_PLATFORM_VENDOR:     NVIDIA Corporation
            CL_PLATFORM_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
            Number of devices:      1
                    CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
                    CL_DEVICE_VENDOR_ID:    4318
                    CL_DEVICE_MAX_COMPUTE_UNITS:    3
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
            CL_DEVICE_MAX_WORK_ITEM_SIZES:  1024 1024 64
                    CL_DEVICE_MAX_WORK_GROUP_SIZE:  1024
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  0
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     0
                    CL_DEVICE_MAX_CLOCK_FREQUENCY:  1241
                    CL_DEVICE_ADDRESS_BITS: 32
                    CL_DEVICE_MAX_MEM_ALLOC_SIZE:   536870912
                    CL_DEVICE_IMAGE_SUPPORT:        1
                    CL_DEVICE_MAX_READ_IMAGE_ARGS:  256
                    CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                    CL_DEVICE_IMAGE3D_MAX_WIDTH:    4096
                    CL_DEVICE_IMAGE3D_MAX_HEIGHT:   4096
                    CL_DEVICE_IMAGE3D_MAX_DEPTH:    4096
                    CL_DEVICE_MAX_SAMPLERS: 32
                    CL_DEVICE_MAX_PARAMETER_SIZE:   4352
                    CL_DEVICE_MEM_BASE_ADDR_ALIGN:  4096
                    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF | CL_FP_FMA
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:        CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    128
                    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        49152
                    CL_DEVICE_GLOBAL_MEM_SIZE:      2147483648
                    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     65536
                    CL_DEVICE_MAX_CONSTANT_ARGS:    9
                    CL_DEVICE_LOCAL_MEM_TYPE:
                    CL_DEVICE_LOCAL_MEM_SIZE:       49152
                    CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                    CL_DEVICE_HOST_UNIFIED_MEMORY:  0
                    CL_DEVICE_PROFILING_TIMER_RESOLUTION:   1000
                    CL_DEVICE_ENDIAN_LITTLE:        1
                    CL_DEVICE_AVAILABLE:    1
                    CL_DEVICE_COMPILER_AVAILABLE:   1
                    CL_DEVICE_EXECUTION_CAPABILITIES:       CL_EXEC_KERNEL
                    CL_DEVICE_QUEUE_PROPERTIES:     CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
                    CL_DEVICE_PLATFORM:     00E30580
            CL_DEVICE_NAME: GeForce 940MX
            CL_DEVICE_VENDOR:       NVIDIA Corporation
            CL_DRIVER_VERSION:      369.30
            CL_DEVICE_PROFILE:      FULL_PROFILE
            CL_DEVICE_VERSION:      OpenCL 1.2 CUDA
            CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 1.2
            CL_DEVICE_EXTENSIONS:   cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts


    Platform count: 2
    Image dimensions: 273 x 415 pixels, Image type = CL_RGBA, CL_UNORM_INT8
    Work Timer:3.422816ms

    Executed program successfully.

    A 273×415 image takes under 4 ms.
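To put the timing in perspective, the logged figure can be turned into a throughput number. OpenCL profiling timestamps are in nanoseconds, which is why the host code divides by 1000000 to get milliseconds. The helper names below are hypothetical; the inputs are the values from the log above (273 x 415 pixels, 3.422816 ms).

```cpp
#include <cassert>
#include <cstdint>

// Convert an OpenCL profiling interval (nanoseconds) to milliseconds.
double NanosToMillis(std::uint64_t ns)
{
    return static_cast<double>(ns) / 1.0e6;
}

// Pixels processed per second, expressed in millions of pixels.
double MegapixelsPerSecond(int width, int height, double millis)
{
    assert(millis > 0.0);
    const double pixels = static_cast<double>(width) * height;
    return pixels / (millis * 1.0e-3) / 1.0e6;
}
```

For the logged run, `MegapixelsPerSecond(273, 415, 3.422816)` comes out at roughly 33 megapixels per second. With the 11×11 window used by the kernel, each output pixel reads 121 samples, so the 113295 pixels correspond to about 13.7 million image reads.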

    (3) Bilateral filter result

    [Image: bilateral filter result]

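        The effect is clearly visible. However, because there is no skin-tone detection and no final sharpening step, and because of how the parameters are set, even my friends said the skin comes out too smooth and looks fake. So the algorithm still needs refinement.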
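When tuning the parameters, a small CPU reference makes it easy to try different values of ksize, sigma_d and sigma_r on tiny test images and to cross-check the GPU output. The sketch below is the textbook normalized bilateral filter for a single-channel float image, not a port of the kernel: the function name `BilateralReference` is hypothetical, and it assumes values in [0, 1] and clamp-to-edge addressing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook bilateral filter on a single-channel, row-major float image.
// Pixels outside the image are clamped to the nearest edge pixel.
std::vector<float> BilateralReference(const std::vector<float>& src,
                                      int width, int height,
                                      int ksize, float sigma_d, float sigma_r)
{
    assert(width > 0 && height > 0 && ksize > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int half = ksize / 2;
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float center = src[static_cast<std::size_t>(y) * width + x];
            float num = 0.0f;
            float den = 0.0f;
            for (int K = -half; K <= half; ++K) {
                for (int L = -half; L <= half; ++L) {
                    const int sx = std::min(std::max(x + K, 0), width - 1);
                    const int sy = std::min(std::max(y + L, 0), height - 1);
                    const float v = src[static_cast<std::size_t>(sy) * width + sx];
                    // Spatial (domain) exponent and intensity (range) exponent.
                    const float d = -static_cast<float>(K * K + L * L)
                                    / (2.0f * sigma_d * sigma_d);
                    const float r = -(center - v) * (center - v)
                                    / (2.0f * sigma_r * sigma_r);
                    const float w = std::exp(d + r);
                    num += w * v;
                    den += w;
                }
            }
            // den is at least 1 because the centre tap has weight exp(0) = 1.
            dst[static_cast<std::size_t>(y) * width + x] = num / den;
        }
    }
    return dst;
}
```

A smaller sigma_r keeps more texture because pixels that differ from the centre contribute less, while a larger sigma_r moves the result towards a plain Gaussian blur, which is one likely source of an over-smoothed look.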

        In terms of speed, the algorithm should also still have room for optimization.
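One commonly used optimization, given here only as a sketch: the spatial exponent -(K*K + L*L) / (2 * sigma_d * sigma_d) depends only on the window offsets, not on the pixel data, so its exponential can be computed once on the host instead of once per pixel per tap. The function name `BuildSpatialWeights` is hypothetical, and passing the table to the kernel (for example as an extra constant-memory argument) would require changes to the kernel that are not shown in the original.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a ksize x ksize table of spatial weights
// exp(-(K*K + L*L) / (2 * sigma_d * sigma_d)), stored row-major with the
// offset (K, L) = (0, 0) at the centre of the table.
std::vector<float> BuildSpatialWeights(int ksize, float sigma_d)
{
    assert(ksize > 0 && ksize % 2 == 1 && sigma_d > 0.0f);
    const int half = ksize / 2;
    std::vector<float> table(static_cast<std::size_t>(ksize) * ksize);
    for (int K = -half; K <= half; ++K) {
        for (int L = -half; L <= half; ++L) {
            const float d = -static_cast<float>(K * K + L * L)
                            / (2.0f * sigma_d * sigma_d);
            table[static_cast<std::size_t>(K + half) * ksize + (L + half)] = std::exp(d);
        }
    }
    return table;
}
```

With ksize = 11 the table has 121 entries, so it fits easily in constant memory. Because exp(a + b) = exp(a) * exp(b), the kernel would then multiply the looked-up spatial weight by the range weight rather than adding the two exponents.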

    Source code: http://download.csdn.net/download/qq_33892166/9761287

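        If the program reports "Error queuing kernel for execution.", try changing size_t localWorkSize[2] = { 32, 32 }; to size_t localWorkSize[2] = { 16, 16 };. A 32×32 work-group contains 1024 work-items, which exceeds CL_DEVICE_MAX_WORK_GROUP_SIZE on some devices, for example the 256 reported for the Intel HD Graphics 520 in the log above.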
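Rather than editing the constant by hand, the local size can be derived from the value the device reports. The sketch below picks the largest power-of-two square that fits in CL_DEVICE_MAX_WORK_GROUP_SIZE and rounds the global size up in the same way as RoundUp() in the host program. Both function names are hypothetical. A real program may also need to respect CL_DEVICE_MAX_WORK_ITEM_SIZES and the per-kernel limit CL_KERNEL_WORK_GROUP_SIZE returned by clGetKernelWorkGroupInfo, which this sketch does not query.

```cpp
#include <cassert>
#include <cstddef>

// Largest power-of-two edge length whose square fits in maxWorkGroupSize,
// where maxWorkGroupSize is the value reported for
// CL_DEVICE_MAX_WORK_GROUP_SIZE.
std::size_t ChooseSquareLocalSize(std::size_t maxWorkGroupSize)
{
    assert(maxWorkGroupSize >= 1);
    std::size_t edge = 1;
    while ((edge * 2) * (edge * 2) <= maxWorkGroupSize)
        edge *= 2;
    return edge;
}

// Smallest multiple of groupSize that is greater than or equal to globalSize.
std::size_t RoundUpToMultiple(std::size_t groupSize, std::size_t globalSize)
{
    assert(groupSize > 0);
    const std::size_t r = globalSize % groupSize;
    return r == 0 ? globalSize : globalSize + groupSize - r;
}
```

For the devices in the log, a limit of 256 gives a 16×16 group and a limit of 1024 gives 32×32. For the 273×415 test image, a 16×16 group leads to a global size of 288×416; the extra work-items are discarded by the bounds check at the top of the kernel.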

      

  • Original article: https://www.cnblogs.com/betterwgo/p/6431522.html