BBF: An Optimized Search Algorithm for k-d Trees [Repost]

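BBF (Best Bin First) is an improved nearest-neighbor query algorithm for k-d trees. As the previous two posts on the standard k-d tree query showed, the "backtracking" in that search is determined solely by the query path and ignores the properties of the data points along that path. The idea behind BBF is to order the nodes on the query path by priority, for example by the distance between each node's splitting hyperplane (called a bin) and the query point. Backtracking always resumes from the tree node with the highest priority (the best bin). BBF also caps the amount of work (in the code below, max_nn_chks limits the number of leaves examined): when every node in the priority queue has been checked, or the limit is reached, the algorithm returns the best result found so far as an approximate nearest neighbor. With best-bin-first search, k-d trees can be extended to high-dimensional data sets.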
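
The "always resume from the best bin" step relies on a min-priority queue. Rob Hess's min_pq is not reproduced in this post, so the following is only a minimal sketch of such a queue as a fixed-capacity binary heap. The names `struct pq`, `pq_insert`, `pq_extract_min`, and `PQ_CAP` are hypothetical and are not his API; the only convention borrowed from the listing below is that insertion returns 0 on success and nonzero on failure.

```c
#include <stddef.h>

#define PQ_CAP 64  /* fixed capacity: an assumption that keeps the sketch short */

struct pq_entry {
    void  *data;   /* e.g. a pointer to an unexplored tree node */
    double key;    /* distance from the query to that node's splitting hyperplane */
};

struct pq {
    struct pq_entry e[PQ_CAP];
    int n;         /* number of entries currently stored */
};

static void pq_swap(struct pq_entry *a, struct pq_entry *b)
{
    struct pq_entry t = *a;
    *a = *b;
    *b = t;
}

/* Returns 0 on success, 1 if the queue is full. */
static int pq_insert(struct pq *q, void *data, double key)
{
    int i;

    if (q->n >= PQ_CAP)
        return 1;
    i = q->n++;
    q->e[i].data = data;
    q->e[i].key = key;
    /* Sift up until the parent's key is no larger than ours. */
    while (i > 0 && q->e[(i - 1) / 2].key > q->e[i].key) {
        pq_swap(&q->e[(i - 1) / 2], &q->e[i]);
        i = (i - 1) / 2;
    }
    return 0;
}

/* Removes and returns the data with the smallest key, or NULL if empty. */
static void *pq_extract_min(struct pq *q)
{
    void *min;
    int i = 0;

    if (q->n == 0)
        return NULL;
    min = q->e[0].data;
    q->e[0] = q->e[--q->n];
    /* Sift down: swap with the smaller child while it has a smaller key. */
    for (;;) {
        int l = 2 * i + 1, r = l + 1, s = i;
        if (l < q->n && q->e[l].key < q->e[s].key) s = l;
        if (r < q->n && q->e[r].key < q->e[s].key) s = r;
        if (s == i)
            break;
        pq_swap(&q->e[i], &q->e[s]);
        i = s;
    }
    return min;
}
```

In a BBF loop, each unexplored sibling would be pushed with its hyperplane distance as the key, and every backtracking step would pop the entry with the smallest key.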

Below we work through the BBF algorithm using the relevant code from Rob Hess's OpenCV-based SIFT implementation.

/*
Finds an image feature's approximate k nearest neighbors in a kd tree using
Best Bin First search.

@param kd_root root of an image feature kd tree
@param feat image feature for whose neighbors to search
@param k number of neighbors to find
@param nbrs pointer to an array in which to store pointers to neighbors
    in order of increasing descriptor distance
@param max_nn_chks search is cut off after examining this many tree entries

@return Returns the number of neighbors found and stored in nbrs, or
    -1 on error.
*/
//See the comment above for the parameters and return value
//k-nearest-neighbor search based on a k-d tree with BBF
int kdtree_bbf_knn( struct kd_node* kd_root, struct feature* feat, int k,
                    struct feature*** nbrs, int max_nn_chks )
{
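    struct kd_node* expl;               //expl is a node of the feature k-d tree
    struct min_pq* min_pq;              //min_pq is the priority queue
    struct feature* tree_feat, ** _nbrs;//tree_feat is a SIFT feature; _nbrs holds the neighbor features found
    struct bbf_data* bbf_data;          //bbf_data temporarily caches a feature's original data and its distance to feat
    int i, t = 0, n = 0;                //t counts leaves examined so far (capped by max_nn_chks); n is the number of neighbors found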
    if( ! nbrs  ||  ! feat  ||  ! kd_root )
    {
        fprintf( stderr, "Warning: NULL pointer error, %s, line %d\n",
                __FILE__, __LINE__ );
        return -1;
    }

    _nbrs = calloc( k, sizeof( struct feature* ) );   //allocate the result array
    min_pq = minpq_init();                            //initialize the priority queue
    minpq_insert( min_pq, kd_root, 0 );               //insert the root node into the priority queue first
    while( min_pq->n > 0  &&  t < max_nn_chks )       //queue not exhausted and check limit not reached
    {
        expl = (struct kd_node*)minpq_extract_min( min_pq );//extract (and remove) the highest-priority node
        if( ! expl )
        {
            fprintf( stderr, "Warning: PQ unexpectedly empty, %s line %d\n",
                    __FILE__, __LINE__ );
            goto fail;
        }

        expl = explore_to_leaf( expl, feat, min_pq );       //descend from expl to a leaf (details below)
        if( ! expl )
        {
            fprintf( stderr, "Warning: PQ unexpectedly empty, %s line %d\n",
                    __FILE__, __LINE__ );
            goto fail;
        }

        for( i = 0; i < expl->n; i++ )  //compare the leaf's features against the target to find nearest neighbors
        {
            tree_feat = &expl->features[i];
            bbf_data = malloc( sizeof( struct bbf_data ) );
            if( ! bbf_data )
            {
                fprintf( stderr, "Warning: unable to allocate memory,"
                    " %s line %d\n", __FILE__, __LINE__ );
                goto fail;
            }
            bbf_data->old_data = tree_feat->feature_data;
            bbf_data->d = descr_dist_sq(feat, tree_feat);  //squared distance between the leaf feature and the target feature
            tree_feat->feature_data = bbf_data;
            n += insert_into_nbr_array( tree_feat, _nbrs, n, k );//insert into _nbrs if it qualifies as a neighbor
        }
        t++;
    }
    //free memory and return the result
    minpq_release( &min_pq );
    for( i = 0; i < n; i++ )
    {
        bbf_data = _nbrs[i]->feature_data;
        _nbrs[i]->feature_data = bbf_data->old_data;
        free( bbf_data );
    }
    *nbrs = _nbrs;
    return n;

fail:
    minpq_release( &min_pq );
    for( i = 0; i < n; i++ )
    {
        bbf_data = _nbrs[i]->feature_data;
        _nbrs[i]->feature_data = bbf_data->old_data;
        free( bbf_data );
    }
    free( _nbrs );
    *nbrs = NULL;
    return -1;
}

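The kdtree_bbf_knn function combines two processes: building the priority queue and searching for the k nearest neighbors. Its two key data structures are the min_pq priority queue and _nbrs, the array that holds the k-nearest-neighbor results. min_pq is ordered by ascending distance between each node's splitting hyperplane and the query feature, so its first node is the one with the smallest distance (the highest priority). _nbrs is likewise kept in ascending order of distance to the target feature, so its entries, at most k of them, are the nearest neighbors found, already in order. Note: the definitions of some data structures and helper functions used above, such as minpq_insert, minpq_extract_min, insert_into_nbr_array, and descr_dist_sq, are not reproduced here; see the code reference documentation on Rob Hess's home page (http://blogs.oregonstate.edu/hess/) for the full code.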
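
To make the ordering of the result array concrete, here is a minimal sketch of inserting a candidate into a bounded array kept in ascending order of distance. It is not Rob Hess's insert_into_nbr_array: `struct nbr` and `insert_sorted_nbr` are hypothetical names, and the sketch works on plain distances rather than on struct feature. It only mirrors the calling convention seen above, where the return value is added to n.

```c
#include <stddef.h>

/* Hypothetical neighbor record: an id plus its squared distance to the query. */
struct nbr {
    int    id;
    double d;
};

/*
 * Inserts cand into nbrs[0..n-1], which is sorted by ascending d and may
 * hold at most k entries. Returns 1 if the array grew by one entry, and 0
 * if the candidate was rejected or replaced the current worst entry.
 */
static int insert_sorted_nbr(struct nbr *nbrs, int n, int k, struct nbr cand)
{
    int i;

    if (k <= 0)
        return 0;
    if (n == k && cand.d >= nbrs[k - 1].d)
        return 0;                 /* array is full and cand is no better than the worst */

    /* Start at the last slot that will exist after the insertion. */
    i = (n < k) ? n : k - 1;
    /* Shift entries with a larger distance one slot to the right. */
    while (i > 0 && nbrs[i - 1].d > cand.d) {
        nbrs[i] = nbrs[i - 1];
        i--;
    }
    nbrs[i] = cand;
    return n < k;
}
```

Because the array never holds more than k entries and stays sorted, the caller can hand it back as the final result without any extra sorting pass.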

Now let's look in detail at how explore_to_leaf is implemented.

/*
Explores a kd tree from a given node to a leaf.  Branching decisions are
made at each node based on the descriptor of a given feature.  Each node
examined but not explored is put into a priority queue to be explored
later, keyed based on the distance from its partition key value to the
given feature's descriptor.

@param kd_node root of the subtree to be explored
@param feat feature upon which branching decisions are based
@param min_pq a minimizing priority queue into which tree nodes are placed
    as described above

@return Returns a pointer to the leaf node at which exploration ends or
    NULL on error.
*/
//See the comment above for the parameters and return value
//Builds the search path and fills the priority queue
static struct kd_node* explore_to_leaf( struct kd_node* kd_node, struct feature* feat,
                                        struct min_pq* min_pq )
{
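    struct kd_node* unexpl, * expl = kd_node;  //unexpl is the candidate node for the priority queue
                                               //expl is the node where the search starts
    double kv;                                 //kv is the split value on the splitting dimension
    int ki;                                    //ki is the index of the splitting dimension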
    while( expl  &&  ! expl->leaf )
    {
        ki = expl->ki;                          //read the split node's ki and kv
        kv = expl->kv;

        if( ki >= feat->d )
        {
            fprintf( stderr, "Warning: comparing incompatible descriptors, %s" \
                    " line %d\n", __FILE__, __LINE__ );
            return NULL;
        }
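        if( feat->descr[ki] <= kv )             //compare the target feature with the split value on the splitting dimension
        {
            unexpl = expl->kd_right;            //less than or equal: the right child becomes the candidate
            expl = expl->kd_left;               //and the search descends into the left subtree
        }
        else
        {
            unexpl = expl->kd_left;             //greater: the left child becomes the candidate
            expl = expl->kd_right;              //and the search descends into the right subtree
        }
        //insert the candidate unexpl into the priority queue, keyed by the distance from the target to the splitting hyperplane
        if( minpq_insert( min_pq, unexpl, ABS( kv - feat->descr[ki] ) ) )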
        {
            fprintf( stderr, "Warning: unable to insert into PQ, %s, line %d\n",
                    __FILE__, __LINE__ );
            return NULL;
        }
    }

    return expl;    //return the leaf node at the end of the search path
}

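As the implementation of explore_to_leaf shows, the priority queue and the search path are built at the same time, and this is the essence of BBF: during the binary descent, the branch on the other side of the search path is pushed onto the priority queue so that backtracking can visit it later. The queue is ordered by the distance between the target feature and the splitting hyperplane, ABS( kv - feat->descr[ki] ).

Note: this is the distance between the target feature and the splitting hyperplane, not the distance between the candidate node and the splitting hyperplane. Using the data from the example in the previous two posts, when searching for the k nearest neighbors of (2,4.5) and reaching node (5,4), node (4,7) is added to the search path and node (2,3) becomes the candidate for the priority queue, with priority abs(4 - 4.5) = 0.5.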
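
The worked example above can be checked with a small sketch. It assumes, as the example implies, that node (5,4) splits on the second coordinate (index 1) at value 4. `struct split` and `branch_and_key` are hypothetical names used only for illustration; the comparison and the key mirror the `<=` test and the ABS( kv - feat->descr[ki] ) expression in explore_to_leaf.

```c
/* Hypothetical split node: splits dimension ki at value kv. */
struct split {
    int    ki;
    double kv;
};

/*
 * Returns 0 if the query descends into the left subtree and 1 if it
 * descends into the right subtree. *key receives the priority of the
 * sibling that is NOT descended into: the distance from the query to the
 * splitting hyperplane, independent of where the sibling's points lie.
 */
static int branch_and_key(const struct split *s, const double *query, double *key)
{
    double diff = s->kv - query[s->ki];

    *key = diff < 0 ? -diff : diff;   /* same role as the ABS macro */
    return query[s->ki] <= s->kv ? 0 : 1;
}
```

For the query (2,4.5) at the split (ki = 1, kv = 4), the function reports a descent to the right, toward (4,7), and a key of 0.5 for the left sibling (2,3). The coordinates of (2,3) itself never enter the computation.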

Please credit the source when reposting: http://www.cnblogs.com/eyeszjwang/articles/2437706.html

Original article: https://www.cnblogs.com/retrieval/p/2439131.html