• Implementing MVPtree in Java: building the core MVPtree algorithm code


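      A project required me to write the MVPtree, a fairly obscure data structure, in Java, but there was no ready-made Java implementation online. I am used to reading C++, yet the C++ implementations of this kind of complex structure still left me baffled. Fortunately the client pointed me to a VPtree written in Java by someone on GitHub (job-something), and its overall structure can accommodate an MVPtree.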

      For background on MVPtree, please search Baidu; this article covers only the algorithm implementation.

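      A point-search tree has two main problems to solve: how to avoid searching unnecessary points, and how to reduce the number of distance computations. The first is fairly easy to address: split the point set into two symmetric rectangular halves, or, more imaginatively, split it by distance into a disc and a ring (very efficient, because every query is based on point distance). Solutions to the second are less widely applicable; the usual idea is to precompute each point's distance to a vantage point and use it to filter points at query time.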

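      VPtree is an example of partitioning a point set by distance. Each node holds a point set; an arbitrary point is chosen as the vantage point, and the set is split by distance to it into two subsets of equal size, which then go to the node's two child nodes. Looking up points follows the same pattern. VPtree does not address the second problem, however, so it suits cases where the distance function is cheap, such as Manhattan or Euclidean distance.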
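      The split described above can be sketched in a few lines. This is only a minimal illustration, assuming 2-D points stored as `double[]` and Euclidean distance; the class and method names are made up for this example and do not come from the VPtree library.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of the VPtree library.
class VpSplitSketch {

    /** Euclidean distance between two 2-D points stored as {x, y}. */
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /**
     * Sorts the points in place by distance to the vantage point and returns
     * the index where the far half starts: points[0..index) form the disc,
     * points[index..) form the ring.
     */
    static int splitByMedianDistance(List<double[]> points, double[] vantage) {
        points.sort(Comparator.comparingDouble((double[] p) -> distance(p, vantage)));
        return points.size() / 2;
    }
}
```

      Sorting is the simplest way to find the median distance; a selection algorithm would do the same job without ordering the whole list.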

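      MVPtree inherits VPtree's distance-based partitioning, except that each node splits its points into 4 subsets, and a path array of precomputed distances limits how often the distance function runs. Splitting into 4 subsets rather than 2 gives a finer partition and cuts down on irrelevant points. Filtering against a fixed number of vantage points reduces distance computations when queries are frequent, and because these vantage points usually end up far apart, large irrelevant regions are ruled out, which works very well. This structure suits distance functions that are expensive to evaluate.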
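      To make the saving concrete, here is a minimal sketch of filtering with one precomputed distance per point. The class name, the 2-D points, and the call counter are assumptions for illustration only. The filter relies on the triangle inequality, so it is valid only when the distance function is a true metric.

```java
// Hypothetical illustration of filtering with precomputed distances.
class PathFilterSketch {

    /** Counts how often the (supposedly expensive) distance function runs. */
    static int distanceCalls = 0;

    static double distance(double[] a, double[] b) {
        distanceCalls++;
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /**
     * By the triangle inequality d(q, p) >= |d(q, v) - d(p, v)|, so a point
     * whose stored distance differs from the query's by more than the radius
     * cannot lie within the radius of the query.
     */
    static boolean canSkip(double queryToVantage, double pointToVantage, double radius) {
        return Math.abs(queryToVantage - pointToVantage) > radius;
    }

    /** Counts points within radius of the query; storedDistances[i] = d(points[i], vantage). */
    static int countWithinRadius(double[][] points, double[] storedDistances,
            double[] vantage, double[] query, double radius) {
        double queryToVantage = distance(query, vantage); // one call per query
        int hits = 0;
        for (int i = 0; i < points.length; i++) {
            if (canSkip(queryToVantage, storedDistances[i], radius)) {
                continue; // ruled out without calling distance()
            }
            if (distance(points[i], query) <= radius) {
                hits++;
            }
        }
        return hits;
    }
}
```

      With four points on a line and a small radius, only the points whose stored distance is close to the query's distance are passed to `distance()`; the rest are skipped by a subtraction and a comparison.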

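      Now for the implementation.

      MVPtree points differ from VPtree points in one respect: each MVPtree point also carries an array of its distances to the vantage points. This calls for a dedicated point data structure, the MVPtree point.

      The core code is as follows: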

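    import java.util.ArrayList;

    public class MVPTreePoint<P> {
        
        // Distances from this point to the vantage points on its path from the root
        private final ArrayList<Double> path;
        
        private final P point;
        
        // Maximum number of distances kept in path
        private final int maxLevel;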
        
        public MVPTreePoint(final P point, final int maxLevel) {
            this.point = point;
            this.maxLevel = maxLevel;
            this.path = new ArrayList<>();
        }
        
        public void addDistanceToSelf(final MVPTreePoint<P> vantagePointElement, final DistanceFunction<P> distanceFunction) {
            if(this.path.size() < this.maxLevel)
                this.path.add(distanceFunction.getDistance(this.point, vantagePointElement.point));
        }
        
        public void addDistanceToSelf(final P vantagePoint, final DistanceFunction<P> distanceFunction) {
            if(this.path.size() < this.maxLevel)
                this.path.add(distanceFunction.getDistance(this.point, vantagePoint));
        }
        
        public void addDistanceToSelf(final double distance) {
            if(this.path.size() < this.maxLevel) {
                this.path.add(distance);
            }
        }
        
        public void removeDistanceToSelf(final int position) {
            if(position < this.path.size()) {
                this.path.remove(position);
            }
        }
    
        public double getDistanceToSelf(int i) {
            return this.path.get(i);
        }
        
        public int size() {
            return this.path.size();
        }
        
        public void clearPath() {
            this.path.clear();
        }
        
        public P getPoint() {
            return this.point;
        }
        
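        @Override
        @SuppressWarnings("unchecked")
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            // Guard against null and foreign types before casting
            if (!(o instanceof MVPTreePoint)) {
                return false;
            }
            MVPTreePoint<P> t = (MVPTreePoint<P>) o;
            return this.point.equals(t.point);
        }

        // Keep hashCode consistent with equals
        @Override
        public int hashCode() {
            return this.point.hashCode();
        }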
    }

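      Putting the distance array on the point class rather than folding it into the tree node class keeps the structure clearer, and it also makes the distances easy to read back from a point.

      MVPtree differs from VPtree in many places, but most of the changes amount to renaming classes and replacing P and E with MVPTreePoint<P> and MVPTreePoint<E>. This article focuses on the core algorithms: initializing the tree and querying points.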

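      Initializing an MVPtree means not only selecting one more vantage point and partitioning the array two more times, but also storing the distance from each vantage point to every point.

      capacity is the capacity of a leaf node. Choose a moderate value, depending on the size of the data.

      The original paper takes the vantage points out of the point set and stores them separately. In the actual program, however, a vantage point serves only as a reference and is still initialized as part of the point set. This way the data structure holds only about quantityOfPoint/capacity extra points, and the program becomes much easier to write.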
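      Before the real constructor, a toy version of the two-level split may help. This is a sketch under simplifying assumptions: 1-D points, absolute difference as the distance, median thresholds, and vantage points passed in by the caller. The names are hypothetical and are not the MVPTreeNode code that follows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy example on 1-D points; not the article's MVPTreeNode.
class MvpSplitSketch {

    /** Returns a copy of the points sorted by distance to the vantage point. */
    static List<Double> sortByDistance(List<Double> points, double vantage) {
        List<Double> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble((Double p) -> Math.abs(p - vantage)));
        return sorted;
    }

    /**
     * Halves the points by distance to v1, then halves each half by distance
     * to v2. Group order: [near v1, near v2], [near v1, far v2],
     * [far v1, near v2], [far v1, far v2].
     */
    static List<List<Double>> splitIntoFour(List<Double> points, double v1, double v2) {
        List<Double> byFirst = sortByDistance(points, v1);
        int mid = byFirst.size() / 2;
        int[] bounds = {0, mid, byFirst.size()};
        List<List<Double>> groups = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            List<Double> half = sortByDistance(byFirst.subList(bounds[i], bounds[i + 1]), v2);
            int m = half.size() / 2;
            groups.add(new ArrayList<>(half.subList(0, m)));
            groups.add(new ArrayList<>(half.subList(m, half.size())));
        }
        return groups;
    }
}
```

      For the points 1 to 8 with v1 = 1 and v2 = 8, both vantage points stay inside the groups, which mirrors the decision above to keep vantage points in the point set.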

    public MVPTreeNode(
                final Collection<MVPTreePoint<E>> pointNodes,
                final DistanceFunction<P> distanceFunction,
                final MVPThresholdSelectionStrategy<P, E> thresholdSelectionStrategy,
                final int capacity, final int maxLevel) {
    
            if (capacity < 1) {
                throw new IllegalArgumentException("Capacity must be positive.");
            }
    
            if (pointNodes.isEmpty()) {
                throw new IllegalArgumentException(
                        "Cannot create a MVPTreeNode with an empty list of points.");
            }
    
            this.capacity = capacity;
            this.maxLevel = maxLevel;
            this.distanceFunction = distanceFunction;
            this.thresholdSelectionStrategy = thresholdSelectionStrategy;
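            this.pointNodes = new ArrayList<>(pointNodes);
            // Four children, indexed by [side of first threshold][side of second threshold]
            this.children = new MVPTreeNode[2][2];
            // Two vantage points per node; a generic array needs an unchecked cast
            this.vantagePoint = (E[]) new Object[2];
            // One second-level threshold for each half of the first split
            this.secondThreshold = new double[2];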
    
            this.anneal();
        }
    
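        // Rebuilds this node. An internal node (pointNodes == null) either collapses
        // when a child is empty or recurses into its children; a leaf picks vantage
        // points and splits into four children once it exceeds capacity.
        protected void anneal() {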
            if (this.pointNodes == null) {
                int childrenSize[][] = new int[2][2];
                for (int i = 0; i < 2; i++) {
                    for (int j = 0; j < 2; j++) {
                        childrenSize[i][j] = this.children[i][j].size();
                    }
                }
    
                if (childrenSize[0][0] == 0 || childrenSize[0][1] == 0
                        || childrenSize[1][0] == 0 || childrenSize[1][1] == 0) {
                    // One of the child nodes has become empty, and needs to be
                    // pruned.
                    this.pointNodes = new ArrayList<>(childrenSize[0][0]
                            + childrenSize[0][1] + childrenSize[1][0]
                            + childrenSize[1][1]);
                    this.addAllPointsToCollection(this.pointNodes);
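                    // Caution: clearPath() also discards the distances recorded by
                    // ancestor nodes, so for a non-root node the rebuilt path may no
                    // longer line up index by index with the query point's path.
                    for (MVPTreePoint<E> pointNode : this.pointNodes) {
                        pointNode.clearPath();
                    }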
                    for (int i = 0; i < 2; i++) {
                        for (int j = 0; j < 2; j++) {
                            this.children[i][j] = null;
                        }
                    }
                    this.anneal();
                } else {
                    for (int i = 0; i < 2; i++) {
                        for (int j = 0; j < 2; j++) {
                            this.children[i][j].anneal();
                        }
                    }
                }
            } else {
                int firstVantagePointIndex = new Random().nextInt(this.pointNodes
                        .size());
                this.vantagePoint[0] = this.pointNodes.get(firstVantagePointIndex)
                        .getPoint();
                this.firstThreshold = this.thresholdSelectionStrategy
                        .selectThreshold(this.pointNodes, this.vantagePoint[0],
                                this.distanceFunction);
                int firstIndexPastThreshold;
                try {
                    firstIndexPastThreshold = MVPTreeNode.partitionPoints(
                            this.pointNodes, this.vantagePoint[0],
                            this.firstThreshold, this.distanceFunction);
    
                } catch (final PartitionException e) {
                    this.storeInOneNode();
                    return;
                }
    
                if (this.pointNodes.size() > this.capacity) {
                    List<MVPTreePoint<E>> subTreeList[] = new List[2];
    
                    subTreeList[0] = this.pointNodes.subList(0,
                            firstIndexPastThreshold);
                    subTreeList[1] = this.pointNodes.subList(
                            firstIndexPastThreshold, this.pointNodes.size());
    
                    // if points can be divided into 2 parts, find second vantage
                    // point and try to split point array
                    int secondVantagePointIndex = new Random()
                            .nextInt(subTreeList[1].size());
                    this.vantagePoint[1] = subTreeList[1].get(
                            secondVantagePointIndex).getPoint();
                    int splitPosition[] = new int[2];
                    for (int i = 0; i < 2; i++) {
                        this.secondThreshold[i] = this.thresholdSelectionStrategy
                                .selectThreshold(subTreeList[i],
                                        this.vantagePoint[1], this.distanceFunction);
                        try {
                            splitPosition[i] = MVPTreeNode.partitionPoints(
                                    subTreeList[i], this.vantagePoint[1],
                                    this.secondThreshold[i], this.distanceFunction);
                        } catch (final PartitionException e) {
                            this.storeInOneNode();
                            return;
                        }
                    }
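                    // Record each point's distance to both vantage points in its path
                    // (ignored once the path already holds maxLevel entries)
                    for (MVPTreePoint<E> pointNode : this.pointNodes) {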
                        pointNode.addDistanceToSelf(this.distanceFunction
                                .getDistance(pointNode.getPoint(),
                                        this.vantagePoint[0]));
                        pointNode.addDistanceToSelf(this.distanceFunction
                                .getDistance(pointNode.getPoint(),
                                        this.vantagePoint[1]));
                    }
                    for (int i = 0; i < 2; i++) {
                        this.children[i][0] = new MVPTreeNode<>(
                                subTreeList[i].subList(0, splitPosition[i]),
                                this.distanceFunction,
                                this.thresholdSelectionStrategy, this.capacity,
                                this.maxLevel);
                        this.children[i][1] = new MVPTreeNode<>(
                                subTreeList[i].subList(splitPosition[i],
                                        subTreeList[i].size()),
                                this.distanceFunction,
                                this.thresholdSelectionStrategy, this.capacity,
                                this.maxLevel);
                    }
                    this.pointNodes = null;
                } else {
                    this.storeInOneNode();
                }
            }
        }
    
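        // Keeps all points in this node as a leaf. The point farthest from the
        // first vantage point becomes the second vantage point.
        private void storeInOneNode() {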
            int maxIndex = 0;
            double maxDistance = this.distanceFunction.getDistance(this.pointNodes
                    .get(0).getPoint(), this.vantagePoint[0]);
            for (int i = 1; i < this.pointNodes.size(); i++) {
                double curDistance = this.distanceFunction.getDistance(
                        this.pointNodes.get(i).getPoint(), this.vantagePoint[0]);
                if (maxDistance < curDistance) {
                    maxDistance = curDistance;
                    maxIndex = i;
                }
            }
            this.vantagePoint[1] = this.pointNodes.get(maxIndex).getPoint();
    
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) {
                    this.children[i][j] = null;
                }
            }
        }

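      The original author provides two kinds of query: finding the k points nearest to the query point, and finding all points no farther than u from the query point.

      The k-nearest search can reuse the approach from VPtree: first search the child node that contains the query point, then search the other child nodes. Two checks apply, in this order. First, check whether the collector is full (if it is not, any point goes straight in). Then check whether the distance from the query point to the farthest point in the collector (a fixed distance in the second kind of query) is smaller than the smallest possible distance from the query point to the candidate point or point set. The second check is useful at both internal nodes and leaf nodes.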
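      The collector's role in those two checks can be sketched as a bounded max-heap. This is a hypothetical stand-in written for this explanation, working on 1-D points for brevity; it only mirrors the method names `isFull`, `getRadius`, and `offerPoint` used in the code below and is not the library's NearestNeighborCollector.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for a k-nearest collector on 1-D points.
class KnnCollectorSketch {
    private final double query;
    private final int k;
    // Max-heap on distance to the query: the head is the farthest kept point.
    private final PriorityQueue<Double> kept;

    KnnCollectorSketch(double query, int k) {
        this.query = query;
        this.k = k;
        this.kept = new PriorityQueue<>(
                Comparator.comparingDouble((Double p) -> Math.abs(p - query)).reversed());
    }

    boolean isFull() {
        return kept.size() >= k;
    }

    /** Distance from the query to the farthest kept point; infinite while empty. */
    double getRadius() {
        return kept.isEmpty() ? Double.POSITIVE_INFINITY : Math.abs(kept.peek() - query);
    }

    /** Adds the point if there is room, or if it beats the current farthest point. */
    void offerPoint(double p) {
        if (!isFull()) {
            kept.add(p);
        } else if (Math.abs(p - query) < getRadius()) {
            kept.poll();
            kept.add(p);
        }
    }

    int size() {
        return kept.size();
    }
}
```

      The radius shrinks as better points arrive, so subtrees visited later are pruned more aggressively. That is why the child containing the query point is searched first.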

    public void collectNearestNeighbors(
                final NearestNeighborCollector<P, E> collector, int depth) {
            if (this.pointNodes == null) {
                // O1-Q: distance from the first vantage point O1 to the query point Q
                final double distanceFromFirstVantagePointToQueryPoint = this.distanceFunction
                    .getDistance(this.vantagePoint[0],
                        collector.getQueryPoint().getPoint());
    
                // O2-Q: distance from the second vantage point O2 to the query point Q
                final double distanceFromSecondVantagePointToQueryPoint = this.distanceFunction
                    .getDistance(this.vantagePoint[1],
                        collector.getQueryPoint().getPoint());
    
                collector.getQueryPoint().addDistanceToSelf(
                        distanceFromFirstVantagePointToQueryPoint);
                collector.getQueryPoint().addDistanceToSelf(
                        distanceFromSecondVantagePointToQueryPoint);
                
                final MVPTreeNode<P, E> index = this
                        .getChildNodeForPoint(collector.getQueryPoint().getPoint());
                index.collectNearestNeighbors(collector, depth + 1);
                
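                // O1-Q - O1-S1: signed gap between Q and the first threshold
                // (negative when Q lies inside it); the sign is flipped for the outer half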
                double basicDistance = distanceFromFirstVantagePointToQueryPoint
                        - this.firstThreshold;
                
                for(int i = 0;i < 2;i ++){
                    if (!collector.isFull() || basicDistance <= collector.getRadius()) {
                        // O2-Q - O2-S2: the same signed gap for the second threshold
                        double touchDistance = distanceFromSecondVantagePointToQueryPoint
                                - this.secondThreshold[i];
    
                        for(int j = 0;j < 2;j ++){
                            if (index != this.children[i][j]
                                    && (!collector.isFull() || touchDistance <= collector.getRadius())) {
                                this.children[i][j].collectNearestNeighbors(collector, depth + 1);
                            }
                            touchDistance *= -1;
                        }
                    }
                    basicDistance *= -1;
                }
                // Remove the two distances added for this node, higher index first
                collector.getQueryPoint().removeDistanceToSelf(depth + depth + 1);
                collector.getQueryPoint().removeDistanceToSelf(depth + depth);
            } else {
                for (final MVPTreePoint<E> pointNode : this.pointNodes) {
                    if(!collector.isFull() || this.isAbleToInsert(collector.getRadius(), 
                                    collector.getQueryPoint(), pointNode)) {
                        collector.offerPoint(pointNode.getPoint());
                    }
                }
            }
        }

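      The search for all points no farther than u from the query point is the algorithm described in the paper. Its steps resemble the k-nearest search. The differences are that the distance limit is a fixed value, that the check must be made every time, and that the result set has no size limit.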
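      The pruning rule for the four children can be isolated into a small function. The sketch below is an assumption-laden illustration: the class, method, and parameter names are invented here, and it assumes that child [i][0] holds the points at or below the threshold and child [i][1] the points at or above it.

```java
// Hypothetical helper showing which of the four children a range query must visit.
class RangePruneSketch {

    /**
     * dq1, dq2: distances from the query to the first and second vantage point.
     * t1: first threshold; t2[i]: second threshold inside half i; r: query radius.
     * Returns visit[i][j] == true when child [i][j] may contain a match.
     */
    static boolean[][] childrenToVisit(double dq1, double dq2, double t1, double[] t2, double r) {
        boolean[][] visit = new boolean[2][2];
        // The query ball reaches the inner half if dq1 - r <= t1,
        // and the outer half if dq1 + r >= t1.
        boolean[] reachesHalf = {dq1 - r <= t1, dq1 + r >= t1};
        for (int i = 0; i < 2; i++) {
            visit[i][0] = reachesHalf[i] && dq2 - r <= t2[i];
            visit[i][1] = reachesHalf[i] && dq2 + r >= t2[i];
        }
        return visit;
    }
}
```

      A query that sits near a threshold can still reach both sides of it, so up to four children may be visited; a query far from both thresholds visits exactly one.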

    public void collectAllWithinDistance(final MVPTreePoint<P> queryPoint,
                final double maxDistance, final Collection<E> collection, int depth) {
            if (this.pointNodes == null) {
                final double distanceFromFirstVantagePointToQueryPoint = this.distanceFunction
                        .getDistance(this.vantagePoint[0], queryPoint.getPoint());
                final double distanceFromSecondVantagePointToQueryPoint = this.distanceFunction
                        .getDistance(this.vantagePoint[1], queryPoint.getPoint());
    
                queryPoint
                        .addDistanceToSelf(distanceFromFirstVantagePointToQueryPoint);
                queryPoint
                        .addDistanceToSelf(distanceFromSecondVantagePointToQueryPoint);
    
                // We want to search any of this node's children that intersect with
                // the query region
                if (distanceFromFirstVantagePointToQueryPoint <= this.firstThreshold
                        + maxDistance) {
                    if (distanceFromSecondVantagePointToQueryPoint <= this.secondThreshold[0]
                            + maxDistance) {
                        this.children[0][0].collectAllWithinDistance(queryPoint,
                                maxDistance, collection, depth + 1);
                    }
    
                    if (distanceFromSecondVantagePointToQueryPoint + maxDistance >= this.secondThreshold[0]) {
                        this.children[0][1].collectAllWithinDistance(queryPoint,
                                maxDistance, collection, depth + 1);
                    }
                }
    
                if (distanceFromFirstVantagePointToQueryPoint + maxDistance >= this.firstThreshold) {
                    if (distanceFromSecondVantagePointToQueryPoint <= this.secondThreshold[1]
                            + maxDistance) {
                        this.children[1][0].collectAllWithinDistance(queryPoint,
                                maxDistance, collection, depth + 1);
                    }
    
                    if (distanceFromSecondVantagePointToQueryPoint + maxDistance >= this.secondThreshold[1]) {
                        this.children[1][1].collectAllWithinDistance(queryPoint,
                                maxDistance, collection, depth + 1);
                    }
                }
                // Remove the two distances added for this node, higher index first
                queryPoint.removeDistanceToSelf(depth + depth + 1);
                queryPoint.removeDistanceToSelf(depth + depth);
            } else {
                for (MVPTreePoint<E> pointNode : pointNodes) {
                    if (this.isAbleToInsert(maxDistance, queryPoint, pointNode))
                        collection.add(pointNode.getPoint());
                }
            }
        }

      Both kinds of query need to compare the precomputed distances, so that computation is factored into a single function:

    public boolean isAbleToInsert(double limitDistance,
                MVPTreePoint<P> queryPoint, MVPTreePoint<E> pointNode) {
    
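            // Triangle inequality: |d(q, v) - d(p, v)| <= d(q, p) for every vantage
            // point v, so one stored difference above the limit rules the point out.
            // Compare only the levels present in both paths to stay within bounds.
            final int levels = Math.min(queryPoint.size(), pointNode.size());
            for (int i = 0; i < levels; i++) {
                double disOffset = queryPoint.getDistanceToSelf(i)
                        - pointNode.getDistanceToSelf(i);
    
                if (Math.abs(disOffset) > limitDistance) {
                    return false;
                }
            }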
    
            return this.distanceFunction.getDistance(pointNode.getPoint(),
                    queryPoint.getPoint()) <= limitDistance;
        }

      Other functions need changes too, but none of them is restructured as heavily as these 3.

    --------------------------------------------------------------------------

    Code: https://coding.net/u/funcfans/p/MVPtree-for-Java/git

  • Original article: https://www.cnblogs.com/dgutfly/p/6880320.html