• A Simple Analysis of Huffman Coding


    1. What problem do we face?

    /**
    *    Q: What problem do we face?
    *
    *    A: Data compression is still a problem, even now. We want to reduce
    *        the space that data occupies, and this desire grows stronger whenever
    *        we have to transmit data. Before reading on, it may be helpful to try
    *        to come up with a valid way to compress data yourself. I tried, and
    *        obviously failed. That is why I wrote this article. ^_^
    */

    2. How can I solve it?

    /**
    *    Q: How can I solve it?
    *
    *    A: Where there is a problem, there is an answer, although it is not always
    *        the best one. In 1951, David A. Huffman devised an algorithm (it was published in 1952).
    *        Unlike an ordinary fixed-length code, it is a variable-length code: different
    *        symbols get codes of different lengths. This raises two
    *        problems:
    *
    *        No.1: Is a variable-length code possible? How can we know where the code
    *                of the current symbol ends?
    *
    *                The answer is a prefix code, in which no symbol's code is the beginning
    *                of another symbol's code. Consider a tree like the following:
    *
    *                                        
    *                                        O
    *                                    1 /   \ 0
    *                                    O       O
    *                                1 /   \ 0   c
    *                                O       O
    *                                a       b
    *
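    *                This is a simple binary tree with three leaf nodes: a, b, and c. We
    *                label every left branch 1 and every right branch 0. To reach
    *                leaf node a, the path is 11. In the same way, we can get
    *                the codes of all the leaves:
    *                        a : 11
    *                        b : 10
    *                        c : 0
    *
    *                Almost by accident, we have obtained a variable-length code. Because symbols
    *                sit only at the leaves, no code is the beginning of another code, so a reader
    *                of the bit stream always knows where the current code ends.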
    *
    *
    *        No.2: How can we use a variable-length code to compress a series of symbols?
    *
    *                Now that we have a variable-length code, something interesting
    *                happens. Imagine data that consists of a series of symbols,
    *                some of which occur frequently and some of which occur
    *                rarely. If we give the shorter codes to the symbols
    *                that occur frequently, the data takes less space than before.
    *                That is what we want.
    *
    *        We now know that we can compress data by using a variable-length
    *        code. The next problem is which variable-length code we
    *        want: what kind of code is optimal?
    */

    3. What is Huffman coding?

    /**
    *    Q: What is Huffman coding?
    *
    *    A: Now the problem is how to build an optimal tree. Do you have any idea?
    *        Huffman introduced an algorithm for this. It looks like a greedy algorithm. It may
    *        be simple, but the result is valid (this will be demonstrated below). The simplest
    *        construction algorithm uses a priority queue in which the node with the lowest probability
    *        is given the highest priority. The steps are as follows:
    *
    *        1. Create a leaf node for each symbol and add it to the priority queue.
    *        2. While there is more than one node in the queue:
    *            1. Remove the two nodes that have the highest priority (the lowest probability).
    *            2. Create a new node as the parent of these two nodes. Its
    *                probability is equal to the sum of the two nodes' probabilities.
    *            3. Add the new node to the queue.
    *        3. The remaining node is the root of the tree. Read its codes as we did above.
    *
    */

    4. Is it optimal?

    /**
    *    Q: Is it optimal?
    *
    *    A: Hard to say. I don't have a valid method to measure this, so on this issue I need to hear
    *        other people's advice. I believe there must be some exciting ideas. By the way, this article
    *        only talks about compressing independent symbols. Another important issue is symbols that
    *        are correlated with each other. That may be a serious problem.
    *
    */

    5. Source code

    /**
    *    Here is a simple example.
    */
    
    #include <stdio.h>
    #include <iostream>
    
    
    /**
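    *    In a Huffman tree, some nodes hold a valid symbol and the others are combined nodes that
    *    do not. In this listing, though, ELEM_TYPE is used by the container below to mark
    *    whether each slot holds an element (ET_VALID) or is free (ET_INVALID).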
    */
    enum ELEM_TYPE {
            ET_VALID,
            ET_INVALID,
            ET_MAX,
    };
    
    typedef int    INDEX;
    
    /**
    *    This is a container. We push all elements into it and pop them by priority. It is
    *    a class template since we don't know the type of the elements in advance.
    */
    template <class ELEM>
    class Container {
            public:
                    Container( int capacity);
                    ~Container( );
                    /*
                     *    Push an element into this container.
                     */
                    bool push( ELEM item);
                    /*
                     *    Pop an element from this container. The smallest one has the highest priority.
                     *    Of course, the element type must provide an overload of operator '<'.
                     */
                    bool pop( ELEM &item );
    		
            private:
                    bool _find_idle( INDEX &num);
                    bool _set_elem( INDEX num, ELEM &elem);
                    bool _get_elem( INDEX num, ELEM &elem);
    		
                    ELEM                *ele;
                    ELEM_TYPE    *stat;
                    int                        cap;
    };
    
    template <class ELEM>
    Container<ELEM>::Container(  int capacity)
    {
            this->ele = new ELEM[capacity] ;
            this->stat = new ELEM_TYPE[capacity];
    
            int        i;
            for( i=0; i<capacity; i++)
                    this->stat[i] = ET_INVALID;
    
            this->cap = capacity ;
    }
    
    template <class ELEM>
    Container<ELEM>::~Container(  )
    {
            if( this->ele!=NULL )
                    delete []this->ele;
    
            if( this->stat!=NULL )
                    delete []this->stat;
    
            this->cap = 0;
    }
    
    template <class ELEM>
    bool Container<ELEM>::push( ELEM item)
    {
            INDEX        num = -1;
    
            if( (!this->_find_idle( num))
                    ||(!this->_set_elem( num, item)))
                    return false;
    
            return true;
    }
    
    template <class ELEM>
    bool Container<ELEM>::pop( ELEM &item )
    {
            INDEX    i = 0;
            INDEX    Min;
    
            /*
             *    Find the first valid element. The bound is checked first so that
             *    stat[i] is never read past the end of the array.
             */
            while( ( i<this->cap)
                            &&(this->stat[i]!=ET_VALID))
                                i++;
    
            for( Min = i ; i<this->cap; i++)
            {
                    if(  ( this->stat[i]==ET_VALID)
                          &&( this->ele[i]<this->ele[Min]))
                        {
                                Min = i;
                        }
            }
    
            return this->_get_elem( Min, item);
    }
    
    
    template <class ELEM>
    bool Container<ELEM>::_find_idle( INDEX &num)
    {
            INDEX        i;
            for( i=0; i<this->cap; i++)
            {
                    if( this->stat[i]==ET_INVALID )
                    {
                            num = i;
                            return true;
                    }
            }
    
            return false;
    }
    
    template <class ELEM>
    bool Container<ELEM>::_set_elem( INDEX num, ELEM &elem)
    {
            if( (num>=this->cap)
                    ||(num<0) )
                        return false;
    
            this->stat[num] = ET_VALID;
            this->ele[num] = elem;
    
            return true;
    }
    
    template <class ELEM>
    bool Container<ELEM>::_get_elem( INDEX num, ELEM &elem)
    {
            if( (num<0)
                    ||(num>=this->cap))
                        return false;
    
            this->stat[num] = ET_INVALID;
            elem =  this->ele[num];
    
            return true;
    }
    
    /**
    *    Define a symbol type. It records all the information about a symbol.
    */
    typedef char SYMINDEX;
    typedef int SYMFRE;
    
    class Symbol {
            public:
                    /*
                     *    In the Huffman tree, we need the sum of two child symbols.
                     *    For convenience, we overload operator '+'.
                     */
                    Symbol operator + ( Symbol &s);
                    SYMINDEX        sym;
                    SYMFRE            freq;
    };
    
    Symbol Symbol::operator +( Symbol &s)
    {
            Symbol        ret;
            ret.sym = '\0';
            ret.freq = this->freq + s.freq;
            return ret;
    }
    
    /**
    *    Define a binary tree node. It is used to build the Huffman tree.
    */
    class HTreeNode {
            public:
                    /*
                     *    The container needs to compare two nodes, so this node must
                     *    provide an overload of operator '<'.
                     */
                    bool operator< ( HTreeNode &n);
    
                    HTreeNode        *lchild;
                    HTreeNode        *rchild;
                    Symbol                sym;
    };
    
    bool HTreeNode::operator < ( HTreeNode &n)
    {
            return this->sym.freq < n.sym.freq;
    }

    /**
    *    This is the core structure. It builds a Huffman code from the input symbols.
    */
    class HuffmanCoding {
            public:
                    HuffmanCoding( );
                    ~HuffmanCoding( );
                    bool Set( Symbol s[], int num);
                    bool Work( void);

            private:
                    /*
                     *    Create a Huffman tree.
                     */
                    bool CreateTree( Symbol s[], int num);
                    bool DestroyTree( );
                    /*
                     *    Read the Huffman codes from a Huffman tree.
                     */
                    bool ReadCoding( );
                    bool TravelTree( HTreeNode *parent, char *buf, INDEX cur);

                    Symbol          *sym;
                    int             sym_num;
                    HTreeNode       *root;
    };

    HuffmanCoding::HuffmanCoding( )
    {
            this->sym = NULL;
            this->sym_num = 0;
            this->root = NULL;
    }

    HuffmanCoding::~HuffmanCoding( )
    {
            if( this->sym!=NULL)
                    delete []this->sym;

            this->sym_num = 0;
            this->DestroyTree( );
    }

    /**
    *    Receive data from outside. Strictly speaking, this function is not necessary, but
    *    it keeps the algorithm itself more concise.
    */
    bool HuffmanCoding::Set( Symbol s[], int num)
    {
            this->DestroyTree( );

            /*
             *    Release the symbols of an earlier call so that they do not leak.
             */
            delete []this->sym;
            this->sym = NULL;
            this->sym_num = 0;

            if( (s==NULL) ||(num<=0) )
                    return false;

            this->sym = new Symbol[num];
            for( int i=0; i<num; i++)
                    this->sym[i] = s[i];

            this->sym_num = num;
            return true;
    }

    /**
    *    The core function. Here we create a Huffman tree, then read it.
    */
    bool HuffmanCoding::Work( void)
    {
            //Create a Huffman tree
            if( !this->CreateTree( this->sym, this->sym_num))
                    return false;

            //read Huffman coding
            if( !this->ReadCoding( ))
                    return false;

            return true;
    }

    bool HuffmanCoding::CreateTree( Symbol s[], int num)
    {
            /*
             *    Without symbols there is nothing to build. Also drop any tree
             *    left over from an earlier call.
             */
            if( (s==NULL) ||(num<=0) )
                    return false;
            this->DestroyTree( );

            /*
             *    Create a priority tank. It always pops the element with the highest priority.
             */
            Container<HTreeNode>    tank(num);

            for( int i=0; i<num; i++)
            {
                    HTreeNode node;
                    node.lchild = NULL;
                    node.rchild = NULL;
                    node.sym = s[i];
                    tank.push( node);
            }

            /*
             *    Always pop two nodes. If the second pop fails, only one node remained, and it
             *    is the root node of this Huffman tree.
             */
            HTreeNode node1;
            HTreeNode node2;
            while( tank.pop( node1) && tank.pop( node2) )
            {
                    HTreeNode parent;
                    parent.lchild = new HTreeNode;
                    parent.rchild = new HTreeNode;
                    *parent.lchild = node1;
                    *parent.rchild = node2;
                    parent.sym = node1.sym + node2.sym;
                    /*
                     *    Push the new node to the tank.
                     */
                    tank.push( parent);
            }

            this->root = new HTreeNode(node1);
            return true;
    }

    /*
     *    Free a node and everything below it.
     */
    static void FreeSubtree( HTreeNode *node)
    {
            if( node==NULL)
                    return;

            FreeSubtree( node->lchild);
            FreeSubtree( node->rchild);
            delete node;
    }

    bool HuffmanCoding::DestroyTree( )
    {
            FreeSubtree( this->root);
            this->root = NULL;
            return true;
    }

    bool HuffmanCoding::ReadCoding( )
    {
            if( this->root==NULL)
                    return false;

            char    *code;
            code = new char[this->sym_num + 1];
            /*
             *    Travel the Huffman tree and print the codes of all valid symbols.
             */
            this->TravelTree( this->root, code, 0);
            delete []code;
            return true;
    }

    #define LCHAR    '1'
    #define RCHAR    '0'

    bool HuffmanCoding::TravelTree( HTreeNode *parent, char *buf, INDEX cur)
    {
            buf[cur] = '\0';
            if( (parent->lchild==NULL)
                    &&(parent->rchild==NULL) )
            {//end node
                    printf("[ %c] : %s\n", parent->sym.sym, buf);
            }

            if( parent->lchild!=NULL )
            {
                    buf[cur] = LCHAR;
                    this->TravelTree( parent->lchild, buf, cur + 1);
            }

            if( parent->rchild!=NULL )
            {
                    buf[cur] = RCHAR;
                    this->TravelTree( parent->rchild, buf, cur + 1);
            }

            return true;
    }

    static Symbol sArr[ ] = {
            { '0', 0},
            { '1', 1},
            { '2', 2},
            { '3', 3},
            { '4', 4},
            { '5', 5},
            { '6', 6},
            { '7', 7},
            { '8', 8},
            { '9', 9},
    };

    int main()
    {
            HuffmanCoding hcoding;
            hcoding.Set( sArr, 10);
            hcoding.Work( );

            return 0;
    }


     

  • Original article: https://www.cnblogs.com/yangykaifa/p/7140457.html