LZW Compression: Principles and a Java Implementation

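LZW is a compression algorithm created by Lempel, Ziv, and Welch and named after them. It relies on a string-table technique: every string that occurs for the first time is placed in a string table and represented by a number, and the compressed file stores only these numbers rather than the strings, which considerably improves the compression of image files. Remarkably, the string table can be built correctly during both compression and decompression, and it is discarded once compression or decompression finishes.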

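1. Basic principle
A string table is built first. Every string that occurs for the first time is placed in the table and represented by a number that depends on the string's position in the table, and this number is written to the compressed file. When the string occurs again, the number that represents it is written to the file in its place. The table is discarded when compression is complete. Take the string "print" as an example: if it is represented by 266 during compression, "print" is stored in the string table and every later occurrence is written as 266. When the decoder meets the number 266, it looks up the string "print" under 266 in the table. During decompression the string table can be regenerated from the compressed data.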
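
The idea can be sketched in a few lines of Java. This is a minimal illustration rather than the GIF variant: it assumes the table starts with the 256 single-character strings (so new strings are numbered from 256) and that the input only contains characters below 256. The class name `LzwIdeaSketch` is made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the article's code.
final class LzwIdeaSketch {

    // Returns the list of table numbers that replace the input text.
    // Assumption: every character of the input is below 256.
    static List<Integer> compress(String text) {
        // Assumption: the table starts with the 256 single-character strings.
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            table.put(String.valueOf((char) i), i);
        }
        List<Integer> codes = new ArrayList<>();
        String current = "";
        for (char c : text.toCharArray()) {
            String candidate = current + c;
            if (table.containsKey(candidate)) {
                current = candidate;                 // seen before: keep extending
            } else {
                codes.add(table.get(current));       // emit the number of the known string
                table.put(candidate, table.size());  // first occurrence: give it the next number
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) {
            codes.add(table.get(current));           // flush the last pending string
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(compress("ABABABA")); // prints [65, 66, 256, 258]
    }
}
```

In this run "AB" receives the number 256 and "ABA" the number 258 when they first occur, and each later occurrence is written as that single number.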

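2. Implementation
A. Initializing the string table
Before image data is compressed, a string table is built to record every string that occurs for the first time. A string table consists of at least two arrays, called the current array and the prefix array. In a GIF file each basic string usually has a length of 2 (although the actual string it stands for may be hundreds or even thousands of characters long): a basic string consists of the current character and whatever precedes it (the prefix). The prefix array stores the leading part of the string and the current array stores its last character, both at the same position, so a single index identifies the basic string stored there. During compression this index is used in place of the basic string. A string table usually has 4096 entries (2 to the 12th power), which means it can hold at most 4096 basic strings. At initialization, the entries at the start of the table are filled in according to the number of colors in the image. The content of each element of the current array is normally its own index: the first element is 0, the second is 1, the 15th is 14, and so on up to the element whose index is the number of colors plus 1. With 256 colors, initialization runs up to the 258th element, whose value is 257. The number 256 is the clear code and 257 is the end-of-image code. The elements after that store each string that occurs for the first time in the file. The prefix array must be initialized as well. Its initial values are arbitrary, but usually all bits are set to 1, that is, the leading elements are initialized to 0xFF. The same number of elements is initialized as in the current array, and the elements after them receive the first-occurrence strings. Enlarging the string table can improve compression further, but slows down decoding.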
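
The initialization described above could look roughly like this. The class and field names (`StringTableSketch`, `prefix`, `current`, `nextFree`) are assumptions for illustration, and `int` arrays are used instead of bytes to keep the sketch simple.

```java
// Hypothetical illustration class; names are not from the article's code.
final class StringTableSketch {
    static final int TABLE_SIZE = 4096;          // 2^12 entries

    final int[] prefix = new int[TABLE_SIZE];    // leading part of each basic string
    final int[] current = new int[TABLE_SIZE];   // last character of each basic string
    final int clearCode;                         // equals the number of colors
    final int endCode;                           // number of colors + 1
    int nextFree;                                // index where the next new string goes

    StringTableSketch(int colorCount) {
        clearCode = colorCount;
        endCode = colorCount + 1;
        reset();
    }

    // (Re)initializes the entries 0 .. colorCount + 1.
    void reset() {
        for (int i = 0; i <= endCode; i++) {
            current[i] = i;      // each initial entry holds its own index
            prefix[i] = 0xFF;    // arbitrary marker with all bits set, as in the text
        }
        nextFree = endCode + 1;
    }

    public static void main(String[] args) {
        StringTableSketch t = new StringTableSketch(256);
        System.out.println(t.clearCode + " " + t.endCode + " " + t.nextFree); // prints 256 257 258
    }
}
```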

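B. Compression method
Four terms are needed to understand the compression method: the character stream, the code stream, the current code, and the current prefix. The character stream is the uncompressed image data from the source image; the code stream is the compressed image data written to the GIF file; the current code is the character just read from the character stream; the current prefix is what precedes the character just read.
When a GIF file is compressed, color values are processed in units of one byte regardless of the color depth of the image, and each byte represents one color. Using a whole byte for a 16-color, 4-color, or 2-color value wastes 4 or more bits (4 bits of a byte are enough for 16 colors), but LZW compression reclaims these idle bits. Compression starts by reading the first character of the character stream as the current prefix and the second character as the current code. The current prefix and the current code form the first basic string (if the current prefix is A and the current code is B, the string is AB). The string table is searched; at this point the string cannot be found, so it is written to the table: the current prefix goes into the prefix array and the current code into the current array. The current prefix is sent to the code stream and the current code becomes the new current prefix. The next character is then read as the current code, forming a new basic string (if the current code is C, the basic string is BC), and the table is searched again. If the string is found, the value in the current prefix is discarded and the string's position in the table (its index) becomes the current prefix; the next character is read as the current code to form a new basic string. If it is not found, it is added to the table and the prefix is emitted as before. This continues until the whole image has been compressed. It follows that the values in the prefix array are exactly the codes in the code stream: a code greater than the end code must represent a string, while a code smaller than the number of colors is the color itself.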
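
A minimal sketch of this loop, under a few assumptions: the table is searched linearly, the clear code and end code are reserved but never emitted, no new strings are added once the table is full, and the names (`GifLzwEncodeSketch`, `encode`) are hypothetical. The sample input is an arbitrary short sequence of color indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; names are not from the article's code.
final class GifLzwEncodeSketch {

    // Compresses color indices (each below colorCount) into a list of codes.
    static List<Integer> encode(int[] pixels, int colorCount) {
        int[] prefix = new int[4096];
        int[] current = new int[4096];
        int firstFree = colorCount + 2;   // colorCount = clear code, colorCount + 1 = end code
        int nextFree = firstFree;
        List<Integer> codes = new ArrayList<>();
        if (pixels.length == 0) {
            return codes;
        }
        int curPrefix = pixels[0];
        for (int k = 1; k < pixels.length; k++) {
            int curCode = pixels[k];
            int found = -1;
            // Look for the basic string (curPrefix, curCode) among the added entries.
            for (int i = firstFree; i < nextFree; i++) {
                if (prefix[i] == curPrefix && current[i] == curCode) {
                    found = i;
                    break;
                }
            }
            if (found >= 0) {
                curPrefix = found;            // known string: its index becomes the prefix
            } else {
                codes.add(curPrefix);         // emit the prefix
                if (nextFree < 4096) {        // record the new string (no clear-code handling here)
                    prefix[nextFree] = curPrefix;
                    current[nextFree] = curCode;
                    nextFree++;
                }
                curPrefix = curCode;
            }
        }
        codes.add(curPrefix);                 // flush the last pending prefix
        return codes;
    }

    public static void main(String[] args) {
        int[] pixels = {1, 2, 1, 1, 1, 1, 2, 3, 4, 1, 2, 3, 4, 5, 9};
        System.out.println(encode(pixels, 256));
        // prints [1, 2, 1, 260, 258, 3, 4, 262, 4, 5, 9]
    }
}
```

The linear search keeps the sketch close to the two-array description; a real encoder would more likely use a hash table for the lookup.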

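C. Clear code
In practice the string table often has to be initialized several times while one image is compressed, because the number of first-occurrence basic strings in an image frequently exceeds 4096. Whenever the number of strings in the table exceeds 4096 during compression, the current prefix and the current code are sent to the code stream, a clear code is added to the code stream, the string table is reinitialized, and compression continues as described above.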

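D. End code
When all compression is complete, an end-of-image code is written to the code stream. Its value is the number of colors plus 1; in a 256-color file the end code is 257.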

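E. Reclaiming unused bits
Apart from being stored in data packets, all codes in the code stream written to a GIF file are stored bit by bit, which saves storage space effectively. Consider a 4-bit (16-color) image: stored byte by byte, only 4 bits of each byte are used and the other 4 are wasted, whereas stored bit by bit each byte holds two color codes. In fact GIF files use a variable-width storage scheme. As the compression process shows, the values in the prefix array follow a pattern. In a 256-color GIF file, elements 258-511 hold values in the range 0-510, which fit exactly in 9 bits; elements 512-1023 hold values in the range 0-1022, which fit in 10 bits; elements 1024-2047 hold values in the range 0-2046, which fit in 11 bits; and elements 2048-4095 hold values in the range 0-4094, which fit in 12 bits. With variable-width storage the base width is the color depth of the image plus 1. The width grows as the number of codes increases, up to a maximum of 12 bits (at which point the string table holds exactly 2 to the 12th power, or 4096, strings). The basic method is: each time a code is added to the code stream, check whether the position (index) of its string in the table exceeds 2 to the power of the current width, and increase the width by 1 as soon as it does. In a 4-bit image, for example, the first codes are stored with 5 bits: the low 5 bits of the first byte hold the first code, its high 3 bits hold the low 3 bits of the second code, the low 2 bits of the second byte hold the high 2 bits of the second code, and so on. For an 8-bit (256-color) image the base width is 9, so every code spans at least two bytes.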
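
The bit layout in the 5-bit example can be reproduced with a small packing routine. This sketch assumes a fixed code width for the whole call and least-significant-bit-first order as described above; a variable-width writer would change `width` between codes. `BitPackSketch` and `pack` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; names are not from the article's code.
final class BitPackSketch {

    // Packs each code into `width` bits (at most 12), least significant bit first.
    static byte[] pack(int[] codes, int width) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int bitBuffer = 0;   // bits waiting to be written; the lowest bit is the oldest
        int bitCount = 0;    // number of valid bits in bitBuffer
        for (int code : codes) {
            bitBuffer |= code << bitCount;   // append the code above the pending bits
            bitCount += width;
            while (bitCount >= 8) {          // write out every complete byte
                out.write(bitBuffer & 0xFF);
                bitBuffer >>>= 8;
                bitCount -= 8;
            }
        }
        if (bitCount > 0) {
            out.write(bitBuffer & 0xFF);     // last partial byte; unused high bits are 0
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 21 = 10101 and 14 = 01110 in binary, stored as two 5-bit codes
        byte[] packed = pack(new int[] {21, 14}, 5);
        System.out.printf("%02X %02X%n", packed[0] & 0xFF, packed[1] & 0xFF); // prints D5 01
    }
}
```

The first byte 0xD5 is 110 10101 in binary: the low 5 bits are the first code and the high 3 bits are the low 3 bits of the second code, whose remaining 2 bits land in the second byte.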

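F. Compression example
The following is an encoding example for a 256-color GIF file. A careful look shows what makes this encoding method remarkable, and why the string table is no longer needed once compression is done and can be recreated from the code stream during decoding.
Character stream: 1,2,1,1,1,1,2,3,4,1,2,3,4,5,9,…
Current code: 2,1,1,1,1,2,3,4,1,2,3,4,5,9,…
Current prefix: 1,2,1,1,260,1,258,3,4,1,258,262,4,5,…
Current array: 2,1,1,1,3,4,1,4,5,9,…
Array index: 258,259,260,261,262,263,264,265,266,267,…
Code stream: 1,2,1,260,258,3,4,262,4,5,…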
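
To see how the table can be recreated from the code stream alone, here is a hedged decoding sketch. It assumes the stream contains no clear or end codes and starts with a plain color; `GifLzwDecodeSketch` and its methods are hypothetical names. Each code after the first adds one table entry: the previous code as prefix plus the first color of the string just decoded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; names are not from the article's code.
final class GifLzwDecodeSketch {

    // Follows the prefix chain of `code` and returns the colors it stands for.
    private static List<Integer> expand(int code, int[] prefix, int[] current, int firstFree) {
        List<Integer> colors = new ArrayList<>();
        while (code >= firstFree) {      // table entry: take its last color, then its prefix
            colors.add(current[code]);
            code = prefix[code];
        }
        colors.add(code);                // a plain color ends the chain
        Collections.reverse(colors);
        return colors;
    }

    // Rebuilds the color indices from a code stream without clear/end codes.
    static List<Integer> decode(int[] codes, int colorCount) {
        int[] prefix = new int[4096];
        int[] current = new int[4096];
        int firstFree = colorCount + 2;  // entries below this are colors, clear code, end code
        int nextFree = firstFree;
        List<Integer> pixels = new ArrayList<>();
        if (codes.length == 0) {
            return pixels;
        }
        int previous = codes[0];
        pixels.add(previous);
        for (int k = 1; k < codes.length; k++) {
            int code = codes[k];
            List<Integer> colors;
            if (code < nextFree) {
                colors = expand(code, prefix, current, firstFree);
            } else {
                // Code not in the table yet: it is the previous string plus its own first color.
                colors = expand(previous, prefix, current, firstFree);
                colors.add(colors.get(0));
            }
            pixels.addAll(colors);
            if (nextFree < 4096) {       // recreate the entry the encoder added at this point
                prefix[nextFree] = previous;
                current[nextFree] = colors.get(0);
                nextFree++;
            }
            previous = code;
        }
        return pixels;
    }

    public static void main(String[] args) {
        int[] codes = {1, 2, 1, 260, 258, 3, 4, 262, 4, 5};
        System.out.println(decode(codes, 256));
        // prints [1, 2, 1, 1, 1, 1, 2, 3, 4, 1, 2, 3, 4, 5]
    }
}
```

Code 260 arrives before the decoder has created entry 260, which is the case handled by the `else` branch.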

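GIF is an important graphics and image file format. Although its encoding rules are quite complex, its compression efficiency is very high, especially for some images with smooth transitions. Because the image information is preserved completely during compression, the format is widely used in today's electronic pictures and e-books.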

Appendix: a simulated Java implementation of the LZW algorithm

package com.anywhere;
import java.io.*;

public class lzwCode
{

Dictionary dic=new Dictionary();

int count1=0,count2=0;

BufferedInputStream in;

BufferedOutputStream out;
final short END=4095;

public static void main(String []args)
{
if ( args.length<=1 || args.length>4 )
{
System.out.println("-c sourceFile [targetFile] [-dic] create a compressed file");
System.out.println("-d sourceFile [targetFile] [-dic] decompress a file");
}
else if(! ( args[0].equals("-c") || args[0].equals("-d") ) )
{
System.out.println("-c sourceFile [targetFile] create a compressed file");
System.out.println("-d sourceFile [targetFile] decompress a file");
}
else if(args.length>=2)
{
lzwCode a=new lzwCode(args);
a.run(args);

}
return ;
}

public lzwCode(String []args)
{

try{
String f=new String();
in =new BufferedInputStream(
new FileInputStream(
new File(args[1])));
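if(args.length>=3 && !args[2].equals("-dic"))// a target file was given, with or without a trailing -dic
{
f=args[2];
}
else
{
int i=args[1].lastIndexOf('.');
String base=(i<0)?args[1]:args[1].substring(0,i);// source name without its extension, if it has one
f=base+((args[0].equals("-c"))?".lzw":".dlzw");
}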
out=new BufferedOutputStream(
new FileOutputStream(
new File(f)));

}//try
catch(FileNotFoundException e )
{
System.err.println(e);
return;
}

catch(IOException e )
{
System.err.println(e);
return;
}

}

public void run(String args[] )
{

if(args[0].equals(new String("-c")) )
{
code(in,out);
}
else
{
decode(in,out);
}
if(args[args.length-1].equals(new String("-dic") ))
System.out.println(dic.toString ());

}

public void code(BufferedInputStream in,BufferedOutputStream out)
{
System.out.println("coding... "+ "....... ");

//a: the byte just read from the input file, converted to a String below
//prefix: the longest string matched so far that is already in the dictionary
//indexbuf[]: two dictionary indexes that are packed together into
// 3 bytes of the code file
//str: prefix plus the current character of the input stream
byte a[]=new byte[1];

String prefix="",cur="";
byte i=0;
short indexbuf[]=new short[2];

String str=null;
try{
short m=0;
int b;// in.read() returns an int; casting to byte before the test would treat the byte 0xFF as end of file
while( (b=in.read()) != -1 )
{
a[0]=(byte)b;
cur=new String(a,java.nio.charset.StandardCharsets.ISO_8859_1);// one byte becomes exactly one char
count1++; // the number of bytes of input file
str=prefix;
str=str.concat(cur);
m=(short)dic.indexOf(str);

if( m!=-1)//the prefix is in the dictionary,
{
prefix=str;
}
else//
{

if(i==0)//the first indexbuf,store in codebuf[]
{
indexbuf[0]=(short)dic.indexOf(prefix);
i=1;
}
else// now have 2 index number,then ouput to the code file
{
indexbuf[1]=(short)dic.indexOf(prefix);
zipOutput(out,indexbuf);

count2+=3;//3 bytes stored to the code file
i=0;
}

dic.add(str);
prefix=cur;

}//else

}//while

if(prefix.length()>0)// the last matched string has not been written yet
{
indexbuf[i]=(short)dic.indexOf(prefix);
i++;
if(i==2)// two indexes are ready: write them as 3 bytes
{
zipOutput(out,indexbuf);
count2+=3;
i=0;
}
}
if(i==(byte)1)// an odd number of indexes: pad the pair with the special index END
{
indexbuf[1]=END;
zipOutput(out,indexbuf);
count2+=3;
}
in.close ();
out.close ();

System.out.println("zip rate:"+(float)count2*100/count1+"% ");
}catch(IOException e )
{
System.err.println(e);
return;
}
catch(OutDictionaryException e)
{
System.err.println(e);
return;
}

}
public void decode(BufferedInputStream in,BufferedOutputStream out)
{
System.out.println("decoding... "+"....... ");

short precode=0,curcode=0;
String prefix=null;
short i=0;
short bufcode[]=new short[2];//2 code read from the code file
boolean more=true;//false at the end of the file or after an input error

// DataOutputStream out2=new DataOutputStream(out);
try{

more=zipInput(in,bufcode);//first input 2 code
if(more)
{
curcode=bufcode[0];
// out2.writeChars(dic.getString(curcode));
stringOut(out,dic.getString(curcode) );

}
else
System.out.println("error in the beginning...");

while(more)
{
precode=curcode;

if(i==0)
{
curcode=bufcode[1];
i=1;
}
else
{
more=zipInput(in,bufcode);
if(!more)
break;// no further pair of codes: stop instead of reusing the old pair
curcode=bufcode[0];
i=0;
}

if(curcode==END)// END only pads an odd number of codes
break;

if(curcode<dic.length())// the code is already in the dictionary
{
stringOut(out,dic.getString(curcode) );
prefix=dic.getString(precode);

prefix+=(dic.getString(curcode)).substring(0,1);
dic.add(prefix);

}
else// not in the dictionary yet: it must be the previous string plus its own first character
{
prefix=dic.getString(precode);
prefix+=prefix.substring(0,1);
stringOut(out,prefix );
dic.add(prefix);

}//else
}//while

in.close ();
out.close ();

}catch( OutDictionaryException e )
{
System.err.println(e);
return;
}
catch(IOException e)
{
System.err.println(e);
return;
}

}

private void zipOutput(BufferedOutputStream out,short index[])
{
try{

byte buf[]=new byte[3];

buf[1]=(byte)(index[0]<<4);

buf[0]=(byte)(index[0]>>4);

buf[2]=(byte)index[1];
buf[1]+=(byte)(index[1]>>8);

out.write(buf,0,3);

//out put the decoding
// System.out.println(index[0]+" "+index[1]+" ");

}catch( IOException e )
{
System.err.println(e);
return;
}

}

private boolean zipInput(BufferedInputStream in,short codebuf[])
{
byte buf[]=new byte[3],temp;
//int intbuf[]=new int[3],temp;
short le=(short)dic.length();
try{

if(in.read(buf,0,3)!=3)
{
System.out.println("the end of the file!");
return false;
}
//codebuf[0]=(short)(buf[0]<<4);
codebuf[0]=toRight(buf[0],4);
codebuf[0]+=(short)(toRight(buf[1],0)>>4);

//codebuf[1]=(short)buf[2];
codebuf[1]=toRight(buf[2],0);
//codebuf[1]=(byte)(buf[1]<<4);
temp=(byte)(toRight(buf[1],4));
codebuf[1]+=toRight(temp,4);
// System.out.println(codebuf[0]+" "+codebuf[1]);

if(codebuf[0]<-1 ||codebuf[1]<-1)
{
System.out.println("error while getting the code: "+codebuf[0]+" "+codebuf[1]);
System.out.println(dic);
return false;
}
//System.out.println(codebuf[0]+" "+codebuf[1]);
}
catch(IOException e )
{
System.err.println(e);
return false;
}
return true;
}

private short toRight(byte buf,int n)
{
short s=0;
for(short i=7;i>=0;i--)
{
if( ( (1L<<i)&buf )!=0 )
s+=(short)(1L<<(i+n));
}
return s;
}

private void stringOut(BufferedOutputStream out,String str)
{
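// ISO-8859-1 maps every char below 256 to exactly one byte
byte a[]=str.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
try{
out.write(a,0,a.length);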
}
catch(IOException e )
{
System.err.println(e);

}

}
}

//Dictionary.java

package com.anywhere;
import java.util.*;

class OutDictionaryException extends Exception
{
public String toString()
{
return (super.toString ()+"out of the dictionary size!!");
}
}

public class Dictionary
{

ArrayList<String> ar=new ArrayList<String>();

public Dictionary()
{
// one initial entry for every possible byte value 0..255
// (the original loop stopped at 128 and so covered 7-bit ASCII only)
for(char c=0;c<256;c++)
{
ar.add(String.valueOf(c));
}
}

public int indexOf(String a)
{
return ar.indexOf(a);
}

public void add (String a) throws OutDictionaryException
{

if( length()<4096)
ar.add(a);
else
{

throw(new OutDictionaryException());

}
}

public int length()
{

return (short)ar.size();
}

public String toString()
{
StringBuilder str=new StringBuilder("size of the dictionary: "+length()+" ");
for(int i=0;i<length();i++)
str.append(i).append(": ").append(ar.get(i)).append(" ");
return str.toString();
}

public String getString(short i)
{
return (String)ar.get(i);
}

public static void main(String []args )
{
Dictionary a=new Dictionary();

System.out.println(a);
}
}

Original article: https://www.cnblogs.com/liuzhuqing/p/7480448.html