Detecting a File's Encoding with Classes and Methods from a Third-Party JAR

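Today on a forum I saw someone post a question about how to check a file's encoding. Another user replied recommending an article, which I read and liked. I have condensed the original into the post below. For the original, see http://www.javaeye.com/topic/108540, by hdwangyi.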

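In that post he gave a piece of code that checks whether a text file is UTF-8 encoded. The idea is to read a txt file, take the first 3 bytes of the byte stream (the byte order mark, or BOM), and check whether they match the UTF-8 BOM, EF BB BF.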
```
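import java.io.*;

public class TestText {
    public static void main(String[] args) {
        File file = new File("C:/1.txt");

        // try-with-resources closes the stream even if reading fails
        try (InputStream stream = new FileInputStream(file)) {
            byte[] byteArray = new byte[3];

            // read() may return fewer than 3 bytes, e.g. for a very short file
            int n = stream.read(byteArray);

            // The UTF-8 BOM is EF BB BF, which is -17, -69, -65 as signed bytes
            if (n == 3 && byteArray[0] == -17 && byteArray[1] == -69 && byteArray[2] == -65)
                System.out.println("UTF-8");
            else
                System.out.println("Possibly some other encoding");
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}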
```
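The same byte comparison extends to the other Unicode byte order marks. The following is only a sketch of that idea: the class `BomSniffer` and its method `detectBom` are hypothetical names, not from the original post, and a file without a BOM still yields no answer.

```java
// Hypothetical helper, not part of the original post.
final class BomSniffer {

    /** Returns the charset name implied by a leading BOM, or null if none is found. */
    static String detectBom(byte[] head) {
        // Check the 4-byte UTF-32 marks first: FF FE 00 00 also begins with the UTF-16LE mark.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask so the comparison is on unsigned values (0xEF is -17 as a signed byte).
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detectBom(sample)); // prints UTF-8
    }
}
```

A `null` result only means that no BOM was present. The file may still be UTF-8, since many UTF-8 files are written without one.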
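But this approach is clearly very limited: it only recognizes UTF-8 files that begin with a BOM, and it requires some knowledge of file encodings. So the author turned to a third-party jar, cpdetector, available at http://cpdetector.sourceforge.net/. For how to import a jar in Eclipse, see http://blog.csdn.net/justinavril/archive/2008/08/07/2783182.aspx. The classes and methods in this jar can detect a file's encoding, though like any detection that guesses from content, the result may be wrong for short or ambiguous files.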
```
// Package name as in the original post; it may differ between cpdetector releases
import cpdetector.io.*;
import java.io.*;
import java.nio.charset.*;

public class PageCode {
    public static void main(String[] args) {
        // The proxy delegates to the detectors added to it
        CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();
        detector.add(JChardetFacade.getInstance());

        Charset charset = null;

        File f = new File("C:/1.txt");

        try {
            // File.toURL() is deprecated; go through toURI() instead
            charset = detector.detectCodepage(f.toURI().toURL());
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        if (charset != null) {
            System.out.println(f.getName() + " encoding: " + charset.name());
        } else {
            System.out.println(f.getName() + " unknown");
        }
    }
}
```
Output:
```
1.txt encoding: GB2312
```
The relevant contents of main can be moved into a method for reuse, so that the encoding of any file can be checked.
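One possible shape for such a reusable method is sketched below. To stay self-contained it does not call cpdetector. It swaps in a different technique, strict trial decoding with the JDK's `CharsetDecoder`, and the names `EncodingGuess`, `firstStrictMatch` and `guess` are hypothetical. In real use the body of `guess` could instead call `detector.detectCodepage` as in `PageCode`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the original post or of cpdetector.
final class EncodingGuess {

    /** Returns the first candidate charset that decodes the bytes without errors, or null. */
    static Charset firstStrictMatch(byte[] data, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                // REPORT makes the decoder throw instead of silently substituting U+FFFD.
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(data));
                return cs;
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next candidate.
            }
        }
        return null;
    }

    /** Reusable entry point: reads the whole file, then applies the check above. */
    static Charset guess(Path file, List<Charset> candidates) throws IOException {
        return firstStrictMatch(Files.readAllBytes(file), candidates);
    }
}
```

The order of the candidates matters. A single-byte charset such as ISO-8859-1 accepts every byte sequence, so it belongs at the end of the list, and a match only shows that the bytes are valid in that charset, not that the file was written in it.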
Original post: https://www.cnblogs.com/myfreefield/p/2004396.html