• Converting String Encodings in Java


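    During a recent internship I had to convert a batch of txt files from UTF-16 (little-endian) to encodings such as the ISO-8859 family. I could think of three ways to do it:

    1) use IBM's ICU toolkit;

    2) use Java's built-in conversion methods;

    3) write my own program driven by a mapping table (Map);

    To save time I went with option 2). I had not used Java for a long while, though, and at first I did not understand how Java handles encodings, so it took me quite some time to get it working. The approach is described briefly below.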

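    Encoding conversion in Java mainly relies on two methods:

    • getBytes(charset): returns the byte array that represents the string in the given charset
    • new String(byte[] b, charset): decodes b using the given charset into a string; in memory the resulting string is always UTF-16

    Always keep in mind that a Java String is a sequence of UTF-16 code units internally, whatever charset the bytes came from, so the string produced by new String is UTF-16 as well. Byte order only comes into play when the string is turned back into bytes (getBytes("UTF-16") writes big-endian with a BOM).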
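    The two calls above can be tried in isolation. The following is only a minimal sketch: the class name RoundTripDemo and the helper transcode are made up for illustration, and the input bytes are a hand-built UTF-16 little-endian sample (BOM FF FE followed by the Greek letters α and β), not data from the files I converted.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Hypothetical helper: decode bytes in one charset, re-encode them in another.
    public static byte[] transcode(byte[] src, Charset from, Charset to) {
        String text = new String(src, from); // bytes -> String (UTF-16 code units in memory)
        return text.getBytes(to);            // String -> bytes in the target charset
    }

    public static void main(String[] args) {
        // BOM (FF FE), then U+03B1 and U+03B2 in little-endian byte order.
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, (byte) 0xB1, 0x03, (byte) 0xB2, 0x03};
        // The "UTF-16" decoder consumes the BOM and uses it to pick the byte order.
        byte[] greek = transcode(utf16le, StandardCharsets.UTF_16, Charset.forName("ISO-8859-7"));
        for (byte b : greek) {
            System.out.printf("%02X ", b & 0xFF); // prints: E1 E2
        }
        System.out.println();
    }
}
```

    Six input bytes shrink to two because ISO-8859-7 stores each Greek letter in a single byte and has no BOM.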

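    The text files are read and written through InputStream and OutputStream so that the program works on raw bytes, which keeps the I/O layer from changing the encoding along the way. Sample code: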

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.charset.Charset;
    import java.util.Set;

    public class Convertor {
        // Source encoding. "Unicode" is a JDK alias for UTF-16; the decoder reads the BOM to pick the byte order.
        public static final String Old_Charset = "Unicode";
        public static final String ISO_8859_6 = "ISO-8859-6";
        public static final String ISO_8859_7 = "ISO-8859-7";
        public static final String ISO_8859_8 = "ISO-8859-8";
        public static final String TIS_620 = "TIS-620";

        // Print the name of every charset supported by the running JDK.
        public static void PrintAllSupportedCharset()
        {
            Set<String> charsetNames = Charset.availableCharsets().keySet();
            System.out.println("-----the number of charsets supported by this JDK is " + charsetNames.size() + "-----");
            for (String charsetName : charsetNames)
            {
                System.out.println(charsetName);
            }
        }

        // Print a byte array as space-separated two-digit hex values.
        public static void PrintBytes(byte[] b)
        {
            for (int i = 0; i < b.length; i++) {
                String hex = Integer.toHexString(b[i] & 0xFF);
                if (hex.length() == 1) {
                    hex = '0' + hex;
                }
                System.out.print(hex.toUpperCase() + " ");
            }
            System.out.println();
        }

        // Convert one file from oldCharset to newCharset and write it to destDir under the same name.
        public static void ConvertCharset(String filePath, String fileName, String destDir, String oldCharset, String newCharset)
        {
            System.out.println(filePath);
            try
            {
                // Read the whole source file as raw bytes.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                try (InputStream in = new FileInputStream(filePath)) {
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        buffer.write(chunk, 0, n);
                    }
                }
                byte[] b = buffer.toByteArray();
                PrintBytes(b);

                // Decode with the old charset; the resulting String is UTF-16 in memory.
                String srcStr = new String(b, oldCharset);
                // Re-encode the String in the new charset; this is the actual conversion.
                byte[] nnb = srcStr.getBytes(newCharset);
                PrintBytes(nnb);

                // Write the converted bytes out unchanged.
                try (OutputStream out = new FileOutputStream(new File(destDir, fileName))) {
                    out.write(nnb);
                }
            } catch (IOException e)
            {
                e.printStackTrace();
            }
        }

        // Convert filepath if it is a file, or every file below it (recursively) if it is a directory.
        public static boolean readfile(String filepath, String destDir, String newCharset)
        {
            File file = new File(filepath);
            if (!file.isDirectory()) {
                System.out.println("file");
                System.out.println("path=" + file.getPath());
                System.out.println("absolutepath=" + file.getAbsolutePath());
                System.out.println("name=" + file.getName());
                ConvertCharset(file.getAbsolutePath(), file.getName(), destDir, Convertor.Old_Charset, newCharset);
            } else {
                System.out.println("directory");
                String[] filelist = file.list();
                if (filelist == null) { // the directory could not be listed
                    return false;
                }
                for (int i = 0; i < filelist.length; i++) {
                    File child = new File(file, filelist[i]);
                    if (!child.isDirectory()) {
                        System.out.println("path=" + child.getPath());
                        System.out.println("absolutepath=" + child.getAbsolutePath());
                        System.out.println("name=" + child.getName());
                        ConvertCharset(child.getAbsolutePath(), child.getName(), destDir, Convertor.Old_Charset, newCharset);
                    } else {
                        readfile(child.getPath(), destDir, newCharset);
                    }
                }
            }
            return true;
        }

        public static void main(String args[])
        {
            //PrintAllSupportedCharset();
            //System.out.println(Charset.defaultCharset());
            String srcPath = "H:\\LYDATA\\四国语言\\Unicode16\\Thai_Unicode\\";
            String destDir = "H:\\LYDATA\\四国语言\\Thai_TIS-620\\";
            readfile(srcPath, destDir, Convertor.TIS_620);
        }
    }
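    One thing to watch with getBytes: when a character has no representation in the target charset, it is silently replaced with the charset's replacement byte (typically '?'), so the conversion can lose data without any error. The sketch below is not part of my original program; the class name StrictEncodeDemo and the helper encodeStrict are hypothetical. It shows one way to make such cases fail loudly by using a CharsetEncoder that reports unmappable characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class StrictEncodeDemo {
    // Hypothetical helper: encode text, throwing instead of substituting
    // a replacement byte when a character cannot be represented.
    public static byte[] encodeStrict(String text, Charset target) throws CharacterCodingException {
        ByteBuffer out = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        Charset greek = Charset.forName("ISO-8859-7");
        String thai = "\u0E01"; // THAI CHARACTER KO KAI, not representable in ISO-8859-7

        // getBytes quietly substitutes the replacement byte.
        System.out.printf("getBytes: %02X%n", thai.getBytes(greek)[0] & 0xFF);

        // The strict encoder refuses the same input.
        try {
            encodeStrict(thai, greek);
        } catch (CharacterCodingException e) {
            System.out.println("strict encoder rejected the text: " + e);
        }
    }
}
```

    Whether to abort or to accept the replacement depends on the data; for a batch job, logging the offending file name is probably enough.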


    Unicode mapping tables for several of these encodings: http://www.lingua-systems.com/knowledge/unicode-mappings/iso-8859-7-to-unicode.html
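    The same kind of table can also be printed from the JDK itself, which is handy for spot-checking a conversion. This is only a rough sketch; the class name MappingTableDemo and the helper toCodePoint are made up for illustration, and the byte range in main is an arbitrary sample.

```java
import java.nio.charset.Charset;

public class MappingTableDemo {
    // Hypothetical helper: the Unicode code point a single byte decodes to in the given charset.
    public static int toCodePoint(int byteValue, Charset charset) {
        String s = new String(new byte[] {(byte) byteValue}, charset);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        Charset greek = Charset.forName("ISO-8859-7");
        // Print a few rows of the ISO-8859-7 table (lowercase alpha to epsilon).
        for (int b = 0xE1; b <= 0xE5; b++) {
            System.out.printf("0x%02X -> U+%04X%n", b, toCodePoint(b, greek));
        }
    }
}
```

    Bytes that the charset leaves undefined decode to U+FFFD, the Unicode replacement character, so they are easy to spot in the output.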

  • Original article: https://www.cnblogs.com/sophia-yun/p/3406625.html