Original article: http://www.tracefact.net/CSharp-Programming/Network-Programming-Part2.aspx
Sizes of Chinese and English Characters in ASCII, UTF-8, and Unicode Encodings
Code example (C#):
```csharp
using System;
using System.Text;

class Program {
    static void Main() {
        ShowCode();
    }

    private static void ShowCode() {
        string[] strArray = { "b", "abcd", "乙", "甲乙丙丁" };
        byte[] buffer;
        string mode, back;

        foreach (string str in strArray) {
            // Encode each string with ASCII, UTF-8 and Unicode (UTF-16LE) in turn,
            // then decode the bytes back to check what survives the round trip
            for (int i = 0; i <= 2; i++) {
                if (i == 0) {
                    buffer = Encoding.ASCII.GetBytes(str);
                    back = Encoding.ASCII.GetString(buffer, 0, buffer.Length);
                    mode = "ASCII";
                } else if (i == 1) {
                    buffer = Encoding.UTF8.GetBytes(str);
                    back = Encoding.UTF8.GetString(buffer, 0, buffer.Length);
                    mode = "UTF8";
                } else {
                    buffer = Encoding.Unicode.GetBytes(str);
                    back = Encoding.Unicode.GetString(buffer, 0, buffer.Length);
                    mode = "Unicode";
                }

                Console.WriteLine("Mode: {0}, String: {1}, Buffer.Length: {2}",
                    mode, str, buffer.Length);

                // Print the raw byte values on a single line
                Console.Write("Buffer:");
                for (int j = 0; j <= buffer.Length - 1; j++) {
                    Console.Write(" " + buffer[j]);
                }
                Console.WriteLine();

                Console.WriteLine("Retrieved: {0}", back);
                Console.WriteLine();
            }
        }
    }
}
```
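The ASCII branch above never throws: Encoding.ASCII substitutes `?` (byte 63) for any character it cannot represent, which is why 乙 comes back as `?` in the output. If silent data loss is undesirable, one option is to build an ASCII encoding with exception fallbacks. A minimal sketch follows; the class name StrictAsciiDemo and the field StrictAscii are illustrative names, not part of the original article.

```csharp
using System;
using System.Text;

class StrictAsciiDemo {
    // Illustrative: an ASCII encoding whose encoder/decoder throw
    // instead of silently substituting '?'
    public static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

    static void Main() {
        // Default Encoding.ASCII: '乙' (U+4E59) becomes byte 63 ('?')
        Console.WriteLine(string.Join(" ", Encoding.ASCII.GetBytes("\u4E59")));

        // Strict variant: the same call raises EncoderFallbackException
        try {
            StrictAscii.GetBytes("\u4E59");
        } catch (EncoderFallbackException ex) {
            Console.WriteLine("Cannot encode U+{0:X4} as ASCII", (int)ex.CharUnknown);
        }
    }
}
```

Run as-is, this prints `63` followed by `Cannot encode U+4E59 as ASCII`. Pure ASCII input such as "abc" still encodes normally with the strict variant.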
Output:
```text
Mode: ASCII, String: b, Buffer.Length: 1
Buffer: 98
Retrieved: b

Mode: UTF8, String: b, Buffer.Length: 1
Buffer: 98
Retrieved: b

Mode: Unicode, String: b, Buffer.Length: 2
Buffer: 98 0
Retrieved: b

Mode: ASCII, String: abcd, Buffer.Length: 4
Buffer: 97 98 99 100
Retrieved: abcd

Mode: UTF8, String: abcd, Buffer.Length: 4
Buffer: 97 98 99 100
Retrieved: abcd

Mode: Unicode, String: abcd, Buffer.Length: 8
Buffer: 97 0 98 0 99 0 100 0
Retrieved: abcd

Mode: ASCII, String: 乙, Buffer.Length: 1
Buffer: 63
Retrieved: ?

Mode: UTF8, String: 乙, Buffer.Length: 3
Buffer: 228 185 153
Retrieved: 乙

Mode: Unicode, String: 乙, Buffer.Length: 2
Buffer: 89 78
Retrieved: 乙

Mode: ASCII, String: 甲乙丙丁, Buffer.Length: 4
Buffer: 63 63 63 63
Retrieved: ????

Mode: UTF8, String: 甲乙丙丁, Buffer.Length: 12
Buffer: 231 148 178 228 185 153 228 184 153 228 184 129
Retrieved: 甲乙丙丁

Mode: Unicode, String: 甲乙丙丁, Buffer.Length: 8
Buffer: 50 117 89 78 25 78 1 78
Retrieved: 甲乙丙丁
```
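The UTF-8 byte values in the output follow directly from UTF-8's bit layout (RFC 3629): a code point below U+0080 takes one byte, one below U+0800 takes two, and the rest of the Basic Multilingual Plane takes three, each continuation byte carrying 6 payload bits behind a `10` prefix. The sketch below encodes BMP code points by hand and compares the result with Encoding.UTF8; EncodeBmp is a hypothetical helper that deliberately ignores surrogate pairs.

```csharp
using System;
using System.Linq;
using System.Text;

class Utf8ByHand {
    // Hypothetical helper: UTF-8 encoding of a single code point in
    // U+0000..U+FFFF (surrogate code points are not handled).
    public static byte[] EncodeBmp(int cp) {
        if (cp < 0x80)                          // 0xxxxxxx
            return new[] { (byte)cp };
        if (cp < 0x800)                         // 110xxxxx 10xxxxxx
            return new[] {
                (byte)(0xC0 | (cp >> 6)),
                (byte)(0x80 | (cp & 0x3F)) };
        return new[] {                          // 1110xxxx 10xxxxxx 10xxxxxx
            (byte)(0xE0 | (cp >> 12)),
            (byte)(0x80 | ((cp >> 6) & 0x3F)),
            (byte)(0x80 | (cp & 0x3F)) };
    }

    static void Main() {
        // 'b', '乙' (U+4E59), '甲' (U+7532)
        foreach (char c in "b\u4E59\u7532") {
            byte[] manual = EncodeBmp(c);
            byte[] builtIn = Encoding.UTF8.GetBytes(c.ToString());
            Console.WriteLine("U+{0:X4}: {1} (matches Encoding.UTF8: {2})",
                (int)c, string.Join(" ", manual), manual.SequenceEqual(builtIn));
        }
    }
}
```

For 乙 it yields 228 185 153 and for 甲 231 148 178, the same bytes as in the UTF8 runs of the output.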
Conclusions:
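1. ASCII cannot store Chinese characters (something pretty much everyone already knows =_-`). In .NET, Encoding.ASCII replaces every character it cannot represent with `?` (byte 63), so the original text cannot be recovered.
2. UTF-8 is a variable-length encoding. For ASCII characters it is the more compact choice: each one takes 1 byte, with exactly the same value and length as in ASCII. Unicode (Encoding.Unicode in .NET, i.e. UTF-16 little-endian) takes 2 bytes per ASCII character, with the second byte set to zero.
3. UTF-8 needs 3 bytes for each Chinese character, whereas Unicode (UTF-16) needs only 2. This applies to common Chinese characters, which lie in the Basic Multilingual Plane; rarer characters outside it (for example, CJK Extension B) take 4 bytes in both encodings.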
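The 2-byte and 3-byte figures for Chinese characters hold for characters in the Basic Multilingual Plane. A quick way to check sizes, including a character outside the BMP, is Encoding.GetByteCount. A sketch, assuming U+20000 (CJK Extension B) as the sample supplementary character; ByteCounts and Sizes are illustrative names.

```csharp
using System;
using System.Text;

class ByteCounts {
    // Illustrative helper: byte counts of s in UTF-8 and in .NET's
    // Encoding.Unicode (UTF-16 little-endian); GetByteCount adds no BOM.
    public static int[] Sizes(string s) =>
        new[] { Encoding.UTF8.GetByteCount(s), Encoding.Unicode.GetByteCount(s) };

    static void Main() {
        // 'b', '乙' (U+4E59), and U+20000 from CJK Extension B (outside the BMP)
        string[] samples = { "b", "\u4E59", "\U00020000" };

        foreach (string s in samples) {
            int[] n = Sizes(s);
            Console.WriteLine("U+{0:X4}: UTF-8 = {1} bytes, UTF-16LE = {2} bytes",
                char.ConvertToUtf32(s, 0), n[0], n[1]);
        }

        // Which byte of an ASCII character is zero depends on byte order
        Console.WriteLine(string.Join(" ", Encoding.Unicode.GetBytes("b")));          // 98 0
        Console.WriteLine(string.Join(" ", Encoding.BigEndianUnicode.GetBytes("b"))); // 0 98
    }
}
```

For U+20000 both encodings report 4 bytes (a surrogate pair in UTF-16), and the last two lines show that the trailing zero byte is a property of little-endian UTF-16 rather than of "Unicode" in general.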