Encoding.GetEncoding(936)).Contains(@"这是简体中文")
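In the .NET world, a string is always Unicode, so when you read a TXT file line by line and then test its contents, the bytes have to be decoded with the correct encoding first.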
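using System;
using System.IO;
// ReadAllLines(path) detects a BOM and otherwise assumes UTF-8;
// pass an Encoding as the second argument for any other code page.
foreach (string line in File.ReadAllLines("D:\\test.txt"))
{
    Console.WriteLine(" {0}", line);
}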
For the specific encodings, see the Encoding class on MSDN:
http://msdn.microsoft.com/zh-cn/library/system.text.encoding(v=vs.100).aspx
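As a minimal sketch of the decoding step described above, the following reads a file with an explicit code page and then tests each decoded line. The class and method names (`CodePageFile`, `FileContains`) are hypothetical and not from the original post. The `RegisterProvider` call is an assumption about the runtime: on .NET Core and .NET 5+ legacy code pages such as 936 are only available after registering `CodePagesEncodingProvider`, while on the classic .NET Framework they are built in and that line can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

static class CodePageFile
{
    // Returns true if any line of the file, decoded with the given
    // Windows code page, contains the search text.
    public static bool FileContains(string path, int codePage, string text)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code
        // pages must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding encoding = Encoding.GetEncoding(codePage);

        // The bytes are decoded into Unicode strings here; after this
        // point the comparison is an ordinary string operation.
        foreach (string line in File.ReadAllLines(path, encoding))
        {
            if (line.Contains(text))
            {
                return true;
            }
        }
        return false;
    }
}
```

A call such as `CodePageFile.FileContains(@"D:\test.txt", 936, "这是简体中文")` would then check a file saved in code page 936 (Simplified Chinese). If the wrong code page is passed, the file still decodes without an error, but the resulting strings are garbled and the match fails.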
As defined by Microsoft, a locale is either a language or a language in combination with a country. See Microsoft definitions of locale.
Language (Locale) | LCID Decimal | LCID Hexadecimal | Codepage | Country code |
---|---|---|---|---|
Telugu | 1098 | 044a | 0 | IND |
Gujarati | 1095 | 0447 | 0 | IND |
Punjabi | 1094 | 0446 | 0 | IND |
Sanskrit | 1103 | 044f | 0 | IND |
Konkani | 1111 | 0457 | 0 | IND |
Syriac | 1114 | 045a | 0 | SYR |
Kannada | 1099 | 044b | 0 | IND |
Marathi | 1102 | 044e | 0 | IND |
Divehi | 1125 | 0465 | 0 | MDV |
Armenian | 1067 | 042b | 0 | ARM |
Hindi | 1081 | 0439 | 0 | IND |
Georgian | 1079 | 0437 | 0 | GEO |
Tamil | 1097 | 0449 | 0 | IND |
Thai | 1054 | 041e | 874 | THA |
Japanese | 1041 | 0411 | 932 | JPN |
Chinese (PRC) | 2052 | 0804 | 936 | CHN |
Chinese (Singapore) | 4100 | 1004 | 936 | SGP |
Korean | 1042 | 0412 | 949 | KOR |
Chinese (Macau S.A.R.) | 5124 | 1404 | 950 | MCO |
Chinese (Hong Kong S.A.R.) | 3076 | 0c04 | 950 | HKG |
Chinese (Taiwan) | 1028 | 0404 | 950 | TWN |
Romanian | 1048 | 0418 | 1250 | ROM |
Slovenian | 1060 | 0424 | 1250 | SVN |
Hungarian | 1038 | 040e | 1250 | HUN |
Slovak | 1051 | 041b | 1250 | SVK |
Polish | 1045 | 0415 | 1250 | POL |
Albanian | 1052 | 041c | 1250 | ALB |
Serbian (Latin) | 2074 | 081a | 1250 | SPB |
Croatian | 1050 | 041a | 1250 | HRV |
Czech | 1029 | 0405 | 1250 | CZE |
Mongolian (Cyrillic) | 1104 | 0450 | 1251 | MNG |
FYRO Macedonian | 1071 | 042f | 1251 | MKD |
Uzbek (Cyrillic) | 2115 | 0843 | 1251 | UZB |
Ukrainian | 1058 | 0422 | 1251 | UKR |
Azeri (Cyrillic) | 2092 | 082c | 1251 | AZE |
Tatar | 1092 | 0444 | 1251 | RUS |
Kazakh | 1087 | 043f | 1251 | KAZ |
Belarusian | 1059 | 0423 | 1251 | BLR |
Kyrgyz (Cyrillic) | 1088 | 0440 | 1251 | KGZ |
Bulgarian | 1026 | 0402 | 1251 | BGR |
Serbian (Cyrillic) | 3098 | 0c1a | 1251 | SPB |
Russian | 1049 | 0419 | 1251 | RUS |
English (Jamaica) | 8201 | 2009 | 1252 | JAM |
French (Canada) | 3084 | 0c0c | 1252 | CAN |
French (France) | 1036 | 040c | 1252 | FRA |
French (Luxembourg) | 5132 | 140c | 1252 | LUX |
English (New Zealand) | 5129 | 1409 | 1252 | NZL |
English (Ireland) | 6153 | 1809 | 1252 | IRL |
Dutch (Netherlands) | 1043 | 0413 | 1252 | NLD |
English (Caribbean) | 9225 | 2409 | 1252 | CAR |
French (Switzerland) | 4108 | 100c | 1252 | CHE |
English (Canada) | 4105 | 1009 | 1252 | CAN |
Galician | 1110 | 0456 | 1252 | ESP |
English (Belize) | 10249 | 2809 | 1252 | BLZ |
German (Austria) | 3079 | 0c07 | 1252 | AUT |
French (Monaco) | 6156 | 180c | 1252 | MCO |
English (Zimbabwe) | 12297 | 3009 | 1252 | ZWE |
Basque | 1069 | 042d | 1252 | ESP |
Dutch (Belgium) | 2067 | 0813 | 1252 | BEL |
French (Belgium) | 2060 | 080c | 1252 | BEL |
Finnish | 1035 | 040b | 1252 | FIN |
Faroese | 1080 | 0438 | 1252 | FRO |
German (Germany) | 1031 | 0407 | 1252 | DEU |
English (Australia) | 3081 | 0c09 | 1252 | AUS |
English (United States) | 1033 | 0409 | 1252 | USA |
English (United Kingdom) | 2057 | 0809 | 1252 | GBR |
Catalan | 1027 | 0403 | 1252 | ESP |
English (Trinidad) | 11273 | 2c09 | 1252 | TTO |
English (South Africa) | 7177 | 1c09 | 1252 | ZAF |
Danish | 1030 | 0406 | 1252 | DNK |
English (Philippines) | 13321 | 3409 | 1252 | PHL |
Spanish (Paraguay) | 15370 | 3c0a | 1252 | PRY |
Spanish (Colombia) | 9226 | 240a | 1252 | COL |
Spanish (Costa Rica) | 5130 | 140a | 1252 | CRI |
Spanish (Dominican Republic) | 7178 | 1c0a | 1252 | DOM |
Spanish (Ecuador) | 12298 | 300a | 1252 | ECU |
Spanish (El Salvador) | 17418 | 440a | 1252 | SLV |
Spanish (Guatemala) | 4106 | 100a | 1252 | GTM |
Spanish (Honduras) | 18442 | 480a | 1252 | HND |
Spanish (International Sort) | 3082 | 0c0a | 1252 | ESP |
Spanish (Chile) | 13322 | 340a | 1252 | CHL |
Spanish (Nicaragua) | 19466 | 4c0a | 1252 | NIC |
Spanish (Mexico) | 2058 | 080a | 1252 | MEX |
Spanish (Peru) | 10250 | 280a | 1252 | PER |
Spanish (Puerto Rico) | 20490 | 500a | 1252 | PRI |
Spanish (Traditional Sort) | 1034 | 040a | 1252 | ESP |
Spanish (Uruguay) | 14346 | 380a | 1252 | URY |
Spanish (Venezuela) | 8202 | 200a | 1252 | VEN |
Swahili | 1089 | 0441 | 1252 | KEN |
Swedish | 1053 | 041d | 1252 | SWE |
Swedish (Finland) | 2077 | 081d | 1252 | FIN |
German (Liechtenstein) | 5127 | 1407 | 1252 | LIE |
Afrikaans | 1078 | 0436 | 1252 | ZAF |
Spanish (Panama) | 6154 | 180a | 1252 | PAN |
German (Luxembourg) | 4103 | 1007 | 1252 | LUX |
Spanish (Bolivia) | 16394 | 400a | 1252 | BOL |
German (Switzerland) | 2055 | 0807 | 1252 | CHE |
Icelandic | 1039 | 040f | 1252 | ISL |
Indonesian | 1057 | 0421 | 1252 | IDN |
Italian (Italy) | 1040 | 0410 | 1252 | ITA |
Italian (Switzerland) | 2064 | 0810 | 1252 | CHE |
Norwegian (Nynorsk) | 2068 | 0814 | 1252 | NOR |
Spanish (Argentina) | 11274 | 2c0a | 1252 | ARG |
Portuguese (Brazil) | 1046 | 0416 | 1252 | BRA |
Norwegian (Bokmal) | 1044 | 0414 | 1252 | NOR |
Malay (Malaysia) | 1086 | 043e | 1252 | MYS |
Malay (Brunei Darussalam) | 2110 | 083e | 1252 | BRN |
Portuguese (Portugal) | 2070 | 0816 | 1252 | PRT |
Greek | 1032 | 0408 | 1253 | GRC |
Uzbek (Latin) | 1091 | 0443 | 1254 | UZB |
Azeri (Latin) | 1068 | 042c | 1254 | AZE |
Turkish | 1055 | 041f | 1254 | TUR |
Hebrew | 1037 | 040d | 1255 | ISR |
Arabic (Algeria) | 5121 | 1401 | 1256 | DZA |
Arabic (Bahrain) | 15361 | 3c01 | 1256 | BHR |
Arabic (Yemen) | 9217 | 2401 | 1256 | YEM |
Arabic (Egypt) | 3073 | 0c01 | 1256 | EGY |
Arabic (Iraq) | 2049 | 0801 | 1256 | IRQ |
Arabic (Jordan) | 11265 | 2c01 | 1256 | JOR |
Arabic (Kuwait) | 13313 | 3401 | 1256 | KWT |
Arabic (Lebanon) | 12289 | 3001 | 1256 | LBN |
Arabic (Libya) | 4097 | 1001 | 1256 | LBY |
Arabic (Morocco) | 6145 | 1801 | 1256 | MAR |
Arabic (Oman) | 8193 | 2001 | 1256 | OMN |
Arabic (Qatar) | 16385 | 4001 | 1256 | QAT |
Arabic (Saudi Arabia) | 1025 | 0401 | 1256 | SAU |
Arabic (Syria) | 10241 | 2801 | 1256 | SYR |
Arabic (U.A.E.) | 14337 | 3801 | 1256 | ARE |
Farsi | 1065 | 0429 | 1256 | IRN |
Urdu | 1056 | 0420 | 1256 | PAK |
Arabic (Tunisia) | 7169 | 1c01 | 1256 | TUN |
Estonian | 1061 | 0425 | 1257 | EST |
Latvian | 1062 | 0426 | 1257 | LVA |
Lithuanian | 1063 | 0427 | 1257 | LTU |
Vietnamese | 1066 | 042a | 1258 | VNM |
This table was generated from information at List of Locale IDs and Language Groups for Microsoft Windows 2000
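The relationship in the table between an LCID and its ANSI code page can also be queried at run time. The sketch below uses `CultureInfo` and `TextInfo.ANSICodePage` from `System.Globalization`; the class and method names (`LocaleLookup`, `Describe`) are hypothetical. The values reported depend on the operating system's locale data, so they may not match the Windows 2000 table above on every platform, and display names will likely differ from the table's wording.

```csharp
using System;
using System.Globalization;

static class LocaleLookup
{
    // Formats a row in the same column order as the table above:
    // language name | LCID decimal | LCID hexadecimal | ANSI code page.
    public static string Describe(int lcid)
    {
        // Throws CultureNotFoundException if the platform does not
        // know this LCID.
        CultureInfo culture = new CultureInfo(lcid);

        return string.Format(
            CultureInfo.InvariantCulture,
            "{0} | {1} | {2:x4} | {3}",
            culture.EnglishName,
            culture.LCID,
            culture.LCID,
            culture.TextInfo.ANSICodePage);
    }
}
```

For example, `LocaleLookup.Describe(2052)` asks the runtime about the Chinese (PRC) row, and `LocaleLookup.Describe(1033)` about English (United States).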
Definitions
Locale: A collection of language-related, user-preference information represented as a list of values. (Reference)
Locale ID (LCID): A 32-bit value defined by Microsoft Windows that consists of a language ID, sort ID, and reserved bits that identify a particular language.
Codepage: "An ordered set of characters in which a numeric index (code point value) is associated with each character. The first 128 characters of each codepage are functionally the same and include all characters needed to type English text. The upper 128 characters of OEM and ANSI codepages contain characters used in a language or group of languages." (Taken from Related resources below.)
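The LCID layout in the definition above can be unpacked with plain bit arithmetic. The sketch below assumes the standard Windows layout: the language ID is in the low 16 bits (primary language in bits 0-9, sublanguage in bits 10-15), the sort ID is in bits 16-19, and the remaining bits are reserved. The class and method names are hypothetical.

```csharp
using System;

static class Lcid
{
    // Low 16 bits: the language ID.
    public static int LanguageId(int lcid) => lcid & 0xFFFF;

    // Bits 16-19: the sort ID (0 for the default sort order).
    public static int SortId(int lcid) => (lcid >> 16) & 0xF;

    // Bits 0-9 of the language ID: the primary language.
    public static int PrimaryLanguage(int lcid) => lcid & 0x3FF;

    // Bits 10-15 of the language ID: the sublanguage (regional variant).
    public static int SubLanguage(int lcid) => (lcid >> 10) & 0x3F;
}
```

Applied to the table, Chinese (PRC) 0x0804 and Chinese (Taiwan) 0x0404 both have primary language 0x04 and differ only in the sublanguage (2 and 1). This is why all the English rows end in `09` and all the Spanish rows end in `0a` in the hexadecimal column.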
Character Encoding Recommendation for Language
IANA encoding | Java Canonical Name | Language | Comment |
---|---|---|---|
UTF-8 | UTF8 | 8bit Universal character set | |
UTF-16 | UTF-16 | 16bit Universal character set | |
US-ASCII | ASCII | American Standard Code for Information Interchange | |
windows-1250 | Cp1250 | Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian) | Windows encoding |
windows-1251 | Cp1251 | Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian) | Windows encoding |
windows-1252 | Cp1252 | Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) | Windows encoding |
windows-1253 | Cp1253 | Greek | Windows encoding |
windows-1254 | Cp1254 | Turkish | Windows encoding |
windows-1255 | Cp1255 | Hebrew | Windows encoding |
windows-1256 | Cp1256 | Arabic | Windows encoding |
windows-1257 | Cp1257 | Baltic | Windows encoding |
windows-1258 | Cp1258 | Vietnamese | Windows encoding |
ISO-8859-1 | ISO8859_1 | Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) | Euro Symbol is not supported |
ISO-8859-2 | ISO8859_2 | Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian) | Euro Symbol is not supported |
ISO-8859-3 | ISO8859_3 | Southeastern European (Afrikaans, Catalan, Dutch, English, Esperanto, German, Italian, Maltese, Spanish, Turkish) | |
ISO-8859-4 | ISO8859_4 | Northern European (Danish, English, Estonian, Finnish, German, Greenlandic, Latin, Latvian, Lithuanian, Norwegian, Sámi, Slovenian, Swedish) | |
ISO-8859-5 | ISO8859_5 | Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian) | |
ISO-8859-6 | ISO8859_6 | Arabic | |
ISO-8859-7 | ISO8859_7 | Greek | |
ISO-8859-8 | ISO8859_8 | Hebrew | |
ISO-8859-9 | ISO8859_9 | Western European (Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Finnish, French, Frisian, Galician, German, Greenlandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish, Turkish) | |
ISO-8859-13 | ISO8859_13 | Baltic Rim (English, Estonian, Finnish, Latin, Latvian, Norwegian) | |
ISO-8859-15 | ISO8859_15 | Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) | ISO-8859-1 with Euro symbol support |
windows-31j | MS932 | Japanese | Windows encoding |
EUC-JP | EUC_JP | Japanese | EUC encoding used on Unix platform |
Shift_JIS | SJIS | Japanese | Shift JIS, does not support MS external characters |
ISO-2022-JP | ISO2022JP | Japanese | JIS X 0201, 0208, in ISO 2022 form, this is used for e-mail |
x-mswin-936 | MS936 | Simplified Chinese | Windows encoding, this is not registered in IANA |
GB18030 | GB18030 | Simplified Chinese | PRC standard |
x-EUC-CN | EUC_CN | Simplified Chinese | GB2312, EUC encoding |
GBK | GBK | Simplified Chinese | |
x-windows-949 | MS949 | Korean | Windows encoding, this is not registered in IANA. |
EUC-KR | EUC_KR | Korean | KS C 5601, EUC encoding |
x-windows-950 | MS950 | Traditional Chinese | Windows encoding, this is not registered in IANA |
x-MS950-HKSCS | MS950_HKSCS | Traditional Chinese with Hong Kong extensions | Windows encoding, this is not registered in IANA |
x-EUC-TW | EUC_TW | Traditional Chinese | CNS11643 (Plane 1-3), EUC encoding, this is not registered in IANA |
Big5 | Big5 | Traditional Chinese | |
Big5-HKSCS | Big5_HKSCS | Traditional Chinese | Big5 with Hong Kong extensions |
TIS-620 | TIS620 | Thai | |
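The comments in the table about Euro symbol support can be checked directly. The sketch below is one possible way to test whether a piece of text is representable in a named encoding: it requests the encoding with exception fallbacks, so an unencodable character raises `EncoderFallbackException` instead of being silently replaced with `?`. The helper name `CanEncode` is hypothetical, and the `RegisterProvider` call assumes .NET Core or .NET 5+, where most of these code pages must be registered first.

```csharp
using System;
using System.Text;

static class EncodingCheck
{
    // Returns true if every character of the text can be represented
    // in the encoding identified by the given name.
    public static bool CanEncode(string encodingName, string text)
    {
        // Assumption: .NET Core / .NET 5+, where legacy encodings are
        // only available through this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Throws ArgumentException if .NET does not recognize the name.
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no mapping in this encoding.
            return false;
        }
    }
}
```

With this helper, `CanEncode("iso-8859-1", "€")` is false, while `CanEncode("iso-8859-15", "€")` and `CanEncode("windows-1252", "€")` are true, which matches the table's comments. Note that the names in the first column are IANA or Java names; not all of them (in particular the `x-` prefixed ones) are necessarily recognized by .NET.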