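Does sudo apt-get install unicode help here?
Then use iconv.
Or use a library to convert the string to UTF-8?
This really does seem to be a hassle, and thrift does not support Unicode either.
There is also the conversion function wcsrtombs.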
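A minimal sketch of how wcsrtombs can be used from C++ to turn a std::wstring into a std::string. The helper name narrow is hypothetical. The output is in the encoding of the current C locale, so it is UTF-8 only if the program has selected a UTF-8 locale (for example with std::setlocale(LC_ALL, "") on a system configured that way); in the default "C" locale, non-ASCII characters are typically rejected.

```cpp
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to a multibyte string
// in the encoding of the current C locale.
std::string narrow(const std::wstring& wide) {
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide.c_str();

    // First pass: a null destination only computes the required byte count
    // (the terminating 0 is not counted).
    std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in the current locale");
    }

    // Second pass: write exactly len bytes into the result.
    std::string out(len, '\0');
    src = wide.c_str();
    state = std::mbstate_t();
    std::wcsrtombs(&out[0], &src, len, &state);
    return out;
}
```

Note that wcsrtombs stops at the first L'\0', so a wide string with embedded zero characters would be truncated by this sketch.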
std::string is a basic_string templated on a char, and std::wstring on a wchar_t.
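The relationship between the two types can be checked at compile time. The sketch below assumes nothing beyond the standard library; count_elements is a hypothetical helper added only to show that one template can serve both string types.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Both names are aliases for specializations of the same class template.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// Hypothetical helper: accepts either string type and returns the
// number of stored elements (char or wchar_t), not logical characters.
template <typename CharT>
std::size_t count_elements(const std::basic_string<CharT>& s) {
    return s.size();
}
```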
char vs. wchar_t
char is supposed to hold a character, usually a 1-byte character. wchar_t is supposed to hold a wide character, and then things get tricky: on Linux, a wchar_t is 4 bytes, while on Windows, it's 2 bytes.
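Because the width of wchar_t differs between platforms, the storage needed for the same wide string differs too. A small sketch, where wide_storage_bytes is a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes occupied by the elements of a wide string
// (the terminating zero is not counted). The result depends on the
// platform's sizeof(wchar_t).
std::size_t wide_storage_bytes(const std::wstring& s) {
    return s.size() * sizeof(wchar_t);
}
```

With the sizes quoted above, wide_storage_bytes(L"abc") would come out as 12 on Linux and 6 on Windows.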
1. When should I use std::wstring over std::string?
On Linux? Almost never (§).
On Windows? Almost always (§).
On cross-platform code? Depends on your toolkit...
You can store Unicode strings fine in std::string using the UTF-8 encoding too. But it won't understand the meaning of Unicode code points, so str.size() won't give you the number of logical characters in your string, merely the number of char or wchar_t elements stored in that string/wstring. For that reason, the gtk/glib C++ wrapper folks have developed a Glib::ustring class that can handle UTF-8.
(§) : unless you use a toolkit/framework saying otherwise
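std::string is usually handled as if it were a C string. A C string ends with a 0 character and contains no 0 character before that terminator (otherwise it would be read as several strings). std::string itself stores its length and can hold embedded 0 characters, but anything passed through c_str() to a C API is cut off at the first one.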
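UTF-8 is a commonly used variable-length encoding of Unicode. Each character in the Unicode character set is encoded in 1 to 4 bytes, and none of those bytes is 0 (apart from the encoding of U+0000 itself), so std::string has only limited support for UTF-8: you can copy, compare and concatenate, but size() returns only the number of encoded bytes. Unless every character is ASCII (one byte each in UTF-8), you cannot directly get the actual number of characters.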
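To make the difference between bytes and characters concrete, here is a minimal sketch that counts code points in a UTF-8 std::string. The function name utf8_length is hypothetical, and the code assumes the input is valid UTF-8; it counts code points, which is not always the same as what a user perceives as one character (combining marks, for instance).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a valid UTF-8 string.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte
// starts a new code point.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, the string "h\xC3\xA9llo" ("héllo" in UTF-8) has size() equal to 6, while utf8_length returns 5.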
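UTF-16 is another encoding. Because the UTF-16 encodings of many Unicode characters contain 0 bytes, it is fundamentally unsuited to being handled as a char-based string in std::string. For this reason, Qt for example provides the QString class, which can handle Unicode.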
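The problem with 0 bytes can be shown with a small sketch. The buffer below is "AB" written out by hand as UTF-16LE bytes; the names kUtf16le, as_c_string and with_length are hypothetical and exist only for this illustration.

```cpp
#include <cstring>
#include <string>

// "AB" encoded as UTF-16LE: each ASCII character is followed by a 0 byte.
static const char kUtf16le[] = {'A', '\0', 'B', '\0'};

// Treating the buffer as a C string stops at the first 0 byte,
// so only "A" survives.
std::string as_c_string() {
    return std::string(kUtf16le);
}

// Passing the length explicitly keeps all four bytes, but any later
// use through c_str() and C APIs is truncated again.
std::string with_length() {
    return std::string(kUtf16le, sizeof(kUtf16le));
}
```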