std::u32string conversion to/from std::string and std::u16string


    I need to convert between UTF-8, UTF-16 and UTF-32 for different APIs/modules, and since I now have the option to use C++11, I am looking at the new string types.

    It looks like I can use std::string, std::u16string and std::u32string for UTF-8, UTF-16 and UTF-32. I also found codecvt_utf8 and codecvt_utf16, which look able to convert between char or char16_t and char32_t, and what looks like a higher-level wstring_convert, but that only appears to work with bytes/std::string, and there is not a great deal of documentation.

    Am I meant to use wstring_convert somehow for the UTF-16 ↔ UTF-32 and UTF-8 ↔ UTF-32 cases? I only really found examples for UTF-8 to UTF-16, and I am not even sure those will be correct on Linux, where wchar_t is normally UTF-32. Or should I do something more complex with those codecvt facets directly?

    Or is this still not in a usable state, and should I stick with my own existing small routines using 8-, 16- and 32-bit unsigned integers?
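For comparison with the facet-based approach, a hand-written routine of the kind mentioned here only has to combine surrogate pairs to go from UTF-16 to UTF-32. This is a minimal sketch, not the asker's code: the name `Utf16ToUtf32` is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumed error policy (throwing would be equally valid). One point in favour of keeping such routines: `std::wstring_convert` and the `<codecvt>` facets used in the answer were deprecated in C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode UTF-16 code units into UTF-32 code points.
// Assumed error policy: an unpaired surrogate becomes U+FFFD.
std::u32string Utf16ToUtf32(const std::u16string &in)
{
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t unit = in[i];
        bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < in.size())
        {
            std::uint32_t next = in[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF)
            {
                // The high surrogate carries the upper 10 bits, the low one the lower 10.
                out.push_back(static_cast<char32_t>(
                    0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00)));
                ++i; // skip the low surrogate just consumed
                continue;
            }
        }
        if (unit >= 0xD800 && unit <= 0xDFFF)
            out.push_back(U'\uFFFD'); // lone or out-of-order surrogate
        else
            out.push_back(static_cast<char32_t>(unit));
    }
    return out;
}
```

For example, `Utf16ToUtf32(u"A\U0001F600")` turns the three UTF-16 code units (`A`, 0xD83D, 0xDE00) into the two code points `U"A\U0001F600"`.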

    ---------------------------------------------------------------------------------------------------------------------------------------------------------------

    Answer:

    If you read the documentation at CppReference.com for wstring_convert, codecvt_utf8, codecvt_utf16 and codecvt_utf8_utf16, the pages include a table that tells you exactly which facet to use for each of the UTF conversions.


    And yes, you would use std::wstring_convert to perform the conversions between the various UTFs. Despite its name, it is not limited to std::wstring: the wide side can be any std::basic_string<Elem> (std::wstring, std::u16string and std::u32string are all specializations of std::basic_string), while the byte side is always std::string.

    Class template std::wstring_convert performs conversions between byte string std::string and wide string std::basic_string<Elem>, using an individual code conversion facet Codecvt. std::wstring_convert assumes ownership of the conversion facet, and cannot use a facet managed by a locale. The standard facets suitable for use with std::wstring_convert are std::codecvt_utf8 for UTF-8/UCS2 and UTF-8/UCS4 conversions and std::codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.
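To make the UCS4 versus UTF-16 distinction concrete, here is a small sketch (the helper names `Utf8ToUtf16` and `Utf8ToUtf32` are hypothetical) that decodes the same UTF-8 input with both facets. For a code point above U+FFFF, such as U+1F600, `codecvt_utf8_utf16<char16_t>` produces a two-unit surrogate pair, whereas `codecvt_utf8<char32_t>` produces a single `char32_t`.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 -> UTF-16: code points above U+FFFF become surrogate pairs.
std::u16string Utf8ToUtf16(const std::string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(s);
}

// UTF-8 -> UTF-32 (UCS4): exactly one char32_t per code point.
std::u32string Utf8ToUtf32(const std::string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(s);
}
```

Calling both with `"\xF0\x9F\x98\x80"` (U+1F600 encoded in UTF-8) yields the UTF-16 units 0xD83D 0xDE00 and the single UTF-32 unit 0x1F600.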

    For example:

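    #include <codecvt>
    #include <cstddef>
    #include <locale>
    #include <string>
    
    typedef std::string u8string;
    
    u8string To_UTF8(const std::u16string &s)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
        return conv.to_bytes(s);
    }
    
    u8string To_UTF8(const std::u32string &s)
    {
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
        return conv.to_bytes(s);
    }
    
    std::u16string To_UTF16(const u8string &s)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
        return conv.from_bytes(s);
    }
    
    std::u16string To_UTF16(const std::u32string &s)
    {
        // codecvt_utf16 writes big-endian UTF-16 bytes by default (no std::little_endian
        // in its mode), so rebuild each char16_t from its high and low bytes instead of
        // reinterpreting the buffer in host byte order.
        std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
        std::string bytes = conv.to_bytes(s);
        std::u16string result;
        for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
            result.push_back(static_cast<char16_t>(
                (static_cast<unsigned char>(bytes[i]) << 8) |
                 static_cast<unsigned char>(bytes[i + 1])));
        return result;
    }
    
    std::u32string To_UTF32(const u8string &s)
    {
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
        return conv.from_bytes(s);
    }
    
    std::u32string To_UTF32(const std::u16string &s)
    {
        // codecvt_utf16 also expects big-endian input by default, so serialize each
        // code unit high byte first rather than passing the host-order buffer.
        std::string bytes;
        for (char16_t c : s)
        {
            bytes.push_back(static_cast<char>(c >> 8));
            bytes.push_back(static_cast<char>(c & 0xFF));
        }
        std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
        return conv.from_bytes(bytes);
    }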
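The functions above let `from_bytes` and `to_bytes` throw `std::range_error` when the input is not valid in the source encoding. If that is not wanted, `std::wstring_convert` can also be constructed with error strings that are returned instead of throwing. A sketch of both options for UTF-8 to UTF-32 (the helper names are hypothetical, and the `?` markers are an arbitrary choice):

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Strict variant: from_bytes throws std::range_error on malformed UTF-8;
// here the exception is caught and reported as a bool (an illustrative policy).
bool TryUtf8ToUtf32(const std::string &in, std::u32string &out)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    try
    {
        out = conv.from_bytes(in);
        return true;
    }
    catch (const std::range_error &)
    {
        return false;
    }
}

// Lenient variant: the error strings passed to the constructor are returned
// instead of throwing ("?" for to_bytes failures, U"?" for from_bytes failures).
std::u32string Utf8ToUtf32OrMarker(const std::string &in)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv("?", U"?");
    return conv.from_bytes(in);
}
```

Note that the lenient variant discards the whole result on failure, not just the bad sequence, so it suits "accept or reject" handling better than partial repair.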
    Original article: https://www.cnblogs.com/yuanxiaoping_21cn_com/p/6720214.html
Copyright © 2020-2023  润新知