[C] Format specifiers for wchar_t (VC, BCB, GCC, and the C99 standard)


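    Author: zyl910

      Since the wchar_t type was introduced into C, string handling has become increasingly complicated. String output, for example, has two functions, printf and wprintf. When the arguments include both char strings and wchar_t strings, which format specifiers should be used? This article looks into that question.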


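    1. Consulting the documentation

      First, let us go through each compiler's documentation and the C99 standard to see how they describe the format specifiers.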


    1.1 VC documentation

      The MSDN site describes the format strings of printf and wprintf in "Format Specification Fields: printf and wprintf Functions" (http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx). An excerpt:
    A format specification, which consists of optional and required fields, has the following form:
    % [flags] [width] [.precision] [{h | l | ll | I | I32 | I64}]type

      Clicking "type" first leads to the "printf Type Field Characters" page (http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx). An excerpt:
    printf Type Field Characters

    Character   Type            Output format
    c           int or wint_t   When used with printf functions, specifies a single-byte character; when used with wprintf functions, specifies a wide character.
    C           int or wint_t   When used with printf functions, specifies a wide character; when used with wprintf functions, specifies a single-byte character.
    s           String          When used with printf functions, specifies a single-byte–character string; when used with wprintf functions, specifies a wide-character string. Characters are displayed up to the first null character or until the precision value is reached.
    S           String          When used with printf functions, specifies a wide-character string; when used with wprintf functions, specifies a single-byte–character string. Characters are displayed up to the first null character or until the precision value is reached.


      Going back and following the "Size Specification" link (http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=vs.110).aspx) gives the following. An excerpt:

    To specify                                            Use prefix   With type specifier
    Single-byte character with printf functions           h            c or C
    Single-byte character with wprintf functions          h            c or C
    Wide character with printf functions                  l            c or C
    Wide character with wprintf functions                 l            c or C
    Single-byte–character string with printf functions    h            s or S
    Single-byte–character string with wprintf functions   h            s or S
    Wide-character string with printf functions           l            s or S
    Wide-character string with wprintf functions          l            s or S
    Wide character                                        w            c
    Wide-character string                                 w            s
     

    Thus to print single-byte or wide-characters with printf functions and wprintf functions, use format specifiers as follows.

    To print character as   Use function   With format specifier
    single byte             printf         c, hc, or hC
    single byte             wprintf        C, hc, or hC
    wide                    wprintf        c, lc, lC, or wc
    wide                    printf         C, lc, lC, or wc
     

    To print strings with printf functions and wprintf functions, use the prefixes h and l analogously with format type-specifiers s and S.


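      Many specifiers are described above. Sorting through them, the three most useful ones for strings are:
    hs: a char string in both printf and wprintf.
    ls: a wchar_t string in both printf and wprintf.
    s: a char string in printf, but a wchar_t string in wprintf. This is convenient in combination with TCHAR.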


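    1.2 BCB documentation

      Opening "C Runtime Library Reference" in the BCB6 help file and typing "printf" into the index quickly leads to the description of the format specifiers.

      Reading it shows that BCB is compatible with VC: hs, ls, and s can be used for char, wchar_t, and TCHAR strings respectively.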


    1.3 GCC documentation

      My machine runs Fedora 17 with GCC 4.7.0 installed.
      Opening a terminal and entering "man 3 wprintf" shows the documentation of the wprintf function. An excerpt:
    c
    If no l modifier is present, the int argument is converted to a wide character by a call to the btowc(3) function, and the resulting wide character is written. If an l modifier is present, the wint_t (wide character) argument is written.

    s
    If no l modifier is present: The const char * argument is expected to be a pointer to an array of character type (pointer to a string) containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted to wide characters (each by a call to the mbrtowc(3) function with a conversion state starting in the initial state before the first byte). The resulting wide characters are written up to (but not including) the terminating null wide character. If a precision is specified, no more wide characters than the number specified are written. Note that the precision determines the number of wide characters written, not the number of bytes or screen positions. The array must contain a terminating null byte, unless a precision is given and it is so small that the number of converted wide characters reaches it before the end of the array is reached.
    If an l modifier is present: The const wchar_t * argument is expected to be a pointer to an array of wide characters. Wide characters from the array are written up to (but not including) a terminating null wide character. If a precision is specified, no more than the number specified are written. The array must contain a terminating null wide character, unless a precision is given and it is smaller than or equal to the number of wide characters in the array.


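      According to this description, GCC appears to support only these two format specifiers for strings:
    s: a char string in both printf and wprintf.
    ls: a wchar_t string in both printf and wprintf.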
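
      The two rules above can be illustrated with a minimal sketch that formats into buffers instead of writing to the terminal. It assumes a C library that follows the man page quoted above (such as glibc) and the C99 signature of swprintf. The helper names join_wide and join_narrow are hypothetical and exist only for this illustration. With a VC-style runtime, the "%s" in the wide format string would expect a wchar_t string instead.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: formats a char string and a wchar_t string into a
 * wide buffer. In a wide format string, "%s" takes a char string
 * (converted as if by mbrtowc) and "%ls" takes a wchar_t string. */
int join_wide(wchar_t *dst, size_t cap, const char *narrow, const wchar_t *wide)
{
    assert(dst != NULL && narrow != NULL && wide != NULL);
    return swprintf(dst, cap, L"%s|%ls", narrow, wide);
}

/* Same arguments, narrow output: in a narrow format string, "%s" takes a
 * char string and "%ls" takes a wchar_t string that is converted to
 * multibyte characters. */
int join_narrow(char *dst, size_t cap, const char *narrow, const wchar_t *wide)
{
    assert(dst != NULL && narrow != NULL && wide != NULL);
    return snprintf(dst, cap, "%s|%ls", narrow, wide);
}
```

      Called with "CHAR" and L"WCHAR", both helpers are expected to produce the text CHAR|WCHAR, one as a wide string and one as a char string.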


    1.4 C99 standard

      Section "7.24.2.1 The fwprintf function" of the C99 standard describes the format specifiers of fwprintf and the other wide-character functions. An excerpt:
    7 The length modifiers and their meanings are:

    h
    Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.

    l (ell)
    Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.

    ……

    8 The conversion specifiers and their meanings are:

    c
    If no l length modifier is present, the int argument is converted to a wide character as if by calling btowc and the resulting wide character is written.
    If an l length modifier is present, the wint_t argument is converted to wchar_t and written.

    s
    If no l length modifier is present, the argument shall be a pointer to the initial element of a character array containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted as if by repeated calls to the mbrtowc function, with the conversion state described by an mbstate_t object initialized to zero before the first multibyte character is converted, and written up to (but not including) the terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the converted array, the converted array shall contain a null wide character.
    If an l length modifier is present, the argument shall be a pointer to the initial element of an array of wchar_t type. Wide characters from the array are written up to (but not including) a terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null wide character.


      So in the C99 standard, c and s accept only the "l" length modifier: without "l" the argument is a char string, and with "l" it is a wchar_t string.
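
      The quoted rules for c, lc, and the precision can be tried out with a small sketch. It assumes a library that implements the C99 text above (for example glibc) and ASCII input. The helper names put_two_chars and put_prefixes are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes one char and one wide character.
 * "%c" takes an int that is converted as if by btowc;
 * "%lc" takes a wint_t. */
int put_two_chars(wchar_t *dst, size_t cap, char narrow, wchar_t wide)
{
    assert(dst != NULL);
    return swprintf(dst, cap, L"%c%lc",
                    (int)(unsigned char)narrow, (wint_t)wide);
}

/* Hypothetical helper: the precision limits the number of wide characters
 * written, both for "%s" (char string) and for "%ls" (wchar_t string). */
int put_prefixes(wchar_t *dst, size_t cap, int max_chars,
                 const char *narrow, const wchar_t *wide)
{
    assert(dst != NULL && narrow != NULL && wide != NULL && max_chars >= 0);
    return swprintf(dst, cap, L"%.*s|%.*ls",
                    max_chars, narrow, max_chars, wide);
}
```

      For example, put_prefixes with a precision of 3, "CHAR", and L"WCHAR" should store the wide string CHA|WCH.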


    1.5 Summary

      From the material above, the following table can be compiled:

                VC and BCB              GCC and the C99 standard
                printf      wprintf     printf      wprintf
    s           char        wchar_t     char        char
    S           wchar_t     char        *           *
    hs          char        char        *           *
    ls          wchar_t     wchar_t     wchar_t     wchar_t

    *: undefined.


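    2. Test program

      Having read the documentation above, I think a test program is needed to check how each compiler actually handles the wchar_t format specifiers.
      The code of the test program is as follows: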

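    #include <stdio.h>
    #include <locale.h>
    #include <string.h>
    #include <wchar.h>
    
    const char* psa = "CHAR";    // Single-byte (char) string.
    const wchar_t* psw = L"WCHAR";    // Wide-character string.
    const wchar_t* pst = L"TCHAR";    // String whose type matches printf/wprintf (wide here, since wprintf is used).
    
    int main(void)
    {
        setlocale(LC_ALL, "");    // Use the system's current locale (code page).
        
        // test
        wprintf(L"A:\t%hs\n", psa);    // char string with "hs"
        wprintf(L"W:\t%ls\n", psw);    // wchar_t string with "ls"
        wprintf(L"T:\t%s\n", pst);    // wchar_t string with plain "s": correct only under VC-style rules
        
        return 0;
    }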

      If everything works, the program should print:
    A: CHAR
    W: WCHAR
    T: TCHAR


    3. Test results

    3.1 VC6 and BCB6

      As expected, both VC6 and BCB6 print the correct output:
    A: CHAR
    W: WCHAR
    T: TCHAR


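    3.2 GCC on Fedora

      Tested on Fedora 17 with GCC 4.7.0.

      That the third item is printed incorrectly is easy to understand: both the GCC documentation and the C99 standard specify that "s without l denotes a char string", whereas pst is actually a wchar_t string.
      The correct output of the first item is the more puzzling one, since neither the GCC documentation nor the C99 standard defines an "h" length modifier for s. The explanation is that the documentation says "s without l denotes a char string", and "hs" contains no "l", so the library treats the argument as a char string. Strictly speaking, C99 leaves the behavior undefined when a length modifier is combined with a conversion specifier other than those listed for it, so this result reflects what the library does rather than something the standard guarantees.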
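
      One way to avoid depending on how a library interprets "s" or "hs" in a wide format string is to convert the char string to wchar_t first and then print everything with "ls", which means wchar_t in every column of the table in section 1.5. The following is only a sketch of that idea, not the approach used in the test program. The helper names widen and format_both and the fixed 64-element buffer are assumptions made for the illustration, and the C99 signature of swprintf is assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts a multibyte char string to wchar_t with
 * mbstowcs and always null-terminates dst. Returns the number of wide
 * characters stored, or (size_t)-1 if cap is 0 or the input is invalid
 * in the current locale. */
size_t widen(wchar_t *dst, size_t cap, const char *src)
{
    size_t n;
    assert(dst != NULL && src != NULL);
    if (cap == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, cap - 1);
    if (n == (size_t)-1) {
        dst[0] = L'\0';
        return n;
    }
    dst[n] = L'\0';   /* mbstowcs does not terminate when the buffer is full */
    return n;
}

/* Hypothetical helper: formats both strings using "%ls" only. */
int format_both(wchar_t *dst, size_t cap,
                const char *narrow, const wchar_t *wide)
{
    wchar_t tmp[64];  /* fixed size for this sketch; longer input is truncated */
    assert(dst != NULL && wide != NULL);
    if (widen(tmp, 64, narrow) == (size_t)-1)
        return -1;
    return swprintf(dst, cap, L"A:%ls W:%ls", tmp, wide);
}
```

      The cost is an extra conversion and a temporary buffer, in exchange for a format string that uses only a specifier described by C99.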


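    3.3 GCC in MinGW

      Tested with MinGW (20120426), GCC 4.6.2.

      MinGW also uses the GCC compiler, but for compatibility with the Windows environment its format specifier rules are the same as VC's. By default MinGW programs call the printf family of Microsoft's C runtime, which is why the behavior matches.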


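    4. Conclusion

      Based on the test results above, the earlier table is revised as follows:

                VC, BCB, MinGW          GCC on Linux, C99 standard
                printf      wprintf     printf      wprintf
    s           char        wchar_t     char        char
    S           wchar_t     char        *           *
    hs          char        char        char        char
    ls          wchar_t     wchar_t     wchar_t     wchar_t

    *: undefined.

      In summary:
    1) To output a char string, use "hs".
    2) To output a wchar_t string, use "ls".
    3) To output a TCHAR string, use "s". This works only with compilers for the Windows platform, such as VC, BCB, and MinGW.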
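
      Because point 3 works only on Windows, code that switches between char and wchar_t builds on other platforms needs the string specifier to switch as well. The sketch below mimics the TCHAR idea with a format macro, in the spirit of the PRI macros in <inttypes.h>. Every name in it (USE_WIDE, xchar, XT, XFMT_S, xsnprintf, format_name) is hypothetical and not part of any library. It also assumes the C99 signature of swprintf, which some older Windows runtimes do not provide.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* All names here are hypothetical and only imitate the idea of TCHAR. */
#ifdef USE_WIDE
typedef wchar_t xchar;
#define XT(s)      L##s
#define XFMT_S     L"%ls"    /* "ls": wchar_t in every column of the table */
#define xsnprintf  swprintf
#else
typedef char xchar;
#define XT(s)      s
#define XFMT_S     "%s"      /* plain "s" in printf: char in every column */
#define xsnprintf  snprintf
#endif

/* Formats "T:" followed by a string of the build's character type.
 * Adjacent string literals are concatenated, so the format becomes
 * "T:%s" in a char build and L"T:%ls" in a wide build. */
int format_name(xchar *dst, size_t cap, const xchar *name)
{
    assert(dst != NULL && name != NULL);
    return xsnprintf(dst, cap, XT("T:") XFMT_S, name);
}
```

      Compiling with or without USE_WIDE defined selects the character type, the function, and the matching specifier together, so the format string never relies on the Windows-only meaning of "s" in wprintf.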

    References:
    "ISO/IEC 9899:1999 (C99)". ISO/IEC, 1999. www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
    "C99标准" (The C99 Standard). yourtommy. http://blog.csdn.net/yourtommy/article/details/7495033
    "[VS2012] Format Specification Fields: printf and wprintf Functions". http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx
    "[VS2012] printf Type Field Characters". http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx
    "[VS2012] Size Specification". http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=vs.110).aspx
    "wprintf(3) - Linux manual page". http://www.kernel.org/doc/man-pages/online/pages/man3/wprintf.3.html

    Source code download:
    https://files.cnblogs.com/zyl910/wcharfmt.rar

    Author: zyl910
    Copyright notice: free to repost, non-commercial, no derivatives, attribution required | Creative Commons BY-NC-ND 3.0.
    Original article: https://www.cnblogs.com/zyl910/p/wcharfmt.html