• QStringLiteral


    QStringLiteral explained

    QStringLiteral is a new macro introduced in Qt 5 to create a QString from a string literal. (String literals are the strings inside "" in the source code.) In this blog post, I explain its inner workings and implementation.

    Summary

    Let me start by giving a guideline on when to use it: If you want to initialize a QString from a string literal in Qt5, you should use:

    • Most of the cases: QStringLiteral("foo") if it will actually be converted to QString
    • QLatin1String("foo") if it is used with a function that has an overload for QLatin1String (such as operator==, operator+, startsWith, replace, ...)

    I have put this summary at the beginning for the ones that don't want to read the technical details that follow.

    Read on to understand how QStringLiteral works.

    Reminder on how QString works

    QString, like many classes in Qt, is an implicitly shared class. Its only member is a pointer to the 'private' data. The QStringData is allocated with malloc, and enough room is allocated after it to put the actual string data in the same memory block.

    // Simplified for the purpose of this blog
    struct QStringData {
      QtPrivate::RefCount ref; // wrapper around a QAtomicInt
      int size; // size of the string
      uint alloc : 31; // amount of memory reserved after this string data
      uint capacityReserved : 1; // internal detail used for reserve()
    
      qptrdiff offset; // offset to the data (usually sizeof(QStringData))
    
      inline ushort *data()
      { return reinterpret_cast<ushort *>(reinterpret_cast<char *>(this) + offset); }
    };
    
    // ...
    
    class QString {
      QStringData *d;
    public:
      // ... public API ...
    };

    The offset is a pointer to the data relative to the QStringData. In Qt4, it used to be an actual pointer. We'll see why it has been changed.

    The actual data in the string is stored in UTF-16, which uses 2 bytes per code unit (most characters fit in one code unit; characters outside the Basic Multilingual Plane need two).
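
    To make the layout concrete, here is a minimal sketch in plain C++ (no Qt). MiniStringData and allocateMini are hypothetical names invented for this illustration, and the reference count is a plain int rather than an atomic. It only shows the idea of one malloc block holding the header followed by the UTF-16 data, reached through an offset.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for QStringData: a header followed by UTF-16 code units.
struct MiniStringData {
    int ref;                // reference count (not atomic in this sketch)
    int size;               // number of UTF-16 code units, excluding the terminator
    std::ptrdiff_t offset;  // byte offset from the header to the first code unit

    char16_t *data() {
        return reinterpret_cast<char16_t *>(reinterpret_cast<char *>(this) + offset);
    }
};

// Allocate the header and the characters in a single malloc block.
// The caller releases the block with std::free().
inline MiniStringData *allocateMini(const char16_t *src, int size) {
    const std::size_t bytes = sizeof(MiniStringData) + (size + 1) * sizeof(char16_t);
    void *block = std::malloc(bytes);
    if (!block)
        throw std::bad_alloc();
    MiniStringData *d = new (block) MiniStringData{
        1, size, static_cast<std::ptrdiff_t>(sizeof(MiniStringData))};
    std::memcpy(d->data(), src, size * sizeof(char16_t));
    d->data()[size] = u'\0';  // keep the data null-terminated
    return d;
}
```

    Because data() is computed from the address of the header itself, the header never has to store an absolute address.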

    Literals and Conversion

    String literals are the strings that appear directly in the source code, between quotes.
    Here are some examples. (Suppose action, string, and filename are QStrings.)

        o->setObjectName("MyObject");
        if (action == "rename")
            string.replace("%FileName%", filename);

    In the first line, we call the function QObject::setObjectName(const QString&). There is an implicit conversion from const char* to QString, via its constructor. A new QStringData is allocated with enough room to hold "MyObject", and then the string is copied and converted from UTF-8 to UTF-16.

    The same happens in the last line where the function QString::replace(const QString &, const QString &) is called. A new QStringData is allocated for "%FileName%".

    Is there a way to prevent the allocation of QStringData and copy of the string?

    Yes, one solution to avoid the costly creation of a temporary QString object is to have overloads of common functions that take a const char* parameter.
    So QString has such overloads for operator==, which accept a const char* on either side of the comparison.


    The overloads do not need to create a new QString object for our literal and can operate directly on the raw char*.
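
    The idea can be sketched with a toy class. ToyString below is a hypothetical type, not QString, and the sketch assumes the literal contains only 7-bit ASCII (the real overloads have to deal with the encoding of the char* data). The point is only that the comparison walks the raw char* and never builds a temporary string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy string holding UTF-16 code units; not Qt's QString.
struct ToyString {
    std::u16string units;
};

// Compare against a raw literal without building a temporary ToyString.
// Assumption for this sketch: the literal contains only 7-bit ASCII.
inline bool operator==(const ToyString &lhs, const char *rhs) {
    std::size_t i = 0;
    for (; i < lhs.units.size(); ++i) {
        if (rhs[i] == '\0')
            return false;  // the literal is shorter than the string
        if (lhs.units[i] != static_cast<char16_t>(static_cast<unsigned char>(rhs[i])))
            return false;
    }
    return rhs[i] == '\0';  // equal only if the literal ends here too
}

// Same comparison with the operands swapped.
inline bool operator==(const char *lhs, const ToyString &rhs) {
    return rhs == lhs;
}
```

    With these two overloads, an expression such as action == "rename" compiles to a simple loop with no heap allocation.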

    Encoding and QLatin1String

    In Qt5, we changed the default decoding for the char* strings to UTF-8. But many algorithms are much slower with UTF-8 than with plain ASCII or Latin-1.

    Hence you can use QLatin1String, which is just a thin wrapper around char * that specifies the encoding. There are overloads taking QLatin1String for functions that can operate on the raw Latin-1 data directly without conversion.

    So our first example now looks like:

        o->setObjectName(QLatin1String("MyObject"));
        if (action == QLatin1String("rename"))
            string.replace(QLatin1String("%FileName%"), filename);

    The good news is that QString::replace and operator== have overloads for QLatin1String. So that is much faster now.

    In the call to setObjectName, we avoided the conversion from UTF-8, but we still have an (implicit) conversion from QLatin1String to QString which has to allocate the QStringData on the heap.
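
    Why can Latin-1 be handled without conversion? Each Latin-1 byte has the same numeric value as the corresponding Unicode code point, so it can be widened to a UTF-16 code unit on the fly. The following sketch shows this with a hypothetical Latin1View wrapper and equalsLatin1 function; neither is Qt API, they only illustrate the principle.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical thin wrapper: a pointer plus a length, tagged as Latin-1.
struct Latin1View {
    const char *chars;
    std::size_t length;
    explicit Latin1View(const char *s) : chars(s), length(std::strlen(s)) {}
};

// Latin-1 bytes map one-to-one onto the first 256 Unicode code points,
// so each byte can be widened and compared without any decoding step.
inline bool equalsLatin1(const std::u16string &utf16, Latin1View latin1) {
    if (utf16.size() != latin1.length)
        return false;
    for (std::size_t i = 0; i < latin1.length; ++i) {
        const char16_t widened = static_cast<unsigned char>(latin1.chars[i]);
        if (utf16[i] != widened)
            return false;
    }
    return true;
}
```

    The wrapper itself costs nothing at run time beyond a strlen. The allocation discussed above only appears when a full string object has to be built from it.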

    Introducing QStringLiteral

    Is it possible to avoid the allocation and copy of the string literal even for the cases like setObjectName? Yes, that is what QStringLiteral is doing.

    This macro will try to generate the QStringData at compile time with all the fields initialized. It will even be located in the .rodata section, so it can be shared between processes.

    We need two language features to do that:

    1. The possibility to generate UTF-16 at compile time:
      On Windows we can use the wide char L"String". On Unix we are using the new C++11 Unicode literal: u"String". (Supported by GCC 4.4 and clang.)
    2. The ability to create static data from expressions.
      We want to be able to put QStringLiteral everywhere in the code. One way to do that is to put a static QStringData inside a C++11 lambda expression. (Supported by MSVC 2010 and GCC 4.5.) We also made use of the GCC __extension__ ({ }).
      Update: The support for the GCC extension was removed before the beta because it does not work in every context where lambdas work, such as in default function arguments.
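
    Both features can be tried in isolation with a C++11 compiler. In this small sketch, staticLiteral is a hypothetical helper name. It shows that the size of a u"" literal is known at compile time, and that a lambda, although it is an expression, can hold static data.

```cpp
// Feature 1: a u"" literal is an array of char16_t, so its size is a
// compile-time constant (8 code units, not counting the terminator).
static_assert(sizeof(u"MyObject") / sizeof(char16_t) - 1 == 8,
              "eight code units plus one terminator");

// Feature 2: a lambda is an expression, yet its body may declare static data.
// Hypothetical helper: returns the address of storage that lives for the
// whole lifetime of the program.
inline const char16_t *staticLiteral() {
    return []() -> const char16_t * {
        static const char16_t text[] = u"MyObject";
        return text;
    }();
}
```

    Every call returns the same address, because the static array exists once per lambda, not once per call.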

    Implementation

    We will need a POD structure that contains both the QStringData and the actual string. Its structure will depend on the method we use to generate UTF-16.

    The code below was extracted from qstring.h, with added comments and edited for readability.

    /* We define QT_UNICODE_LITERAL_II and declare the qunicodechar
       depending on the compiler */
    #if defined(Q_COMPILER_UNICODE_STRINGS)
       // C++11 unicode string
       #define QT_UNICODE_LITERAL_II(str) u"" str
       typedef char16_t qunicodechar;
    #elif __SIZEOF_WCHAR_T__ == 2
       // wchar_t is 2 bytes  (condition a bit simplified)
       #define QT_UNICODE_LITERAL_II(str) L##str
       typedef wchar_t qunicodechar;
    #else
       typedef ushort qunicodechar; // fallback
    #endif
    
    // The structure that will contain the string.
    // N is the string size
    template <int N>
    struct QStaticStringData
    {
        QStringData str;
        qunicodechar data[N + 1];
    };
    
    // Helper class wrapping a pointer that we can pass to the QString constructor
    struct QStringDataPtr
    { QStringData *ptr; };
    
    #if defined(QT_UNICODE_LITERAL_II)
    // QT_UNICODE_LITERAL needed because of macro expansion rules
    # define QT_UNICODE_LITERAL(str) QT_UNICODE_LITERAL_II(str)
    # if defined(Q_COMPILER_LAMBDA)
    
    #  define QStringLiteral(str) \
        ([]() -> QString { \
            enum { Size = sizeof(QT_UNICODE_LITERAL(str))/2 - 1 }; \
            static const QStaticStringData<Size> qstring_literal = { \
                Q_STATIC_STRING_DATA_HEADER_INITIALIZER(Size), \
                QT_UNICODE_LITERAL(str) }; \
            QStringDataPtr holder = { &qstring_literal.str }; \
            const QString s(holder); \
            return s; \
        }())
    
    # elif defined(Q_CC_GNU)
    // Use the GCC __extension__ ({ }) trick instead of a lambda
    // ... <skipped> ...
    # endif
    #endif
    
    #ifndef QStringLiteral
    // no lambdas, not GCC, or GCC in C++98 mode with 4-byte wchar_t
    // fallback, return a temporary QString
    // source code is assumed to be encoded in UTF-8
    # define QStringLiteral(str) QString::fromUtf8(str, sizeof(str) - 1)
    #endif

    Let us simplify this macro a bit and look at how it would expand:

    o->setObjectName(QStringLiteral("MyObject"));
    // would expand to: 
    o->setObjectName(([]() {
            // We are in a lambda expression that returns a QString
    
            // Compute the size using sizeof, (minus the null terminator)
            enum { Size = sizeof(u"MyObject")/2 - 1 };
    
            // Initialize. (This is static data initialized at compile time.)
            static const QStaticStringData<Size> qstring_literal =
            { { /* ref = */ -1, 
                /* size = */ Size, 
                /* alloc = */ 0, 
                /* capacityReserved = */ 0, 
                /* offset = */ sizeof(QStringData) },
              u"MyObject" };
    
             QStringDataPtr holder = { &qstring_literal.str };
             QString s(holder); // call the QString(QStringDataPtr&) constructor
             return s;
        }()) // Call the lambda
      );

    The reference count is initialized to -1. A negative value is never incremented or decremented, because the data lives in read-only memory.
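
    The -1 convention can be sketched as follows. ToyRefCount is a hypothetical, non-atomic counter written for this illustration; it is not the actual QtPrivate::RefCount code. It only shows that a counter holding -1 is treated as static and is never written to.

```cpp
// Hypothetical reference counter following the convention described above:
// -1 marks static data that must never be modified or freed.
struct ToyRefCount {
    int value;

    // Take a reference. Static data is shared without touching the counter.
    void ref() {
        if (value == -1)
            return;
        ++value;
    }

    // Drop a reference. Returns false when the last reference is gone
    // and the owner should free the data.
    bool deref() {
        if (value == -1)
            return true;  // static data: never freed
        return --value != 0;
    }
};
```

    Since both functions return before any write when the value is -1, copying and destroying a string that points at static data never stores to that memory.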

    One can see why it is so important to have an offset (qptrdiff) rather than a pointer to the string (ushort*) as it was in Qt4. It is indeed impossible to put a pointer in the read-only section because pointers might need to be relocated at load time. That means that each time an application or library is loaded, the OS needs to rewrite all the pointer addresses using the relocation table.

    Results

    For fun, we can look at the assembly generated for a very simple call to QStringLiteral. We can see that there is almost no code, and how the data is laid out in the .rodata section.

    We notice the overhead in the binary. The string takes twice as much memory since it is encoded in UTF-16, and there is also a header of sizeof(QStringData) = 24. This memory overhead is the reason why it still makes sense to use QLatin1String when the function you are calling has an overload for it.

    QString returnAString() {
        return QStringLiteral("Hello");
    }

    Compiled with g++ -O2 -S -std=c++0x (GCC 4.7) on x86_64

        .text
        .globl  _Z13returnAStringv
        .type   _Z13returnAStringv, @function
    _Z13returnAStringv:
        ; load the address of the QStringData into %rdx
        leaq    _ZZZ13returnAStringvENKUlvE_clEvE15qstring_literal(%rip), %rdx
        movq    %rdi, %rax
        ; copy the QStringData from %rdx to the QString return object
        ; allocated by the caller.  (the QString constructor has been inlined)
        movq    %rdx, (%rdi)
        ret
        .size   _Z13returnAStringv, .-_Z13returnAStringv
        .section    .rodata
        .align 32
        .type   _ZZZ13returnAStringvENKUlvE_clEvE15qstring_literal, @object
        .size   _ZZZ13returnAStringvENKUlvE_clEvE15qstring_literal, 40
    _ZZZ13returnAStringvENKUlvE_clEvE15qstring_literal:
        .long   -1   ; ref
        .long   5    ; size
        .long   0    ; alloc + capacityReserved 
        .zero   4    ; padding
        .quad   24   ; offset
        .string "H"  ; the data. Each .string adds a terminating '\0'
        .string "e"
        .string "l"
        .string "l"
        .string "o"
        .string ""
        .string ""
        .zero   4
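
    The sizes in the listing can be checked with a little arithmetic. The sketch below rebuilds the simplified structures with plain types; HeaderSketch, StaticStringSketch and unpaddedBytes are hypothetical names. The concrete numbers depend on the platform ABI: with GCC on x86_64, as in the listing above, the header is 24 bytes, "Hello" needs 24 + 2 * (5 + 1) = 36 bytes, and alignment padding rounds the object up to the 40 bytes shown in the .size directive.

```cpp
#include <cstddef>

// Same fields as the simplified QStringData above, with plain types.
struct HeaderSketch {
    int ref;
    int size;
    unsigned alloc : 31;
    unsigned capacityReserved : 1;
    std::ptrdiff_t offset;
};

// Header followed by N code units plus the terminator.
template <int N>
struct StaticStringSketch {
    HeaderSketch str;
    char16_t data[N + 1];
};

// Bytes needed before alignment padding: header plus (n + 1) UTF-16 code units.
constexpr std::size_t unpaddedBytes(int n) {
    return sizeof(HeaderSketch) + (n + 1) * sizeof(char16_t);
}
```

    By comparison, the same text as a Latin-1 literal takes 6 bytes, which is why the header and the doubled character size matter for short strings.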
    

    Conclusion

    I hope that now that you have read this you will have a better understanding of where to use and where not to use QStringLiteral.
    There is another macro, QByteArrayLiteral, which works on exactly the same principle but creates a QByteArray.

    Update: See also the internals of QMutex and more C++11 features in Qt5.

  • Original article: https://www.cnblogs.com/liujx2019/p/13891155.html