• String source code (JDK 1.8)


    1. A String stores its value in a char array.

        /** The value is used for character storage. */
        private final char value[];
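One visible consequence of the private final array is that callers only ever receive copies of it. The sketch below is illustrative only; the class and method names are hypothetical and not part of the JDK.

```java
final class CharStorageDemo {
    /**
     * Edits the array returned by toCharArray() and then reads the
     * first char of the original String again (hypothetical helper).
     */
    static char firstCharAfterCopyEdit(String s) {
        char[] copy = s.toCharArray(); // a fresh copy of the internal array
        copy[0] = '#';                 // changes the copy only
        return s.charAt(0);            // the String itself is unchanged
    }

    public static void main(String[] args) {
        System.out.println(firstCharAfterCopyEdit("abc")); // prints a
    }
}
```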

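    2. The constructor below takes ints as input; each int is the Unicode code point of one character (usually written in hexadecimal). A single char holds at most 65535 (0xFFFF), so a code point above that does not fit in one char. The valid code point range is given by these constants in Character: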

    public static final int MIN_CODE_POINT = 0x000000;

    public static final int MAX_CODE_POINT = 0X10FFFF;

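    The basic unit of UTF-16 is a two-byte code unit with the range 0x0000 - 0xFFFF, while the code points UTF-16 can represent range from U+0000 to U+10FFFF.

    When a rare character has a code point above 0xFFFF, it must be represented with two code units (4 bytes). The mapping rule is as follows:

    First code unit (leading, or high, surrogate) range: 0xD800 - 0xDBFF

    Second code unit (trailing, or low, surrogate) range: 0xDC00 - 0xDFFF

    Then: (0xDBFF-0xD800+1)*(0xDFFF-0xDC00+1) = 1024 * 1024 = 0x100000 = (0x10FFFF-0xFFFF), so the surrogate pairs map one-to-one (a bijection) onto the code points U+10000 to U+10FFFF.

    Therefore a code unit in the range 0xD800 - 0xDBFF cannot represent a character by itself; it must be combined with a trailing surrogate to form a complete character.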
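The mapping rule above can be worked through by hand. The following is a minimal sketch, assuming the standard UTF-16 arithmetic (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves). The class and method names are hypothetical; the result is cross-checked against the JDK's own Character.highSurrogate, Character.lowSurrogate and Character.toCodePoint.

```java
final class SurrogateDemo {
    /** Splits a supplementary code point (U+10000..U+10FFFF) into a surrogate pair. */
    static char[] toSurrogatePair(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        int offset = codePoint - 0x10000;               // 20-bit value, 0..0xFFFFF
        char high = (char) (0xD800 + (offset >>> 10));  // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char[] pair = toSurrogatePair(cp);
        // prints U+1F600 -> 0xD83D 0xDE00
        System.out.printf("U+%X -> 0x%X 0x%X%n", cp, (int) pair[0], (int) pair[1]);
        // Cross-check against the JDK helpers; each line prints true.
        System.out.println(pair[0] == Character.highSurrogate(cp));
        System.out.println(pair[1] == Character.lowSurrogate(cp));
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == cp);
    }
}
```

The two boundary cases, U+10000 and U+10FFFF, land on 0xD800 0xDC00 and 0xDBFF 0xDFFF respectively, which matches the two surrogate ranges listed above.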

     Reference: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

        public String(int[] codePoints, int offset, int count) {
            if (offset < 0) {
                throw new StringIndexOutOfBoundsException(offset);
            }
            if (count <= 0) {
                if (count < 0) {
                    throw new StringIndexOutOfBoundsException(count);
                }
                if (offset <= codePoints.length) {
                    this.value = "".value;
                    return;
                }
            }
            // Note: offset or count might be near -1>>>1.
            if (offset > codePoints.length - count) {
                throw new StringIndexOutOfBoundsException(offset + count);
            }

            final int end = offset + count;

            // Pass 1: Compute precise size of char[]
            int n = count;
            for (int i = offset; i < end; i++) {
                int c = codePoints[i];
                if (Character.isBmpCodePoint(c))
                    continue;
                else if (Character.isValidCodePoint(c))
                    n++;
                else throw new IllegalArgumentException(Integer.toString(c));
            }

            // Pass 2: Allocate and fill in char[]
            final char[] v = new char[n];

            for (int i = offset, j = 0; i < end; i++, j++) {
                int c = codePoints[i];
                if (Character.isBmpCodePoint(c))
                    v[j] = (char)c;
                else
                    Character.toSurrogates(c, v, j++);
            }

            this.value = v;
        }
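    Character.isBmpCodePoint(c) checks whether the code point fits in a single code unit, that is, whether it lies in the Basic Multilingual Plane (at most 0xFFFF).
    Character.isValidCodePoint(c) checks that the code point is within the valid range. When this branch is reached, n++ runs because this int needs two chars.
    Character.toSurrogates(c, v, j++) splits the int into two chars, written to v[j] and v[j+1]; the extra j++ skips the second slot.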
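To see the two passes from the caller's side, here is a short sketch that goes through the public constructor. The class name and the fromCodePoints helper are hypothetical, introduced only for illustration.

```java
final class CodePointCtorDemo {
    /** Builds a String with the String(int[], int, int) constructor. */
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // 0x41 ('A') is a BMP code point: one char.
        // 0x1F600 is a supplementary code point: two chars.
        String s = fromCodePoints(0x41, 0x1F600);
        System.out.println(s.length());                          // prints 3
        System.out.println(Integer.toHexString(s.charAt(1)));    // prints d83d
        System.out.println(Integer.toHexString(s.charAt(2)));    // prints de00

        // A value above MAX_CODE_POINT fails the isValidCodePoint check in pass 1.
        try {
            fromCodePoints(0x110000);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());                  // prints 1114112
        }
    }
}
```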

    3. length() returns the number of char code units, not the number of characters; some characters occupy two chars.

        public int length() {
            return value.length;
        }
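The difference shows up as soon as a string contains a surrogate pair. A small sketch follows, with hypothetical class and method names; it uses String.codePointCount, which counts code points. Note that a code point is still not always what a reader perceives as one character (combining marks, for example), so treat this as an approximation of "number of characters".

```java
final class LengthDemo {
    /** Number of UTF-16 code units, i.e. what length() reports. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the whole string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600 as a surrogate pair
        System.out.println(codeUnits(s));  // prints 3
        System.out.println(codePoints(s)); // prints 2
    }
}
```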

    4. String.join saves you from concatenating with a StringBuilder by hand and then removing the last delimiter.

        public static String join(CharSequence delimiter, CharSequence... elements)
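For comparison, here is a rough sketch of the manual approach next to String.join. The joinByHand helper and the class name are hypothetical and only illustrate the boilerplate that String.join replaces.

```java
import java.util.Arrays;
import java.util.List;

final class JoinDemo {
    /** Manual join: append every element plus delimiter, then trim the last delimiter. */
    static String joinByHand(String delimiter, List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(delimiter);
        }
        if (!parts.isEmpty()) {
            sb.setLength(sb.length() - delimiter.length()); // drop trailing delimiter
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(joinByHand(",", parts));      // prints a,b,c
        System.out.println(String.join(",", parts));     // prints a,b,c
        System.out.println(String.join("-", "x", "y"));  // prints x-y
    }
}
```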

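    5. The native keyword marks a method whose implementation is written in another language (typically C or C++, called through JNI).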

        public native String intern();

    In-depth analysis of String#intern:

    https://tech.meituan.com/in_depth_understanding_string_intern.html
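As a quick illustration of what intern() does from the caller's point of view, the sketch below compares references. The class and method names are hypothetical. It relies only on two documented behaviors: string literals are interned, and new String(...) always creates a distinct object.

```java
final class InternDemo {
    /** Reference comparison, deliberately using == instead of equals(). */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";          // literals come from the string pool
        String heap = new String("hello"); // a separate object with equal content

        System.out.println(heap.equals(literal));                 // prints true
        System.out.println(sameReference(heap, literal));          // prints false
        System.out.println(sameReference(heap.intern(), literal)); // prints true
    }
}
```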





     
  • Original article: https://www.cnblogs.com/xuemanjiangnan/p/7404533.html