• Java字符串拼接


    字符串拼接


    String

    在Java中,String是一个不可变类,所以String对象一旦在堆中被创建出来就不能修改。

    package java.lang;
    //import ...
    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
        /** The value is used for character storage. */
        private final char value[];
    }
    

    Java字符串其实是基于字符数组实现的,该数组被关键字final标注,一经赋值就不可修改。

    既然字符串是不可变的,那么字符串拼接又是怎么回事呢?

    字符串不变性与字符串拼接

    其实所谓的字符串拼接,都是重新生成了一个新的字符串(JDK7开始,substring() 操作也是重新生成一个新的字符串)。下面一段字符串拼接代码:

    String s = "hello ";
    s = s.concat("world!");
    

    其实生成了一个新字符串,s最终保存的是一个新字符串的引用,如下图所示:


    Java字符串拼接方式

    + 语法糖

    在Java中,拼接字符串最简单的方式就是直接使用符号+来拼接,如:

    public class Main2 {
        public static void main(String[] args) {
            String s1 = "hello " + "world " + "!";
            String s2 = "xzy ";
            String s3 = s2 + s1;
        }
    
        private void concat(String s1) {
            String s2 = "xzy" + s1;
        }
    }
    

    这里要特别说明一点,有人把Java中使用+拼接字符串的功能理解为运算符重载。其实并不是,Java是不支持运算符重载的,这其实只是Java提供的一个语法糖

    编译,反编译上面的代码:

    public class Main2 {
        public Main2() {
        }
    
        public static void main(String[] var0) {
            String var1 = "hello world !";
            String var2 = "xzy ";
            (new StringBuilder()).append(var2).append(var1).toString();
        }
    
        private void concat(String var1) {
            (new StringBuilder()).append("xzy").append(var1).toString();
        }
    }
    

    通过查看反编译后的代码,我们发现,使用 + 进行字符串拼接,最终是通过StringBuilder,创建一个新的String对象。

    concat

    除了使用+拼接字符串之外,还可以使用String类中的方法concat方法来拼接字符串,如:

        public static void main(String[] args) {
            String s1 = "hello " + "world " + "!";
            String s2 = "xzy ";
            String s3 = s2.concat(s1);
        }
    

    concat方法的源码如下:

    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
        
        /** The value is used for character storage. */
        private final char value[];
        
       /**
         * Concatenates the specified string to the end of this string.
         * <p>
         * If the length of the argument string is {@code 0}, then this
         * {@code String} object is returned. Otherwise, a
         * {@code String} object is returned that represents a character
         * sequence that is the concatenation of the character sequence
         * represented by this {@code String} object and the character
         * sequence represented by the argument string.<p>
         * Examples:
         * <blockquote><pre>
         * "cares".concat("s") returns "caress"
         * "to".concat("get").concat("her") returns "together"
         * </pre></blockquote>
         *
         * @param   str   the {@code String} that is concatenated to the end
         *                of this {@code String}.
         * @return  a string that represents the concatenation of this object's
         *          characters followed by the string argument's characters.
         */
        public String concat(String str) {
            int otherLen = str.length();
            if (otherLen == 0) {
                return this;
            }
            int len = value.length;
            char buf[] = Arrays.copyOf(value, len + otherLen);
            str.getChars(buf, len);
            return new String(buf, true);
        }
        
       /**
         * Copy characters from this string into dst starting at dstBegin.
         * This method doesn't perform any range checking.
         */
        void getChars(char dst[], int dstBegin) {
            System.arraycopy(value, 0, dst, dstBegin, value.length);
        }
    }
    

    Arrays.copyOf()方法源码:

    //创建一个长度为newLength的字符数组,然后将original字符数组中的字符拷贝过去。
    public static char[] copyOf(char[] original, int newLength) {
        char[] copy = new char[newLength];
        System.arraycopy(original, 0, copy, 0,
                         Math.min(original.length, newLength));
        return copy;
    }
    

    从上面的源码看出,使用a.concat(b)拼接字符串a b,创建了一个长度为a.length + b.length的字符数组,a和b先后被拷贝进字符数组,最后使用这个字符数组创建了一个新的String对象。

    StringBuffer 和StringBuilder

    关于字符串,Java中除了定义了一个可以用来定义字符串常量String类以外,还提供了可以用来定义字符串变量StringBuffer类、StringBuilder,它的对象是可以扩充和修改的,如:

    public static void main(String[] args) {
        StringBuffer stringBuffer = new StringBuffer();
        String s1 = "hello " + "world " + "!";
        String s2 = "xzy ";
        String s3;
        stringBuffer.append(s1).append(s2);
        s3 = stringBuffer.toString();
    }
    
    public static void main(String[] args) {
        StringBuilder stringBuilder = new StringBuilder();
        String s1 = "hello " + "world " + "!";
        String s2 = "xzy ";
        String s3;
        stringBuilder.append(s1).append(s2);
        s3 = stringBuilder.toString();
    }
    

    接下来看看StringBuffer和StringBuilder的实现原理。

    StringBuffer和StringBuilder都继承自AbstractStringBuilder,下面是AbstractStringBuilder的部分源码:

    abstract class AbstractStringBuilder implements Appendable, CharSequence {
        /**
         * The value is used for character storage.
         */
        char[] value;
    
        /**
         * The count is the number of characters used.
         */
        int count;
    }
    

    与String类似,AbstractStringBuilder也封装了一个字符数组,不同的是,这个字符数组没有使用final关键字修改,也就是所,这个字符数组是可以修改的。还要一个差异就是,这个字符数组不一定所有位置都要被占满,AbstractStringBuilder中有一个count变量同来记录字符数组中存在的字符个数。

    试着看看StringBuffer、StringBuilder、AbstractStringBuilder中append方法的源码:

     public final class StringBuffer 
         extends AbstractStringBuilder 
         implements java.io.Serializable, CharSequence{
        
       /**
         * A cache of the last value returned by toString. Cleared
         * whenever the StringBuffer is modified.
         */
        private transient char[] toStringCache;
         
        @Override
        public synchronized StringBuffer append(String str) {
            toStringCache = null;
            super.append(str);
            return this;
        }
     }
    
    public final class StringBuilder
        extends AbstractStringBuilder
        implements java.io.Serializable, CharSequence{
            
        @Override
        public StringBuilder append(String str) {
            super.append(str);
            return this;
        }
    }
    
    abstract class AbstractStringBuilder implements Appendable, CharSequence {
        /**
         * The value is used for character storage.
         */
        char[] value;
    
        /**
         * The count is the number of characters used.
         */
        int count;
            
       /**
         * Appends the specified string to this character sequence.
         * <p>
         * The characters of the {@code String} argument are appended, in
         * order, increasing the length of this sequence by the length of the
         * argument. If {@code str} is {@code null}, then the four
         * characters {@code "null"} are appended.
         * <p>
         * Let <i>n</i> be the length of this character sequence just prior to
         * execution of the {@code append} method. Then the character at
         * index <i>k</i> in the new character sequence is equal to the character
         * at index <i>k</i> in the old character sequence, if <i>k</i> is less
         * than <i>n</i>; otherwise, it is equal to the character at index
         * <i>k-n</i> in the argument {@code str}.
         *
         * @param   str   a string.
         * @return  a reference to this object.
         */
        public AbstractStringBuilder append(String str) {
            if (str == null)
                return appendNull();
            int len = str.length();
            ensureCapacityInternal(count + len);
            //拷贝字符到内部的字符数组中,如果字符数组长度不够,进行扩展。
            str.getChars(0, len, value, count);
            count += len;
            return this;
        }
    }
    

    可以观察到一个比较明显的差异,StringBuffer类的append方法使用synchronized关键字修饰,说明StringBuffer的append方法是线程安全的,为了实现线程安全,StringBuffer牺牲了部分性能。


    效率比较

    既然有这么多种字符串拼接的方法,那么到底哪一种效率最高呢?我们来简单对比一下。

    long t1 = System.currentTimeMillis();
    //这里是初始字符串定义
    for (int i = 0; i < 50000; i++) {
        //这里是字符串拼接代码
    }
    long t2 = System.currentTimeMillis();
    System.out.println("cost:" + (t2 - t1));
    
    public class Main2 {
        public static void main(String[] args) {
            test1();
            test2();
            test3();
            test4();
        }
    
        public static void test1() {
            long t1 = System.currentTimeMillis();
            String str = "";
            for (int i = 0; i < 50000; i++) {
                String s = String.valueOf(i);
                str += s;
            }
            long t2 = System.currentTimeMillis();
            System.out.println("+ cost:" + (t2 - t1));
        }
    
        public static void test2() {
            long t1 = System.currentTimeMillis();
            String str = "";
            for (int i = 0; i < 50000; i++) {
                String s = String.valueOf(i);
                str = str.concat("hello");
            }
            long t2 = System.currentTimeMillis();
            System.out.println("concat cost:" + (t2 - t1));
        }
    
        public static void test3() {
            long t1 = System.currentTimeMillis();
            String str;
            StringBuffer stringBuffer = new StringBuffer();
            for (int i = 0; i < 50000; i++) {
                String s = String.valueOf(i);
                stringBuffer.append(s);
            }
            str = stringBuffer.toString();
            long t2 = System.currentTimeMillis();
            System.out.println("stringBuffer cost:" + (t2 - t1));
        }
    
        public static void test4() {
            long t1 = System.currentTimeMillis();
            String str;
            StringBuilder stringBuilder = new StringBuilder();
            for (int i = 0; i < 50000; i++) {
                String s = String.valueOf(i);
                stringBuilder.append(s);
            }
            str = stringBuilder.toString();
            long t2 = System.currentTimeMillis();
            System.out.println("stringBuilder cost:" + (t2 - t1));
        }
    }
    

    我们使用形如以上形式的代码,分别测试下五种字符串拼接代码的运行时间。得到结果如下:

    + cost:7135
    concat cost:1759
    stringBuffer cost:5
    stringBuilder cost:5
    

    从结果可以看出,用时从短到长的对比是:

    StringBuilder < StringBuffer < concat < +

    那么问题来了,前面我们分析过,其实使用+拼接字符串的实现原理也是使用的StringBuilder,那为什么结果相差这么多,高达1000多倍呢?

    反编译上面的代码:

    /*
     * Decompiled with CFR 0.149.
     */
    package com.learn.java;
    
    public class Main2 {
        public static void main(String[] arrstring) {
            Main2.test1();
            Main2.test2();
            Main2.test3();
            Main2.test4();
        }
    
        public static void test1() {
            long l = System.currentTimeMillis();
            String string = "";
            for (int i = 0; i < 50000; ++i) {
                String string2 = String.valueOf(i);
                string = new StringBuilder().append(string).append(string2).toString();
            }
            long l2 = System.currentTimeMillis();
            System.out.println(new StringBuilder().append("+ cost:").append(l2 - l).toString());
        }
    
        public static void test2() {
            long l = System.currentTimeMillis();
            String string = "";
            for (int i = 0; i < 50000; ++i) {
                String string2 = String.valueOf(i);
                string = string.concat("hello");
            }
            long l2 = System.currentTimeMillis();
            System.out.println(new StringBuilder().append("concat cost:").append(l2 - l).toString());
        }
    
        public static void test3() {
            long l = System.currentTimeMillis();
            StringBuffer stringBuffer = new StringBuffer();
            for (int i = 0; i < 50000; ++i) {
                String string = String.valueOf(i);
                stringBuffer.append(string);
            }
            String string = stringBuffer.toString();
            long l2 = System.currentTimeMillis();
            System.out.println(new StringBuilder().append("stringBuffer cost:").append(l2 - l).toString());
        }
    
        public static void test4() {
            long l = System.currentTimeMillis();
            StringBuilder stringBuilder = new StringBuilder();
            for (int i = 0; i < 50000; ++i) {
                String string = String.valueOf(i);
                stringBuilder.append(string);
            }
            String string = stringBuilder.toString();
            long l2 = System.currentTimeMillis();
            System.out.println(new StringBuilder().append("stringBuilder cost:").append(l2 - l).toString());
        }
    }
    

    我们可以看到,反编译后的代码,在for循环中,每次都是new了一个StringBuilder,然后再把String转成StringBuilder,再进行append

    而频繁的新建对象当然要耗费很多时间了,不仅仅会耗费时间,频繁的创建对象,还会造成内存资源的浪费。

    所以,阿里巴巴Java开发手册建议:循环体内,字符串的连接方式,使用 StringBuilderappend 方法进行扩展。而不要使用+

    总结

    常用的字符串拼接方式有:+、使用concat、使用StringBuilder、使用StringBuffer

    由于字符串拼接过程中会创建新的对象,所以如果要在一个循环体中进行字符串拼接,就要考虑内存问题和效率问题。

    经过对比,我们发现,直接使用StringBuilder的方式是效率最高的。因为StringBuilder天生就是设计来定义可变字符串和字符串的变化操作的。

    但是,还要强调的是:

    1、如果不是在循环体中进行字符串拼接的话,直接使用+就好了。

    2、如果在并发场景中进行字符串拼接的话,要使用StringBuffer来代替StringBuilder


    参考文献:https://hollischuang.github.io/toBeTopJavaer/#/basics/java-basic/string-concat

  • 相关阅读:
    什么时候应该使用C#的属性
    Unicode和字符集小结
    C#编译器怎么检查代码是否会执行
    C#中如何操作2个list
    用Windbg来看看CLR的JIT是什么时候发生的
    bzoj-1579: [Usaco2009 Feb]Revamping Trails 道路升级
    次小生成树
    bzoj-3687: 简单题
    bzoj-3669: [Noi2014]魔法森林
    uva 11732 (trie树)
  • 原文地址:https://www.cnblogs.com/XiaoZhengYu/p/12798082.html
Copyright © 2020-2023  润新知