• Memory leak in the Java 6 String.substring() method


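    substring(start, end) is used all the time in Java programming, but, surprisingly, careless use can cause a memory leak.

    The best way to understand substring() is to read its source (JDK 6):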

    /**
     * <blockquote><pre>
     * "hamburger".substring(4, 8) returns "urge"
     * "smiles".substring(1, 5) returns "mile"
     * </pre></blockquote>
     *
     * @param      beginIndex   the beginning index, inclusive.
     * @param      endIndex     the ending index, exclusive.
     * @return     the specified substring.
     * @exception  IndexOutOfBoundsException  if the
     *             <code>beginIndex</code> is negative, or
     *             <code>endIndex</code> is larger than the length of
     *             this <code>String</code> object, or
     *             <code>beginIndex</code> is larger than
     *             <code>endIndex</code>.
     */
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

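    As an aside, this substring() source is a good example of how to write an API. It reminds me of an article by Lao Zhao (Zhao Jie): the way it validates arguments and throws exceptions follows a similar line of thinking.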

    Note that calling substring(i, i) (beginIndex == endIndex) or substring(stringLength) (beginIndex equal to the string's length) does not throw an exception. Both return an empty string, because the call reduces to new String(offset + beginIndex, 0, value).
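The boundary behavior described above can be checked with a few lines. This is only a minimal sketch: the class name `SubstringEdgeCases` and its helper methods are made up for illustration, and it relies on the public `String` API only.

```java
final class SubstringEdgeCases {
    // substring(i, i): beginIndex == endIndex passes every bounds check
    // and yields the empty string.
    static String sameIndex(String s, int i) {
        return s.substring(i, i);
    }

    // substring(length): beginIndex == length is still in range
    // and also yields the empty string.
    static String fromEnd(String s) {
        return s.substring(s.length());
    }

    // One position past the end fails the bounds check and throws.
    static boolean throwsPastEnd(String s) {
        try {
            s.substring(s.length() + 1);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameIndex("hamburger", 4).isEmpty()); // true
        System.out.println(fromEnd("hamburger").isEmpty());      // true
        System.out.println(throwsPastEnd("hamburger"));          // true
    }
}
```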

    Back to the main topic: the string is actually created by the String(int, int, char[]) constructor, whose source is:

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    A Java string is really defined by three private fields:

    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence
    {
        /** The value is used for character storage. */
        private final char value[];
    
        /** The offset is the first index of the storage that is used. */
        private final int offset;
    
        /** The count is the number of characters in the String. */
        private final int count;
    }

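    When a string is first allocated, the char array stores its characters, offset is 0, and count is the string's length. The problem is that when substring(start, end) calls the String(int, int, char[]) constructor, it obtains the substring merely by changing offset and count, while the substring's value[] array still points at the original string's array. Suppose the original string s is 1 GB and all we need is a small piece such as s.substring(1, 10). Because the substring's value[] still refers to the 1 GB array, that array cannot be reclaimed by the GC even after s itself is no longer referenced, and the result is a memory leak.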
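Since the internals of `java.lang.String` cannot be inspected through the public API, the sharing can be illustrated with a small stand-in. The sketch below is a simplified model, not the JDK class: `SharedArrayString` and `retainedChars()` are hypothetical names, and only the three fields and the non-copying constructor follow the Java 6 source quoted above.

```java
import java.util.Arrays;

final class SharedArrayString {
    private final char[] value; // backing storage, possibly shared
    private final int offset;   // index of the first used character
    private final int count;    // number of characters in this string

    SharedArrayString(char[] chars) {
        this(0, chars.length, chars.clone());
    }

    // Mirrors the package-private constructor: no copy, only a new window.
    private SharedArrayString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedArrayString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedArrayString(offset + beginIndex, endIndex - beginIndex, value);
    }

    int length() {
        return count;
    }

    // Number of chars this object keeps reachable through its array.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] big = new char[1000000];
        Arrays.fill(big, 'a');
        SharedArrayString small = new SharedArrayString(big).substring(1, 10);
        System.out.println(small.length());        // 9
        System.out.println(small.retainedChars()); // 1000000
    }
}
```

In this model the 9-character substring keeps the whole million-element array reachable, which is the situation the paragraph above describes.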

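    Why was it designed this way? Because String is immutable, sharing one character array has the following advantages:

    substring() does not need to copy the array and can reuse value[], so it runs in constant time rather than linear time, which improves performance (this is what the comment in the second code listing means: "shares value array for speed").

    The drawback is the possible memory leak (in fact, this was reported long ago in the Sun/Oracle bug database: http://bugs.sun.com/view_bug.do?bug_id=4513622).

    How can the problem be avoided? One workaround is to go through a constructor that copies the relevant part of the array: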

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }

    // smallStr no longer holds the 1 GB value[] array
    String smallStr = new String(s.substring(1,10));

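    The constructor above copies the relevant range into a new array v and assigns v to the string's value array, so the new string no longer references the large array and the leak is avoided.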
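As a rough sketch of where this workaround tends to matter, consider keeping only a short key from each of many long lines. The names `KeyExtractor` and `extractKeys` and the sample data are invented for this example; the only technique taken from the text is wrapping `substring()` in `new String(...)`.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyExtractor {
    // Keeps only the first keyLength characters of each line.
    static List<String> extractKeys(List<String> lines, int keyLength) {
        List<String> keys = new ArrayList<String>();
        for (String line : lines) {
            int end = Math.min(keyLength, line.length());
            // On Java 6, new String(...) makes a trimmed copy, so the long
            // line's char[] can be collected once the line is dropped.
            keys.add(new String(line.substring(0, end)));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        lines.add("id-00001 followed by a very long payload ...");
        lines.add("id-00002 followed by another long payload ...");
        System.out.println(extractKeys(lines, 8)); // [id-00001, id-00002]
    }
}
```

The extra constructor call is only worthwhile on JVMs whose substring() shares the array. Where substring() already copies, it just adds an allocation.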

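    In Java 7 the implementation of String changed (as of update 7u6): substring() no longer shares the array but makes a conventional copy. This eliminates the memory leak, at the cost of turning the running time from constant into linear: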

        public String substring(int beginIndex, int endIndex) {
            if (beginIndex < 0) {
                throw new StringIndexOutOfBoundsException(beginIndex);
            }
            if (endIndex > value.length) {
                throw new StringIndexOutOfBoundsException(endIndex);
            }
            int subLen = endIndex - beginIndex;
            if (subLen < 0) {
                throw new StringIndexOutOfBoundsException(subLen);
            }
            return ((beginIndex == 0) && (endIndex == value.length)) ? this
                    : new String(value, beginIndex, subLen);
        }

        /**
         * Allocates a new {@code String} that contains characters from a subarray
         * of the character array argument. The {@code offset} argument is the
         * index of the first character of the subarray and the {@code count}
         * argument specifies the length of the subarray. The contents of the
         * subarray are copied; subsequent modification of the character array does
         * not affect the newly created string.
         *
         * @param  value
         *         Array that is the source of characters
         *
         * @param  offset
         *         The initial offset
         *
         * @param  count
         *         The length
         *
         * @throws  IndexOutOfBoundsException
         *          If the {@code offset} and {@code count} arguments index
         *          characters outside the bounds of the {@code value} array
         */
        public String(char value[], int offset, int count) {
            if (offset < 0) {
                throw new StringIndexOutOfBoundsException(offset);
            }
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            // Note: offset or count might be near -1>>>1.
            if (offset > value.length - count) {
                throw new StringIndexOutOfBoundsException(offset + count);
            }
            this.value = Arrays.copyOfRange(value, offset, offset+count);
        }

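    This constructor copies the array on every call, unlike the Java 6 implementation. Which one is better is hard to say: sharing is faster and saves memory while the original string stays alive, whereas copying never pins a large array behind a small substring.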
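The effect of the copy can be seen on plain arrays. The following is an illustrative sketch rather than JDK code: `CopyingSubstringDemo` and `copySlice` are hypothetical names, and the only piece taken from the constructor above is the `Arrays.copyOfRange` call.

```java
import java.util.Arrays;

final class CopyingSubstringDemo {
    // Same idea as the Java 7 constructor: copy just the requested range.
    static char[] copySlice(char[] value, int beginIndex, int endIndex) {
        return Arrays.copyOfRange(value, beginIndex, endIndex);
    }

    public static void main(String[] args) {
        char[] big = new char[1000000];
        Arrays.fill(big, 'x');

        char[] small = copySlice(big, 1, 10);
        System.out.println(small.length); // 9: only the slice is retained

        big[1] = 'y'; // changing the source does not affect the copy
        System.out.println(small[0]);     // x
    }
}
```

The copy costs time proportional to the slice length, but afterwards nothing in the small array refers to the large one.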

    There is reportedly a data structure called a Rope that handles strings more efficiently; I should take a closer look at it.

    References:

    http://javarevisited.blogspot.hk/2011/10/how-substring-in-java-works.html

    http://eyalsch.wordpress.com/2009/10/27/stringleaks/

    http://blog.zhaojie.me/2013/03/string-and-rope-1-string-in-dotnet-and-java.html

    http://www.transylvania-jug.org/archives/5530

  • Original article: https://www.cnblogs.com/techyc/p/3324021.html