• 深入研究BufferedInputStream内幕


    1 概述

    最近研究JDK源码,发现IO体系中的BufferedInputStream,很有意思,平时对这个类有不少误解,于是写下这篇博客,以供学习

    2 BufferedInputStream源码分析

    /**
      * 此类继承FilterInputStream,该类使用了装饰着设计模式,FilterInputStream的源码超级简单
      */
    public class BufferedInputStream extends FilterInputStream {
        
    	// 默认的buf[]缓存数组大小
        private static int DEFAULT_BUFFER_SIZE = 8192;
    
        /**
         * The maximum size of array to allocate.
         * Some VMs reserve some header words in an array.
         * Attempts to allocate larger arrays may result in
         * OutOfMemoryError: Requested array size exceeds VM limit
         *
         * buf[]缓存数组最大值 为什么会 减去8呢?因为一些JVM会数组头部存一些数据
         */
        private static int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8;
    
        /**
         * The internal buffer array where the data is stored. When necessary,
         * it may be replaced by another array of
         * a different size.
         *
         * 缓存数组,核心成员变量,所有操作都是围绕buf[]
         */
        protected volatile byte buf[];
    
        /**
         * Atomic updater to provide compareAndSet for buf. This is
         * necessary because closes can be asynchronous. We use nullness
         * of buf[] as primary indicator that this stream is closed. (The
         * "in" field is also nulled out on close.)
         *
         * 多线程相关,确保操作线程安全
         */
        private static final
            AtomicReferenceFieldUpdater<BufferedInputStream, byte[]> bufUpdater =
            AtomicReferenceFieldUpdater.newUpdater
            (BufferedInputStream.class,  byte[].class, "buf");
    
        /**
         * The index one greater than the index of the last valid byte in
         * the buffer.
         * This value is always
         * in the range <code>0</code> through <code>buf.length</code>;
         * elements <code>buf[0]</code>  through <code>buf[count-1]
         * </code>contain buffered input data obtained
         * from the underlying  input stream.
         *
         * buf[]数组中,有效数据的总数
         */
        protected int count;
    
        /**
         * The current position in the buffer. This is the index of the next
         * character to be read from the <code>buf</code> array.
         * <p>
         * This value is always in the range <code>0</code>
         * through <code>count</code>. If it is less
         * than <code>count</code>, then  <code>buf[pos]</code>
         * is the next byte to be supplied as input;
         * if it is equal to <code>count</code>, then
         * the  next <code>read</code> or <code>skip</code>
         * operation will require more bytes to be
         * read from the contained  input stream.
         *
         * @see     java.io.BufferedInputStream#buf
         *
         * buf[]数组中,当前读取位置
         */
        protected int pos;
    
        /**
         * The value of the <code>pos</code> field at the time the last
         * <code>mark</code> method was called.
         * <p>
         * This value is always
         * in the range <code>-1</code> through <code>pos</code>.
         * If there is no marked position in  the input
         * stream, this field is <code>-1</code>. If
         * there is a marked position in the input
         * stream,  then <code>buf[markpos]</code>
         * is the first byte to be supplied as input
         * after a <code>reset</code> operation. If
         * <code>markpos</code> is not <code>-1</code>,
         * then all bytes from positions <code>buf[markpos]</code>
         * through  <code>buf[pos-1]</code> must remain
         * in the buffer array (though they may be
         * moved to  another place in the buffer array,
         * with suitable adjustments to the values
         * of <code>count</code>,  <code>pos</code>,
         * and <code>markpos</code>); they may not
         * be discarded unless and until the difference
         * between <code>pos</code> and <code>markpos</code>
         * exceeds <code>marklimit</code>.
         *
         * @see     java.io.BufferedInputStream#mark(int)
         * @see     java.io.BufferedInputStream#pos
         *
         * 最后一次,调用mark方法,标记的位置
         */
        protected int markpos = -1;
    
        /**
         * The maximum read ahead allowed after a call to the
         * <code>mark</code> method before subsequent calls to the
         * <code>reset</code> method fail.
         * Whenever the difference between <code>pos</code>
         * and <code>markpos</code> exceeds <code>marklimit</code>,
         * then the  mark may be dropped by setting
         * <code>markpos</code> to <code>-1</code>.
         *
         * @see     java.io.BufferedInputStream#mark(int)
         * @see     java.io.BufferedInputStream#reset()
         *
         * 该变量唯一入口就是mark(int readLimit),比如调用方法,mark(1024),那么后面读取的数据如果
         * 超过了1024字节,那么此次mark就为无效标记,子类可以选择抛弃该mark标记,从头开始。不过具体实现
         * 跟具体的子类有关,在BufferedInputStream中,会抛弃mark标记,重新将markpos赋值为-1
         */
        protected int marklimit;
    
        /**
         * Check to make sure that underlying input stream has not been
         
         * nulled out due to close; if not return it;
         *
         * 获取真正的输入流
         */
        private InputStream getInIfOpen() throws IOException {
            InputStream input = in;
            if (input == null)
                throw new IOException("Stream closed");
            return input;
        }
    
        /**
         * Check to make sure that buffer has not been nulled out due to
         * close; if not return it;
         *
         * 获取缓存数组
         */
        private byte[] getBufIfOpen() throws IOException {
            byte[] buffer = buf;
            if (buffer == null)
                throw new IOException("Stream closed");
            return buffer;
        }
    
        /**
         * Creates a <code>BufferedInputStream</code>
         * and saves its  argument, the input stream
         * <code>in</code>, for later use. An internal
         * buffer array is created and  stored in <code>buf</code>.
         *
         * @param   in   the underlying input stream.
         *
         * 默认缓存数组大小为8kb
         */
        public BufferedInputStream(InputStream in) {
            this(in, DEFAULT_BUFFER_SIZE);
        }
    
        /**
         * Creates a <code>BufferedInputStream</code>
         * with the specified buffer size,
         * and saves its  argument, the input stream
         * <code>in</code>, for later use.  An internal
         * buffer array of length  <code>size</code>
         * is created and stored in <code>buf</code>.
         *
         * @param   in     the underlying input stream.
         * @param   size   the buffer size.
         * @exception IllegalArgumentException if {@code size <= 0}.
         */
        public BufferedInputStream(InputStream in, int size) {
            super(in);
            if (size <= 0) {
                throw new IllegalArgumentException("Buffer size <= 0");
            }
            buf = new byte[size];
        }
    
        /**
         * Fills the buffer with more data, taking into account
         * shuffling and other tricks for dealing with marks.
         * Assumes that it is being called by a synchronized method.
         * This method also assumes that all data has already been read in,
         * hence pos > count.
         *
         * 该方法作用,通过丢弃buf[]数据、增大buf[]数组,以腾出位置,将输入流中新的数据保存到buf[]缓存数组中
         */
        private void fill() throws IOException {
            byte[] buffer = getBufIfOpen();
            if (markpos < 0)
                // 因为没有mark标记,直接丢弃buf[]数据
                pos = 0;            /* no mark: throw away the buffer */
            else if (pos >= buffer.length)  /* no room left in buffer */
                if (markpos > 0) {  /* can throw away early part of the buffer */
                    int sz = pos - markpos;
                    System.arraycopy(buffer, markpos, buffer, 0, sz);
                    pos = sz;
                    markpos = 0;
                // !!!往下执行,markpos全部等于0
                } else if (buffer.length >= marklimit) {
                    markpos = -1;   /* buffer got too big, invalidate mark */
                    pos = 0;        /* drop buffer contents */
                } else if (buffer.length >= MAX_BUFFER_SIZE) {
                    throw new OutOfMemoryError("Required array size too large");
                } else {            /* grow buffer */
                    int nsz = (pos <= MAX_BUFFER_SIZE - pos) ?
                            pos * 2 : MAX_BUFFER_SIZE;
                    if (nsz > marklimit)
                        // buf[]长度不超过marklimit,这样mark标记始终有效
                        nsz = marklimit;
                    byte nbuf[] = new byte[nsz];
                    System.arraycopy(buffer, 0, nbuf, 0, pos);
                    if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {
                        // Can't replace buf if there was an async close.
                        // Note: This would need to be changed if fill()
                        // is ever made accessible to multiple threads.
                        // But for now, the only way CAS can fail is via close.
                        // assert buf == null;
                        throw new IOException("Stream closed");
                    }
                    buffer = nbuf;
                }
            count = pos;
            // 将输入流中的数据独到buf[]数组中
            int n = getInIfOpen().read(buffer, pos, buffer.length - pos);
            if (n > 0)
                count = n + pos;
        }
    
        /**
         * See
         * the general contract of the <code>read</code>
         * method of <code>InputStream</code>.
         *
         * @return     the next byte of data, or <code>-1</code> if the end of the
         *             stream is reached.
         * @exception  IOException  if this input stream has been closed by
         *                          invoking its {@link #close()} method,
         *                          or an I/O error occurs.
         * @see        java.io.FilterInputStream#in
         */
        public synchronized int read() throws IOException {
            // 说明当前buf[]数组大小不够了,需要fill()
            if (pos >= count) {
                fill();
                // 说明没有读取到任何数据
                if (pos >= count)
                    return -1;
            }
            return getBufIfOpen()[pos++] & 0xff;
        }
    
        /**
         * Read characters into a portion of an array, reading from the underlying
         * stream at most once if necessary.
         */
        private int read1(byte[] b, int off, int len) throws IOException {
            int avail = count - pos;
            if (avail <= 0) {
                /* If the requested length is at least as large as the buffer, and
                   if there is no mark/reset activity, do not bother to copy the
                   bytes into the local buffer.  In this way buffered streams will
                   cascade harmlessly. */
                // !!!这个位置代码很重要
                // !!!这个位置代码很重要
                // !!!这个位置代码很重要
                /**
                  * 当写入指定数组b的长度大小超过BufferedInputStream中核心缓存数组buf[]的大小并且
                  * markpos < 0,那么就直接从数据流中读取数据给b数组,而不通过buf[]缓存数组,避免buf[]数组急剧增大
                  * 
                  */
                if (len >= getBufIfOpen().length && markpos < 0) {
                    return getInIfOpen().read(b, off, len);
                }
                fill();
                avail = count - pos;
                if (avail <= 0) return -1;
            }
            int cnt = (avail < len) ? avail : len;
            System.arraycopy(getBufIfOpen(), pos, b, off, cnt);
            pos += cnt;
            return cnt;
        }
    
        /**
         * Reads bytes from this byte-input stream into the specified byte array,
         * starting at the given offset.
         *
         * <p> This method implements the general contract of the corresponding
         * <code>{@link InputStream#read(byte[], int, int) read}</code> method of
         * the <code>{@link InputStream}</code> class.  As an additional
         * convenience, it attempts to read as many bytes as possible by repeatedly
         * invoking the <code>read</code> method of the underlying stream.  This
         * iterated <code>read</code> continues until one of the following
         * conditions becomes true: <ul>
         *
         *   <li> The specified number of bytes have been read,
         *
         *   <li> The <code>read</code> method of the underlying stream returns
         *   <code>-1</code>, indicating end-of-file, or
         *
         *   <li> The <code>available</code> method of the underlying stream
         *   returns zero, indicating that further input requests would block.
         *
         * </ul> If the first <code>read</code> on the underlying stream returns
         * <code>-1</code> to indicate end-of-file then this method returns
         * <code>-1</code>.  Otherwise this method returns the number of bytes
         * actually read.
         *
         * <p> Subclasses of this class are encouraged, but not required, to
         * attempt to read as many bytes as possible in the same fashion.
         *
         * @param      b     destination buffer.
         * @param      off   offset at which to start storing bytes.
         * @param      len   maximum number of bytes to read.
         * @return     the number of bytes read, or <code>-1</code> if the end of
         *             the stream has been reached.
         * @exception  IOException  if this input stream has been closed by
         *                          invoking its {@link #close()} method,
         *                          or an I/O error occurs.
         *
         * 该方法主要调用read1(byte[] b, int off, int len)
         */
        public synchronized int read(byte b[], int off, int len)
            throws IOException
        {
            getBufIfOpen(); // Check for closed stream
            if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
                throw new IndexOutOfBoundsException();
            } else if (len == 0) {
                return 0;
            }
    
            int n = 0;
            for (;;) {
                int nread = read1(b, off + n, len - n);
                if (nread <= 0)
                    return (n == 0) ? nread : n;
                n += nread;
                if (n >= len)
                    return n;
                // if not closed but no bytes available, return
                InputStream input = in;
                if (input != null && input.available() <= 0)
                    return n;
            }
        }
    
        /**
         * See the general contract of the <code>skip</code>
         * method of <code>InputStream</code>.
         *
         * @exception  IOException  if the stream does not support seek,
         *                          or if this input stream has been closed by
         *                          invoking its {@link #close()} method, or an
         *                          I/O error occurs.
         *
         * 跳过流中指定字节数,感觉该方法用处不大,至少到目前为止,我本人还从来没有用过skip方法
         */
        public synchronized long skip(long n) throws IOException {
            getBufIfOpen(); // Check for closed stream
            if (n <= 0) {
                return 0;
            }
            long avail = count - pos;
    
            if (avail <= 0) {
                // If no mark position set then don't keep in buffer
                if (markpos <0)
                    return getInIfOpen().skip(n);
    
                // Fill in buffer to save bytes for reset
                fill();
                avail = count - pos;
                if (avail <= 0)
                    return 0;
            }
    
            long skipped = (avail < n) ? avail : n;
            pos += skipped;
            return skipped;
        }
    
        /**
         * Returns an estimate of the number of bytes that can be read (or
         * skipped over) from this input stream without blocking by the next
         * invocation of a method for this input stream. The next invocation might be
         * the same thread or another thread.  A single read or skip of this
         * many bytes will not block, but may read or skip fewer bytes.
         * <p>
         * This method returns the sum of the number of bytes remaining to be read in
         * the buffer (<code>count&nbsp;- pos</code>) and the result of calling the
         * {@link java.io.FilterInputStream#in in}.available().
         *
         * @return     an estimate of the number of bytes that can be read (or skipped
         *             over) from this input stream without blocking.
         * @exception  IOException  if this input stream has been closed by
         *                          invoking its {@link #close()} method,
         *                          or an I/O error occurs.
         *
         * buf[]数组剩余字节数+输入流中剩余字节数
         */
        public synchronized int available() throws IOException {
            int n = count - pos;
            int avail = getInIfOpen().available();
            return n > (Integer.MAX_VALUE - avail)
                        ? Integer.MAX_VALUE
                        : n + avail;
        }
    
        /**
         * See the general contract of the <code>mark</code>
         * method of <code>InputStream</code>.
         *
         * @param   readlimit   the maximum limit of bytes that can be read before
         *                      the mark position becomes invalid.
         * @see     java.io.BufferedInputStream#reset()
         *
         * 标记位置,marklimit只有在这里才能够被赋值,readlimit表示mark()方法执行后,最多能够从流中
         * 读取的数据,如果超过该字节大小,那么在fill()的时候,就会认为此mark()标记无效,重新将
         * markpos = -1,pos = 0
         */
        public synchronized void mark(int readlimit) {
            marklimit = readlimit;
            markpos = pos;
        }
    
        /**
         * See the general contract of the <code>reset</code>
         * method of <code>InputStream</code>.
         * <p>
         * If <code>markpos</code> is <code>-1</code>
         * (no mark has been set or the mark has been
         * invalidated), an <code>IOException</code>
         * is thrown. Otherwise, <code>pos</code> is
         * set equal to <code>markpos</code>.
         *
         * @exception  IOException  if this stream has not been marked or,
         *                  if the mark has been invalidated, or the stream
         *                  has been closed by invoking its {@link #close()}
         *                  method, or an I/O error occurs.
         * @see        java.io.BufferedInputStream#mark(int)
         */
        public synchronized void reset() throws IOException {
            getBufIfOpen(); // Cause exception if closed
            if (markpos < 0)
                throw new IOException("Resetting to invalid mark");
            pos = markpos;
        }
    
        /**
         * Tests if this input stream supports the <code>mark</code>
         * and <code>reset</code> methods. The <code>markSupported</code>
         * method of <code>BufferedInputStream</code> returns
         * <code>true</code>.
         *
         * @return  a <code>boolean</code> indicating if this stream type supports
         *          the <code>mark</code> and <code>reset</code> methods.
         * @see     java.io.InputStream#mark(int)
         * @see     java.io.InputStream#reset()
         */
        public boolean markSupported() {
            return true;
        }
    
        /**
         * Closes this input stream and releases any system resources
         * associated with the stream.
         * Once the stream has been closed, further read(), available(), reset(),
         * or skip() invocations will throw an IOException.
         * Closing a previously closed stream has no effect.
         *
         * @exception  IOException  if an I/O error occurs.
         */
        public void close() throws IOException {
            byte[] buffer;
            while ( (buffer = buf) != null) {
                if (bufUpdater.compareAndSet(this, buffer, null)) {
                    InputStream input = in;
                    in = null;
                    if (input != null)
                        input.close();
                    return;
                }
                // Else retry in case a new buf was CASed in fill()
            }
        }
    }
    

    3 BufferedInputStream在实际场景中,没有太多用处

    网上很多博客,说BufferedInputStream很有用,可以一次性从IO中读入很多数据,然后缓存在buf[]中,这样就减少了IO消耗,很多博主,甚至给出了一些代码实操,证明BufferedInputStream确实可以提高效率,这本身没有任何问题,但是经我深入源码研究过后,却发现实际场景中,该类使用频率很少,根本不需要BufferedInputStream

    我将结合代码,进行更有力的说明:

    // file文件大小1个G
            private static String file = "D:\StudySoftware\VMware_virtualbox\Data_vmware\VMwareMachine\kafka_single\kafka-single-103-da5cf665.vmem";
    
    
    private static void file() throws IOException{
            long beginTime = System.currentTimeMillis();
            FileInputStream input = new FileInputStream(file);
            byte[] bytes = new byte[1024 * 1];
            int read = 0;
            while ((read = input.read(bytes, 0, bytes.length)) != -1) {
                // 不执行任何操作,仅仅读取文件
            }
            long endTime = System.currentTimeMillis();
            System.out.println("file: 耗费时间:" + (endTime - beginTime));
        }
    
        private static void bufferd() throws IOException{
            long beginTime = System.currentTimeMillis();
            FileInputStream input = new FileInputStream(file);
            BufferedInputStream bufferedInput = new BufferedInputStream(input);
            byte[] bytes = new byte[1024 * 1];
            int read = 0;
            while ((read = bufferedInput.read(bytes, 0, bytes.length)) != -1) {
                //不执行任何操作,仅仅读取文件
            }
            long endTime = System.currentTimeMillis();
            System.out.println("buffered: 耗费时间:" + (endTime - beginTime));
        }
    

    注意:

    代码操作的时候,两个方法不能够对同一个文件进行操作,防止JVM会自动优化,因为第一个方法读完整个文件,第二个方法再读的时候,JVM可能保存了部分信息,从而造成测试数据的不准确。并且为了最大程度保证测试数据的准确性,一次JVM启动,只测试一个方法

    结果:

    ①当 byte[] bytes = new byte[1024 * 1]; 数组大小为1024

    buffered: 耗费时间:855
    file: 耗费时间:3073

    ②当 byte[] bytes = new byte[1024 * 2]; 数组大小为2018

    buffered: 耗费时间:813
    file: 耗费时间:1909

    ③当 byte[] bytes = new byte[1024 * 3]; 数组大小为3072

    buffered: 耗费时间:1304
    file: 耗费时间:1476

    ④当 byte[] bytes = new byte[1024 * 4]; 数组大小为4096

    buffered: 耗费时间:844
    file: 耗费时间:1287

    ⑤当 byte[] bytes = new byte[1024 * 5]; 数组大小为5120

    buffered: 耗费时间:1343
    file: 耗费时间:1061

    ⑥当 byte[] bytes = new byte[1024 * 6]; 数组大小为6144

    buffered: 耗费时间:1280
    file: 耗费时间:985

    ⑦当 byte[] bytes = new byte[1024 * 7]; 数组大小为7168

    buffered: 耗费时间:1443
    file: 耗费时间:851

    ⑧当 byte[] bytes = new byte[1024 * 8]; 数组大小为8192

    buffered: 耗费时间:774
    file: 耗费时间:739

    ⑨当 byte[] bytes = new byte[1024 * 9]; 数组大小为9216

    buffered: 耗费时间:734
    file: 耗费时间:749

    ⑩当 byte[] bytes = new byte[1024 * 10]; 数组大小为10240

    buffered: 耗费时间:739
    file: 耗费时间:697

    ... ... ...

    我们可以得出以下重要结论:

    当bytes比较小时,使用BufferedInputStream确实读取文件时要快不少,可是当bytes逐步增大,尤其是达到8kb的时候,我们会发现 BufferedInputStreamFileInputStream读取文件速度差不多了,没有明显差异

    我们深入源码,即可发现:

    因此当我们把 while ((read = input.read(bytes, 0, bytes.length)) != -1)中的bytes增大时,BufferedInputStream没有任何作用(除非有mark、reset需求)

    有的小伙伴,肯定会说,那我将BufferedInputStream中的buf[]大小提高不就行了吗?

    可以是可以,但是我将 while ((read = input.read(bytes, 0, bytes.length)) != -1)中的bytes大小增大不就行了? 说到底都是字节数组,一个是在BufferedInputStream外面,一个是在BufferedInputStream内部,而现在我们进行流读取的时候,很多时候是不需要mark、reset操作的,并且我们设置外部bytes大小通常会比较大,这个时候,完全可以不使用BufferedInputStream

    4 BufferedInputStream使用场景

    ①第一种使用场景,就是当我们需要mark、reset特性时。不过要特别注意,mark、reset的使用,里面涉及到很多东西,特别是当BufferedInputStream执行fill()操作时

    public static void main(String[] args) {
        try {
            final byte[] src = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20};
            final ByteArrayInputStream bis = new ByteArrayInputStream(src);
            final BufferedInputStream bufis = new BufferedInputStream(bis, 5);
            int data = -1;
            int i = 0;
            while((data = bufis.read()) != -1) {
                if(data == 4) {
                    bufis.mark(2);
                }
                if(i++ == 9) {
                    bufis.reset();
                }
                System.out.printf("%d", data);
            }
        } catch(IOException ioex) {
            ioex.printStackTrace();
        }
    }
    // 原文链接:https://blog.csdn.net/qq_26971305/article/details/79472696
    

    有兴趣的朋友,可以debug上面的代码,debug下面的情况,相应你对BufferedInputStream有更深的理解

    if(i++ == 5)

    if(i++ == 6)

    if(i++ == 7)

    if(i++ == 8)

    if(i++ == 9)

    if(i++ == 10)

    ... ... ... 时间多的朋友,可以设置BufferedInputStream中buf[]的大小长度和if(i++ == xx)判断语句中的值来看看BufferedInputStream类的执行流程

    mark、reset特性不可乱用,不然会抛出异常的

        public synchronized void reset() throws IOException {
            getBufIfOpen(); // Cause exception if closed
            if (markpos < 0)
                throw new IOException("Resetting to invalid mark");
            pos = markpos;
        }
    
    

    ②第二种使用场景,当BufferedInputStream配合DataInputStreamObjectOutputStream使用时

    ObjectInputStream input = new ObjectInputStream(new BufferedInputStream(new FileInputStream("E:\obejct.txt")));
    
    DataInputStream input = new DataInputStream(new BufferedInputStream(new FileInputStream("E:\obejct.txt")));
    

    DataXxxStream从管道读取字节流的时候,是一个一个字节读取的

    ObjectInputStream底层依赖了DataXxxStream对象


    参考链接:
    作者:一杯热咖啡AAA
    出处:https://www.cnblogs.com/AdaiCoffee/
    本文以学习、研究和分享为主,欢迎转载。如果文中有不妥或者错误的地方还望指出,以免误人子弟。如果你有更好的想法和意见,可以留言讨论,谢谢!
  • 相关阅读:
    phpmyadmin详细的图文使用教程
    从入门到深入FIDDLER 2
    TestNG学习-001-基础理论知识
    TestNG学习-002-annotaton 注解概述及其执行顺序
    自动化测试如何解决验证码的问题
    自动化测试 -- 通过Cookie跳过登录验证码
    JMeter学习-012-JMeter 配置元件之-HTTP Cookie管理器-实现 Cookie 登录
    自动化测试框架
    并发和并行概念及原理
    匿名内部实现多线程的两种方式创建
  • 原文地址:https://www.cnblogs.com/AdaiCoffee/p/11369699.html
Copyright © 2020-2023  润新知