• TCP Receive Window Adjustment Algorithm (Part 2)


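    This article analyzes how TCP adjusts its receive window: the window adjustment algorithm itself, followed by a summary.

    Kernel version: 3.2.12

    Author: zhangskd @ csdn blog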

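    The receive window adjustment algorithm

    After all the groundwork, we finally reach the key part. As we will see, the size of the receive window depends mainly on the remaining receive buffer space and on the current receive window threshold.

    The function that decides the receive window size, tcp_select_window(), is called from tcp_transmit_skb(). In other words, every time we send a packet, tcp_select_window() is used to decide the receive window to advertise.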

    static int tcp_transmit_skb (struct sock *sk, struct sk_buff *skb, int clone_it, 
                                 gfp_t gfp_mask)
    {
        const struct inet_connection_sock *icsk = inet_csk(sk);
        struct inet_sock *inet;
        struct tcp_sock *tp;
        struct tcp_skb_cb *tcb;
        struct tcphdr *th;
        ...
        /* Build TCP header and checksum it. The TCP header fields are filled in below. */
        th = tcp_hdr(skb); /* skb->transport_header */
        th->source = inet->inet_sport;
        th->dest = inet->inet_dport;
        th->seq = htonl(tcb->seq);
        th->ack_seq = htonl(tp->rcv_nxt);
        /* This statement shows the power of C: the data offset and the flags
         * are written together as the 16-bit word at byte offset 12. */
        *(((__be16 *) th) + 6) = htons(((tcp_header_size >> 2) << 12) | tcb->tcp_flags);
        
        if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) {
            /* RFC1323: The window in SYN & SYN/ACK segments is never scaled.
             * So during the three-way handshake the receive window is not
             * scaled by the window scale factor.
             */
            th->window = htons(min(tp->rcv_wnd, 65535U));
    
        } else {
            th->window = htons(tcp_select_window(sk)); /* update the receive window size */
        }
        th->check = 0;
        th->urg_ptr = 0;
        ...
    }
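The single assignment to the 16-bit word at byte offset 12 can be reproduced in user space. The sketch below is only an illustration: pack_doff_flags() is a hypothetical helper, not a kernel function, and it returns the value in host byte order (the kernel additionally applies htons()).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel function): builds, in host byte order,
 * the 16-bit word located at byte offset 12 of the TCP header.
 * Bits 15..12 carry the header length in 32-bit words, the low bits carry
 * the flags. */
static uint16_t pack_doff_flags(unsigned int header_bytes, uint8_t flags)
{
    /* A TCP header is 20..60 bytes long and always a multiple of 4. */
    assert(header_bytes >= 20 && header_bytes <= 60 && header_bytes % 4 == 0);

    return (uint16_t)(((header_bytes >> 2) << 12) | flags);
}
```

For a 20-byte header carrying only ACK (0x10), the helper yields 0x5010: data offset 5 in the top four bits, the ACK bit in the low byte.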
    

    Now let's look at tcp_select_window().

    Note that the returned window is only 16 bits wide, so without the window scale option the largest receive window is 65535.

    static u16 tcp_select_window(struct sock *sk)
    {
        struct tcp_sock *tp = tcp_sk(sk);
    
        u32 cur_win = tcp_receive_window(tp); /* what remains of the currently advertised window */
        u32 new_win = __tcp_select_window(sk); /* new window computed from the remaining receive buffer */
    
        /* Never shrink the offered window. */
        if (new_win < cur_win) {
            /* Danger Will Robinson!
             * Don't update rcv_wup/rcv_wnd here or else
             * we will not be able to advertise a zero window in time. --DaveM
             * Relax Will Robinson.
             */
            new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);
        }
    
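        /* Update the receive window size. Personally I think this line should come later, because the window size is not final at this point! */
        tp->rcv_wnd = new_win;
        tp->rcv_wup = tp->rcv_nxt; /* move the left edge of the window; data received so far is cumulatively acknowledged */
     
        /* Make sure we do not exceed the maximum possible scaled window. */
        if (! tp->rx_opt.rcv_wscale && sysctl_tcp_workaround_signed_windows)
            /* must not exceed 32767, because some odd stacks treat the window as a signed value */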
            new_win = min(new_win, MAX_TCP_WINDOW); 
    
        else
            new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
     
        /* RFC1323 scaling applied. Shift the window right by the scale factor; with the maximum shift of 14 the window can span up to 30 bits. */
        new_win >>= tp->rx_opt.rcv_wscale;
     
        /* If we advertise zero window, disable fast path. */
        if (new_win == 0)
            tp->pred_flags = 0;
     
        return new_win; /* the final value for the header's window field */
    }

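    Every time a TCP segment is sent, the TCP header has to be built, and that is when tcp_select_window() is called to choose the receive window size.

    The basic window selection algorithm:

    1. Compute cur_win, what remains of the currently advertised window.

    2. Compute the new window size new_win, which is 3/4 of the remaining receive buffer and must not exceed rcv_ssthresh.

    3. Use the larger of cur_win and new_win as the receive window size.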
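The three steps can be modeled in user space roughly as follows. This is a simplified sketch under stated assumptions: select_window_model() and its parameters are illustrative names, cur_win and new_win are passed in rather than computed from socket state, and the tcp_workaround_signed_windows branch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative user-space model, not kernel API.
 * cur_win: what remains of the window advertised last time
 * new_win: candidate derived from free buffer space and rcv_ssthresh
 * wscale:  receive window scale shift (0..14)
 * Returns the 16-bit value that would be placed in the TCP header. */
static uint16_t select_window_model(uint32_t cur_win, uint32_t new_win,
                                    unsigned int wscale)
{
    uint32_t unit, max_win;

    assert(wscale <= 14);
    unit = 1u << wscale;
    max_win = 65535u << wscale;

    /* Never shrink: keep cur_win, rounded up to a multiple of the scale unit. */
    if (new_win < cur_win)
        new_win = (cur_win + unit - 1) & ~(unit - 1);

    /* The header field holds at most 65535 scaled units. */
    if (new_win > max_win)
        new_win = max_win;

    return (uint16_t)(new_win >> wscale);
}
```

With cur_win = 100000, new_win = 50000 and a scale shift of 7, the model keeps the larger window, rounds it up to 100096, and returns 782 for the header field.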

    tcp_workaround_signed_windows

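    This sysctl controls whether, when the window scale option is not in use, the TCP window (including its initial value) is kept at or below 32767. The default is 0 (disabled).

    Without the window scale option the receive window is 16 bits wide, with a maximum of 65535. Some poor implementations, however, treat the window as a signed value, so for them the maximum is only 32767. Fortunately such implementations are rare :).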

    @include/net/tcp.h:
    /*
     * Never offer a window over 32767 without using window scaling.
     * Some poor stacks do signed 16bit maths! 
     */
    #define MAX_TCP_WINDOW 32767U
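To see why 32767 is the safe limit, consider a peer that stores the 16-bit window field in a signed variable. The function below is a hypothetical model of such a peer, not code from any real stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a peer that keeps the 16-bit window field in a
 * signed 16-bit variable. Any value above 32767 has its top bit set and
 * is therefore read as a negative number. */
static int window_as_seen_by_signed_peer(uint16_t advertised)
{
    int16_t w;

    assert(sizeof w == sizeof advertised);
    /* Reinterpret the same 16 bits as a signed value. */
    memcpy(&w, &advertised, sizeof w);
    return w;
}
```

An advertised window of 32767 is read correctly, while 40000 is seen as -25536, which is why the workaround caps the window at MAX_TCP_WINDOW.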
    

    Computing cur_win, what remains of the currently advertised window.

    /* 
     * Compute the actual receive window we are currently advertising.
     * rcv_nxt can be after the window if our peer push more data than
     * the offered window.
     */
    static inline u32 tcp_receive_window (const struct tcp_sock *tp)
    {
        s32 win = tp->rcv_wup + tp->rcv_wnd - tp->rcv_nxt;
     
        if (win < 0)
            win = 0;
    
        return (u32) win;
    }

    In detail:

    This is calculated as the last advertised window minus unacknowledged data length:

    tp->rcv_wnd - (tp->rcv_nxt - tp->rcv_wup)

    tp->rcv_wup is synced with the next byte to be received (tp->rcv_nxt) only when we are sending an ACK in tcp_select_window(). If there are no unacknowledged bytes, the routine returns exactly the receive window advertised last.
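A small user-space model makes the arithmetic concrete, including the case where the sequence numbers wrap around 2^32. receive_window_model() is an illustrative name; the conversion to a signed 32-bit value mirrors the s32 in the kernel code and assumes the usual two's-complement behavior of mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of tcp_receive_window() on plain integers.
 * rcv_wup: value of rcv_nxt when the window was last advertised
 * rcv_wnd: window advertised at that time
 * rcv_nxt: next sequence number expected now */
static uint32_t receive_window_model(uint32_t rcv_wup, uint32_t rcv_wnd,
                                     uint32_t rcv_nxt)
{
    int32_t win;

    /* A scaled window never exceeds 65535 << 14. */
    assert(rcv_wnd <= (65535u << 14));

    /* Unsigned arithmetic wraps modulo 2^32, so sequence wraparound is
     * handled naturally; the signed view exposes an overrun as negative. */
    win = (int32_t)(rcv_wup + rcv_wnd - rcv_nxt);

    return win < 0 ? 0 : (uint32_t)win;
}
```

With rcv_wup = 1000, rcv_wnd = 5000 and rcv_nxt = 3000, 2000 bytes have arrived since the last advertisement and 3000 bytes of window remain. If the peer overruns the window (rcv_nxt = 7000), the result is clamped to 0.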

    Computing the new receive window size new_win. This is the key function, and it shows the role rcv_ssthresh plays.

    /* 
     * calculate the new window to be advertised.
     */
    u32 __tcp_select_window(struct sock *sk)
    {
        struct inet_connection_sock *icsk = inet_csk(sk);
        struct tcp_sock *tp = tcp_sk(sk);
     
        /* MSS for the peer's data. Previous versions used mss_clamp here.
         * I don't know if the value based on our guesses of peer's MSS is better
         * for the performance. It's more correct but may be worse for the performance
         * because of rcv_mss fluctuations. -- SAW 1998/11/1
         */
        int mss = icsk->icsk_ack.rcv_mss; /* estimate of the MSS the peer is currently using, not the maximum */
        int free_space = tcp_space(sk); /* 3/4 of the remaining receive buffer */
        int full_space = min_t(int, tp->window_clamp, tcp_full_space(sk)); /* upper bound: window_clamp or 3/4 of the whole receive buffer */
        int window;
     
        if (mss > full_space)
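            mss = full_space; /* shrink mss, the receive buffer is too small */
     
        /* Receive buffer is more than half full: time to be careful. */
        if (free_space < (full_space >> 1)) {
            icsk->icsk_ack.quick = 0; /* reset the number of ACKs that may be sent in quick mode */
     
            if (tcp_memory_pressure) /* under memory pressure, cap rcv_ssthresh at 4 * advmss (5840 bytes when advmss is 1460) */
                tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);
    
            if (free_space < mss) /* not enough room left for one mss of data */
                return 0;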
        }
     
        if (free_space > tp->rcv_ssthresh)
            /* Look! Never exceed the current receive window threshold, so the receive window grows smoothly. */
            free_space = tp->rcv_ssthresh;  
    
        /* Don't do rounding if we are using window scaling, since the scaled window will
         * not line up with the MSS boundary anyway.
         */
        window = tp->rcv_wnd;
        if (tp->rx_opt.rcv_wscale) { /* 接收窗口扩大因子不为零*/
            window = free_space;
    
            /* Advertise enough space so that it won't get scaled away.
             * Important case: prevent zero window announcement if 1 << rcv_wscale > mss.
             * Round up, so that the right shift does not make the advertised window too small.
             */
            if (((window >> tp->rx_opt.rcv_wscale) << tp->rx_opt.rcv_wscale) != window)
                window =(((window >> tp->rx_opt.rcv_wscale) + 1) << tp->rx_opt.rcv_wscale);
    
        } else {
            /* Get the largest window that is a nice multiple of mss.
             * Window clamp already applied above.
             * If our current window offering is within 1 mss of the free space we just keep it.
             * This prevents the divide and multiply from happening most of the time.
             * We also don't do any window rounding when the free space is too small.
             */
            /* Round free_space down to a whole number of mss when rcv_wnd is
             * at least one mss below free_space, or above it. */
            if (window <= free_space - mss || window > free_space) 
                window = (free_space / mss) * mss;
            /* If the buffer is so small that mss was clamped to full_space, use free_space directly. */
            else if (mss == full_space && free_space > window + (full_space >> 1))
                window = free_space;
            /* When free_space - mss < window <= free_space, keep rcv_wnd unchanged. */
        }    
    
        return window;
    }

    /* 3/4 of the remaining receive buffer.
     * Note: caller must be prepared to deal with negative returns.
     */
    static inline int tcp_space (const struct sock *sk)
    {
        return tcp_win_from_space(sk->sk_rcvbuf - atomic_read(&sk->sk_rmem_alloc));
    }
    
    static inline int tcp_win_from_space(int space)
    {
        return sysctl_tcp_adv_win_scale <= 0 ? (space >> (-sysctl_tcp_adv_win_scale)) :
            space - (space >> sysctl_tcp_adv_win_scale);
    }
    
    /* 3/4 of the total receive buffer */
    static inline int tcp_full_space(const struct sock *sk)
    {
        return tcp_win_from_space(sk->sk_rcvbuf);
    }
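The "3/4" in these comments comes from tcp_win_from_space() with a tcp_adv_win_scale of 2, which is assumed here to be the default for this kernel version. The sketch below replays the same formula on plain integers; win_from_space_model() is an illustrative name, and negative space (which the kernel may see) is excluded to keep the shifts well defined.

```c
#include <assert.h>

/* Illustrative copy of the tcp_win_from_space() formula.
 * space:         buffer space in bytes (non-negative in this model)
 * adv_win_scale: stands in for sysctl_tcp_adv_win_scale */
static int win_from_space_model(int space, int adv_win_scale)
{
    assert(space >= 0);
    assert(adv_win_scale > -31 && adv_win_scale < 31);

    /* scale <= 0: the window gets space / 2^(-scale).
     * scale >  0: the window gets space minus space / 2^scale. */
    return adv_win_scale <= 0 ? (space >> (-adv_win_scale))
                              : space - (space >> adv_win_scale);
}
```

For an 87380-byte buffer, a scale of 2 leaves 65535 bytes for the window (3/4 of the buffer), a scale of 1 leaves 43690 (1/2), and a scale of -2 leaves 21845 (1/4).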

    Overall, the new receive window size is 3/4 of the remaining receive buffer, but no larger than the receive window threshold rcv_ssthresh.
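The clamping and the two rounding rules can be condensed into a short model. This is a deliberately reduced sketch, not the kernel function: round_window_model() is a hypothetical name, and the half-full check, the memory pressure handling, and the "keep rcv_wnd if it is within one mss" shortcut are all omitted.

```c
#include <assert.h>

/* Reduced model of the new-window calculation.
 * free_space:   3/4 of the remaining receive buffer
 * rcv_ssthresh: current receive window threshold
 * mss:          estimated MSS of the peer
 * wscale:       receive window scale shift (0..14) */
static int round_window_model(int free_space, int rcv_ssthresh, int mss,
                              unsigned int wscale)
{
    int window;

    assert(free_space >= 0 && rcv_ssthresh >= 0 && mss > 0 && wscale <= 14);

    /* The window never exceeds the threshold. */
    window = free_space < rcv_ssthresh ? free_space : rcv_ssthresh;

    if (wscale) {
        /* With scaling: round up to a multiple of 1 << wscale, so the
         * right shift applied later does not lose the remainder. */
        int unit = 1 << wscale;
        window = (window + unit - 1) / unit * unit;
    } else {
        /* Without scaling: round down to a whole number of mss. */
        window = window / mss * mss;
    }
    return window;
}
```

With 10000 bytes free, a threshold of 29200 and an mss of 1460, the unscaled result is 8760 (6 segments), while a scale shift of 7 gives 10112. With 800 bytes free, an mss of 536 and a shift of 10, rounding up gives 1024, so the scaled header field is 1 rather than 0.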

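    Summary

    Adjusting the receive window mainly involves:

    (1) Adjusting window_clamp and sk_rcvbuf, analyzed in the earlier blog post "Dynamic Adjustment of the TCP Receive Buffer Size" (《TCP接收缓存大小的动态调整》).

    (2) Dynamically adjusting rcv_ssthresh, the current receive window threshold, which usually grows by 2*advmss.

    (3) Dynamically adjusting rcv_wnd, the receive window, which is usually min(3/4 free space in sk_rcvbuf, rcv_ssthresh).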

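    If there is plenty of receive buffer left, rcv_wnd is limited by rcv_ssthresh. In that case rcv_wnd grows by 2920 bytes (2*advmss with an advmss of 1460; the value may fluctuate because of scaling) each time a large packet is received. This works much like slow start: the receive window grows exponentially.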
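A toy model shows why a fixed increment per segment leads to fast growth. It assumes every received segment is large enough to earn the full 2*advmss increment and that the threshold is capped at window_clamp; grow_ssthresh_model() and its parameters are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: each large segment raises the threshold by 2 * advmss,
 * and the threshold never exceeds the clamp. */
static uint32_t grow_ssthresh_model(uint32_t ssthresh, uint32_t advmss,
                                    uint32_t clamp, unsigned int segments)
{
    assert(advmss > 0 && ssthresh <= clamp);

    while (segments--) {
        ssthresh += 2 * advmss;
        if (ssthresh >= clamp) {
            ssthresh = clamp;   /* the cap has been reached */
            break;
        }
    }
    return ssthresh;
}
```

Starting from 14600 bytes with an advmss of 1460 and a clamp of 65535, one segment raises the threshold to 17520. A full window of 10 segments raises it to 43800, so in this model the window grows by a multiple of itself per round trip until it hits the clamp.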

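    The receive window cannot grow without bound, of course. Once it reaches a certain size it is limited by several factors, such as window_clamp and sk_rcvbuf, or the remaining receive buffer space.

    When the application does not read from the receive buffer fast enough, or when packets are lost, the receive window shrinks, mainly because it is limited by the remaining receive buffer space.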

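    In short, the receive window adjustment algorithm involves several variables that are themselves changing dynamically, which makes it fairly complex to analyze. The author still needs to study it in more depth :)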

  • Original URL: https://www.cnblogs.com/aiwz/p/6333354.html