tcp: be more careful in tcp_fragment() [Linux 4.4.189]

The Linux kernel change "tcp: be more careful in tcp_fragment()" is included in the Linux 4.4.189 release. It was authored by Eric Dumazet <edumazet [at] google.com> on Tue Aug 6 17:09:14 2019 +0200. The commit in the Linux stable tree is 5917ca4 (patch), which is a backport of upstream commit b617158. The same upstream change may also have been applied to other maintained Linux releases.

tcp: be more careful in tcp_fragment()

[ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ]

Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.

We should allow these flows to make progress.

This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.

It also adds some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15.

Note for < 4.15 backports :
 tcp_rtx_queue_tail() will probably look like :

static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
    struct sk_buff *skb = tcp_send_head(sk);

    return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}

Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
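
The allowance described in the commit message can be checked with a little arithmetic. The sketch below is a user-space model, not kernel code: it compares the threshold that the 4.4.y check used before this patch (sk_sndbuf + 0x20000) with the new one (sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE)). The per-skb overhead that SKB_TRUESIZE() adds depends on architecture and kernel configuration, so ASSUMED_SKB_OVERHEAD is a placeholder value chosen for illustration, and the helper names old_limit(), new_limit() and over_limit() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* GSO_MAX_SIZE is 65536 in kernels of this era (one full TSO skb, 64KB). */
#define GSO_MAX_SIZE 65536L

/* ASSUMPTION: stand-in for the overhead SKB_TRUESIZE() adds on top of the
 * payload (aligned struct sk_buff plus struct skb_shared_info). The real
 * number varies by architecture and config; 576 is only illustrative.
 */
#define ASSUMED_SKB_OVERHEAD 576L

/* Threshold used by the 4.4.y check before this patch. */
static inline long old_limit(long sndbuf)
{
	return sndbuf + 0x20000;
}

/* Threshold after this patch, modeled with the assumed overhead. */
static inline long new_limit(long sndbuf)
{
	return sndbuf + 2 * (GSO_MAX_SIZE + ASSUMED_SKB_OVERHEAD);
}

/* True when the queued memory alone would trip the size check. */
static inline bool over_limit(long wmem_queued, long limit)
{
	assert(wmem_queued >= 0);
	return (wmem_queued >> 1) > limit;
}
```

Under these assumptions, a send buffer of 4608 bytes gives old_limit() = 135680 and new_limit() = 136832. Since 0x20000 is exactly 2 * 64KB, the two thresholds differ only by twice the assumed per-skb overhead, which matches the note that the allowance was already present in the < 4.15 backports.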

This change adds or deletes 28 lines of Linux source code. The code changes are as follows.

 include/net/tcp.h     | 17 +++++++++++++++++
 net/ipv4/tcp_output.c | 11 ++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 77438a8..0410fd2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1526,6 +1526,23 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli
        tcp_sk(sk)->highest_sack = NULL;
 }

+static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
+{
+   struct sk_buff *skb = tcp_write_queue_head(sk);
+
+   if (skb == tcp_send_head(sk))
+       skb = NULL;
+
+   return skb;
+}
+
+static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
+{
+   struct sk_buff *skb = tcp_send_head(sk);
+
+   return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
+}
+
 static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
 {
    __skb_queue_tail(&sk->sk_write_queue, skb);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 53edd60..76ffce0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1151,6 +1151,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
    struct tcp_sock *tp = tcp_sk(sk);
    struct sk_buff *buff;
    int nsize, old_factor;
+   long limit;
    int nlen;
    u8 flags;

@@ -1161,7 +1162,15 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
    if (nsize < 0)
        nsize = 0;

-   if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 0x20000)) {
+   /* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb.
+    * We need some allowance to not penalize applications setting small
+    * SO_SNDBUF values.
+    * Also allow first and last skb in retransmit queue to be split.
+    */
+   limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
+   if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
+            skb != tcp_rtx_queue_head(sk) &&
+            skb != tcp_rtx_queue_tail(sk))) {
        NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
        return -ENOMEM;
    }
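
The effect of the two new helpers on the check can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: the write queue is modeled as an array of n skbs in which the entries before send_head have already been sent (the retransmit part) and the rest are unsent. The struct wq_model type, the NO_SKB marker, and the functions rtx_head(), rtx_tail() and may_fragment() are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the write queue: indices [0, send_head) are sent
 * skbs awaiting ACK, indices [send_head, n) are unsent. send_head == n
 * means there is no unsent data (a NULL send head in the kernel).
 */
struct wq_model {
	size_t n;
	size_t send_head;
};

#define NO_SKB ((size_t)-1)	/* models "no such skb" */

/* Models tcp_rtx_queue_head(): the first queued skb, unless the queue is
 * empty or that skb is itself the send head (nothing has been sent yet).
 */
static inline size_t rtx_head(const struct wq_model *q)
{
	assert(q->send_head <= q->n);
	return (q->n == 0 || q->send_head == 0) ? NO_SKB : 0;
}

/* Models tcp_rtx_queue_tail(): the skb just before the send head, or the
 * last queued skb when there is no unsent data.
 */
static inline size_t rtx_tail(const struct wq_model *q)
{
	assert(q->send_head <= q->n);
	if (q->send_head < q->n)
		return q->send_head == 0 ? NO_SKB : q->send_head - 1;
	return q->n == 0 ? NO_SKB : q->n - 1;
}

/* Models the patched condition: a split is refused only when the queue is
 * over the limit AND the skb is neither the first nor the last skb of the
 * retransmit part. Returns false where the kernel returns -ENOMEM.
 */
static inline bool may_fragment(const struct wq_model *q, size_t idx,
				long wmem_queued, long limit)
{
	bool too_big = (wmem_queued >> 1) > limit;

	return !(too_big && idx != rtx_head(q) && idx != rtx_tail(q));
}
```

For a queue of five skbs with three already sent, the model allows skbs 0 and 2 to be split even when the memory limit is exceeded, while skb 1 is still refused. That is the progress guarantee the patch aims for: a retransmit of the oldest or newest sent segment is no longer blocked by the size check.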
