tcp: remove empty skb from write queue in error cases [Linux 4.19.72]

The Linux kernel change "tcp: remove empty skb from write queue in error cases" is included in the Linux 4.19.72 release. It was authored by Eric Dumazet <edumazet [at] google.com> on Mon Aug 26 09:19:15 2019 -0700. In the Linux stable tree the commit is 5977bc1, which is a backport of upstream commit fdfc5c8. The same upstream change may also have been applied to other maintained Linux releases.

tcp: remove empty skb from write queue in error cases

[ Upstream commit fdfc5c8594c24c5df883583ebd286321a80e0a67 ]

Vladimir Rutsky reported stuck TCP sessions after memory pressure
events. Edge Trigger epoll() user would never receive an EPOLLOUT
notification allowing them to retry a sendmsg().

Jason tested the case of sk_stream_alloc_skb() returning NULL,
but there are other paths that could lead both sendmsg() and sendpage()
to return -1 (EAGAIN), with an empty skb queued on the write queue.

This patch makes sure we remove this empty skb so that
Jason's code can detect that the queue is empty, and
call sk->sk_write_space(sk) accordingly.

Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
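
The commit message refers to edge-triggered epoll() users that wait for EPOLLOUT before retrying a send. The following user-space sketch shows that pattern in outline. It is an illustration only: it uses an AF_UNIX socketpair as a stand-in for a TCP connection so that it is self-contained, and the helper names (fill_until_eagain, drain, et_epollout_after_drain) are hypothetical and do not come from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send until the kernel reports EAGAIN.
 * Returns the number of bytes queued, or -1 on a hard error. */
long fill_until_eagain(int fd)
{
    char buf[4096] = {0};
    long total = 0;

    for (;;) {
        ssize_t n = send(fd, buf, sizeof(buf), MSG_NOSIGNAL);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;
        return -1;
    }
}

/* Hypothetical helper: read everything currently pending on fd. */
long drain(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        if (n <= 0)
            return total;
        total += n;
    }
}

/* Returns 1 if an EPOLLOUT edge arrives once the peer has drained the
 * socket, 0 if it does not arrive within one second, -1 on setup error. */
int et_epollout_after_drain(void)
{
    int sv[2];
    struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };
    struct epoll_event out;
    int ep, n, result;
    long queued;

    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) < 0)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0)
        return -1;
    ev.data.fd = sv[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        return -1;

    /* Sender side: write until the socket buffer is full. */
    queued = fill_until_eagain(sv[0]);
    if (queued < 0)
        return -1;
    assert(queued > 0); /* a fresh socket accepts at least some data */

    /* Discard any edge that was already pending from registration. */
    while (epoll_wait(ep, &out, 1, 0) > 0)
        ;

    /* Peer side: free up buffer space. The sender now depends on a
     * new EPOLLOUT edge to learn that it may retry the send. */
    drain(sv[1]);

    n = epoll_wait(ep, &out, 1, 1000);
    result = (n == 1 && (out.events & EPOLLOUT)) ? 1 : 0;

    close(ep);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

A sender written this way only retries after epoll_wait() reports EPOLLOUT. If that edge is never generated, which is the situation the commit message describes for TCP sockets after memory pressure, the sender waits indefinitely.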

This change adds or deletes a total of 29 lines of Linux source code. The code changes are as follows.

 net/ipv4/tcp.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b7ef367..611ba17 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -934,6 +934,22 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
    return mss_now;
 }

+/* In some cases, both sendpage() and sendmsg() could have added
+ * an skb to the write queue, but failed adding payload on it.
+ * We need to remove it to consume less memory, but more
+ * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
+ * users.
+ */
+static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
+{
+   if (skb && !skb->len) {
+       tcp_unlink_write_queue(skb, sk);
+       if (tcp_write_queue_empty(sk))
+           tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+       sk_wmem_free_skb(sk, skb);
+   }
+}
+
 ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
             size_t size, int flags)
 {
@@ -1056,6 +1072,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
    return copied;

 do_error:
+   tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk));
    if (copied)
        goto out;
 out_err:
@@ -1409,17 +1426,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
    sock_zerocopy_put(uarg);
    return copied + copied_syn;

+do_error:
+   skb = tcp_write_queue_tail(sk);
 do_fault:
-   if (!skb->len) {
-       tcp_unlink_write_queue(skb, sk);
-       /* It is the one place in all of TCP, except connection
-        * reset, where we can be unlinking the send_head.
-        */
-       tcp_check_send_head(sk, skb);
-       sk_wmem_free_skb(sk, skb);
-   }
+   tcp_remove_empty_skb(sk, skb);

-do_error:
    if (copied + copied_syn)
        goto out;
 out_err:
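
The effect of the new helper can be sketched with a toy user-space model. This is not kernel code: struct toy_skb, struct toy_sock, and all toy_* functions are hypothetical stand-ins, the model tracks only the write queue, and the write_space_calls counter merely stands in for the sk->sk_write_space(sk) call mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct sk_buff: only the payload length matters here. */
struct toy_skb {
    size_t len;
    struct toy_skb *prev;
};

/* Toy stand-in for the socket and its write queue. */
struct toy_sock {
    struct toy_skb *tail;  /* newest buffer, NULL when the queue is empty */
    int write_space_calls; /* counts simulated write-space notifications */
};

bool toy_queue_empty(const struct toy_sock *sk)
{
    return sk->tail == NULL;
}

/* Append a new, still empty buffer to the tail of the queue. */
struct toy_skb *toy_queue_skb(struct toy_sock *sk)
{
    struct toy_skb *skb = calloc(1, sizeof(*skb));

    if (!skb)
        return NULL;
    skb->prev = sk->tail;
    sk->tail = skb;
    return skb;
}

/* Same shape as the check in the patch: unlink and free the buffer
 * only if it exists and carries no payload. */
void toy_remove_empty_skb(struct toy_sock *sk, struct toy_skb *skb)
{
    assert(!skb || skb == sk->tail); /* the model only removes the tail */
    if (skb && !skb->len) {
        sk->tail = skb->prev;
        free(skb);
    }
}

/* Simulate a send that queues a buffer and then fails before any
 * payload is copied. Returns -1 to mimic the EAGAIN path. */
int toy_failed_send(struct toy_sock *sk, bool apply_fix)
{
    if (!toy_queue_skb(sk))
        return -1;
    /* The payload copy "fails" here, so the new buffer keeps len == 0. */
    if (apply_fix)
        toy_remove_empty_skb(sk, sk->tail);
    /* The notification is only sent when the queue is seen as empty. */
    if (toy_queue_empty(sk))
        sk->write_space_calls++;
    return -1;
}
```

In this model, a failed send without the cleanup leaves one zero-length buffer on the queue and no notification is counted. With the cleanup the queue is empty again and the counter is incremented once.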
