perf/ring_buffer: Fix exposing a temporarily decreased data_head [Linux 4.14.129]

This Linux kernel change, "perf/ring_buffer: Fix exposing a temporarily decreased data_head", is included in the Linux 4.14.129 release. It was authored by Yabin Cui <yabinc [at]> on Fri May 17 13:52:31 2019 +0200. The corresponding commit in the Linux stable tree is 9e2de43, backported from upstream commit 1b038c6. The same upstream change may also have been applied to other maintained Linux releases.

perf/ring_buffer: Fix exposing a temporarily decreased data_head

[ Upstream commit 1b038c6e05ff70a1e66e3e571c2e6106bdb75f53 ]

In perf_output_put_handle(), an IRQ/NMI can happen at the location below and
write records to the same ring buffer:

    ...                          <-- an IRQ/NMI can happen here
    rb->user_page->data_head = head;

In this case, a value A is written to data_head in the IRQ, and then a value
B is written to data_head after the IRQ, where A > B. As a result,
data_head is temporarily decreased from A to B, and a reader that reads the
buffer frequently enough may see data_head < data_tail, which leads to
unexpected behavior.

This can be fixed by moving dec(&rb->nest) to after updating data_head,
which prevents the IRQ/NMI above from updating data_head.

[ Split up by peterz. ]

Signed-off-by: Yabin Cui <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Fixes: ef60777c9abd ("perf: Optimize the perf_output() path by removing IRQ-disables")
Link: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>

This change touches 24 lines of Linux source code (20 insertions, 4 deletions). The changes to the Linux kernel are as follows.

 kernel/events/ring_buffer.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 489dc6b..fde8532 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -52,11 +52,18 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	head = local_read(&rb->head);
 
 	/*
-	 * IRQ/NMI can happen here, which means we can miss a head update.
+	 * IRQ/NMI can happen here and advance @rb->head, causing our
+	 * load above to be stale.
 	 */
 
-	if (!local_dec_and_test(&rb->nest))
+	/*
+	 * If this isn't the outermost nesting, we don't have to update
+	 * @rb->user_page->data_head.
+	 */
+	if (local_read(&rb->nest) > 1) {
+		local_dec(&rb->nest);
 		goto out;
+	}
 
 	/*
 	 * Since the mmap() consumer (userspace) can run on a different CPU:
@@ -88,9 +95,18 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	rb->user_page->data_head = head;
 
 	/*
-	 * Now check if we missed an update -- rely on previous implied
-	 * compiler barriers to force a re-read.
+	 * We must publish the head before decrementing the nest count,
+	 * otherwise an IRQ/NMI can publish a more recent head value and our
+	 * write will (temporarily) publish a stale value.
 	 */
+	barrier();
+	local_set(&rb->nest, 0);
+
+	/*
+	 * Ensure we decrement @rb->nest before we validate the @rb->head.
+	 * Otherwise we cannot be sure we caught the 'last' nested update.
+	 */
+	barrier();
 	if (unlikely(head != local_read(&rb->head))) {
 		local_inc(&rb->nest);
 		goto again;