perf/core: Fix pmu::filter_match for SW-led groups

This change “perf/core: Fix pmu::filter_match for SW-led groups” in Linux kernel is authored by Mark Rutland <mark.rutland [at]> on Tue Jun 14 16:10:41 2016 +0100.

perf/core: Fix pmu::filter_match for SW-led groups

The following commit:

  66eb579e66ec ("perf: allow for PMU-specific event filtering")

added the pmu::filter_match() callback. This was intended to
avoid HW constraints on events from resulting in extremely
pessimistic scheduling.

However, pmu::filter_match() is only called for the leader of each event
group. When the leader is a SW event, we do not filter the groups, and
may fail at pmu::add() time, and when this happens we'll give up on
scheduling any event groups later in the list until they are rotated
ahead of the failing group.

This can result in extremely sub-optimal event scheduling behaviour,
e.g. if running the following on a big.LITTLE platform:

$ taskset -c 0 ./perf stat 
 -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' 
 -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' 

     <not counted>      context-switches                                              (0.00%)
     <not counted>      armv8_cortex_a57/config=0x11/                                 (0.00%)
                24      context-switches                                              (37.36%)
          57589154      armv8_cortex_a53/config=0x11/                                 (37.36%)

Here the 'a53' event group was always eligible to be scheduled, but
the 'a57' group never eligible to be scheduled, as the task was always
affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57'
group was eligible, but the HW event failed at pmu::add() time,
resulting in ctx_flexible_sched_in giving up on scheduling further
groups with HW events.

One way of avoiding this is to check pmu::filter_match() on siblings
as well as the group leader. If any of these fail their
pmu::filter_match() call, we must skip the entire group before
attempting to add any events.

Signed-off-by: Mark Rutland <>
Signed-off-by: Peter Zijlstra (Intel) <>
Cc: Alexander Shishkin <>
Cc: Arnaldo Carvalho de Melo <>
Cc: Arnaldo Carvalho de Melo <>
Cc: Jiri Olsa <>
Cc: Linus Torvalds <>
Cc: Peter Zijlstra <>
Cc: Thomas Gleixner <>
Cc: Will Deacon <>
Fixes: 66eb579e66ec ("perf: allow for PMU-specific event filtering")
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <>

This Linux change may have been applied to various maintained Linux releases and you can find Linux releases including commit 2c81a64.

There are 23 lines of Linux source code added/deleted in this change. Code changes to Linux kernel are as follows.

 kernel/events/core.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 85cd418..43d43a2d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1678,12 +1678,33 @@ static bool is_orphaned_event(struct perf_event *event)
 	return event->state == PERF_EVENT_STATE_DEAD;
-static inline int pmu_filter_match(struct perf_event *event)
+static inline int __pmu_filter_match(struct perf_event *event)
 	struct pmu *pmu = event->pmu;
 	return pmu->filter_match ? pmu->filter_match(event) : 1;
+ * Check whether we should attempt to schedule an event group based on
+ * PMU-specific filtering. An event group can consist of HW and SW events,
+ * potentially with a SW leader, so we must check all the filters, to
+ * determine whether a group is schedulable:
+ */
+static inline int pmu_filter_match(struct perf_event *event)
+	struct perf_event *child;
+	if (!__pmu_filter_match(event))
+		return 0;
+	list_for_each_entry(child, &event->sibling_list, group_entry) {
+		if (!__pmu_filter_match(child))
+			return 0;
+	}
+	return 1;
 static inline int
 event_filter_match(struct perf_event *event)

The commit for this change in Linux stable tree is 2c81a64 (patch).

Leave a Reply

Your email address will not be published. Required fields are marked *