From: Peter Zijlstra
Date: Wed, 22 Jun 2016 13:14:26 +0000 (+0200)
Subject: sched/fair: Initialize and rework throttle_count for new task-groups
X-Git-Tag: v4.1.12-124.31.3~881
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=f2b8303c04f14cd682bc415ea7dc041b864a53ff;p=users%2Fjedix%2Flinux-maple.git

sched/fair: Initialize and rework throttle_count for new task-groups

This patch is a combination of the following three patches from mainline:

094f469172e0 sched/fair: Initialize throttle_count for new task-groups lazily

    A cgroup created inside a throttled group must inherit the current
    throttle_count. A broken throttle_count allows throttled entries to be
    nominated as the next buddy, which later leads to a NULL pointer
    dereference in pick_next_task_fair().

    This patch initializes cfs_rq->throttle_count at first enqueue:
    laziness allows us to skip locking all rq's at group creation. The
    lazy approach also makes it possible to skip a full sub-tree scan
    when throttling a hierarchy (not in this patch).

8663e24d56dc sched/fair: Reorder cgroup creation code

    A future patch needs rq->lock held _after_ we link the task_group
    into the hierarchy. In order to avoid taking every rq->lock twice,
    reorder things a little and create online_fair_sched_group() to be
    called after we link the task_group.

    All this code is still run from css_alloc(), so css_online() isn't in
    fact used for this.

55e16d30bd99 sched/fair: Rework throttle_count sync

    Since we already take rq->lock when creating a cgroup, use it to also
    sync the throttle_count and avoid the extra state and the enqueue
    path branch.

Signed-off-by: Peter Zijlstra (Intel)
Cc: Konstantin Khlebnikov
Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: bsegall@google.com
Cc: linux-kernel@vger.kernel.org
[ Fixed build warning. ]
Signed-off-by: Ingo Molnar

The patches have been combined because applying them separately would
cause a KABI breakage and introduce a dummy function.
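For illustration only, the following stand-alone user-space sketch models the
inheritance rule that sync_throttle() enforces per CPU: a child group created
under a throttled parent must start with the parent's throttle_count rather
than zero. The struct model_cfs_rq type and the model_* helpers are
hypothetical names invented for this sketch; they are not part of the patch
or of the kernel.

    /* Simplified model of sync_throttle()'s inheritance rule (not kernel code). */
    #include <stdio.h>

    struct model_cfs_rq {
            int throttle_count;             /* nesting depth of throttled ancestors */
            struct model_cfs_rq *parent;    /* NULL for the root group */
    };

    /* Mirror of what sync_throttle() does for one CPU's cfs_rq. */
    static void model_sync_throttle(struct model_cfs_rq *child)
    {
            if (!child->parent)
                    return;
            child->throttle_count = child->parent->throttle_count;
    }

    int main(void)
    {
            struct model_cfs_rq root   = { .throttle_count = 0, .parent = NULL };
            struct model_cfs_rq parent = { .throttle_count = 1, .parent = &root };  /* throttled */
            struct model_cfs_rq child  = { .throttle_count = 0, .parent = &parent };

            /* Without the sync, the child wrongly looks unthrottled (count 0). */
            printf("before sync: child throttle_count = %d\n", child.throttle_count);

            model_sync_throttle(&child);
            printf("after  sync: child throttle_count = %d\n", child.throttle_count);
            return 0;
    }

In the real patch this copy happens under rq->lock in online_fair_sched_group(),
after the task_group has been linked into the hierarchy, as shown in the diff
below.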
Orabug: 27787518

Conflicts:
	kernel/sched/fair.c
	kernel/sched/core.c
	kernel/sched/sched.h

Signed-off-by: Gayatri Vasudevan
Reviewed-by: Mridula Shastry
---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index db133e1c96ab..32e91382ee1e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7549,6 +7549,8 @@ void sched_online_group(struct task_group *tg, struct task_group *parent)
 	INIT_LIST_HEAD(&tg->children);
 	list_add_rcu(&tg->siblings, &parent->children);
 	spin_unlock_irqrestore(&task_group_lock, flags);
+
+	online_fair_sched_group(tg);
 }
 
 /* rcu callback to free various structures associated with a task group */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e25be521f06f..34a64a85c753 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3846,6 +3846,23 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
 		throttle_cfs_rq(cfs_rq);
 }
 
+static void sync_throttle(struct task_group *tg, int cpu)
+{
+	struct cfs_rq *pcfs_rq, *cfs_rq;
+
+	if (!cfs_bandwidth_used())
+		return;
+
+	if (!tg->parent)
+		return;
+
+	cfs_rq = tg->cfs_rq[cpu];
+	pcfs_rq = tg->parent->cfs_rq[cpu];
+
+	cfs_rq->throttle_count = pcfs_rq->throttle_count;
+	pcfs_rq->throttled_clock_task = rq_clock_task(cpu_rq(cpu));
+}
+
 /* conditionally throttle active cfs_rq's from put_prev_entity() */
 static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
@@ -3998,6 +4015,7 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
 static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec) {}
 static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq) { return false; }
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
+static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
@@ -8275,6 +8293,20 @@ err:
 	return 0;
 }
 
+void online_fair_sched_group(struct task_group *tg)
+{
+	struct rq *rq;
+	int i;
+
+	for_each_possible_cpu(i) {
+		rq = cpu_rq(i);
+
+		raw_spin_lock_irq(&rq->lock);
+		sync_throttle(tg, i);
+		raw_spin_unlock_irq(&rq->lock);
+	}
+}
+
 void unregister_fair_sched_group(struct task_group *tg, int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -8371,6 +8403,8 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 	return 1;
 }
 
+void online_fair_sched_group(struct task_group *tg) { }
+
 void unregister_fair_sched_group(struct task_group *tg, int cpu) { }
 
 #endif /* CONFIG_FAIR_GROUP_SCHED */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b5116cc26655..0602d6513c77 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -314,6 +314,7 @@ extern int tg_nop(struct task_group *tg, void *data);
 extern void free_fair_sched_group(struct task_group *tg);
 extern int alloc_fair_sched_group(struct task_group *tg,
 			struct task_group *parent);
+extern void online_fair_sched_group(struct task_group *tg);
 extern void unregister_fair_sched_group(struct task_group *tg, int cpu);
 extern void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
 			struct sched_entity *se, int cpu,