The current task filter checks task->pid, which is different for each
thread, so only the thread whose ID matches the target passes the
filter.  But we want to profile all the threads in the process, so
compare the process ID (i.e. the thread-group ID, tgid) instead.
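
As an illustration only (not part of this patch), the effect of the two
keys can be sketched in plain user-space C.  The struct task_ids type,
the helper names and the sample IDs are hypothetical; the real filter
runs in BPF and looks the key up in the task_filter map.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two task_struct fields the filter reads. */
struct task_ids {
	unsigned int pid;   /* per-thread ID */
	unsigned int tgid;  /* thread-group (process) ID, shared by all threads */
};

/* Pick the lookup key the same way the patched filter does. */
static unsigned int filter_key(const struct task_ids *t, bool uses_tgid)
{
	return uses_tgid ? t->tgid : t->pid;
}

/* Count how many tasks would pass a filter holding a single target ID. */
static size_t count_matches(const struct task_ids *tasks, size_t n,
			    unsigned int target, bool uses_tgid)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (filter_key(&tasks[i], uses_tgid) == target)
			hits++;
	return hits;
}
```

For a made-up process 100 with threads 100, 101 and 102, filtering on
target 100 matches one task when keyed by pid and all three when keyed
by tgid.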

Before:
  $ sudo perf record --off-cpu -- perf bench sched messaging -t
  $ sudo perf report --stat | grep -A1 offcpu
  offcpu-time stats:
            SAMPLE events:        2

After:
  $ sudo perf record --off-cpu -- perf bench sched messaging -t
  $ sudo perf report --stat | grep -A1 offcpu
  offcpu-time stats:
            SAMPLE events:      850

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220811185456.194721-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
                u8 val = 1;
 
                skel->bss->has_task = 1;
+               skel->bss->uses_tgid = 1;
                fd = bpf_map__fd(skel->maps.task_filter);
                pid = perf_thread_map__pid(evlist->core.threads, 0);
                bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
 
 int has_cpu = 0;
 int has_task = 0;
 int has_cgroup = 0;
+int uses_tgid = 0;
 
 const volatile bool has_prev_state = false;
 const volatile bool needs_cgroup = false;
 
        if (has_task) {
                __u8 *ok;
-               __u32 pid = t->pid;
+               __u32 pid;
+
+               if (uses_tgid)
+                       pid = t->tgid;
+               else
+                       pid = t->pid;
 
                ok = bpf_map_lookup_elem(&task_filter, &pid);
                if (!ok)