]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
8 years agoperf bpf: Tighten detection of BPF events
Andi Kleen [Fri, 11 Aug 2017 23:26:19 +0000 (16:26 -0700)]
perf bpf: Tighten detection of BPF events

  perf stat -e cpu/uops_executed.core,cmask=1/

would be detected as a BPF source event because the .c matches the .c
source BPF pattern.

v2:

Originally I tried to use lex lookahead, but it doesn't seem to work.

This now extends the BPF pattern to match longer events, but then does
an extra check in the C code to reject BPF matches that do not end with
.c/.o/.obj

This uses REJECT, which makes the flex scanner slower, but that
shouldn't be a big problem for the perf events.

Committer testing:

  # perf trace -e write -e /home/acme/bpf/tracepoint.c cat /etc/passwd > /dev/null
     0.000 ( 0.006 ms): cat/18485 write(fd: 1, buf: 0x7f59eebe1000, count: 3494                         ) ...
     0.006 (         ): raw_syscalls:sys_enter:NR 1 (1, 7f59eebe1000, da6, 22, 7f59eebe0010, 0))
     0.008 (         ): perf_bpf_probe:_write:(ffffffff9626b2c0))
     0.000 ( 0.010 ms): cat/18485  ... [continued]: write()) = 3494
  #

It continues doing what was expected, i.e. identifying
/home/acme/bpf/tracepoint.c as a BPF event and activates the clang
machinery to build an eBPF object and then uses sys_bpf() to hook it up
to the raw_syscalls:sys_enter tracepoint, etc.

Andi forgot to add Wang to the CC list, fix it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170811232634.30465-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf evsel: Fix buffer overflow while freeing events
Andi Kleen [Fri, 11 Aug 2017 23:26:17 +0000 (16:26 -0700)]
perf evsel: Fix buffer overflow while freeing events

Fix buffer overflow for:

  % perf stat -e msr/tsc/,cstate_core/c7-residency/ true

that causes glibc free list corruption. For some reason it doesn't
trigger in valgrind, but it is visible in AS:

  =================================================================
  ==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0
  READ of size 4 at 0x603000003f5c thread T0
    #0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196
    #1 0x56c57a in perf_evsel__close util/evsel.c:1717
    #2 0x55ed5f in perf_evlist__close util/evlist.c:1631
    #3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749
    #4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
    #5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
    #6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
    #7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
    #8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
    #9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
    #10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
    #11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419)

  0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c)
  allocated by thread T0 here:
    #0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020)
    #1 0x648a2d in zalloc util/util.h:23
    #2 0x648a88 in xyarray__new util/xyarray.c:9
    #3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039
    #4 0x56b427 in perf_evsel__open util/evsel.c:1529
    #5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730
    #6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263
    #7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600
    #8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
    #9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
    #10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
    #11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
    #12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
    #13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
    #14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)

The event is allocated with cpus == 1, but freed with cpus == real number
When the evsel close function walks the file descriptors it exceeds the
fd xyarray boundaries and reads random memory.

v2:

Now that xyarrays save their original dimensions we can use these to
iterate the two dimensional fd arrays. Fix some users (close, ioctl) in
evsel.c to use these fields directly. This allows simplifying the code
and dropping quite a few function arguments. Adjust all callers by
removing the unneeded arguments.

The actual perf event reading still uses the original values from the
evsel list.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-2-andi@firstfloor.org
[ Fix up xy_max_[xy]() -> xyarray__max_[xy]() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf xyarray: Save max_x, max_y
Andi Kleen [Fri, 11 Aug 2017 23:26:16 +0000 (16:26 -0700)]
perf xyarray: Save max_x, max_y

Save the original array dimensions in xyarrays, so that users can
retrieve them later. Add some inline functions to access these fields.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-1-andi@firstfloor.org
[ As noticed by Jiri, fix up namespacing: xy__method() -> xyarray__method() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoMerge tag 'perf-core-for-mingo-4.14-20170821' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Tue, 22 Aug 2017 10:16:39 +0000 (12:16 +0200)]
Merge tag 'perf-core-for-mingo-4.14-20170821' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

- Support --show-nr-samples in annotate's --stdio and --tui, using
  the existing 't' toggle to circulate 'percent', 'total-period' and
  'nr-samples' as the first column (Taeung Song)

- Support FCMask and PortMask in JSON vendor events (Andi Kleen)

- Fix off by one string allocation problem in 'perf trace' (Arnaldo Carvalho de Melo)

- Use just one parse events state struct in yyparse(), fixing one
  reported segfault when a routine received a different data struct,
  smaller than the one it expected to use (Arnaldo Carvalho de Melo)

- Remove unused cpu_relax() macros, they stopped being used when
  tools/perf lived in Documentation/ (Arnaldo Carvalho de Melo)

- Fix double file test in libbpf's Makefile (Daniel Díaz):

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf annotate browser: Circulate percent, total-period and nr-samples view
Taeung Song [Fri, 18 Aug 2017 08:47:08 +0000 (17:47 +0900)]
perf annotate browser: Circulate percent, total-period and nr-samples view

Using the existing 't' hotkey, support the three views: percent, total
period and number of samples on the annotate TUI browser, circulating
them like below:

  Percent -> Total Period -> Nr Samples -> Percent ...

Committer notes:

Removed new 'e' hotkey, should be resubmitted as a separate patch, with
proper justification for its inclusion.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Link: http://lkml.kernel.org/r/1503046028-5691-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf annotate browser: Support --show-nr-samples option
Taeung Song [Fri, 18 Aug 2017 08:47:03 +0000 (17:47 +0900)]
perf annotate browser: Support --show-nr-samples option

Support the --show-nr-samples in the TUI browser.

Committer notes:

Lift the restriction about --tui but leave it for --gtk:

  $ export LD_LIBRARY_PATH=~/lib64
  $ perf annotate --gtk --show-nr-samples --show-nr-samples is not available in --gtk mode at this time
  $

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1503046023-5646-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf annotate: Document --show-total-period option
Taeung Song [Fri, 18 Aug 2017 08:46:53 +0000 (17:46 +0900)]
perf annotate: Document --show-total-period option

When the --show-total-period option was introduced we forgot to add an
entry in the man page, fix it.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Fixes: 0c4a5bcea460 ("perf annotate: Display total number of samples with --show-total-period")
Link: http://lkml.kernel.org/r/1503046013-5555-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf annotate stdio: Support --show-nr-samples option
Taeung Song [Fri, 18 Aug 2017 08:46:48 +0000 (17:46 +0900)]
perf annotate stdio: Support --show-nr-samples option

Add --show-nr-samples option to "perf annotate" so that it matches "perf
report".

Committer note:

Note that it can't be used together with --show-total-period, which
seems like a silly limitation, that can be lifted at some point.

Made it bail out if not on --stdio.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1503046008-5511-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Use default CPUINFO_PROC where it fits
Arnaldo Carvalho de Melo [Thu, 17 Aug 2017 19:58:21 +0000 (16:58 -0300)]
perf tools: Use default CPUINFO_PROC where it fits

Several architectures don't need to define it since the string is the
same as the default one, so nuke them.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-v1e1jr1u474w9xcelpaoxamu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Remove unused cpu_relax() macros
Arnaldo Carvalho de Melo [Thu, 17 Aug 2017 19:44:36 +0000 (16:44 -0300)]
perf tools: Remove unused cpu_relax() macros

Since 195564390210 ("perf_counter: kerneltop: simplify data_head read")
we do not use it, and this was way back in 2009, remove it before some
other arch maintainer adds its implementation, like so many did,
needlessly :-)

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3l2su9c58eaq4twjzrf9uu08@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf events parse: Rename parse_events_parse arguments
Arnaldo Carvalho de Melo [Thu, 17 Aug 2017 19:13:34 +0000 (16:13 -0300)]
perf events parse: Rename parse_events_parse arguments

Calling them just "data" is too vague, call it 'perf_state', to make it
clearer, for instance, when looking at patch hunks.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-rnhk5yb05wem77rjpclrh7so@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf events parse: Use just one parse events state struct
Arnaldo Carvalho de Melo [Thu, 17 Aug 2017 17:22:50 +0000 (14:22 -0300)]
perf events parse: Use just one parse events state struct

Andi reported problems when parse errors were detected with vendor
events (json), because in the yyparse/parse_events_parse function we
dereferenced the _data parameter to two different structs, with
different layouts, which ended up making parse_events_evlist->error to
point to random stack addresses.

Fix it by making _data to always be struct parse_events_state, changing
the only place where 'struct parse_events_term' was used in
parse_events.y.

Reported-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-bc27lshz823hxl8n9nkelcgh@git.kernel.org
Fixes: 90e2b22dee90 ("perf/tool: Add support to reuse event grammar to parse out terms")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf events parse: Rename parsing state struct to clearer name
Arnaldo Carvalho de Melo [Thu, 17 Aug 2017 17:13:25 +0000 (14:13 -0300)]
perf events parse: Rename parsing state struct to clearer name

Rename it from 'parse_events_evlist' to 'parse_events_state' to better
state that this is parsing state that has to be passed around.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-dursqtg2h2w98ztaa297u43x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf events parse: Remove some needless local variables
Arnaldo Carvalho de Melo [Thu, 17 Aug 2017 18:57:30 +0000 (15:57 -0300)]
perf events parse: Remove some needless local variables

Those are just casting a void pointer to a struct to then pass them to
functions, i.e. remove the local variables and pass the void pointer
directly, the casting will be done and the code will be shorter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-bzfodzr3mb46gy7u7v0mqad6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf trace: Fix off by one string allocation problem
Arnaldo Carvalho de Melo [Thu, 17 Aug 2017 18:12:55 +0000 (15:12 -0300)]
perf trace: Fix off by one string allocation problem

We need to consider the null terminator, oops, fix it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 017037ff3d0b ("perf trace: Allow specifying list of syscalls and events in -e/--expr/--event")
Link: http://lkml.kernel.org/n/tip-j79jpqqe91gvxqmsgxgfn2ni@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf jevents: Support FCMask and PortMask
Andi Kleen [Wed, 16 Aug 2017 22:02:00 +0000 (15:02 -0700)]
perf jevents: Support FCMask and PortMask

Skylake server uncore IIO events need new FCMask/PortMask fields. Support
those in the json parser and pass it through as a filter.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170816220201.19182-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agotools lib bpf: Fix double file test in Makefile
Daniel Díaz [Tue, 15 Aug 2017 16:33:30 +0000 (11:33 -0500)]
tools lib bpf: Fix double file test in Makefile

The Makefile verifies the same file exists twice:
  test -f ../../../include/uapi/linux/bpf.h -a \
       -f ../../../include/uapi/linux/bpf.h

The purpose of the check is to ensure the diff (immediately after the
test) doesn't fail with these two files:

  tools/include/uapi/linux/bpf.h
  include/uapi/linux/bpf.h

Same recipe for bpf_common:
  test -f ../../../include/uapi/linux/bpf_common.h -a \
       -f ../../../include/uapi/linux/bpf_common.h

This corrects the location of the tests.

Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1502814810-960-1-git-send-email-daniel.diaz@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoMerge tag 'perf-core-for-mingo-4.14-20170816' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Thu, 17 Aug 2017 07:41:56 +0000 (09:41 +0200)]
Merge tag 'perf-core-for-mingo-4.14-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf core improvements and fixes:

New features:

- Support exporting Intel PT data to sqlite3 with python perf scripts,
  this is in addition to the postgresql support that was already there (Adrian Hunter)

Infrastructure changes:

- Handle perf tool builds with less features in perf shell tests, such
  as those with NO_LIBDWARF=1 or even without 'perf probe' (Arnaldo Carvalho de Melo)

- Replace '|&' with '2>&1 |' to work with more shells in the just
  introduced perf test shell harness (Kim Phillips)

Architecture related fixes:

- Fix endianness problem when loading parameters in the BPF prologue
  generated by perf, noticed using 'perf test BPF' in s390x systems (Wang Nan, Thomas Richter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoMerge branch 'linus' into perf/core, to pick up fixes
Ingo Molnar [Thu, 17 Aug 2017 07:41:41 +0000 (09:41 +0200)]
Merge branch 'linus' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Thu, 17 Aug 2017 00:21:20 +0000 (17:21 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "A couple of minor fixes (st, ses) and some bigger driver fixes for
  qla2xxx (crash triggered by fw dump) and ipr (lockdep problems with
  mq)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ses: Fix wrong page error
  scsi: ipr: Fix scsi-mq lockdep issue
  scsi: st: fix blk_get_queue usage
  scsi: qla2xxx: Fix system crash while triggering FW dump

8 years agoMerge tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoor...
Linus Torvalds [Wed, 16 Aug 2017 23:48:34 +0000 (16:48 -0700)]
Merge tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit fixes from Paul Moore:
 "Two small fixes to the audit code, both explained well in the
  respective patch descriptions, but the quick summary is one
  use-after-free fix, and one silly fanotify notification flag fix"

* tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: Receive unmount event
  audit: Fix use after free in audit_remove_watch_rule()

8 years agoperf test shell: Replace '|&' with '2>&1 |' to work with more shells
Kim Phillips [Wed, 16 Aug 2017 19:20:40 +0000 (16:20 -0300)]
perf test shell: Replace '|&' with '2>&1 |' to work with more shells

Since we do not specify bash (and/or zsh) as a requirement, use the
standard error redirection that is more widely supported.

Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Link: http://lkml.kernel.org/n/tip-ji5mhn3iilgch3eaay6csr6z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf bpf: Fix endianness problem when loading parameters in prologue
Wang Nan [Tue, 15 Aug 2017 09:21:59 +0000 (11:21 +0200)]
perf bpf: Fix endianness problem when loading parameters in prologue

Perf's BPF prologue generator unconditionally fetches 8 bytes for
function parameters, which causes problems on big endian machines. Thomas
gives a detailed analysis for this problem:

 http://lkml.kernel.org/r/968ebda5-abe4-8830-8d69-49f62529d151@linux.vnet.ibm.com

 ---- 8< ----
  I investigated perf test BPF for s390x and have a question regarding
  the 38.3 subtest (bpf-prologue test) which fails on s390x.

  When I turn on trace_printk in tests/bpf-script-test-prologue.c
  I see this output in /sys/kernel/debug/tracing/trace:

  [root@s8360047 perf]# cat /sys/kernel/debug/tracing/trace
  perf-30229 [000] d..2 170161.535791: : f_mode 2001d00000000 offset:0 orig:0
  perf-30229 [000] d..2 170161.535809: : f_mode 6001f00000000 offset:0 orig:0
  perf-30229 [000] d..2 170161.535815: : f_mode 6001f00000000 offset:1 orig:0
  perf-30229 [000] d..2 170161.535819: : f_mode 2001d00000000 offset:1 orig:0
  perf-30229 [000] d..2 170161.535822: : f_mode 2001d00000000 offset:2 orig:1
  perf-30229 [000] d..2 170161.535825: : f_mode 6001f00000000 offset:2 orig:1
  perf-30229 [000] d..2 170161.535828: : f_mode 6001f00000000 offset:3 orig:1
  perf-30229 [000] d..2 170161.535832: : f_mode 2001d00000000 offset:3 orig:1
  perf-30229 [000] d..2 170161.535835: : f_mode 2001d00000000 offset:4 orig:0
  perf-30229 [000] d..2 170161.535841: : f_mode 6001f00000000 offset:4 orig:0

  [...]

  There are 3 parameters the eBPF program tests/bpf-script-test-prologue.c
  accesses: f_mode (member of struct file at offset 140) offset and orig.  They
  are parameters of the lseek() system call triggered in this test case in
  function llseek_loop().

  What is really strange is the value of f_mode. It is an 8 byte value, whereas
  in the probe event it is defined as a 4 byte value.  The lower 4 bytes are all
  zero and do not belong to member f_mode.  The correct value should be 2001d for
  read-only and 6001f for read-write open mode.

  Here is the output of the 'perf test -vv bpf' trace:
  Try to find probe point from debuginfo.
  Matched function: null_lseek [2d9310d]
   Probe point found: null_lseek+0
  Searching 'file' variable in context.
  Converting variable file into trace event.
  converting f_mode in file
  f_mode type is unsigned int.
  Opening /sys/kernel/debug/tracing//README write=0
  Searching 'offset' variable in context.
  Converting variable offset into trace event.
  offset type is long long int.
  Searching 'orig' variable in context.
  Converting variable orig into trace event.
  orig type is int.
  Found 1 probe_trace_events.
  Opening /sys/kernel/debug/tracing//kprobe_events write=1
  Writing event: p:perf_bpf_probe/func _text+8794224 f_mode=+140(%r2):x32
 ---- 8< ----

This patch parses the type of each argument and converts data from memory to
expected type.

Now the test runs successfully on 4.13.0-rc5:

  [root@s8360046 perf]# ./perf test  bpf
  38: BPF filter                                 :
  38.1: Basic BPF filtering                      : Ok
  38.2: BPF pinning                              : Ok
  38.3: BPF prologue generation                  : Ok
  38.4: BPF relocation checker                   : Ok
  [root@s8360046 perf]#

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20170815092159.31912-1-tmricht@linux.vnet.ibm.com
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 16 Aug 2017 01:52:28 +0000 (18:52 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel
    Grumbach.

 2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From
    Vivien Didelot.

 3) Fix two regressions with bonding on wireless, from Andreas Born.

 4) Fix build when busypoll is disabled, from Daniel Borkmann.

 5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet.

 6) Set SKB cached route properly in inet_rtm_getroute(), from Florian
    Westphal.

 7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding
    Tianhong.

 8) Fix module refcnt leak in ULP code, from Sabrina Dubroca.

 9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric
    Dumazet.

10) Need to purge socket write queue in dccp_destroy_sock(), also from
    Eric Dumazet.

11) Make bpf_trace_printk() work properly on 32-bit architectures, from
    Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  bpf: fix bpf_trace_printk on 32 bit archs
  PCI: fix oops when try to find Root Port for a PCI device
  sfc: don't try and read ef10 data on non-ef10 NIC
  net_sched: remove warning from qdisc_hash_add
  net_sched/sfq: update hierarchical backlog when drop packet
  net_sched: reset pointers to tcf blocks in classful qdiscs' destructors
  ipv4: fix NULL dereference in free_fib_info_rcu()
  net: Fix a typo in comment about sock flags.
  ipv6: fix NULL dereference in ip6_route_dev_notify()
  tcp: fix possible deadlock in TCP stack vs BPF filter
  dccp: purge write queue in dccp_destroy_sock()
  udp: fix linear skb reception with PEEK_OFF
  ipv6: release rt6->rt6i_idev properly during ifdown
  af_key: do not use GFP_KERNEL in atomic contexts
  tcp: ulp: avoid module refcnt leak in tcp_set_ulp
  net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  PCI: Disable Relaxed Ordering Attributes for AMD A1100
  PCI: Disable Relaxed Ordering for some Intel processors
  PCI: Disable PCIe Relaxed Ordering if unsupported
  ...

8 years agobpf: fix bpf_trace_printk on 32 bit archs
Daniel Borkmann [Tue, 15 Aug 2017 23:45:33 +0000 (01:45 +0200)]
bpf: fix bpf_trace_printk on 32 bit archs

James reported that on MIPS32 bpf_trace_printk() is currently
broken while MIPS64 works fine:

  bpf_trace_printk() uses conditional operators to attempt to
  pass different types to __trace_printk() depending on the
  format operators. This doesn't work as intended on 32-bit
  architectures where u32 and long are passed differently to
  u64, since the result of C conditional operators follows the
  "usual arithmetic conversions" rules, such that the values
  passed to __trace_printk() will always be u64 [causing issues
  later in the va_list handling for vscnprintf()].

  For example the samples/bpf/tracex5 test printed lines like
  below on MIPS32, where the fd and buf have come from the u64
  fd argument, and the size from the buf argument:

    [...] 1180.941542: 0x00000001: write(fd=1, buf=  (null), size=6258688)

  Instead of this:

    [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)

One way to get it working is to expand various combinations
of argument types into 8 different combinations for 32 bit
and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64
as well that it resolves the issue.

Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Reported-by: James Hogan <james.hogan@imgtec.com>
Tested-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoPCI: fix oops when try to find Root Port for a PCI device
dingtianhong [Tue, 15 Aug 2017 15:24:48 +0000 (23:24 +0800)]
PCI: fix oops when try to find Root Port for a PCI device

Eric report a oops when booting the system after applying
the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):

[    4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[    4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[    4.253011] PGD 0
[    4.253011] P4D 0
[    4.253011]
[    4.258013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[    4.262015] Modules linked in:
[    4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
[    4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[    4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000
[    4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[    4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246
[    4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006
[    4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000
[    4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000
[    4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0
[    4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818
[    4.331002] FS:  0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000
[    4.339002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0
[    4.351002] Call Trace:
[    4.354012]  ? pci_configure_device+0x19f/0x570
[    4.359002]  ? pci_conf1_read+0xb8/0xf0
[    4.363002]  ? raw_pci_read+0x23/0x40
[    4.366011]  ? pci_read+0x2c/0x30
[    4.370014]  ? pci_read_config_word+0x67/0x70
[    4.374012]  pci_device_add+0x28/0x230
[    4.378012]  ? pci_vpd_f0_read+0x50/0x80
[    4.382014]  pci_scan_single_device+0x96/0xc0
[    4.386012]  pci_scan_slot+0x79/0xf0
[    4.389001]  pci_scan_child_bus+0x31/0x180
[    4.394014]  acpi_pci_root_create+0x1c6/0x240
[    4.398013]  pci_acpi_scan_root+0x15f/0x1b0
[    4.402012]  acpi_pci_root_add+0x2e6/0x400
[    4.406012]  ? acpi_evaluate_integer+0x37/0x60
[    4.411002]  acpi_bus_attach+0xdf/0x200
[    4.415002]  acpi_bus_attach+0x6a/0x200
[    4.418014]  acpi_bus_attach+0x6a/0x200
[    4.422013]  acpi_bus_scan+0x38/0x70
[    4.426011]  acpi_scan_init+0x10c/0x271
[    4.429001]  acpi_init+0x2fa/0x348
[    4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
[    4.437001]  do_one_initcall+0x43/0x169
[    4.441001]  kernel_init_freeable+0x1d0/0x258
[    4.445003]  ? rest_init+0xe0/0xe0
[    4.449001]  kernel_init+0xe/0x150

====================== cut here =============================

It looks like the pci_find_pcie_root_port() was trying to
find the Root Port for the PCI device which is the Root
Port already, it will return NULL and trigger the problem,
so check the highest_pcie_bridge to fix thie problem.

Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosfc: don't try and read ef10 data on non-ef10 NIC
Bert Kenward [Tue, 15 Aug 2017 13:55:32 +0000 (14:55 +0100)]
sfc: don't try and read ef10 data on non-ef10 NIC

The MAC stats command takes a port ID, which doesn't exist on
pre-ef10 NICs (5000- and 6000- series). This is extracted from the
NIC specific data; we misinterpret this as the ef10 data structure,
causing us to read potentially unallocated data. With a KASAN kernel
this can cause errors with:
   BUG: KASAN: slab-out-of-bounds in efx_mcdi_mac_stats

Fixes: 0a2ab4d988d7 ("sfc: set the port-id when calling MC_CMD_MAC_STATS")
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet_sched: remove warning from qdisc_hash_add
Konstantin Khlebnikov [Tue, 15 Aug 2017 13:39:05 +0000 (16:39 +0300)]
net_sched: remove warning from qdisc_hash_add

It was added in commit e57a784d8cae ("pkt_sched: set root qdisc
before change() in attach_default_qdiscs()") to hide duplicates
from "tc qdisc show" for incative deivices.

After 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
it triggered when classful qdisc is added to inactive device because
default qdiscs are added before switching root qdisc.

Anyway after commit ea3274695353 ("net: sched: avoid duplicates in
qdisc dump") duplicates are filtered right in dumper.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet_sched/sfq: update hierarchical backlog when drop packet
Konstantin Khlebnikov [Tue, 15 Aug 2017 13:37:04 +0000 (16:37 +0300)]
net_sched/sfq: update hierarchical backlog when drop packet

When sfq_enqueue() drops head packet or packet from another queue it
have to update backlog at upper qdiscs too.

Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet_sched: reset pointers to tcf blocks in classful qdiscs' destructors
Konstantin Khlebnikov [Tue, 15 Aug 2017 13:35:21 +0000 (16:35 +0300)]
net_sched: reset pointers to tcf blocks in classful qdiscs' destructors

Traffic filters could keep direct pointers to classes in classful qdisc,
thus qdisc destruction first removes all filters before freeing classes.
Class destruction methods also tries to free attached filters but now
this isn't safe because tcf_block_put() unlike to tcf_destroy_chain()
cannot be called second time.

This patch set class->block to NULL after first tcf_block_put() and
turn second call into no-op.

Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv4: fix NULL dereference in free_fib_info_rcu()
Eric Dumazet [Tue, 15 Aug 2017 12:26:17 +0000 (05:26 -0700)]
ipv4: fix NULL dereference in free_fib_info_rcu()

If fi->fib_metrics could not be allocated in fib_create_info()
we attempt to dereference a NULL pointer in free_fib_info_rcu() :

    m = fi->fib_metrics;
    if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
            kfree(m);

Before my recent patch, we used to call kfree(NULL) and nothing wrong
happened.

Instead of using RCU to defer freeing while we are under memory stress,
it seems better to take immediate action.

This was reported by syzkaller team.

Fixes: 3fb07daff8e9 ("ipv4: add reference counting to metrics")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: Fix a typo in comment about sock flags.
Tonghao Zhang [Tue, 15 Aug 2017 11:28:54 +0000 (04:28 -0700)]
net: Fix a typo in comment about sock flags.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv6: fix NULL dereference in ip6_route_dev_notify()
Eric Dumazet [Tue, 15 Aug 2017 11:09:51 +0000 (04:09 -0700)]
ipv6: fix NULL dereference in ip6_route_dev_notify()

Based on a syzkaller report [1], I found that a per cpu allocation
failure in snmp6_alloc_dev() would then lead to NULL dereference in
ip6_route_dev_notify().

It seems this is a very old bug, thus no Fixes tag in this submission.

Let's add in6_dev_put_clear() helper, as we will probably use
it elsewhere (once available/present in net-next)

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff88019f456680 task.stack: ffff8801c6e58000
RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
FS:  00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
 in6_dev_put include/net/addrconf.h:335 [inline]
 ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
 call_netdevice_notifiers net/core/dev.c:1694 [inline]
 rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
 rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
 register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
 register_netdev+0x1a/0x30 net/core/dev.c:7669
 loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
 ops_init+0x10a/0x570 net/core/net_namespace.c:118
 setup_net+0x313/0x710 net/core/net_namespace.c:294
 copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
 create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
 SYSC_unshare kernel/fork.c:2347 [inline]
 SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512c9
RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6
0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
ffff8801c6e5f1b0
RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
ffff8801c6e5f1b0
RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
ffff8801c6e5f1b0
---[ end trace e441d046c6410d31 ]---

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoperf script python: Add support for sqlite3 to call-graph-from-sql.py
Adrian Hunter [Thu, 3 Aug 2017 08:31:30 +0000 (11:31 +0300)]
perf script python: Add support for sqlite3 to call-graph-from-sql.py

Add support for SQLite 3 to the call-graph-from-sql.py script. The SQL
statements work as is, so just detect the database type by checking if the
SQLite 3 file exists.

Committer notes:

Tested collecting the PT data on a RHEL7.4, generating the SQLite3
database there and then moving it to a Fedora 26 system where the
call-graph-from-sql.py script was run, using python-pyside version
1.2.2-7fc26 to see the callgraphs using Qt4.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1501749090-20357-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoaudit: Receive unmount event
Jan Kara [Tue, 15 Aug 2017 11:00:37 +0000 (13:00 +0200)]
audit: Receive unmount event

Although audit_watch_handle_event() can handle FS_UNMOUNT event, it is
not part of AUDIT_FS_WATCH mask and thus such event never gets to
audit_watch_handle_event(). Thus fsnotify marks are deleted by fsnotify
subsystem on unmount without audit being notified about that which leads
to a strange state of existing audit rules with dead fsnotify marks.

Add FS_UNMOUNT to the mask of events to be received so that audit can
clean up its state accordingly.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
8 years agoaudit: Fix use after free in audit_remove_watch_rule()
Jan Kara [Tue, 15 Aug 2017 11:00:36 +0000 (13:00 +0200)]
audit: Fix use after free in audit_remove_watch_rule()

audit_remove_watch_rule() drops watch's reference to parent but then
continues to work with it. That is not safe as parent can get freed once
we drop our reference. The following is a trivial reproducer:

mount -o loop image /mnt
touch /mnt/file
auditctl -w /mnt/file -p wax
umount /mnt
auditctl -D
<crash in fsnotify_destroy_mark()>

Grab our own reference in audit_remove_watch_rule() earlier to make sure
mark does not get freed under us.

CC: stable@vger.kernel.org
Reported-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
8 years agoMerge tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Tue, 15 Aug 2017 19:49:43 +0000 (12:49 -0700)]
Merge tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "This update consists of important compile and run-time error fixes to
  timers/freq-step, kmod, and sysctl tests"

* tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: timers: freq-step: fix compile error
  selftests: futex: fix run_tests target
  test_sysctl: fix sysctl.sh by making it executable
  test_kmod: fix kmod.sh by making it executable

8 years agoperf script python: Rename call-graph-from-postgresql.py to call-graph-from-sql.py
Adrian Hunter [Thu, 3 Aug 2017 08:31:29 +0000 (11:31 +0300)]
perf script python: Rename call-graph-from-postgresql.py to call-graph-from-sql.py

Rename call-graph-from-postgresql.py to call-graph-from-sql.py in
preparation for adding support to it for SQLite 3.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/1501749090-20357-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf script python: Add support for exporting to sqlite3
Adrian Hunter [Thu, 3 Aug 2017 08:31:28 +0000 (11:31 +0300)]
perf script python: Add support for exporting to sqlite3

Add support for exporting to SQLite 3 the same data as the PostgreSQL
export.

Committer note:

Tested on RHEL 7.4 using the 1.2.2-4el python-pyside packages from EPEL.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1501749090-20357-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf scripts python: Fix query in call-graph-from-postgresql.py
Adrian Hunter [Thu, 3 Aug 2017 08:31:27 +0000 (11:31 +0300)]
perf scripts python: Fix query in call-graph-from-postgresql.py

Add a missing space which seemed not to affect PostgreSQL but upsets
SQLite.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/1501749090-20357-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf scripts python: Fix missing call_path_id in export-to-postgresql script
Adrian Hunter [Thu, 3 Aug 2017 08:31:26 +0000 (11:31 +0300)]
perf scripts python: Fix missing call_path_id in export-to-postgresql script

The export does not work if only branches are exported because of a
missing column in the samples table.  Fix by adding the missing
call_path_id.

Fixes: 3521f3bc9dae ("perf script: Update export-to-postgresql to support callchain export")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/1501749090-20357-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoMerge tag 'wireless-drivers-for-davem-2017-08-15' of git://git.kernel.org/pub/scm...
David S. Miller [Tue, 15 Aug 2017 17:19:14 +0000 (10:19 -0700)]
Merge tag 'wireless-drivers-for-davem-2017-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.13

This time quite a few fixes for iwlwifi and one major regression fix
for brcmfmac. For the iwlwifi aggregation bug a small change was
needed for mac80211, but as Johannes is still away the mac80211 patch
is taken via wireless-drivers tree.

brcmfmac

* fix firmware crash (a recent regression in bcm4343{0,1,8}

iwlwifi

* Some simple PCI HW ID fix-ups and additions for family 9000

* Remove a bogus warning message with new FWs (bug #196915)

* Don't allow illegal channel options to be used (bug #195299)

* A fix for checksum offload in family 9000

* A fix serious throughput degradation in 11ac with multiple streams

* An old bug in SMPS where the firmware was not aware of SMPS changes

* Fix a memory leak in the SAR code

* Fix a stuck queue case in AP mode;

* Convert a WARN to a simple debug in a legitimate race case (from
  which we can recover)

* Fix a severe throughput aggregation on 9000-family devices due to
  aggregation issues, needed a small change in mac80211
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoperf test shell vfs_getname: Skip for tools built with NO_LIBDWARF=1
Arnaldo Carvalho de Melo [Mon, 14 Aug 2017 17:40:58 +0000 (14:40 -0300)]
perf test shell vfs_getname: Skip for tools built with NO_LIBDWARF=1

If that is the case, or if the required lib is not present, e.g.
elfutils-devel in Fedora systems, then just skip the tests requiring
DWARF analysis.

Before:

  # rpm -e elfutils-devel
  # perf test ping vfs_getname
  60: Use vfs_getname probe to get syscall args filenames   : FAILED!
  61: probe libc's inet_pton & backtrace it with ping       : Ok
  62: Check open filename arg using perf trace + vfs_getname: FAILED!
  63: Add vfs_getname probe to get syscall args filenames   : FAILED!
  #

After:

  # perf test vfs_getname
  60: Use vfs_getname probe to get syscall args filenames   : Skip
  62: Check open filename arg using perf trace + vfs_getname: Skip
  63: Add vfs_getname probe to get syscall args filenames   : Skip
  #

Then, reinstalling elfutils-devel, rebuilding the tool and running
again:

  # perf test vfs_getname
  60: Use vfs_getname probe to get syscall args filenames   : Ok
  62: Check open filename arg using perf trace + vfs_getname: Ok
  63: Add vfs_getname probe to get syscall args filenames   : Ok
  #

Reported-by: Kim Phillips <kim.phillips@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-d67tvn401fxrwr97pu5ihfb1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test shell: Check if 'perf probe' is available, skip tests if not
Arnaldo Carvalho de Melo [Mon, 14 Aug 2017 17:26:09 +0000 (14:26 -0300)]
perf test shell: Check if 'perf probe' is available, skip tests if not

Add a library function that checks if 'perf probe' is built into the
tool being tested, skipping tests that need it.

Testing it on a system after removing the library needed to build
'probe' as a perf subcommand:

  # perf test ping vfs_getname
  59: Use vfs_getname probe to get syscall args filenames   : Skip
  60: probe libc's inet_pton & backtrace it with ping       : Skip
  61: Check open filename arg using perf trace + vfs_getname: Skip
  62: Add vfs_getname probe to get syscall args filenames   : Skip
  # perf probe
  perf: 'probe' is not a perf-command. See 'perf --help'.
  #

Now reinstalling elfutils-libelf-devel on this Fedora 26 system to
rebuild perf and then retest this:

  # perf test ping vfs_getname
  60: Use vfs_getname probe to get syscall args filenames   : Ok
  61: probe libc's inet_pton & backtrace it with ping       : Ok
  62: Check open filename arg using perf trace + vfs_getname: Ok
  63: Add vfs_getname probe to get syscall args filenames   : Ok
  #

Reported-by: Kim Phillips <kim.phillips@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ctdck2gzsskqhjzu3ebb62zm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tests shell: Remove duplicate skip_if_no_debuginfo() function
Arnaldo Carvalho de Melo [Mon, 14 Aug 2017 16:53:10 +0000 (13:53 -0300)]
perf tests shell: Remove duplicate skip_if_no_debuginfo() function

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3zxjswdbs2au3ih0rino0iy1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agotcp: fix possible deadlock in TCP stack vs BPF filter
Eric Dumazet [Tue, 15 Aug 2017 00:44:43 +0000 (17:44 -0700)]
tcp: fix possible deadlock in TCP stack vs BPF filter

Filtering the ACK packet was not put at the right place.

At this place, we already allocated a child and put it
into accept queue.

We absolutely need to call tcp_child_process() to release
its spinlock, or we will deadlock at accept() or close() time.

Found by syzkaller team (Thanks a lot !)

Fixes: 8fac365f63c8 ("tcp: Add a tcp_filter hook before handle ack packet")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodccp: purge write queue in dccp_destroy_sock()
Eric Dumazet [Mon, 14 Aug 2017 21:10:25 +0000 (14:10 -0700)]
dccp: purge write queue in dccp_destroy_sock()

syzkaller reported that DCCP could have a non empty
write queue at dismantle time.

WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x417 kernel/panic.c:180
 __warn+0x1c4/0x1d9 kernel/panic.c:541
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
 inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
 dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
 sock_release+0x8d/0x1e0 net/socket.c:597
 sock_close+0x16/0x20 net/socket.c:1126
 __fput+0x327/0x7e0 fs/file_table.c:210
 ____fput+0x15/0x20 fs/file_table.c:246
 task_work_run+0x18a/0x260 kernel/task_work.c:116
 exit_task_work include/linux/task_work.h:21 [inline]
 do_exit+0xa32/0x1b10 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:969
 get_signal+0x7e8/0x17e0 kernel/signal.c:2330
 do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
 exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
 prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
 syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoudp: fix linear skb reception with PEEK_OFF
Al Viro [Mon, 14 Aug 2017 19:31:38 +0000 (21:31 +0200)]
udp: fix linear skb reception with PEEK_OFF

copy_linear_skb() is broken; both of its callers actually
expect 'len' to be the amount we are trying to copy,
not the offset of the end.
Fix it keeping the meanings of arguments in sync with what the
callers (both of them) expect.
Also restore a saner behavior on EFAULT (i.e. preserving
the iov_iter position in case of failure):

The commit fd851ba9caa9 ("udp: harden copy_linear_skb()")
avoids the more destructive effect of the buggy
copy_linear_skb(), e.g. no more invalid memory access, but
said function still behaves incorrectly: when peeking with
offset it can fail with EINVAL instead of copying the
appropriate amount of memory.

Reported-by: Sasha Levin <alexander.levin@verizon.com>
Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
Fixes: fd851ba9caa9 ("udp: harden copy_linear_skb()")
Signed-off-by: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Sasha Levin <alexander.levin@verizon.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv6: release rt6->rt6i_idev properly during ifdown
Wei Wang [Mon, 14 Aug 2017 17:44:59 +0000 (10:44 -0700)]
ipv6: release rt6->rt6i_idev properly during ifdown

When a dst is created by addrconf_dst_alloc() for a host route or an
anycast route, dst->dev points to loopback dev while rt6->rt6i_idev
points to a real device.
When the real device goes down, the current cleanup code only checks for
dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the
refcount leak on the real device in the above situation.
This patch makes sure to always release the refcount taken on
rt6->rt6i_idev during dst_dev_put().

Fixes: 587fea741134 ("ipv6: mark DST_NOGC and remove the operation of
dst_free()")
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoaf_key: do not use GFP_KERNEL in atomic contexts
Eric Dumazet [Mon, 14 Aug 2017 17:16:45 +0000 (10:16 -0700)]
af_key: do not use GFP_KERNEL in atomic contexts

pfkey_broadcast() might be called from non process contexts,
we can not use GFP_KERNEL in these cases [1].

This patch partially reverts commit ba51b6be38c1 ("net: Fix RCU splat in
af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
section.

[1] : syzkaller reported :

in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
3 locks held by syzkaller183439/2932:
 #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
 #1:  (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
 #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
 #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
 __might_sleep+0x95/0x190 kernel/sched/core.c:5947
 slab_pre_alloc_hook mm/slab.h:416 [inline]
 slab_alloc mm/slab.c:3383 [inline]
 kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
 skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
 pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
 pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
 dump_sp+0x3d6/0x500 net/key/af_key.c:2685
 xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
 pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
 pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
 pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
 pfkey_process+0x606/0x710 net/key/af_key.c:2814
 pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 ___sys_sendmsg+0x755/0x890 net/socket.c:2035
 __sys_sendmsg+0xe5/0x210 net/socket.c:2069
 SYSC_sendmsg net/socket.c:2080 [inline]
 SyS_sendmsg+0x2d/0x50 net/socket.c:2076
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x445d79
RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000

Fixes: ba51b6be38c1 ("net: Fix RCU splat in af_key")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: ulp: avoid module refcnt leak in tcp_set_ulp
Sabrina Dubroca [Mon, 14 Aug 2017 16:04:24 +0000 (18:04 +0200)]
tcp: ulp: avoid module refcnt leak in tcp_set_ulp

__tcp_ulp_find_autoload returns tcp_ulp_ops after taking a reference on
the module. Then, if ->init fails, tcp_set_ulp propagates the error but
nothing releases that reference.

Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'Add-new-PCI_DEV_FLAGS_NO_RELAXED_ORDERING-flag'
David S. Miller [Tue, 15 Aug 2017 05:14:51 +0000 (22:14 -0700)]
Merge branch 'Add-new-PCI_DEV_FLAGS_NO_RELAXED_ORDERING-flag'

Ding Tianhong says:

====================
Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

v2: Alexander point out that the v1 was only a part of the whole solution,
    some platform which has some issues could use the new flag to indicate
    that it is not safe to enable relaxed ordering attribute, then we need
    to clear the relaxed ordering enable bits in the PCI configuration when
    initializing the device. So add a new second patch to modify the PCI
    initialization code to clear the relaxed ordering enable bit in the
    event that the root complex doesn't want relaxed ordering enabled.

    The third patch was base on the v1's second patch and only be changed
    to query the relaxed ordering enable bit in the PCI configuration space
    to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
    set.

    This version didn't plan to drop the defines for Intel Drivers to use the
    new checking way to enable relaxed ordering because it is not the hardest
    part of the moment, we could fix it in next patchset when this patches
    reach the goal.

v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
    If a PCIe device didn't enable the relaxed ordering attribute default,
    we should not do anything in the PCIe configuration, otherwise we
    should check if any of the devices above us do not support relaxed
    ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
    the result if we get a return that indicate that the relaxed ordering
    is not supported we should update our device to disable relaxed ordering
    in configuration space. If the device above us doesn't exist or isn't
    the PCIe device, we shouldn't do anything and skip updating relaxed ordering
    because we are probably running in a guest.

v4: Rename the functions pcie_get_relaxed_ordering and pcie_disable_relaxed_ordering
    according John's suggestion, and modify the description, use the true/false
    as the return value.

    We shouldn't enable relaxed ordering attribute by the setting in the root
    complex configuration space for PCIe device, so fix it for cxgb4.

    Fix some format issues.

v5: Removed the unnecessary code for some function which only return the bool
    value, and add the check for VF device.

    Make this patch set base on 4.12-rc5.

v6: Fix the logic error in the need to enable the relaxed ordering attribute for cxgb4.

v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed
    Ordering Enable] in PCI Probe() routine, this will break our current
    solution for some platform which has problematic when enable the relaxed
    ordering attribute. According to the latest recommendations, remove the
    enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer
    scene, but we agree to leave this problem until we really trigger it.

    Make this patch set base on 4.12 release version.

v8: Change the second patch title and description to make it more reasonable,
    add the acked-by from Alex and Ashok.

    Add a new patch to enable the Relaxed Ordering Attribute for cxgb4vf driver.

    Make this patch set base on 4.13-rc2.

v9: The document (https://software.intel.com/sites/default/files/managed/9e/
    bc/64-ia-32-architectures-optimization-manual.pdf) indicate that the Xeon
    processors based on Broadwell/Haswell microarchitecture has the problem
    with Relaxed Ordering Attribute enabled, so add the whole list Device ID
    from Intel to the patch.

v10: Significant rework based on Bjorn's feedback, reorganize the first 2 patches,
     now the Intel and AMD erratum soc has been divided to the different patches,
     rename the pcie_relaxed_ordering_supported() to pcie_relaxed_ordering_enabled(),
     and no need to check every intervening switch except the root ports, update
     some commits.

v11: We shouldn't let the Intel engineer to acked the AMD's erratum patch, fix the
     funny mistake.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
Casey Leedom [Tue, 15 Aug 2017 03:23:27 +0000 (11:23 +0800)]
net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

cxgb4vf Ethernet driver now queries PCIe configuration space to
determine if it can send TLPs to it with the Relaxed Ordering
Attribute set, just like the pf did.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
Casey Leedom [Tue, 15 Aug 2017 03:23:26 +0000 (11:23 +0800)]
net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

cxgb4 Ethernet driver now queries PCIe configuration space to determine
if it can send TLPs to it with the Relaxed Ordering Attribute set.

Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
Device Control[Relaxed Ordering Enable] at probe routine, to make sure
the driver will not send the Relaxed Ordering TLPs to the Root Complex which
could not deal the Relaxed Ordering TLPs.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoPCI: Disable Relaxed Ordering Attributes for AMD A1100
dingtianhong [Tue, 15 Aug 2017 03:23:25 +0000 (11:23 +0800)]
PCI: Disable Relaxed Ordering Attributes for AMD A1100

Casey reported that the AMD ARM A1100 SoC has a bug in its PCIe
Root Port where Upstream Transaction Layer Packets with the Relaxed
Ordering Attribute clear are allowed to bypass earlier TLPs with
Relaxed Ordering set, it would cause Data Corruption, so we need
to disable Relaxed Ordering Attribute when Upstream TLPs to the
Root Port.

Reported-and-suggested-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoPCI: Disable Relaxed Ordering for some Intel processors
dingtianhong [Tue, 15 Aug 2017 03:23:24 +0000 (11:23 +0800)]
PCI: Disable Relaxed Ordering for some Intel processors

According to the Intel spec section 3.9.1 said:

    3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
          and Toward MMIO Regions (P2P)

    In order to maximize performance for PCIe devices in the processors
    listed in Table 3-6 below, the soft- ware should determine whether the
    accesses are toward coherent memory (system memory) or toward MMIO
    regions (P2P access to other devices). If the access is toward MMIO
    region, then software can command HW to set the RO bit in the TLP
    header, as this would allow hardware to achieve maximum throughput for
    these types of accesses. For accesses toward coherent memory, software
    can command HW to clear the RO bit in the TLP header (no RO), as this
    would allow hardware to achieve maximum throughput for these types of
    accesses.

    Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
               PCIe Performance

    Processor                            CPU RP Device IDs

    Intel Xeon processors based on       6F01H-6F0EH
    Broadwell microarchitecture

    Intel Xeon processors based on       2F01H-2F0EH
    Haswell microarchitecture

It means some Intel processors has performance issue when use the Relaxed
Ordering Attribute, so disable Relaxed Ordering for these root port.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoPCI: Disable PCIe Relaxed Ordering if unsupported
dingtianhong [Tue, 15 Aug 2017 03:23:23 +0000 (11:23 +0800)]
PCI: Disable PCIe Relaxed Ordering if unsupported

When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.

The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
Relaxed Ordering (RO) attribute should not be used for Transaction Layer
Packets (TLP) targeted towards these affected root complexes.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the capability for this device.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'md/4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Linus Torvalds [Mon, 14 Aug 2017 20:09:59 +0000 (13:09 -0700)]
Merge tag 'md/4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md

Pull MD fixes from Shaohua Li:
 "Fix several bugs:

   - fix a rcu stall issue introduced in 4.12 (Neil Brown)

   - fix two raid5 cache race conditions (Song Liu)"

* tag 'md/4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  MD: not clear ->safemode for external metadata array
  md/r5cache: fix io_unit handling in r5l_log_endio()
  md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_set
  md: fix test in md_write_start()
  md: always clear ->safemode when md_check_recovery gets the mddev lock.

8 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Mon, 14 Aug 2017 18:35:56 +0000 (11:35 -0700)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "Fix an error path bug in ixp4xx as well as a read overrun in
 sha1-avx2"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/sha1 - Fix reads beyond the number of blocks passed
  crypto: ixp4xx - Fix error handling path in 'aead_perform()'

8 years agotipc: avoid inheriting msg_non_seq flag when message is returned
Jon Paul Maloy [Mon, 14 Aug 2017 16:28:49 +0000 (18:28 +0200)]
tipc: avoid inheriting msg_non_seq flag when message is returned

In the function msg_reverse(), we reverse the header while trying to
reuse the original buffer whenever possible. Those rejected/returned
messages are always transmitted as unicast, but the msg_non_seq field
is not explicitly set to zero as it should be.

We have seen cases where multicast senders set the message type to
"NOT dest_droppable", meaning that a multicast message shorter than
one MTU will be returned, e.g., during receive buffer overflow, by
reusing the original buffer. This has the effect that even the
'msg_non_seq' field is inadvertently inherited by the rejected message,
although it is now sent as a unicast message. This again leads the
receiving unicast link endpoint to steer the packet toward the broadcast
link receive function, where it is dropped. The affected unicast link is
thereafter (after 100 failed retransmissions) declared 'stale' and
reset.

We fix this by unconditionally setting the 'msg_non_seq' flag to zero
for all rejected/returned messages.

Reported-by: Canh Duc Luu <canh.d.luu@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotipc: accept PACKET_MULTICAST packets
Jon Paul Maloy [Mon, 14 Aug 2017 15:55:56 +0000 (17:55 +0200)]
tipc: accept PACKET_MULTICAST packets

On L2 bearers, the TIPC broadcast function is sending out packets using
the corresponding L2 broadcast address. At reception, we filter such
packets under the assumption that they will also be delivered as
broadcast packets.

This assumption doesn't always hold true. Under high load, we have seen
that a switch may convert the destination address and deliver the packet
as a PACKET_MULTICAST, something leading to inadvertently dropped
packets and a stale and reset broadcast link.

We fix this by extending the reception filtering to accept packets of
type PACKET_MULTICAST.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv4: route: fix inet_rtm_getroute induced crash
Florian Westphal [Sun, 13 Aug 2017 22:52:58 +0000 (00:52 +0200)]
ipv4: route: fix inet_rtm_getroute induced crash

"ip route get $daddr iif eth0 from $saddr" causes:
 BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50
 Call Trace:
  ip_route_input_rcu+0x1535/0x1b50
  ip_route_input_noref+0xf9/0x190
  tcp_v4_early_demux+0x1a4/0x2b0
  ip_rcv+0xbcb/0xc05
  __netif_receive_skb+0x9c/0xd0
  netif_receive_skb_internal+0x5a8/0x890

Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an
iif was provided) or ip_route_output_key_hash_rcu.

But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already
associates the dst_entry with the skb.  This clears the SKB_DST_NOREF
bit (i.e. skb_dst_drop will release/free the entry while it should not).

Thus only set the dst if we called ip_route_output_key_hash_rcu().

I tested this patch by running:
 while true;do ip r get 10.0.1.2;done > /dev/null &
 while true;do ip r get 10.0.1.2 iif eth0  from 10.0.1.1;done > /dev/null &
... and saw no crash or memory leak.

Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David Ahern <dsahern@gmail.com>
Fixes: ba52d61e0ff ("ipv4: route: restore skb_dst_set in inet_rtm_getroute")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'perf-core-for-mingo-4.14-20170814' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Mon, 14 Aug 2017 17:38:40 +0000 (19:38 +0200)]
Merge tag 'perf-core-for-mingo-4.14-20170814' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

Infrastructure changes:

- Do not consider empty files as valid srclines (Milian Wolff)

- Fix wrong size in perf_record_mmap for last kernel module,
  which resulted in erroneous symbol resolution in at least s390x (Thomas Richter)

- Add missing newline to expr parser error messages (Andi Kleen)

- Fix saved values rbtree lookup in 'perf stat' (Andi Kleen)

- Add support for shell based tests in 'perf test', add a few that
  run 'perf probe', 'perf trace', using kprobes, uprobes to check
  the output of those tools and the effects on the system, checking,
  for instance, DWARF backtraces from uprobes (Arnaldo Carvalho de Melo)

Arch specific changes:

- Add ppc64le to audit uname list in the python scripting support (Naveen N. Rao)

- Update POWER9 vendor events tables (Sukadev Bhattiprolu)

- Fix module symbol adjustment for s390x (Thomas Richter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agobrcmfmac: feature check for multi-scheduled scan fails on bcm4343x devices
Arend Van Spriel [Fri, 11 Aug 2017 10:07:36 +0000 (11:07 +0100)]
brcmfmac: feature check for multi-scheduled scan fails on bcm4343x devices

The firmware feature check introduced for multi-scheduled scan turned out
to be failing for bcm4343{0,1,8} devices resulting in a firmware crash.
The reason for this crash has not yet been root cause so this patch avoids
the feature check for those device as a short-term fix.

Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Reported-by: Ian Molton <ian@mnementh.co.uk>
Fixes: 9fe929aaace6 ("brcmfmac: add firmware feature detection for gscan feature")
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
8 years agobonding: ratelimit failed speed/duplex update warning
Andreas Born [Fri, 11 Aug 2017 22:36:55 +0000 (00:36 +0200)]
bonding: ratelimit failed speed/duplex update warning

bond_miimon_commit() handles the UP transition for each slave of a bond
in the case of MII. It is triggered 10 times per second for the default
MII Polling interval of 100ms. For device drivers that do not implement
__ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
fails persistently while the MII status could remain UP. That is, in
this and other cases where the speed/duplex update keeps failing over a
longer period of time while the MII state is UP, a warning is printed
every MII polling interval.

To address these excessive warnings net_ratelimit() should be used.
Printing a warning once would not be sufficient since the call to
bond_update_speed_duplex() could recover to succeed and fail again
later. In that case there would be no new indication what went wrong.

Fixes: b5bf0f5b16b9c (bonding: correctly update link status during mii-commit phase)
Signed-off-by: Andreas Born <futur.andy@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoLinux 4.13-rc5 v4.13-rc5
Linus Torvalds [Sun, 13 Aug 2017 23:01:32 +0000 (16:01 -0700)]
Linux 4.13-rc5

8 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sun, 13 Aug 2017 22:34:28 +0000 (15:34 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "Another round of MIPS fixes:

   - compressed boot: Ignore a generated .c file

   - VDSO: Fix a register clobber list

   - DECstation: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression

   - Octeon: Fix recent cleanups that cleaned away a bit too much thus
     breaking the arch side of the EDAC and USB drivers.

   - uasm: Fix duplicate const in "const struct foo const bar[]" which
     GCC 7.1 no longer accepts.

   - Fix race on setting and getting cpu_online_mask

   - Fix preemption issue. To do so cleanly introduce macro to get the
     size of L3 cache line.

   - Revert include cleanup that sometimes results in build error

   - MicroMIPS uses bit 0 of the PC to indicate microMIPS mode. Make
     sure this bit is set for kernel entry as well.

   - Prevent configuring the kernel for both microMIPS and MT. There are
     no such CPUs currently and thus the combination is unsupported and
     results in build errors.

  This has been sitting in linux-next for a few days and has survived
  automated testing by Imagination's test farm. No known regressions
  pending except a number of issues that crept up due to lots of people
  switching to GCC 7.1"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Set ISA bit in entry-y for microMIPS kernels
  MIPS: Prevent building MT support for microMIPS kernels
  MIPS: PCI: Fix smp_processor_id() in preemptible
  MIPS: Introduce cpu_tcache_line_size
  MIPS: DEC: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression
  MIPS: VDSO: Fix clobber lists in fallback code paths
  Revert "MIPS: Don't unnecessarily include kmalloc.h into <asm/cache.h>."
  MIPS: OCTEON: Fix USB platform code breakage.
  MIPS: Octeon: Fix broken EDAC driver.
  MIPS: gitignore: ignore generated .c files
  MIPS: Fix race on setting and getting cpu_online_mask
  MIPS: mm: remove duplicate "const" qualifier on insn_table

8 years agoMerge tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 13 Aug 2017 19:44:18 +0000 (12:44 -0700)]
Merge tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are three firmware core fixes for 4.13-rc5.

  All three of these fix reported issues and have been floating around
  for a few weeks. They have been in linux-next with no reported
  problems"

* tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  firmware: avoid invalid fallback aborts by using killable wait
  firmware: fix batched requests - send wake up on failure on direct lookups
  firmware: fix batched requests - wake all waiters

8 years agoMerge tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 13 Aug 2017 19:41:58 +0000 (12:41 -0700)]
Merge tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are two patches for 4.13-rc5.

  One is a fix for a reported thunderbolt issue, and the other a fix for
  an MEI driver issue. Both have been in linux-next with no reported
  issues"

* tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  thunderbolt: Do not enumerate more ports from DROM than the controller has
  mei: exclude device from suspend direct complete optimization

8 years agoMerge tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 13 Aug 2017 19:33:35 +0000 (12:33 -0700)]
Merge tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are two tty serial driver fixes for 4.13-rc5. One is a revert of
  a -rc1 patch that turned out to not be a good idea, and the other is a
  fix for the pl011 serial driver.

  Both have been in linux-next with no reported issues"

* tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: Delete dead code for CIR serial ports"
  tty: pl011: fix initialization order of QDF2400 E44

8 years agoMerge tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 13 Aug 2017 19:30:17 +0000 (12:30 -0700)]
Merge tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/iio fixes from Greg KH:
 "Here are some Staging and IIO driver fixes for 4.13-rc5.

  Nothing major, just a number of small fixes for reported issues. All
  of these have been in linux-next for a while now with no reported
  issues. Full details are in the shortlog"

* tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING
  iio: aspeed-adc: wait for initial sequence.
  iio: accel: bmc150: Always restore device to normal mode after suspend-resume
  staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
  iio: adc: axp288: Fix the GPADC pin reading often wrongly returning 0
  iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
  iio: accel: st_accel: add SPI-3wire support
  iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications"
  iio: adc: sun4i-gpadc-iio: fix unbalanced irq enable/disable
  iio: pressure: st_pressure_core: disable multiread by default for LPS22HB
  iio: light: tsl2563: use correct event code

8 years agoMerge tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 13 Aug 2017 19:27:42 +0000 (12:27 -0700)]
Merge tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of small USB driver fixes and new device ids for
  4.13-rc5. There is the usual gadget driver fixes, some new quirks for
  "messy" hardware, and some new device ids.

  All have been in linux-next with no reported issues"

* tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: pl2303: add new ATEN device id
  usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
  USB: Check for dropped connection before switching to full speed
  usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
  usb: renesas_usbhs: gadget: fix unused-but-set-variable warning
  usb: renesas_usbhs: Fix UGCTRL2 value for R-Car Gen3
  usb: phy: phy-msm-usb: Fix usage of devm_regulator_bulk_get()
  usb: gadget: udc: renesas_usb3: Fix usb_gadget_giveback_request() calling
  usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets
  USB: serial: option: add D-Link DWM-222 device ID
  usb: musb: fix tx fifo flush handling again
  usb: core: unlink urbs from the tail of the endpoint's urb_list
  usb-storage: fix deadlock involving host lock and scsi_done
  uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
  USB: hcd: Mark secondary HCD as dead if the primary one died
  USB: serial: cp210x: add support for Qivicon USB ZigBee dongle

8 years agoMerge tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd
Linus Torvalds [Sat, 12 Aug 2017 23:19:43 +0000 (16:19 -0700)]
Merge tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd

Pull another MTD fix from Brian Norris:
 "An mtdblock regression occurred in -rc1 (all writes were broken!), in
  the process of some block subsystem refactoring. Noticed and fixed
  last week, but I'm a little slow on the uptake"

* tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd:
  mtd: blkdevs: Fix mtd block write failure

8 years agomtd: blkdevs: Fix mtd block write failure
Abhishek Sahu [Wed, 2 Aug 2017 12:33:05 +0000 (18:03 +0530)]
mtd: blkdevs: Fix mtd block write failure

All the MTD block write requests are failing with
following error messages

    mkfs.ext4  /dev/mtdblock0

    print_req_error: I/O error, dev mtdblock0, sector 0
    Buffer I/O error on dev mtdblock0, logical block 0,
    lost async page write

The control is going to default case after block write request
because of missing return.

Fixes: commit 2a842acab109 ("block: introduce new block status code type")
Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Linus Torvalds [Sat, 12 Aug 2017 19:08:59 +0000 (12:08 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
 "The highlights include:

   - Fix iscsi-target payload memory leak during
     ISCSI_FLAG_TEXT_CONTINUE (Varun Prakash)

   - Fix tcm_qla2xxx incorrect use of tcm_qla2xxx_free_cmd during ABORT
     (Pascal de Bruijn + Himanshu Madhani + nab)

   - Fix iscsi-target long-standing issue with parallel delete of a
     single network portal across multiple target instances (Gary Guo +
     nab)

   - Fix target dynamic se_node GPF during uncached shutdown regression
     (Justin Maggard + nab)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix node_acl demo-mode + uncached dynamic shutdown regression
  iscsi-target: Fix iscsi_np reset hung task during parallel delete
  qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
  cxgbit: fix sg_nents calculation
  iscsi-target: fix invalid flags in text response
  iscsi-target: fix memory leak in iscsit_setup_text_cmd()
  cxgbit: add missing __kfree_skb()
  tcmu: free old string on reconfig
  tcmu: Fix possible to/from address overflow when doing the memcpy

8 years agoMerge tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Aug 2017 16:01:36 +0000 (09:01 -0700)]
Merge tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Some fixes for Xen:

   - a fix for a regression introduced in 4.13 for a Xen HVM-guest
     configured with KASLR

   - a fix for a possible deadlock in the xenbus driver when booting the
     system

   - a fix for lost interrupts in Xen guests"

* tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: Fix interrupt lost during irq_disable and irq_enable
  xen: avoid deadlock in xenbus
  xen: fix hvm guest with kaslr enabled
  xen: split up xen_hvm_init_shared_info()
  x86: provide an init_mem_mapping hypervisor hook

8 years agoMD: not clear ->safemode for external metadata array
Shaohua Li [Sat, 12 Aug 2017 03:34:45 +0000 (20:34 -0700)]
MD: not clear ->safemode for external metadata array

->safemode should be triggered by mdadm for external metadaa array, otherwise
array's state confuses mdadm.

Fixes: 33182d15c6bf(md: always clear ->safemode when md_check_recovery gets the mddev lock.)
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
8 years agoudp: harden copy_linear_skb()
Eric Dumazet [Fri, 11 Aug 2017 17:48:53 +0000 (10:48 -0700)]
udp: harden copy_linear_skb()

syzkaller got crashes with CONFIG_HARDENED_USERCOPY=y configs.

Issue here is that recvfrom() can be used with user buffer of Z bytes,
and SO_PEEK_OFF of X bytes, from a skb with Y bytes, and following
condition :

Z < X < Y

kernel BUG at mm/usercopy.c:72!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 2917 Comm: syzkaller842281 Not tainted 4.13.0-rc3+ #16
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
task: ffff8801d2fa40c0 task.stack: ffff8801d1fe8000
RIP: 0010:report_usercopy mm/usercopy.c:64 [inline]
RIP: 0010:__check_object_size+0x3ad/0x500 mm/usercopy.c:264
RSP: 0018:ffff8801d1fef8a8 EFLAGS: 00010286
RAX: 0000000000000078 RBX: ffffffff847102c0 RCX: 0000000000000000
RDX: 0000000000000078 RSI: 1ffff1003a3fded5 RDI: ffffed003a3fdf09
RBP: ffff8801d1fef998 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d1ea480e
R13: fffffffffffffffa R14: ffffffff84710280 R15: dffffc0000000000
FS:  0000000001360880(0000) GS:ffff8801dc000000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000202ecfe4 CR3: 00000001d1ff8000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 check_object_size include/linux/thread_info.h:108 [inline]
 check_copy_size include/linux/thread_info.h:139 [inline]
 copy_to_iter include/linux/uio.h:105 [inline]
 copy_linear_skb include/net/udp.h:371 [inline]
 udpv6_recvmsg+0x1040/0x1af0 net/ipv6/udp.c:395
 inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
 sock_recvmsg_nosec net/socket.c:792 [inline]
 sock_recvmsg+0xc9/0x110 net/socket.c:799
 SYSC_recvfrom+0x2d6/0x570 net/socket.c:1788
 SyS_recvfrom+0x40/0x50 net/socket.c:1760
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bpf-Minor-fix-in-bpf_convert_ctx_access'
David S. Miller [Fri, 11 Aug 2017 21:59:24 +0000 (14:59 -0700)]
Merge branch 'bpf-Minor-fix-in-bpf_convert_ctx_access'

Daniel Borkmann says:

====================
bpf: Minor fix in bpf_convert_ctx_access

First one was found while trying to compile the kernel
with !CONFIG_NET_RX_BUSY_POLL.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: fix two missing target_size settings in bpf_convert_ctx_access
Daniel Borkmann [Fri, 11 Aug 2017 16:31:25 +0000 (18:31 +0200)]
bpf: fix two missing target_size settings in bpf_convert_ctx_access

When CONFIG_NET_SCHED or CONFIG_NET_RX_BUSY_POLL is /not/ set and
we try a narrow __sk_buff load of tc_index or napi_id, respectively,
then verifier rightfully complains that it's misconfigured, because
we need to set target_size in each of the two cases. The rewrite
for the ctx access is just a dummy op, but needs to pass, so fix
this up.

Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
Reported-by: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fix compilation when busy poll is not enabled
Daniel Borkmann [Fri, 11 Aug 2017 16:31:24 +0000 (18:31 +0200)]
net: fix compilation when busy poll is not enabled

MIN_NAPI_ID is used in various places outside of
CONFIG_NET_RX_BUSY_POLL wrapping, so when it's not set
we run into build errors such as:

  net/core/dev.c: In function 'dev_get_by_napi_id':
  net/core/dev.c:886:16: error: ‘MIN_NAPI_ID’ undeclared (first use in this function)
    if (napi_id < MIN_NAPI_ID)
                  ^~~~~~~~~~~

Thus, have MIN_NAPI_ID always defined to fix these errors.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomISDN: Fix null pointer dereference at mISDN_FsmNew
Anton Vasilyev [Fri, 11 Aug 2017 12:57:22 +0000 (15:57 +0300)]
mISDN: Fix null pointer dereference at mISDN_FsmNew

If mISDN_FsmNew() fails to allocate memory for jumpmatrix
then null pointer dereference will occur on any write to
jumpmatrix.

The patch adds check on successful allocation and
corresponding error handling.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: do not update MTU from BH in flower app
Simon Horman [Fri, 11 Aug 2017 08:18:20 +0000 (10:18 +0200)]
nfp: do not update MTU from BH in flower app

The Flower app may receive a request to update the MTU of a representor
netdev upon receipt of a control message from the firmware. This requires
the RTNL lock which needs to be taken outside of the packet processing
path.

As a handling of this correctly seems a little to invasive for a fix simply
skip setting the MTU for now.

Relevant backtrace:
 [ 1496.288489] BUG: scheduling while atomic: kworker/0:3/373/0x00000100
 [ 1496.294911]  dca syscopyarea sysfillrect sysimgblt fb_sys_fops ptp drm mxm_wmi ahci pps_core libahci i2c_algo_bit wmi [last unloaded: nfp]
 [ 1496.294918] CPU: 0 PID: 373 Comm: kworker/0:3 Tainted: G           OE   4.13.0-rc3+ #3
 [ 1496.294919] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015
 [ 1496.294923] Workqueue: events work_for_cpu_fn
 [ 1496.294924] Call Trace:
 [ 1496.294927]  <IRQ>
 [ 1496.294931]  dump_stack+0x63/0x82
 [ 1496.294935]  __schedule_bug+0x54/0x70
 [ 1496.294937]  __schedule+0x62f/0x890
 [ 1496.294941]  ? intel_unmap_sg+0x90/0x90
 [ 1496.294942]  schedule+0x36/0x80
 [ 1496.294943]  schedule_preempt_disabled+0xe/0x10
 [ 1496.294945]  __mutex_lock.isra.2+0x445/0x4a0
 [ 1496.294947]  ? device_is_rmrr_locked+0x12/0x50
 [ 1496.294950]  ? kfree+0x162/0x170
 [ 1496.294952]  ? device_is_rmrr_locked+0x12/0x50
 [ 1496.294953]  ? iommu_should_identity_map+0x50/0xe0
 [ 1496.294954]  __mutex_lock_slowpath+0x13/0x20
 [ 1496.294955]  ? iommu_no_mapping+0x48/0xd0
 [ 1496.294956]  ? __mutex_lock_slowpath+0x13/0x20
 [ 1496.294957]  mutex_lock+0x2f/0x40
 [ 1496.294960]  rtnl_lock+0x15/0x20
 [ 1496.294979]  nfp_flower_cmsg_rx+0xc8/0x150 [nfp]
 [ 1496.294986]  nfp_ctrl_poll+0x286/0x350 [nfp]
 [ 1496.294989]  tasklet_action+0xf6/0x110
 [ 1496.294992]  __do_softirq+0xed/0x278
 [ 1496.294993]  irq_exit+0xb6/0xc0
 [ 1496.294994]  do_IRQ+0x4f/0xd0
 [ 1496.294996]  common_interrupt+0x89/0x89

Fixes: 948faa46c05b ("nfp: add support for control messages for flower app")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: stmmac: Use the right logging function in stmmac_mdio_register
Romain Perier [Thu, 10 Aug 2017 14:56:05 +0000 (16:56 +0200)]
net: stmmac: Use the right logging function in stmmac_mdio_register

Currently, the function stmmac_mdio_register() is only used by
stmmac_dvr_probe() from stmmac_main.c, in order to register the MDIO bus
and probe information about the PHY. As this function is called before
calling register_netdev(), all messages logged from stmmac_mdio_register
are prefixed by "(unnamed net_device)". The goal of netdev_info or
netdev_err is to dump useful infos about a net_device, when this data
structure is partially initialized, there is no point for using these
functions.

This commit fixes the issue by replacing all netdev_*() by the
corresponding dev_*() function for logging. The last netdev_info is
replaced by phy_attached_info(), as a valid phydev can be used at this
point.

Signed-off-by: Romain Perier <romain.perier@collabora.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/sched/hfsc: allocate tcf block for hfsc root class
Konstantin Khlebnikov [Thu, 10 Aug 2017 09:31:40 +0000 (12:31 +0300)]
net/sched/hfsc: allocate tcf block for hfsc root class

Without this filters cannot be attached.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobonding: require speed/duplex only for 802.3ad, alb and tlb
Andreas Born [Thu, 10 Aug 2017 04:41:44 +0000 (06:41 +0200)]
bonding: require speed/duplex only for 802.3ad, alb and tlb

The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent
with link state") puts the link state to down if
bond_update_speed_duplex() cannot retrieve speed and duplex settings.
Assumably the patch was written with 802.3ad mode in mind which relies
on link speed/duplex settings. For other modes like active-backup these
settings are not required. Thus, only for these other modes, this patch
reintroduces support for slaves that do not support reporting speed or
duplex such as wireless devices. This fixes the regression reported in
bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547).

Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent
with link state")
Signed-off-by: Andreas Born <futur.andy@googlemail.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: ksz: fix skb freeing
Vivien Didelot [Wed, 9 Aug 2017 20:46:09 +0000 (16:46 -0400)]
net: dsa: ksz: fix skb freeing

The DSA layer frees the original skb when an xmit function returns NULL,
meaning an error occurred. But if the tagging code copied the original
skb, it is responsible of freeing the copy if an error occurs.

The ksz tagging code currently has two issues: if skb_put_padto fails,
the skb copy is not freed, and the original skb will be freed twice.

To fix that, move skb_put_padto inside both branches of the skb_tailroom
condition, before freeing the original skb, and free the copy on error.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Fri, 11 Aug 2017 20:54:09 +0000 (13:54 -0700)]
Merge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "A few more NFS client bugfixes from me for rc5.

  Dros has a stable fix for flexfiles to prevent leaking the
  nfs4_ff_ds_version arrays when freeing a layout, Trond fixed a
  potential recovery loop situation with the TEST_STATEID operation, and
  Christoph fixed up the pNFS blocklayout Kconfig options to prevent
  unsafe use with kernels that don't have large block device support.
  Summary:

  Stable fix:
   - fix leaking nfs4_ff_ds_version array

  Other fixes:
   - improve TEST_STATEID OLD_STATEID handling to prevent recovery loop

   - require 64-bit sector_t for pNFS blocklayout to prevent 32-bit
     compile errors"

* tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  pnfs/blocklayout: require 64-bit sector_t
  NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid()
  nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays

8 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 11 Aug 2017 19:26:49 +0000 (12:26 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A set of fixes that should go into this series. This contains:

   - Fix from Bart for blk-mq requeue queue running, preventing a
     continued loop of run/restart.

   - Fix for a bio/blk-integrity issue, in two parts. One from
     Christoph, fixing where verification happens, and one from Milan,
     for a NULL profile.

   - NVMe pull request, most of the changes being for nvme-fc, but also
     a few trivial core/pci fixes"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme: fix directive command numd calculation
  nvme: fix nvme reset command timeout handling
  nvme-pci: fix CMB sysfs file removal in reset path
  lpfc: support nvmet_fc defer_rcv callback
  nvmet_fc: add defer_req callback for deferment of cmd buffer return
  nvme: strip trailing 0-bytes in wwid_show
  block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time
  bio-integrity: only verify integrity on the lowest stacked driver
  bio-integrity: Fix regression if profile verify_fn is NULL

8 years agoperf test shell: Add uprobes + backtrace ping test
Arnaldo Carvalho de Melo [Thu, 10 Aug 2017 17:50:49 +0000 (14:50 -0300)]
perf test shell: Add uprobes + backtrace ping test

Installs a probe on libc's inet_pton function, that will use uprobes,
then use 'perf trace' on a ping to localhost asking for just one packet
with the a backtrace 3 levels deep, check that it is what we expect.
This needs no debuginfo package, all is done using the libc ELF symtab
and the CFI info in the binaries.

Testing it:

  # perf test ping
  61: probe libc's inet_pton & backtrace it with ping       : Ok

In verbose mode:

  # perf test -v ping
  61: probe libc's inet_pton & backtrace it with ping       :
  --- start ---
  test child forked, pid 1007
  PING ::1(::1) 56 data bytes
  64 bytes from ::1: icmp_seq=1 ttl=64 time=0.058 ms
  --- ::1 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.058/0.058/0.058/0.000 ms
  0.000 probe_libc:inet_pton:(7f75fce12a20))
  __GI___inet_pton (/usr/lib64/libc-2.24.so)
  getaddrinfo (/usr/lib64/libc-2.24.so)
  _init (/usr/bin/ping)
  test child finished with 0
  ---- end ----
  probe libc's inet_pton & backtrace it with ping: Ok
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-idrntt4nbg15aafu8hjmv7sk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf report: Fix module symbol adjustment for s390x
Thomas Richter [Thu, 3 Aug 2017 13:49:01 +0000 (15:49 +0200)]
perf report: Fix module symbol adjustment for s390x

The 'perf report' tool does not display the addresses of kernel module
symbols correctly.

For example symbol qeth_send_ipa_cmd in kernel module qeth.ko has this
relative address for function qeth_send_ipa_cmd():

  [root@s8360047 linux]# nm -g drivers/s390/net/qeth.ko | fgrep send_ipa_cmd
  0000000000013088 T qeth_send_ipa_cmd

The module is loaded at address:

  [root@s8360047 linux]# cat /sys/module/qeth/sections/.text
  0x000003ff80296d20
  [root@s8360047 linux]#

This should result in a start address of:

  0x13088 + 0x3ff80296d20 = 0x3ff802a9da8

Using crash to verify the address on a live system:

  [root@s8360046 linux]# crash vmlinux

  crash 7.1.9++
  Copyright (C) 2002-2016  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation

  [...]

  crash> mod -s qeth drivers/s390/net/qeth.ko
       MODULE       NAME        SIZE  OBJECT FILE
       3ff8028d700  qeth      151552  drivers/s390/net/qeth.ko
  crash> sym qeth_send_ipa_cmd
  3ff802a9da8 (T) qeth_send_ipa_cmd [qeth] /root/linux/drivers/s390/net/qeth_core_main.c: 2944
  crash>

Now perf report displays the address of symbol qeth_send_ipa_cmd:
symbol__new:

  qeth_send_ipa_cmd 0x130f0-0x132ce

There is a difference of 0x68 between the entry in the symbol table (see
nm command above) and perf. The difference is from the offset the .text
segment of qeth.ko:

  [root@s8360047 perf]# readelf -a drivers/s390/net/qeth.ko
  Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000000000000000  00000040
       0000000000000024  0000000000000000   A       0     0     4
  [ 2] .text             PROGBITS         0000000000000000  00000068
       000000000001c8a0  0000000000000000  AX       0     0     8

As seen the .text segment has an offset of 0x68 with start address 0x0.
Therefore 0x68 is added to the address of qeth_send_ipa_cmd and thus
0x13088 + 0x68 = 0x130f0 is displayed.

This is wrong, perf report needs to display the start address of symbol
qeth_send_ipa_cmd at 0x13088 + qeth.ko.text section start address.

The qeth.ko module .text start address is available in the qeth.ko DSO
map. Just identify the kernel module symbols and correct the addresses.

With the fix I see this correct address for symbol: symbol__new:
qeth_send_ipa_cmd 0x3ff802a9da8-0x3ff802a9f86

Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
LPU-Reference: 20170803134902.47207-1-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-q8lktlpoxb5e3dj52u1s1rw4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf record: Fix wrong size in perf_record_mmap for last kernel module
Thomas Richter [Thu, 3 Aug 2017 13:49:02 +0000 (15:49 +0200)]
perf record: Fix wrong size in perf_record_mmap for last kernel module

During work on perf report for s390 I ran into the following issue:

0 0x318 [0x78]: PERF_RECORD_MMAP -1/0:
        [0x3ff804d6990(0xfffffc007fb2966f) @ 0]:
        x /lib/modules/4.12.0perf1+/kernel/drivers/s390/net/qeth_l2.ko

This is a PERF_RECORD_MMAP entry of the perf.data file with an invalid
module size for qeth_l2.ko (the s390 ethernet device driver).

Even a mainframe does not have 0xfffffc007fb2966f bytes of main memory.

It turned out that this wrong size is created by the perf record
command.  What happens is this function call sequence from
__cmd_record():

  perf_session__new():
    perf_session__create_kernel_maps():
      machine__create_kernel_maps():
        machine__create_modules():   Creates map for all loaded kernel modules.
          modules__parse():   Reads /proc/modules and extracts module name and
                              load address (1st and last column)
            machine__create_module():   Called for every module found in /proc/modules.
                              Creates a new map for every module found and enters
                              module name and start address into the map. Since the
                              module end address is unknown it is set to zero.

This ends up with a kernel module map list sorted by module start
addresses.  All module end addresses are zero.

Last machine__create_kernel_maps() calls function map_groups__fixup_end().
This function iterates through the maps and assigns each map entry's
end address the successor map entry start address. The last entry of the
map group has no successor, so ~0 is used as end to consume the remaining
memory.

Later __cmd_record calls function record__synthesize() which in turn calls
perf_event__synthesize_kernel_mmap() and perf_event__synthesize_modules()
to create PERF_REPORT_MMAP entries into the perf.data file.

On s390 this results in the last module qeth_l2.ko
(which has highest start address, see module table:
        [root@s8360047 perf]# cat /proc/modules
        qeth_l2 86016 1 - Live 0x000003ff804d6000
        qeth 266240 1 qeth_l2, Live 0x000003ff80296000
        ccwgroup 24576 1 qeth, Live 0x000003ff80218000
        vmur 36864 0 - Live 0x000003ff80182000
        qdio 143360 2 qeth_l2,qeth, Live 0x000003ff80002000
        [root@s8360047 perf]# )
to be the last entry and its map has an end address of ~0.

When the PERF_RECORD_MMAP entry is created for kernel module qeth_l2.ko
its start address and length is written. The length is calculated in line:
    event->mmap.len   = pos->end - pos->start;
and results in 0xffffffffffffffff - 0x3ff804d6990(*) = 0xfffffc007fb2966f

(*) On s390 the module start address is actually determined by a __weak function
named arch__fix_module_text_start() in machine__create_module().

I think this improvable. We can use the module size (2nd column of /proc/modules)
to get each loaded kernel module size and calculate its end address.
Only for map entries which do not have a valid end address (end is still zero)
we can use the heuristic we have now, that is use successor start address or ~0.

Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
LPU-Reference: 20170803134902.47207-2-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-nmoqij5b5vxx7rq2ckwu8iaj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf srcline: Do not consider empty files as valid srclines
Milian Wolff [Sun, 6 Aug 2017 21:24:45 +0000 (23:24 +0200)]
perf srcline: Do not consider empty files as valid srclines

Sometimes we get a non-null, but empty, string for the filename from
bfd. This then results in srclines of the form ":0", which is different
from the canonical SRCLINE_UNKNOWN in the form "??:0".  Set the file to
NULL if it is empty to fix this.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170806212446.24925-14-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf util: Take elf_name as const string in dso__demangle_sym
Milian Wolff [Sun, 6 Aug 2017 21:24:34 +0000 (23:24 +0200)]
perf util: Take elf_name as const string in dso__demangle_sym

The input string is not modified and thus can be passed in as a pointer
to const data.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170806212446.24925-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test shell: Add test using vfs_getname + 'perf trace'
Arnaldo Carvalho de Melo [Fri, 4 Aug 2017 20:06:26 +0000 (17:06 -0300)]
perf test shell: Add test using vfs_getname + 'perf trace'

Uses the 'perf test shell' library to add probe:vfs_getname to the
system then use it with 'perf trace' using 'touch' to write to a temp
file, then checks that that was captured by the vfs_getname was used by
'perf trace', that already handles "probe:vfs_getname" if present, and
used in the "open" syscall "filename" argument beautifier.

Testing it:

  # perf test "trace + vfs_getname"
  61: Check open filename arg using perf trace + vfs_getname: Ok
  #

  # perf test -v "trace + vfs_getname"
  61: Check open filename arg using perf trace + vfs_getname:
  --- start ---
  test child forked, pid 30846
  Added new event:
    probe:vfs_getname    (on getname_flags:72 with pathname=result->name:string)

  You can now use it in all perf tools, such as:

   perf record -e probe:vfs_getname -aR sleep 1

       2.237 ( 0.012 ms): touch/30855 open(filename: /tmp/temporary_file.kmoWQ, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
  test child finished with 0
  ---- end ----
  Check open filename arg using perf trace + vfs_getname: Ok
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-j02nobfvvn9c7yrphdsnbqx0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test shell: Add test using probe:vfs_getname and verifying results
Arnaldo Carvalho de Melo [Fri, 4 Aug 2017 17:39:37 +0000 (14:39 -0300)]
perf test shell: Add test using probe:vfs_getname and verifying results

This test uses the 'perf test shell' library to add probe:vfs_getname to the
system then use it with 'perf record' using 'touch' to write to a temp file,
then checks that that was captured by the vfs_getname probe in the generated
perf.data file, with the temp file name as the pathname argument.

Using it:

  # perf test "Use vfs_getname"
  60: Use vfs_getname probe to get syscall args filenames: Ok
  # perf test -v "Use vfs_getname"
  60: Use vfs_getname probe to get syscall args filenames:
  --- start ---
  test child forked, pid 16414
  Added new event:
    probe:vfs_getname    (on getname_flags:72 with pathname=result->name:string)

  You can now use it in all perf tools, such as:

perf record -e probe:vfs_getname -aR sleep 1

  Recording open file:
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.022 MB /tmp/vaca.perf.data.QZsn7 (13 samples) ]
  Looking at perf.data file for vfs_getname records for the file we touched:
             touch 16421 [002] 1255152.879561: probe:vfs_getname: (ffffffffa626e608) pathname="/tmp/vaca.l10SL"
  test child finished with 0
  ---- end ----
  Use vfs_getname probe to get syscall args filenames: Ok
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-t555fnhbcbxnukltk23dqxur@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test shell: Move vfs_getname probe function to lib
Arnaldo Carvalho de Melo [Fri, 4 Aug 2017 17:18:29 +0000 (14:18 -0300)]
perf test shell: Move vfs_getname probe function to lib

Multiple tests will be able to reuse these functions, to test things
like perf report, 'trace', etc, using this probe.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-48xagvozhouhyi8fjota6o2d@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test shell: Install shell tests
Arnaldo Carvalho de Melo [Fri, 4 Aug 2017 15:19:44 +0000 (12:19 -0300)]
perf test shell: Install shell tests

Now that we have shell tests, install them.

Developers don't need this pass, as 'perf test' will look first at the
in tree scripts at tools/perf/tests/shell/.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-j21u4v0jsehi0lpwqwjb4j45@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test shell: Add 'probe_vfs_getname' shell test
Arnaldo Carvalho de Melo [Thu, 3 Aug 2017 19:54:53 +0000 (16:54 -0300)]
perf test shell: Add 'probe_vfs_getname' shell test

First perf shell test:

  # perf test vfs_getname
  60: Add vfs_getname probe to get syscall args filenames: Ok
  #

In verbose mode:

  # perf test -v vfs_getname
  60: Add vfs_getname probe to get syscall args filenames:
  --- start ---
  test child forked, pid 19146
  Added new event:
    probe:vfs_getname    (on getname_flags:72 with pathname=result->name:string)

  You can now use it in all perf tools, such as:

  perf record -e probe:vfs_getname -aR sleep 1

  test child finished with 0
  ---- end ----
  Add vfs_getname probe to get syscall args filenames: Ok
  #

And if the vmlinux file is not found:

  # mv ../build/v4.12.0-rc6+/vmlinux ../build/v4.12.0-rc6+/vmlinux.hidden
  # perf test vfs_getname
  60: Add vfs_getname probe to get syscall args filenames: Skip
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-8f3n22c1yn516ev30s603ow2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test: Make 'list' use same filtering code as main 'perf test'
Arnaldo Carvalho de Melo [Fri, 4 Aug 2017 14:16:40 +0000 (11:16 -0300)]
perf test: Make 'list' use same filtering code as main 'perf test'

Before:

  # perf test Synth
  39: Synthesize thread map  : Ok
  41: Synthesize cpu map     : Ok
  42: Synthesize stat config : Ok
  43: Synthesize stat        : Ok
  44: Synthesize stat round  : Ok
  45: Synthesize attr update : Ok
  # perf test list Synth
  #

After:

  # perf test Synth
  39: Synthesize thread map  : Ok
  41: Synthesize cpu map     : Ok
  42: Synthesize stat config : Ok
  43: Synthesize stat        : Ok
  44: Synthesize stat round  : Ok
  45: Synthesize attr update : Ok
  # perf test list Synth
  39: Synthesize thread map
  41: Synthesize cpu map
  42: Synthesize stat config
  43: Synthesize stat
  44: Synthesize stat round
  45: Synthesize attr update
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-v95tqqzuwawsmds3zn2mosje@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>