Kan Liang [Tue, 2 Apr 2019 19:45:05 +0000 (12:45 -0700)]
perf/x86/intel: Add Icelake support
Add Icelake core PMU perf code, including constraint tables and the main
enable code.
Icelake expanded the generic counters to always 8 even with HT on, but a
range of events cannot be scheduled on the extra 4 counters.
Add new constraint ranges to describe this to the scheduler.
The number of constraints that need to be checked is larger now than
with earlier CPUs.
At some point we may need a new data structure to look them up more
efficiently than with linear search. So far it still seems to be
acceptable however.
Icelake added a new fixed counter SLOTS. Full support for it is added
later in the patch series.
The cache events table is identical to Skylake.
Compare to PEBS instruction event on generic counter, fixed counter 0
has less skid. Force instruction:ppp always in fixed counter 0.
Originally-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-9-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them
With adaptive PEBS the CPU can directly supply the LBR information,
so we don't need to read it again. But the LBRs still need to be
enabled. Add a special count to the cpuc that distinguishes these
two cases, and avoid reading the LBRs unnecessarily when PEBS is
active.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-7-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kan Liang [Tue, 2 Apr 2019 19:45:02 +0000 (12:45 -0700)]
perf/x86/intel: Support adaptive PEBS v4
Adaptive PEBS is a new way to report PEBS sampling information. Instead
of a fixed size record for all PEBS events it allows to configure the
PEBS record to only include the information needed. Events can then opt
in to use such an extended record, or stay with a basic record which
only contains the IP.
The major new feature is to support LBRs in PEBS record.
Besides normal LBR, this allows (much faster) large PEBS, while still
supporting callstacks through callstack LBR. So essentially a lot of
profiling can now be done without frequent interrupts, dropping the
overhead significantly.
The main requirement still is to use a period, and not use frequency
mode, because frequency mode requires reevaluating the frequency on each
overflow.
The floating point state (XMM) is also supported, which allows efficient
profiling of FP function arguments.
Introduce specific drain function to handle variable length records.
Use a new callback to parse the new record format, and also handle the
STATUS field now being at a different offset.
Add code to set up the configuration register. Since there is only a
single register, all events either get the full super set of all events,
or only the basic record.
Originally-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-6-kan.liang@linux.intel.com
[ Renamed GPRS => GP. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kan Liang [Tue, 2 Apr 2019 19:45:01 +0000 (12:45 -0700)]
perf/x86/intel/ds: Extract code of event update in short period
The drain_pebs() could be called twice in a short period for auto-reload
event in pmu::read(). The intel_pmu_save_and_restart_reload() should be
called to update the event->count.
This case should also be handled on Icelake. Extract the code for
later reuse.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-5-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/x86/intel: Extract memory code PEBS parser for reuse
Extract some code related to memory profiling from the PEBS record
parser into separate functions. It can be reused by the upcoming
adaptive PEBS parser. No functional changes.
Rename intel_hsw_weight to intel_get_tsx_weight, and
intel_hsw_transaction to intel_get_tsx_transaction. Because the input is
not the hsw pebs format anymore.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-4-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kan Liang [Tue, 2 Apr 2019 19:44:59 +0000 (12:44 -0700)]
perf/x86: Support outputting XMM registers
Starting from Icelake, XMM registers can be collected in PEBS record.
But current code only output the pt_regs.
Add a new struct x86_perf_regs for both pt_regs and xmm_regs. The
xmm_regs will be used later to keep a pointer to PEBS record which has
XMM information.
XMM registers are 128 bit. To simplify the code, they are handled like
two different registers, which means setting two bits in the register
bitmap. This also allows only sampling the lower 64bit bits in XMM.
The index of XMM registers starts from 32. There are 16 XMM registers.
So all reserved space for regs are used. Remove REG_RESERVED.
Add PERF_REG_X86_XMM_MAX, which stands for the max number of all x86
regs including both GPRs and XMM.
Add REG_NOSUPPORT for 32bit to exclude unsupported registers.
Previous platforms can not collect XMM information in PEBS record.
Adding pebs_no_xmm_regs to indicate the unsupported platforms.
The common code still validates the supported registers. However, it
cannot check model specific registers, e.g. XMM. Add extra check in
x86_pmu_hw_config() to reject invalid config of regs_user and regs_intr.
The regs_user never supports XMM collection.
The regs_intr only supports XMM collection when sampling PEBS event on
icelake and later platforms.
Originally-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-3-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/x86/intel: Force resched when TFA sysctl is modified
This patch provides guarantee to the sysadmin that when TFA is disabled, no PMU
event is using PMC3 when the echo command returns. Vice-Versa, when TFA
is enabled, PMU can use PMC3 immediately (to eliminate possible multiplexing).
perf/core: Add perf_pmu_resched() as global function
This patch add perf_pmu_resched() a global function that can be called
to force rescheduling of events for a given PMU. The function locks
both cpuctx and task_ctx internally. This will be used by a subsequent
patch.
Signed-off-by: Stephane Eranian <eranian@google.com>
[ Simplified the calling convention. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Cc: nelson.dsouza@intel.com Cc: tonyj@suse.com Link: https://lkml.kernel.org/r/20190408173252.37932-2-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
has an unintended side-effect of also suppressing all AUX records with no flags
and non-zero size, so all the regular records in the full trace mode.
This breaks some use cases for people.
Fix this by restoring "regular" AUX records.
Reported-by: Ben Gainey <Ben.Gainey@arm.com> Tested-by: Ben Gainey <Ben.Gainey@arm.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 1627314fb54a33e ("perf: Suppress AUX/OVERWRITE records") Link: https://lkml.kernel.org/r/20190329091338.29999-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Alexander Shishkin [Fri, 29 Mar 2019 09:12:12 +0000 (11:12 +0200)]
perf/core: Fix the address filtering fix
The following recent commit:
c60f83b813e5 ("perf, pt, coresight: Fix address filters for vmas with non-zero offset")
changes the address filtering logic to communicate filter ranges to the PMU driver
via a single address range object, instead of having the driver do the final bit of
math.
That change forgets to take into account kernel filters, which are not calculated
the same way as DSO based filters.
Fix that by passing the kernel filters the same way as file-based filters.
This doesn't require any additional changes in the drivers.
Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: c60f83b813e5 ("perf, pt, coresight: Fix address filters for vmas with non-zero offset") Link: https://lkml.kernel.org/r/20190329091212.29870-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
kprobes: Fix error check when reusing optimized probes
The following commit introduced a bug in one of our error paths:
819319fc9346 ("kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()")
it missed to handle the return value of kprobe_optready() as
error-value. In reality, the kprobe_optready() returns a bool
result, so "true" case must be passed instead of 0.
This causes some errors on kprobe boot-time selftests on ARM:
[ ] Beginning kprobe tests...
[ ] Probe ARM code
[ ] kprobe
[ ] kretprobe
[ ] ARM instruction simulation
[ ] Check decoding tables
[ ] Run test cases
[ ] FAIL: test_case_handler not run
[ ] FAIL: Test andge r10, r11, r14, asr r7
[ ] FAIL: Scenario 11
...
[ ] FAIL: Scenario 7
[ ] Total instruction simulation tests=1631, pass=1433 fail=198
[ ] kprobe tests failed
This can happen if an optimized probe is unregistered and next
kprobe is registered on same address until the previous probe
is not reclaimed.
If this happens, a hidden aggregated probe may be kept in memory,
and no new kprobe can probe same address. Also, in that case
register_kprobe() will return "1" instead of minus error value,
which can mislead caller logic.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N . Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v5.0+ Fixes: 819319fc9346 ("kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()") Link: http://lkml.kernel.org/r/155530808559.32517.539898325433642204.stgit@devnote2 Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"I debated holding this back for the v5.2 merge window due to the size
of the "zero-key" changes, but affected users would benefit from
having the fixes sooner. It did not make sense to change the zero-key
semantic in isolation for the "secure-erase" command, but instead
include it for all security commands.
The short background on the need for these changes is that some NVDIMM
platforms enable security with a default zero-key rather than let the
OS specify the initial key. This makes the security enabling that
landed in v5.0 unusable for some users.
Summary:
- Compatibility fix for nvdimm-security implementations with a
default zero-key.
- Miscellaneous small fixes for out-of-bound accesses, cleanup after
initialization failures, and missing debug messages"
* tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
tools/testing/nvdimm: Retain security state after overwrite
libnvdimm/pmem: fix a possible OOB access when read and write pmem
libnvdimm/security, acpi/nfit: unify zero-key for all security commands
libnvdimm/security: provide fix for secure-erase to use zero-key
libnvdimm/btt: Fix a kmemdup failure check
libnvdimm/namespace: Fix a potential NULL pointer dereference
acpi/nfit: Always dump _DSM output payload
Merge tag 'fsdax-fix-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull fsdax fix from Dan Williams:
"A single filesystem-dax fix. It has been lingering in -next for a long
while and there are no other fsdax fixes on the horizon:
- Avoid a crash scenario with architectures like powerpc that require
'pgtable_deposit' for the zero page"
* tag 'fsdax-fix-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
fs/dax: Deposit pagetable even when installing zero page
Jann Horn reported that he can overflow the page ref count with
sufficient memory (and a filesystem that is intentionally extremely
slow).
Admittedly it's not exactly easy. To have more than four billion
references to a page requires a minimum of 32GB of kernel memory just
for the pointers to the pages, much less any metadata to keep track of
those pointers. Jann needed a total of 140GB of memory and a specially
crafted filesystem that leaves all reads pending (in order to not ever
free the page references and just keep adding more).
Still, we have a fairly straightforward way to limit the two obvious
user-controllable sources of page references: direct-IO like page
references gotten through get_user_pages(), and the splice pipe page
duplication. So let's just do that.
* branch page-refs:
fs: prevent page refcount overflow in pipe_buf_get
mm: prevent get_user_pages() from overflowing page refcount
mm: add 'try_get_page()' helper function
mm: make page ref count overflow check tighter and more explicit
Matthew Wilcox [Fri, 5 Apr 2019 21:02:10 +0000 (14:02 -0700)]
fs: prevent page refcount overflow in pipe_buf_get
Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount. All
callers converted to handle a failure.
mm: prevent get_user_pages() from overflowing page refcount
If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it. One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times. This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.
This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe". It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).
Also like 'get_page()', you can't use this function unless you already
had a reference to the page. The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.
The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.
NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).
mm: make page ref count overflow check tighter and more explicit
We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.
That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).
Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.
Merge tag 'for-linus-20190412' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Set of fixes that should go into this round. This pull is larger than
I'd like at this time, but there's really no specific reason for that.
Some are fixes for issues that went into this merge window, others are
not. Anyway, this contains:
- Hardware queue limiting for virtio-blk/scsi (Dongli)
- Multi-page bvec fixes for lightnvm pblk
- Multi-bio dio error fix (Jason)
- Remove the cache hint from the io_uring tool side, since we didn't
move forward with that (me)
- Make io_uring SETUP_SQPOLL root restricted (me)
- Fix leak of page in error handling for pc requests (Jérôme)
- Fix BFQ regression introduced in this merge window (Paolo)
- Fix break logic for bio segment iteration (Ming)
- Fix NVMe cancel request error handling (Ming)
- NVMe pull request with two fixes (Christoph):
- fix the initial CSN for nvme-fc (James)
- handle log page offsets properly in the target (Keith)"
* tag 'for-linus-20190412' of git://git.kernel.dk/linux-block:
block: fix the return errno for direct IO
nvmet: fix discover log page when offsets are used
nvme-fc: correct csn initialization and increments on error
block: do not leak memory in bio_copy_user_iov()
lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs
nvme: cancel request synchronously
blk-mq: introduce blk_mq_complete_request_sync()
scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
virtio-blk: limit number of hw queues by nr_cpu_ids
block, bfq: fix use after free in bfq_bfqq_expire
io_uring: restrict IORING_SETUP_SQPOLL to root
tools/io_uring: remove IOCQE_FLAG_CACHEHIT
block: don't use for-inside-for in bio_for_each_segment_all
Merge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fix:
- Fix a deadlock in close() due to incorrect draining of RDMA queues
Bugfixes:
- Revert "SUNRPC: Micro-optimise when the task is known not to be
sleeping" as it is causing stack overflows
- Fix a regression where NFSv4 getacl and fs_locations stopped
working
- Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
- Fix xfstests failures due to incorrect copy_file_range() return
values"
* tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
NFSv4.1 fix incorrect return value in copy_file_range
xprtrdma: Fix helper that drains the transport
NFS: Fix handling of reply page vector
NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Here's more than a handful of clk driver fixes for changes that came
in during the merge window:
- Fix the AT91 sama5d2 programmable clk prescaler formula
- A bunch of Amlogic meson clk driver fixes for the VPU clks
- A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc
clks as critical only when really needed
- Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate
implementation
- Use the right structure to test for a frequency table in i.MX's
PLL_1416x driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: imx: Fix PLL_1416X not rounding rates
clk: mediatek: fix clk-gate flag setting
platform/x86: pmc_atom: Drop __initconst on dmi table
clk: x86: Add system specific quirk to mark clocks as critical
clk: meson: vid-pll-div: remove warning and return 0 on invalid config
clk: meson: pll: fix rounding and setting a rate that matches precisely
clk: meson-g12a: fix VPU clock parents
clk: meson: g12a: fix VPU clock muxes mask
clk: meson-gxbb: round the vdec dividers to closest
clk: at91: fix programmable clock for sama5d2
Merge tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Add a DMA alias quirk for another Marvell SATA device (Andre
Przywara)
- Fix a pciehp regression that broke safe removal of devices (Sergey
Miroshnichenko)
* tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: pciehp: Ignore Link State Changes after powering off a slot
PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
Merge tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A minor build fix for 64-bit FLATMEM configs.
A fix for a boot failure on 32-bit powermacs.
My commit to fix CLOCK_MONOTONIC across Y2038 broke the 32-bit VDSO on
64-bit kernels, ie. compat mode, which is only used on big endian.
The rewrite of the SLB code we merged in 4.20 missed the fact that the
0x380 exception is also used with the Radix MMU to report out of range
accesses. This could lead to an oops if userspace tried to read from
addresses outside the user or kernel range.
Thanks to: Aneesh Kumar K.V, Christophe Leroy, Larry Finger, Nicholas
Piggin"
* tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs
powerpc/64s/radix: Fix radix segment exception handling
powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
powerpc/32: Fix early boot failure with RTAS built-in
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main thing is a fix to our FUTEX_WAKE_OP implementation which was
unbelievably broken, but did actually work for the one scenario that
GLIBC used to use.
- Fix terminally broken implementation of FUTEX_WAKE_OP atomics"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
arm64: backtrace: Don't bother trying to unwind the userspace stack
arm64/ftrace: fix inadvertent BUG() in trampoline check
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix typos in user-visible resctrl parameters, and also fix assembly
constraint bugs that might result in miscompilation"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Use stricter assembly constraints in bitops
x86/resctrl: Fix typos in the mba_sc mount option
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a NULL pointer dereference crash in certain environments"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Six kernel side fixes: three related to NMI handling on AMD systems, a
race fix, a kexec initialization fix and a PEBS sampling fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix perf_event_disable_inatomic() race
x86/perf/amd: Remove need to check "running" bit in NMI handler
x86/perf/amd: Resolve NMI latency issues for active PMCs
x86/perf/amd: Resolve race condition when disabling PMC
perf/x86/intel: Initialize TFA MSR
perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"Fixes a crash when accessing /proc/lockdep"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Zap lock classes even with lock debugging disabled
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Two genirq fixes, plus an irqchip driver error handling fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
irqchip/irq-ls1x: Missing error code in ls1x_intc_of_init()
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion
bug"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Add rewind_stack_do_exit() to the noreturn list
linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
Merge tag 'dma-mapping-5.1-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"Fix a sparc64 sun4v_pci regression introduced in this merged window,
and a dma-debug stracktrace regression from the big refactor last
merge window"
* tag 'dma-mapping-5.1-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: only skip one stackframe entry
sparc64/pci_sun4v: fix ATU checks for large DMA masks
Merge tag 'iommu-fix-v5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"Fix an AMD IOMMU issue where the driver didn't correctly setup the
exclusion range in the hardware registers, resulting in exclusion
ranges being one page too big.
This can cause data corruption of the address of that last page is
used by DMA operations"
* tag 'iommu-fix-v5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Set exclusion range correctly
Merge tag 'mmc-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- alcor: Stabilize data write requests
- sdhci-omap: Fix command error path during tuning
* tag 'mmc-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
mmc: alcor: don't write data before command has completed
Merge tag 'sound-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Well, this one became unpleasantly larger than previous pull requests,
but it's a kind of usual pattern: now it contains a collection of ASoC
fixes, and nothing to worry too much.
The fixes for ASoC core (DAPM, DPCM, topology) are all small and just
covering corner cases. The rest changes are driver-specific, many of
which are for x86 platforms and new drivers like STM32, in addition to
the usual fixups for HD-audio"
* tag 'sound-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (66 commits)
ASoC: wcd9335: Fix missing regmap requirement
ALSA: hda: Fix racy display power access
ASoC: pcm: fix error handling when try_module_get() fails.
ASoC: stm32: sai: fix master clock management
ASoC: Intel: kbl: fix wrong number of channels
ALSA: hda - Add two more machines to the power_save_blacklist
ASoC: pcm: update module refcount if module_get_upon_open is set
ASoC: core: conditionally increase module refcount on component open
ASoC: stm32: fix sai driver name initialisation
ASoC: topology: Use the correct dobj to free enum control values and texts
ALSA: seq: Fix OOB-reads from strlcpy
ASoC: intel: skylake: add remove() callback for component driver
ASoC: cs35l35: Disable regulators on driver removal
ALSA: xen-front: Do not use stream buffer size before it is set
ASoC: rockchip: pdm: change dma burst to 8
ASoC: rockchip: pdm: fix regmap_ops hang issue
ASoC: simple-card: don't select DPCM via simple-audio-card
ASoC: audio-graph-card: don't select DPCM via audio-graph-card
ASoC: tlv320aic32x4: Change author's name
ALSA: hda/realtek - Add quirk for Tuxedo XC 1509
...
Merge tag 'acpi-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix an ACPICA issue introduced during the 4.20 development cycle and
causing some systems to crash because of leftover operation region
data still maintained after the operation region in question has gone
away (Erik Schmauss)"
* tag 'acpi-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Namespace: remove address node from global list after method termination
Merge tag 'drm-fixes-2019-04-12' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Fixes across the driver spectrum this week, the mediatek fbdev support
might be a bit late for this round, but I looked over it and it's not
very large and seems like a useful feature for them.
Otherwise the main thing is a regression fix for i915 5.0 bug that
caused black screens on a bunch of Dell XPS 15s I think, I know at
least Fedora is waiting for this to land, and the udl fix is also for
a regression since 5.0 where unplugging the device would end badly.
core:
- make atomic hooks optional
i915:
- Revert a 5.0 regression where some eDP panels stopped working
- DSI related fixes for platforms up to IceLake
- GVT (regression fix, warning fix, use-after free fix)
amdgpu:
- Cursor fixes
- missing PCI ID fix for KFD
- XGMI fix
- shadow buffer handling after reset fix
udl:
- fix unplugging device crashes.
mediatek:
- stabilise MT2701 HDMI support
- fbdev support
* tag 'drm-fixes-2019-04-12' of git://anongit.freedesktop.org/drm/drm: (32 commits)
gpu: host1x: Fix compile error when IOMMU API is not available
drm/i915/gvt: Roundup fb->height into tile's height at calucation fb->size
drm/i915/dp: revert back to max link rate and lane count on eDP
drm/i915/icl: Fix port disable sequence for mipi-dsi
drm/i915/icl: Ungate ddi clocks before IO enable
drm/mediatek: no change parent rate in round_rate() for MT2701 hdmi phy
drm/mediatek: using new factor for tvdpll for MT2701 hdmi phy
drm/mediatek: remove flag CLK_SET_RATE_PARENT for MT2701 hdmi phy
drm/mediatek: make implementation of recalc_rate() for MT2701 hdmi phy
drm/mediatek: fix the rate and divder of hdmi phy for MT2701
drm/mediatek: fix possible object reference leak
drm/i915: Get power refs in encoder->get_power_domains()
drm/i915: Fix pipe_bpp readout for BXT/GLK DSI
drm/amd/display: Fix negative cursor pos programming (v2)
drm/sun4i: tcon top: Fix NULL/invalid pointer dereference in sun8i_tcon_top_un/bind
drm/udl: add a release method and delay modeset teardown
drm/i915/gvt: Prevent use-after-free in ppgtt_free_all_spt()
drm/i915/gvt: Annotate iomem usage
drm/sun4i: DW HDMI: Lower max. supported rate for H6
Revert "Documentation/gpu/meson: Remove link to meson_canvas.c"
...
Will Deacon [Mon, 8 Apr 2019 11:45:09 +0000 (12:45 +0100)]
arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
explicitly set the return value on the non-faulting path and instead
leaves it holding the result of the underlying atomic operation. This
means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
value will be reported as having failed. Regrettably, I wrote the buggy
code back in 2011 and it was upstreamed as part of the initial arm64
support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call
behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the
futex to zero, and therefore worked as expected. From 2.25 onwards,
FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0
to indicate that the atomic operation completed successfully, or -EFAULT
if we encountered a fault when accessing the user mapping.
The exlcusion range limit register needs to contain the
base-address of the last page that is part of the range, as
bits 0-11 of this register are treated as 0xfff by the
hardware for comparisons.
So correctly set the exclusion range in the hardware to the
last page which is _in_ the range.
irq_work_run()
perf_pending_event()
if (@pending_disable)
perf_event_disable_local(); // WHOOPS
The problem exists in generic, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from it's PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.
Adding a irq_work_sync() to event_sched_in() would work for all hardare
PMUs that properly use irq_work_run() but fails for software PMUs.
Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Fri, 12 Apr 2019 03:39:22 +0000 (13:39 +1000)]
Merge tag 'drm-intel-fixes-2019-04-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Revert back to max link rate and lane count on eDP.
- DSI related fixes for all platforms including Ice Lake.
- GVT Fixes including one vGPU display plane size regression fix,
one for preventing use-after-free in ppgtt shadow free function,
and another warning fix for iomem access annotation.
Jason Yan [Fri, 12 Apr 2019 02:09:16 +0000 (10:09 +0800)]
block: fix the return errno for direct IO
If the last bio returned is not dio->bio, the status of the bio will
not assigned to dio->bio if it is error. This will cause the whole IO
status wrong.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0]
<idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0]
<idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>]
<idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>]
ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475]
ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to assign bio->bi_status to dio->bio.bi_status because we
will only check dio->bio.bi_status when we return the whole IO to
the upper layer.
Fixes: 542ff7bf18c6 ("block: new direct I/O implementation") Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'for-5.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix parsing of compression algorithm when set as a inode property,
this could end up with eg. 'zst' or 'zli' in the value
- don't allow trim on a filesystem with unreplayed log, this could
cause data loss if there are pending updates to the block groups that
would not be subject to trim after replay
* tag 'for-5.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: prop: fix vanished compression property after failed set
btrfs: prop: fix zstd compression parameter validation
Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
Dave Airlie [Thu, 11 Apr 2019 20:55:20 +0000 (06:55 +1000)]
Merge tag 'drm-misc-fixes-2019-04-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- core: Make atomic_enable and disable optional for CRTC
- dw-hdmi: Lower max frequency for the Allwinner H6, SCDC configuration
improvements for older controller versions
- omap: a fix for the CEC clock management policy
The ability to optimise here relies on compiler being able to optimise
away tail calls to avoid stack overflows. Unfortunately, we are seeing
reports of problems, so let's just revert.
Reported-by: Daniel Mack <daniel@zonque.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Olga Kornievskaia [Thu, 11 Apr 2019 18:34:18 +0000 (14:34 -0400)]
NFSv4.1 fix incorrect return value in copy_file_range
According to the NFSv4.2 spec if the input and output file is the
same file, operation should fail with EINVAL. However, linux
copy_file_range() system call has no such restrictions. Therefore,
in such case let's return EOPNOTSUPP and allow VFS to fallback
to doing do_splice_direct(). Also when copy_file_range is called
on an NFSv4.0 or 4.1 mount (ie., a server that doesn't support
COPY functionality), we also need to return EOPNOTSUPP and
fallback to a regular copy.
Fixes xfstest generic/075, generic/091, generic/112, generic/263
for all NFSv4.x versions.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Chuck Lever [Tue, 9 Apr 2019 14:44:16 +0000 (10:44 -0400)]
NFS: Fix handling of reply page vector
NFSv4 GETACL and FS_LOCATIONS requests stopped working in v5.1-rc.
These two need the extra padding to be added directly to the reply
length.
Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 02ef04e432ba ("NFS: Account for XDR pad of buf->pages") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tetsuo Handa [Sat, 30 Mar 2019 01:21:07 +0000 (10:21 +0900)]
NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
(which is embedded into user-visible "struct nfs_mount_data" structure)
despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
bytes of AF_INET6 address to rpc_sockaddr2uaddr().
Since "struct nfs_mount_data" structure is user-visible, we can't change
"struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
assuming that everybody is using AF_INET family when passing address via
"struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.
With the previous value of 2, afu_dma_map_region was being omitted. I
suspect that the code paths have simply changed since the value of 2 was
chosen a decade ago, but it's also possible that it varies based on which
mapping function was used, compiler inlining choices, etc. In any case,
it's best to err on the side of skipping less.
Signed-off-by: Scott Wood <swood@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Stephen Boyd [Thu, 11 Apr 2019 17:22:43 +0000 (10:22 -0700)]
platform/x86: pmc_atom: Drop __initconst on dmi table
It's used by probe and that isn't an init function. Drop this so that we
don't get a section mismatch.
Reported-by: kbuild test robot <lkp@intel.com> Cc: David Müller <dave.mueller@gmx.ch> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Fixes: 7c2e07130090 ("clk: x86: Add system specific quirk to mark clocks as critical") Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Merge branch 'nvme-5.1' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Christoph:
"Two nvme fixes for 5.1 - fixing the initial CSN for nvme-fc, and handle
log page offsets properly in the target."
* 'nvme-5.1' of git://git.infradead.org/nvme:
nvmet: fix discover log page when offsets are used
nvme-fc: correct csn initialization and increments on error
Keith Busch [Tue, 9 Apr 2019 16:03:59 +0000 (10:03 -0600)]
nvmet: fix discover log page when offsets are used
The nvme target hadn't been taking the Get Log Page offset parameter
into consideration, and so has been returning corrupted log pages when
offsets are used. Since many tools, including nvme-cli, split the log
request to 4k, we've been breaking discovery log responses when more
than 3 subsystems exist.
Fix the returned data by internally generating the entire discovery
log page and copying only the requested bytes into the user buffer. The
command log page offset type has been modified to a native __le64 to
make it easier to extract the value from a command.
Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Mon, 8 Apr 2019 18:15:19 +0000 (11:15 -0700)]
nvme-fc: correct csn initialization and increments on error
This patch fixes a long-standing bug that initialized the FC-NVME
cmnd iu CSN value to 1. Early FC-NVME specs had the connection starting
with CSN=1. By the time the spec reached approval, the language had
changed to state a connection should start with CSN=0. This patch
corrects the initialization value for FC-NVME connections.
Additionally, in reviewing the transport, the CSN value is assigned to
the new IU early in the start routine. It's possible that a later dma
map request may fail, causing the command to never be sent to the
controller. Change the location of the assignment so that it is
immediately prior to calling the lldd. Add a comment block to explain
the impacts if the lldd were to additionally fail sending the command.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Merge tag 'asoc-fix-v5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.1
A few core fixes along with the driver specific ones, mainly fixing
small issues that only affect x86 platforms for various reasons (their
unusual machine enumeration mechanisms mainly, plus a fix for error
handling in topology).
There's some of the driver fixes that look larger than they are, like
the hdmi-codec changes which resulted in an indentation change, and most
of the other large changes are for new drivers like the STM32 changes.
Faiz Abbas [Thu, 11 Apr 2019 08:59:37 +0000 (14:29 +0530)]
mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
commit 5b0d62108b46 ("mmc: sdhci-omap: Add platform specific reset
callback") skips data resets during tuning operation. Because of this,
a data error or data finish interrupt might still arrive after a command
error has been handled and the mrq ended. This ends up with a "mmc0: Got
data interrupt 0x00000002 even though no data operation was in progress"
error message.
Fix this by adding a platform specific callback for sdhci_irq. Mark the
mrq as a failure but wait for a data interrupt instead of calling
finish_mrq().
Fixes: 5b0d62108b46 ("mmc: sdhci-omap: Add platform specific reset
callback") Signed-off-by: Faiz Abbas <faiz_abbas@ti.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Stefan Agner [Wed, 10 Apr 2019 22:47:46 +0000 (00:47 +0200)]
gpu: host1x: Fix compile error when IOMMU API is not available
In case the IOMMU API is not available compiling host1x fails with
the following error:
In file included from drivers/gpu/host1x/hw/host1x06.c:27:
drivers/gpu/host1x/hw/channel_hw.c: In function ‘host1x_channel_set_streamid’:
drivers/gpu/host1x/hw/channel_hw.c:118:30: error: implicit declaration of function
‘dev_iommu_fwspec_get’; did you mean ‘iommu_fwspec_free’? [-Werror=implicit-function-declaration]
struct iommu_fwspec *spec = dev_iommu_fwspec_get(channel->dev->parent);
^~~~~~~~~~~~~~~~~~~~
iommu_fwspec_free
Fixes: de5469c21ff9 ("gpu: host1x: Program the channel stream ID") Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>
drm/i915/gvt: Roundup fb->height into tile's height at calucation fb->size
When fb is tiled and fb->height isn't the multiple of tile's height,
the format fb->size = fb->stride * fb->height, will get a smaller size
than the actual size. As the memory height of tiled fb should be multiple
of tile's height.
Fixes: 7f1a93b1f1d1 ("drm/i915/gvt: Correct the calculation of plane size") Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
David Müller [Mon, 8 Apr 2019 13:33:54 +0000 (15:33 +0200)]
clk: x86: Add system specific quirk to mark clocks as critical
Since commit 648e921888ad ("clk: x86: Stop marking clocks as
CLK_IS_CRITICAL"), the pmc_plt_clocks of the Bay Trail SoC are
unconditionally gated off. Unfortunately this will break systems where these
clocks are used for external purposes beyond the kernel's knowledge. Fix it
by implementing a system specific quirk to mark the necessary pmc_plt_clks as
critical.
Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") Signed-off-by: David Müller <dave.mueller@gmx.ch> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Sergey Miroshnichenko [Tue, 12 Mar 2019 12:05:48 +0000 (15:05 +0300)]
PCI: pciehp: Ignore Link State Changes after powering off a slot
During a safe hot remove, the OS powers off the slot, which may cause a
Data Link Layer State Changed event. The slot has already been set to
OFF_STATE, so that event results in re-enabling the device, making it
impossible to safely remove it.
Clear out the Presence Detect Changed and Data Link Layer State Changed
events when the disabled slot has settled down.
It is still possible to re-enable the device if it remains in the slot
after pressing the Attention Button by pressing it again.
Fixes the problem that Micah reported below: an NVMe drive power button may
not actually turn off the drive.
Christoph Hellwig [Wed, 3 Apr 2019 19:34:34 +0000 (21:34 +0200)]
sparc64/pci_sun4v: fix ATU checks for large DMA masks
Now that we allow drivers to always need to set larger than required
DMA masks we need to be a little more careful in the sun4v PCI iommu
driver to chose when to select the ATU support - a larger DMA mask
can be set even when the platform does not support ATU, so we always
have to check if it is avaiable before using it. Add a little helper
for that and use it in all the places where we make ATU usage decisions
based on the DMA mask.
Fixes: 24132a419c68 ("sparc64/pci_sun4v: allow large DMA masks") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Meelis Roos <mroos@linux.ee> Acked-by: David S. Miller <davem@davemloft.net>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Several driver bug fixes posted in the last several weeks
- Several bug fixes for the hfi1 driver 'TID RDMA' functionality
merged into 5.1. Since TID RDMA is on by default these all seem to
be regressions.
- Wrong software permission checks on memory in mlx5
- Memory leak in vmw_pvrdma during driver remove
- Several bug fixes for hns driver features merged into 5.1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/hfi1: Do not flush send queue in the TID RDMA second leg
RDMA/hns: Bugfix for SCC hem free
RDMA/hns: Fix bug that caused srq creation to fail
RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_remove
IB/mlx5: Reset access mask when looping inside page fault handler
IB/hfi1: Fix the allocation of RSM table
IB/hfi1: Eliminate opcode tests on mr deref
IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state
IB/hfi1: Failed to drain send queue when QP is put into error state
Kaike Wan [Wed, 10 Apr 2019 13:27:05 +0000 (06:27 -0700)]
IB/hfi1: Do not flush send queue in the TID RDMA second leg
When a QP is put into error state, the send queue will be flushed.
This mechanism is implemented in both the first and the second leg
of the send engine. Since the second leg is only responsible for
data transactions in the KDETH space for the TID RDMA WRITE request,
it should not perform the flushing of the send queue.
This patch removes the flushing function of the second leg, but
still keeps the bailing out of the QP if it is put into error state.
Fixes: 70dcb2e3dc6a ("IB/hfi1: Add the TID second leg send packet builder") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Several fixes, add more reviewers to the list"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: Honour 'may_reduce_num' in vring_create_virtqueue
MAiNTAINERS: add Paolo, Stefan for virtio blk/scsi
virtio_pci: fix a NULL pointer reference in vp_del_vqs
Jani Nikula [Fri, 5 Apr 2019 07:52:20 +0000 (10:52 +0300)]
drm/i915/dp: revert back to max link rate and lane count on eDP
Commit 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast
and narrow") started to optize the eDP 1.4+ link config, both per spec
and as preparation for display stream compression support.
Sadly, we again face panels that flat out fail with parameters they
claim to support. Revert, and go back to the drawing board.
v2: Actually revert to max params instead of just wide-and-slow.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109959 Fixes: 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast and narrow") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: "Lee, Shawn C" <shawn.c.lee@intel.com> Cc: Dave Airlie <airlied@gmail.com> Cc: intel-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.0+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> Tested-by: Albert Astals Cid <aacid@kde.org> # v5.0 backport Tested-by: Emanuele Panigati <ilpanich@gmail.com> # v5.0 backport Tested-by: Matteo Iervasi <matteoiervasi@gmail.com> # v5.0 backport Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190405075220.9815-1-jani.nikula@intel.com
(cherry picked from commit f11cb1c19ad0563b3c1ea5eb16a6bac0e401f428) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Ming Lei [Mon, 8 Apr 2019 22:31:22 +0000 (06:31 +0800)]
nvme: cancel request synchronously
nvme_cancel_request() is used in error handler, and it is always
reliable to cancel request synchronously, and avoids possible race
in which request may be completed after real hw queue is destroyed.
One issue is reported by our customer on NVMe RDMA, in which freed ib
queue pair may be used in nvme_rdma_complete_rq().
Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Bart Van Assche <bvanassche@acm.org> Cc: James Smart <james.smart@broadcom.com> Cc: linux-nvme@lists.infradead.org Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Mon, 8 Apr 2019 22:31:21 +0000 (06:31 +0800)]
blk-mq: introduce blk_mq_complete_request_sync()
In NVMe's error handler, follows the typical steps of tearing down
hardware for recovering controller:
1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
mark the request as abort
blk_mq_complete_request(req);
4) destroy real hw queues
However, there may be race between #3 and #4, because blk_mq_complete_request()
may run q->mq_ops->complete(rq) remotelly and asynchronously, and
->complete(rq) may be run after #4.
This patch introduces blk_mq_complete_request_sync() for fixing the
above race.
Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Bart Van Assche <bvanassche@acm.org> Cc: James Smart <james.smart@broadcom.com> Cc: linux-nvme@lists.infradead.org Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dongli Zhang [Wed, 27 Mar 2019 10:36:35 +0000 (18:36 +0800)]
scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
When tag_set->nr_maps is 1, the block layer limits the number of hw queues
by nr_cpu_ids. No matter how many hw queues are used by virtio-scsi, as it
has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues.
In addition, specifically for pci scenario, when the 'num_queues' specified
by qemu is more than maxcpus, virtio-scsi would not be able to allocate
more than maxcpus vectors in order to have a vector for each queue. As a
result, it falls back into MSI-X with one vector for config and one shared
for queues.
Considering above reasons, this patch limits the number of hw queues used
by virtio-scsi by nr_cpu_ids.
Dongli Zhang [Wed, 27 Mar 2019 10:36:34 +0000 (18:36 +0800)]
virtio-blk: limit number of hw queues by nr_cpu_ids
When tag_set->nr_maps is 1, the block layer limits the number of hw queues
by nr_cpu_ids. No matter how many hw queues are used by virtio-blk, as it
has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues.
In addition, specifically for pci scenario, when the 'num-queues' specified
by qemu is more than maxcpus, virtio-blk would not be able to allocate more
than maxcpus vectors in order to have a vector for each queue. As a result,
it falls back into MSI-X with one vector for config and one shared for
queues.
Considering above reasons, this patch limits the number of hw queues used
by virtio-blk by nr_cpu_ids.
Paolo Valente [Wed, 10 Apr 2019 08:38:33 +0000 (10:38 +0200)]
block, bfq: fix use after free in bfq_bfqq_expire
The function bfq_bfqq_expire() invokes the function
__bfq_bfqq_expire(), and the latter may free the in-service bfq-queue.
If this happens, then no other instruction of bfq_bfqq_expire() must
be executed, or a use-after-free will occur.
Basing on the assumption that __bfq_bfqq_expire() invokes
bfq_put_queue() on the in-service bfq-queue exactly once, the queue is
assumed to be freed if its refcounter is equal to one right before
invoking __bfq_bfqq_expire().
But, since commit 9dee8b3b057e ("block, bfq: fix queue removal from
weights tree") this assumption is false. __bfq_bfqq_expire() may also
invoke bfq_weights_tree_remove() and, since commit 9dee8b3b057e
("block, bfq: fix queue removal from weights tree"), also
the latter function may invoke bfq_put_queue(). So __bfq_bfqq_expire()
may invoke bfq_put_queue() twice, and this is the actual case where
the in-service queue may happen to be freed.
To address this issue, this commit moves the check on the refcounter
of the queue right around the last bfq_put_queue() that may be invoked
on the queue.
Fixes: 9dee8b3b057e ("block, bfq: fix queue removal from weights tree") Reported-by: Dmitrii Tcvetkov <demfloro@demfloro.ru> Reported-by: Douglas Anderson <dianders@chromium.org> Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru> Tested-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
snd_hdac_display_power() doesn't handle the concurrent calls carefully
enough, and it may lead to the doubly get_power or put_power calls,
when a runtime PM and an async work get called in racy way.
This patch addresses it by reusing the bus->lock mutex that has been
used for protecting the link state change in ext bus code, so that it
can protect against racy display state changes. The initialization of
bus->lock was moved from snd_hdac_ext_bus_init() to
snd_hdac_bus_init() as well accordingly.
Testcase: igt/i915_pm_rpm/module-reload #glk-dsi Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Andrei Vagin [Mon, 8 Apr 2019 04:15:42 +0000 (21:15 -0700)]
alarmtimer: Return correct remaining time
To calculate a remaining time, it's required to subtract the current time
from the expiration time. In alarm_timer_remaining() the arguments of
ktime_sub are swapped.
Fixes: d653d8457c76 ("alarmtimer: Implement remaining callback") Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190408041542.26338-1-avagin@gmail.com
locking/lockdep: Zap lock classes even with lock debugging disabled
The following commit:
a0b0fd53e1e6 ("locking/lockdep: Free lock classes that are no longer in use")
changed the behavior of lockdep_free_key_range() from
unconditionally zapping lock classes into only zapping lock classes if
debug_lock == true. Not zapping lock classes if debug_lock == false leaves
dangling pointers in several lockdep datastructures, e.g. lock_class::name
in the all_lock_classes list.
The shell command "cat /proc/lockdep" causes the kernel to iterate the
all_lock_classes list. Hence the "unable to handle kernel paging request" cash
that Shenghui encountered by running cat /proc/lockdep.
Since the new behavior can cause cat /proc/lockdep to crash, restore the
pre-v5.1 behavior.
This patch avoids that cat /proc/lockdep triggers the following crash
with debug_lock == false:
ASoC: pcm: fix error handling when try_module_get() fails.
Handle error before returning when try_module_get() fails
to prevent inconsistent mutex lock/unlock.
Fixes: 52034add7 (ASoC: pcm: update module refcount if
module_get_upon_open is set) Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
apparmor: Restore Y/N in /sys for apparmor's "enabled"
Before commit c5459b829b71 ("LSM: Plumb visibility into optional "enabled"
state"), /sys/module/apparmor/parameters/enabled would show "Y" or "N"
since it was using the "bool" handler. After being changed to "int",
this switched to "1" or "0", breaking the userspace AppArmor detection
of dbus-broker. This restores the Y/N output while keeping the LSM
infrastructure happy.
When master clock is used, master clock rate is set exclusively.
Parent clocks of master clock cannot be changed after a call to
clk_set_rate_exclusive(). So the parent clock of SAI kernel clock
must be set before.
Ensure also that exclusive rate operations are balanced
in STM32 SAI driver.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com> Signed-off-by: Mark Brown <broonie@kernel.org>
Lendacky, Thomas [Tue, 2 Apr 2019 15:21:18 +0000 (15:21 +0000)]
x86/perf/amd: Remove need to check "running" bit in NMI handler
Spurious interrupt support was added to perf in the following commit, almost
a decade ago:
63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
The two previous patches (resolving the race condition when disabling a
PMC and NMI latency mitigation) allow for the removal of this older
spurious interrupt support.
Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
is cleared before disabling the PMC, which sets up a race condition. This
race condition was mitigated by introducing the running bitmap. That race
condition can be eliminated by first disabling the PMC, waiting for PMC
reset on overflow and then clearing the bit for the PMC in the active_mask
bitmap. The NMI handler will not re-enable a disabled counter.
If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
mitigation support will guard against any unhandled NMI messages.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
Michael Ellerman [Tue, 9 Apr 2019 05:43:11 +0000 (15:43 +1000)]
powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs
The recent commit 8bc086899816 ("powerpc/mm: Only define
MAX_PHYSMEM_BITS in SPARSEMEM configurations") removed our definition
of MAX_PHYSMEM_BITS when SPARSEMEM is disabled.
This inadvertently broke some 64-bit FLATMEM using configs with eg:
arch/powerpc/include/asm/book3s/64/mmu-hash.h:584:6: error: "MAX_PHYSMEM_BITS" is not defined, evaluates to 0
#if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
^~~~~~~~~~~~~~~~
Fix it by making sure we define MAX_PHYSMEM_BITS for all 64-bit
configs regardless of SPARSEMEM.
Fixes: 8bc086899816 ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations") Reported-by: Andreas Schwab <schwab@linux-m68k.org> Reported-by: Hugh Dickins <hughd@google.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>