Santosh Shilimkar [Mon, 10 Aug 2015 20:46:40 +0000 (13:46 -0700)]
Merge branch 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek:
efi: Disable secure boot if shim is in insecure mode
hibernate: Disable in a signed modules environment
efi: Add EFI_SECURE_BOOT bit
Add option to automatically set securelevel when in Secure Boot mode
asus-wmi: Restrict debugfs interface when securelevel is set
x86: Restrict MSR access when securelevel is set
uswsusp: Disable when securelevel is set
kexec: Disable at runtime if securelevel has been set.
acpi: Ignore acpi_rsdp kernel parameter when securelevel is set
acpi: Limit access to custom_method if securelevel is set
Restrict /dev/mem and /dev/kmem when securelevel is set.
x86: Lock down IO port access when securelevel is enabled
PCI: Lock down BAR access when securelevel is enabled
Enforce module signatures when securelevel is greater than 0
Add BSD-style securelevel support
MODSIGN: Support not importing certs from db
MODSIGN: Import certificates from UEFI Secure Boot
MODSIGN: Add module certificate blacklist keyring
Add an EFI signature blob parser and key loader.
Add EFI signature data types
A user can manually tell the shim boot loader to disable validation of
images it loads. When a user does this, it creates a UEFI variable called
MokSBState that does not have the runtime attribute set. Given that the
user explicitly disabled validation, we can honor that and not enable
secure boot mode if that variable is set.
There is currently no way to verify the resume image when returning
from hibernate. This might compromise the signed modules trust model,
so until we can work with signed hibernate images we disable it in
a secure modules environment.
UEFI Secure Boot provides a mechanism for ensuring that the firmware will
only load signed bootloaders and kernels. Certain use cases may also
require that the kernel prevent userspace from inserting untrusted kernel
code at runtime. Add a configuration option that enforces this automatically
when enabled.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
We have no way of validating what all of the Asus WMI methods do on a
given machine, and there's a risk that some will allow hardware state to
be manipulated in such a way that arbitrary code can be executed in the
kernel. Prevent that if securelevel is set.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Permitting write access to MSRs allows userspace to modify the running
kernel. Prevent this if securelevel has been set. Based on a patch by Kees
Cook.
uswsusp allows a user process to dump and then restore kernel state, which
makes it possible to modify the running kernel. Disable this if securelevel
has been set.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
kexec permits the loading and execution of arbitrary code in ring 0, which
permits the modification of the running kernel. Prevent this if securelevel
has been set.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
This option allows userspace to pass the RSDP address to the kernel, which
makes it possible for a user to execute arbitrary code in the kernel.
Disable this when securelevel is set.
custom_method effectively allows arbitrary access to system memory, making
it possible for an attacker to modify the kernel at runtime. Prevent this
if securelevel has been set.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Allowing users to write to address space provides mechanisms that may permit
modification of the kernel at runtime. Prevent this if securelevel has been
set.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
IO port access would permit users to gain access to PCI configuration
registers, which in turn (on a lot of hardware) give access to MMIO register
space. This would potentially permit root to trigger arbitrary DMA, so lock
it down when securelevel is set.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Any hardware that can potentially generate DMA has to be locked down from
userspace in order to avoid it being possible for an attacker to modify
kernel code. This should be prevented if securelevel has been set. Default
to paranoid - in future we can potentially relax this for sufficiently
IOMMU-isolated devices.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
If a user tells shim to not use the certs/hashes in the UEFI db variable
for verification purposes, shim will set a UEFI variable called MokIgnoreDB.
Have the uefi import code look for this and not import things from the db
variable.
Secure Boot stores a list of allowed certificates in the 'db' variable.
This imports those certificates into the module signing keyring. This
allows for a third party signing certificate to be used in conjunction
with signed modules. By importing the public certificate into the 'db'
variable, a user can allow a module signed with that certificate to
load. The shim UEFI bootloader has a similar certificate list stored
in the 'MokListRT' variable. We import those as well.
In the opposite case, Secure Boot maintains a list of disallowed
certificates in the 'dbx' variable. We load those certificates into
the newly introduced module blacklist keyring and forbid any module
signed with those from loading.
Signed-off-by: Josh Boyer <jwboyer at fedoraproject.org> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
This adds an additional keyring that is used to store certificates that
are blacklisted. This keyring is searched first when loading signed modules
and if the module's certificate is found, it will refuse to load. This is
useful in cases where third party certificates are used for module signing.
Keith Busch [Thu, 6 Aug 2015 18:13:50 +0000 (11:13 -0700)]
NVMe: Fix filesystem deadlock on removal
Move gendisk deletion before controller shutdown so filesystem may sync
dirty pages. Before, this would deadlock trying to allocate requests
on frozen queues that are about to be deleted.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts:
Keith Busch [Thu, 6 Aug 2015 18:07:24 +0000 (11:07 -0700)]
NVMe: Failed controller initialization fixes
This fixes an infinite device reset loop that may occur on devices that
fail initialization. If the drive fails to become ready for any reason
that does not involve an admin command timeout, the probe task should
assume the drive is unavailable and remove it from the topology. In
the case an admin command times out during device probing, the driver's
existing reset action will handle removing the drive.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit de3eff2bad56f0a29d3915105223d368f2bbc94e)
Santosh Shilimkar [Thu, 6 Aug 2015 21:48:39 +0000 (14:48 -0700)]
NVMe: Unify controller probe and resume
This unifies probe and resume so they both may be scheduled in the same
way. This is necessary for error handling that may occur during device
initialization since the task to cleanup the device wouldn't be able to
run if it is blocked on device initialization.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts:
Keith Busch [Thu, 6 Aug 2015 21:43:00 +0000 (14:43 -0700)]
NVMe: Automatic namespace rescan
Namespaces may be dynamically allocated and deleted or attached and
detached. This has the driver rescan the device for namespace changes
after each device reset or namespace change asynchronous event.
There could potentially be many detached namespaces that we don't want
polluting /dev/ with unusable block handles, so this will delete disks
if the namespace is not active as indicated by the response from identify
namespace. This also skips adding the disk if no capacity is provisioned
to the namespace in the first place.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Keith Busch [Thu, 6 Aug 2015 17:37:53 +0000 (10:37 -0700)]
NVMe: Fix device cleanup on initialization failure
Don't release block queue and tagging resoureces if the driver never
got them in the first place. This can happen if the controller fails to
become ready, if memory wasn't available to allocate a tagset or admin
queue, or if the resources were released as part of error recovery.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4af0e21caf8e676e57ea27d7cce3426e473e498c)
This adds a sysfs entry that when written to will reset perform an NVMe
controller reset if the controller was successfully initialized in the
first place.
This also adds locking around resetting the device in the async probe
method so the driver can't schedule two resets.
Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Brandon Schultz <brandon.schulz@hgst.com> Cc: David Sariel <david.sariel@pmcs.com>
Updated by Jens to:
1) Merge this with the ioctl reset patch from David Sariel. The ioctl
path now shares the reset code from the sysfs path.
Santosh Shilimkar [Tue, 4 Aug 2015 20:31:18 +0000 (13:31 -0700)]
Merge branch 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek: (195 commits)
dtrace: accomodate changes in the 4.1 kernel for sparc64
dtrace: implement dtrace_handle_badaddr() for x86
dtrace: ignore any and all PFs during NOFAULT memory acceses
dtrace: do not allocate space for trampolines when probec = 0
dtrace: convert from sdt_instr_t to asm_instr_t 2of2
dtrace: convert from sdt_instr_t to asm_instr_t 1of2
dtrace: allocate space for SDT trampolines using module_alloc
dtrace: accomodate changes in the 4.1 kernels
kallsyms: fix /proc/kallmodsyms to not be misled by const variables
kallsyms: fix /proc/kallmodsyms to not be misled by external symbols
wait: change waitfd() to use wait4(), not waitid(); reduce invasiveness
dtrace: use a nonzero reference count on the fake module
dtrace: percpu: move from __get_cpu_var() to this_cpu_ptr()
dtrace: x86: Cater for new instruction size limit in instruction decoder
mm: memcontrol: adjust prototype to allow for poll_wait_fixed() changes.
dtrace: zero-initialize the fake vmlinux module's pdata space
dtrace: remove obsolete function
dtrace: make it possible to call do_sigaltstack()
dtrace: do not vmalloc/vfree from probe context
dtrace: fix dtrace_sdt.sh for UEK4
...
Santosh Shilimkar [Tue, 4 Aug 2015 20:30:31 +0000 (13:30 -0700)]
Merge branch 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek: (272 commits)
Revert "block: loop: convert to per-device workqueue"
Revert "block: loop: avoiding too many pending per work I/O"
Linux 4.1.4
x86/mpx: Do not set ->vm_ops on MPX VMAs
mm: avoid setting up anonymous pages into file mapping
Fix firmware loader uevent buffer NULL pointer dereference
hpfs: hpfs_error: Remove static buffer, use vsprintf extension %pV instead
hpfs: kstrdup() out of memory handling
ARM: 8397/1: fix vdsomunge not to depend on glibc specific error.h
ARM: 8393/1: smp: Fix suspicious RCU usage with ipi tracepoints
perf bench numa: Fix to show proper convergence stats
arm64: Don't report clear pmds and puds as huge
arm64: bpf: fix endianness conversion bugs
arm64: bpf: fix out-of-bounds read in bpf2a64_offset()
ARM64: smp: Fix suspicious RCU usage with ipi tracepoints
p9_client_write(): avoid double p9_free_req()
EDAC, octeon: Fix broken build due to model helper renames
ARM: dove: fix legacy dove IRQ numbers
agp/intel: Fix typo in needs_ilk_vtd_wa()
rbd: use GFP_NOIO in rbd_obj_request_create()
...
MPX setups private anonymous mapping, but uses vma->vm_ops too.
This can confuse core VM, as it relies on vm->vm_ops to
distinguish file VMAs from anonymous.
As result we will get SIGBUS, because handle_pte_fault() thinks
it's file VMA without vm_ops->fault and it doesn't know how to
handle the situation properly.
Let's fix that by not setting ->vm_ops.
We don't really need ->vm_ops here: MPX VMA can be detected with
VM_MPX flag. And vma_merge() will not merge MPX VMA with non-MPX
VMA, because ->vm_flags won't match.
The only thing left is name of VMA. I'm not sure if it's part of
ABI, or we can just drop it. The patch keep it by providing
arch_vma_name() on x86.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@sr71.net Link: http://lkml.kernel.org/r/20150720212958.305CC3E9@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
on ->mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().
Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
shared.
For file mappings without vm_ops->fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.
The firmware class uevent function accessed the "fw_priv->buf" buffer
without the proper locking and testing for NULL. This is an old bug
(looks like it goes back to 2012 and commit 1244691c73b2: "firmware
loader: introduce firmware_buf"), but for some reason it's triggering
only now in 4.2-rc1.
Shuah Khan is trying to bisect what it is that causes this to trigger
more easily, but in the meantime let's just fix the bug since others are
hitting it too (at least Ingo reports having seen it as well).
Reported-and-tested-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the host toolchain is not glibc based then the arm kernel build
fails with
arch/arm/vdso/vdsomunge.c:53:19: fatal error: error.h: No such file or directory
error.h is a glibc only header (ie not available in musl, newlib and
bsd libcs). Changed the error reporting to standard conforming code
to avoid depending on specific C implementations.
Signed-off-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Fixes: 8512287a8165 ("ARM: 8330/1: add VDSO user-space code") Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.
At this point in the IPI handling path we haven't called
irq_enter() yet, so RCU doesn't know that we're about to exit
idle and properly warns that we're using RCU from an idle CPU.
Use trace_ipi_entry_rcuidle() instead of trace_ipi_entry() so
that RCU is informed about our exit from idle.
Fixes: 365ec7b17327 ("ARM: add IPI tracepoints") Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With commit: e1e455f4f4d3 (perf tools: Work around lack of sched_getcpu
in glibc < 2.6), perf_bench numa mem with -c or -m option is not able to
correctly calculate convergence.
With the above commit, sched_getcpu always seems to return -1. The
intention of commit e1e455f was to add a sched_getcpu in glibc < 2.6.
Hence keep the sched_getcpu definition under an ifdef.
This regression happened occurred between v4.0 and v4.1
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vinson Lee <vlee@twitter.com> Fixes: e1e455f4f4d3 ("perf tools: Work around lack of sched_getcpu in glibc < 2.6") Link: http://lkml.kernel.org/r/20150624111004.GA5220@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current pmd_huge() and pud_huge() functions simply check if the table
bit is not set and reports the entries as huge in that case. This is
counter-intuitive as a clear pmd/pud cannot also be a huge pmd/pud, and
it is inconsistent with at least arm and x86.
To prevent others from making the same mistake as me in looking at code
that calls these functions and to fix an issue with KVM on arm64 that
causes memory corruption due to incorrect page reference counting
resulting from this mistake, let's change the behavior.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Steve Capper <steve.capper@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Fixes: 084bd29810a5 ("ARM64: mm: HugeTLB support.") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Problems occur when bpf_to or bpf_from has value prog->len - 1 (e.g.,
"Very long jump backwards" in test_bpf where the last instruction is a
jump): since ctx->offset has length prog->len, ctx->offset[bpf_to + 1]
or ctx->offset[bpf_from + 1] will cause an out-of-bounds read, leading
to a bogus jump offset and kernel panic.
This patch moves updating ctx->offset to after calling build_insn(),
and changes indexing to use bpf_to and bpf_from without + 1.
Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Cc: Zi Shen Lim <zlim.lnx@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
John Stultz reported an RCU splat on ARM with ipi trace events
enabled. It looks like the same problem exists on ARM64.
At this point in the IPI handling path we haven't called
irq_enter() yet, so RCU doesn't know that we're about to exit
idle and properly warns that we're using RCU from an idle CPU.
Use trace_ipi_entry_rcuidle() instead of trace_ipi_entry() so
that RCU is informed about our exit from idle.
Cc: John Stultz <john.stultz@linaro.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Fixes: 45ed695ac10a ("ARM64: add IPI tracepoints") Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
v3.18 changed handle_IRQ() to call __handle_domain_irq(), which now
rejects attempts to deliver IRQ0. Since IRQ 0 is used as the timer
interrupt (just like the PIT on x86), this causes boot to fail as the
bogomips calibration never completes.
Fix this by shuffling all interrupts up by one.
Fixes: a71b092a9c68 ("ARM: Convert handle_IRQ to use __handle_domain_irq") Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In needs_ilk_vtd_wa(), we pass in the GPU device but compared it against
the ids for the mobile GPU and the mobile host bridge. That latter is
impossible and so likely was just a typo for the desktop GPU device id
(which is also buggy).
drm/i915: Disable WC PTE updates to w/a buggy IOMMU on ILK
Reported-by: Ting-Wei Lan <lantw44@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91127
References: https://bugzilla.freedesktop.org/show_bug.cgi?id=60391 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rbd_obj_request_create() is called on the main I/O path, so we need to
use GFP_NOIO to make sure allocation doesn't blow back on us. Not all
callers need this, but I'm still hardcoding the flag inside rather than
making it a parameter because a) this is going to stable, and b) those
callers shouldn't really use rbd_obj_request_create() and will be fixed
in the future.
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A ds can be associated with more than one mirror, but we currently skip
setting a mirror's credentials if we find that it's already set up with
a connected client.
The upshot is that we can end up sending DS writes with MDS credentials
instead of properly setting them up. Fix nfs4_ff_layout_prepare_ds to
always verify that the mirror's credentials are set up, even when we
have a DS that's already connected.
Reported-by: Tom Haynes <thomas.haynes@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we have two tasks racing to update a mirror's credentials, then they
can end up leaking one (or more) sets of credentials. The first task
will set mirror->cred and then the second task will just overwrite it.
Use a cmpxchg to ensure that the creds are only set once. If we get to
the point where we would set mirror->cred and find that they're already
set, then we just release the creds that were just found.
If a write attempt fails, and the write is queued up for resending to
the server, as opposed to being dropped, then we need to set the
appropriate flag so that nfs_file_fsync() does the right thing.
Problem: When an operation like WRITE receives a BAD_STATEID, even though
recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
the open state, because of clearing delegation state for the associated
inode, nfs_inode_find_state_and_recover() gets called and it makes the
same state with RECLAIM_NOGRACE flag again. As a results, when we restart
looking over the open states, we end up in the infinite loop instead of
breaking out in the next test of state flags.
Solution: unset the RECLAIM_NOGRACE set because of
calling of nfs_inode_find_state_and_recover() after returning from calling
recover_open() function.
When encoding the NFSACL SETACL operation, reserve just the estimated
size of the ACL rather than a fixed maximum. This eliminates needless
zero padding on the wire that the server ignores.
Fixes: ee5dc7732bd5 ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pnfs_do_write() expects the call to pnfs_write_through_mds() to free the
pgio header and to release the layout segment before exiting. The problem
is that nfs_pgio_data_destroy() doesn't actually do this; it only frees
the memory allocated by nfs_generic_pgio().
Ditto for pnfs_do_read()...
Fix in both cases is to add a call to hdr->release(hdr).
Since the parent rate has been recalculated, pixel RCG clock
should rely on it to find the correct M/N values during set_rate,
instead of calling __clk_round_rate() to its parent again.
Signed-off-by: Hai Li <hali@codeaurora.org> Tested-by: Archit Taneja <architt@codeaurora.org> Fixes: 99cbd064b059 ("clk: qcom: Support display RCG clocks")
[sboyd@codeaurora.org: Silenced unused parent variable warning] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
of_clk_get_from_provider() returns ERR_PTR on failure. The
dra7-atl-clock driver was not checking its return value and
immediately used it in __clk_get_hw(). __clk_get_hw()
dereferences supplied clock, if it is not NULL, so in that case
it would dereference an ERR_PTR.
Fixes: 9ac33b0ce81f ("CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
key/value pairs in a JSON object must be separated by a comma.
After adding the properties "accuracy" and "phase" the JSON output
of /sys/kernel/debug/clk/clk_dump is invalid.
So add the missing commas to fix it.
Fixes: 5279fc402ae5 ("clk: add clk accuracy retrieval support") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
[sboyd@codeaurora.org: Added comment in function] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/leds/leds-gpio.c: In function ‘gpio_leds_create’:
drivers/leds/leds-gpio.c:187: error: implicit declaration of function ‘devm_get_gpiod_from_child’
drivers/leds/leds-gpio.c:187: warning: assignment makes pointer from integer without a cast
Add dummies for fwnode_get_named_gpiod() and devm_get_gpiod_from_child()
for the !GPIOLIB case to fix this.
The omap watchdog has the annoying behaviour that writes to most
registers don't have any effect when the watchdog is already running.
Quoting the AM335x reference manual:
To modify the timer counter value (the WDT_WCRR register),
prescaler ratio (the WDT_WCLR[4:2] PTV bit field), delay
configuration value (the WDT_WDLY[31:0] DLY_VALUE bit field), or
the load value (the WDT_WLDR[31:0] TIMER_LOAD bit field), the
watchdog timer must be disabled by using the start/stop sequence
(the WDT_WSPR register).
Currently the timer is stopped in the .probe callback but still there
are possibilities that yield to a situation where omap_wdt_start is
entered with the timer running (e.g. when /dev/watchdog is closed
without stopping and then reopened). In such a case programming the
timeout silently fails!
To circumvent this stop the timer before reprogramming.
Assuming one of the first things the watchdog user does is setting the
timeout explicitly nothing too bad should happen because this explicit
setting works fine.
If jffs2 can deadlock on overlayfs readdir because it takes the same lock
on ->iterate() as in ->lookup().
Fix by moving whiteout checking outside iterate_dir(). Optimized by
collecting potential whiteouts (DT_CHR) in a temporary list and if
non-empty iterating throug these and checking for a 0/0 chardev.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Fixes: 49c21e1cacd7 ("ovl: check whiteout while reading directory") Reported-by: Roman Yeryomin <leroi.lists@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit fix kernel crash when probing for rfkill devices in dell-laptop
driver failed. Function free_page() was incorrectly used on struct page *
instead of virtual address of SMI buffer.
This commit also simplify allocating page for SMI buffer by using
__get_free_page() function instead of sequential call of functions
alloc_page() and page_address().
Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When kzalloc() is called under spin_lock(), GFP_ATOMIC should be
used to avoid sleeping allocation.
The call tree is:
of_pci_range_to_resource()
--> pci_register_io_range() <-- takes spin_lock(&io_range_lock);
--> kzalloc()
Signed-off-by: Jingoo Han <jingoohan1@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a several year old regression that I found while trying
to get the Yoga 3 11 to work. The ideapad_rfk_set function is meant
to send a command to the embedded controller through ACPI, but
as of c1f73658ed, it sends the index of the rfkill device instead
of the command, and ignores the opcode field.
This changes it back to the original behavior, which indeed
flips the rfkill state as seen in the debugfs interface.
Whilst testing cpu hotplug events on kernel configured with
DEBUG_PREEMPT and DEBUG_ATOMIC_SLEEP we get following BUG message,
caused by calling request_irq() and free_irq() in the context of
hotplug notification (which is in this case atomic context).
[ 40.785859] CPU1: Software reset
[ 40.786660] BUG: sleeping function called from invalid context at mm/slub.c:1241
[ 40.786668] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/1
[ 40.786678] Preemption disabled at:[< (null)>] (null)
[ 40.786681]
[ 40.786692] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc4-00024-g7dca860 #36
[ 40.786698] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 40.786728] [<c0014a00>] (unwind_backtrace) from [<c0011980>] (show_stack+0x10/0x14)
[ 40.786747] [<c0011980>] (show_stack) from [<c0449ba0>] (dump_stack+0x70/0xbc)
[ 40.786767] [<c0449ba0>] (dump_stack) from [<c00c6124>] (kmem_cache_alloc+0xd8/0x170)
[ 40.786785] [<c00c6124>] (kmem_cache_alloc) from [<c005d6f8>] (request_threaded_irq+0x64/0x128)
[ 40.786804] [<c005d6f8>] (request_threaded_irq) from [<c0350b8c>] (exynos4_local_timer_setup+0xc0/0x13c)
[ 40.786820] [<c0350b8c>] (exynos4_local_timer_setup) from [<c0350ca8>] (exynos4_mct_cpu_notify+0x30/0xa8)
[ 40.786838] [<c0350ca8>] (exynos4_mct_cpu_notify) from [<c003b330>] (notifier_call_chain+0x44/0x84)
[ 40.786857] [<c003b330>] (notifier_call_chain) from [<c0022fd4>] (__cpu_notify+0x28/0x44)
[ 40.786873] [<c0022fd4>] (__cpu_notify) from [<c0013714>] (secondary_start_kernel+0xec/0x150)
[ 40.786886] [<c0013714>] (secondary_start_kernel) from [<40008764>] (0x40008764)
Interrupts cannot be requested/freed in the CPU_STARTING/CPU_DYING
notifications which run on the hotplugged cpu with interrupts and
preemption disabled.
To avoid the issue, request the interrupts for all possible cpus in
the boot code. The interrupts are marked NO_AUTOENABLE to avoid a racy
request_irq/disable_irq() sequence. The flag prevents the
request_irq() code from enabling the interrupt immediately.
The interrupt is then enabled in the CPU_STARTING notifier of the
hotplugged cpu and again disabled with disable_irq_nosync() in the
CPU_DYING notifier.
[ tglx: Massaged changelog to match the patch ]
Fixes: 7114cd749a12 ("clocksource: exynos_mct: use (request/free)_irq calls for local timer registration") Reported-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Tested-by: Marcin Jabrzyk <m.jabrzyk@samsung.com> Signed-off-by: Damian Eppel <d.eppel@samsung.com> Cc: m.szyprowski@samsung.com Cc: kyungmin.park@samsung.com Cc: daniel.lezcano@linaro.org Cc: kgene@kernel.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1435324984-7328-1-git-send-email-d.eppel@samsung.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the VLAN_HLEN was added to the calculation for the maximum frame size
there seems to have been a number of issues added to the driver.
The first issue is that in some cases the maximum frame size for a device
never really reached the actual maximum frame size as the VLAN header
length was not included the calculation for that value. As a result some
parts only supported a maximum frame size of either 1496 in the case of
parts that didn't support jumbo frames, and 8996 in the case of the parts
that do.
The second issue is the fact that there were several checks that weren't
updated so as a result setting an MTU of 1500 was treated as enabling jumbo
frames as the calculated value was 1522 instead of 1518. I have addressed
those by replacing ETH_FRAME_LEN with VLAN_ETH_FRAME_LEN where appropriate.
The final issue was the fact that lowering the MTU below 1500 would cause
the driver to allocate 2K buffers for the rings. This is an old issue that
was fixed several years ago in igb/ixgbe and I am addressing now by just
replacing == with a <= so that we always just round up to 1522 for anything
that isn't a jumbo frame.
Fixes: c751a3d58cf2d ("e1000e: Correctly include VLAN_HLEN when changing interface MTU") Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was a possible race between
ieee80211_reconfig() and
ieee80211_delayed_tailroom_dec(). This could
result in inability to transmit data if driver
crashed during roaming or rekeying and subsequent
skbs with insufficient tailroom appeared.
This race was probably never seen in the wild
because a device driver would have to crash AND
recover within 0.5s which is very unlikely.
I was able to prove this race exists after
changing the delay to 10s locally and crashing
ath10k via debugfs immediately after GTK
rekeying. In case of ath10k the counter went below
0. This was harmless but other drivers which
actually require tailroom (e.g. for WEP ICV or
MMIC) could end up with the counter at 0 instead
of >0 and introduce insufficient skb tailroom
failures because mac80211 would not resize skbs
appropriately anymore.
Fixes: 8d1f7ecd2af5 ("mac80211: defer tailroom counter manipulation when roaming") Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was possible for mac80211 to be coerced into an
unexpected flow causing sdata union to become
corrupted. Station pointer was put into
sdata->u.vlan.sta memory location while it was
really master AP's sdata->u.ap.next_beacon. This
led to station entry being later freed as
next_beacon before __sta_info_flush() in
ieee80211_stop_ap() and a subsequent invalid
pointer dereference crash.
The problem was that ieee80211_ptr->use_4addr
wasn't cleared on interface type changes.
This could be reproduced with the following steps:
# host A and host B have just booted; no
# wpa_s/hostapd running; all vifs are down
host A> iw wlan0 set type station
host A> iw wlan0 set 4addr on
host A> printf 'interface=wlan0\nssid=4addrcrash\nchannel=1\nwds_sta=1' > /tmp/hconf
host A> hostapd -B /tmp/conf
host B> iw wlan0 set 4addr on
host B> ifconfig wlan0 up
host B> iw wlan0 connect -w hostAssid
host A> pkill hostapd
# host A crashed:
Note: This isn't a revert of f8cdddb8d61d
("cfg80211: check iface combinations only when
iface is running") as far as functionality is
considered because b6a550156bc ("cfg80211/mac80211:
move more combination checks to mac80211") moved
the logic somewhere else already.
Fixes: f8cdddb8d61d ("cfg80211: check iface combinations only when iface is running") Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b112889c5af8124 ("iwlwifi: mvm: add Aux ROC request/response flow")
added aux ROC flow in addition to the existing ROC flow. While doing
it, it moved the ROC reference release to a common work item, which
is being called for both the ROC and aux ROC flows.
This resulted in invalid reference accounting, as no reference was
taken in case of aux ROC, while a reference was released on completion.
Fix it by adding a reference for the aux ROC as well, and release
only the relevant references on completion (according to the set bits).
While at it, convert cancel_work_sync() to flush_work(), in order
to make sure the references are being cleaned properly.
The final version of commit 637241a900cb ("kmsg: honor dmesg_restrict
sysctl on /dev/kmsg") lost few hooks, as result security_syslog() are
processed incorrectly:
- open of /dev/kmsg checks syslog access permissions by using
check_syslog_permissions() where security_syslog() is not called if
dmesg_restrict is set.
- syslog syscall and /proc/kmsg calls do_syslog() where security_syslog
can be executed twice (inside check_syslog_permissions() and then
directly in do_syslog())
With this patch security_syslog() is called once only in all
syslog-related operations regardless of dmesg_restrict value.
Fixes: 637241a900cb ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bitmap_parselist("", &mask, nmaskbits) will erroneously set bit zero in
the mask. The same bug is visible in cpumask_parselist() since it is
layered on top of the bitmask code, e.g. if you boot with "isolcpus=",
you will actually end up with cpu zero isolated.
The bug was introduced in commit 4b060420a596 ("bitmap, irq: add
smp_affinity_list interface to /proc/irq") when bitmap_parselist() was
generalized to support userspace as well as kernelspace.
Fixes: 4b060420a596 ("bitmap, irq: add smp_affinity_list interface to /proc/irq") Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cleanup commit 73679e508201 ("compiler-intel.h: Remove duplicate
definition") removed the double definition of __memory_barrier()
intrinsics.
However, in doing so, it also removed the preceding #undef barrier by
accident, meaning, the actual barrier() macro from compiler-gcc.h with
inline asm is still in place as __GNUC__ is provided.
Subsequently, barrier() can never be defined as __memory_barrier() from
compiler.h since it already has a definition in place and if we trust
the comment in compiler-intel.h, ecc doesn't support gcc specific asm
statements.
I don't have an ecc at hand (unsure if that's still used in the field?)
and only found this by accident during code review, a revert of that
cleanup would be simplest option.
Fixes: 73679e508201 ("compiler-intel.h: Remove duplicate definition") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: mancha security <mancha1@zoho.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A 32-bit entry point to a DMI table says how many structures the table
contains. The SMBIOS specification explicitly says that end-of-table
markers should be ignored if they are not actually at the end of the
DMI table. So only honor the end-of-table marker for tables accessed
through 64-bit entry points, as they do not specify a structure count.
Fixes: fc43026278 ("dmi: add support for SMBIOS 3.0 64-bit entry point") Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Many harddisks (mostly WD ones) have firmware problems and take too
long, more than 10 seconds, to resume from suspend. And this often
exceeds the default DPM watchdog timeout (12 seconds), resulting in a
kernel panic out of sudden.
Since most distros just take the default as is, we should give a bit
more safer value. This patch increases the default value from 12
seconds to one minute, which has been confirmed to be long enough for
such problematic disks.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=91921 Fixes: 70fea60d888d (PM / Sleep: Detect device suspend/resume lockup and log event) Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The platform_sysrq_reset_seq code was intended as a way for an embedded
platform to provide its own sysrq sequence at compile time. After over two
years, nobody has started using it in an upstream kernel, and the platforms
that were interested in it have moved on to devicetree, which can be used
to configure the sequence without requiring kernel changes. The method is
also incompatible with the way that most architectures build support for
multiple platforms into a single kernel.
Now the code is producing warnings when built with gcc-5.1:
drivers/tty/sysrq.c: In function 'sysrq_init':
drivers/tty/sysrq.c:959:33: warning: array subscript is above array bounds [-Warray-bounds]
key = platform_sysrq_reset_seq[i];
We could fix this, but it seems unlikely that it will ever be used, so
let's just remove the code instead. We still have the option to pass the
sequence either in DT, using the kernel command line, or using the
/sys/module/sysrq/parameters/reset_seq file.
A reorganisation of the PD allocation and deallocation in commit 9ba1377daa ("RDMA/ocrdma: Move PD resource management to driver.")
introduced a double free on pd, as detected by static analysis by
smatch:
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:682 ocrdma_alloc_pd()
error: double free of 'pd'^
The original call to ocrdma_mbx_dealloc_pd() (which does not kfree
pd) was replaced with a call to _ocrdma_dealloc_pd() (which does
kfree pd). The kfree following this call causes the double free,
so just remove it to fix the problem.
Fixes: 9ba1377daa ("RDMA/ocrdma: Move PD resource management to driver.") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the final iteration of commit 245bd6f6af8a62a2 ("PM / clock_ops: Add
pm_clk_add_clk()"), a refcount increment was added by Grygorii Strashko.
However, the accompanying IS_ERR() check operates on the wrong clock
pointer, which is always zero at this point, i.e. not an error.
This may lead to a NULL pointer dereference later, when __clk_get()
tries to dereference an error pointer.
Check the passed clock pointer instead to fix this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Fixes: 245bd6f6af8a62a2 ("PM / clock_ops: Add pm_clk_add_clk()") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3a48edc4bd68 ("mmc: sdhci: Use mmc core regulator infrastucture")
changed the behavior for how to assign the ocr_avail mask for the mmc
host. More precisely it started to mask the bits instead of assigning
them.
Restore the behavior, but also make it clear that an OCR mask created
from an external regulator overrides the other ones. The OCR mask is
determined by one of the following with this priority:
1. Supported ranges of external regulator if one supplies VDD
2. Host OCR mask if set by the driver (based on DT properties)
3. The capabilities reported by the controller itself
Fixes: 3a48edc4bd68 ("mmc: sdhci: Use mmc core regulator infrastucture") Cc: Tim Kryger <tim.kryger@gmail.com> Reported-by: Yangbo Lu <yangbo.lu@freescale.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Tim Kryger <tim.kryger@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current handler of MMC_BLK_CMD_ERR in mmc_blk_issue_rw_rq function
may cause new coming request permanent missing when the ongoing
request (previoulsy started) complete end.
The problem scenario is as follows:
(1) Request A is ongoing;
(2) Request B arrived, and finally mmc_blk_issue_rw_rq() is called;
(3) Request A encounters the MMC_BLK_CMD_ERR error;
(4) In the error handling of MMC_BLK_CMD_ERR, suppose mmc_blk_cmd_err()
end request A completed and return zero. Continue the error handling,
suppose mmc_blk_reset() reset device success;
(5) Continue the execution, while loop completed because variable ret
is zero now;
(6) Finally, mmc_blk_issue_rw_rq() return without processing request B.
The process related to the missing request may wait that IO request
complete forever, possibly crashing the application or hanging the system.
Fix this issue by starting new request when reset success.
A configuration that enables earlycon but not the core console
code causes a link error:
drivers/built-in.o: In function `setup_earlycon':
drivers/tty/serial/earlycon.c:70: undefined reference to `uart_parse_earlycon'
That error can be triggered by the newly added samsung earlycon support,
which is missing a 'select' statement.
As suggested by Peter Hurley, solves the problem by moving the
'select SERIAL_EARLYCON' statement to the samsung console driver
option, as it is done by all other console drivers.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: b94ba0328d3b3 ("serial: samsung: Add support for early console") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zoltan Boszormenyi reported this regression:
"There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
1565:230e) network chip on the mainboard. After the r8169 driver loaded
the IRQs in the machine went berserk. Keyboard keypressed arrived with
considerable latency and duplicated, so no real work was possible.
The machine responded to the power button but didn't actually power
down. It just stuck at the powering down message. I had to press the
power button for 4 seconds to power it down.
The computer is a POS machine with a big battery inside. Because of this,
either ACPI or the Realtek chip kept the bad state and after rebooting,
the network chip didn't even show up in lspci. Not even the PXE ROM
announced itself during boot. I had to disconnect the battery to beat
some sense back to the computer.
The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
good."
The regression is caused by commit 593669c2ac0f (x86/PCI/ACPI: Use common
ACPI resource interfaces to simplify implementation). Since commit 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
first converting an ACPI resource to a 'struct resource' structure and
then applying checks against the converted resource structure. The 'start'
and 'end' fields in 'struct resource' are defined to be type of
resource_size_t, which may be 32 bits or 64 bits depending on
CONFIG_PHYS_ADDR_T_64BIT.
This may cause incorrect resource validation results with 32-bit kernels
because 64-bit ACPI resource descriptors may get truncated when converting
to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
affects PCI resource allocation subsystem and makes some PCI devices and
the system behave abnormally due to incorrect resource assignment.
So enhance the ACPI resource parsing interfaces to ignore ACPI resource
descriptors with address/offset above 4G when running in 32-bit mode.
With the fix applied, the behavior of the machine was restored to how
3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
and lspci -vvxxx shows that everything is at the same memory window as
they were with 3.18.16.
Reported-and-tested-by: Boszormenyi Zoltan <zboszor@pr.hu> Fixes: 593669c2ac0f (x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation) Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following commit temporarily disables correct 64-bit FADT addresses
favor during the period the root cause of the bug is not fixed:
Commit: 85dbd5801f62b66e2aa7826aaefcaebead44c8a6
ACPICA: Tables: Restore old behavor to favor 32-bit FADT addresses.
With enough protections, this patch re-enables 64-bit FADT addresses by
default. If regressions are reported against such change, this patch should
be bisected and reverted.
Note that 64-bit FACS favor and 64-bit firmware waking vector favor are
excluded by this commit in order not to break OSPMs. Lv Zheng.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021 Link: https://github.com/acpica/acpica/commit/4da56eea Reported-and-tested-by: Oswald Buddenhagen <ossi@kde.org> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds a new FACS initialization flag for acpi_tb_initialize().
acpi_enable_subsystem() might be invoked several times in OS bootup process,
and we don't want FACS initialization to be invoked twice. Lv Zheng.
Link: https://github.com/acpica/acpica/commit/90f5332a Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following commit is reported to have broken s2ram on some platforms:
Commit: 0249ed2444d65d65fc3f3f64f398f1ad0b7e54cd
ACPICA: Add option to favor 32-bit FADT addresses.
The platform reports 2 FACS tables (which is not allowed by ACPI
specification) and the new 32-bit address favor rule forces OSPMs to use
the FACS table reported via FADT's X_FIRMWARE_CTRL field.
The root cause of the reported bug might be one of the followings:
1. BIOS may favor the 64-bit firmware waking vector address when the
version of the FACS is greater than 0 and Linux currently only supports
resuming from the real mode, so the 64-bit firmware waking vector has
never been set and might be invalid to BIOS while the commit enables
higher version FACS.
2. BIOS may favor the FACS reported via the "FIRMWARE_CTRL" field in the
FADT while the commit doesn't set the firmware waking vector address of
the FACS reported by "FIRMWARE_CTRL", it only sets the firware waking
vector address of the FACS reported by "X_FIRMWARE_CTRL".
This patch excludes the cases that can trigger the bugs caused by the root
cause 2.
There is no handshaking mechanism can be used by OSPM to tell BIOS which
FACS is currently used. Thus the FACS reported by "FIRMWARE_CTRL" may still
be used by BIOS and the 0 value of the 32-bit firmware waking vector might
trigger such failure.
This patch tries to favor 32bit FACS address in another way where both the
FACS reported by "FIRMWARE_CTRL" and the FACS reported by "X_FIRMWARE_CTRL"
are loaded so that further commit can set firmware waking vector in the
both tables to ensure we can exclude the cases that trigger the bugs caused
by the root cause 2. The exclusion is split into 2 commits as this commit
is also useful for dumping more ACPI tables, it won't get reverted when
such exclusion is no longer necessary. Lv Zheng.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021 Link: https://github.com/acpica/acpica/commit/f7b86f35 Reported-and-tested-by: Oswald Buddenhagen <ossi@kde.org> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>