Currently the opal log is globally readable. It is kernel policy to
limit the visibility of physical addresses / kernel pointers to root.
Given this and the fact the opal log may contain this information it
would be better to limit the readability to root.
Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC
deny the use of BATS for mapping memory.
This patch makes sure that the specific wii RAM mapping function
takes it into account as well.
Fixes: de32400dd26e ("wii: use both mem1 and mem2 as ram") Cc: stable@vger.kernel.org Reviewed-by: Jonathan Neuschafer <j.neuschaefer@gmx.net> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the case when we're reusing a superblock, selinux_sb_clone_mnt_opts()
fails to set set_kern_flags, with the result that
nfs_clone_sb_security() incorrectly clears NFS_CAP_SECURITY_LABEL.
The result is that if you mount the same NFS filesystem twice, NFS
security labels are turned off, even if they would work fine if you
mounted the filesystem only once.
("fixes" may be not exactly the right tag, it may be more like
"fixed-other-cases-but-missed-this-one".)
Cc: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Fixes: 0b4d3452b8b4 "security/selinux: allow security_sb_clone_mnt_opts..." Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As does in __sctp_connect(), when checking addrs in a while loop, after
get the addr len according to sa_family, it's necessary to do the check
walk_size + af->sockaddr_len > addrs_size to make sure it won't access
an out-of-bounds addr.
The same thing is needed in selinux_sctp_bind_connect(), otherwise an
out-of-bounds issue can be triggered:
The jh pointer may be used uninitialized in the two cases below and the
compiler complain about it when enabling JBUFFER_TRACE macro, fix them.
In file included from fs/jbd2/transaction.c:19:0:
fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’:
./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized]
#define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
^
fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here
struct journal_head *jh;
^
In file included from fs/jbd2/transaction.c:19:0:
fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’:
./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized]
#define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
^
fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here
struct journal_head *jh;
^
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now, we capture a data corruption problem on ext4 while we're truncating
an extent index block. Imaging that if we are revoking a buffer which
has been journaled by the committing transaction, the buffer's jbddirty
flag will not be cleared in jbd2_journal_forget(), so the commit code
will set the buffer dirty flag again after refile the buffer.
Finally, if the freed extent index block was allocated again as data
block by some other files, it may corrupt the file data after writing
cached pages later, such as during unmount time. (In general,
clean_bdev_aliases() related helpers should be invoked after
re-allocation to prevent the above corruption, but unfortunately we
missed it when zeroout the head of extra extent blocks in
ext4_ext_handle_unwritten_extents()).
This patch mark buffer as freed and set j_next_transaction to the new
transaction when it already belongs to the committing transaction in
jbd2_journal_forget(), so that commit code knows it should clear dirty
bits when it is done with the buffer.
This problem can be reproduced by xfstests generic/455 easily with
seeds (3246 3247 3248 3249).
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two other drivers that bind to mrvl,mmp-uart and both of them
assume register shift of 2 bits. There are device trees that lack the
property and rely on that assumption.
If this driver wins the race to bind to those devices, it should behave
the same as the older deprecated driver.
If RX is disabled while there are still unprocessed bytes in RX FIFO,
cdns_uart_handle_rx() called from interrupt handler will get stuck in
the receive loop as read bytes will not get removed from the RX FIFO
and CDNS_UART_SR_RXEMPTY bit will never get set.
Avoid the stuck handler by checking first if RX is disabled. port->lock
protects against race with RX-disabling functions.
This HW behavior was mentioned by Nathan Rossi in 43e98facc4a3 ("tty:
xuartps: Fix RX hang, and TX corruption in termios call") which fixed a
similar issue in cdns_uart_set_termios().
The behavior can also be easily verified by e.g. setting
CDNS_UART_CR_RX_DIS at the beginning of cdns_uart_handle_rx() - the
following loop will then get stuck.
Resetting the FIFO using RXRST would not set RXEMPTY either so simply
issuing a reset after RX-disable would not work.
I observe this frequently on a ZynqMP board during heavy RX load at 1M
baudrate when the reader process exits and thus RX gets disabled.
Fixes: 61ec9016988f ("tty/serial: add support for Xilinx PS UART") Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BPF can adjust gso only for tcp bytestreams. Fail on other gso types.
But only on gso packets. It does not touch this field if !gso_size.
Fixes: b90efd225874 ("bpf: only adjust gso_size on bytestream protocols") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Legacy behaviour was to allow non-page-aligned mmap requests, as does the
linux mmap(2) implementation by virtue of automatically rounding up for
the caller.
To avoid breaking legacy userspace relax the newly introduced fix.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 5c4604e757ba ("drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set") Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Adam Zabrocki <adamza@microsoft.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.0+ Cc: Akash Goel <akash.goel@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190305110409.28633-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit a90e1948efb648f567444f87f3c19b2a0787affd) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch fixes the following checkpatch warning:
| Macro argument 'x' may be better as '(x)' to avoid precedence issues
Fixes: cbffaf7aa09e ("can: flexcan: Always use last mailbox for TX") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a PCA953x gpio was used as an interrupt and then released,
the shutdown function was trying to extract the pca953x_chip
pointer directly from the irq_data, but in reality was getting
the gpio_chip structure.
The net effect was that the subsequent writes to the data
structure corrupted data in the gpio_chip structure, which wasn't
immediately obvious until attempting to use the GPIO again in the
future, at which point the kernel panics.
This fix correctly extracts the pca953x_chip structure via the
gpio_chip structure, as is correctly done in the other irq
functions.
Fixes: 0a70fe00efea ("gpio: pca953x: Clear irq trigger type on irq shutdown") Cc: stable@vger.kernel.org Signed-off-by: Mark Walton <mark.walton@serialtek.com> Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the ov5640 specification (2.7 power up sequence), host can
access the sensor's registers 20ms after reset. Trying to access them
before leads to undefined behavior and result in sporadic initialization
errors.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Adam Ford <aford173@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the original code before 181bf1e815a2 the loop was continuing until
it finds the first matching superios[i].io and p->base.
But after 181bf1e815a2 the logic changed and the loop now returns the
pointer to the first mismatched array element which is then used in
get_superio_dma() and get_superio_irq() and thus returning the wrong
value.
Fix the condition so that it now returns the correct pointer.
Fixes: 181bf1e815a2 ("parport_pc: clean up the modified while loops using for") Cc: Alan Cox <alan@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: QiaoChong <qiaochong@loongson.cn>
[rewrite the commit message] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an output port driver is removed, also remove references to it from
any masters. Failing to do this causes a NULL ptr dereference when
configuring another output port:
With string type property entries we need to use
sizeof(const char *) instead of the number of characters as
the length of the entry.
If the string was shorter then sizeof(const char *),
attempts to read it would have failed with -EOVERFLOW. The
problem has been hidden because all build-in string
properties have had a string longer then 8 characters until
now.
Fixes: a85f42047533 ("device property: helper macros for property entry creation") Cc: 4.5+ <stable@vger.kernel.org> # 4.5+ Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This bug has apparently existed since the introduction of this function
in the pre-git era (4500e91754d3 in Thomas Gleixner's history.git,
"[NET]: Add proc_dointvec_userhz_jiffies, use it for proper handling of
neighbour sysctls.").
As a minimal fix we can simply duplicate the corresponding check in
do_proc_dointvec_conv().
LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps memory area
and other thread is trying to read from it:
page_table_free() is called with NULL mm parameter, but because "0" is a
valid address on s390 (see S390_lowcore), it keeps going until it
eventually crashes in lockdep's lock_acquire. This crash is
reproducible at least since 4.14.
Problem is that "vmf->vma" used in do_fault() can become stale. Because
mmap_sem may be released, other threads can come in, call munmap() and
cause "vma" be returned to kmem cache, and get zeroed/re-initialized and
re-used:
handle_mm_fault |
__handle_mm_fault |
do_fault |
vma = vmf->vma |
do_read_fault |
__do_fault |
vma->vm_ops->fault(vmf); |
mmap_sem is released |
|
| do_munmap()
| remove_vma_list()
| remove_vma()
| vm_area_free()
| # vma is released
| ...
| # same vma is allocated
| # from kmem cache
| do_mmap()
| vm_area_alloc()
| memset(vma, 0, ...)
|
pte_free(vma->vm_mm, ...); |
page_table_free |
spin_lock_bh(&mm->context.lock);|
<crash> |
Cache mm_struct to avoid using potentially stale "vma".
When VM_NO_GUARD is not set area->size includes adjacent guard page,
thus for correct size checking get_vm_area_size() should be used, but
not area->size.
This fixes possible kernel oops when userspace tries to mmap an area on
1 page bigger than was allocated by vmalloc_user() call: the size check
inside remap_vmalloc_range_partial() accounts non-existing guard page
also, so check successfully passes but vmalloc_to_page() returns NULL
(guard page does not physically exist).
The following code pattern example should trigger an oops:
soft_offline_in_use_page() passed refcount and page lock from tail page
to head page, which is not needed because we can pass any subpage to
split_huge_page().
Naoya had fixed a similar issue in c3901e722b29 ("mm: hwpoison: fix thp
split handling in memory_failure()"). But he missed fixing soft
offline.
This commit fixes the issue that USB-DMAC hangs silently after system
resumes on R-Car Gen3 hence renesas_usbhs will not work correctly
when using USB-DMAC for bulk transfer e.g. ethernet or serial
gadgets.
The issue can be reproduced by these steps:
1. modprobe g_serial
2. Suspend and resume system.
3. connect a usb cable to host side
4. Transfer data from Host to Target
5. cat /dev/ttyGS0 (Target side)
6. echo "test" > /dev/ttyACM0 (Host side)
The 'cat' will not result anything. However, system still can work
normally.
Currently, USB-DMAC driver does not have system sleep callbacks hence
this driver relies on the PM core to force runtime suspend/resume to
suspend and reinitialize USB-DMAC during system resume. After
the commit 17218e0092f8 ("PM / genpd: Stop/start devices without
pm_runtime_force_suspend/resume()"), PM core will not force
runtime suspend/resume anymore so this issue happens.
To solve this, make system suspend resume explicit by using
pm_runtime_force_{suspend,resume}() as the system sleep callbacks.
SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() is used to make sure USB-DMAC
suspended after and initialized before renesas_usbhs."
Commit 1a2f474d328f handles block _reads_ separately with plain-I2C
adapters, but the problem described with regmap-i2c not handling
SMBus block transfers (i.e. read and writes) correctly also exists
with writes.
As workaround, this patch adds a block write function the same way 1a2f474d328f adds a block read function.
Fixes: 1a2f474d328f ("usb: typec: tps6598x: handle block reads separately with plain-I2C adapters") Fixes: 0a4c005bd171 ("usb: typec: driver for TI TPS6598x USB Power Delivery controllers") Signed-off-by: Nikolaus Voss <nikolaus.voss@loewensteinmedical.de> Cc: stable <stable@vger.kernel.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 'div' field does not represent a number of bits used to divide
(understand: right-shift) the divider, but a number itself used to
divide the divider.
Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Maarten ter Huurne <maarten@treewalker.org> Cc: <stable@vger.kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Take a parent rate of 180 MHz, and a requested rate of 4.285715 MHz.
This results in a theorical divider of 41.999993 which is then rounded
up to 42. The .round_rate function would then return (180 MHz / 42) as
the clock, rounded down, so 4.285714 MHz.
Calling clk_set_rate on 4.285714 MHz would round the rate again, and
give a theorical divider of 42,0000028, now rounded up to 43, and the
rate returned would be (180 MHz / 43) which is 4.186046 MHz, aka. not
what we requested.
Fix this by rounding up the divisions.
Signed-off-by: Paul Cercueil <paul@crapouillou.net> Tested-by: Maarten ter Huurne <maarten@treewalker.org> Cc: <stable@vger.kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Platform driver driver_override field should not be initialized from
const memory because the core later kfree() it. If driver_override is
manually set later through sysfs, kfree() of old value leads to:
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058e8c0>] (platform_set_driver_override+0x84/0xac)
(platform_set_driver_override) from [<c058e908>] (driver_override_store+0x20/0x34)
(driver_override_store) from [<c031f778>] (kernfs_fop_write+0x100/0x1dc)
(kernfs_fop_write) from [<c0296de8>] (__vfs_write+0x2c/0x17c)
(__vfs_write) from [<c02970c4>] (vfs_write+0xa4/0x188)
(vfs_write) from [<c02972e8>] (ksys_write+0x4c/0xac)
(ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
The clk-exynos5-subcmu driver uses override only for the purpose of
creating meaningful names for children devices (matching names of power
domains, e.g. DISP, MFC). The driver_override was not developed for
this purpose so just switch to default names of devices to fix the
issue.
During initialization of subdevices if platform_device_alloc() failed,
returned NULL pointer will be later dereferenced. Add proper error
paths to exynos5_clk_register_subcmu(). The return value of this
function is still ignored because at this stage of init there is nothing
we can do.
I noticed that modprobe clk-twl6040 can fail after a cold boot with:
abe_cm:clk:0010:0: failed to enable
...
Unhandled fault: imprecise external abort (0x1406) at 0xbe896b20
WARNING: CPU: 1 PID: 29 at drivers/clk/clk.c:828 clk_core_disable_lock+0x18/0x24
...
(clk_core_disable_lock) from [<c0123534>] (_disable_clocks+0x18/0x90)
(_disable_clocks) from [<c0124040>] (_idle+0x17c/0x244)
(_idle) from [<c0125ad4>] (omap_hwmod_idle+0x24/0x44)
(omap_hwmod_idle) from [<c053a038>] (sysc_runtime_suspend+0x48/0x108)
(sysc_runtime_suspend) from [<c06084c4>] (__rpm_callback+0x144/0x1d8)
(__rpm_callback) from [<c0608578>] (rpm_callback+0x20/0x80)
(rpm_callback) from [<c0607034>] (rpm_suspend+0x120/0x694)
(rpm_suspend) from [<c0607a78>] (__pm_runtime_idle+0x60/0x84)
(__pm_runtime_idle) from [<c053aaf0>] (sysc_probe+0x874/0xf2c)
(sysc_probe) from [<c05fecd4>] (platform_drv_probe+0x48/0x98)
After searching around for a similar issue, I came across an earlier fix
that never got merged upstream in the Android tree for glass-omap-xrr02.
There is patch "MFD: twl6040-codec: Implement PDMCLK cold temp errata"
by Misael Lopez Cruz <misael.lopez@ti.com>.
Based on my observations, this fix is also needed when cold booting
devices, and not just for deeper idle modes. Since we now have a clock
driver for pdmclk, let's fix the issue in twl6040_pdmclk_prepare().
Cc: Misael Lopez Cruz <misael.lopez@ti.com> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: <stable@vger.kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When ext2 filesystem is created with 64k block size, ext2_max_size()
will return value less than 0. Also, we cannot write any file in this fs
since the sb->maxbytes is less than 0. The core of the problem is that
the size of block index tree for such large block size is more than
i_blocks can carry. So fix the computation to count with this
possibility.
File size limits computed with the new function for the full range of
possible block sizes look like:
Within cxl module, iteration over array 'adapter->afu' may be racy
at few points as it might be simultaneously read during an EEH and its
contents being set to NULL while driver is being unloaded or unbound
from the adapter. This might result in a NULL pointer to 'struct afu'
being de-referenced during an EEH thereby causing a kernel oops.
This patch fixes this by making sure that all access to the array
'adapter->afu' is wrapped within the context of spin-lock
'adapter->afu_list_lock'.
When disabling and removing a receive context, it is possible for an
asynchronous event (i.e IRQ) to occur. Because of this, there is a race
between cleaning up the context, and the context being used by the
asynchronous event.
Since 7c5925afbc58 (PCI: dwc: Move MSI IRQs allocation to IRQ domains
hierarchical API) the MSI init claims one of the controller IRQs as a
chained IRQ line for the MSI controller. On some designs, like the i.MX6,
this line is shared with a PCIe legacy IRQ. When the line is claimed for
the MSI domain, any device trying to use this legacy IRQs will fail to
request this IRQ line.
As MSI and legacy IRQs are already mutually exclusive on the DWC core,
as the core won't forward any legacy IRQs once any MSI has been enabled,
users wishing to use legacy IRQs already need to explictly disable MSI
support (usually via the pci=nomsi kernel commandline option). To avoid
any issues with MSI conflicting with legacy IRQs, just skip all of the
DWC MSI initalization, including the IRQ line claim, when MSI is disabled.
Fixes: 7c5925afbc58 ("PCI: dwc: Move MSI IRQs allocation to IRQ domains hierarchical API") Tested-by: Tim Harvey <tharvey@gateworks.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously dpc_handler() called aer_get_device_error_info() without
initializing info->severity, so aer_get_device_error_info() relied on
uninitialized data.
Add dpc_get_aer_uncorrect_severity() to read the port's AER status, mask,
and severity registers and set info->severity.
Also, clear the port's AER fatal error status bits.
Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
[bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RussianNeuroMancer reported that the Intel 7265 wifi on a Dell Venue 11 Pro
7140 table stopped working after wakeup from suspend and bisected the
problem to 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't
have LTR"). David Ward reported the same problem on a Dell Latitude 7350.
After af8bb9f89838 ("PCI/ACPI: Request LTR control from platform before
using it"), we don't enable LTR unless the platform has granted LTR control
to us. In addition, we don't notice if the platform had already enabled
LTR itself.
After 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have
LTR"), we avoid using LTR if we don't think the path to the device has LTR
enabled.
The combination means that if the platform itself enables LTR but declines
to give the OS control over LTR, we unnecessarily avoided using ASPM L1.2.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201469 Fixes: 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have LTR") Fixes: af8bb9f89838 ("PCI/ACPI: Request LTR control from platform before using it") Reported-by: RussianNeuroMancer <russianneuromancer@ya.ru> Reported-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v4.18+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When computing maximum size of filesystem possible with given number of
group descriptor blocks, we forget to include s_first_data_block into
the number of blocks. Thus for filesystems with non-zero
s_first_data_block it can happen that computed maximum filesystem size
is actually lower than current filesystem size which confuses the code
and eventually leads to a BUG_ON in ext4_alloc_group_tables() hitting on
flex_gd->count == 0. The problem can be reproduced like:
Fix the problem by properly including s_first_data_block into the
computed number of filesystem blocks.
Fixes: 1c6bd7173d66 "ext4: convert file system to meta_bg if needed..." Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The reason is that while swapping two inode, we swap the flags too.
Some flags such as EXT4_JOURNAL_DATA_FL can really confuse the things
since we're not resetting the address operations structure. The
simplest way to keep things sane is to restrict the flags that can be
swapped.
While do swap between two inode, they swap i_data without update
quota information. Also, swap_inode_boot_loader can do "revert"
somtimes, so update the quota while all operations has been finished.
While do swap, we should make sure there has no new dirty page since we
should swap i_data between two inode:
1.We should lock i_mmap_sem with write to avoid new pagecache from mmap
read/write;
2.Change filemap_flush to filemap_write_and_wait and move them to the
space protected by inode lock to avoid new pagecache from buffer read/write.
Before really do swap between inode and boot inode, something need to
check to avoid invalid or not permitted operation, like does this inode
has inline data. But the condition check should be protected by inode
lock to avoid change while swapping. Also some other condition will not
change between swapping, but there has no problem to do this under inode
lock.
pxa_cpufreq_init_voltages() is marked __init but usually inlined into
the non-__init pxa_cpufreq_init() function. When building with clang,
it can stay as a standalone function in a discarded section, and produce
this warning:
WARNING: vmlinux.o(.text+0x616a00): Section mismatch in reference from the function pxa_cpufreq_init() to the function .init.text:pxa_cpufreq_init_voltages()
The function pxa_cpufreq_init() references
the function __init pxa_cpufreq_init_voltages().
This is often because pxa_cpufreq_init lacks a __init
annotation or the annotation of pxa_cpufreq_init_voltages is wrong.
Fixes: 50e77fcd790e ("ARM: pxa: remove __init from cpufreq_driver->init()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 5ad7346b4ae2 ("cpufreq: kryo: Add module remove and exit") made
it possible to build the kryo cpufreq driver as a module, but it failed
to release all the resources, i.e. OPP tables, when the module is
unloaded.
This patch fixes it by releasing the OPP tables, by calling
dev_pm_opp_put_supported_hw() for them, from the
qcom_cpufreq_kryo_remove() routine. The array of pointers to the OPP
tables is also allocated dynamically now in qcom_cpufreq_kryo_probe(),
as the pointers will be required while releasing the resources.
Prohibit probing on optprobe template code, since it is not
a code but a template instruction sequence. If we modify
this template, copied template must be broken.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 9326638cbee2 ("kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation") Link: http://lkml.kernel.org/r/154998787911.31052.15274376330136234452.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using the irq_gc_lock/irq_gc_unlock functions in the suspend and
resume functions creates the opportunity for a deadlock during
suspend, resume, and shutdown. Using the irq_gc_lock_irqsave/
irq_gc_unlock_irqrestore variants prevents this possible deadlock.
Cc: stable@vger.kernel.org Fixes: 7f646e92766e2 ("irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
[maz: tidied up $SUBJECT] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In current logic, its_parse_indirect_baser() will be invoked twice
when allocating Device tables. Add a *break* to omit the unnecessary
and annoying (might be ...) invoking.
Fixes: 32bd44dc19de ("irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size") Cc: stable@vger.kernel.org Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using the batch API from the interconnect driver sometimes leads to a
KASAN error due to an access to freed memory. This is easier to trigger
with threadirqs on the kernel commandline.
BUG: KASAN: use-after-free in rpmh_tx_done+0x114/0x12c
Read of size 1 at addr fffffff51414ad84 by task irq/110-apps_rs/57
Memory state around the buggy address: fffffff51414ac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fffffff51414ad00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>fffffff51414ad80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ fffffff51414ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fffffff51414ae80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
The batch API sets the same completion for each rpmh message that's sent
and then loops through all the messages and waits for that single
completion declared on the stack to be completed before returning from
the function and freeing the message structures. Unfortunately, some
messages may still be in process and 'stuck' in the TCS. At some later
point, the tcs_tx_done() interrupt will run and try to process messages
that have already been freed at the end of rpmh_write_batch(). This will
in turn access the 'needs_free' member of the rpmh_request structure and
cause KASAN to complain. Furthermore, if there's a message that's
completed in rpmh_tx_done() and freed immediately after the complete()
call is made we'll be racing with potentially freed memory when
accessing the 'needs_free' member:
Let's fix this by allocating a chunk of completions for each message and
waiting for all of them to be completed before returning from the batch
API. Alternatively, we could wait for the last message in the batch, but
that may be a more complicated change because it looks like
tcs_tx_done() just iterates through the indices of the queue and
completes each message instead of tracking the last inserted message and
completing that first.
Fixes: c8790cb6da58 ("drivers: qcom: rpmh: add support for batch RPMH request") Cc: Lina Iyer <ilina@codeaurora.org> Cc: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Evan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Reviewed-by: Lina Iyer <ilina@codeaurora.org> Reviewed-by: Evan Green <evgreen@chromium.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the past we had data corruption when reading compressed extents that
are shared within the same file and they are consecutive, this got fixed
by commit 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and
shared extents") and by commit 808f80b46790f ("Btrfs: update fix for read
corruption of compressed and shared extents"). However there was a case
that was missing in those fixes, which is when the shared and compressed
extents are referenced with a non-zero offset. The following shell script
creates a reproducer for this issue:
#!/bin/bash
mkfs.btrfs -f /dev/sdc &> /dev/null
mount -o compress /dev/sdc /mnt/sdc
# Create a file with 3 consecutive compressed extents, each has an
# uncompressed size of 128Kb and a compressed size of 4Kb.
for ((i = 1; i <= 3; i++)); do
head -c 4096 /dev/zero
for ((j = 1; j <= 31; j++)); do
head -c 4096 /dev/zero | tr '\0' "\377"
done
done > /mnt/sdc/foobar
sync
echo "Digest after file creation: $(md5sum /mnt/sdc/foobar)"
# Clone the first extent into offsets 128K and 256K.
xfs_io -c "reflink /mnt/sdc/foobar 0 128K 128K" /mnt/sdc/foobar
xfs_io -c "reflink /mnt/sdc/foobar 0 256K 128K" /mnt/sdc/foobar
sync
echo "Digest after cloning: $(md5sum /mnt/sdc/foobar)"
# Punch holes into the regions that are already full of zeroes.
xfs_io -c "fpunch 0 4K" /mnt/sdc/foobar
xfs_io -c "fpunch 128K 4K" /mnt/sdc/foobar
xfs_io -c "fpunch 256K 4K" /mnt/sdc/foobar
sync
echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)"
When running the script we get the following output:
Digest after file creation: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
linked 131072/131072 bytes at offset 131072
128 KiB, 1 ops; 0.0033 sec (36.960 MiB/sec and 295.6830 ops/sec)
linked 131072/131072 bytes at offset 262144
128 KiB, 1 ops; 0.0015 sec (78.567 MiB/sec and 628.5355 ops/sec)
Digest after cloning: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
Digest after hole punching: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
Dropping page cache...
Digest after hole punching: fba694ae8664ed0c2e9ff8937e7f1484 /mnt/sdc/foobar
This happens because after reading all the pages of the extent in the
range from 128K to 256K for example, we read the hole at offset 256K
and then when reading the page at offset 260K we don't submit the
existing bio, which is responsible for filling all the page in the
range 128K to 256K only, therefore adding the pages from range 260K
to 384K to the existing bio and submitting it after iterating over the
entire range. Once the bio completes, the uncompressed data fills only
the pages in the range 128K to 256K because there's no more data read
from disk, leaving the pages in the range 260K to 384K unfilled. It is
just a slightly different variant of what was solved by commit 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and shared
extents").
Fix this by forcing a bio submit, during readpages(), whenever we find a
compressed extent map for a page that is different from the extent map
for the previous page or has a different starting offset (in case it's
the same compressed extent), instead of the extent map's original start
offset.
A test case for fstests follows soon.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Fixes: 808f80b46790f ("Btrfs: update fix for read corruption of compressed and shared extents") Fixes: 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and shared extents") Cc: stable@vger.kernel.org # 4.3+ Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We recently had a customer issue with a corrupted filesystem. When
trying to mount this image btrfs panicked with a division by zero in
calc_stripe_length().
The corrupt chunk had a 'num_stripes' value of 1. calc_stripe_length()
takes this value and divides it by the number of copies the RAID profile
is expected to have to calculate the amount of data stripes. As a DUP
profile is expected to have 2 copies this division resulted in 1/2 = 0.
Later then the 'data_stripes' variable is used as a divisor in the
stripe length calculation which results in a division by 0 and thus a
kernel panic.
When encountering a filesystem with a DUP block group and a
'num_stripes' value unequal to 2, refuse mounting as the image is
corrupted and will lead to unexpected behaviour.
Code inspection showed a RAID1 block group has the same issues.
Fixes: e06cd3dd7cea ("Btrfs: add validadtion checks for chunk loading") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are holding a transaction handle when setting an acl, therefore we can
not allocate the xattr value buffer using GFP_KERNEL, as we could deadlock
if reclaim is triggered by the allocation, therefore setup a nofs context.
Fixes: 39a27ec1004e8 ("btrfs: use GFP_KERNEL for xattr and acl allocations") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are holding a transaction handle when creating a tree, therefore we can
not allocate the root using GFP_KERNEL, as we could deadlock if reclaim is
triggered by the allocation, therefore setup a nofs context.
Fixes: 74e4d82757f74 ("btrfs: let callers of btrfs_alloc_root pass gfp flags") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes a build failure when using GCC 8.1:
/usr/bin/ld: block/partitions/ldm.o: in function `ldm_parse_tocblock':
block/partitions/ldm.c:153: undefined reference to `strcmp'
This is caused by a new optimization which effectively replaces a
strncmp() call with a strcmp() call. This affects a number of strncmp()
call sites in the kernel.
The entire class of optimizations is avoided with -fno-builtin, which
gets enabled by -ffreestanding. This may avoid possible future build
failures in case new optimizations appear in future compilers.
I haven't done any performance measurements with this patch but I did
count the function calls in a defconfig build. For example, there are now
23 more sprintf() calls and 39 fewer strcpy() calls. The effect on the
other libc functions is smaller.
If this harms performance we can tackle that regression by optimizing
the call sites, ideally using semantic patches. That way, clang and ICC
builds might benfit too.
If a file has been copied up metadata only, and later data is copied up,
upper loses any security.capability xattr it has (underlying filesystem
clears it as upon file write).
From a user's point of view, this is just a file copy-up and that should
not result in losing security.capability xattr. Hence, before data copy
up, save security.capability xattr (if any) and restore it on upper after
data copy up is complete.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Fixes: 0c2888749363 ("ovl: A new xattr OVL_XATTR_METACOPY for file on upper") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a file with capability set (and hence security.capability xattr) is
written kernel clears security.capability xattr. For overlay, during file
copy up if xattrs are copied up first and then data is, copied up. This
means data copy up will result in clearing of security.capability xattr
file on lower has. And this can result into surprises. If a lower file has
CAP_SETUID, then it should not be cleared over copy up (if nothing was
actually written to file).
This also creates problems with chown logic where it first copies up file
and then tries to clear setuid bit. But by that time security.capability
xattr is already gone (due to data copy up), and caller gets -ENODATA.
This has been reported by Giuseppe here.
Fix this by copying up data first and then metadta. This is a regression
which has been introduced by my commit as part of metadata only copy up
patches.
TODO: There will be some corner cases where a file is copied up metadata
only and later data copy up happens and that will clear security.capability
xattr. Something needs to be done about that too.
Fixes: bd64e57586d3 ("ovl: During copy up, first copy up metadata and then data") Cc: <stable@vger.kernel.org> # v4.19+ Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before this patch, it was possible for two pipes to affect each other after
data had been transferred between them with tee():
============
$ cat tee_test.c
int main(void) {
int pipe_a[2];
if (pipe(pipe_a)) err(1, "pipe");
int pipe_b[2];
if (pipe(pipe_b)) err(1, "pipe");
if (write(pipe_a[1], "abcd", 4) != 4) err(1, "write");
if (tee(pipe_a[0], pipe_b[1], 2, 0) != 2) err(1, "tee");
if (write(pipe_b[1], "xx", 2) != 2) err(1, "write");
As suggested by Al Viro, fix it by creating a separate type for
non-mergeable pipe buffers, then changing the types of buffers in
splice_pipe_to_pipe() and link_pipe().
Cc: <stable@vger.kernel.org> Fixes: 7c77f0b3f920 ("splice: implement pipe to pipe splicing") Fixes: 70524490ee2e ("[PATCH] splice: add support for sys_tee()") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d_delete only unhashes an entry if it is reached with
dentry->d_lockref.count != 1. Prior to commit 8ead9dd54716 ("devpts:
more pty driver interface cleanups"), d_delete was called on a dentry
from devpts_pty_kill with two references held, which would trigger the
unhashing, and the subsequent dputs would release it.
Commit 8ead9dd54716 reworked devpts_pty_kill to stop acquiring the second
reference from d_find_alias, and the d_delete call left the dentries
still on the hashed list without actually ever being dropped from dcache
before explicit cleanup. This causes the number of negative dentries for
devpts to pile up, and an `ls /dev/pts` invocation can take seconds to
return.
Provide always_delete_dentry() from simple_dentry_operations
as .d_delete for devpts, to make the dentry be dropped from dcache.
Without this cleanup, the number of dentries in /dev/pts/ can be grown
arbitrarily as:
This patch fixes LUN discovery when loop ID is not yet assigned by the
firmware during driver load/sg_reset operations. Driver will now search for
new loop id before retrying login.
Fixes: 48acad099074 ("scsi: qla2xxx: Fix N2N link re-connect") Cc: stable@vger.kernel.org #4.19 Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When using SCSI passthrough in combination with the iSCSI target driver
then cmd->t_state_lock may be obtained from interrupt context. Hence, all
code that obtains cmd->t_state_lock from thread context must disable
interrupts first. This patch avoids that lockdep reports the following:
WARNING: inconsistent lock state
4.18.0-dbg+ #1 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
iscsi_ttx/1800 [HC1[1]:SC0[2]:HE0:SE0] takes: 000000006e7b0ceb (&(&cmd->t_state_lock)->rlock){?...}, at: target_complete_cmd+0x47/0x2c0 [target_core_mod]
{HARDIRQ-ON-W} state was registered at:
lock_acquire+0xd2/0x260
_raw_spin_lock+0x32/0x50
iscsit_close_connection+0x97e/0x1020 [iscsi_target_mod]
iscsit_take_action_for_connection_exit+0x108/0x200 [iscsi_target_mod]
iscsi_target_rx_thread+0x180/0x190 [iscsi_target_mod]
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30
irq event stamp: 1281
hardirqs last enabled at (1279): [<ffffffff970ade79>] __local_bh_enable_ip+0xa9/0x160
hardirqs last disabled at (1281): [<ffffffff97a008a5>] interrupt_entry+0xb5/0xd0
softirqs last enabled at (1278): [<ffffffff977cd9a1>] lock_sock_nested+0x51/0xc0
softirqs last disabled at (1280): [<ffffffffc07a6e04>] ip6_finish_output2+0x124/0xe40 [ipv6]
other info that might help us debug this:
Possible unsafe locking scenario:
It was reported that some devices report an OPTIMAL TRANSFER LENGTH of
0xFFFF blocks. That looks bogus, especially for a device with a
4096-byte physical block size.
Ignore OPTIMAL TRANSFER LENGTH if it is not a multiple of the device's
reported physical block size.
To make the sanity checking conditionals more readable--and to
facilitate printing warnings--relocate the checking to a helper
function. No functional change aside from the printks.
Cc: <stable@vger.kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199759 Reported-by: Christoph Anton Mitterer <calestyo@scientia.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix performance issue where the queue depth for SmartIOC logical volumes is
set to 1, and allow the usual logical volume code to be executed
Fixes: a052865fe287 (aacraid: Set correct Queue Depth for HBA1000 RAW disks) Cc: stable@vger.kernel.org Signed-off-by: Sagar Biradar <Sagar.Biradar@microchip.com> Reviewed-by: Dave Carroll <david.carroll@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The virtio scsi spec defines struct virtio_scsi_ctrl_tmf as a set of
device-readable records and a single device-writable response entry:
struct virtio_scsi_ctrl_tmf
{
// Device-readable part
le32 type;
le32 subtype;
u8 lun[8];
le64 id;
// Device-writable part
u8 response;
}
The above should be organised as two descriptor entries (or potentially
more if using VIRTIO_F_ANY_LAYOUT), but without any extra data after "le64
id" or after "u8 response".
The Linux driver doesn't respect that, with virtscsi_abort() and
virtscsi_device_reset() setting cmd->sc before calling virtscsi_tmf(). It
results in the original scsi command payload (or writable buffers) added to
the tmf.
This fixes the problem by leaving cmd->sc zeroed out, which makes
virtscsi_kick_cmd() add the tmf to the control vq without any payload.
Cc: stable@vger.kernel.org Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A queue with a capacity of zero is clearly not a valid virtio queue.
Some emulators report zero queue size if queried with an invalid queue
index. Instead of crashing in this case let us just return -ENOENT. To
make that work properly, let us fix the notifier cleanup logic as well.
Cc: stable@vger.kernel.org Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The setup_lowcore() function creates a new prefix page for the boot CPU.
The PSW mask for the system_call, external interrupt, i/o interrupt and
the program check handler have the DAT bit set in this new prefix page.
At the time setup_lowcore is called the system still runs without virtual
address translation, the paging_init() function creates the kernel page
table and loads the CR13 with the kernel ASCE.
Any code between setup_lowcore() and the end of paging_init() that has
a BUG or WARN statement will create a program check that can not be
handled correctly as there is no kernel page table yet.
To allow early WARN statements initially setup the lowcore with DAT off
and set the DAT bit only after paging_init() has completed.
Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Allwinner A64 SoC is known[1] to have an unstable architectural
timer, which manifests itself most obviously in the time jumping forward
a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
timer frequency of 24 MHz, implying that the time went slightly backward
(and this was interpreted by the kernel as it jumping forward and
wrapping around past the epoch).
Investigation revealed instability in the low bits of CNTVCT at the
point a high bit rolls over. This leads to power-of-two cycle forward
and backward jumps. (Testing shows that forward jumps are about twice as
likely as backward jumps.) Since the counter value returns to normal
after an indeterminate read, each "jump" really consists of both a
forward and backward jump from the software perspective.
Unless the kernel is trapping CNTVCT reads, a userspace program is able
to read the register in a loop faster than it changes. A test program
running on all 4 CPU cores that reported jumps larger than 100 ms was
run for 13.6 hours and reported the following:
This works out to a jump larger than 100 ms about every 5.5 seconds on
each CPU core.
The largest jump (almost an hour!) was the following sequence of reads:
0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000
Note that the middle bits don't necessarily all read as all zeroes or
all ones during the anomalous behavior; however the low 10 bits checked
by the function in this patch have never been observed with any other
value.
Also note that smaller jumps are much more common, with backward jumps
of 2048 (2^11) cycles observed over 400 times per second on each core.
(Of course, this is partially explained by lower bits rolling over more
frequently.) Any one of these could have caused the 95 year time skip.
Similar anomalies were observed while reading CNTPCT (after patching the
kernel to allow reads from userspace). However, the CNTPCT jumps are
much less frequent, and only small jumps were observed. The same program
as before (except now reading CNTPCT) observed after 72 hours:
Further investigation showed that the instability in CNTPCT/CNTVCT also
affected the respective timer's TVAL register. The following values were
observed immediately after writing CNVT_TVAL to 0x10000000:
I was also twice able to reproduce the issue covered by Allwinner's
workaround[4], that writing to TVAL sometimes fails, and both CVAL and
TVAL are left with entirely bogus values. One was the following values:
CNTVCT | CNTV_TVAL | CNTV_CVAL
--------------------+------------+--------------------------------------
0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past) Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
========================================================================
Because the CPU can read the CNTPCT/CNTVCT registers faster than they
change, performing two reads of the register and comparing the high bits
(like other workarounds) is not a workable solution. And because the
timer can jump both forward and backward, no pair of reads can
distinguish a good value from a bad one. The only way to guarantee a
good value from consecutive reads would be to read _three_ times, and
take the middle value only if the three values are 1) each unique and
2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
if an anomaly is detected.
However, since there is a distinct pattern to the bad values, we can
optimize the common case (1022/1024 of the time) to a single read by
simply ignoring values that match the error pattern. This still takes no
more than 3 cycles in the worst case, and requires much less code. As an
additional safety check, we still limit the loop iteration to the number
of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.
For the TVAL registers, the simple solution is to not use them. Instead,
read or write the CVAL and calculate the TVAL value in software.
Although the manufacturer is aware of at least part of the erratum[4],
there is no official name for it. For now, use the kernel-internal name
"UNKNOWN1".
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Cc: stable@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When shutting down the timer, ensure that after we have stopped the
timer any pending interrupts are cleared. This fixes a problem when
suspending, as interrupts are disabled before the timer is stopped,
so the timer interrupt may still be asserted, preventing the system
entering a low power state when the wfi is executed.
Signed-off-by: Stuart Menefy <stuart.menefy@mathembedded.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: <stable@vger.kernel.org> # v4.3+ Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a timer tick occurs and the clock is in one-shot mode, the timer
needs to be stopped to prevent it triggering subsequent interrupts.
Currently this code is in exynos4_mct_tick_clear(), but as it is
only needed when an ISR occurs move it into exynos4_mct_tick_isr(),
leaving exynos4_mct_tick_clear() just doing what its name suggests it
should.
Signed-off-by: Stuart Menefy <stuart.menefy@mathembedded.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The step values for some of the LDOs appears to be incorrect, resulting
in incorrect voltages (or at least, ones which are different from the
Samsung 3.4 vendor kernel).
Signed-off-by: Stuart Menefy <stuart.menefy@mathembedded.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If regulator DT node doesn't exist, its of_parse_cb callback
function isn't called. Then all values for DT properties are
filled with zero. This leads to wrong register update for
FPS and POK settings.
Signed-off-by: Jinyoung Park <jinyoungp@nvidia.com> Signed-off-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LDO35 uses 25 mV step, not 50 mV. Bucks 7 and 8 use 12.5 mV step
instead of 6.25 mV. Wrong step caused over-voltage (LDO35) or
under-voltage (buck7 and 8) if regulators were used (e.g. on Exynos5420
Arndale Octa board).
Cc: <stable@vger.kernel.org> Fixes: cb74685ecb39 ("regulator: s2mps11: Add samsung s2mps11 regulator driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
("spi: pxa2xx: Switch to SPI core DMA mapping functionality")
switches to SPI core provided DMA helpers, it missed to setup maximum
supported DMA transfer length for the controller and thus users
mistakenly try to send more data than supported with the following
warning:
ili9341 spi-PRP0001:01: DMA disabled for transfer length 153600 greater than 65536
Setup maximum supported DMA transfer length in order to make users know
the limit.
Fixes: b6ced294fb61 ("spi: pxa2xx: Switch to SPI core DMA mapping functionality") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4dea6c9b0b64 ("spi: spi-ti-qspi: add mmap mode read support") has
has got order of parameter wrong when calling regmap_update_bits() to
select CS for mmap access. Mask and value arguments are interchanged.
Code will work on a system with single slave, but fails when more than
one CS is in use. Fix this by correcting the order of parameters when
calling regmap_update_bits().
Fixes: 4dea6c9b0b64 ("spi: spi-ti-qspi: add mmap mode read support") Cc: stable@vger.kernel.org Signed-off-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The original purpose of the code I fix is to replace max_discard with
max_trim if max_trim is less than max_discard. When max_discard is 0
we should replace max_discard with max_trim as well, because
max_discard equals 0 happens only when the max_do_calc_max_discard
process is overflowed, so if mmc_can_trim(card) is true, max_discard
should be replaced by an available max_trim.
However, in the original code, there are two lines of code interfere
the right process.
1) if (max_discard && mmc_can_trim(card))
when max_discard is 0, it skips the process checking if max_discard
needs to be replaced with max_trim.
2) if (max_trim < max_discard)
the condition is false when max_discard is 0. it also skips the process
that replaces max_discard with max_trim, in fact, we should replace the
0-valued max_discard with max_trim.
Now tuning reset will be done when the timing is MMC_TIMING_LEGACY/
MMC_TIMING_MMC_HS/MMC_TIMING_SD_HS. But for timing MMC_TIMING_MMC_HS,
we can not do tuning reset, otherwise HS400 timing is not right.
Here is the process of init HS400, first finish tuning in HS200 mode,
then switch to HS mode and 8 bit DDR mode, finally switch to HS400
mode. If we do tuning reset in HS mode, this will cause HS400 mode
lost the tuning setting, which will cause CRC error.
Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Cc: stable@vger.kernel.org # v4.12+ Acked-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: d9370424c948 ("mmc: sdhci-esdhc-imx: reset tuning circuit when power on mmc card") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If SSDT overlay is loaded via ConfigFS and then unloaded the device,
we would like to have OF modalias for, already gone. Thus, acpi_get_name()
returns no allocated buffer for such case and kernel crashes afterwards:
Commit f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit
PV guests") introduced a regression for booting dom0 on huge systems
with lots of RAM (in the TB range).
Reason is that on those hosts the p2m list needs to be moved early in
the boot process and this requires temporary page tables to be created.
Said commit modified xen_set_pte_init() to use a hypercall for writing
a PTE, but this requires the page table being in the direct mapped
area, which is not the case for the temporary page tables used in
xen_relocate_p2m().
As the page tables are completely written before being linked to the
actual address space instead of set_pte() a plain write to memory can
be used in xen_relocate_p2m().
The first version of this method was missing the check for
`ret == PATH_MAX`; then such a check was added, but it didn't call kfree()
on error, so there was still a small memory leak in the error case.
Fix it by using strndup_user() instead of open-coding it.
Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in
pipe files") use the current tracer instead of the copy in
tracing_open_pipe(), but it forget to remove the freeing sentence in
the error path.
There's an error path that can call kfree(iter->trace) after the iter->trace
was assigned to tr->current_trace, which would be bad to free.
Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zhang@huawei.com Cc: stable@vger.kernel.org Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because there may be random garbage beyond a string's null terminator,
it's not correct to copy the the complete character array for use as a
hist trigger key. This results in multiple histogram entries for the
'same' string key.
So, in the case of a string key, use strncpy instead of memcpy to
avoid copying in the extra bytes.
Before, using the gdbus entries in the following hist trigger as an
example:
When we have a READ lease for a file and have just issued a write
operation to the server we need to purge the cache and set oplock/lease
level to NONE to avoid reading stale data. Currently we do that
only if a write operation succedeed thus not covering cases when
a request was sent to the server but a negative error code was
returned later for some other reasons (e.g. -EIOCBQUEUED or -EINTR).
Fix this by turning off caching regardless of the error code being
returned.
The patches fixes generic tests 075 and 112 from the xfs-tests.
Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we hit failures during constructing MIDs or sending PDUs
through the network, we end up not using message IDs assigned
to the packet. The next SMB packet will skip those message IDs
and continue with the next one. This behavior may lead to a server
not granting us credits until we use the skipped IDs. Fix this by
reverting the current ID to the original value if any errors occur
before we push the packet through the network stack.
This patch fixes the generic/310 test from the xfs-tests.
Cc: <stable@vger.kernel.org> # 4.19.x Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently on lease break the client sets a caching level twice:
when oplock is detected and when oplock is processed. While the
1st attempt sets the level to the value provided by the server,
the 2nd one resets the level to None unconditionally.
This happens because the oplock/lease processing code was changed
to avoid races between page cache flushes and oplock breaks.
The commit c11f1df5003d534 ("cifs: Wait for writebacks to complete
before attempting write.") fixed the races for oplocks but didn't
apply the same changes for leases resulting in overwriting the
server granted value to None. Fix this by properly processing
lease breaks.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 5092fcf34908 ("crypto: arm64/aes-ce-ccm: add non-SIMD generic
fallback") introduced C fallback code to replace the NEON routines
when invoked from a context where the NEON is not available (i.e.,
from the context of a softirq taken while the NEON is already being
used in kernel process context)
Fix two logical flaws in the MAC calculation of the associated data.
The NEON MAC calculation routine fails to handle the case correctly
where there is some data in the buffer, and the input fills it up
exactly. In this case, we enter the loop at the end with w8 == 0,
while a negative value is assumed, and so the loop carries on until
the increment of the 32-bit counter wraps around, which is quite
obviously wrong.
So omit the loop altogether in this case, and exit right away.
Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: a3fd82105b9d1 ("arm64/crypto: AES in CCM mode using ARMv8 Crypto ...") Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The x86 MORUS implementations all fail the improved AEAD tests because
they produce the wrong result with some data layouts. The issue is that
they assume that if the skcipher_walk API gives 'nbytes' not aligned to
the walksize (a.k.a. walk.stride), then it is the end of the data. In
fact, this can happen before the end.
Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
incorrectly sleep in the skcipher_walk_*() functions while preemption
has been disabled by kernel_fpu_begin().
Fix these bugs.
Fixes: 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for MORUS") Cc: <stable@vger.kernel.org> # v4.18+ Cc: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gcmaes_crypt_by_sg() dereferences the NULL pointer returned by
scatterwalk_ffwd() when encrypting an empty plaintext and the source
scatterlist ends immediately after the associated data.
Fix it by only fast-forwarding to the src/dst data scatterlists if the
data length is nonzero.
This bug is reproduced by the "rfc4543(gcm(aes))" test vectors when run
with the new AEAD test manager.
Fixes: e845520707f8 ("crypto: aesni - Update aesni-intel_glue to use scatter/gather") Cc: <stable@vger.kernel.org> # v4.17+ Cc: Dave Watson <davejwatson@fb.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The x86 AEGIS implementations all fail the improved AEAD tests because
they produce the wrong result with some data layouts. The issue is that
they assume that if the skcipher_walk API gives 'nbytes' not aligned to
the walksize (a.k.a. walk.stride), then it is the end of the data. In
fact, this can happen before the end.
Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
incorrectly sleep in the skcipher_walk_*() functions while preemption
has been disabled by kernel_fpu_begin().
Fix these bugs.
Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations") Cc: <stable@vger.kernel.org> # v4.18+ Cc: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instantiating "cryptd(crc32c)" causes a crypto self-test failure because
the crypto_alloc_shash() in alg_test_crc32c() fails. This is because
cryptd(crc32c) is an ahash algorithm, not a shash algorithm; so it can
only be accessed through the ahash API, unlike shash algorithms which
can be accessed through both the ahash and shash APIs.
As the test is testing the shash descriptor format which is only
applicable to shash algorithms, skip it for ahash algorithms.
(Note that it's still important to fix crypto self-test failures even
for weird algorithm instantiations like cryptd(crc32c) that no one
would really use; in fips_enabled mode unprivileged users can use them
to panic the kernel, and also they prevent treating a crypto self-test
failure as a bug when fuzzing the kernel.)
Fixes: 8e3ee85e68c5 ("crypto: crc32c - Test descriptor context format") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some algorithms have a ->setkey() method that is not atomic, in the
sense that setting a key can fail after changes were already made to the
tfm context. In this case, if a key was already set the tfm can end up
in a state that corresponds to neither the old key nor the new key.
For example, in lrw.c, if gf128mul_init_64k_bbe() fails due to lack of
memory, then priv::table will be left NULL. After that, encryption with
that tfm will cause a NULL pointer dereference.
It's not feasible to make all ->setkey() methods atomic, especially ones
that have to key multiple sub-tfms. Therefore, make the crypto API set
CRYPTO_TFM_NEED_KEY if ->setkey() fails and the algorithm requires a
key, to prevent the tfm from being used until a new key is set.
[Cc stable mainly because when introducing the NEED_KEY flag I changed
AF_ALG to rely on it; and unlike in-kernel crypto API users, AF_ALG
previously didn't have this problem. So these "incompletely keyed"
states became theoretically accessible via AF_ALG -- though, the
opportunities for causing real mischief seem pretty limited.]
Fixes: f8d33fac8480 ("crypto: skcipher - prevent using skciphers without setting key") Cc: <stable@vger.kernel.org> # v4.16+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The memcpy()s in the PCBC implementation use walk->iv as both the source
and destination, which has undefined behavior. These memcpy()'s are
actually unneeded, because walk->iv is already used to hold the previous
plaintext block XOR'd with the previous ciphertext block. Thus,
walk->iv is already updated to its final value.
So remove the broken and unnecessary memcpy()s.
Fixes: 91652be5d1b9 ("[CRYPTO] pcbc: Add Propagated CBC template") Cc: <stable@vger.kernel.org> # v2.6.21+ Cc: David Howells <dhowells@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The generic MORUS implementations all fail the improved AEAD tests
because they produce the wrong result with some data layouts. The issue
is that they assume that if the skcipher_walk API gives 'nbytes' not
aligned to the walksize (a.k.a. walk.stride), then it is the end of the
data. In fact, this can happen before the end. Fix them.
Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD implementations") Cc: <stable@vger.kernel.org> # v4.18+ Cc: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some algorithms have a ->setkey() method that is not atomic, in the
sense that setting a key can fail after changes were already made to the
tfm context. In this case, if a key was already set the tfm can end up
in a state that corresponds to neither the old key nor the new key.
It's not feasible to make all ->setkey() methods atomic, especially ones
that have to key multiple sub-tfms. Therefore, make the crypto API set
CRYPTO_TFM_NEED_KEY if ->setkey() fails and the algorithm requires a
key, to prevent the tfm from being used until a new key is set.
Note: we can't set CRYPTO_TFM_NEED_KEY for OPTIONAL_KEY algorithms, so
->setkey() for those must nevertheless be atomic. That's fine for now
since only the crc32 and crc32c algorithms set OPTIONAL_KEY, and it's
not intended that OPTIONAL_KEY be used much.
[Cc stable mainly because when introducing the NEED_KEY flag I changed
AF_ALG to rely on it; and unlike in-kernel crypto API users, AF_ALG
previously didn't have this problem. So these "incompletely keyed"
states became theoretically accessible via AF_ALG -- though, the
opportunities for causing real mischief seem pretty limited.]
Fixes: 9fa68f620041 ("crypto: hash - prevent using keyed hashes without setting key") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SIMD routine ported from x86 used to have a special code path
for inputs < 16 bytes, which got lost somewhere along the way.
Instead, the current glue code aligns the input pointer to 16 bytes,
which is not really necessary on this architecture (although it
could be beneficial to performance to expose aligned data to the
the NEON routine), but this could result in inputs of less than
16 bytes to be passed in. This not only fails the new extended
tests that Eric has implemented, it also results in the code
reading past the end of the input, which could potentially result
in crashes when dealing with less than 16 bytes of input at the
end of a page which is followed by an unmapped page.
So update the glue code to only invoke the NEON routine if the
input is at least 16 bytes.
Reported-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Fixes: 6ef5737f3931 ("crypto: arm64/crct10dif - port x86 SSE implementation to arm64") Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The arm64 NEON bit-sliced implementation of AES-CTR fails the improved
skcipher tests because it sometimes produces the wrong ciphertext. The
bug is that the final keystream block isn't returned from the assembly
code when the number of non-final blocks is zero. This can happen if
the input data ends a few bytes after a page boundary. In this case the
last bytes get "encrypted" by XOR'ing them with uninitialized memory.
Fix the assembly code to return the final keystream block when needed.
Fixes: 88a3f582bea9 ("crypto: arm64/aes - don't use IV buffer to return final keystream block") Cc: <stable@vger.kernel.org> # v4.11+ Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SIMD routine ported from x86 used to have a special code path
for inputs < 16 bytes, which got lost somewhere along the way.
Instead, the current glue code aligns the input pointer to permit
the NEON routine to use special versions of the vld1 instructions
that assume 16 byte alignment, but this could result in inputs of
less than 16 bytes to be passed in. This not only fails the new
extended tests that Eric has implemented, it also results in the
code reading past the end of the input, which could potentially
result in crashes when dealing with less than 16 bytes of input
at the end of a page which is followed by an unmapped page.
So update the glue code to only invoke the NEON routine if the
input is at least 16 bytes.
Reported-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Fixes: 1d481f1cd892 ("crypto: arm/crct10dif - port x86 SSE implementation to ARM") Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>