On delalloc enabled file system on invalidatepage operation
in ext4_da_page_release_reservation() we want to clear the delayed
buffer and remove the extent covering the delayed buffer from the extent
status tree.
However currently there is a bug where on the systems with page size >
block size we will always remove extents from the start of the page
regardless where the actual delayed buffers are positioned in the page.
This leads to the errors like this:
EXT4-fs warning (device loop0): ext4_da_release_space:1225:
ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
blocks
This however can cause data loss on writeback time if the file system is
in ENOSPC condition because we're releasing reservation for someones
else delayed buffer.
Fix this by only removing extents that corresponds to the part of the
page we want to invalidate.
This problem is reproducible by the following fio receipt (however I was
only able to reproduce it with fio-2.1 or older.
sb_getblk() is used during ext4 (and possibly other FSes) writeback
paths. Sometimes such path require allocating memory and guaranteeing
that such allocation won't block. Currently, however, there is no way
to provide user flags for sb_getblk which could lead to deadlocks.
This patch implements a sb_getblk_gfp with the only difference it can
accept user-provided GFP flags.
Commit 8f4d8558391: "ext4: fix lazytime optimization" was not a
complete fix. In the case where the inode number is a multiple of 16,
and we could still end up updating an inode with dirty timestamps
written to the wrong inode on disk. Oops.
This can be easily reproduced by using generic/005 with a file system
with metadata_csum and lazytime enabled.
Newer versions of mount parse the lazytime feature and pass it to the
mount system call via the flags field in the mount system call,
removing the lazytime string from the mount options list. So we need
to check for the presence of MS_LAZYTIME and set it in sb->s_flags in
order for this flag to be set on a remount.
ext4 isn't willing to map clusters to a non-extent file. Don't signal
this with an out of space error, since the FS will retry the
allocation (which didn't fail) forever. Instead, return EUCLEAN so
that the operation will fail immediately all the way back to userspace.
(The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Normally all of the buffers will have been forced out to disk before
we call invalidate_bdev(), but there will be some cases, where a file
system operation was aborted due to an ext4_error(), where there may
still be some dirty buffers in the buffer cache for the device. So
try to force them out to memory before calling invalidate_bdev().
This fixes a warning triggered by generic/081:
WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f()
The commit cf108bca465d: "ext4: Invert the locking order of page_lock
and transaction start" caused __ext4_journalled_writepage() to drop
the page lock before the page was written back, as part of changing
the locking order to jbd2_journal_start -> page_lock. However, this
introduced a potential race if there was a truncate racing with the
data=journalled writeback mode.
Fix this by grabbing the page lock after starting the journal handle,
and then checking to see if page had gotten truncated out from under
us.
This fixes a number of different warnings or BUG_ON's when running
xfstests generic/086 in data=journalled mode, including:
By default all the sensors are runtime suspended state (lowest power
state). During Linux suspend process, all the run time suspended
devices are resumed and then suspended. This caused all sensors to
power up and introduced delay in suspend time, when we introduced
runtime PM for HID sensors. The opposite process happens during resume
process.
To fix this, we do powerup process of the sensors only when the request
is issued from user (raw or tiggerred). In this way when runtime,
resume calls for powerup it will simply return as this will not match
user requested state.
Note this is a regression fix as the increase in suspend / resume
times can be substantial (report of 8 seconds on Len's laptop!)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Len Brown <len.brown@intel.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Normally, low-level Comedi drivers set an `insn_bits` handler for
digital input (DI), digital output (DO) and digital input/output (DIO)
subdevice types to handle normal reading and writing of digital
channels. The "cb_pcimdas" driver currently has an `insn_read` handler
for the DI subdevice and an `insn_write` handler for the DO subdevice.
However, the actual handler functions `cb_pcimdas_di_insn_read()` and
`cb_pcimdas_do_insn_write()` are written to behave like `insn_bits`
handlers. Something's wrong there! To fix it, set the functions as
`insn_bits` handlers and rename them for consistency.
Fixes: e56d03dee14a ("staging: comedi: cb_pcimdas: add main connector digital input/output") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With an RTL8191SU USB adaptor, sometimes the hints for a fragmented
packet are set, but the packet length is too large. Allocate enough
space to prevent memory corruption and a resulting kernel panic [1].
struct ieee802154_addr_sa {
int addr_type;
u16 pan_id;
union {
u8 hwaddr[IEEE802154_ADDR_LEN];
u16 short_addr;
};
};
On most architectures there will be implicit structure padding here,
in two different places:
* In struct sockaddr_ieee802154, two bytes of padding between 'family'
(unsigned short) and 'addr', so that 'addr' starts on a four byte
boundary.
* In struct ieee802154_addr_sa, two bytes at the end of the structure,
to make the structure 16 bytes.
When calling recvmsg(2) on a PF_IEEE802154 SOCK_DGRAM socket, the
ieee802154 stack constructs a struct sockaddr_ieee802154 on the
kernel stack without clearing these padding fields, and, depending
on the addr_type, between four and ten bytes of uncleared kernel
stack will be copied to userspace.
We can't just insert two 'u16 __pad's in the right places and zero
those before copying an address to userspace, as not all architectures
insert this implicit padding -- from a quick test it seems that avr32,
cris and m68k don't insert this padding, while every other architecture
that I have cross compilers for does insert this padding.
The easiest way to plug the leak is to just memset the whole struct
sockaddr_ieee802154 before filling in the fields we want to fill in,
and that's what this patch does.
Several of these drivers have there TX randomly blocked for 3~5 seconds while
measuring tx throughput (iperf). The root couse happens in rtl_pci_flush().
The function uses a while-loop to wait for TX queue length to decrease to 0.
The TX queue length counts the number of packets that are queued in the driver.
The driver relys on the TX OK interrupt to return skb and reduce TX queue length.
The interrupt subroutine disables interupts, reads the interrupt registers, and
then clears the registers in the beginning of _rtl_pci_interrupt(). After all
interupts process are finished, the driver invokes enable_interrupt() to enable
interupts. This behavior is normal for an interrupt subroutine.
But enable_interrupt() invokes clear_interrupt() again. This unexpected interrupt
clearing may cleari me fresh TX OK interrupts. These missing interrupts cause TX
queue length to never reduce to 0i, which causes rtl_pci_flush() to be stuck in
unterminated while-loop.
This patch removes clear_interrupt() in enable_interrupt() to avoid this behavior.
Signed-off-by: Vincent Fann <vincent_fann@realtek.com> Signed-off-by: Shao Fu <shaofu@realtek.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In d8a2c51cdcae ('ath9k_htc: Use atomic operations for op_flags') we
changed things like this:
- if (priv->op_flags & OP_TSF_RESET) {
+ if (test_bit(OP_TSF_RESET, &priv->op_flags)) {
The problem is that test_bit() takes a bit number and not a mask. It
means that when we do:
set_bit(OP_TSF_RESET, &priv->op_flags);
Then it sets the (1 << 6) bit instead of the 6 bit so we are setting a
bit which is past the end of the unsigned long.
Fixes: d8a2c51cdcae ('ath9k_htc: Use atomic operations for op_flags') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AR93xx and newer needs to stop rx before tx to avoid getting the DMA
engine or MAC into a stuck state.
This should reduce/fix the occurence of "Failed to stop Tx DMA" logspam.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 1c8ba6d013 moved around the setup code for broadcomm chips,
and also added btbcm_read_verbose_config() to read extra information
about the hardware. It's returning errors on some macbooks:
Bluetooth: hci0: BCM: Read verbose config info failed (-16)
Which makes us error out of the setup function. Since this
probe isn't critical to operate the chip, this patch just changes
things to carry on when it fails.
This patch fixes the command length alignment issue for Intel Bluetooth
8260.
The length of parameters in the firmware downloading command must be
multiplication of 4. If not, the command must append Intel_NOP command
with extra parameters, zeros, at the end, and the firmware file is
already included Intel_NOP command for alignment.
This patch checks the next command and if the next command is Intel_NOP
command, it reads the Intel_NOP command and send them together.
For example, if the data from the firmware file looks like this:
8E FC 03 11 22 33 02 FC 03 00 00 00
Previously, btusb sends two commands:
09 FC 06 8E FC 03 11 22 33
09 FC 06 02 FC 03 00 00 00
This won't work because the length of parameters are 6 which violates
the 4 byte alignment.
This patch will append them together and send as one command:
09 FC 0C 8E FC 03 11 22 33 02 FC 03 00 00 00
Based on previous work from Tedd Ho-Jeong An <tedd.an@intel.com>
Reported-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During the initial setup stage of a controller, the low-level transport
is actually active. This means that HCI_UP is true. To avoid toggling
the transport off and back on again for normal operation the kernel
holds a grace period with HCI_AUTO_OFF that will turn the low-level
transport off in case no user is present.
The idea of the grace period is important to avoid having to initialize
all of the controller twice. So legacy ioctl and the new management
interface knows how to clear this grace period and then start normal
operation.
For the user channel operation this grace period has not been taken into
account which results in the problem that HCI_UP and HCI_AUTO_OFF are
set and the kernel will return EBUSY. However from a system point of
view the controller is ready to be grabbed by either the ioctl, the
management interface or the user channel.
This patch brings the user channel to the same level as the other two
entries for operating a controller.
It is possible to disable the clock selection at configuration time,
but for ColdFire targets we always expect a clock frequency to be
selected. This results in the following compile time error:
CC arch/m68k/kernel/asm-offsets.s
In file included from ./arch/m68k/include/asm/timex.h:14:0,
from include/linux/timex.h:65,
from include/linux/sched.h:19,
from arch/m68k/kernel/asm-offsets.c:14:
./arch/m68k/include/asm/coldfire.h:25:2: error: #error "Don't know what your ColdFire CPU clock frequency is??"
Remove CONFIG_CLOCK_SELECT completely and always enable CONFIG_CLOCK_FREQ
for ColdFire.
It would be nice if we could support multiple ColdFire SoC types in a
single binary - but currently the code simply does not support it.
Change the SoC selection config options to be a choice instead of
individual selectable entries.
This fixes problems with building allnoconfig, and means that a sane
linux kernel is generated for a single ColdFire SoC type.
kernel/uid16.c: In function 'SYSC_setgroups16':
kernel/uid16.c:184:2: error: implicit declaration of function 'groups_alloc'
kernel/uid16.c:184:13: warning: assignment makes pointer from integer without a cast
openrisc shouldn't be setting CONFIG_UID16 when CONFIG_MULTIUSER=n.
Fixes: 2813893f8b197a1 ("kernel: conditionally support non-root users, groups and capabilities") Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The latest version of the Armada XP datasheet no longer documents the
VDD cpu_pd functions, which might indicate they are not working and/or
not supported. This commit ensures the pinctrl driver matches the
datasheet.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: 463e270f766a ("pinctrl: mvebu: add pinctrl driver for Armada XP") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After updating to a more recent version of the Armada XP datasheet, we
realized that some of the pins documented as having a NAND-related
functionality in fact did not have such functionality. This commit
updates the pinctrl driver accordingly.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: 463e270f766a ("pinctrl: mvebu: add pinctrl driver for Armada XP") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pinctrl_gpio_range[] array described a first bank of 32 GPIOs and
a second one of 27 GPIOs. However, since there is a total of 60 MPP
pins that can be muxed as GPIOs, the second bank really has 28 GPIOs.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: ee086577abe7f ("pinctrl: mvebu: add pinctrl driver for Marvell Armada 39x") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pinctrl_gpio_range[] array described a first bank of 32 GPIOs and
a second one of 27 GPIOs. However, since there is a total of 60 MPP
pins that can be muxed as GPIOs, the second bank really has 28 GPIOs.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: ca6d9a084b56f ("pinctrl: mvebu: add pin-muxing driver for the Marvell Armada 380/385") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A new revision of the Marvell Armada 38x hardware datasheet unveiled
that the definition of some of the PCIe functions were not
correct. This commit fixes the pinctrl driver accordingly.
Some PCIe functions simply do not exist, some of the PCIe functions in
fact were corresponding to other functions, and some PCIe functions
have been added.
Note: the seemingly unrelated removal of spi(cs2) on MPP47 is related:
this function is in fact implemented on MPP43, instead of a PCIe
function.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: ca6d9a084b56f ("pinctrl: mvebu: add pin-muxing driver for the Marvell Armada 380/385") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After updating to a more recent version of the Armada 375, we realized
that some of the pins documented as having a NAND-related
functionality in fact did not have such functionality. This commit
updates the pinctrl driver accordingly.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: ce3ed59dcddd ("pinctrl: mvebu: add pin-muxing driver for the Marvell Armada 375") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The i2c_master_recv() uses readsize to receive data from i2c but compares
to size of rdbuf which is always 27. This would cause problem when the
max_fingers is not 5. Change the comparison value to readsize instead.
Fixes: 36874c7e219 ("Input: pixcir_i2c_ts - support up to 5 fingers and
hardware tracking IDs:)
41f8bba7f555 ("of/pci: Add pci_register_io_range() and
pci_pio_to_address()") added support for systems with several I/O ranges
described by OF bindings. It modified pci_address_to_pio() look up the
io_range for a given CPU physical address, but the conversion was wrong.
The commit referenced below deferred waiting for command completion until
the start of the next command, allowing hardware to do the latching
asynchronously. Unfortunately, being ready to accept a new command is the
only indication we have that the previous command is completed. In cases
where we need that state change to be enabled, we must still wait for
completion. For instance, pciehp_reset_slot() attempts to disable anything
that might generate a surprise hotplug on slots that support presence
detection. If we don't wait for those settings to latch before the
secondary bus reset, we negate any value in attempting to prevent the
spurious hotplug.
Create a base function with optional wait and helper functions so that
pcie_write_cmd() turns back into the "safe" interface which waits before
and after issuing a command and add pcie_write_cmd_nowait(), which
eliminates the trailing wait for asynchronous completion. The following
functions are returned to their previous behavior:
The rationale is that pciehp_power_on_slot() enables the link and therefore
relies on completion of power-on. pciehp_power_off_slot() and
pcie_disable_notification() need a wait because data structures may be
freed after these calls and continued signaling from the device would be
unexpected. And, of course, pciehp_reset_slot() needs to wait for the
scenario outlined above.
The problem is that sparc64 assumed that dma_addr_t only needed to hold DMA
addresses, i.e., bus addresses returned via the DMA API (dma_map_single(),
etc.), while the PCI core assumed dma_addr_t could hold *any* bus address,
including raw BAR values. On sparc64, all DMA addresses fit in 32 bits, so
dma_addr_t is a 32-bit type. However, BAR values can be 64 bits wide, so
they don't fit in a dma_addr_t. d63e2e1f3df9 added new checking that
tripped over this mismatch.
Add pci_bus_addr_t, which is wide enough to hold any PCI bus address,
including both raw BAR values and DMA addresses. This will be 64 bits
on 64-bit platforms and on platforms with a 64-bit dma_addr_t. Then
dma_addr_t only needs to be wide enough to hold addresses from the DMA API.
Refine the mechanism introduced by commit f244d8b623da ("ACPIPHP / radeon /
nouveau: Fix VGA switcheroo problem related to hotplug") to propagate the
ignore_hotplug setting of the device to its parent bridge in case hotplug
notifications related to the graphics adapter switching are given for the
bridge rather than for the device itself (they need to be ignored in both
cases).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=61891 Link: https://bugs.freedesktop.org/show_bug.cgi?id=88927 Fixes: b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") Reported-and-tested-by: tiagdtd-lava <tiagdtd-lava@yahoo.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit bd31b85960a7 (which is in 3.2-rc1) nw_gpio_lock is a raw spinlock
that needs usage of the corresponding raw functions.
This fixes:
drivers/mtd/maps/dc21285.c: In function 'nw_en_write':
drivers/mtd/maps/dc21285.c:41:340: warning: passing argument 1 of 'spinlock_check' from incompatible pointer type
spin_lock_irqsave(&nw_gpio_lock, flags);
In file included from include/linux/seqlock.h:35:0,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/mtd/maps/dc21285.c:8:
include/linux/spinlock.h:299:102: note: expected 'struct spinlock_t *' but argument is of type 'struct raw_spinlock_t *'
static inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
^
drivers/mtd/maps/dc21285.c:43:25: warning: passing argument 1 of 'spin_unlock_irqrestore' from incompatible pointer type
spin_unlock_irqrestore(&nw_gpio_lock, flags);
^
In file included from include/linux/seqlock.h:35:0,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/mtd/maps/dc21285.c:8:
include/linux/spinlock.h:370:91: note: expected 'struct spinlock_t *' but argument is of type 'struct raw_spinlock_t *'
static inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
Fixed by taking the mutex in blktrans_open and blktrans_release.
Note that this locking is already suggested in
include/linux/mtd/blktrans.h:
struct mtd_blktrans_ops {
...
/* Called with mtd_table_mutex held; no race with add/remove */
int (*open)(struct mtd_blktrans_dev *dev);
void (*release)(struct mtd_blktrans_dev *dev);
...
};
But we weren't following it.
Originally reported by (and patched by) Zhang and Giuseppe,
independently. Improved and rewritten.
Reported-by: Zhang Xingcai <zhangxingcai@huawei.com> Reported-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com> Tested-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com> Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Setting a dev_pm_ops suspend/resume pair of callbacks but not a set of
hibernation callbacks means those pm functions will not be
called upon hibernation - that leads to system crash on ARM during
freezing if gpio-led is used in combination with CPU led trigger.
It may happen after freeze_noirq stage (GPIO is suspended)
and before syscore_suspend stage (CPU led trigger is suspended)
- usually when disable_nonboot_cpus() is called.
Log:
PM: noirq freeze of devices complete after 1.425 msecs
Disabling non-boot CPUs ...
^ system may crash or stuck here with message (TI AM572x)
WARNING: CPU: 0 PID: 3100 at drivers/bus/omap_l3_noc.c:148 l3_interrupt_handler+0x22c/0x370() 44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4_PER1_P3 (Idle): Data Access in Supervisor mode during Functional access
CPU1: shutdown
^ or here
Fix this by using SIMPLE_DEV_PM_OPS, which appropriately
assigns the suspend and hibernation callbacks and move
led_suspend/led_resume under CONFIG_PM_SLEEP to avoid
build warnings.
Fixes: 73e1ab41a80d (leds: Convert led class driver from legacy pm ops to dev_pm_ops) Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org> Acked-by: Jacek Anaszewski <j.anaszewski@samsung.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The LCDIF engines embedded in i.MX6sl and i.MX6sx SoCs need the axi clock
as the engine's system clock. The clock should be enabled when accessing
LCDIF registers, otherwise the kernel would hang up. We should also keep
the clock enabled when the engine is being active to scan out frames from
memory. This patch makes sure the axi clock is enabled when accessing
registers so that the kernel hang up issue can be fixed.
Reported-by: Peter Chen <peter.chen@freescale.com> Tested-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Liu Ying <Ying.Liu@freescale.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
request_any_context_irq() returns a negative value on failure.
It returns either IRQC_IS_HARDIRQ or IRQC_IS_NESTED on success.
So fix testing return value of request_any_context_irq().
Also fixup the return value of devm_request_any_context_irq() to make it
consistent with request_any_context_irq().
Fixes: 0668d3065128 ("genirq: Add devm_request_any_context_irq()") Signed-off-by: Axel Lin <axel.lin@ingics.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1431334978.17783.4.camel@ingics.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Although it is possible to let SRP I/O continue if a reconnect
results in a reduction of the number of channels, the current
code does not handle this scenario correctly. Instead of making
the reconnect code more complex, consider this as a reconnection
failure.
Reception of a DREQ message only causes the state of a single
channel to change. Hence move the 'connected' member variable
from the target to the channel data structure. This patch
avoids that following false positive warning can be reported
by srp_destroy_qp():
Fix a scsi_get_host() / scsi_host_put() imbalance in the error
path of srp_create_target(). See also patch "IB/srp: Avoid that
I/O hangs due to a cable pull during LUN scanning" (commit ID 34aa654ecb8e).
Avoid that srp_terminate_io() can get invoked while srp_queuecommand()
is in progress. This patch avoids that an I/O timeout can trigger the
following kernel warning:
Introduce the helper function srp_wait_for_queuecommand().
Move the definition of scsi_request_fn_active(). Add a comment
above srp_wait_for_queuecommand() that support for scsi-mq needs
to be added.
This patch does not change any functionality. A second call to
srp_wait_for_queuecommand() will be introduced in the next patch.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: James Bottomley <JBottomley@Odin.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 39a6ac11df65 ("spi/pl022: Devicetree support w/o platform data")
the 'num-cs' parameter cannot be passed through platform data when probing
with devicetree. Instead, it's a required devicetree property.
Fix the binding documentation so the property is properly specified.
Fixes: 39a6ac11df65 ("spi/pl022: Devicetree support w/o platform data") Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit df59fa7f4bca "spi: orion: support armada extended baud
rates" was too optimistic for the maximum baud rate that the Armada
SoCs can support. According to the hardware datasheet the maximum
frequency supported by the Armada 370 SoC is tclk/4. But for the
Armada XP, Armada 38x and Armada 39x SoCs the limitation is 50MHz and
for the Armada 375 it is tclk/15.
Currently the armada-370-spi compatible is only used by the Armada 370
and the Armada XP device tree. On Armada 370, tclk cannot be higher
than 200MHz. In order to be able to handle both SoCs, we can take the
minimum of 50MHz and tclk/4.
A proper solution is adding a compatible string for each SoC, but it
can't be done as a fix for compatibility reason (we can't modify
device tree that have been already released) and it will be part of a
separate patch.
Fixes: df59fa7f4bca (spi: orion: support armada extended baud rates) Reported-by: Kostya Porotchkin <kostap@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a race (with some kernel configurations) where a queued
master->pump_messages runs and frees dummy_tx/rx before
spi_unmap_msg is running (or is finished).
This results in the following messages:
BUG: Bad page state in process
page:db7ba030 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x200(arch_1)
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
...
Reported-by: Noralf Trønnes <noralf@tronnes.org> Suggested-by: Noralf Trønnes <noralf@tronnes.org> Tested-by: Noralf Trønnes <noralf@tronnes.org> Signed-off-by: Martin Sperl <kernel@martin.sperl.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The list of loaded modules is walked through in
module_kallsyms_on_each_symbol (called by kallsyms_on_each_symbol). The
module_mutex lock should be acquired to prevent potential corruptions
in the list.
This was uncovered with new lockdep asserts in module code introduced by
the commit 0be964be0d45 ("module: Sanitize RCU usage and locking") in
recent next- trees.
The buffer for condtraints debug isn't big enough to hold the output
in all cases. So fix this issue by increasing the buffer.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The way the mask is generated in regmap_field_init() is wrong.
Indeed, a field initialized with msb = 31 and lsb = 0 provokes a shift
overflow while calculating the mask field.
On some 32 bits architectures, such as x86, the generated mask is 0,
instead of the expected 0xffffffff.
This patch uses GENMASK() to fix the problem, as this macro is already safe
regarding shift overflow.
Signed-off-by: Maxime Coquelin <maxime.coquelin@st.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In big endian mode regmap_bulk_read gives incorrect data
for byte reads.
This is because memcpy of a single byte from an address
after full word read gives different results when
endianness differs. ie. we get little-end in LE and big-end in BE.
Signed-off-by: Arun Chandran <achandran@mvista.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 077fcf116c8c ("mm/thp: allocate transparent hugepages on
local node"), we handle THP allocations on page fault in a special way -
for non-interleave memory policies, the allocation is only attempted on
the node local to the current CPU, if the policy's nodemask allows the
node.
This is motivated by the assumption that THP benefits cannot offset the
cost of remote accesses, so it's better to fallback to base pages on the
local node (which might still be available, while huge pages are not due
to fragmentation) than to allocate huge pages on a remote node.
The nodemask check prevents us from violating e.g. MPOL_BIND policies
where the local node is not among the allowed nodes. However, the
current implementation can still give surprising results for the
MPOL_PREFERRED policy when the preferred node is different than the
current CPU's local node.
In such case we should honor the preferred node and not use the local
node, which is what this patch does. If hugepage allocation on the
preferred node fails, we fall back to base pages and don't try other
nodes, with the same motivation as is done for the local node hugepage
allocations. The patch also moves the MPOL_INTERLEAVE check around to
simplify the hugepage specific test.
The difference can be demonstrated using in-tree transhuge-stress test
on the following 2-node machine where half memory on one node was
occupied to show the difference.
Number of successful THP allocations corresponds to free memory on node 0 in
the first case and node 1 in the second case, i.e. -p parameter is ignored and
cpu binding "wins".
After the patch:
> numactl -p0 -C0 ./transhuge-stress
transhuge-stress: 2.183 s/loop, 0.274 ms/page, 7295.516 MiB/s 7962 succeed, 0 failed, 1760 different pages
Beginning at commit d52d3997f843 ("ipv6: Create percpu rt6_info"), the
following INFO splat is logged:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc7-next-20150612 #1 Not tainted
-------------------------------
kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by systemd/1:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
#1: (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
#2: (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
stack backtrace:
CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20 04/17/2014
Call Trace:
dump_stack+0x4c/0x6e
lockdep_rcu_suspicious+0xe7/0x120
___might_sleep+0x1d5/0x1f0
__might_sleep+0x4d/0x90
kmem_cache_alloc+0x47/0x250
create_object+0x39/0x2e0
kmemleak_alloc_percpu+0x61/0xe0
pcpu_alloc+0x370/0x630
Additional backtrace lines are truncated. In addition, the above splat
is followed by several "BUG: sleeping function called from invalid
context at mm/slub.c:1268" outputs. As suggested by Martin KaFai Lau,
these are the clue to the fix. Routine kmemleak_alloc_percpu() always
uses GFP_KERNEL for its allocations, whereas it should follow the gfp
from its callers.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kmemleak scanning thread can run for minutes. Callbacks like
kmemleak_free() are allowed during this time, the race being taken care
of by the object->lock spinlock. Such lock also prevents a memory block
from being freed or unmapped while it is being scanned by blocking the
kmemleak_free() -> ... -> __delete_object() function until the lock is
released in scan_object().
When a kmemleak error occurs (e.g. it fails to allocate its metadata),
kmemleak_enabled is set and __delete_object() is no longer called on
freed objects. If kmemleak_scan is running at the same time,
kmemleak_free() no longer waits for the object scanning to complete,
allowing the corresponding memory block to be freed or unmapped (in the
case of vfree()). This leads to kmemleak_scan potentially triggering a
page fault.
This patch separates the kmemleak_free() enabling/disabling from the
overall kmemleak_enabled nob so that we can defer the disabling of the
object freeing tracking until the scanning thread completed. The
kmemleak_free_part() is deliberately ignored by this patch since this is
only called during boot before the scanning thread started.
When building the kernel with a bare-metal (ELF) toolchain, the -shared
option may not be passed down to collect2, resulting in silent corruption
of the vDSO image (in particular, the DYNAMIC section is omitted).
The effect of this corruption is that the dynamic linker fails to find
the vDSO symbols and libc is instead used for the syscalls that we
intended to optimise (e.g. gettimeofday). Functionally, there is no
issue as the sigreturn trampoline is still intact and located by the
kernel.
This patch fixes the problem by explicitly passing -shared to the linker
when building the vDSO.
Reported-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Reported-by: James Greenlaigh <james.greenhalgh@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The memmap freeing code in free_unused_memmap() computes the end of
each memblock by adding the memblock size onto the base. However,
if SPARSEMEM is enabled then the value (start) used for the base
may already have been rounded downwards to work out which memmap
entries to free after the previous memblock.
This may cause memmap entries that are in use to get freed.
In general, you're not likely to hit this problem unless there
are at least 2 memblocks and one of them is not aligned to a
sparsemem section boundary. Note that carve-outs can increase
the number of memblocks by splitting the regions listed in the
device tree.
This problem doesn't occur with SPARSEMEM_VMEMMAP, because the
vmemmap code deals with freeing the unused regions of the memmap
instead of requiring the arch code to do it.
This patch gets the memblock base out of the memblock directly when
computing the block end address to ensure the correct value is used.
Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6c81fe7925cc4c42 ("arm64: enable context tracking") did not
update el0_sp_pc to use ct_user_exit, but this appears to have been
unintentional. In commit 6ab6463aeb5fbc75 ("arm64: adjust el0_sync so
that a function can be called") we made x0 available, and in the return
to userspace we call ct_user_enter in the kernel_exit macro.
Due to this, we currently don't correctly inform RCU of the user->kernel
transition, and may erroneously account for time spent in the kernel as
if we were in an extended quiescent state when CONFIG_CONTEXT_TRACKING
is enabled.
As we do record the kernel->user transition, a userspace application
making accesses from an unaligned stack pointer can demonstrate the
imbalance, provoking the following warning:
After secondary CPU boot or hotplug, the active_mm of the idle thread is
&init_mm. The init_mm.pgd (swapper_pg_dir) is only meant for TTBR1_EL1
and must not be set in TTBR0_EL1. Since when active_mm == &init_mm the
TTBR0_EL1 is already set to the reserved value, there is no need to
perform any context reset.
HW has to be in known state before the initialisation
sequence is started. The polling step for settling aliveness
was set to 200ms while in practise this can be done in up to 30msecs.
Fix the hbm power gating state machine so it will wait till it receives
confirmation interrupt for the PG_ISOLATION_EXIT message.
In process of the suspend flow the devices first have to exit from the
power gating state (runtime pm resume).
If we do not handle the confirmation interrupt after sending
PG_ISOLATION_EXIT message, we may receive it already after the suspend
flow has changed the device state and interrupt will be interpreted as a
spurious event, consequently link reset will be invoked which will
prevent the device from completing the suspend flow
kernel: [6603] mei_reset:136: mei_me 0000:00:16.0: powering down: end of reset
kernel: [476] mei_me_irq_thread_handler:643: mei_me 0000:00:16.0: function called after ISR to handle the interrupt processing.
kernel: mei_me 0000:00:16.0: FW not ready: resetting
Don't call the power_supply_changed() from power_supply_register() when
parent is still probing because it may lead to accessing parent too
early.
In bq27x00_battery this caused NULL pointer exception because uevent of
power_supply_changed called back the the get_property() method provided
by the driver. The get_property() method accessed pointer which should
be returned by power_supply_register().
Starting from bq27x00_battery_probe():
di->bat = power_supply_register()
power_supply_changed()
kobject_uevent()
power_supply_uevent()
power_supply_show_property()
power_supply_get_property()
bq27x00_battery_get_property()
dereference of di->bat which is NULL here
The dereference of di->bat (value returned by power_supply_register())
is the currently visible problem. However calling back the methods
provided by driver before ending the probe may lead to accessing other
driver-related data which is not yet initialized.
The call to power_supply_changed() is postponed till probing ends -
mutex of parent device is released.
Reported-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: 297d716f6260 ("power_supply: Change ownership from driver to core") Tested-By: Dr. H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Sebastian Reichel <sre@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Power supply is often registered during probe of a driver. The
power_supply_register() returns pointer to newly allocated structure as
return value. However before returning the power_supply_register()
calls back the get_property() method provided by the driver through
uevent.
In that time the driver probe is still in progress and driver did not
assigned pointer to power supply to its local variables. This leads to
NULL pointer dereference from get_property() function.
Starting from bq27x00_battery_probe():
di->bat = power_supply_register()
device_add()
kobject_uevent()
power_supply_uevent()
power_supply_show_property()
power_supply_get_property()
bq27x00_battery_get_property()
dereference of (di->bat) which is NULL here
The first uevent of power supply (the one coming from device creation)
should not call back to the driver. To prevent that from happening,
increment the atomic use counter at the end of power_supply_register().
This means that power_supply_get_property() will return -ENODEV.
IMPORTANT:
The patch has impact on this first uevent sent from power supply because
it will not contain properties from power supply.
The uevent with properties will be sent later after indicating that
power supply has changed. This also has a race now, but will be fixed in
other patches.
Reported-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: 297d716f6260 ("power_supply: Change ownership from driver to core") Tested-By: Dr. H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Sebastian Reichel <sre@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
static arc_pmu in the arch/arc/kernel/perf_event.c is not initialized as
it's shadowed by a local variable of the same name in the
arc_pmu_device_probe.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Fixes: 03c94fcf954d "ARC: perf: make @arc_pmu static global" Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers
Since ARCv2 only provides load/load, store/store and all/all, we need
the full barrier
- LLOCK/SCOND based atomics, bitops, cmpxchg, which return modified
values were lacking the explicit smp barriers.
- Non LLOCK/SCOND varaints don't need the explicit barriers since that
is implicity provided by the spin locks used to implement the
critical section (the spin lock barriers in turn are also fixed in
this commit as explained above
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make 3.81 doesn't have the 'undefine' command. Using undefine
to clear LDFLAGS fails when make version 3.81 is used. Fix it
to use override to clear LDFLAGS.
Commit b9a5e5e18fbf "ACPI / init: Fix the ordering of
acpi_reserve_resources()" overlooked the fact that the memory
and/or I/O regions reserved by acpi_reserve_resources() may
conflict with those reserved by the PNP "system" driver.
If that conflict actually takes place, it causes the reservations
made by the "system" driver to fail while before commit b9a5e5e18fbf
all reservations made by it and by acpi_reserve_resources() would be
successful. In turn, that allows the resources that haven't been
reserved by the "system" driver to be used by others (e.g. PCI) which
sometimes leads to functional problems (up to and including boot
failures).
To fix that issue, introduce a common resource reservation routine,
acpi_reserve_region(), to be used by both acpi_reserve_resources()
and the "system" driver, that will track all resources reserved by
it and avoid making conflicting requests.
Add missing invocation of pm_generic_complete() to
acpi_subsys_complete() to allow ->complete callbacks provided
by the drivers of devices using the ACPI PM domain to be executed
during system resume.
Fixes: f25c0ae2b4c4 (ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 73f7d1ca3263 "ACPI / init: Run acpi_early_init() before
timekeeping_init()" moved the ACPI subsystem initialization,
including the ACPI mode enabling, to an earlier point in the
initialization sequence, to allow the timekeeping subsystem
use ACPI early. Unfortunately, that resulted in boot regressions
on some systems and the early ACPI initialization was moved toward
its original position in the kernel initialization code by commit c4e1acbb35e4 "ACPI / init: Invoke early ACPI initialization later".
However, that turns out to be insufficient, as boot is still broken
on the Tyan S8812 mainboard.
To fix that issue, split the ACPI early initialization code into
two pieces so the majority of it still located in acpi_early_init()
and the part switching over the platform into the ACPI mode goes into
a new function, acpi_subsystem_init(), executed at the original early
ACPI initialization spot.
That fixes the Tyan S8812 boot problem, but still allows ACPI
tables to be loaded earlier which is useful to the EFI code in
efi_enter_virtual_mode().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=97141 Fixes: 73f7d1ca3263 "ACPI / init: Run acpi_early_init() before timekeeping_init()" Reported-and-tested-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fujitsu Lifebook E780 sets the sequence number 0x0f to only only of
the two headphones, thus the driver tries to assign another as the
line-out, and this results in the inconsistent mapping between the
created jack ctl and the actual I/O. Due to this, PulseAudio doesn't
handle it properly and gets the silent output.
Acer Aspire V5 with ALC282 codec needs the similar quirk like Dell
laptops to support the headset mic. The headset mic pin is 0x19 and
it's not exposed by BIOS, thus we need to fix the pincfg as well.
The widget power-save that was enabled in 4.1 kernel seems resulting
in the silent output on VIA codecs by some reason. Some widgets get
wrong power states.
As a quick fix, turn this flag off while keeping power_down_unused
flag. This will bring back to the state of 4.0.x.
Fixes: 688b12cc3ca8 ('ALSA: hda - Use the new power control for VIA codecs') Reported-and-tested-by: Harald Dunkel <harri@afaics.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thinkpad X250, when attached to a dock, has two headphone outs but
no line out. Make sure we don't try to turn this into one headphone
and one line out (since that disables the headphone amp on the dock).
The pcm_class sysfs of each PCM substream gives only "none" since the
recent code change to embed the struct device. Fix the code to point
directly to the embedded device object properly.
Disable write buffering on the Toshiba ToPIC95 if it is enabled by
somebody (it is not supposed to be a power-on default according to
the datasheet). On the ToPIC95, practically no 32-bit Cardbus card
will work under heavy load without locking up the whole system if
this is left enabled. I tried about a dozen. It does not affect
16-bit cards. This is similar to the O2 bugs in early controller
revisions it seems.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=55961 Signed-off-by: Ryan C. Underwood <nemesis@icequake.net> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Increase the default init stage change timeout from 15 seconds to 30 seconds.
This resolves issues we have seen with some adapters not transitioning
to the first init stage within 15 seconds, which results in adapter
initialization failures.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If, at the time __rcu_process_callbacks() is invoked, there are callbacks
in Tiny RCU's callback list, but none of them are ready to be invoked,
the current list-management code will knit the non-ready callbacks out
of the list. This can result in hangs and possibly worse. This commit
therefore inserts a check for there being no callbacks that can be
invoked immediately.
This bug is unlikely to occur -- you have to get a new callback between
the time rcu_sched_qs() or rcu_bh_qs() was called, but before we get to
__rcu_process_callbacks(). It was detected by the addition of RCU-bh
testing to rcutorture, which in turn was instigated by Iftekhar Ahmed's
mutation testing. Although this bug was made much more likely by 915e8a4fe45e (rcu: Remove fastpath from __rcu_process_callbacks()), this
did not cause the bug, but rather made it much more probable. That
said, it takes more than 40 hours of rcutorture testing, on average,
for this bug to appear, so this fix cannot be considered an emergency.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If an interrupt controller doesn't support wake-up configuration,
irq_set_irq_wake() returns an error code. Then any subsequent call
trying to deconfigure wake-up will cause an imbalance, and a warning
will be printed:
WARNING: CPU: 1 PID: 1341 at kernel/irq/manage.c:540 irq_set_irq_wake+0x9c/0xf8()
Unbalanced IRQ 26 wake disable
To fix this, refrain from any further parent interrupt controller
(de)configuration if irq_set_irq_wake() failed.
Alternative fixes would be:
- calling "gic_set_irqchip_flags(IRQCHIP_SKIP_SET_WAKE)" from the
platform code,
- setting "gic_chip.flags = IRQCHIP_SKIP_SET_WAKE" in the GIC driver
code,
but these were withheld as the GIC hardware doesn't really support
wake-up interrupts.
The CrystalCove GPIO irqchip doesn't have irq_set_wake callback defined
so we should set IRQCHIP_SKIP_SET_WAKE for it or it would cause an irq
desc's wake_depth unbalanced warning during system resume phase from the
gpio_keys driver, which is the driver for the power button of the ASUS
T100 laptop.
Ignore an existing mount if the locked readonly, nodev or atime
attributes are less permissive than the desired attributes
of the new mount.
On success ensure the new mount locks all of the same readonly, nodev and
atime attributes as the old mount.
The nosuid and noexec attributes are not checked here as this change
is destined for stable and enforcing those attributes causes a
regression in lxc and libvirt-lxc where those applications will not
start and there are no known executables on sysfs or proc and no known
way to create exectuables without code modifications
Fixes: e51db73532955 ("userns: Better restrictions on when proc and sysfs can be mounted") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fresh mounts of proc and sysfs are a very special case that works very
much like a bind mount. Unfortunately the current structure can not
preserve the MNT_LOCK... mount flags. Therefore refactor the logic
into a form that can be modified to preserve those lock bits.
Add a new filesystem flag FS_USERNS_VISIBLE that requires some mount
of the filesystem be fully visible in the current mount namespace,
before the filesystem may be mounted.
Move the logic for calling fs_fully_visible from proc and sysfs into
fs/namespace.c where it has greater access to mount namespace state.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fs_fully_visible attempts to make fresh mounts of proc and sysfs give
the mounter no more access to proc and sysfs than if they could have
by creating a bind mount. One aspect of proc and sysfs that makes
this particularly tricky is that there are other filesystems that
typically mount on top of proc and sysfs. As those filesystems are
mounted on empty directories in practice it is safe to ignore them.
However testing to ensure filesystems are mounted on empty directories
has not been something the in kernel data structures have supported so
the current test for an empty directory which checks to see
if nlink <= 2 is a bit lacking.
proc and sysfs have recently been modified to use the new empty_dir
infrastructure to create all of their dedicated mount points. Instead
of testing for S_ISDIR(inode->i_mode) && i_nlink <= 2 to see if a
directory is empty, test for is_empty_dir_inode(inode). That small
change guaranteess mounts found on proc and sysfs really are safe to
ignore, because the directories are not only empty but nothing can
ever be added to them. This guarantees there is nothing to worry
about when mounting proc and sysfs.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add two functions sysfs_create_mount_point and
sysfs_remove_mount_point that hang a permanently empty directory off
of a kobject or remove a permanently emptpy directory hanging from a
kobject. Export these new functions so modular filesystems can use
them.