Newer version of HSDK aka HSDK-4xD (with dual issue HS48x4 CPU) wired up
the perf interrupt, so enable that in DT.
This is OK for old HSDK where this irq is ignored because pct irq is not
wired up in hardware.
Currently we allocate rx buffers in a single contiguous buffers for
headers (iser and iscsi) and data trailer. This means that most likely the
data starting offset is aligned to 76 bytes (size of both headers).
This worked fine for years, but at some point this broke, resulting in
data corruptions in isert when a command comes with immediate data and the
underlying backend device assumes 512 bytes buffer alignment.
We assume a hard-requirement for all direct I/O buffers to be 512 bytes
aligned. To fix this, we should avoid passing unaligned buffers for I/O.
Instead, we allocate our recv buffers with some extra space such that we
can have the data portion align to 512 byte boundary. This also means that
we cannot reference headers or data using structure but rather
accessors (as they may move based on alignment). Also, get rid of the
wrong __packed annotation from iser_rx_desc as this has only harmful
effects (not aligned to anything).
This affects the rx descriptors for iscsi login and data plane.
Fixes: 3d75ca0adef4 ("block: introduce multi-page bvec helpers") Link: https://lore.kernel.org/r/20200904195039.31687-1-sagi@grimberg.me Reported-by: Stephen Rust <srust@blockbridge.com> Tested-by: Doug Dumitru <doug@dumitru.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If we hit the UINT_MAX limit of bio->bi_iter.bi_size and so we are anyway
not merging this page in this bio, then it make sense to make same_page
also as false before returning.
Without this patch, we hit below WARNING in iomap.
This mostly happens with very large memory system and / or after tweaking
vm dirty threshold params to delay writeback of dirty data.
Right now we are failing requests based on the controller state (which
is checked inline in nvmf_check_ready) however we should definitely
accept requests if the queue is live.
When entering controller reset, we transition the controller into
NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
requests (have blk_noretry_request set).
This is also the case for NVME_REQ_USER for the wrong reason. There
shouldn't be any reason for us to reject this I/O in a controller reset.
We do want to prevent passthru commands on the admin queue because we
need the controller to fully initialize first before we let user passthru
admin commands to be issued.
In a non-mpath setup, this means that the requests will simply be
requeued over and over forever not allowing the q_usage_counter to drop
its final reference, causing controller reset to hang if running
concurrently with heavy I/O.
Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready") Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The 'spi_stm32 44004000.spi: Communication suspended' message means that
when using PIO, the kernel did not read the FIFO fast enough and so the
SPI controller paused the transfer. Currently, this is printed on every
single such event, so if the kernel is busy and the controller is pausing
the transfers often, the kernel will be all the more busy scrolling this
message into the log buffer every few milliseconds. That is not helpful.
Instead, rate-limit the message and print it every once in a while. It is
not possible to use the default dev_warn_ratelimited(), because that is
still too verbose, as it prints 10 lines (DEFAULT_RATELIMIT_BURST) every
5 seconds (DEFAULT_RATELIMIT_INTERVAL). The policy here is to print 1 line
every 50 seconds (DEFAULT_RATELIMIT_INTERVAL * 10), because 1 line is more
than enough and the cycles saved on printing are better left to the CPU to
handle the SPI. However, dev_warn_once() is also not useful, as the user
should be aware that this condition is possibly recurring or ongoing. Thus
the custom rate-limit policy.
Finally, turn the message from dev_warn() to dev_dbg(), since the system
does not suffer any sort of malfunction if this message appears, it is
just slowing down. This further reduces the printing into the log buffer
and frees the CPU to do useful work.
Fixes: dcbe0d84dfa5 ("spi: add driver for STM32 SPI controller") Signed-off-by: Marek Vasut <marex@denx.de> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: Amelie Delaunay <amelie.delaunay@st.com> Cc: Antonio Borneo <borneo.antonio@gmail.com> Cc: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200905151913.117775-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
As the comments in this patch say, if we tune and find all phases are
valid it's _almost_ as bad as no phases being found valid. Probably
all phases are not really reliable but we didn't detect where the
unreliable place is. That means we'll essentially be guessing and
hoping we get a good phase.
This is not just a problem in theory. It was causing real problems on
a real board. On that board, most often phase 10 is found as the only
invalid phase, though sometimes 10 and 11 are invalid and sometimes
just 11. Some percentage of the time, however, all phases are found
to be valid. When this happens, the current logic will decide to use
phase 11. Since phase 11 is sometimes found to be invalid, this is a
bad choice. Sure enough, when phase 11 is picked we often get mmc
errors later in boot.
I have seen cases where all phases were found to be valid 3 times in a
row, so increase the retry count to 10 just to be extra sure.
Fixes: 415b5a75da43 ("mmc: sdhci-msm: Add platform_execute_tuning implementation") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20200827075809.1.If179abf5ecb67c963494db79c3bc4247d987419b@changeid Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The commit 61d7437ed1390 ("mmc: sdhci-acpi: Fix HS400 tuning for AMDI0040")
broke resume for eMMC HS400. When the system suspends the eMMC controller
is powered down. So, on resume we need to reinitialize the controller.
Although, amd_sdhci_host was not getting cleared, so the DLL was never
re-enabled on resume. This results in HS400 being non-functional.
To fix the problem, this change clears the tuned_clock flag, clears the
dll_enabled flag and disables the DLL on reset.
Fixes: 61d7437ed1390 ("mmc: sdhci-acpi: Fix HS400 tuning for AMDI0040") Signed-off-by: Raul E Rangel <rrangel@chromium.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20200831150517.1.I93c78bfc6575771bb653c9d3fca5eb018a08417d@changeid Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Unlike what we previously thought, only the per-pixel alpha is broken on
the lowest plane and the per-plane alpha isn't. Remove the check on the
alpha property being set on the lowest plane to reject a mode.
Fixes: dcf496a6a608 ("drm/sun4i: sun4i: Introduce a quirk for lowest plane alpha support") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728134810.883457-1-maxime@cerno.tech Signed-off-by: Sasha Levin <sashal@kernel.org>
disable_irq() might sleep, replace it with disable_irq_nosync(). For
synchronisation 'irq_poll_scheduled' is sufficient
Fixes: 320e77acb3 scsi: mpt3sas: Irq poll to avoid CPU hard lockups Link: https://lore.kernel.org/r/20200901145026.12174-1-thenzl@redhat.com Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
disable_irq() might sleep. Replace it with disable_irq_nosync() which is
sufficient as irq_poll_scheduled protects against concurrently running
complete_cmd_fusion() from megasas_irqpoll() and megasas_isr_fusion().
Link: https://lore.kernel.org/r/20200827165332.8432-1-thenzl@redhat.com Fixes: a6ffd5bf681 scsi: megaraid_sas: Call disable_irq from process IRQ poll Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the returned speed from __ethtool_get_link_ksettings() is
SPEED_UNKNOWN this will lead to reporting a wrong speed and width for
providers that uses ib_get_eth_speed(), fix that by defaulting the
netdev_speed to SPEED_1000 in case the returned value from
__ethtool_get_link_ksettings() is SPEED_UNKNOWN.
Fixes: d41861942fc5 ("IB/core: Add generic function to extract IB speed from netdev") Link: https://lore.kernel.org/r/20200902124304.170912-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
It was discovered that sdparm will fail when attempting to disable write
cache on a SATA disk connected via libsas.
In the ATA command set the write cache state is controlled through the SET
FEATURES operation. This is roughly corresponds to MODE SELECT in SCSI and
the latter command is what is used in the SCSI-ATA translation layer. A
subtle difference is that a MODE SELECT carries data whereas SET FEATURES
is defined as a non-data command in ATA.
Set the DMA data direction to DMA_NONE if the requested ATA command is
identified as non-data.
[mkp: commit desc]
Fixes: fa1c1e8f1ece ("[SCSI] Add SATA support to libsas") Link: https://lore.kernel.org/r/1598426666-54544-1-git-send-email-luojiaxing@huawei.com Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Locking should be held for the entire reading sequence involving setting
the channel, waiting for the channel switch and reading from the
channel.
If not, reading from a channel can result mixing with the reading from
another channel.
Fixes: 07914c84ba30 ("iio: adc: Add driver for Microchip MCP3422/3/4 high resolution ADC") Signed-off-by: Angelo Compagnucci <angelo.compagnucci@gmail.com> Link: https://lore.kernel.org/r/20200819075525.1395248-1-angelo.compagnucci@gmail.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's writing too much data. regmap_bulk_write expects number of
register sized chunks to write, not a byte sized length of the
bounce buffer. Bounce buffer needs to be padded too, so that
regmap_bulk_write will not read past the end of the buffer.
If sun8i_r40_tcon_tv_set_mux() succeed, sun8i_r40_tcon_tv_set_mux()
doesn't have a corresponding put_device(). Thus add put_device()
to fix the exception handling for this function implementation.
QP1 Rx CQE reports transparent VLAN ID in the completion and this is used
while reporting the completion for received MAD packet. Check if the vlan
id is configured before reporting it in the work completion.
To avoid the following kernel panic when calling kmem_cache_create() with
a NULL pointer from pool_cache(), Block the rxe_param_set_add() from
running if the rdma_rxe module is not initialized.
This can be triggered if a user tries to use the 'module option' which is
not actually a real module option but some idiotic (and thankfully no
obsolete) sysfs interface.
Correct gpio ranges according to i.MX7ULP pinctrl driver:
gpio_ptc: ONLY pin 0~19 are available;
gpio_ptd: ONLY pin 0~11 are available;
gpio_pte: ONLY pin 0~15 are available;
gpio_ptf: ONLY pin 0~19 are available;
The following 4 tests in timers can take longer than the default 45
seconds that added in commit 852c8cbf34d3 ("selftests/kselftest/runner.sh:
Add 45 second timeout per test") to run:
* nsleep-lat - 2m7.350s
* set-timer-lat - 2m0.66s
* inconsistency-check - 1m45.074s
* raw_skew - 2m0.013s
Thus they will be marked as failed with the current 45s setting:
not ok 3 selftests: timers: nsleep-lat # TIMEOUT
not ok 4 selftests: timers: set-timer-lat # TIMEOUT
not ok 6 selftests: timers: inconsistency-check # TIMEOUT
not ok 7 selftests: timers: raw_skew # TIMEOUT
Disable the timeout setting for timers can make these tests finish
properly:
ok 3 selftests: timers: nsleep-lat
ok 4 selftests: timers: set-timer-lat
ok 6 selftests: timers: inconsistency-check
ok 7 selftests: timers: raw_skew
https://bugs.launchpad.net/bugs/1864626 Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Move another allocation out of regulator_list_mutex-protected region, as
reclaim might want to take the same lock.
WARNING: possible circular locking dependency detected
5.7.13+ #534 Not tainted
------------------------------------------------------
kswapd0/383 is trying to acquire lock: c0e5d920 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x54/0x2c0
but task is already holding lock: c0e38518 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x50
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire.part.11+0x40/0x50
fs_reclaim_acquire+0x24/0x28
kmem_cache_alloc_trace+0x40/0x1e8
regulator_register+0x384/0x1630
devm_regulator_register+0x50/0x84
reg_fixed_voltage_probe+0x248/0x35c
[...]
other info that might help us debug this:
*** DEADLOCK ***
[...]
2 locks held by kswapd0/383:
#0: c0e38518 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x50
#1: cb70e5e0 (hctx->srcu){....}-{0:0}, at: hctx_lock+0x60/0xb8
[...]
Fixes: 541d052d7215 ("regulator: core: Only support passing enable GPIO descriptors")
[this commit only changes context] Fixes: f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators locking")
[this is when the regulator_list_mutex was introduced in reclaim locking path]
A previous commit removed the panel-dpi driver, which made the
SOM-LV video stop working because it relied on the DPI driver
for setting video timings. Now that the simple-panel driver is
available in omap2plus, this patch migrates the SOM-LV dev kits
to use a similar panel and remove the manual timing requirements.
A similar patch was already done and applied to the Torpedo family.
Fixes: 8bf4b1621178 ("drm/omap: Remove panel-dpi driver") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Older versions of U-Boot would pinmux the whole board, but as
the bootloader got updated, it started to only pinmux the pins
it needed, and expected Linux to configure what it needed.
Unfortunately this caused an issue with the audio, because the
mcbsp2 pins were configured in the device tree but never
referenced by the driver. When U-Boot stopped muxing the audio
pins, the audio died.
This patch adds the references to the associate the pin controller
with the mcbsp2 driver which makes audio operate again.
Fixes: 5cb8b0fa55a9 ("ARM: dts: Move most of logicpd-som-lv-37xx-devkit.dts to logicpd-som-lv-baseboard.dtsi") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Older versions of U-Boot would pinmux the whole board, but as
the bootloader got updated, it started to only pinmux the pins
it needed, and expected Linux to configure what it needed.
Unfortunately this caused an issue with the audio, because the
mcbsp2 pins were configured in the device tree, they were never
referenced by the driver. When U-Boot stopped muxing the audio
pins, the audio died.
This patch adds the references to the associate the pin controller
with the mcbsp2 driver which makes audio operate again.
Fixes: 739f85bba5ab ("ARM: dts: Move most of logicpd-torpedo-37xx-devkit to logicpd-torpedo-baseboard") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
netpoll from accessing rings before init is complete. However, the
same is not done for fresh napi instances in netif_napi_add(),
even though we expect NAPI instances to be added as disabled.
This causes crashes during driver reconfiguration (enabling XDP,
changing the channel count) - if there is any printk() after
netif_napi_add() but before napi_enable().
To ensure memory ordering is correct we need to use RCU accessors.
Reported-by: Rob Sherwood <rsher@fb.com> Fixes: 2d8bff12699a ("netpoll: Close race condition between poll_one_napi and napi_disable") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
int main(int argc, char *argv[])
{
const int fd = open("/dev/nbd0", 3);
alarm(5);
ioctl(fd, NBD_SET_SOCK, socket(PF_TIPC, SOCK_DGRAM, 0));
ioctl(fd, NBD_DO_IT, 0); /* To be interrupted by SIGALRM. */
return 0;
}
----------
One problem is that wait_for_completion() from flush_workqueue() from
nbd_start_device_ioctl() from nbd_ioctl() cannot be completed when
nbd_start_device_ioctl() received a signal at wait_event_interruptible(),
for tipc_shutdown() from kernel_sock_shutdown(SHUT_RDWR) from
nbd_mark_nsock_dead() from sock_shutdown() from nbd_start_device_ioctl()
is failing to wake up a WQ thread sleeping at wait_woken() from
tipc_wait_for_rcvmsg() from sock_recvmsg() from sock_xmit() from
nbd_read_stat() from recv_work() scheduled by nbd_start_device() from
nbd_start_device_ioctl(). Fix this problem by always invoking
sk->sk_state_change() (like inet_shutdown() does) when tipc_shutdown() is
called.
The other problem is that tipc_wait_for_rcvmsg() cannot return when
tipc_shutdown() is called, for tipc_shutdown() sets sk->sk_shutdown to
SEND_SHUTDOWN (despite "how" is SHUT_RDWR) while tipc_wait_for_rcvmsg()
needs sk->sk_shutdown set to RCV_SHUTDOWN or SHUTDOWN_MASK. Fix this
problem by setting sk->sk_shutdown to SHUTDOWN_MASK (like inet_shutdown()
does) when the socket is connectionless.
Since commit 9c66d1564676 ("taprio: Add support for hardware
offloading") there's a bit of inconsistency when offloading schedules
to the hardware:
In software mode, the gate masks are specified in terms of traffic
classes, so if say "sched-entry S 03 20000", it means that the traffic
classes 0 and 1 are open for 20us; when taprio is offloaded to
hardware, the gate masks are specified in terms of hardware queues.
The idea here is to fix hardware offloading, so schedules in hardware
and software mode have the same behavior. What's needed to do is to
map traffic classes to queues when applying the offload to the driver.
Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With disabling bh in the whole sctp_get_port_local(), when
snum == 0 and too many ports have been used, the do-while
loop will take the cpu for a long time and cause cpu stuck:
There's no need to disable bh in the whole function of
sctp_get_port_local. So fix this cpu stuck by removing
local_bh_disable() called at the beginning, and using
spin_lock_bh() instead.
The same thing was actually done for inet_csk_get_port() in
Commit ea8add2b1903 ("tcp/dccp: better use of ephemeral
ports in bind()").
Thanks to Marcelo for pointing the buggy code out.
v1->v2:
- use cond_resched() to yield cpu to other tasks if needed,
as Eric noticed.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keenetic Plus DSL is a xDSL modem that uses dm9620 as its USB interface.
Signed-off-by: Kamil Lorenc <kamil@re-ws.pl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes two main problems seen when removing NetLabel
mappings: memory leaks and potentially extra audit noise.
The memory leaks are caused by not properly free'ing the mapping's
address selector struct when free'ing the entire entry as well as
not properly cleaning up a temporary mapping entry when adding new
address selectors to an existing entry. This patch fixes both these
problems such that kmemleak reports no NetLabel associated leaks
after running the SELinux test suite.
The potentially extra audit noise was caused by the auditing code in
netlbl_domhsh_remove_entry() being called regardless of the entry's
validity. If another thread had already marked the entry as invalid,
but not removed/free'd it from the list of mappings, then it was
possible that an additional mapping removal audit record would be
generated. This patch fixes this by returning early from the removal
function when the entry was previously marked invalid. This change
also had the side benefit of improving the code by decreasing the
indentation level of large chunk of code by one (accounting for most
of the diffstat).
Fixes: 63c416887437 ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping") Reported-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cited commit added the possible value of '2', but it cannot be set. Fix
it by adjusting the maximum value to '2'. This is consistent with the
corresponding IPv4 sysctl.
Fixes: d8f74f0975d8 ("ipv6: Support multipath hashing on inner IP pkts") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fib_info_notify_update() is always called with RTNL held, but not from
an RCU read-side critical section. This leads to the following warning
[1] when the FIB table list is traversed with
hlist_for_each_entry_rcu(), but without a proper lockdep expression.
Since modification of the list is protected by RTNL, silence the warning
by adding a lockdep expression which verifies RTNL is held.
[1]
=============================
WARNING: suspicious RCU usage 5.9.0-rc1-custom-14233-g2f26e122d62f #129 Not tainted
-----------------------------
net/ipv4/fib_trie.c:2124 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/834:
#0: ffffffff85a3b6b0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
Fixes: 1bff1a0c9bbd ("ipv4: Add function to send route updates") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The buffer size is 2 Bytes and we expect to receive the same amount of
data. But sometimes we receive less data and run into uninit-was-stored
issue upon read. Hence modify the error check on the return value to match
with the buffer size as a prevention.
Reported-and-tested by: syzbot+a7e220df5a81d1ab400e@syzkaller.appspotmail.com Signed-off-by: Himadri Pandya <himadrispandya@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
be read to page_cache_sync_readahead(). The intent was probably to read
a single page. Fix it to use the number of pages to the end of the
window instead.
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Yang Shi <shy828301@gmail.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Eric Biggers <ebiggers@google.com> Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.
Fix this by duplicating the `table`, and only update the duplicate of
it. And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.
The usage of "capture group (...)" in the immediate condition after `&&`
results in `$1` being uninitialized. This issues a warning "Use of
uninitialized value $1 in regexp compilation at ./scripts/checkpatch.pl
line 2638".
I noticed this bug while running checkpatch on the set of commits from
v5.7 to v5.8-rc1 of the kernel on the commits with a diff content in
their commit message.
This bug was introduced in the script by commit e518e9a59ec3
("checkpatch: emit an error when there's a diff in a changelog"). It
has been in the script since then.
The author intended to store the match made by capture group in variable
`$1`. This should have contained the name of the file as `[\w/]+`
matched. However, this couldn't be accomplished due to usage of capture
group and `$1` in the same regular expression.
Fix this by placing the capture group in the condition before `&&`.
Thus, `$1` can be initialized to the text that capture group matches
thereby setting it to the desired and required value.
Fixes: e518e9a59ec3 ("checkpatch: emit an error when there's a diff in a changelog") Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Joe Perches <joe@perches.com> Link: https://lkml.kernel.org/r/20200714032352.f476hanaj2dlmiot@mrinalpandey Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tegra210 and later has a separate sdmmc_legacy_tm (TMCLK) used by Tegra
SDMMC hawdware for data timeout to achive better timeout than using
SDCLK and using TMCLK is recommended.
USE_TMCLK_FOR_DATA_TIMEOUT bit in Tegra SDMMC register
SDHCI_TEGRA_VENDOR_SYS_SW_CTRL can be used to choose either TMCLK or
SDCLK for data timeout.
Default USE_TMCLK_FOR_DATA_TIMEOUT bit is set to 1 and TMCLK is used
for data timeout by Tegra SDMMC hardware and having TMCLK not enabled
is not recommended.
So, this patch adds quirk NVQUIRK_HAS_TMCLK for SoC having separate
timeout clock and keeps TMCLK enabled all the time.
The help info of option "--no-bpf-event" is wrongly described as "record
bpf events", correct it.
Committer testing:
$ perf record -h bpf
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
--clang-opt <clang options>
options passed to clang when compiling BPF scriptlets
--clang-path <clang path>
clang binary to use for compiling BPF scriptlets
--no-bpf-event do not record bpf events
$
Fixes: 71184c6ab7e6 ("perf record: Replace option --bpf-event with --no-bpf-event") Signed-off-by: Wei Li <liwei391@huawei.com> Acked-by: Song Liu <songliubraving@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200819031947.12115-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SR-IOV VFs do not implement the memory enable bit of the command
register, therefore this bit is not set in config space after
pci_enable_device(). This leads to an unintended difference
between PF and VF in hand-off state to the user. We can correct
this by setting the initial value of the memory enable bit in our
virtualized config space. There's really no need however to
ever fault a user on a VF though as this would only indicate an
error in the user's management of the enable bit, versus a PF
where the same access could trigger hardware faults.
Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
deactivate_slab()") suffered an update when picked up from LKML [1].
Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
created a no-op statement. Fix it by sticking to the behavior intended
in the original patch [1]. In addition, make freelist_corrupted()
immune to passing NULL instead of &freelist.
The issue has been spotted via static analysis and code review.
The following error ocurred when testing disk online/offline:
[ 301.798344] device-mapper: thin: 253:5: aborting current metadata transaction
[ 301.848441] device-mapper: thin: 253:5: failed to abort metadata transaction
[ 301.849206] Aborting journal on device dm-26-8.
[ 301.850489] EXT4-fs error (device dm-26) in __ext4_new_inode:943: Journal has aborted
[ 301.851095] EXT4-fs (dm-26): Delayed block allocation failed for inode 398742 at logical offset 181 with max blocks 19 with error 30
[ 301.854476] BUG: KASAN: use-after-free in dm_bm_set_read_only+0x3a/0x40 [dm_persistent_data]
Reason is:
metadata_operation_failed
abort_transaction
dm_pool_abort_metadata
__create_persistent_data_objects
r = __open_or_format_metadata
if (r) --> If failed will free pmd->bm but pmd->bm not set NULL
dm_block_manager_destroy(pmd->bm);
set_pool_mode
dm_pool_metadata_read_only(pool->pmd);
dm_bm_set_read_only(pmd->bm); --> use-after-free
Add checks to see if pmd->bm is NULL in dm_bm_set_read_only and
dm_bm_set_read_write functions. If bm is NULL it means creating the
bm failed and so dm_bm_is_read_only must return true.
Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maybe __create_persistent_data_objects() caller will use PTR_ERR as a
pointer, it will lead to some strange things.
Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maybe __create_persistent_data_objects() caller will use PTR_ERR as a
pointer, it will lead to some strange things.
Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dm-integrity target did not report errors in bitmap mode just after
creation. The reason is that the function integrity_recalc didn't clean up
ic->recalc_bitmap as it proceeded with recalculation.
Fix this by updating the bitmap accordingly -- the double shift serves
to rounddown.
Commit 935fcc56abc3 ("dm mpath: only flush workqueue when needed")
changed flush_multipath_work() to avoid needless workqueue
flushing (of a multipath global workqueue). But that change didn't
realize the surrounding flush_multipath_work() code should also only
run if 'pg_init_in_progress' is set.
Fix this by only doing all of flush_multipath_work()'s PG init related
work if 'pg_init_in_progress' is set.
Otherwise multipath_wait_for_pg_init_completion() will run
unconditionally but the preceeding flush_workqueue(kmpath_handlerd)
may not. This could lead to deadlock (though only if kmpath_handlerd
never runs a corresponding work to decrement 'pg_init_in_progress').
It could also be, though highly unlikely, that the kmpath_handlerd
work that does PG init completes before 'pg_init_in_progress' is set,
and then an intervening DM table reload's multipath_postsuspend()
triggers flush_multipath_work().
Fixes: 935fcc56abc3 ("dm mpath: only flush workqueue when needed") Cc: stable@vger.kernel.org Reported-by: Ben Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function dax_direct_access doesn't take partitions into account,
it always maps pages from the beginning of the device. Therefore,
persistent_memory_claim() must get the partition offset using
get_start_sect() and add it to the page offsets passed to
dax_direct_access().
Normally softwareshutdowntemp should be greater than Thotspotlimit.
However, on some VEGA10 ASIC, the softwareshutdowntemp is 91C while
Thotspotlimit is 105C. This seems not right and may trigger some
false alarms.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the source and destination physical address calculation of a
peripheral device on scatter-gather implementation.
This issue manifested during tests using a 64 bits architecture system.
The abnormal behavior wasn't visible before due to all previous tests
were done using 32 bits architecture system, that masked his effect.
ioc_pd_free() grabs irq-safe ioc->lock without ensuring that irq is disabled
when it can be called with irq disabled or enabled. This has a small chance
of causing A-A deadlocks and triggers lockdep splats. Use irqsave operations
instead.
All three generations of Sandisk SSDs lock up hard intermittently.
Experiments showed that disabling NCQ lowered the failure rate significantly
and the kernel has been disabling NCQ for some models of SD7's and 8's,
which is obviously undesirable.
Karthik worked with Sandisk to root cause the hard lockups to trim commands
larger than 128M. This patch implements ATA_HORKAGE_MAX_TRIM_128M which
limits max trim size to 128M and applies it to all three generations of
Sandisk SSDs.
If a driver leaves the limit settings as the defaults, then we don't
initialize bdi->io_pages. This means that file systems may need to
work around bdi->io_pages == 0, which is somewhat messy.
Initialize the default value just like we do for ->ra_pages.
Cc: stable@vger.kernel.org Fixes: 9491ae4aade6 ("mm: don't cap request size based on read-ahead setting") Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Block layer usually doesn't support or allow zero-length bvec. Since
commit 1bdc76aea115 ("iov_iter: use bvec iterator to implement
iterate_bvec()"), iterate_bvec() switches to bvec iterator. However,
Al mentioned that 'Zero-length segments are not disallowed' in iov_iter.
Fixes for_each_bvec() so that it can move on after seeing one zero
length bvec.
Fixes: 1bdc76aea115 ("iov_iter: use bvec iterator to implement iterate_bvec()") Reported-by: syzbot <syzbot+61acc40a49a3e46e25ea@syzkaller.appspotmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The basic permission bits (protection bits in AmigaOS) have been broken
in Linux' AFFS - it would only set bits, but never delete them.
Also, contrary to the documentation, the Archived bit was not handled.
Let's fix this for good, and set the bits such that Linux and classic
AmigaOS can coexist in the most peaceful manner.
Also, update the documentation to represent the current state of things.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Max Staudt <max@enpas.org> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Device drivers do not expect to have change_protocol or wakeup
re-programming to be accesed after rc_unregister_device(). This can
cause the device driver to access deallocated resources.
Cc: <stable@vger.kernel.org> # 4.16+ Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For Intel controllers, SDHCI_RESET_ALL resets also CQHCI registers.
Normally, SDHCI_RESET_ALL is not used while CQHCI is enabled, but that can
happen on the error path. e.g. if mmc_cqe_recovery() fails, mmc_blk_reset()
is called which, for a eMMC that does not support HW Reset, will cycle the
bus power and the driver will perform SDHCI_RESET_ALL.
So whenever performing SDHCI_RESET_ALL ensure CQHCI is deactivated.
That will force the driver to reinitialize CQHCI when it is next used.
A similar change was done already for sdhci-msm, and other drivers using
CQHCI might benefit from a similar change, if they also have CQHCI reset
by SDHCI_RESET_ALL.
Host controllers can reset CQHCI either directly or as a consequence of
host controller reset. Add cqhci_deactivate() which puts the CQHCI
driver into a state that is consistent with that.
This patch fixs eMMC-Access on mt7622/Bpi-64.
Before we got these Errors on mounting eMMC ion R64:
[ 48.664925] blk_update_request: I/O error, dev mmcblk0, sector 204800 op 0x1:(WRITE)
flags 0x800 phys_seg 1 prio class 0
[ 48.676019] Buffer I/O error on dev mmcblk0p1, logical block 0, lost sync page write
This patch adds a optional reset management for msdc.
Sometimes the bootloader does not bring msdc register
to default state, so need reset the msdc controller.
Cc: <stable@vger.kernel.org> # v5.4+ Fixes: 966580ad236e ("mmc: mediatek: add support for MT7622 SoC") Signed-off-by: Wenbin Mei <wenbin.mei@mediatek.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Frank Wunderlich <frank-w@public-files.de> Link: https://lore.kernel.org/r/20200814014346.6496-4-wenbin.mei@mediatek.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There've been quite a few regression reports about the lowered volume
(reduced to ca 65% from the previous level) on Lenovo Thinkpad X1
after the commit d2cd795c4ece ("ALSA: hda - fixup for the bass speaker
on Lenovo Carbon X1 7th gen"). Although the commit itself does the
right thing from HD-audio POV in order to have a volume control for
bass speakers, it seems that the machine has some secret recipe under
the hood.
Through experiments, Benjamin Poirier found out that the following
routing gives the best result:
* DAC1 (NID 0x02) -> Speaker pin (NID 0x14)
* DAC2 (NID 0x03) -> Shared by both Bass Speaker pin (NID 0x17) &
Headphone pin (0x21)
* DAC3 (NID 0x06) -> Unused
DAC1 seems to have some equalizer internally applied, and you'd get
again the output in a bad quality if you connect this to the
headphone pin. Hence the headphone is connected to DAC2, which is now
shared with the bass speaker pin. DAC3 has no volume amp, hence it's
not connected at all.
For achieving the routing above, this patch introduced a couple of
workarounds:
* The connection list of bass speaker pin (NID 0x17) is reduced not to
include DAC3 (NID 0x06)
* Pass preferred_pairs array to specify the fixed connection
Here, both workarounds are needed because the generic parser prefers
the individual DAC assignment over others.
When the routing above is applied, the generic parser creates the two
volume controls "Front" and "Bass Speaker". Since we have only two
DACs for three output pins, those are not fully controlling each
output individually, and it would confuse PulseAudio. For avoiding
the pitfall, in this patch, we rename those volume controls to some
unique ones ("DAC1" and "DAC2"). Then PulseAudio ignore them and
concentrate only on the still good-working "Master" volume control.
If a user still wants to control each DAC volume, they can still
change manually via "DAC1" and "DAC2" volume controls.
The Galaxy Book Ion NT950XCJ-X716A (15 inches) uses the same ALC298
codec as other Samsung laptops which have the no headphone sound bug. I
confirmed on my own hardware that this fixes the bug.
This also correct the model name for the 13 inches version. It was
incorrectly referenced as NT950XCJ-X716A in commit e17f02d05. But it
should have been NP930XCJ-K01US.
Tascam FE-8 is known to support communication by asynchronous transaction
only. The support can be implemented in userspace application and
snd-firewire-ctl-services project has the support. However, ALSA
firewire-tascam driver is bound to the model.
This commit changes device entries so that the model is excluded. In a
commit 53b3ffee7885 ("ALSA: firewire-tascam: change device probing
processing"), I addressed to the concern that version field in
configuration differs depending on installed firmware. However, as long
as I checked, the version number is fixed. It's safe to return version
number back to modalias.
Following Christian Lachner's patch for Gigabyte X570-based motherboards,
also patch the MSI X570-A PRO motherboard; the ALC1220 codec requires the
same workaround for Clevo laptops to enforce the DAC/mixer connection
path. Set up a quirk entry for that.
I suspect most if all X570 motherboards will require similar patches.
[ The entries reordered in the SSID order -- tiwai ]
Avid Adrenaline is reported that ALSA firewire-digi00x driver is bound to.
However, as long as he investigated, the design of this model is hardly
similar to the one of Digi 00x family. It's better to exclude the model
from modalias of ALSA firewire-digi00x driver.
This commit changes device entries so that the model is excluded.
When system is suspended with active audio playback to HDMI/DP, two
alternative sequences can happen at resume:
a) monitor is detected first and ALSA prepare follows normal
stream setup sequence, or
b) ALSA prepare is called first, but monitor is not yet detected,
so PCM is restarted without a pin,
In case of (b), on i915 systems, haswell_verify_D0() is not called at
resume and the pin power state may be incorrect. Result is lack of audio
after resume with no error reported back to user-space.
Fix the problem by always verifying converter and pin state in the
i915_pin_cvt_fixup().
The PCM OSS mulaw plugin has a check of the format of the counter part
whether it's a linear format. The check is with snd_BUG_ON() that
emits WARN_ON() when the debug config is set, and it confuses
syzkaller as if it were a serious issue. Let's drop snd_BUG_ON() for
avoiding that.
While we're at it, correct the error code to a more suitable, EINVAL.
snd_ca0106_spi_write() returns 1 on error, snd_ca0106_pcm_power_dac()
is returning the error code directly, and the caller is expecting an
negative error code
Because AZX_DRIVER_GENERIC can not work well for Loongson LS7A HDA
controller, it needs some workarounds which are not merged into the
upstream kernel at this time, so it should revert this patch now.
Fixes: 61eee4a7fc40 ("ALSA: hda: Add support for Loongson 7A1000 controller") Cc: <stable@vger.kernel.org> # 5.9-rc1+ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/1598348388-2518-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit af199a1a9cb0 ("net: dsa: microchip: set the correct
number of ports") seems to have been applied twice on top of the 5.4
branch. This revert the second instance of said commit.
The problem is we're doing a copy_to_user() while holding tree locks,
which can deadlock if we have to do a page fault for the copy_to_user().
This exists even without my locking changes, so it needs to be fixed.
Rework the search ioctl to do the pre-fault and then
copy_to_user_nofault for the copying.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 323ebb61e32b4 ("net: use listified RX for handling GRO_NORMAL
skbs") made use of listified skb processing for the users of
napi_gro_frags().
The same technique can be used in a way more common napi_gro_receive()
to speed up non-merged (GRO_NORMAL) skbs for a wide range of drivers
including gro_cells and mac80211 users.
This slightly changes the return value in cases where skb is being
dropped by the core stack, but it seems to have no impact on related
drivers' functionality.
gro_normal_batch is left untouched as it's very individual for every
single system configuration and might be tuned in manual order to
achieve an optimal performance.
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Hyunsoon Kim <h10.kim@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These are special extent buffers that get rewound in order to lookup
the state of the tree at a specific point in time. As such they do not
go through the normal initialization paths that set their lockdep class,
so handle them appropriately when they are created and before they are
locked.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When flipping over to the rw_semaphore I noticed I'd get a lockdep splat
in replace_path(), which is weird because we're swapping the reloc root
with the actual target root. Turns out this is because we're using the
root->root_key.objectid as the root id for the newly allocated tree
block when setting the lockdep class, however we need to be using the
actual owner of this new block, which is saved in owner.
The affected path is through btrfs_copy_root as all other callers of
btrfs_alloc_tree_block (which calls init_new_buffer) have root_objectid
== root->root_key.objectid .
CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This happens because we're allocating the scrub workqueues under the
scrub and device list mutex, which brings in a whole host of other
dependencies.
Because the work queue allocation is done with GFP_KERNEL, it can
trigger reclaim, which can lead to a transaction commit, which in turns
needs the device_list_mutex, it can lead to a deadlock. A different
problem for which this fix is a solution.
Fix this by moving the actual allocation outside of the
scrub lock, and then only take the lock once we're ready to actually
assign them to the fs_info. We'll now have to cleanup the workqueues in
a few more places, so I've added a helper to do the refcount dance to
safely free the workqueues.
CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This problem exists because we have two different rescan threads,
btrfs_uuid_scan_kthread which creates the uuid tree, and
btrfs_uuid_tree_iterate that goes through and updates or deletes any out
of date roots. The problem is they both do things in different order.
btrfs_uuid_scan_kthread() reads the tree_root, and then inserts entries
into the uuid_root. btrfs_uuid_tree_iterate() scans the uuid_root, but
then does a btrfs_get_fs_root() which can read from the tree_root.
It's actually easy enough to not be holding the path in
btrfs_uuid_scan_kthread() when we add a uuid entry, as we already drop
it further down and re-start the search when we loop. So simply move
the path release before we add our entry to the uuid tree.
This also fixes a problem where we're holding a path open after we do
btrfs_end_transaction(), which has it's own problems.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.
This breaks building of the Linux kernel. How to reproduce:
1. extract the Linux kernel tree on dax-mounted xfs filesystem
2. run make clean
3. run make -j12
4. run make -j12
at step 4, make would incorrectly rebuild the whole kernel (although it
was already built in step 3).
The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.
When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.
This breaks building of the Linux kernel. How to reproduce:
1. extract the Linux kernel tree on dax-mounted ext2 filesystem
2. run make clean
3. run make -j12
4. run make -j12
at step 4, make would incorrectly rebuild the whole kernel (although it
was already built in step 3).
The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.