A jump target was specified in an if branch. The corresponding function
call did not release the desired system resource then.
Thus use the label “rom_unmap” instead to fix the exception handling
for this function implementation.
There is a check on the retry counter invalid_buf_id_retry that is always
false because invalid_buf_id_retry is initialized to zero on each iteration
of a while-loop. Fix this by initializing the retry counter before the
while-loop starts.
Addresses-Coverity: ("Logically dead code") Fixes: b4a967b7d0f5 ("wil6210: reset buff id in status message after completion") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When ath10k_qmi_init() fails, the error handling does not free the irq
resources, which causes an issue if we EPROBE_DEFER as we'll attempt to
(re-)register irqs which are already registered.
Fix this by doing a power off since we just powered on the hardware, and
freeing the irqs as error handling.
Fixes: ba94c753ccb4 ("ath10k: add QMI message handshake for wcn3990 client") Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently tx->bytes is being freed r->num_transactions number of
times because tx is not being set correctly. Fix this by setting
tx to &r->transactions[i] so that the correct objects are being
freed on each loop iteration.
psbfb_probe performs an evaluation of the required size from the stolen
GTT memory, but gets it wrong in two distinct ways:
- The resulting size must be page-size-aligned;
- The size to allocate is derived from the surface dimensions, not the fb
dimensions.
When two connectors are connected with different modes, the smallest will
be stored in the fb dimensions, but the size that needs to be allocated must
match the largest (surface) dimensions. This is what is used in the actual
allocation code.
Fix this by correcting the evaluation to conform to the two points above.
It allows correctly switching to 16bpp when one connector is e.g. 1920x1080
and the other is 1024x768.
unlike other classifiers that can be offloaded (i.e. users can set flags
like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of
netlink attribute 'TCA_FLOWER_FLAGS' provided by user: add a proper entry
to fl_policy.
Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
unlike other classifiers that can be offloaded (i.e. users can set flags
like 'skip_hw' and 'skip_sw'), 'cls_matchall' doesn't validate the size
of netlink attribute 'TCA_MATCHALL_FLAGS' provided by user: add a proper
entry to mall_policy.
Fixes: b87f7936a932 ("net/sched: Add match-all classifier hw offloading.") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Passing tag size to skb_cow_head will make sure
there is enough headroom for the tag data.
This change does not introduce any overhead in case there
is already available headroom for tag.
Signed-off-by: Per Forlin <perfn@axis.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As nlmsg_put() does not clear the memory that is reserved,
it this the caller responsability to make sure all of this
memory will be written, in order to not reveal prior content.
While we are at it, we can provide the socket cookie even
if clsock is not set.
syzbot reported :
BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
BUG: KMSAN: uninit-value in __swab32p include/uapi/linux/swab.h:179 [inline]
BUG: KMSAN: uninit-value in __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
BUG: KMSAN: uninit-value in get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
BUG: KMSAN: uninit-value in bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252
CPU: 1 PID: 5262 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
__arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
__fswab32 include/uapi/linux/swab.h:59 [inline]
__swab32p include/uapi/linux/swab.h:179 [inline]
__be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252
Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recent months, our customer reported several kernel crashes all
preceding with following message:
NETDEV WATCHDOG: eth2 (enic): transmit queue 0 timed out
Error message of one of those crashes:
BUG: unable to handle kernel paging request at ffffffffa007e090
After analyzing severl vmcores, I found that most of crashes are
caused by memory corruption. And all the corrupted memory areas
are overwritten by data of network packets. Moreover, I also found
that the tx queues were enabled over watchdog reset.
After going through the source code, I found that in enic_stop(),
the tx queues stopped by netif_tx_disable() could be woken up over
a small time window between netif_tx_disable() and the
napi_disable() by the following code path:
napi_poll->
enic_poll_msix_wq->
vnic_cq_service->
enic_wq_service->
netif_wake_subqueue(enic->netdev, q_number)->
test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)
In turn, upper netowrk stack could queue skb to ENIC NIC though
enic_hard_start_xmit(). And this might introduce some race condition.
Our customer comfirmed that this kind of kernel crash doesn't occur over
90 days since they applied this patch.
Signed-off-by: Firo Yang <firo.yang@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current generic XDP handler skips execution of XDP programs entirely if
an SKB is marked as cloned. This leads to some surprising behaviour, as
packets can end up being cloned in various ways, which will make an XDP
program not see all the traffic on an interface.
This was discovered by a simple test case where an XDP program that always
returns XDP_DROP is installed on a veth device. When combining this with
the Scapy packet sniffer (which uses an AF_PACKET) socket on the sending
side, SKBs reliably end up in the cloned state, causing them to be passed
through to the receiving interface instead of being dropped. A minimal
reproducer script for this is included below.
This patch fixed the issue by simply triggering the existing linearisation
code for cloned SKBs instead of skipping the XDP program execution. This
behaviour is in line with the behaviour of the native XDP implementation
for the veth driver, which will reallocate and copy the SKB data if the SKB
is marked as shared.
Reproducer Python script (requires BCC and Scapy):
from scapy.all import TCP, IP, Ether, sendp, sniff, AsyncSniffer, Raw, UDP
from bcc import BPF
import time, sys, subprocess, shlex
s = sniff(iface="a_to_b", count=10, timeout=15)
if len(s):
print(f"Got {len(s)} packets - should have gotten 0")
return 1
else:
print("Got no packets - as expected")
return 0
if len(sys.argv) < 2:
print(f"Usage: {sys.argv[0]} <skb|drv>")
sys.exit(1)
if sys.argv[1] == "client":
sys.exit(client())
elif sys.argv[1] == "server":
mode = SKB_MODE if sys.argv[2] == 'skb' else DRV_MODE
sys.exit(server(mode))
else:
try:
mode = sys.argv[1]
if mode not in ('skb', 'drv'):
print(f"Usage: {sys.argv[0]} <skb|drv>")
sys.exit(1)
print(f"Running in {mode} mode")
for cmd in [
'ip netns add netns_a',
'ip netns add netns_b',
'ip -n netns_a link add a_to_b type veth peer name b_to_a netns netns_b',
# Disable ipv6 to make sure there's no address autoconf traffic
'ip netns exec netns_a sysctl -qw net.ipv6.conf.a_to_b.disable_ipv6=1',
'ip netns exec netns_b sysctl -qw net.ipv6.conf.b_to_a.disable_ipv6=1',
'ip -n netns_a link set dev a_to_b address aa:aa:aa:aa:aa:aa',
'ip -n netns_b link set dev b_to_a address cc:cc:cc:cc:cc:cc',
'ip -n netns_a link set dev a_to_b up',
'ip -n netns_b link set dev b_to_a up']:
subprocess.check_call(shlex.split(cmd))
server = subprocess.Popen(shlex.split(f"ip netns exec netns_a {PYTHON} {sys.argv[0]} server {mode}"))
client = subprocess.Popen(shlex.split(f"ip netns exec netns_b {PYTHON} {sys.argv[0]} client"))
Use MMC_CAP2_RO_ACTIVE_HIGH flag as indicator if GPIO line is to be
inverted compared to DT/platform-specified polarity. The flag is not used
after init in GPIO mode anyway. No functional changes intended.
Add possibility to toggle active-low flag of a gpio descriptor. This is
useful for compatibility code, where defaults are inverted vs DT gpio
flags or the active-low flag is taken from elsewhere.
Setting softlimit larger than hardlimit seems meaningless
for disk quota but currently it is allowed. In this case,
there may be a bit of comfusion for users when they run
df comamnd to directory which has project quota.
For example, we set 20M softlimit and 10M hardlimit of
block usage limit for project quota of test_dir(project id 123).
[root@hades mnt_ext4]# repquota -P -a
*** Report for project quotas on device /dev/loop0
Block grace time: 7days; Inode grace time: 7days
Block limits File limits
Project used soft hard grace used soft hard grace
----------------------------------------------------------------------
0 -- 13 0 0 2 0 0
123 -- 10237 20480 10240 5 200 100
The result of df command as below:
[root@hades mnt_ext4]# df -h test_dir
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 20M 10M 10M 50% /home/cgxu/test/mnt_ext4
Even though it looks like there is another 10M free space to use,
if we write new data to diretory test_dir(inherit project id),
the write will fail with errno(-EDQUOT).
After this patch, the df result looks like below.
[root@hades mnt_ext4]# df -h test_dir
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 10M 10M 3.0K 100% /home/cgxu/test/mnt_ext4
The rc6 residency starts ticking from 0 from BIOS POST, but the kernel
starts measuring the time from its boot. If we start measuruing
I915_PMU_RC6_RESIDENCY while the GT is idle, we start our sampling from
0 and then upon first activity (park/unpark) add in all the rc6
residency since boot. After the first park with the sampler engaged, the
sleep/active counters are aligned.
v2: With a wakeref to be sure
Closes: https://gitlab.freedesktop.org/drm/intel/issues/973 Fixes: df6a42053513 ("drm/i915/pmu: Ensure monotonic rc6") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200114105648.2172026-1-chris@chris-wilson.co.uk
(cherry picked from commit f4e9894b6952a2819937f363cd42e7cd7894a1e4) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Setting mode_config.allow_fb_modifiers manually is completely
unnecessary. It is set automatically by drm_universal_plane_init() based
on the fact if modifier list is provided or not. Even more, it breaks
DE2 and DE3 as they don't support any modifiers beside linear. Modifiers
aware applications can be confused by provided empty modifier list - at
least linear modifier should be included, but it's not for DE2 and DE3.
Fixes: 9db9c0cf5895 ("drm/sun4i: drv: Allow framebuffer modifiers in mode config") Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200126065937.9564-1-jernej.skrabec@siol.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For data collected on machines with front end stalled cycles supported,
such as found on modern AMD CPU families, commit 146540fb545b ("perf
stat: Always separate stalled cycles per insn") introduces a new line in
CSV output with a leading comma that upsets some automated scripts.
Scripts have to use "-e ex_ret_instr" to work around this issue, after
upgrading to a version of perf with that commit.
We could add "if (have_frontend_stalled && !config->csv_sep)" to the not
(total && avg) else clause, to emphasize that CSV users are usually
scripts, and are written to do only what is needed, i.e., they wouldn't
typically invoke "perf stat" without specifying an explicit event list.
But - let alone CSV output - why should users now tolerate a constant
0-reporting extra line in regular terminal output?:
BEFORE:
$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1
Performance counter stats for 'system wide':
181,110,981 instructions # 0.58 insn per cycle
# 0.00 stalled cycles per insn
309,876,469 cycles
So this patch removes the printing of the zeroed stalled cycles line
altogether, almost reverting the very original commit fb4605ba47e7
("perf stat: Check for frontend stalled for metrics"), which seems like
it was written to normalize --metric-only column output of common Intel
machines at the time: modern Intel machines have ceased to support the
genericised frontend stalled metrics AFAICT.
AFTER:
$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1
Performance counter stats for 'system wide':
244,071,432 instructions # 0.69 insn per cycle
355,353,490 cycles
Fixes: 146540fb545b ("perf stat: Always separate stalled cycles per insn") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200207230613.26709-1-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SDM 27.3.4 states that the 'pending debug exceptions' VMCS field will
be populated if a VM-exit caused by an INIT signal takes priority over a
debug-trap. Emulate this behavior when synthesizing an INIT signal
VM-exit into L1.
Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KVM defines the #DB payload as compatible with the 'pending debug
exceptions' field under VMX, not DR6. Mask off bit 12 when applying the
payload to DR6, as it is reserved on DR6 but not the 'pending debug
exceptions' field.
Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The interrupt map for the FVP's PCI node is missing the
parent-unit-address cells for each of the INTx entries, leading to the
kernel code failing to parse the entries correctly.
Add the missing zero cells, which are pretty useless as far as the GIC
is concerned, but that the spec requires. This allows INTx to be usable
on the model, and VFIO to work correctly.
Fixes: fa083b99eb28 ("arm64: dts: fast models: Add DTS fo Base RevC FVP") Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For the old mount API, the module parameters parseing function will
be called in ceph_mount() and also just after the default posix acl
flag set, so we can control to enable/disable it via the mount option.
But for the new mount API, it will call the module parameters
parseing function before ceph_get_tree(), so the posix acl will always
be enabled.
Fixes: 82995cc6c5ae ("libceph, rbd, ceph: convert to use the new mount API") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix display for sec=krb5i which was wrongly interleaved by cruid,
resulting in string "sec=krb5,cruid=<...>i" instead of
"sec=krb5i,cruid=<...>".
Fixes: 96281b9e46eb ("smb3: for kerberos mounts display the credential uid used") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously I intended to ignore quiet mode in probe response, however
I ended up ignoring it instead for action frames. As a matter of fact,
this path isn't invoked for probe responses to start with. Just revert
this patch.
Signed-off-by: Sara Sharon <sara.sharon@intel.com> Fixes: 7976b1e9e3bf ("mac80211: ignore quiet mode in probe") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20200131111300.891737-15-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change 21537dc driver PMBus polling of MFR_COMMON from bits 5/4 to
bits 6/5. This fixs a LTC297X family bug where polling always returns
not busy even when the part is busy. This fixes a LTC388X and
LTM467X bug where polling used PEND and NOT_IN_TRANS, and BUSY was
not polled, which can lead to NACKing of commands. LTC388X and
LTM467X modules now poll BUSY and PEND, increasing reliability by
eliminating NACKing of commands.
Perf doesn't take the left period into account when auto-reload is
enabled with fixed period sampling mode in context switch.
Here is the MSR trace of the perf command as below.
(The MSR trace is simplified from a ftrace log.)
#perf record -e cycles:p -c 2000000 -- ./triad_loop
//The MSR trace of task schedule out
//perf disable all counters, disable PEBS, disable GP counter 0,
//read GP counter 0, and re-enable all counters.
//The counter 0 stops at 0xfffffff82840
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0
write_msr: MSR_P6_EVNTSEL0(186), value 40003003c
rdpmc: 0, value fffffff82840
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff
//The MSR trace of the same task schedule in again
//perf disable all counters, enable and set GP counter 0,
//enable PEBS, and re-enable all counters.
//0xffffffe17b80 (-2000000) is written to GP counter 0.
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PMC0(4c1), value ffffffe17b80
write_msr: MSR_P6_EVNTSEL0(186), value 40043003c
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff
When the same task schedule in again, the counter should starts from
previous left. However, it starts from the fixed period -2000000 again.
A special variant of intel_pmu_save_and_restart() is used for
auto-reload, which doesn't update the hwc->period_left.
When the monitored task schedules in again, perf doesn't know the left
period. The fixed period is used, which is inaccurate.
With auto-reload, the counter always has a negative counter value. So
the left period is -value. Update the period_left in
intel_pmu_save_and_restart_reload().
With the patch:
//The MSR trace of task schedule out
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0
write_msr: MSR_P6_EVNTSEL0(186), value 40003003c
rdpmc: 0, value ffffffe25cbc
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff
//The MSR trace of the same task schedule in again
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PMC0(4c1), value ffffffe25cbc
write_msr: MSR_P6_EVNTSEL0(186), value 40043003c
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff
Fixes: d31fc13fdcb2 ("perf/x86/intel: Fix event update for auto-reload") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200121190125.3389-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I see the following lockdep splat in the qcom pinctrl driver when
attempting to suspend the device.
WARNING: possible recursive locking detected
5.4.11 #3 Tainted: G W
--------------------------------------------
cat/3074 is trying to acquire lock: ffffff81f49804c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
but task is already holding lock: ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
other info that might help us debug this:
Possible unsafe locking scenario:
It turns out that this wasn't a good idea, I hit a test failure in
hwsim due to this. That particular failure was easily worked around,
but it raised questions: if an AP needs to, for example, send action
frames to each connected station, the current limit is nowhere near
enough (especially if those stations are sleeping and the frames are
queued for a while.)
Shuffle around some bits to make more room for ack_frame_id to allow
up to 8192 queued up frames, that's enough for queueing 4 frames to
each connected station, even at the maximum of 2007 stations on a
single AP.
We take the bits from band (which currently only 2 but I leave 3 in
case we add another band) and from the hw_queue, which can only need
4 since it has a limit of 16 queues.
Fixes: 6912daed05e1 ("mac80211: Shrink the size of ack_frame_id to make room for tx_time_est") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20200115122549.b9a4ef9f4980.Ied52ed90150220b83a280009c590b65d125d087c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In file included from ../arch/s390/boot/startup.c:3:
In file included from ../include/linux/elf.h:5:
In file included from ../arch/s390/include/asm/elf.h:132:
In file included from ../include/linux/compat.h:10:
In file included from ../include/linux/time.h:74:
In file included from ../include/linux/time32.h:13:
In file included from ../include/linux/timex.h:65:
../arch/s390/include/asm/timex.h:160:20: warning: passing 'unsigned char
[16]' to parameter of type 'char *' converts between pointers to integer
types with different sign [-Wpointer-sign]
get_tod_clock_ext(clk);
^~~
../arch/s390/include/asm/timex.h:149:44: note: passing argument to
parameter 'clk' here
static inline void get_tod_clock_ext(char *clk)
^
Change clk's type to just be char so that it matches what happens in
get_tod_clock_ext.
We don't need to set pkey as valid in case that user set only one of pkey
index or port number, otherwise it will be resulted in NULL pointer
dereference while accessing to uninitialized pkey list. The following
crash from Syzkaller revealed it.
The root cause is that tasklet is actually a softirq. In a tasklet
handler, another softirq handler is triggered. Usually these softirq
handlers run on the same cpu core. So this will cause "soft lockup Bug".
Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE
when entering into TERM state.
In c4iw_modify_qp(), disconnect operation should only be performed when
the modify_qp call is invoked from ib_core. And all other internal
modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should
call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks
like below can occur:
Call Trace:
schedule+0x2f/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.5+0x2d0/0x4a0
c4iw_ep_disconnect+0x39/0x430 => tries to reacquire ep lock again
c4iw_modify_qp+0x468/0x10d0
rx_data+0x218/0x570 => acquires ep lock
process_work+0x5f/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
kthread+0x112/0x130
ret_from_fork+0x35/0x40
Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state") Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a check that the size specified in the flow spec header doesn't cause
an overflow when calculating the filter size, and thus prevent access to
invalid memory. The following crash from syzkaller revealed it.
When disassociating a device from umad we must ensure that the sysfs
access is prevented before blocking the fops, otherwise assumptions in
syfs don't hold:
The prior patch made an error in moving the device_destroy(), it should
have been split into device_del() (above) and put_device() (below). At
this point we already have the split, so move the device_del() back to its
original place.
When the hfi1 device is shut down during a system reboot, it is possible
that some QPs might have not not freed by ULPs. More requests could be
post sent and a lingering timer could be triggered to schedule more packet
sends, leading to a crash:
The solution is to reset the QPs before the device resources are freed.
This reset will change the QP state to prevent post sends and delete
timers to prevent callbacks.
Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table") Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The panic is the result of slab entries being freed during the destruction
of the pq slab.
The code attempts to quiesce the pq, but looking for n_req == 0 doesn't
account for new requests.
Fix the issue by using SRCU to get a pq pointer and adjust the pq free
logic to NULL the fd pq pointer prior to the quiesce.
Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized") Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Each user context is allocated a certain number of RcvArray (TID)
entries and these entries are managed through TID groups. These groups
are put into one of three lists in each user context: tid_group_list,
tid_used_list, and tid_full_list, depending on the number of used TID
entries within each group. When TID packets are expected, one or more
TID groups will be allocated. After the packets are received, the TID
groups will be freed. Since multiple user threads may access the TID
groups simultaneously, a mutex exp_mutex is used to synchronize the
access. However, when the user file is closed, it tries to release
all TID groups without acquiring the mutex first, which risks a race
condition with another thread that may be releasing its TID groups,
leading to data corruption.
This patch addresses the issue by acquiring the mutex first before
releasing the TID groups when the file is closed.
Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs") Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When binding a QP with a counter and the QP state is not RESET, return
failure if the rts2rts_qp_counters_set_id is not supported by the
device.
This is to prevent cases like manual bind for Connect-IB devices from
returning success when the feature is not supported.
Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter") Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The end of buffer check is off-by-one since the check is against
an index that is pre-incremented before a store to buf[]. Fix this
adjusting the bounds check appropriately.
Addresses-Coverity: ("Out-of-bounds write") Fixes: 51bd6f291583 ("Add support for IPMB driver") Signed-off-by: Colin Ian King <colin.king@canonical.com>
Message-Id: <20200114144031.358003-1-colin.king@canonical.com> Reviewed-by: Asmaa Mnebhi <asmaa@mellanox.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The input_read function declares the size of the hex array relative to
sizeof(buf), but buf is a pointer argument of the function. The hex
array is meant to contain hexadecimal representation of the bin array.
Link: https://lore.kernel.org/r/20200215142130.22743-1-marek.behun@nic.cz Fixes: 5bc7f990cd98 ("bus: Add support for Moxtet bus") Signed-off-by: Marek Behún <marek.behun@nic.cz> Reported-by: sohu0106 <sohu0106@126.com> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rather than the FEATURE_ID flags. Avoids a possible reading past
the end of the array.
Reviewed-by: Evan Quan <evan.quan@amd.com> Reported-by: Aleksandr Mezin <mezin.alexander@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.5.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Userspace might tag a BO purgeable while it's still referenced by GPU
jobs. We need to make sure the shrinker does not purge such BOs until
all jobs referencing it are finished.
Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191129135908.2439529-9-boris.brezillon@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to DP specification, DP_SINK_EVENT_NOTIFY is also a
broadcast message but as this function only handles
DP_CONNECTION_STATUS_NOTIFY I will only make the static
analyzer that caught this issue happy by not calling
drm_dp_get_mst_branch_device_by_guid() with a NULL guid, causing
drm_dp_mst_process_up_req() to return in the "if (!mstb)" right
bellow.
Fixes: 9408cc94eb04 ("drm/dp_mst: Handle UP requests asynchronously") Cc: Lyude Paul <lyude@redhat.com> Cc: Sean Paul <sean@poorly.run> Cc: <stable@vger.kernel.org> # v5.5+ Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
[added cc to stable] Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200129232448.84704-1-jose.souza@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's two references floating around here (for the object reference,
not the handle_count reference, that's a different thing):
- The temporary reference held by vgem_gem_create, acquired by
creating the object and released by calling
drm_gem_object_put_unlocked.
- The reference held by the object handle, created by
drm_gem_handle_create. This one generally outlives the function,
except if a 2nd thread races with a GEM_CLOSE ioctl call.
So usually everything is correct, except in that race case, where the
access to gem_object->size could be looking at freed data already.
Which again isn't a real problem (userspace shot its feet off already
with the race, we could return garbage), but maybe someone can exploit
this as an information leak.
Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+0dc4444774d419e916c8@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Emil Velikov <emil.velikov@collabora.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Sean Paul <seanpaul@chromium.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Eric Anholt <eric@anholt.net> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Rob Clark <robdclark@chromium.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200202132133.1891846-1-daniel.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The query parameter block might contain additional information and can
be extended in the future. If the size of the block does not suffice we
get an error code of rc=0x100. The buffer will contain all information
up to the specified size and the hypervisor/guest simply do not need the
additional information as they do not know about the new data. That
means that we can (and must) accept rc=0x100 as success.
The pkey ioctl call PKEY_SEC2PROTK updates a struct pkey_protkey
on return. The protected key is stored in, the protected key type
is stored in but the len information was not updated. This patch
now fixes this and so the len field gets an update to refrect
the actual size of the protected key value returned.
Fixes: efc598e6c8a9 ("s390/zcrypt: move cca misc functions to new code file") Cc: Stable <stable@vger.kernel.org> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reported-by: Christian Rund <RUNDC@de.ibm.com> Suggested-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h"),
claimed L2 misses were unsupported, due to them not being found in its
referenced documentation, whose link has now moved [1].
That old documentation listed PMCx064 unit mask bit 3 as:
"LsRdBlkC: LS Read Block C S L X Change to X Miss."
and bit 0 as:
"IcFillMiss: IC Fill Miss"
We now have new public documentation [2] with improved descriptions, that
clearly indicate what events those unit mask bits represent:
Bit 3 now clearly states:
"LsRdBlkC: Data Cache Req Miss in L2 (all types)"
and bit 0 is:
"IcFillMiss: Instruction Cache Req Miss in L2."
So we can now add support for L2 misses in perf's genericised events as
PMCx064 with both the above unit masks.
[1] The commit's original documentation reference, "Processor Programming
Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors",
originally available here:
Define PT_MAX_FULL_LEVELS as PT64_ROOT_MAX_LEVEL, i.e. 5, to fix shadow
paging for 5-level guest page tables. PT_MAX_FULL_LEVELS is used to
size the arrays that track guest pages table information, i.e. using a
"max levels" of 4 causes KVM to access garbage beyond the end of an
array when querying state for level 5 entries. E.g. FNAME(gpte_changed)
will read garbage and most likely return %true for a level 5 entry,
soft-hanging the guest because FNAME(fetch) will restart the guest
instead of creating SPTEs because it thinks the guest PTE has changed.
Note, KVM doesn't yet support 5-level nested EPT, so PT_MAX_FULL_LEVELS
gets to stay "4" for the PTTYPE_EPT case.
Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU
currently also hardcodes the page walk level for nested EPT to be 4
levels. The L2 guest is all but guaranteed to soft hang on its first
instruction when L1 is using EPT, as KVM will construct 4-level page
tables and then tell hardware to use 5-level page tables.
A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:
1) Use-after-free:
On 27.11.19 17:07:33, John Garry wrote:
> [ 22.104498] BUG: KASAN: use-after-free in
> edac_remove_sysfs_mci_device+0x148/0x180
The use-after-free is caused by the mci_for_each_dimm() macro called in
edac_remove_sysfs_mci_device(). The iterator was introduced with
c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator").
The iterator loop calls device_unregister(&dimm->dev), which removes
the sysfs entry of the device, but also frees the dimm struct in
dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
the dimm struct is accessed again, after having been freed already.
The fix is to free all the mci device's subsequent dimm and csrow
objects at a later point, in _edac_mc_free(), when the mci device itself
is being freed.
This keeps the data structures intact and the mci device can be
fully used until its removal. The change allows the safe usage of
mci_for_each_dimm() to release dimm devices from sysfs.
All leaks are from memory allocated by edac_mc_alloc().
Note: The test above shows that edac_mc_alloc() was called here from
ghes_edac_register(), thus both functions show up in the stack trace
but the module causing the leaks is edac_mc. The comments with the data
structures involved were made manually by analyzing the objdump.
The data structures listed above and created by edac_mc_alloc() are
not properly removed during device removal, which is done in
edac_mc_free().
There are two paths implemented to remove the device depending on device
registration, _edac_mc_free() is called if the device is not registered
and edac_unregister_sysfs() otherwise.
The implemenations differ. For the sysfs case, the mci device removal
lacks the removal of subsequent data structures (csrows, channels,
dimms). This causes the memory leaks (see mci_attr_release()).
[ bp: Massage commit message. ]
Fixes: c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator") Fixes: faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.") Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All created csrow objects must be removed in the error path of
edac_create_csrow_objects(). The objects have been added as devices.
They need to be removed by doing a device_del() *and* put_device() call
to also free their memory. The missing put_device() leaves a memory
leak. Use device_unregister() instead of device_del() which properly
unregisters the device doing both.
Fixes: 7adc05d2dc3a ("EDAC/sysfs: Drop device references properly") Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") set the BH_Freed flag when forgetting a metadata
buffer which belongs to the committing transaction, it indicate the
committing process clear dirty bits when it is done with the buffer. But
it also clear the BH_Mapped flag at the same time, which may trigger
below NULL pointer oops when block_size < PAGE_SIZE.
*) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
already been unmapped.
For the metadata buffer we forgetting, we should always keep the mapped
flag and clear the dirty flags is enough, so this patch pick out the
these buffers and keep their BH_Mapped flag.
Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no need to delay the clearing of b_modified flag to the
transaction committing time when unmapping the journalled buffer, so
just move it to the journal_unmap_buffer().
Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before we add a new EA we should check that this will not overflow
the maximum buffer we have available to read the EAs back.
Otherwise we can get into a situation where the EAs are so big that
we can not read them back to the client and thus we can not list EAs
anymore or delete them.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The @nents value that was passed to ib_dma_map_sg() has to be passed
to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
concatenate sg entries, it will return a different nents value than
it was passed.
The bug was exposed by recent changes to the AMD IOMMU driver, which
enabled sg entry concatenation.
Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
new memory registration API") and reviewing other kernel ULPs, it's
not clear that the frwr_map() logic was ever correct for this case.
Reported-by: Andre Tomt <andre@tomt.net> Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5153faac18d2 ("cgroup: remove cgroup_enable_task_cg_lists()
optimization") removed lazy initialization of css_sets so that new
tasks are always lniked to its css_set. In the process, it incorrectly
ended up adding init_tasks to root css_set. They show up as PID 0's in
root's cgroup.procs triggering warnings in systemd and generally
confusing people.
When all CPUs in the system implement the SSBS extension, the SSBS field
in PSTATE is the definitive indication of the mitigation state. Further,
when the CPUs implement the SSBS manipulation instructions (advertised
to userspace via an HWCAP), EL0 can toggle the SSBS field directly and
so we cannot rely on any shadow state such as TIF_SSBD at all.
Avoid forcing the SSBS field in context-switch on such a system, and
simply rely on the PSTATE register instead.
Cc: <stable@vger.kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Srinivas Ramana <sramana@codeaurora.org> Fixes: cbdf8a189a66 ("arm64: Force SSBS on context switch") Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Care is taken with "index", however with the current version
the actual xgpio_writereg is using index for data but
xgpio_regoffset(chip, i) for the offset. And since i is already
incremented it is incorrect. This patch fixes it so that index
is used for the offset too.
The CONFIG_ARCH_REQUIRE_GPIOLIB is gone since commit 65053e1a7743
("gpio: delete ARCH_[WANTS_OPTIONAL|REQUIRE]_GPIOLIB") and all platforms
should explicitly select GPIOLIB to have it.
A remount to a read-write filesystem is not safe when there's tree-log
to be replayed. Files that could be opened until now might be affected
by the changes in the tree-log.
A regular mount is needed to replay the log so the filesystem presents
the consistent view with the pending changes included.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's no logged information about tree-log replay although this is
something that points to previous unclean unmount. Other filesystems
report that as well.
Suggested-by: Chris Murphy <lists@colorremedies.com> CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In btrfs_ref_tree_mod(), 'ref' and 'ra' are allocated through kzalloc() and
kmalloc(), respectively. In the following code, if an error occurs, the
execution will be redirected to 'out' or 'out_unlock' and the function will
be exited. However, on some of the paths, 'ref' and 'ra' are not
deallocated, leading to memory leaks. For example, if 'action' is
BTRFS_ADD_DELAYED_EXTENT, add_block_entry() will be invoked. If the return
value indicates an error, the execution will be redirected to 'out'. But,
'ref' is not deallocated on this path, causing a memory leak.
To fix the above issues, deallocate both 'ref' and 'ra' before exiting from
the function when an error is encountered.
CC: stable@vger.kernel.org # 4.15+ Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have a few cases where we allow an extent map that is in an extent map
tree to be merged with other extents in the tree. Such cases include the
unpinning of an extent after the respective ordered extent completed or
after logging an extent during a fast fsync. This can lead to subtle and
dangerous problems because when doing the merge some other task might be
using the same extent map and as consequence see an inconsistent state of
the extent map - for example sees the new length but has seen the old start
offset.
With luck this triggers a BUG_ON(), and not some silent bug, such as the
following one in __do_readpage():
$ cat -n fs/btrfs/extent_io.c
3061 static int __do_readpage(struct extent_io_tree *tree,
3062 struct page *page,
(...)
3127 em = __get_extent_map(inode, page, pg_offset, cur,
3128 end - cur + 1, get_extent, em_cached);
3129 if (IS_ERR_OR_NULL(em)) {
3130 SetPageError(page);
3131 unlock_extent(tree, cur, end);
3132 break;
3133 }
3134 extent_offset = cur - em->start;
3135 BUG_ON(extent_map_end(em) <= cur);
(...)
Consider the following example scenario, where we end up hitting the
BUG_ON() in __do_readpage().
We have an inode with a size of 8KiB and 2 extent maps:
extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by
a previous transaction
extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet
persisted but writeback started for it already. The extent map
is pinned since there's writeback and an ordered extent in
progress, so it can not be merged with extent map A yet
The following sequence of steps leads to the BUG_ON():
1) The ordered extent for extent B completes, the respective page gets its
writeback bit cleared and the extent map is unpinned, at that point it
is not yet merged with extent map A because it's in the list of modified
extents;
2) Due to memory pressure, or some other reason, the MM subsystem releases
the page corresponding to extent B - btrfs_releasepage() is called and
returns 1, meaning the page can be released as it's not dirty, not under
writeback anymore and the extent range is not locked in the inode's
iotree. However the extent map is not released, either because we are
not in a context that allows memory allocations to block or because the
inode's size is smaller than 16MiB - in this case our inode has a size
of 8KiB;
3) Task B needs to read extent B and ends up __do_readpage() through the
btrfs_readpage() callback. At __do_readpage() it gets a reference to
extent map B;
4) Task A, doing a fast fsync, calls clear_em_loggin() against extent map B
while holding the write lock on the inode's extent map tree - this
results in try_merge_map() being called and since it's possible to merge
extent map B with extent map A now (the extent map B was removed from
the list of modified extents), the merging begins - it sets extent map
B's start offset to 0 (was 4KiB), but before it increments the map's
length to 8KiB (4kb + 4KiB), task A is at:
BUG_ON(extent_map_end(em) <= cur);
The call to extent_map_end() sees the extent map has a start of 0
and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so
the BUG_ON() is triggered.
So it's dangerous to modify an extent map that is in the tree, because some
other task might have got a reference to it before and still using it, and
needs to see a consistent map while using it. Generally this is very rare
since most paths that lookup and use extent maps also have the file range
locked in the inode's iotree. The fsync path is pretty much the only
exception where we don't do it to avoid serialization with concurrent
reads.
Fix this by not allowing an extent map do be merged if if it's being used
by tasks other then the one attempting to merge the extent map (when the
reference count of the extent map is greater than 2).
Reported-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp> Reported-by: Koki Mitani <koki.mitani.xg@hco.ntt.co.jp>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211 CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If CONFIG_QFMT_V2 is not enabled, but CONFIG_QUOTA is enabled, when a
user tries to mount a file system with the quota or project quota
enabled, the kernel will emit a very confusing messsage:
EXT4-fs warning (device vdc): ext4_enable_quotas:5914: Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix.
EXT4-fs (vdc): mount failed
We will now report an explanatory message indicating which kernel
configuration options have to be enabled, to avoid customer/sysadmin
confusion.
Link: https://lore.kernel.org/r/20200215012738.565735-1-tytso@mit.edu
Google-Bug-Id: 149093531 Fixes: 7c319d328505b778 ("ext4: make quota as first class supported feature") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When journal size is set too big by "mkfs.ext4 -J size=", or when
we mount a crafted image to make journal inode->i_size too big,
the loop, "while (i < num)", holds cpu too long. This could cause
soft lockup.
DIR_INDEX has been introduced as a compat ext4 feature. That means that
even kernels / tools that don't understand the feature may modify the
filesystem. This works because for kernels not understanding indexed dir
format, internal htree nodes appear just as empty directory entries.
Index dir aware kernels then check the htree structure is still
consistent before using the data. This all worked reasonably well until
metadata checksums were introduced. The problem is that these
effectively made DIR_INDEX only ro-compatible because internal htree
nodes store checksums in a different place than normal directory blocks.
Thus any modification ignorant to DIR_INDEX (or just clearing
EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch
and trigger kernel errors. So we have to be more careful when dealing
with indexed directories on filesystems with checksumming enabled.
1) We just disallow loading any directory inodes with EXT4_INDEX_FL when
DIR_INDEX is not enabled. This is harsh but it should be very rare (it
means someone disabled DIR_INDEX on existing filesystem and didn't run
e2fsck), e2fsck can fix the problem, and we don't want to answer the
difficult question: "Should we rather corrupt the directory more or
should we ignore that DIR_INDEX feature is not set?"
2) When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree
information for reading but we refuse to add new entries to the
directory to avoid corrupting it more.
Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent commit, 9803387c55f7 ("ext4: validate the
debug_want_extra_isize mount option at parse time"), moved mount-time
checks around. One of those changes moved the inode size check before
the blocksize variable was set to the blocksize of the file system.
After 9803387c55f7 was set to the minimum allowable blocksize, which
in practice on most systems would be 1024 bytes. This cuased file
systems with inode sizes larger than 1024 bytes to be rejected with a
message:
EXT4-fs (sdXX): unsupported inode size: 4096
Fixes: 9803387c55f7 ("ext4: validate the debug_want_extra_isize mount option at parse time") Link: https://lore.kernel.org/r/20200206225252.GA3673@mit.edu Reported-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Don't assume that the mmp_nodename and mmp_bdevname strings are NUL
terminated, since they are filled in by snprintf(), which is not
guaranteed to do so.
If the platform triggers a spurious SCI even though the status bit
is not set for any GPE when the system is suspended to idle, it will
be treated as a genuine wakeup, so avoid that by checking if any GPEs
are active at all before returning 'true' from acpi_s2idle_wake().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206413 Fixes: 56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow") Reported-by: Tsuchiya Yuto <kitakar@gmail.com> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is theoretically possible for the ACPI EC GPE to be set after the
s2idle_ops->wake() called from s2idle_loop() has returned and before
the subsequent pm_wakeup_pending() check is carried out. If that
happens, the resulting wakeup event will cause the system to resume
even though it may be a spurious one.
To avoid that race, first make the ->wake() callback in struct
platform_s2idle_ops return a bool value indicating whether or not
to let the system resume and rearrange s2idle_loop() to use that
value instad of the direct pm_wakeup_pending() call if ->wake() is
present.
Next, rework acpi_s2idle_wake() to process EC events and check
pm_wakeup_pending() before re-arming the SCI for system wakeup
to prevent it from triggering prematurely and add comments to
that function to explain the rationale for the new code flow.
Fixes: 56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow") Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 016b87ca5c8c ("ACPI: EC: Rework flushing of pending work")
introduced a subtle bug into the flushing of pending EC work while
suspended to idle, which may cause the EC driver to fail to
re-enable the EC GPE after handling a non-wakeup event (like a
battery status change event, for example).
The problem is that the work item flushed by flush_scheduled_work()
in __acpi_ec_flush_work() may disable the EC GPE and schedule another
work item expected to re-enable it, but that new work item is not
flushed, so __acpi_ec_flush_work() returns with the EC GPE disabled
and the CPU running it goes into an idle state subsequently. If all
of the other CPUs are in idle states at that point, the EC GPE won't
be re-enabled until at least one CPU is woken up by another interrupt
source, so system wakeup events that would normally come from the EC
then don't work.
This is reproducible on a Dell XPS13 9360 in my office which
sometimes stops reacting to power button and lid events (triggered
by the EC on that machine) after switching from AC power to battery
power or vice versa while suspended to idle (each of those switches
causes the EC GPE to trigger for several times in a row, but they
are not system wakeup events).
To avoid this problem, it is necessary to drain the workqueue
entirely in __acpi_ec_flush_work(), but that cannot be done with
respect to system_wq, because work items may be added to it from
other places while __acpi_ec_flush_work() is running. For this
reason, make the EC driver use a dedicated workqueue for EC events
processing (let that workqueue be ordered so that EC events are
processed sequentially) and use drain_workqueue() on it in
__acpi_ec_flush_work().
Fixes: 016b87ca5c8c ("ACPI: EC: Rework flushing of pending work") Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Audioengine D1 (0x2912:0x30c8) does support reading the sample rate,
but it returns the rate in byte-reversed order.
When setting sampling rate, the driver produces these warning messages:
[168840.944226] usb 3-2.2: current rate 4500480 is different from the runtime rate 44100
[168854.930414] usb 3-2.2: current rate 8436480 is different from the runtime rate 48000
[168905.185825] usb 3-2.1.2: current rate 30465 is different from the runtime rate 96000
As can be seen from the hexadecimal conversion, the current rate read
back is byte-reversed from the rate that was set.
MSI-GL73 laptop with ALC1220 codec requires a similar workaround for
Clevo laptops to enforce the DAC/mixer connection path. Set up a
quirk entry for that.
The commit 66f2d19f8116 ("ALSA: pcm: Fix memory leak at closing a
stream without hw_free") tried to fix the regression wrt the missing
hw_free call at closing without SNDRV_PCM_IOCTL_HW_FREE ioctl.
However, the code change dropped mistakenly the state check, resulting
in calling hw_free twice when SNDRV_PCM_IOCTL_HW_FRE got called
beforehand. For most drivers, this is almost harmless, but the
drivers like SOF show another regression now.
This patch adds the state condition check before calling do_hw_free()
at releasing the stream for avoiding the double hw_free calls.
We've got a regression report about M-Audio Fast Track C400 device,
and the git bisection resulted in the commit e0ccdef92653 ("ALSA:
usb-audio: Clean up check_input_term()"). This commit was about the
rewrite of the input terminal parser, and it's not too obvious from
the change what really broke. The answer is: it's the interpretation
of UAC2/3 effect units.
In the original code, UAC2 effect unit is as if through UAC1
processing unit because both UAC1 PU and UAC2/3 EU share the same
number (0x07). The old code went through a complex switch-case
fallthrough, finally bailing out in the middle:
if (protocol == UAC_VERSION_2 &&
hdr[2] == UAC2_EFFECT_UNIT) {
/* UAC2/UAC1 unit IDs overlap here in an
* uncompatible way. Ignore this unit for now.
*/
return 0;
}
... and this special handling was missing in the new code; the new
code treats UAC2/3 effect unit as if it were equivalent with the
processing unit.
Actually, the old code was too confusing. The effect unit has an
incompatible unit description with the processing unit, so we
shouldn't have dealt with EU in the same way.
This patch addresses the regression by changing the effect unit
handling to the own parser function. The own parser function makes
the clear distinct with PU, so it improves the readability, too.
The EU parser just sets the type and the id like the old kernels.
Once when the proper effect unit support is added, we can revisit this
parser function, but for now, let's keep this simple setup as is.
It should be safe to ignore clock validity check result if the following
conditions are met:
- only one single sample rate is supported;
- the terminal is directly connected to the clock source;
- the clock type is internal.
This is to deal with some Denon DJ controllers that always reports that
clock is invalid.
For non-blocking issue, we set IOCB_NOWAIT in the kiocb. However, on a
raw block device, this yields an -EOPNOTSUPP return, as non-blocking
writes aren't supported. Turn this -EOPNOTSUPP into -EAGAIN, so we retry
from blocking context with IOCB_NOWAIT cleared.
After defer, a request will be prepared, that includes allocating iovec
if needed, and then submitted through io_wq_submit_work() but not custom
handler (e.g. io_rw_async()/io_sendrecv_async()). However, it'll leak
iovec, as it's in io-wq and the code goes as follows:
io_read() {
if (!io_wq_current_is_worker())
kfree(iovec);
}
Put all deallocation logic in io_{read,write,send,recv}(), which will
leave the memory, if going async with -EAGAIN.
It also fixes a leak after failed io_alloc_async_ctx() in
io_{recv,send}_msg().
commit bda0be7ad994 ("security: make inode_follow_link RCU-walk aware")
passed down the rcu flag to the SELinux AVC, but failed to adjust the
test in slow_avc_audit() to also return -ECHILD on LSM_AUDIT_DATA_DENTRY.
Previously, we only returned -ECHILD if generating an audit record with
LSM_AUDIT_DATA_INODE since this was only relevant from inode_permission.
Move the handling of MAY_NOT_BLOCK to avc_audit() and its inlined
equivalent in selinux_inode_permission() immediately after we determine
that audit is required, and always fall back to ref-walk in this case.
Fixes: bda0be7ad994 ("security: make inode_follow_link RCU-walk aware") Reported-by: Will Deacon <will@kernel.org> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss
descriptor") introduced a bounds check on the number of supplied rates to
lbs_ibss_join_existing() and made it to return on overflow.
However, the aforementioned commit doesn't set the return value accordingly
and thus, lbs_ibss_join_existing() would return with zero even though it
failed.
Make lbs_ibss_join_existing return -EINVAL in case the bounds check on the
number of supplied rates fails.
Fixes: e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss descriptor") Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss
descriptor") introduced a bounds check on the number of supplied rates to
lbs_ibss_join_existing().
Unfortunately, it introduced a return path from within a RCU read side
critical section without a corresponding rcu_read_unlock(). Fix this.
Fixes: e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss descriptor") Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
mwifiex_cmd_append_vsie_tlv() calls memcpy() without checking
the destination size may trigger a buffer overflower,
which a local user could use to cause denial of service
or the execution of arbitrary code.
Fix it by putting the length check before calling memcpy().
Signed-off-by: Qing Xu <m1s5p6688@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
mwifiex_ret_wmm_get_status() calls memcpy() without checking the
destination size.Since the source is given from remote AP which
contains illegal wmm elements , this may trigger a heap buffer
overflow.
Fix it by putting the length check before calling memcpy().
Signed-off-by: Qing Xu <m1s5p6688@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>