For the tc ip_proto filter, when we extract the flow via __skb_flow_dissect()
without the FLOW_DISSECTOR_F_STOP_AT_ENCAP flag, we keep dissecting down to
the inner protocol.
So for GRE + ICMP messages, we should not match on the GRE proto, but on the
inner ICMP proto.
For the mirror_gre.sh test, capturing ICMP messages on $h3 may confuse users,
since the traffic arriving there is GRE-encapsulated. So move the capture
device to h3-gt{4,6} and capture only ICMP messages.
Before the fix:
]# ./mirror_gre.sh
TEST: ingress mirror to gretap (skip_hw) [ OK ]
TEST: egress mirror to gretap (skip_hw) [ OK ]
TEST: ingress mirror to ip6gretap (skip_hw) [ OK ]
TEST: egress mirror to ip6gretap (skip_hw) [ OK ]
TEST: ingress mirror to gretap: envelope MAC (skip_hw) [FAIL]
Expected to capture 10 packets, got 0.
TEST: egress mirror to gretap: envelope MAC (skip_hw) [FAIL]
Expected to capture 10 packets, got 0.
TEST: ingress mirror to ip6gretap: envelope MAC (skip_hw) [FAIL]
Expected to capture 10 packets, got 0.
TEST: egress mirror to ip6gretap: envelope MAC (skip_hw) [FAIL]
Expected to capture 10 packets, got 0.
TEST: two simultaneously configured mirrors (skip_hw) [ OK ]
WARN: Could not test offloaded functionality
After the fix:
]# ./mirror_gre.sh
TEST: ingress mirror to gretap (skip_hw) [ OK ]
TEST: egress mirror to gretap (skip_hw) [ OK ]
TEST: ingress mirror to ip6gretap (skip_hw) [ OK ]
TEST: egress mirror to ip6gretap (skip_hw) [ OK ]
TEST: ingress mirror to gretap: envelope MAC (skip_hw) [ OK ]
TEST: egress mirror to gretap: envelope MAC (skip_hw) [ OK ]
TEST: ingress mirror to ip6gretap: envelope MAC (skip_hw) [ OK ]
TEST: egress mirror to ip6gretap: envelope MAC (skip_hw) [ OK ]
TEST: two simultaneously configured mirrors (skip_hw) [ OK ]
WARN: Could not test offloaded functionality
Fixes: ba8d39871a10 ("selftests: forwarding: Add test for mirror to gretap") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Petr Machata <pmachata@gmail.com> Tested-by: Petr Machata <pmachata@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
For a given byte clock, if the VCO recalc value is exactly the same as the
VCO set-rate value, vco_set_rate does not get called, assuming the VCO is
already set to the required value. But due to the GDSC toggle, the VCO
values are erased in the HW. To make sure the VCO is programmed correctly,
we forcefully call set_rate from vco_prepare.
Signed-off-by: Harigovindan P <harigovi@codeaurora.org> Reviewed-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Save the PLL state before the DSI host is powered off. Without this change
some register values get reset.
Signed-off-by: Harigovindan P <harigovi@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a flag to DMA memory allocation to silence a warning.
This driver allocates DMA memory for IO frames. This allocation may exceed
MAX_ORDER pages for a few megaraid_sas controllers (controllers with a very
high queue depth). Consequently, the driver has logic to keep reducing the
controller queue depth until the DMA memory allocation succeeds.
On impacted megaraid_sas controllers there would be multiple DMA allocation
failures until the driver settled on an allocation that fit. These failed DMA
allocation requests caused stack traces in the system logs. These were not
harmful, and this patch silences those warnings/stack traces.
[mkp: clarified commit desc]
Link: https://lore.kernel.org/r/20200204152413.7107-1-thenzl@redhat.com Signed-off-by: Tomas Henzl <thenzl@redhat.com> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
At the moment, only DRM_MODE_ROTATE_180 is allowed when we try to apply
the rotation from the video mode parameters. It is also useful to allow
DRM_MODE_ROTATE_0 in case there is only a reflect option in the video mode
parameter (e.g. video=540x960,reflect_x).
DRM_MODE_ROTATE_0 means "no rotation" and should therefore not require
any special handling, so we can just add it to the if condition.
A rotation value should have exactly one rotation angle.
At the moment there is no validation for this when parsing video=
parameters from the command line. This causes problems later on
when we try to combine the command line rotation with the panel
orientation.
To make sure that we generate a valid rotation value:
- Set DRM_MODE_ROTATE_0 by default (if no rotate= option is set)
- Validate that there is exactly one rotation angle set
(i.e. specifying the rotate= option multiple times is invalid)
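As a hedged sketch of the validation (not necessarily the exact parser code; DRM_MODE_ROTATE_MASK is from drm_blend.h and is_power_of_2() from linux/log2.h):

```c
/* Sketch: default to "no rotation" and reject anything that does not
 * have exactly one DRM_MODE_ROTATE_* angle bit set. */
if (!(rotation & DRM_MODE_ROTATE_MASK))
        rotation |= DRM_MODE_ROTATE_0;          /* no rotate= option given */

if (!is_power_of_2(rotation & DRM_MODE_ROTATE_MASK))
        return -EINVAL;                         /* zero or several angles set */
```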
I was hitting kCFI crashes when building with clang, and after
some digging finally narrowed it down to the
dsi_mgr_connector_mode_valid() function being implemented as
returning an int, instead of an enum drm_mode_status.
This patch fixes it, and appeases the opaque word of the kCFI
gods (seriously, clang inlining everything makes the kCFI
backtraces only really rough estimates of where things went
wrong).
Thanks as always to Sami for his help narrowing this down.
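For context, a hedged sketch of the signature change; the prototype is dictated by the .mode_valid member of struct drm_connector_helper_funcs:

```c
/* Before: the int return type does not match the function pointer type,
 * so the indirect call trips clang's kCFI check. */
static int dsi_mgr_connector_mode_valid(struct drm_connector *connector,
                                        struct drm_display_mode *mode);

/* After: matches enum drm_mode_status (*mode_valid)(...) exactly. */
static enum drm_mode_status
dsi_mgr_connector_mode_valid(struct drm_connector *connector,
                             struct drm_display_mode *mode);
```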
Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Alistair Delva <adelva@google.com> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: freedreno@lists.freedesktop.org Cc: clang-built-linux@googlegroups.com Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
During device memory memset, the driver allocates and uses a CB (command
buffer). To reuse existing code, it keeps a pointer to the CB in two
variables, user_cb and patched_cb. Therefore, there is no need to "put"
both user_cb and patched_cb, as that would cause an underflow of the
refcount of the CB.
During hard reset we must not write to the device.
Hence avoid halting CoreSight during user context close if it is done
during hard reset.
In addition, we must not re-enable clock gating afterwards as it was
deliberately disabled in the beginning of the hard reset flow.
The driver must halt the engines before doing hard-reset, otherwise the
device can go into undefined state. There is a place where the driver
didn't do that and this patch fixes it.
Symptom: an application opens /dev/ttyGS0 and starts sending (writing) to
it while either the USB cable is not connected, or nobody listens on the
other side of the cable. If the driver's circular buffer overflows before
the connection is established, no data will be written to the USB layer
until/unless /dev/ttyGS0 is closed and re-opened by the application
(which, besides, has no means of being notified that the connection has
been established).
Fix: on open and/or connect, kick Tx to flush the circular buffer data to
the USB layer.
Signed-off-by: Sergey Organov <sorganov@gmail.com> Reviewed-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
ffs_aio_cancel() can be called from both interrupt and thread context. Make
sure that the current IRQ state is saved and restored by using
spin_{un,}lock_irq{save,restore}().
Otherwise undefined behavior might occur.
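A hedged sketch of the locking change (field and lock names approximate drivers/usb/gadget/function/f_fs.c):

```c
static int ffs_aio_cancel(struct kiocb *kiocb)
{
        struct ffs_io_data *io_data = kiocb->private;
        struct ffs_epfile *epfile = kiocb->ki_filp->private_data;
        unsigned long flags;
        int value;

        /* May run in IRQ or thread context: save/restore the IRQ state
         * instead of unconditionally re-enabling IRQs on unlock. */
        spin_lock_irqsave(&epfile->ffs->eps_lock, flags);

        if (io_data && io_data->ep && io_data->req)
                value = usb_ep_dequeue(io_data->ep, io_data->req);
        else
                value = -EINVAL;

        spin_unlock_irqrestore(&epfile->ffs->eps_lock, flags);

        return value;
}
```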
Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
USB 3.x SuperSpeed peripherals can draw up to 900mA of VBUS power
when in configured state. However, if a configuration wanting to
take advantage of this is added with MaxPower greater than 500
(currently possible if using a ConfigFS gadget) the composite
driver fails to accommodate this for a couple reasons:
- usb_gadget_vbus_draw() when called from set_config() and
composite_resume() will be passed the MaxPower value without
regard for the current connection speed, resulting in a
violation for USB 2.0 since the max is 500mA.
- the bMaxPower of the configuration descriptor would be
incorrectly encoded, again if the connection speed is only
at USB 2.0 or below, likely wrapping around U8_MAX since
the 2mA multiplier corresponds to a maximum of 510mA.
Fix these by adding checks against the current gadget->speed
when the c->MaxPower value is used (set_config() and
composite_resume()) and appropriately limit based on whether
it is currently at a low-/full-/high- or super-speed connection.
Because 900 is not divisible by 8, with the round-up division
currently used in encode_bMaxPower() a MaxPower of 900mA will
result in an encoded value of 0x71. When a host stack (including
Linux and Windows) enumerates this on a single port root hub, it
reads this value back and decodes (multiplies by 8) to get 904mA
which is strictly greater than 900mA that is typically budgeted
for that port, causing it to reject the configuration. Instead,
we should be using the round-down behavior of normal integral
division so that 900 / 8 -> 0x70 or 896mA to stay within range.
And we might as well change it for the high/full/low case as well
for consistency.
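A hedged sketch of the resulting encoder (constant names assumed from composite.c; this illustrates the clamping and round-down logic rather than reproducing the exact patch):

```c
static u8 encode_bMaxPower(enum usb_device_speed speed,
                           struct usb_configuration *c)
{
        unsigned int val = c->MaxPower ? c->MaxPower : CONFIG_USB_GADGET_VBUS_DRAW;

        if (!val)
                return 0;

        if (speed < USB_SPEED_SUPER)
                /* USB 2.0 and below: clamp to 500 mA, 2 mA units */
                return min(val, 500U) / 2;

        /*
         * SuperSpeed: clamp to 900 mA, 8 mA units.  900 is not divisible
         * by 8, so plain integer division yields 0x70 (896 mA) instead of
         * the over-budget 0x71 (904 mA) produced by round-up division.
         */
        return min(val, 900U) / 8;
}
```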
N.B. USB 3.2 Gen N x 2 allows for up to 1500mA, but there doesn't
seem to be any peripheral controller supported by Linux that
does two-lane operation, so for now keeping the clamp at 900
should be fine.
Signed-off-by: Jack Pham <jackp@codeaurora.org> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
With some shells, the command constructed for the install of the bpf selftests
becomes too large due to the long list of files:
make[1]: execvp: /bin/sh: Argument list too long
make[1]: *** [../lib.mk:73: install] Error 127
Currently, each of the file lists is replicated three times in the command:
in the shell 'if' condition, in the 'echo' and in the 'rsync'. Reduce that
by one instance by using make conditionals and separate the echo and rsync
into two shell commands. (One would be inclined to just remove the '@' at
the beginning of the rsync command and let 'make' echo it by itself;
unfortunately, it appears that the '@' in the front of mkdir silences output
also for the following commands.)
Also, separate handling of each of the lists to its own shell command.
The semantics of the Makefile are unchanged before and after the patch. The
ability of individual test directories to override INSTALL_RULE is retained.
The tpm2 test set fails if /dev/tpm0 and /dev/tpmrm0 are not supported.
Check that these files exist before running, and mark the tests as
skipped if they are absent.
On AR934x this UART is usually not initialized by the bootloader,
as it is only used as a secondary serial port while the primary
UART is a newly introduced NS16550-compatible one.
In order to make use of the ar933x-uart on AR934x without RTS/CTS
hardware flow control, one needs to set the
UART_CS_{RX,TX}_READY_ORIDE bits: unlike on AR933x, where this
UART is used as the primary/console, the bootloader on AR934x typically
doesn't set those bits.
Setting them explicitly on AR933x should not do any harm, so just
set them unconditionally.
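A hedged sketch of what the unconditional override could look like (the rmw helper and bit macros follow the ar933x_uart driver's naming, but treat them as assumptions):

```c
/* Force the RX/TX ready overrides so the port works even when the
 * bootloader (as on AR934x) left them unset and no RTS/CTS hardware
 * flow control is wired up. */
ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
                    AR933X_UART_CS_TX_READY_ORIDE |
                    AR933X_UART_CS_RX_READY_ORIDE);
```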
snd_hdac_ext_bus_link_get() does not work correctly in case
there are multiple codecs on the bus. It unconditionally
resets the bus->codec_mask value. As per documentation in
hdaudio.h and existing use in client code, this field should
be used to store bit flag of detected codecs on the bus.
By overwriting value of the codec_mask, information on all
detected codecs is lost. No current user of hdac is impacted,
but use of bus->codec_mask is planned in future patches
for SOF.
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20200206200223.7715-1-kai.vehmanen@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Before releasing the global mutex, we only unlink the hashtable
from the hash list, its proc file is still not unregistered at
this point. So syzbot could trigger a race condition where a
parallel htable_create() could register the same file immediately
after the mutex is released.
Move htable_remove_proc_entry() back to mutex protection to
fix this. And, fold htable_destroy() into htable_put() to make
the code slightly easier to understand.
Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com Fixes: c4a3922d2d20 ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There was a recent change in blktrace.c that added a RCU protection to
`q->blk_trace` in order to fix a use-after-free issue during access.
However the change missed an edge case that can lead to dereferencing of
`bt` pointer even when it's NULL:
Coverity static analyzer marked this as a FORWARD_NULL issue with CID 1460458.
```
/kernel/trace/blktrace.c: 1904 in sysfs_blk_trace_attr_store()
1898 ret = 0;
1899 if (bt == NULL)
1900 ret = blk_trace_setup_queue(q, bdev);
1901
1902 if (ret == 0) {
1903 if (attr == &dev_attr_act_mask)
>>> CID 1460458: Null pointer dereferences (FORWARD_NULL)
>>> Dereferencing null pointer "bt".
1904 bt->act_mask = value;
1905 else if (attr == &dev_attr_pid)
1906 bt->pid = value;
1907 else if (attr == &dev_attr_start_lba)
1908 bt->start_lba = value;
1909 else if (attr == &dev_attr_end_lba)
```
Added a reassignment with RCU annotation to fix the issue.
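A hedged sketch of the reassignment in sysfs_blk_trace_attr_store() (lock and helper names as in that kernel version; shown as an illustration):

```c
ret = 0;
if (bt == NULL) {
        ret = blk_trace_setup_queue(q, bdev);
        /* Re-read the newly allocated blk_trace under the mutex so the
         * assignments below never dereference a NULL bt. */
        bt = rcu_dereference_protected(q->blk_trace,
                                       lockdep_is_held(&q->blk_trace_mutex));
}

if (ret == 0) {
        if (attr == &dev_attr_act_mask)
                bt->act_mask = value;
        /* ... remaining attributes unchanged ... */
}
```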
Fixes: c780e86dd48 ("blktrace: Protect q->blk_trace with RCU") Cc: stable@vger.kernel.org Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Cengiz Can <cengiz@kernel.wtf> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the port is part of the modify mask, we should take it from the
qp_attr and not from the old pps. The same goes for the PKEY. Otherwise
there are panics in some configurations.
We are still experiencing some packet loss with the existing advanced
congestion buffering (ACB) settings with the IMP port configured for
2Gb/sec, so revert to conservative link speeds that do not produce
packet loss until this is resolved.
Fixes: 8f1880cbe8d0 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec") Fixes: de34d7084edd ("net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
We shouldn't need to hold the lock while we are just tearing down and
freeing the whole metadata pool structure.
Fixes: 44d8ebf436399a4 ("dm thin metadata: use pool locking at end of dm_pool_metadata_close") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
BFQ maintains an ordered list, implemented with an RB tree, of
head-request positions of non-empty bfq_queues. This position tree,
inherited from CFQ, is used to find bfq_queues that contain I/O close
to each other. BFQ merges these bfq_queues into a single shared queue,
if this boosts throughput on the device at hand.
There is, however, a special-purpose bfq_queue that does not participate
in queue merging: the oom bfq_queue. Yet even this bfq_queue could be
wrongly added to the position tree. So bfqq_find_close() could return
the oom bfq_queue, which is a source of further trouble in an
out-of-memory situation. This commit prevents the oom bfq_queue from
being inserted into the position tree.
Tested-by: Patrick Dung <patdung100@gmail.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
In bfq_bfqq_move(), the bfq_queue, say Q, to be moved to a new group
may happen to be deactivated in the scheduling data structures of the
source group (and then activated in the destination group). If Q is
referred only by the data structures in the source group when the
deactivation happens, then Q is freed upon the deactivation.
This commit addresses this issue by getting an extra reference before
the possible deactivation, and releasing this extra reference after Q
has been moved.
Tested-by: Chris Evich <cevich@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
BFQ schedules generic entities, which may represent either bfq_queues
or groups of bfq_queues. When an entity is inserted into a service
tree, a reference must be taken, to make sure that the entity does not
disappear while still referred in the tree. Unfortunately, such a
reference is mistakenly taken only if the entity represents a
bfq_queue. This commit takes a reference also in case the entity
represents a group.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Chris Evich <cevich@redhat.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is required for the auto-detection in the user space (for UCM).
Signed-off-by: Jaroslav Kysela <perex@perex.cz> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20191204211556.12671-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to the SDM, VMWRITE checks to see if the secondary source
operand corresponds to an unsupported VMCS field before it checks to
see if the secondary source operand corresponds to a VM-exit
information field and the processor does not support writing to
VM-exit information fields.
Fixes: 49f705c5324aa ("KVM: nVMX: Implement VMREAD and VMWRITE") Signed-off-by: Jim Mattson <jmattson@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Jon Cargille <jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the SDM, a VMWRITE in VMX non-root operation with an
invalid VMCS-link pointer results in VMfailInvalid before the validity
of the VMCS field in the secondary source operand is checked.
For consistency, modify both handle_vmwrite and handle_vmread, even
though there was no problem with the latter.
Fixes: 6d894f498f5d1 ("KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2") Signed-off-by: Jim Mattson <jmattson@google.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Jon Cargille <jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If thp defrag setting "defer" is used and a newline is *not* used when
writing to the sysfs file, this is interpreted as the "defer+madvise"
option.
This is because we do prefix matching and if five characters are written
without a newline, the current code ends up comparing to the first five
bytes of the "defer+madvise" option and using that instead.
Use the more appropriate sysfs_streq() that handles the trailing newline
for us. Since this doubles as a nice cleanup, do it in enabled_store()
as well.
The current implementation relies on prefix matching: the number of
bytes compared is either the number of bytes written or the length of
the option being compared. With a newline, "defer\n" does not match
"defer+madvise"; without a newline, however, "defer" is considered to
match "defer+madvise" (prefix matching only compares the first five
bytes). The end result is that writing "defer" is broken unless it has an
additional trailing character.
This means that writing "madv" in the past would match and set
"madvise". With strict checking, that no longer is the case but it is
unlikely anybody is currently doing this.
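A hedged sketch of the defrag_store() shape after the change; sysfs_streq() requires a full-string match and tolerates one trailing newline, so "defer" can no longer prefix-match "defer+madvise":

```c
static ssize_t defrag_store(struct kobject *kobj,
                            struct kobj_attribute *attr,
                            const char *buf, size_t count)
{
        if (sysfs_streq(buf, "always")) {
                /* set the direct-reclaim defrag flag, clear the others */
        } else if (sysfs_streq(buf, "defer+madvise")) {
                /* set the kswapd-or-madvise defrag flag, clear the others */
        } else if (sysfs_streq(buf, "defer")) {
                /* set the kswapd defrag flag, clear the others */
        } else if (sysfs_streq(buf, "madvise")) {
                /* set the madvise-only defrag flag, clear the others */
        } else if (sysfs_streq(buf, "never")) {
                /* clear all defrag flags */
        } else {
                return -EINVAL;
        }

        return count;
}
```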
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2001171411020.56385@chino.kir.corp.google.com Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option") Signed-off-by: David Rientjes <rientjes@google.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast().
This, combined with the fact that get_user_pages_fast() falls back to
"slow gup", which *does* accept FOLL_FORCE, leads to an odd situation:
if you need FOLL_FORCE, you cannot call get_user_pages_fast().
There does not appear to be any reason for filtering out FOLL_FORCE.
There is nothing in the _fast() implementation that requires that we
avoid writing to the pages. So it appears to have been an oversight.
Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().
Link: http://lkml.kernel.org/r/20200107224558.2362728-9-jhubbard@nvidia.com Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line")
inadvertently removed printing of page flags for pages that are neither
anon nor ksm nor have a mapping. Fix that.
Using pr_cont() again would be a solution, but the commit explicitly
removed its use. Avoiding the danger of mixing up split lines from
multiple CPUs might be beneficial for near-panic dumps like this, so fix
this without reintroducing pr_cont().
Link: http://lkml.kernel.org/r/9f884d5c-ca60-dc7b-219c-c081c755fab6@suse.cz Fixes: 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Anshuman Khandual <anshuman.khandual@arm.com> Reported-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Qian Cai <cai@lca.pw> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was found that two lines in the output of /proc/lockdep_stats have
an indentation problem:
# cat /proc/lockdep_stats
:
in-process chains: 25057
stack-trace entries: 137827 [max: 524288]
number of stack traces: 7973
number of stack hash chains: 6355
combined max dependencies: 1356414598
hardirq-safe locks: 57
hardirq-unsafe locks: 1286
:
All the numbers displayed in /proc/lockdep_stats except the two stack
trace numbers are formatted with a field width of 11. To properly align
all the numbers, a field width of 11 is now added to the two stack
trace numbers.
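A hedged sketch of the two seq_printf() calls with the added field width (counter helper names as introduced by the commit being fixed):

```c
seq_printf(m, " number of stack traces:        %11llu\n",
           lockdep_stack_trace_count());
seq_printf(m, " number of stack hash chains:   %11llu\n",
           lockdep_stack_hash_count());
```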
Fixes: 8c779229d0f4 ("locking/lockdep: Report more stack trace statistics") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lkml.kernel.org/r/20191211213139.29934-1-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Don't allow passing arbitrary flags as they change behavior including
memory allocation that the call stack is not prepared for.
Fixes: ddbca70cc45c ("xfs: allocate xattr buffer on demand") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At the time the brcmstb_thermal driver and its binding were merged, the
DT binding did not make the coefficients properties a mandatory one,
therefore all users of the brcmstb_thermal driver out there have a non
functional implementation with zero coefficients. Even if these
properties were provided, the formula used for computation is incorrect.
The coefficients are entirely process specific (right now, only 28nm is
supported) and not board or SoC specific, it is therefore appropriate to
hard code them in the driver given the compatibility string we are
probed with which has to be updated whenever a new process is
introduced.
We remove the existing coefficients definition since subsequent patches
are going to add support for a new process and will introduce new
coefficients as well.
We are not interested in getting this debug print on our
console all the time.
Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Stephan Gerhold <stephan@gerhold.net> Fixes: 6c375eccded4 ("thermal: db8500: Rewrite to be a pure OF sensor") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Stephan Gerhold <stephan@gerhold.net> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20191119074650.2664-1-linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fs/ubifs/debug.h:158:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘ino_t {aka unsigned int}’ [-Wformat=]
...
fs/ubifs/orphan.c:132:3: note: in expansion of macro ‘dbg_gen’
dbg_gen("deleted twice ino %lu", orph->inum);
...
fs/ubifs/orphan.c:140:3: note: in expansion of macro ‘dbg_gen’
dbg_gen("delete later ino %lu", orph->inum);
__kernel_ino_t is "unsigned long" on most architectures, but not on
alpha and s390x, where it is "unsigned int". Hence when printing an
ino_t, it should always be cast to "unsigned long" first.
Fix this by re-adding the recently removed casts.
Fixes: 8009ce956c3d2802 ("ubifs: Don't leak orphans on memory during commit") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current expedited RCU grace-period code expects that a task
requesting an expedited grace period cannot awaken until that grace
period has reached the wakeup phase. However, it is possible for a long
preemption to result in the waiting task never sleeping. For example,
consider the following sequence of events:
1. Task A starts an expedited grace period by invoking
synchronize_rcu_expedited(). It proceeds normally up to the
wait_event() near the end of that function, and is then preempted
(or interrupted or whatever).
2. The expedited grace period completes, and a kworker task starts
the awaken phase, having incremented the counter and acquired
the rcu_state structure's .exp_wake_mutex. This kworker task
is then preempted or interrupted or whatever.
3. Task A resumes and enters wait_event(), which notes that the
expedited grace period has completed, and thus doesn't sleep.
4. Task B starts an expedited grace period exactly as did Task A,
complete with the preemption (or whatever delay) just before
the call to wait_event().
5. The expedited grace period completes, and another kworker
task starts the awaken phase, having incremented the counter.
However, it blocks when attempting to acquire the rcu_state
structure's .exp_wake_mutex because step 2's kworker task has
not yet released it.
6. Steps 4 and 5 repeat, resulting in overflow of the rcu_node
structure's ->exp_wq[] array.
In theory, this is harmless. Tasks waiting on the various ->exp_wq[]
arrays will just be spuriously awakened, but they will simply sleep again
on noting that the rcu_state structure's ->expedited_sequence value has
not advanced far enough.
In practice, this wastes CPU time and is an accident waiting to happen.
This commit therefore moves the rcu_exp_gp_seq_end() call that officially
ends the expedited grace period (along with associate tracing) until
after the ->exp_wake_mutex has been acquired. This prevents Task A from
awakening prematurely, thus preventing more than one expedited grace
period from being in flight during a previous expedited grace period's
wakeup phase.
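A hedged, simplified sketch of the reordering in rcu_exp_wait_wake() (exact function and tracepoint names may differ slightly across kernel versions):

```c
static void rcu_exp_wait_wake(unsigned long s)
{
        /* Wait for the expedited GP machinery to finish. */
        synchronize_rcu_expedited_wait();

        /*
         * Take the wake mutex *before* officially ending the grace period,
         * so a task starting the next expedited GP cannot observe it as
         * ended while the previous wakeup phase is still in flight.
         */
        mutex_lock(&rcu_state.exp_wake_mutex);
        rcu_exp_gp_seq_end();
        trace_rcu_exp_grace_period(rcu_state.name, s, TPS("end"));

        /* ... wake everything queued on the rcu_node ->exp_wq[] arrays ... */

        mutex_unlock(&rcu_state.exp_wake_mutex);
}
```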
Fixes: 3b5f668e715b ("rcu: Overlap wakeups with next expedited grace period") Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
[ paulmck: Added updated comment. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove a bogus clearing of apf.msr_val from kvm_arch_vcpu_destroy().
apf.msr_val is only set to a non-zero value by kvm_pv_enable_async_pf(),
which is only reachable by kvm_set_msr_common(), i.e. by writing
MSR_KVM_ASYNC_PF_EN. KVM does not autonomously write said MSR, i.e.
can only be written via KVM_SET_MSRS or KVM_RUN. Since KVM_SET_MSRS and
KVM_RUN are vcpu ioctls, they require a valid vcpu file descriptor.
kvm_arch_vcpu_destroy() is only called if KVM_CREATE_VCPU fails, and KVM
declares KVM_CREATE_VCPU successful once the vcpu fd is installed and
thus visible to userspace. Ergo, apf.msr_val cannot be non-zero when
kvm_arch_vcpu_destroy() is called.
Fixes: 344d9588a9df0 ("KVM: Add PV MSR to enable asynchronous page faults delivery.") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
x86 does not load its MMU until KVM_RUN, which cannot be invoked until
after vCPU creation succeeds. Given that kvm_arch_vcpu_destroy() is
called if and only if vCPU creation fails, it is impossible for the MMU
to be loaded.
Note, the bogus kvm_mmu_unload() call was added during an unrelated
refactoring of vCPU allocation, i.e. was presumably added as an
opportunistic "fix" for a perceived leak.
Fixes: fb3f0f51d92d1 ("KVM: Dynamically allocate vcpus") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, there are three static keys in the resctrl file system:
rdt_mon_enable_key and rdt_alloc_enable_key indicate if the monitoring
feature and the allocation feature are enabled, respectively. The
rdt_enable_key is enabled when either the monitoring feature or the
allocation feature is enabled.
If no monitoring feature is present (either hardware doesn't support a
monitoring feature or the feature is disabled by the kernel command line
option "rdt="), rdt_enable_key is still enabled but rdt_mon_enable_key
is disabled.
MBM is a monitoring feature. The MBM overflow handler intends to
check whether the monitoring feature is not enabled, for a fast return.
So check rdt_mon_enable_key in it instead of rdt_enable_key, as the
former is the more accurate check.
`tools/perf/util/map.c` has a function named `maps__insert` that
acquires a write lock when it runs in a multithreaded context.
Even though this lock is released when the function completes successfully,
there is a branch, executed when `maps_by_name == NULL`, that returns
from the function without releasing the write lock.
Added an `up_write` to release the lock when this happens.
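A hedged sketch of the error path in maps__insert() (helper name approximated from tools/perf/util/map.c):

```c
down_write(&maps->lock);
/* ... grow the maps_by_name array ... */
if (maps_by_name == NULL) {
        __maps__free_maps_by_name(maps);
        up_write(&maps->lock);  /* previously missing: don't leak the write lock */
        return;
}
/* ... */
up_write(&maps->lock);
```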
Fixes: a7c2b572e217 ("perf map_groups: Auto sort maps by name, if needed") Signed-off-by: Cengiz Can <cengiz@kernel.wtf> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200120141553.23934-1-cengiz@kernel.wtf Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we moved zalloc.o to the library, we missed the gtk library, which
needs it compiled in; otherwise the missing __zfree symbol will cause the
library to fail to load.
Adding the zalloc object to the gtk library build.
Fixes: 7f7c536f23e6 ("tools lib: Adopt zalloc()/zfree() from tools/perf") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jelle van der Waa <jelle@vdwaa.nl> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200113104358.123511-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to set actions->ms.map since 599a2f38a989 ("perf hists browser:
Check sort keys before hot key actions"), as in that patch we bail out
if map is NULL.
Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 599a2f38a989 ("perf hists browser: Check sort keys before hot key actions") Link: https://lkml.kernel.org/n/tip-wp1ssoewy6zihwwexqpohv0j@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/pwm/pwm-omap-dmtimer.c:304:2-8: ERROR: missing put_device;
call of_find_device_by_node on line 255, but without a corresponding
object release within this function.
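A hedged sketch of the required cleanup (the error label and the failing step are illustrative, not the driver's exact code):

```c
timer_pdev = of_find_device_by_node(timer);
if (!timer_pdev) {
        dev_err(&pdev->dev, "Unable to find timer pdev\n");
        return -ENODEV;
}

ret = setup_dmtimer(omap, timer_pdev);          /* hypothetical setup step */
if (ret)
        goto err_put_timer;                     /* don't leak the device ref */

return 0;

err_put_timer:
        put_device(&timer_pdev->dev);           /* balance of_find_device_by_node() */
        return ret;
```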
Reported-by: Markus Elfring <elfring@users.sourceforge.net> Fixes: 6604c6556db9 ("pwm: Add PWM driver for OMAP using dual-mode timers") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The low resolution parts of the VDSO data (the coarse clock getters)
can be used even if there is no VDSO-capable clocksource.
But if an architecture opts out of the VDSO data update, then this
information becomes stale. This affects ARM when there is no architected
timer available. The lack of update causes userspace to use stale data
forever.
Make the update of the low resolution parts unconditional and only skip
the update of the high resolution parts if the architecture requests it.
Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200114185946.765577901@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function name suggests that this is a boolean checking whether the
architecture asks for an update of the VDSO data, but it works the other
way round. To spare further confusion invert the logic.
Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200114185946.656652824@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Set the unoptimized flag after confirming the code is completely
unoptimized. Without this fix, when a kprobe hits the intermediate
modified instruction (the first byte is replaced by an INT3, but
later bytes can still be a jump address operand) while unoptimizing,
it can return to the middle byte of the modified code, which causes
an invalid instruction exception in the kernel.
Usually, this is a rare case, but if we put a probe on the function
call while text patching, it always causes a kernel panic as below:
text_poke() is used for patching the code in optprobes.
This can happen even if we blacklist text_poke() and other functions,
because there is a small time window during which we show the intermediate
code to other CPUs.
[ mingo: Edited the changelog. ]
Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bristot@redhat.com Fixes: 6274de4984a6 ("kprobes: Support delayed unoptimizing") Link: https://lkml.kernel.org/r/157483422375.25881.13508326028469515760.stgit@devnote2 Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keep the IMA policy rules around from the beginning even if they appear
invalid at the time of loading, as they may become active after an LSM
policy load. However, loading a custom IMA policy with unknown LSM
labels is only safe after we have transitioned from the "built-in"
policy rules to a custom IMA policy.
The patch also fixes rule re-use during the LSM policy reload and makes
some prints a bit more human readable.
Changelog:
v4:
- Do not allow the initial policy load refer to non-existing lsm rules.
v3:
- Fix too wide policy rule matching for non-initialized LSMs
v2:
- Fix log prints
'alloc_etherdev_mqs()' expects first 'tx', then 'rx'. The semantics here
look reversed.
Reorder the arguments passed to 'alloc_etherdev_mqs()' in order to keep
the correct semantics.
In fact, this is a no-op because both XGENE_NUM_[RT]X_RING are 8.
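For reference, a hedged sketch of the corrected call (private struct name as used by the xgene driver):

```c
/* alloc_etherdev_mqs(sizeof_priv, txqs, rxqs): the TX queue count is
 * the second argument, the RX queue count the third. */
ndev = alloc_etherdev_mqs(sizeof(struct xgene_enet_pdata),
                          XGENE_NUM_TX_RING, XGENE_NUM_RX_RING);
if (!ndev)
        return -ENOMEM;
```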
Fixes: 107dec2749fe ("drivers: net: xgene: Add support for multiple queues") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver should first check whether the sge is valid, then fill the valid
sge and the calculated total into hardware; otherwise invalid sges will
cause an error.
Fixes: 52e3b42a2f58 ("RDMA/hns: Filter for zero length of sge in hip08 kernel mode") Fixes: 7bdee4158b37 ("RDMA/hns: Fill sq wqe context of ud type in hip08") Link: https://lore.kernel.org/r/1578571852-13704-1-git-send-email-liweihang@huawei.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the wqe idx is calculated repeatedly everywhere it is used. This
patch defines wqe_idx and calculates it only once, then just uses it as
needed.
Fixes: 2d40788825ac ("RDMA/hns: Add support for processing send wr and receive wr") Link: https://lore.kernel.org/r/1575981902-5274-1-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a test case can corrupt f2fs image:
- dd if=/dev/zero of=/swapfile bs=1M count=4096
- chmod 600 /swapfile
- mkswap /swapfile
- swapon --discard /swapfile
The root cause is that f2fs_swap_activate() intends to return a zero value
to setup_swap_extents() to enable SWP_FS mode (the swap file goes through
the fs). In this flow, setup_swap_extents() sets up swap extents with a
wrong block address range, resulting in discard_swap() erasing an
incorrect address range.
Because f2fs_swap_activate() has pinned the swapfile, its data block
addresses will not change, so it's safe to let swap handle IO through the
raw device. We can therefore get rid of SWP_FS mode and initialize the
swap extents inside f2fs_swap_activate(); this way, the later
discard_swap() can trim the right address range.
Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
select_idle_cpu() will scan the LLC domain for idle CPUs,
which is always expensive. So the follow-up commit:
1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()")
introduced a way to limit how many CPUs we scan.
But it consumes some of the 'nr' attempts on CPUs that the task is not
allowed to run on, thus wasting our attempts. The function may then
always return nr_cpumask_bits, even though a CPU the task is allowed to
run on exists.
The cpumask may be too big to put on the stack, so, similarly to
select_idle_core(), use the per-CPU 'select_idle_mask' to prevent stack
overflow.
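A hedged, simplified sketch of the reworked scan (the surrounding 'nr' throttling setup and variables sd, p, target are elided):

```c
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
int cpu;

/* Only ever scan CPUs the task is allowed to run on, so none of the
 * 'nr' attempts are wasted on disallowed CPUs. */
cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

for_each_cpu_wrap(cpu, cpus, target) {
        if (!--nr)
                return -1;
        if (available_idle_cpu(cpu))
                break;
}
```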
Fixes: 1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()") Signed-off-by: Cheng Jian <cj.chengjian@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20191213024530.28052-1-cj.chengjian@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When reading/writing using the guest/host cache, check for a bad hva
before checking for a NULL memslot, which triggers the slow path for
handing cross-page accesses. Because the memslot is nullified on error
by __kvm_gfn_to_hva_cache_init(), if the bad hva is encountered after
crossing into a new page, then the kvm_{read,write}_guest() slow path
could potentially write/access the first chunk prior to detecting the
bad hva.
Arguably, performing a partial access is semantically correct from an
architectural perspective, but that behavior is certainly not intended.
In the original implementation, memslot was not explicitly nullified
and therefore the partial access behavior varied based on whether the
memslot itself was null, or if the hva was simply bad. The current
behavior was introduced as a seemingly unintentional side effect in
commit f1b9dd5eb86c ("kvm: Disallow wraparound in
kvm_gfn_to_hva_cache_init"), which justified the change with "since some
callers don't check the return code from this function, it seems
prudent to clear ghc->memslot in the event of an error".
Regardless of intent, the partial access is dependent on _not_ checking
the result of the cache initialization, which is arguably a bug in its
own right, at best simply weird.
Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.") Cc: Jim Mattson <jmattson@google.com> Cc: Andrew Honig <ahonig@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The KVM MMIO support uses bit 51 as the reserved bit to cause nested page
faults when a guest performs MMIO. The AMD memory encryption support uses
a CPUID function to define the encryption bit position. Given this, it is
possible that these bits can conflict.
Use svm_hardware_setup() to override the MMIO mask if memory encryption
support is enabled. Various checks are performed to ensure that the mask
is properly defined and rsvd_bits() is used to generate the new mask (as
was done prior to the change that necessitated this patch).
Fixes: 28a1f3ac1d0c ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 800d3f561659 ("perf report: Add warning when libunwind not
compiled in") breaks the s390 platform. S390 uses libdw-dwarf-unwind for
call chain unwinding and had no support for libunwind.
So the warning "Please install libunwind development packages during the
perf build." caused the confusion even if the call-graph is displayed
correctly.
This patch adds checking for HAVE_DWARF_SUPPORT, which is set when
libdw-dwarf-unwind is compiled in.
Fixes: 800d3f561659 ("perf report: Add warning when libunwind not compiled in") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200107191745.18415-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 7afb94da3cd8 ("mwifiex: update set_mac_address logic") fixed the
only user of this function, partly because the author seems to have
noticed that, as written, it's on the borderline between highly
misleading and buggy.
Anyway, no sense in keeping dead code around: let's drop it.
Fixes: 7afb94da3cd8 ("mwifiex: update set_mac_address logic") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before commit 1e58252e334d ("mwifiex: Fix heap overflow in
mmwifiex_process_tdls_action_frame()"),
mwifiex_process_tdls_action_frame() already had too many magic numbers.
But this commit just added a ton more, in the name of checking for
buffer overflows. That seems like a really bad idea.
Let's make these magic numbers a little less magic, by
(a) factoring out 'pos[1]' as 'ie_len'
(b) using 'sizeof' on the appropriate source or destination fields where
possible, instead of bare numbers
(c) dropping redundant checks, per below.
Regarding redundant checks: the beginning of the loop has this:
if (pos + 2 + pos[1] > end)
break;
but then individual 'case's include stuff like this:
if (pos > end - 3)
return;
if (pos[1] != 1)
return;
Note that the second 'return' (validating the length, pos[1]) combined
with the above condition (ensuring 'pos + 2 + length' doesn't exceed
'end'), makes the first 'return' (whose 'if' can be reworded as 'pos >
end - pos[1] - 2') redundant. Rather than unwind the magic numbers
there, just drop those conditions.
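A hedged sketch of the resulting pattern (the element id and destination field are illustrative; pos, end and ie_len are assumed to be declared by the surrounding function):

```c
for (pos = buf; pos + 1 < end; pos += 2 + ie_len) {
        ie_len = pos[1];
        if (pos + 2 + ie_len > end)
                break;                          /* single bounds check */

        switch (*pos) {
        case WLAN_EID_SUPP_RATES:
                /* bound copies by the destination size, not magic numbers */
                if (ie_len > sizeof(sta_ptr->tdls_cap.rates))
                        return;
                memcpy(sta_ptr->tdls_cap.rates, pos + 2, ie_len);
                break;
        /* ... other elements handled the same way ... */
        }
}
```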
Fixes: 1e58252e334d ("mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's over-zealous to return hard errors under RCU-walk here, given that
a REF-walk will be triggered for all other cases handling ".." under
RCU.
The original purpose of this check was to ensure that if a rename occurs
such that a directory is moved outside of the bind-mount which the
resolution started in, it would be detected and blocked to avoid being
able to mess with paths outside of the bind-mount. However, triggering a
new REF-walk is just as effective a solution.
Cc: "Eric W. Biederman" <ebiederm@xmission.com> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 9546a0b7ce00 ("tipc: fix wrong connect() return code"), we
fixed the issue of 'connect()' returning zero even though the connection
has failed, by really waiting for the connection to become 'ESTABLISHED'.
However, the approach has one drawback in conjunction with our
'lightweight' connection setup mechanism: the following scenario can
happen:
Upon receipt of the server 'ACK', the client becomes 'ESTABLISHED'
and the 'wait_for_conn()' process is woken up but not yet run. Meanwhile,
the server starts to send a number of 'DATA' messages followed shortly by
a 'close()', without waiting for any response from the client, which
forces the client socket into 'DISCONNECTING' immediately. When the wait
process finally gets to run, it continues to wait until the timer
expires because of the unexpected socket state. The client 'connect()'
will finally get '-ETIMEDOUT' and be forced to release the socket even
though messages remain in its receive queue.
Obviously the issue would not happen if the server had some delay prior
to its 'close()' (or the number of 'DATA' messages is large enough),
but any kind of delay would make the connection setup/shutdown "heavy".
We solve this by simply allowing 'connect()' to return zero in this
particular case. The socket is already 'DISCONNECTING', so any further
write will get '-EPIPE', but the socket is still able to read the
messages existing in its receive queue.
Note: This solution doesn't break the previous one as it deals with a
different situation that the socket state is 'DISCONNECTING' but has no
error (i.e. sk->sk_err = 0).
Fixes: 9546a0b7ce00 ("tipc: fix wrong connect() return code") Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second
timeout per test") added a 45 second timeout for tests, and also added
a way for tests to customise the timeout via a settings file.
For example the ftrace tests take multiple minutes to run, so they
were given longer in commit b43e78f65b1d ("tracing/selftests: Turn off
timeout setting").
This works when the tests are run from the source tree. However if the
tests are installed with "make -C tools/testing/selftests install",
the settings files are not copied into the install directory. When the
tests are then run from the install directory the longer timeouts are
not applied and the tests timeout incorrectly.
So add the settings files to TEST_FILES of the appropriate Makefiles
to cause the settings files to be installed using the existing install
logic.
Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fix static checker warning:
drivers/net/ethernet/aquantia/atlantic/aq_filters.c:166 aq_check_approve_fvlan()
error: passing untrusted data to 'test_bit()'
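A hedged sketch of the sanitization (the bitmap field name is an assumption; fsp is the ethtool_rx_flow_spec coming from user space):

```c
/* Mask the user-supplied TCI down to the 12-bit VLAN id range so it can
 * never index past the active-VLANs bitmap passed to test_bit(). */
u16 vid = be16_to_cpu(fsp->h_ext.vlan_tci) & VLAN_VID_MASK;

if (!test_bit(vid, aq_nic->active_vlans))
        return -EINVAL;
```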
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 7975d2aff5af: ("net: aquantia: add support of rx-vlan-filter offload") Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During hibernation freeze, aq_nic_stop could be invoked
on an already stopped device. That may cause a panic on access to
not-yet-allocated vector/ring structures.
Add a check to stop the device only if it is not yet stopped.
Similarly, after freeze, on hibernation thaw, aq_nic_start
could be invoked on a not-yet-initialized net device,
with the same result.
Add a check to start the device only if it is initialized.
In our case, this is the same as started.
Fixes: 8aaa112a57c1 ("net: atlantic: refactoring pm logic") Signed-off-by: Pavel Belous <pbelous@marvell.com> Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Code inspection found that in case of a mapping error we return the current
'ret' value. But besides the error, it is used to count the number of
descriptors allocated for the packet, so in that case the map_skb function
could return '1'.
Change it to return zero (the number of mapped descriptors for the skb).
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Signed-off-by: Pavel Belous <pbelous@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
skb->len is used to calculate statistics after the xmit invocation.
Under a stress load it may happen that the skb is transmitted, the rx
interrupt comes in and the skb is freed, all before the xmit function
has even returned.
Eventually, skb->len would access an unallocated area.
Move the stats calculation into the tx_clean routine.
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Reported-by: Christophe Vu-Brugier <cvubrugier@fastmail.fm> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Pavel Belous <pbelous@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add checks to not enable multiple loopback modes simultaneously.
It was also discovered that for DMA loopback to function correctly,
promiscuous mode should be enabled on the device.
Yet another checksum offload compatibility issue was found.
The known issue is that AQC HW marks TCP packets with a 0xFFFF checksum
as invalid (1). This is worked around in the driver by passing all the
suspicious packets up to the stack for further csum validation.
Another HW problem (2) is that it hides the invalid csum of LRO-aggregated
packets inside the individual descriptors. That was worked around
by a forced scan of all LRO descriptors for checksum errors.
However, the scan logic was shared between LRO and multi-descriptor
packets (jumbos), and this causes the issue.
We have to drop LRO packets with a detected bad checksum
because of (2), but we have to pass jumbo packets to the stack because of (1).
When using a Windows TCP partner with jumbo frames but with LSO disabled,
the driver discards such frames as badly checksummed. But only LRO frames
should be dropped, not jumbos.
On such configurations the TCP stream has a chance of drops and stalls.
(1) 76f254d4afe2 ("net: aquantia: tcp checksum 0xffff being handled incorrectly")
(2) d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum")
Fixes: d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum") Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since nl_groups is a u32, we can't bind more than 32 groups via the ->bind
(netlink_bind) call, but netlink has supported more groups via
setsockopt() for a long time and thus nlk->ngroups could be over 32.
Recently I added support for per-vlan notifications and increased the
groups to 33 for NETLINK_ROUTE which exposed an old bug in the
netlink_bind() code causing out-of-bounds access on archs where unsigned
long is 32 bits via test_bit() on a local variable. Fix this by capping the
maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively
capping them at 32 which is the minimum of allocated groups and the
maximum groups which can be bound via netlink_bind().
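A hedged sketch of the idea in netlink_bind() (the exact surrounding code differs; BITS_PER_TYPE() is from linux/bits.h):

```c
/* nladdr->nl_groups is a u32, so at most 32 group bits can be requested
 * through bind(); never let test_bit() look past them even when the
 * protocol allocated more groups (nlk->ngroups > 32) via setsockopt(). */
if (nlk->ngroups < BITS_PER_TYPE(u32))
        groups &= (1UL << nlk->ngroups) - 1;

/* ... later loops over the requested groups are bounded by
 * min(nlk->ngroups, BITS_PER_TYPE(u32)) ... */
```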
CC: Christophe Leroy <christophe.leroy@c-s.fr> CC: Richard Guy Briggs <rgb@redhat.com> Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.") Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The RX copybreak is intended as the _max_ value where the frame's data
should be copied. So for frame_len == copybreak, don't build an SG skb.
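In other words (a trivial sketch, not the driver's actual helper):
  #include <stdbool.h>

  /* copybreak is the maximum length that should still be copied,
   * so only strictly larger frames take the SG path */
  static bool build_sg_skb(unsigned int frame_len, unsigned int copybreak)
  {
          return frame_len > copybreak;
  }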
Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When getting or setting VNICC parameters, the error code EOPNOTSUPP
should have precedence over EBUSY.
EBUSY is used because the vnicc feature and the bridgeport feature are
mutually exclusive, which is a temporary condition, whereas EOPNOTSUPP
indicates that the HW does not support all or parts of the vnicc feature.
This issue causes the vnicc sysfs params to show 'blocked by bridgeport'
for HW that does not support VNICC at all.
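A sketch of the intended precedence (simplified, hypothetical helper):
  #include <errno.h>
  #include <stdbool.h>

  static int vnicc_check(bool hw_supports_vnicc, bool bridgeport_active)
  {
          if (!hw_supports_vnicc)
                  return -EOPNOTSUPP;  /* permanent limitation: report this first */
          if (bridgeport_active)
                  return -EBUSY;       /* temporary: mutually exclusive features */
          return 0;
  }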
Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Completions need to be consumed in the same order the controller submitted
them, otherwise future completion entries may overwrite ones we haven't
handled yet. Hold the nvme queue's poll lock while completing new CQEs to
prevent another thread from freeing command tags for reuse out-of-order.
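A user-space stand-in for the locking pattern (a pthread mutex in place of the nvme queue's poll spinlock; reap_one() is a hypothetical callback that handles one CQE and advances the head):
  #include <pthread.h>

  struct poll_queue {
          pthread_mutex_t cq_poll_lock;
          unsigned int head, end;
  };

  static unsigned int poll_cq(struct poll_queue *q,
                              void (*reap_one)(struct poll_queue *))
  {
          unsigned int found = 0;

          pthread_mutex_lock(&q->cq_poll_lock);
          while (q->head != q->end) {     /* consume strictly in order */
                  reap_one(q);            /* handles one CQE, advances q->head */
                  found++;
          }
          pthread_mutex_unlock(&q->cq_poll_lock);
          return found;
  }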
Fixes: dabcefab45d3 ("nvme: provide optimized poll function for separate poll queues") Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When netvsc_attach() is called by operations like changing the MTU, etc.,
an extra wakeup may happen while netvsc_attach() is calling
rndis_filter_device_add(), which sends RNDIS messages while the queue is
still stopped by netvsc_detach(). The completion message will wake up queue 0.
We can reproduce the issue by changing MTU etc., then the wake_queue
counter from "ethtool -S" will increase beyond stop_queue counter:
stop_queue: 0
wake_queue: 1
The issue causes a queue wakeup and a counter increment but no other ill
effects in the current code, so no network problem has been observed so far.
To fix this, initialize tx_disable to true, and set it to false when
the NIC is ready to be attached or registered.
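A minimal sketch of the state change (hypothetical context struct):
  #include <stdbool.h>

  struct netvsc_ctx { bool tx_disable; };

  static void netvsc_ctx_init(struct netvsc_ctx *ctx)
  {
          ctx->tx_disable = true;   /* start disabled; was false before the fix */
  }

  static void netvsc_ctx_ready(struct netvsc_ctx *ctx)
  {
          ctx->tx_disable = false;  /* cleared once attach/register completes */
  }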
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This if_change_rule is not working properly; it cannot detect any
command line change.
The reason is that cmd-check in scripts/Kbuild.include compares
$(cmd_$@) and $(cmd_$1), but cmd_dtc_dt_yaml does not exist here.
For if_change_rule to work properly, the stem part of cmd_* and rule_*
must match. Because this cmd_and_fixdep invokes cmd_dtc, this rule must
be named rule_dtc.
Fixes: 4f0e3a57d6eb ("kbuild: Add support for DT binding schema checks") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The below-mentioned commit changed the code to unlock *inside*
the function, but previously the unlock was *outside*. It failed
to remove the outer unlock, however, leading to double unlock.
Fix this.
Fixes: 33483a6b88e4 ("mac80211: fix missing unlock on error in ieee80211_mark_sta_auth()") Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Link: https://lore.kernel.org/r/20200221104719.cce4741cf6eb.I671567b185c8a4c2409377e483fd149ce590f56d@changeid
[rewrite commit message to better explain what happened] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ALL_ENGINES reset doesn't clobber the display on the platforms currently
supported by gvt-g. Thus an ALL_ENGINES reset shouldn't reset the
display engine registers emulated by gvt-g.
We mark the vma as active while binding it in order to protect ourselves
from being shrunk under memory pressure. This only works if we are strict
about not attempting to shrink active objects.
The Cavium Octeon CPU uses a special sync instruction for implementing
wmb, and due to a CPU bug, the instruction must appear twice. A macro
had been defined to hide this; its repeat-count expression, of the form
'1 + (type == __SYNC_wmb)', was intended to evaluate to 2 for __SYNC_wmb,
and 1 for any other
type of sync. However, this expression is evaluated by the assembler,
and not the compiler, and the result of '==' in the assembler is 0 or
-1, not 0 or 1 as it is in C. The net result was wmb() producing no code
at all. The simple fix in this patch is to change the '+' to '-'.
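A small stand-alone illustration of the arithmetic (the real expression lives in a MIPS assembly header; this only shows why '+' gives 0 and '-' gives 2 under the assembler's -1-for-true convention):
  #include <stdio.h>

  int main(void)
  {
          int as_true = -1;  /* GNU as evaluates a true '==' to -1, not 1 */

          printf("1 + true -> %d sync instructions\n", 1 + as_true);  /* 0 */
          printf("1 - true -> %d sync instructions\n", 1 - as_true);  /* 2 */
          return 0;
  }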
Fixes: bf92927251b3 ("MIPS: barrier: Add __SYNC() infrastructure") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The printout for txabrt is way too talkative and is highly annoying with
scanning programs like 'i2cdetect'. Reduce it to the minimum; the rest
can be gained from I2C core debugging and datasheet information. Also,
make it a debug printout, it won't help the regular user.
Fixes: ba92222ed63a ("i2c: jz4780: Add i2c bus controller driver for Ingenic JZ4780") Reported-by: H. Nikolaus Schaller <hns@goldelico.com> Tested-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pointer to the memory allocated by 'alloc_progmem()' is stored in
'v->load_addr', so this is the memory that should be freed by
'release_progmem()'.
'release_progmem()' is only a call to 'kfree()'.
With the current code, there is both a double free and a memory leak.
Fix it by passing the correct pointer to 'release_progmem()'.
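A simplified sketch of the corrected cleanup (user-space free() standing in for release_progmem()/kfree()):
  #include <stdlib.h>

  struct vpe { void *load_addr; };

  static void release_progmem(void *p) { free(p); }  /* stand-in for kfree() */

  static void vpe_cleanup(struct vpe *v)
  {
          release_progmem(v->load_addr);  /* free what alloc_progmem() returned */
          v->load_addr = NULL;
  }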
Fixes: e01402b115ccc ("More AP / SP bits for the 34K, the Malta bits and things. Still wants") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: ralf@linux-mips.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kernel-janitors@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Historically, we have been enabling all interrupts for each
HART in trap_init(). Ideally, we should only enable M-mode
interrupts for M-mode kernel and S-mode interrupts for S-mode
kernel in trap_init().
Currently, we get spurious S-mode interrupts on the Kendryte K210
board running an M-mode NO-MMU kernel because we are enabling all
interrupts in trap_init(). To fix this, we only enable the software
and external interrupts in trap_init(). In the future, trap_init()
will only enable the software interrupt, and the PLIC driver will enable
the external interrupt using CPU notifiers.
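A sketch of the mask change (illustrative bit positions, not a verbatim copy of the CSR defines):
  #include <stdint.h>

  #define IE_SOFT (1u << 1)  /* software interrupt enable, illustrative */
  #define IE_EXT  (1u << 9)  /* external interrupt enable, illustrative */

  static uint32_t trap_init_ie_mask(void)
  {
          /* previously all bits were set, which also enabled interrupts
           * this kernel does not handle */
          return IE_SOFT | IE_EXT;
  }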
Fixes: a4c3733d32a7 ("riscv: abstract out CSR names for supervisor vs machine mode") Signed-off-by: Anup Patel <anup.patel@wdc.com> Reviewed-by: Atish Patra <atish.patra@wdc.com> Tested-by: Palmer Dabbelt <palmerdabbelt@google.com> [QMEU virt machine with SMP]
[Palmer: Move the Fixes up to a newer commit] Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Syzbot reports that "hiddev" is used after it's freed in hiddev_disconnect().
The hiddev_disconnect() function sets "hiddev->exist = 0;" so
hiddev_release() can free it as soon as we drop the "existancelock"
lock. This patch moves the mutex_unlock(&hiddev->existancelock) to
after we have finished using it.
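A user-space sketch of the reordering (a pthread mutex standing in for existancelock; last_use() is a placeholder for the remaining accesses):
  #include <pthread.h>

  struct hiddev_stub {
          pthread_mutex_t existancelock;
          int exist;
  };

  static void disconnect(struct hiddev_stub *h,
                         void (*last_use)(struct hiddev_stub *))
  {
          pthread_mutex_lock(&h->existancelock);
          h->exist = 0;
          last_use(h);                             /* finish every access first */
          pthread_mutex_unlock(&h->existancelock); /* only now may release free it */
  }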
There are two issues:
- if 'input_allocate_device()' fails and returns NULL, there is no need
  to free anything and the 'input_free_device()' call is a no-op. It can
  be axed.
- 'ret' is known to be 0 at this point, so we must set it to a
  meaningful value before returning.
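A self-contained sketch of the corrected error path (a stand-in allocator replaces input_allocate_device()):
  #include <errno.h>
  #include <stdlib.h>

  struct input_dev { int unused; };

  static struct input_dev *stub_input_allocate_device(void)
  {
          return calloc(1, sizeof(struct input_dev));
  }

  static int probe_input(struct input_dev **out)
  {
          struct input_dev *idev = stub_input_allocate_device();

          if (!idev)
                  return -ENOMEM;  /* nothing to free, and never a stale 0 */
          *out = idev;
          return 0;
  }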
When the forceadd option is enabled, the hash:* types should find and replace
the first entry in the bucket with the new one if there are no reusable
(deleted or timed out) entries. However, the position index was just not set
to zero and remained the invalid -1 if there were no reusable entries.
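A sketch of the slot selection after the fix (hypothetical helper; the real code works on the hash bucket directly):
  /* reusable_idx is -1 when no deleted/timed-out slot was found */
  static int forceadd_pick_slot(int reusable_idx, int forceadd)
  {
          if (reusable_idx >= 0)
                  return reusable_idx;
          return forceadd ? 0 : -1;  /* previously stayed -1 even with forceadd */
  }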
Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>