Vincenzo Frascino [Tue, 24 Nov 2020 05:45:13 +0000 (16:45 +1100)]
arm64: mte: convert gcr_user into an exclude mask
The gcr_user mask is a per thread mask that represents the tags that are
excluded from random generation when the Memory Tagging Extension is
present and an 'irg' instruction is invoked.
gcr_user affects the behavior on EL0 only.
Currently that mask is an include mask and it is controlled by the user
via prctl() while GCR_EL1 accepts an exclude mask.
Convert the include mask into an exclude one to make the register setting
easier.
Note: This change will affect gcr_kernel (for EL1) introduced with a
future patch.
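As a rough sketch of the conversion (the field and helper names here are
assumptions based on the existing prctl() MTE interface, not taken from the
patch itself), the change boils down to inverting the include mask from
userspace before storing it:

	/* sketch: prctl() still passes an include mask from userspace */
	u64 incl = (arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT;

	/* store the complement, which is what GCR_EL1.Exclude expects */
	current->thread.gcr_user_excl = ~incl & SYS_GCR_EL1_EXCL_MASK;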
Vincenzo Frascino [Tue, 24 Nov 2020 05:45:13 +0000 (16:45 +1100)]
arm64: kasan: allow enabling in-kernel MTE
Hardware tag-based KASAN relies on the Memory Tagging Extension (MTE)
feature and requires it to be enabled.
This patch adds a new mte_enable_kernel() helper, that enables MTE in
Synchronous mode in EL1 and is intended to be called from KASAN runtime
during initialization.
The Tag Checking operation causes a synchronous data abort as a
consequence of a tag check fault when MTE is configured in synchronous
mode.
As part of this change enable match-all tag for EL1 to allow the kernel to
access user pages without faulting. This is required because the kernel
does not have knowledge of the tags set by the user in a page.
Note: For MTE, the TCF bit field in SCTLR_EL1 affects only EL1 in a
similar way as TCF0 affects EL0.
MTE is built on top of the Top Byte Ignore (TBI) feature, hence we enable
TBI as part of this patch as well.
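A minimal sketch of the helper described above (the register accessors and
mask names are assumptions, not quoted from the patch):

void mte_enable_kernel(void)
{
	/* Enable MTE Sync Mode for EL1 */
	sysreg_clear_set(sctlr_el1, SCTLR_ELx_TCF_MASK, SCTLR_ELx_TCF_SYNC);
	isb();
}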
Vincenzo Frascino [Tue, 24 Nov 2020 05:45:13 +0000 (16:45 +1100)]
arm64: mte: reset the page tag in page->flags
Hardware tag-based KASAN, for compatibility with the other modes, stores
the tag associated with a page in page->flags. Due to this, the kernel
faults on access when it allocates a page with an initial tag and the user
subsequently changes the tags.
Reset the tag associated with a page by the kernel in all the meaningful
places to prevent kernel faults on access.
Note: An alternative to this approach could be to modify page_to_virt().
That, though, could end up being racy: if a CPU checks the PG_mte_tagged
bit and decides that the page is not tagged, but another CPU maps the same
page with PROT_MTE and it becomes tagged, the subsequent kernel access
would fail.
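For reference, resetting the tag amounts to rewriting the tag bits kept in
page->flags; a sketch modelled on the existing page_kasan_tag_set() helper
in include/linux/mm.h (taken here as an assumption about the surrounding
code):

static inline void page_kasan_tag_set(struct page *page, u8 tag)
{
	page->flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT);
	page->flags |= (tag & KASAN_TAG_MASK) << KASAN_TAG_PGSHIFT;
}

static inline void page_kasan_tag_reset(struct page *page)
{
	/* 0xff is the match-all kernel tag */
	page_kasan_tag_set(page, 0xff);
}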
Vincenzo Frascino [Tue, 24 Nov 2020 05:45:13 +0000 (16:45 +1100)]
arm64: mte: add in-kernel MTE helpers
Provide helper functions to manipulate allocation and pointer tags for
kernel addresses.
Low-level helper functions (mte_assign_*, written in assembly) operate on
tag values from the [0x0, 0xF] range. High-level helper functions
(mte_get/set_*) use the [0xF0, 0xFF] range to preserve compatibility with
normal kernel pointers that have 0xFF in their top byte.
MTE_GRANULE_SIZE and related definitions are moved to mte-def.h header
that doesn't have any dependencies and is safe to include into any
low-level header.
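For illustration, a sketch of one of the high-level helpers; MTE_TAG_SHIFT
(56) is assumed to come from mte-def.h, and the OR with 0xF0 keeps the
returned tag in the 0xF0-0xFF range mentioned above:

static inline u8 mte_get_ptr_tag(void *ptr)
{
	/* KASAN tags use the 0xF<x> format */
	u8 tag = 0xF0 | (u8)((u64)ptr >> MTE_TAG_SHIFT);

	return tag;
}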
Andrey Konovalov [Tue, 24 Nov 2020 05:45:12 +0000 (16:45 +1100)]
kasan: introduce CONFIG_KASAN_HW_TAGS
This patch adds a configuration option for a new KASAN mode called
hardware tag-based KASAN. This mode uses the memory tagging approach like
the software tag-based mode, but relies on the arm64 Memory Tagging
Extension feature for tag management and access checking.
Andrey Konovalov [Tue, 24 Nov 2020 05:45:12 +0000 (16:45 +1100)]
kasan, arm64: don't allow SW_TAGS with ARM64_MTE
Software tag-based KASAN provides its own tag checking machinery that can
conflict with MTE. Don't allow enabling software tag-based KASAN when MTE
is enabled.
Andrey Konovalov [Tue, 24 Nov 2020 05:45:12 +0000 (16:45 +1100)]
kasan: separate metadata_fetch_row for each mode
This is a preparatory commit for the upcoming addition of a new hardware
tag-based (MTE-based) KASAN mode.
Rework print_memory_metadata() to make it agnostic with regard to the way
metadata is stored. Allow providing a separate metadata_fetch_row()
implementation for each KASAN mode. Hardware tag-based KASAN will provide
its own implementation that doesn't use shadow memory.
Andrey Konovalov [Tue, 24 Nov 2020 05:45:11 +0000 (16:45 +1100)]
kasan, arm64: rename kasan_init_tags and mark as __init
Rename kasan_init_tags() to kasan_init_sw_tags(), as the upcoming hardware
tag-based KASAN mode will have its own initialization routine. Also,
similarly to kasan_init(), mark kasan_init_sw_tags() as __init.
Link: https://lkml.kernel.org/r/71e52af72a09f4b50c8042f16101c60e50649fbb.1606161801.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:11 +0000 (16:45 +1100)]
kasan, arm64: move initialization message
Software tag-based KASAN mode is fully initialized with kasan_init_tags(),
while the generic mode only requires kasan_init(). Move the
initialization message for tag-based mode into kasan_init_tags().
Also fix pr_fmt() usage for KASAN code: generic.c doesn't need it as it
doesn't use any printing functions; tag-based mode should use "kasan:"
instead of KBUILD_MODNAME (which stands for file name).
Link: https://lkml.kernel.org/r/29a30ea4e1750450dd1f693d25b7b6cb05913ecf.1606161801.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:10 +0000 (16:45 +1100)]
kasan: hide invalid free check implementation
This is a preparatory commit for the upcoming addition of a new hardware
tag-based (MTE-based) KASAN mode.
For software KASAN modes the check is based on the value in the shadow
memory. Hardware tag-based KASAN won't be using shadow, so hide the
implementation of the check in check_invalid_free().
Also simplify the code for software tag-based mode.
Andrey Konovalov [Tue, 24 Nov 2020 05:45:10 +0000 (16:45 +1100)]
kasan: rename report and tags files
Rename generic_report.c to report_generic.c and tags_report.c to
report_sw_tags.c, as their content is more relevant to report.c file.
Also rename tags.c to sw_tags.c to better reflect that this file contains
code for software tag-based mode.
Andrey Konovalov [Tue, 24 Nov 2020 05:45:09 +0000 (16:45 +1100)]
kasan: define KASAN_MEMORY_PER_SHADOW_PAGE
Define KASAN_MEMORY_PER_SHADOW_PAGE as (KASAN_GRANULE_SIZE << PAGE_SHIFT),
which is the same as (KASAN_GRANULE_SIZE * PAGE_SIZE) for software modes
that use shadow memory, and use it across KASAN code to simplify it.
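That is:

#define KASAN_MEMORY_PER_SHADOW_PAGE	(KASAN_GRANULE_SIZE << PAGE_SHIFT)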
Andrey Konovalov [Tue, 24 Nov 2020 05:45:09 +0000 (16:45 +1100)]
kasan: split out shadow.c from common.c
This is a preparatory commit for the upcoming addition of a new hardware
tag-based (MTE-based) KASAN mode.
The new mode won't be using shadow memory. Move all shadow-related code
to shadow.c, which is only enabled for software KASAN modes that use
shadow memory.
Andrey Konovalov [Tue, 24 Nov 2020 05:45:09 +0000 (16:45 +1100)]
kasan: rename KASAN_SHADOW_* to KASAN_GRANULE_*
This is a preparatory commit for the upcoming addition of a new hardware
tag-based (MTE-based) KASAN mode.
The new mode won't be using shadow memory, but will still use the concept
of memory granules. Each memory granule maps to a single metadata entry:
8 bytes per one shadow byte for generic mode, 16 bytes per one shadow byte
for software tag-based mode, and 16 bytes per one allocation tag for
hardware tag-based mode.
Rename KASAN_SHADOW_SCALE_SIZE to KASAN_GRANULE_SIZE, and
KASAN_SHADOW_MASK to KASAN_GRANULE_MASK.
Also use MASK when the value is used as a mask, otherwise use SIZE.
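After the rename the definitions would look roughly like this (a sketch;
the derivation from KASAN_SHADOW_SCALE_SHIFT is an assumption):

#define KASAN_GRANULE_SIZE	(1UL << KASAN_SHADOW_SCALE_SHIFT)
#define KASAN_GRANULE_MASK	(KASAN_GRANULE_SIZE - 1)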
Andrey Konovalov [Tue, 24 Nov 2020 05:45:09 +0000 (16:45 +1100)]
kasan: rename (un)poison_shadow to (un)poison_range
This is a preparatory commit for the upcoming addition of a new hardware
tag-based (MTE-based) KASAN mode.
The new mode won't be using shadow memory. Rename external annotation
kasan_unpoison_shadow() to kasan_unpoison_range(), and introduce internal
functions (un)poison_range() (without kasan_ prefix).
Co-developed-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/fccdcaa13dc6b2211bf363d6c6d499279a54fe3a.1606161801.git.andreyknvl@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:08 +0000 (16:45 +1100)]
kasan: drop unnecessary GPL text from comment headers
Patch series "kasan: add hardware tag-based mode for arm64", v11.
This patchset adds a new hardware tag-based mode to KASAN [1]. The new
mode is similar to the existing software tag-based KASAN, but relies on
arm64 Memory Tagging Extension (MTE) [2] to perform memory and pointer
tagging (instead of shadow memory and compiler instrumentation).
This patchset is co-developed and tested by
Vincenzo Frascino <vincenzo.frascino@arm.com>.
The underlying ideas of the approach used by hardware tag-based KASAN are:
1. By relying on the Top Byte Ignore (TBI) arm64 CPU feature, pointer tags
are stored in the top byte of each kernel pointer.
2. With the Memory Tagging Extension (MTE) arm64 CPU feature, memory tags
for kernel memory allocations are stored in a dedicated memory not
accessible via normal instructions.
3. On each memory allocation, a random tag is generated, embedded into
the returned pointer, and the corresponding memory is tagged with the
same tag value.
4. With MTE the CPU performs a check on each memory access to make sure
that the pointer tag matches the memory tag.
5. On a tag mismatch the CPU generates a tag fault, and a KASAN report is
printed.
Same as other KASAN modes, hardware tag-based KASAN is intended as a
debugging feature at this point.
====== Rationale
There are two main reasons for this new hardware tag-based mode:
1. Previously implemented software tag-based KASAN is being successfully
used on dogfood testing devices due to its low memory overhead (as
initially planned). The new hardware mode keeps the same low memory
overhead, and is expected to have significantly lower performance
impact, due to the tag checks being performed by the hardware.
Therefore the new mode can be used as a better alternative in dogfood
testing for hardware that supports MTE.
2. The new mode lays the groundwork for the planned in-kernel MTE-based
memory corruption mitigation to be used in production.
====== Technical details
From the implementation perspective, hardware tag-based KASAN is almost
identical to the software mode. The key difference is using MTE for
assigning and checking tags.
Compared to the software mode, the hardware mode uses 4 bits per tag, as
dictated by MTE. Pointer tags are stored in bits [56:60), while the top 4
bits have the normal value 0xF. Having fewer distinct tags increases the
probability of false negatives (from ~1/256 to ~1/16) in certain cases.
Only synchronous exceptions are set up and used by hardware tag-based KASAN.
====== Benchmarks
Note: all measurements have been performed with software emulation of the
Memory Tagging Extension; performance numbers for hardware tag-based KASAN
on actual hardware are expected to be better.
Boot time [1]:
* 2.8 sec for clean kernel
* 5.7 sec for hardware tag-based KASAN
* 11.8 sec for software tag-based KASAN
* 11.6 sec for generic KASAN
Slab memory usage after boot [2]:
* 7.0 kb for clean kernel
* 9.7 kb for hardware tag-based KASAN
* 9.7 kb for software tag-based KASAN
* 41.3 kb for generic KASAN
Measurements have been performed with:
* defconfig-based configs
* Manually built QEMU master
* QEMU arguments: -machine virt,mte=on -cpu max
* CONFIG_KASAN_STACK_ENABLE disabled
* CONFIG_KASAN_INLINE enabled
* clang-10 as the compiler and gcc-10 as the assembler
[1] Time before the ext4 driver is initialized.
[2] Measured as `cat /proc/meminfo | grep Slab`.
====== Notes
The cover letter for software tag-based KASAN patchset can be found here:
Stephen Rothwell [Tue, 24 Nov 2020 05:45:08 +0000 (16:45 +1100)]
merge fix for "s390/pci: remove races against pte updates"
Link: https://lkml.kernel.org/r/20201111221254.7f6a3658@canb.auug.org.au Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Christoph Hellwig [Tue, 24 Nov 2020 05:45:07 +0000 (16:45 +1100)]
mm: simplify follow_pte{,pmd}
Merge __follow_pte_pmd, follow_pte_pmd and follow_pte into a single
follow_pte function and just pass two additional NULL arguments for the
two previous follow_pte callers.
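The merged function presumably ends up with a signature along these lines
(a sketch; the exact parameter list and order are assumptions):

static int follow_pte(struct mm_struct *mm, unsigned long address,
		      struct mmu_notifier_range *range, pte_t **ptepp,
		      pmd_t **pmdpp, spinlock_t **ptlp);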
Link: https://lkml.kernel.org/r/20201029101432.47011-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Christoph Hellwig [Tue, 24 Nov 2020 05:45:07 +0000 (16:45 +1100)]
mm: unexport follow_pte_pmd
Patch series "simplify follow_pte a bit".
This small series drops the unneeded follow_pte_pmd exports and
simplifies the follow_pte family of functions a bit.
This patch (of 2):
follow_pte_pmd() is only used by the DAX code, which can't be modular.
Link: https://lkml.kernel.org/r/20201029101432.47011-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
epoll: eliminate unnecessary lock for zero timeout
We call ep_events_available() under lock when timeout is 0, and then call
it without locks in the loop for the other cases.
Instead, call ep_events_available() without the lock for all cases. For
non-zero timeouts, we will recheck after adding the thread to the wait
queue. For zero-timeout cases, by definition, the user is opportunistically
polling and will have to call epoll_wait() again in the future.
Note that this lock was kept in c5a282e9635e9 because the whole loop was
historically under lock.
This patch results in a 1% CPU/RPC reduction in RPC benchmarks.
Link: https://lkml.kernel.org/r/20201106231635.3528496-9-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
ep_events_available() is called multiple times around the busy loop logic,
even though the logic is generally not used. ep_reset_busy_poll_napi_id()
is similarly always called, even when busy loop is not used.
Eliminate ep_reset_busy_poll_napi_id() and inline it inside
ep_busy_loop(). Make ep_busy_loop() return whether there are any events
available after the busy loop. This eliminates unnecessary loads and
branches, and simplifies the loop.
Link: https://lkml.kernel.org/r/20201106231635.3528496-6-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
epoll: pull fatal signal checks into ep_send_events()
To simplify the code, pull in checking the fatal signals into
ep_send_events(). ep_send_events() is called only from ep_poll().
Note that, previously, we always checked for fatal signals, but now they
are checked only if eavail is true. This should be fine because the goal
of that check is to quickly return from epoll_wait() when there is a
pending fatal signal.
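A sketch of where the check lands (the esed bookkeeping and the
ep_scan_ready_list() call are assumptions about the surrounding
fs/eventpoll.c code, not quoted from the patch):

static int ep_send_events(struct eventpoll *ep,
			  struct epoll_event __user *events, int maxevents)
{
	struct ep_send_events_data esed;

	/*
	 * Check for fatal signals here rather than in ep_poll(), so
	 * epoll_wait() still returns promptly when the task is dying.
	 */
	if (fatal_signal_pending(current))
		return -EINTR;

	esed.maxevents = maxevents;
	esed.events = events;

	ep_scan_ready_list(ep, ep_send_events_proc, &esed, 0, false);
	return esed.res;
}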
Link: https://lkml.kernel.org/r/20201106231635.3528496-4-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
epoll: check for events when removing a timed out thread from the wait queue
Patch series "simplify ep_poll".
This patch series is a followup based on the suggestions and feedback by
Linus:
https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com
The first patch in the series is a fix for the epoll race in presence of
timeouts, so that it can be cleanly backported to all affected stable
kernels.
The rest of the patch series simplify the ep_poll() implementation. Some
of these simplifications result in minor performance enhancements as well.
We have kept these changes under self tests and internal benchmarks for a
few days, and there are minor (1-2%) performance enhancements as a result.
This patch (of 8):
After abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2)
timeout"), we break out of the ep_poll loop upon timeout without checking
whether there are any new events available. Prior to that patch series we
always called ep_events_available() after exiting the loop.
This can cause races and missed wakeups. For example, consider the
following scenario reported by Guantao Liu:
Suppose we have an eventfd added using EPOLLET to an epollfd.
Thread 1: Sleeps for just below 5ms and then writes to an eventfd.
Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an
event of the eventfd, it will write back on that fd.
Thread 3: Calls epoll_wait with a negative timeout.
Prior to abc610e01c66, it is guaranteed that Thread 3 will wake up either
by Thread 1 or Thread 2. After abc610e01c66, Thread 3 can be blocked
indefinitely if Thread 2 sees a timeout right before the write to the
eventfd by Thread 1. Thread 2 will be woken up from
schedule_hrtimeout_range and, with eavail 0, it will not call
ep_send_events().
To fix this issue:
1) Simplify the timed_out case as suggested by Linus.
2) While holding the lock, recheck whether the thread was woken up
after its timeout expired.
Note that (2) is different from Linus' original suggestion: it does not set
"eavail = ep_events_available(ep)" to avoid unnecessary contention (when
there are too many timed-out threads and a small number of events), as
well as the races mentioned in the discussion thread.
This is the first patch in the series so that the backport to stable
releases is straightforward.
Nathan Chancellor [Tue, 24 Nov 2020 05:45:06 +0000 (16:45 +1100)]
ARM: boot: quote aliased symbol names in string.c
Patch "treewide: Remove stringification from __alias macro definition"
causes arguments to __alias to no longer be quoted automatically, which
breaks CONFIG_KASAN on ARM after commit d6d51a96c7d6 ("ARM: 9014/2:
Replace string mem* functions for KASan"):
arch/arm/boot/compressed/string.c:24:1: error: attribute 'alias' argument not a string
24 | void *__memcpy(void *__dest, __const void *__src, size_t __n) __alias(memcpy);
| ^~~~
arch/arm/boot/compressed/string.c:25:1: error: attribute 'alias' argument not a string
25 | void *__memmove(void *__dest, __const void *__src, size_t count) __alias(memmove);
| ^~~~
arch/arm/boot/compressed/string.c:26:1: error: attribute 'alias' argument not a string
26 | void *__memset(void *s, int c, size_t count) __alias(memset);
| ^~~~
make[3]: *** [scripts/Makefile.build:283: arch/arm/boot/compressed/string.o] Error 1
Quote the names like the treewide patch does so the errors no longer occur.
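With the quotes added, the declarations from the errors above become:

void *__memcpy(void *__dest, __const void *__src, size_t __n) __alias("memcpy");
void *__memmove(void *__dest, __const void *__src, size_t count) __alias("memmove");
void *__memset(void *s, int c, size_t count) __alias("memset");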
Joe Perches [Tue, 24 Nov 2020 05:45:05 +0000 (16:45 +1100)]
treewide: remove stringification from __alias macro definition
Like the old __section macro, the __alias macro uses macro #
stringification to create quotes around the symbol name used in the
__attribute__.
This can cause differences between gcc and clang when the stringification
itself contains a quote character. So avoid these differences by always
using quotes to define the aliased symbol.
Remove the stringification and add quotes, and, where existing uses have
a ## concatenation, add a stringification where necessary.
Unlike the __section macro conversion in commit 33def8498fdd ("treewide:
Convert macro and uses of __section(foo) to __section("foo")") this one
was done by hand.
No other use of __alias exists in the kernel.
This patch does _not_ convert any uses of __attribute__((alias("<foo>")))
so it should not cause any compilation issues.
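The change to the macro itself is essentially the following (a sketch of
include/linux/compiler_attributes.h based on the description above):

/* before: the # stringification supplied the quotes */
#define __alias(symbol)	__attribute__((__alias__(#symbol)))

/* after: callers pass an already-quoted name, e.g. __alias("memcpy") */
#define __alias(symbol)	__attribute__((__alias__(symbol)))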
Link: https://lkml.kernel.org/r/8451df41359b52f048780d19e07b6fa4445b6392.1604026698.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Alex Shi [Tue, 24 Nov 2020 05:45:05 +0000 (16:45 +1100)]
mm/memcg: add missed warning in mem_cgroup_lruvec
commit "(mm/memcontrol:rewrite mem_cgroup_page_lruvec())" on mm tree use
mem_cgroup_lruvec to rewrite mem_cgroup_page_lruvec, but it missed a
DEBUG_VM warning as following, since we always charge a page before
return from allocation. Add back this warning is helpful:
VM_WARN_ON_ONCE(!memcg);
Link: https://lkml.kernel.org/r/94f17bb7-ec61-5b72-3555-fabeb5a4d73b@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Hui Su <sh_def@163.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Hui Su [Tue, 24 Nov 2020 05:45:05 +0000 (16:45 +1100)]
mm/memcontrol:rewrite mem_cgroup_page_lruvec()
mem_cgroup_page_lruvec() in memcontrol.c and mem_cgroup_lruvec() in
memcontrol.h are very similar except for their parameters (page and
memcg), which can also be converted to each other.
So rewrite mem_cgroup_page_lruvec() in terms of mem_cgroup_lruvec().
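The rewritten helper would look roughly like this (a sketch; the pgdat
parameter and the page_memcg() accessor are assumed from the surrounding
series):

static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page,
						    struct pglist_data *pgdat)
{
	struct mem_cgroup *memcg = page_memcg(page);

	return mem_cgroup_lruvec(memcg, pgdat);
}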
Link: https://lkml.kernel.org/r/20201108143731.GA74138@rlk Signed-off-by: Hui Su <sh_def@163.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
open-code set_page_objcgs() and add some comments, by Johannes
Link: https://lkml.kernel.org/r/20201113001926.GA2934489@carbon.dhcp.thefacebook.com Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Roman Gushchin [Tue, 24 Nov 2020 05:45:04 +0000 (16:45 +1100)]
mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT
In general it's unknown in advance whether a slab page will contain
accounted objects or not. In order to avoid memory waste, an obj_cgroup
vector is allocated dynamically when the need to account a new object
arises. Such an approach is memory efficient, but requires an expensive
cmpxchg() to set up the memcg/objcgs pointer, because an allocation can
race with a different allocation on another cpu.
But in some common cases it's known for sure that a slab page will contain
accounted objects: if the page belongs to a slab cache with a SLAB_ACCOUNT
flag set. It includes such popular objects like vm_area_struct, anon_vma,
task_struct, etc.
In such cases we can pre-allocate the objcgs vector and simply assign it
to the page without any atomic operations, because at this early stage the
page is not visible to anyone else.
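A sketch of the idea (memcg_alloc_page_obj_cgroups() as the name of the
vector allocator, and the surrounding accounting code, are assumptions):

static __always_inline void account_slab_page(struct page *page, int order,
					      struct kmem_cache *s, gfp_t gfp)
{
	/* pre-allocate the objcgs vector for caches that always account */
	if (memcg_kmem_enabled() && (s->flags & SLAB_ACCOUNT))
		memcg_alloc_page_obj_cgroups(page, s, gfp, true);

	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
			    PAGE_SIZE << order);
}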
A very simplistic benchmark (allocating 10000000 64-byte objects in a
row) shows a ~15% win. In real life it seems that most workloads are
not very sensitive to the speed of (accounted) slab allocations.
Link: https://lkml.kernel.org/r/20201110195753.530157-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Roman Gushchin [Tue, 24 Nov 2020 05:45:04 +0000 (16:45 +1100)]
mm: slub: call account_slab_page() after slab page initialization
It's convenient to have page->objects initialized before calling into
account_slab_page(). In particular, this information can be used to
pre-alloc the obj_cgroup vector.
Let's call account_slab_page() a bit later, after the initialization of
page->objects.
This commit doesn't bring any functional change, but is required for
further optimizations.
Link: https://lkml.kernel.org/r/20201110195753.530157-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Shakeel Butt [Tue, 24 Nov 2020 05:45:04 +0000 (16:45 +1100)]
mm, kvm: account kvm_vcpu_mmap to kmemcg
A VCPU of a VM can allocate a couple of pages which can be mmap'ed by the
user space application. At the moment this memory is not charged to the
memcg of the VMM. On a large machine running a large number of VMs, or a
small number of VMs each having a large number of VCPUs, this unaccounted
memory can be very significant. So, charge this memory to the memcg of the
VMM. Please note that the lifetime of these allocations corresponds to
the lifetime of the VMM.
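The charging itself presumably amounts to switching the allocation flags on
the vcpu's mmap'ed page, along these lines (a sketch, not the exact hunk):

	/* charge the page to the memcg of the allocating (VMM) task */
	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);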
Link: https://lkml.kernel.org/r/20201106202923.2087414-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Wei Yang [Tue, 24 Nov 2020 05:45:04 +0000 (16:45 +1100)]
mm/memcg: remove unused definitions
Some definitions are left unused; just remove them.
Link: https://lkml.kernel.org/r/20201108003834.12669-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Alex Shi [Tue, 24 Nov 2020 05:45:04 +0000 (16:45 +1100)]
mm/memcg: warning on !memcg after readahead page charged
Add VM_WARN_ON_ONCE_PAGE() macro.
Since a readahead page is charged to a memcg too, in theory we don't have
to check this exception now. Before safely removing these checks, add a
warning for the unexpected !memcg case.
Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Alex Shi [Tue, 24 Nov 2020 05:45:03 +0000 (16:45 +1100)]
mm/memcg: bail early from swap accounting if memcg disabled
Patch series "bail out early for memcg disable".
These 2 patches are independent from the per-memcg lru lock work, and may
encounter unexpected warnings, so let's move them out of the per-memcg
lru locking patchset.
This patch (of 2):
We could bail out early when memcg wasn't enabled.
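The bail-out is the usual pattern at the top of the swap accounting
functions (a sketch):

	if (mem_cgroup_disabled())
		return;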
Roman Gushchin [Tue, 24 Nov 2020 05:45:03 +0000 (16:45 +1100)]
mm: convert page kmemcg type to a page memcg flag
The PageKmemcg flag is currently defined as a page type (like buddy,
offline, table and guard). Semantically it means that the page was
accounted as kernel memory by the page allocator and has to be uncharged
on release.
As a side effect of defining the flag as a page type, the accounted page
can't be mapped to userspace (look at page_has_type() and comments above).
In particular, this blocks the accounting of vmalloc-backed memory used
by some bpf maps, because these maps do map the memory to userspace.
One option is to fix it by complicating the access to page->mapcount,
which provides some free bits for page->page_type.
But it's way better to move this flag into page->memcg_data flags.
Indeed, the flag makes no sense without enabled memory cgroups and memory
cgroup pointer set in particular.
This commit replaces PageKmemcg() and __SetPageKmemcg() with
PageMemcgKmem() and an open-coded OR operation setting the memcg pointer
with the MEMCG_DATA_KMEM bit. __ClearPageKmemcg() can simply be deleted,
as the whole memcg_data is zeroed at once.
As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will be
compiled out.
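The open-coded OR mentioned above is essentially (a sketch):

	page->memcg_data = (unsigned long)memcg | MEMCG_DATA_KMEM;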
Link: https://lkml.kernel.org/r/20201027001657.3398190-5-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Roman Gushchin [Tue, 24 Nov 2020 05:45:03 +0000 (16:45 +1100)]
mm: introduce page memcg flags
The lowest bit in page->memcg_data is used to distinguish between a
struct mem_cgroup pointer and a pointer to an objcgs array. All checks and
modifications of this bit are open-coded.
Let's formalize it using page memcg flags, defined in enum
page_memcg_data_flags.
Additional flags might be added later.
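A sketch of the resulting enum (the exact set of flags beyond the two
named in this series is an assumption):

enum page_memcg_data_flags {
	/* page->memcg_data is a pointer to an objcgs vector */
	MEMCG_DATA_OBJCGS	= (1UL << 0),
	/* page has been accounted as a non-slab kernel page */
	MEMCG_DATA_KMEM		= (1UL << 1),
	/* the next bit after the last actual flag */
	__NR_MEMCG_DATA_FLAGS	= (1UL << 2),
};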
Link: https://lkml.kernel.org/r/20201027001657.3398190-4-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
They are similar to the corresponding API for generic pages, except that
the setter can return false, indicating that the value has already been
set from a different thread.
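A sketch of a setter with that behaviour (the names are assumptions based
on the description):

static inline bool set_page_objcgs(struct page *page,
				   struct obj_cgroup **objcgs)
{
	/* returns false if another thread set memcg_data first */
	return !cmpxchg(&page->memcg_data, 0,
			(unsigned long)objcgs | MEMCG_DATA_OBJCGS);
}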
Link: https://lkml.kernel.org/r/20201027001657.3398190-3-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Roman Gushchin [Tue, 24 Nov 2020 05:45:03 +0000 (16:45 +1100)]
mm: memcontrol: use helpers to read page's memcg data
Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
Currently a non-slab kernel page which has been charged to a memory cgroup
can't be mapped to userspace. The underlying reason is simple: PageKmemcg
flag is defined as a page type (like buddy, offline, etc), so it takes a
bit from a page->mapped counter. Pages with a type set can't be mapped to
userspace.
But in general the kmemcg flag has nothing to do with mapping to
userspace. It only means that the page has been accounted by the page
allocator, so it has to be properly uncharged on release.
Some bpf maps are mapping the vmalloc-based memory to userspace, and their
memory can't be accounted because of this implementation detail.
This patchset removes this limitation by moving the PageKmemcg flag into
one of the free bits of the page->mem_cgroup pointer. Also it formalizes
accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
adds several checks and removes a couple of obsolete functions. As a
result the code becomes more robust with fewer open-coded bit tricks.
This patch (of 4):
Currently there are many open-coded reads of the page->mem_cgroup pointer,
as well as a couple of read helpers, which are barely used.
This creates an obstacle to reusing some bits of the pointer for storing
additional information. In fact, we already do this for slab pages, where
the last bit indicates that a pointer has an attached vector of objcg
pointers instead of a regular memcg pointer.
This commit uses two existing helpers and introduces a new one to convert
all read sides to calls of these helpers:
struct mem_cgroup *page_memcg(struct page *page);
struct mem_cgroup *page_memcg_rcu(struct page *page);
struct mem_cgroup *page_memcg_check(struct page *page);
page_memcg_check() is intended to be used in cases when the page can be a
slab page and have a memcg pointer pointing at objcg vector. It does
check the lowest bit, and if set, returns NULL. page_memcg() contains a
VM_BUG_ON_PAGE() check for the page not being a slab page.
To make sure nobody uses a direct access, struct page's
mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
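For example, the plain reader can be as simple as this sketch (the
VM_BUG_ON_PAGE check mirrors the description above; the exact body is an
assumption):

static inline struct mem_cgroup *page_memcg(struct page *page)
{
	VM_BUG_ON_PAGE(PageSlab(page), page);

	return (struct mem_cgroup *)page->memcg_data;
}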
Alex Shi [Tue, 24 Nov 2020 05:45:02 +0000 (16:45 +1100)]
mm/swap.c: reduce lock contention in lru_cache_add
The current relock logic will change lru_lock when it finds a new lruvec,
so if 2 memcgs are reading a file or allocating pages at the same time,
they could hold the lru_lock alternately and wait for each other due to
the fairness attribute of the ticket spin lock.
This patch will sort all lru_locks and only hold them once in the above
scenario. That could reduce the fairness-induced waiting for lock
reacquisition.
vm-scalability/case-lru-file-readtwice gets a ~5% performance gain on my
2P*20core*HT machine.
Testing when all or most of the pages belong to the same lruvec shows no
regression - most time is spent on the lru_lock for lru-sensitive cases.
Link: https://lkml.kernel.org/r/1605860847-47445-1-git-send-email-alex.shi@linux.alibaba.com Suggested-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Barnabás Pőcze [Tue, 24 Nov 2020 05:39:24 +0000 (16:39 +1100)]
fault-injection: handle EI_ETYPE_TRUE
Commit af3b854492f351d1 ("mm/page_alloc.c: allow error injection")
introduced EI_ETYPE_TRUE, but did not extend
* lib/error-inject.c:error_type_string(), and
* kernel/fail_function.c:adjust_error_retval()
to accommodate this change.
Handle EI_ETYPE_TRUE in both functions appropriately by
* returning "TRUE" in error_type_string(),
* adjusting the return value to true (1) in adjust_error_retval().
Furthermore, simplify the logic of handling EI_ETYPE_NULL in
adjust_error_retval().
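A sketch of the error_type_string() side (the other case labels are taken
from the existing EI_ETYPE_* constants; the exact switch layout is an
assumption):

static const char *error_type_string(int etype)
{
	switch (etype) {
	case EI_ETYPE_NULL:
		return "NULL";
	case EI_ETYPE_ERRNO:
		return "ERRNO";
	case EI_ETYPE_ERRNO_NULL:
		return "ERRNO_NULL";
	case EI_ETYPE_TRUE:
		return "TRUE";
	default:
		return "(unknown)";
	}
}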
Nathan Chancellor [Tue, 24 Nov 2020 05:39:23 +0000 (16:39 +1100)]
reboot: fix variable assignments in type_store
Clang warns:
kernel/reboot.c:707:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_TRIPLE;
~ ^~~~~~~~~~~
kernel/reboot.c:709:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_KBD;
~ ^~~~~~~~
kernel/reboot.c:711:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_BIOS;
~ ^~~~~~~~~
kernel/reboot.c:713:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_ACPI;
~ ^~~~~~~~~
kernel/reboot.c:715:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_EFI;
~ ^~~~~~~~
kernel/reboot.c:717:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_CF9_FORCE;
~ ^~~~~~~~~~~~~~
kernel/reboot.c:719:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_CF9_SAFE;
~ ^~~~~~~~~~~~~
7 warnings generated.
It seems that these assignments should be to reboot_type, not
reboot_mode. Fix them so there are no more warnings and the code works
properly.
Link: https://lkml.kernel.org/r/20201112035023.974748-1-natechancellor@gmail.com Fixes: eab8da48579d ("reboot: allow to specify reboot mode via sysfs") Link: https://github.com/ClangBuiltLinux/linux/issues/1197 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Matteo Croce <mcroce@microsoft.com> Tested-by: Matteo Croce <mcroce@microsoft.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Kees Cook <keescook@chromium.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Matteo Croce [Tue, 24 Nov 2020 05:39:22 +0000 (16:39 +1100)]
reboot: allow to specify reboot mode via sysfs
The kernel cmdline reboot= option offers some sort of control on how the
reboot is issued.
We don't always know in advance what type of reboot to perform.
Sometimes a warm reboot is preferred to persist certain memory regions
across the reboot. Other times a cold one is needed to apply a future
system update that makes a memory model change, like changing the base
page size or resizing a persistent memory region.
Or we simply want to enable reboot_force because we noticed that
something bad happened.
Add handles in sysfs to allow setting these reboot options, so they can
be changed after the system has booted, rather than only at boot time.
The handlers are under <sysfs>/kernel/reboot, can be read to get the
current configuration and written to alter it.
Before setting anything, check for the CAP_SYS_BOOT capability, so it's
possible to allow an unprivileged process to change these settings simply
by relaxing the handles' permissions, without opening them to the world.
Link: https://lkml.kernel.org/r/20201110202746.9690-1-mcroce@linux.microsoft.com Signed-off-by: Matteo Croce <mcroce@microsoft.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tyler Hicks <tyhicks@linux.microsoft.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Oleg Nesterov [Tue, 24 Nov 2020 05:39:21 +0000 (16:39 +1100)]
aio: simplify read_events()
Change wait_event_hrtimeout() to not call __wait_event_hrtimeout() if
timeout == 0; this matches the other _timeout() helpers in wait.h.
This allows us to simplify its only user, read_events(): it no longer
needs to optimize the "until == 0" case by hand.
Note: this patch doesn't use ___wait_cond_timeout because _hrtimeout()
also differs in that it returns 0 if it succeeds and -ETIME on timeout.
Perhaps we should change this to make it fully compatible with the other
helpers.
Link: http://lkml.kernel.org/r/20190607175413.GA29187@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Eric Wong <e@80x24.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Jani Nikula [Tue, 24 Nov 2020 05:39:18 +0000 (16:39 +1100)]
relay: allow the use of const callback structs
None of the relay users require the use of mutable structs for callbacks,
however the relay code does. Instead of assigning the default callback
for subbuf_start, add a wrapper to conditionally call the client callback
if available, and fall back to default behaviour otherwise.
This lets all relay users make their struct rchan_callbacks const data.
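A sketch of such a wrapper (using relay_buf_full() for the default
behaviour is an assumption based on relay's existing default callback):

static int cb_subbuf_start(struct rchan_buf *buf, void *subbuf,
			   void *prev_subbuf, size_t prev_padding)
{
	if (buf->chan->cb->subbuf_start)
		return buf->chan->cb->subbuf_start(buf, subbuf,
						   prev_subbuf, prev_padding);

	/* default: keep logging unless the buffer is full */
	return !relay_buf_full(buf);
}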
Jani Nikula [Tue, 24 Nov 2020 05:39:17 +0000 (16:39 +1100)]
relay: make create_buf_file and remove_buf_file callbacks mandatory
All clients provide create_buf_file and remove_buf_file callbacks, and
they're required for relay to make sense. There is no point in them being
optional.
Also document whether each callback is mandatory/optional.
Jani Nikula [Tue, 24 Nov 2020 05:39:17 +0000 (16:39 +1100)]
relay: require non-NULL callbacks in relay_open()
There are no clients passing NULL callbacks, which makes sense as it
wouldn't even create a file. Require non-NULL callbacks, and throw away
the handling for NULL callbacks.
Jani Nikula [Tue, 24 Nov 2020 05:39:17 +0000 (16:39 +1100)]
relay: remove unused buf_mapped and buf_unmapped callbacks
Patch series "relay: cleanup and const callbacks", v2.
None of the relay users require the use of mutable structs for callbacks,
however the relay code does. Instead of assigning default callbacks when
there is none, add callback wrappers to conditionally call the client
callbacks if available, and fall back to default behaviour (typically
no-op) otherwise.
This lets all relay users make their struct rchan_callbacks const data.
This series starts with a number of cleanups first based on Christoph's
feedback.
This patch (of 9):
No relay client uses the buf_mapped or buf_unmapped callbacks. Remove
them. This makes relay's vm_operations_struct close callback a dummy,
remove it as well.
Alex Shi [Tue, 24 Nov 2020 05:39:16 +0000 (16:39 +1100)]
gcov: fix kernel-doc markup issue
Fix the following kernel-doc issue in gcov:
kernel/gcov/gcc_4_7.c:238: warning: Function parameter or member 'dst'
not described in 'gcov_info_add'
kernel/gcov/gcc_4_7.c:238: warning: Function parameter or member 'src'
not described in 'gcov_info_add'
kernel/gcov/gcc_4_7.c:238: warning: Excess function parameter 'dest'
description in 'gcov_info_add'
kernel/gcov/gcc_4_7.c:238: warning: Excess function parameter 'source'
description in 'gcov_info_add'
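The fix presumably renames the documented parameters to match the code,
along these lines (the parameter descriptions here are placeholders):

/**
 * gcov_info_add - add up profiling data
 * @dst: profiling data set to which data is added
 * @src: profiling data set which is added
 */
void gcov_info_add(struct gcov_info *dst, struct gcov_info *src);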
Nick Desaulniers [Tue, 24 Nov 2020 05:39:16 +0000 (16:39 +1100)]
gcov: remove support for GCC < 4.9
Since commit 0bddd227f3dc ("Documentation: update for gcc 4.9
requirement") the minimum supported version of GCC is gcc-4.9. It's now
safe to remove this code.
Similar to commit 10415533a906 ("gcov: Remove old GCC 3.4 support") but
that was for GCC 4.8 and this is for GCC 4.9.
Sebastian Andrzej Siewior [Tue, 24 Nov 2020 05:39:15 +0000 (16:39 +1100)]
rapidio: remove unused rio_get_asm() and rio_get_device()
The functions rio_get_asm() and rio_get_device() are globally exported
but have almost no users in the tree. The only user is rio_init_mports(),
which invokes them via rio_init().
rio_init() iterates over every registered device and invokes
rio_fixup_device(). It looks like a fixup function which should perform a
"change" to the device but does nothing. It has been like this since its
introduction in commit 394b701ce4fbf ("[PATCH] RapidIO support: core
base") which was merged into v2.6.15-rc1.
Remove rio_init() because the performed fixup function
(rio_fixup_device()) does nothing. Remove rio_get_asm() and
rio_get_device() which have no callers now.
Link: https://lkml.kernel.org/r/20201116170004.420143-1-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Alexandre Bounine <alex.bou9@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Alexander Egorenkov [Tue, 24 Nov 2020 05:39:15 +0000 (16:39 +1100)]
kdump: append uts_namespace.name offset to VMCOREINFO
The offset of the field 'init_uts_ns.name' has changed since commit 9a56493f6942 ("uts: Use generic ns_common::count").
Make the offset of the field 'uts_namespace.name' available in VMCOREINFO
because tools like 'crash-utility' and 'makedumpfile' must be able to read
it from crash dumps.
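The export itself is presumably a one-liner in
crash_save_vmcoreinfo_init(), using the standard macro:

	VMCOREINFO_OFFSET(uts_namespace, name);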
Link: https://lore.kernel.org/r/159644978167.604812.1773586504374412107.stgit@localhost.localdomain Link: https://lkml.kernel.org/r/20200930102328.396488-1-egorenar@linux.ibm.com Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Acked-by: lijiang <lijiang@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Aditya Srivastava [Tue, 24 Nov 2020 05:39:14 +0000 (16:39 +1100)]
checkpatch: add fix option for ASSIGNMENT_CONTINUATIONS
Currently, checkpatch warns us if an assignment operator is placed at the
start of a line and not at the end of previous line.
E.g., running checkpatch on commit 8195b1396ec8 ("hv_netvsc: fix
deadlock on hotplug") reports:
CHECK: Assignment operator '=' should be on the previous line
+ struct netvsc_device *nvdev
+ = container_of(w, struct netvsc_device, subchan_work);
Provide a simple fix by appending the assignment operator to the previous
line and removing it from the current line, if both lines are additions
(i.e. start with '+').
Link: https://lkml.kernel.org/r/20201121120407.22942-1-yashsri421@gmail.com Signed-off-by: Aditya Srivastava <yashsri421@gmail.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Dwaipayan Ray [Tue, 24 Nov 2020 05:39:13 +0000 (16:39 +1100)]
checkpatch: fix unescaped left brace
There is an unescaped left brace in a regex in OPEN_BRACE check. This
throws a runtime error when checkpatch is run with --fix flag and the
OPEN_BRACE check is executed.
Fix it by escaping the left brace.
Link: https://lkml.kernel.org/r/20201115202928.81955-1-dwaipayanray1@gmail.com Fixes: 8d1824780f2f ("checkpatch: add --fix option for a couple OPEN_BRACE misuses") Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Aditya Srivastava [Tue, 24 Nov 2020 05:39:13 +0000 (16:39 +1100)]
checkpatch: avoid COMMIT_LOG_LONG_LINE warning for signature tags
Currently checkpatch warns about long lines in commit messages even for
signature tag lines.
Generally these lines exceed the 75-character limit because of:
1) long names and long email address
2) some comments on scoped review and acknowledgement, i.e., for a
dedicated pointer on what was reported by the identity in 'Reported-by'
3) some additional comments on CC: stable@vger.org tags
Exclude signature tag lines from this class of warning.
There were 1896 COMMIT_LOG_LONG_LINE warnings in v5.6..v5.8 before this
patch application and 1879 afterwards.
A quick manual check found all the dropped warnings related to signature
tags.
Link: https://lkml.kernel.org/r/20201116083754.10629-1-yashsri421@gmail.com Signed-off-by: Aditya Srivastava <yashsri421@gmail.com> Acked-by: Joe Perches <joe@perches.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Dwaipayan Ray [Tue, 24 Nov 2020 05:39:12 +0000 (16:39 +1100)]
checkpatch: improve email parsing
checkpatch doesn't report warnings for many common mistakes in emails,
some of which are trailing commas and incorrect use of email comments.
At the same time, several false positives are reported due to incorrect
handling of mail comments. The most common is due to the pattern:
<stable@vger.kernel.org> # X.X
Improve email parsing in checkpatch.
Some general email rules are defined:
- Multiple name comments should not be allowed.
- Comments inside address should not be allowed.
- In general comments should be enclosed within parentheses.
Relaxation is given to comments beginning with #.
- Stable addresses should not begin with a name.
- Comments in stable addresses should begin only
with a #.
Improvements to parsing:
- Detect and report unexpected content after email.
- Quoted names are excluded from comment parsing.
- Trailing dots, commas or quotes in email are removed during
formatting. Correspondingly a BAD_SIGN_OFF warning
is emitted.
- Improperly quoted emails like '"name <address>"' are now
warned about.
In addition, added fixes for all the possible rules.
commit 33def8498fdd ("treewide: Convert macro and uses of
__section(foo) to __section("foo")") removed the stringification of the
section name and now requires quotes around the named section.
Update checkpatch to not remove any quotes when suggesting conversion
of __attribute__((section("name"))) to __section("name").
Miscellanea:
o Add section to the hash with __section replacement
o Remove separate test for __attribute__((section
o Remove the limitation on converting attributes containing only
known, possible conversions. Any unknown attribute types are now
left as-is and known types are converted and moved before
__attribute__ and removed from within the __attribute__((list...)).
Tom Rix [Tue, 24 Nov 2020 05:39:10 +0000 (16:39 +1100)]
checkpatch: add a fixer for missing newline at eof
Remove the trailing error message from the fixed lines.
Link: https://lkml.kernel.org/r/20201017142546.28988-1-trix@redhat.com Signed-off-by: Tom Rix <trix@redhat.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Dwaipayan Ray [Tue, 24 Nov 2020 05:39:09 +0000 (16:39 +1100)]
checkpatch: extend attributes check to handle more patterns
It is generally preferred that the macros from
include/linux/compiler_attributes.h are used, unless there is a reason not
to.
checkpatch currently checks __attribute__ for each of packed, aligned,
section, printf, scanf, and weak. Other declarations in
compiler_attributes.h are not handled.
Add a generic test to check the presence of such attributes. Some
attributes require more specific handling and are kept separate.
Also add fixes to the generic attributes check to substitute the correct
conversions.
Declarations which contain multiple attributes like
__attribute__((__packed__, __cold__)) are also handled except when proper
conversions for one or more attributes of the list cannot be determined.
Here, it reports 'ff' and 'root' as repeated, but they are in fact part
of some address or code output, where the repetition is legitimate.
In these cases, the intent of the warning, to find stylistic issues in
commit messages, is not met, and the warning is simply wrong.
To avoid these warnings, add an additional regex check for the
directory permission pattern and avoid checking the line for this
class of warning. Similarly, to avoid the hex pattern, check whether the
word consists of hex symbols and skip this warning if it is not among the
common English words formed using hex letters.
A quick evaluation on v5.6..v5.8 showed that this fix reduces
REPEATED_WORD warnings by 1890 occurrences.
A quick manual check found all cases are related to hex output or
list command outputs in commit messages.
Link: https://lkml.kernel.org/r/20201024102253.13614-1-yashsri421@gmail.com Signed-off-by: Aditya Srivastava <yashsri421@gmail.com> Acked-by: Joe Perches <joe@perches.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Dwaipayan Ray [Tue, 24 Nov 2020 05:39:07 +0000 (16:39 +1100)]
checkpatch: add new exception to repeated word check
Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
moved the repeated word test to check for more file types. But after
this, if checkpatch.pl is run on MAINTAINERS, it generates several
new warnings of the type:
WARNING: Possible repeated word: 'git'
For example:
WARNING: Possible repeated word: 'git'
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git
So, the pattern "git git://..." is a false positive in this case.
There are several other combinations which may produce a wrong
warning message, such as "@size size", ":Begin begin", etc.
Extend the repeated word check to compare the characters before and
after the word matches.
If there is a non-whitespace character before the first word, or a
non-whitespace character (excluding punctuation characters) after
the second word, then the check is skipped and the warning is avoided.
Also add case-insensitive word matching to the repeated word check.
gpio: xilinx: utilize generic bitmap_get_value and _set_value
Reimplement the xgpio_set_multiple() function in
drivers/gpio/gpio-xilinx.c to use the new generic functions:
bitmap_get_value() and bitmap_set_value(). The code is now simpler to
read and understand. Moreover, instead of looping over each bit in
xgpio_set_multiple(), we can now handle one channel at a time and
save cycles.
Link: https://lkml.kernel.org/r/15a044d3ba23f00c31fd09437bdd3e5924bb91cd.1603055402.git.syednwaris@gmail.com Signed-off-by: Syed Nayyar Waris <syednwaris@gmail.com> Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reimplement the thunderx_gpio_set_multiple() function in
drivers/gpio/gpio-thunderx.c to use the new for_each_set_clump macro.
Instead of looping over each bank in thunderx_gpio_set_multiple(),
we can now skip banks that have no bits set and save cycles.
Link: https://lkml.kernel.org/r/5e94ad3c156b98d2ed28617b2ca25bacadc189d5.1603055402.git.syednwaris@gmail.com Signed-off-by: Syed Nayyar Waris <syednwaris@gmail.com> Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: Robert Richter <rrichter@marvell.com> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
ERROR: that open brace { should be on the previous line
#151: FILE: lib/test_bitmap.c:648:
+static struct clump_test_data_params clump_test_data[] __initdata =
+ { {{0}, 2, 0, 64, 8, clump_exp1},
ERROR: that open brace { should be on the previous line
#161: FILE: lib/test_bitmap.c:658:
+ for(i = 0; i < clump_test_data[index].count; i++)
+ {
ERROR: space required before the open parenthesis '('
#161: FILE: lib/test_bitmap.c:658:
+ for(i = 0; i < clump_test_data[index].count; i++)
ERROR: that open brace { should be on the previous line
#169: FILE: lib/test_bitmap.c:666:
+static void __init execute_for_each_set_clump_test(unsigned int index)
+{
lib/test_bitmap.c: add for_each_set_clump test cases
The introduction of the generic for_each_set_clump macro needs test cases
to verify the implementation. This patch adds test cases for scenarios in
which clump sizes are 8 bits, 24 bits, 30 bits and 6 bits. The cases
cover situations where the clump is split at the word boundary and where
zeroes are present at the start and in the middle of the bitmap.
Patch series "Introduce the for_each_set_clump macro", v12.
This patchset introduces a new generic version of for_each_set_clump. The
previous version, for_each_set_clump8, used a fixed-size 8-bit clump, but
the new generic version can work with a clump (n bits) of any size between
1 and BITS_PER_LONG inclusive; a size less than 1 or more than
BITS_PER_LONG causes undefined behaviour. The patchset utilizes the new
macro in some GPIO drivers.
The earlier 8-bit for_each_set_clump8 facilitated a for-loop syntax that
iterates over a memory region, entire groups of set bits at a time.
For example, suppose you would like to iterate over a 32-bit integer 8
bits at a time, skipping over 8-bit groups with no set bit, where
XXXXXXXX represents the current 8-bit group: each iteration of the loop
returns the next 8-bit group that has at least one set bit.
But with the new for_each_set_clump, the clump size can be different from
8 bits. Moreover, the clump can be split at a word boundary in situations
where the word size is not a multiple of the clump size. The following
examples show the working of the new macro for clump sizes of 24 bits and
6 bits.
Example 1:
clump size: 24 bits, Number of clumps (or ports): 10
The bitmap stores the bit information from which successive clumps are retrieved.
/* bitmap memory region */
0x00aa0000ff000000; /* Most significant bits */
0xaaaaaa0000ff0000;
0x000000aa000000aa;
0xbbbbabcdeffedcba; /* Least significant bits */
Different iterations of for_each_set_clump:-
'offset' is the bit position and 'clump' is the 24 bit clump from the
above bitmap.
Iteration first: offset: 0 clump: 0xfedcba
Iteration second: offset: 24 clump: 0xabcdef
Iteration third: offset: 48 clump: 0xaabbbb
Iteration fourth: offset: 96 clump: 0xaa
Iteration fifth: offset: 144 clump: 0xff
Iteration sixth: offset: 168 clump: 0xaaaaaa
Iteration seventh: offset: 216 clump: 0xff
The loop breaks because in the end the remaining bits (0x00aa) were fewer
than the clump size of 24 bits.
In the above example it can be seen that in the third iteration, the
24-bit clump that was retrieved was split between bitmap[0] and bitmap[1].
This example also shows that 24 bits of zeroes, if present in between,
were skipped (preserving the previous for_each_set_clump8 behaviour).
Example 2:
clump size = 6 bits, Number of clumps (or ports) = 3.
/* bitmap memory region */
0x00aa0000ff000000; /* Most significant bits */
0xaaaaaa0000ff0000;
0x0f00000000000000;
0x0000000000000ac0; /* Least significant bits */
Different iterations of for_each_set_clump:
'offset' is the bit position and 'clump' is the 6 bit clump from the
above bitmap.
Iteration first: offset: 6 clump: 0x2b
The loop breaks because 6 * 3 = 18 bits have been traversed in the
bitmap, where 6 * 3 is clump size * number of clumps.
This patch (of 4):
This macro iterates for each group of bits (clump) with set bits, within a
bitmap memory region. For each iteration, "start" is set to the bit
offset of the found clump, while the respective clump value is stored to
the location pointed by "clump".
Additionally, the bitmap_get_value() and bitmap_set_value() functions are
introduced to respectively get and set a value of n bits in a bitmap
memory region. The n bits can have any size from 1 to BITS_PER_LONG;
a size less than 1 or more than BITS_PER_LONG causes undefined behaviour.
Moreover, while setting the value of n bits in the bitmap, if the width of
the next n bits crosses the word boundary, the value will divide itself
such that some portion of it is stored in that word, while the remaining
portion is stored in the next higher word. A similar situation occurs
while retrieving the value from the bitmap.
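A sketch of how the new interfaces might be used together (the macro and
function signatures are inferred from this series' description and may
differ in detail):

	DECLARE_BITMAP(bits, 64);
	unsigned long clump;
	unsigned int start;

	bitmap_zero(bits, 64);
	bitmap_set_value(bits, 0xfedcba, 0, 24);	/* lowest 24-bit clump */

	/* iterate 24 bits at a time, skipping all-zero clumps */
	for_each_set_clump(start, clump, bits, 64, 24)
		pr_info("offset: %u clump: 0x%lx\n", start, clump);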