]> www.infradead.org Git - nvme.git/log
nvme.git
6 years agoxfs: support returning partial reflink results
Darrick J. Wong [Mon, 29 Oct 2018 23:47:06 +0000 (10:47 +1100)]
xfs: support returning partial reflink results

Back when the XFS reflink code only supported clone_file_range, we were
only able to return zero or negative error codes to userspace.  However,
now that copy_file_range (which returns bytes copied) can use XFS'
clone_file_range, we have the opportunity to return partial results.
For example, if userspace sends a 1GB clone request and we run out of
space halfway through, we at least can tell userspace that we completed
512M of that request like a regular write.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agoxfs: clean up xfs_reflink_remap_blocks call site
Darrick J. Wong [Mon, 29 Oct 2018 23:46:50 +0000 (10:46 +1100)]
xfs: clean up xfs_reflink_remap_blocks call site

Move the offset <-> blocks unit conversions into
xfs_reflink_remap_blocks to make the call site less ugly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agoxfs: fix pagecache truncation prior to reflink
Darrick J. Wong [Mon, 29 Oct 2018 23:46:33 +0000 (10:46 +1100)]
xfs: fix pagecache truncation prior to reflink

Prior to remapping blocks, it is necessary to remove pages from the
destination file's page cache.  Unfortunately, the truncation is not
aggressive enough -- if page size > block size, we'll end up zeroing
subpage blocks instead of removing them.  So, round the start offset
down and the end offset up to page boundaries.  We already wrote all
the dirty data so the larger range shouldn't be a problem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agoocfs2: remove ocfs2_reflink_remap_range
Darrick J. Wong [Mon, 29 Oct 2018 23:45:48 +0000 (10:45 +1100)]
ocfs2: remove ocfs2_reflink_remap_range

Since ocfs2_remap_file_range is a thin shell around
ocfs2_remap_remap_range, move everything from the latter into the
former.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agoocfs2: support partial clone range and dedupe range
Darrick J. Wong [Mon, 29 Oct 2018 23:44:45 +0000 (10:44 +1100)]
ocfs2: support partial clone range and dedupe range

Change the ocfs2 remap code to allow for returning partial results.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agoocfs2: fix pagecache truncation prior to reflink
Darrick J. Wong [Mon, 29 Oct 2018 23:43:16 +0000 (10:43 +1100)]
ocfs2: fix pagecache truncation prior to reflink

Prior to remapping blocks, it is necessary to remove pages from the
destination file's page cache.  Unfortunately, the truncation is not
aggressive enough -- if page size > block size, we'll end up zeroing
subpage blocks instead of removing them.  So, round the start offset
down and the end offset up to page boundaries.  We already wrote all
the dirty data so the larger range should be fine.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agoocfs2: truncate page cache for clone destination file before remapping
Darrick J. Wong [Mon, 29 Oct 2018 23:42:56 +0000 (10:42 +1100)]
ocfs2: truncate page cache for clone destination file before remapping

When cloning blocks into another file, truncate the page cache before we
start remapping blocks so that concurrent reads wait for us to finish.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: clean up generic_remap_file_range_prep return value
Darrick J. Wong [Mon, 29 Oct 2018 23:42:24 +0000 (10:42 +1100)]
vfs: clean up generic_remap_file_range_prep return value

Since the remap prep function can update the length of the remap
request, we can change this function to return the usual return status
instead of the odd behavior it has now.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: hide file range comparison function
Darrick J. Wong [Mon, 29 Oct 2018 23:42:17 +0000 (10:42 +1100)]
vfs: hide file range comparison function

There are no callers of vfs_dedupe_file_range_compare, so we might as
well make it a static helper and remove the export.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: enable remap callers that can handle short operations
Darrick J. Wong [Mon, 29 Oct 2018 23:42:10 +0000 (10:42 +1100)]
vfs: enable remap callers that can handle short operations

Plumb in a remap flag that enables the filesystem remap handler to
shorten remapping requests for callers that can handle it.  Now
copy_file_range can report partial success (in case we run up against
alignment problems, resource limits, etc.).

We also enable CAN_SHORTEN for fideduperange to maintain existing
userspace-visible behavior where xfs/btrfs shorten the dedupe range to
avoid stale post-eof data exposure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: plumb remap flags through the vfs dedupe functions
Darrick J. Wong [Mon, 29 Oct 2018 23:42:03 +0000 (10:42 +1100)]
vfs: plumb remap flags through the vfs dedupe functions

Plumb a remap_flags argument through the vfs_dedupe_file_range_one
functions so that dedupe can take advantage of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: plumb remap flags through the vfs clone functions
Darrick J. Wong [Mon, 29 Oct 2018 23:41:56 +0000 (10:41 +1100)]
vfs: plumb remap flags through the vfs clone functions

Plumb a remap_flags argument through the {do,vfs}_clone_file_range
functions so that clone can take advantage of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: make remap_file_range functions take and return bytes completed
Darrick J. Wong [Mon, 29 Oct 2018 23:41:49 +0000 (10:41 +1100)]
vfs: make remap_file_range functions take and return bytes completed

Change the remap_file_range functions to take a number of bytes to
operate upon and return the number of bytes they operated on.  This is a
requirement for allowing fs implementations to return short clone/dedupe
results to the user, which will enable us to obey resource limits in a
graceful manner.

A subsequent patch will enable copy_file_range to signal to the
->clone_file_range implementation that it can handle a short length,
which will be returned in the function's return value.  For now the
short return is not implemented anywhere so the behavior won't change --
either copy_file_range manages to clone the entire range or it tries an
alternative.

Neither clone ioctl can take advantage of this, alas.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: remap helper should update destination inode metadata
Darrick J. Wong [Mon, 29 Oct 2018 23:41:41 +0000 (10:41 +1100)]
vfs: remap helper should update destination inode metadata

Extend generic_remap_file_range_prep to handle inode metadata updates
when remapping into a file.  If the operation can possibly alter the
file contents, we must update the ctime and mtime and remove security
privileges, just like we do for regular file writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: pass remap flags to generic_remap_checks
Darrick J. Wong [Mon, 29 Oct 2018 23:41:34 +0000 (10:41 +1100)]
vfs: pass remap flags to generic_remap_checks

Pass the same remap flags to generic_remap_checks for consistency.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: pass remap flags to generic_remap_file_range_prep
Darrick J. Wong [Mon, 29 Oct 2018 23:41:28 +0000 (10:41 +1100)]
vfs: pass remap flags to generic_remap_file_range_prep

Plumb the remap flags through the filesystem from the vfs function
dispatcher all the way to the prep function to prepare for behavior
changes in subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: combine the clone and dedupe into a single remap_file_range
Darrick J. Wong [Mon, 29 Oct 2018 23:41:21 +0000 (10:41 +1100)]
vfs: combine the clone and dedupe into a single remap_file_range

Combine the clone_file_range and dedupe_file_range operations into a
single remap_file_range file operation dispatch since they're
fundamentally the same operation.  The differences between the two can
be made in the prep functions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: rename clone_verify_area to remap_verify_area
Darrick J. Wong [Mon, 29 Oct 2018 23:41:14 +0000 (10:41 +1100)]
vfs: rename clone_verify_area to remap_verify_area

Since we use clone_verify_area for both clone and dedupe range checks,
rename the function to make it clear that it's for both.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: rename vfs_clone_file_prep to be more descriptive
Darrick J. Wong [Mon, 29 Oct 2018 23:41:08 +0000 (10:41 +1100)]
vfs: rename vfs_clone_file_prep to be more descriptive

The vfs_clone_file_prep is a generic function to be called by filesystem
implementations only.  Rename the prefix to generic_ and make it more
clear that it applies to remap operations, not just clones.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: skip zero-length dedupe requests
Darrick J. Wong [Mon, 29 Oct 2018 23:41:01 +0000 (10:41 +1100)]
vfs: skip zero-length dedupe requests

Don't bother calling the filesystem for a zero-length dedupe request;
we can return zero and exit.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: avoid problematic remapping requests into partial EOF block
Darrick J. Wong [Mon, 29 Oct 2018 23:40:55 +0000 (10:40 +1100)]
vfs: avoid problematic remapping requests into partial EOF block

A deduplication data corruption is exposed in XFS and btrfs. It is
caused by extending the block match range to include the partial EOF
block, but then allowing unknown data beyond EOF to be considered a
"match" to data in the destination file because the comparison is only
made to the end of the source file. This corrupts the destination file
when the source extent is shared with it.

The VFS remapping prep functions  only support whole block dedupe, but
we still need to appear to support whole file dedupe correctly.  Hence
if the dedupe request includes the last block of the souce file, don't
include it in the actual dedupe operation. If the rest of the range
dedupes successfully, then reject the entire request.  A subsequent
patch will enable us to shorten dedupe requests correctly.

When reflinking sub-file ranges, a data corruption can occur when the
source file range includes a partial EOF block. This shares the unknown
data beyond EOF into the second file at a position inside EOF, exposing
stale data in the second file.

If the reflink request includes the last block of the souce file, only
proceed with the reflink operation if it lands at or past the
destination file's current EOF. If it lands within the destination file
EOF, reject the entire request with -EINVAL and make the caller go the
hard way.  A subsequent patch will enable us to shorten reflink requests
correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: strengthen checking of file range inputs to generic_remap_checks
Darrick J. Wong [Mon, 29 Oct 2018 23:40:46 +0000 (10:40 +1100)]
vfs: strengthen checking of file range inputs to generic_remap_checks

File range remapping, if allowed to run past the destination file's EOF,
is an optimization on a regular file write.  Regular file writes that
extend the file length are subject to various constraints which are not
checked by range cloning.

This is a correctness problem because we're never allowed to touch
ranges that the page cache can't support (s_maxbytes); we're not
supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't
set; and we must obey resource limits (RLIMIT_FSIZE).

Therefore, add these checks to the new generic_remap_checks function so
that we curtail unexpected behavior.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: exit early from zero length remap operations
Darrick J. Wong [Mon, 29 Oct 2018 23:40:39 +0000 (10:40 +1100)]
vfs: exit early from zero length remap operations

If a remap caller asks us to remap to the source file's EOF and the
source file length leaves us with a zero byte request, exit early.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: check file ranges before cloning files
Darrick J. Wong [Mon, 29 Oct 2018 23:40:31 +0000 (10:40 +1100)]
vfs: check file ranges before cloning files

Move the file range checks from vfs_clone_file_prep into a separate
generic_remap_checks function so that all the checks are collected in a
central location.  This forms the basis for adding more checks from
generic_write_checks that will make cloning's input checking more
consistent with write input checking.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
6 years agovfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from beyond EOF
Darrick J. Wong [Mon, 29 Oct 2018 23:40:22 +0000 (10:40 +1100)]
vfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from beyond EOF

vfs_clone_file_prep_inodes cannot return 0 if it is asked to remap from
a zero byte file because that's what btrfs does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoLinux 4.19 v4.19
Greg Kroah-Hartman [Mon, 22 Oct 2018 06:37:37 +0000 (07:37 +0100)]
Linux 4.19

7 years agoMAINTAINERS: Add an entry for the code of conduct
Greg Kroah-Hartman [Fri, 19 Oct 2018 09:30:16 +0000 (11:30 +0200)]
MAINTAINERS: Add an entry for the code of conduct

As I introduced these files, I'm willing to be the maintainer of them as
well.

Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoCode of Conduct: Change the contact email address
Greg Kroah-Hartman [Fri, 19 Oct 2018 09:08:12 +0000 (11:08 +0200)]
Code of Conduct: Change the contact email address

The contact point for the kernel's Code of Conduct should now be the
Code of Conduct Committee, not the full TAB.  Change the email address
in the file to properly reflect this.

Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoCode of Conduct Interpretation: Put in the proper URL for the committee
Greg Kroah-Hartman [Fri, 19 Oct 2018 09:04:07 +0000 (11:04 +0200)]
Code of Conduct Interpretation: Put in the proper URL for the committee

There was a blank <URL> reference for how to find the Code of Conduct
Committee.  Fix that up by pointing it to the correct kernel.org website
page location.

Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoCode of Conduct: Provide links between the two documents
Greg Kroah-Hartman [Fri, 19 Oct 2018 08:45:08 +0000 (10:45 +0200)]
Code of Conduct: Provide links between the two documents

Create a link between the Code of Conduct and the Code of Conduct
Interpretation so that people can see that they are related.

Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoCode of Conduct Interpretation: Properly reference the TAB correctly
Greg Kroah-Hartman [Fri, 19 Oct 2018 08:28:14 +0000 (10:28 +0200)]
Code of Conduct Interpretation: Properly reference the TAB correctly

We use the term "TAB" before defining it later in the document.  Fix
that up by defining it at the first location.

Reported-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoCode of Conduct Interpretation: Add document explaining how the Code of Conduct is...
Greg Kroah-Hartman [Sun, 14 Oct 2018 14:16:47 +0000 (16:16 +0200)]
Code of Conduct Interpretation: Add document explaining how the Code of Conduct is to be interpreted

The Contributor Covenant Code of Conduct is a general document meant to
provide a set of rules for almost any open source community.  Every
open-source community is unique and the Linux kernel is no exception.
Because of this, this document describes how we in the Linux kernel
community will interpret it.  We also do not expect this interpretation
to be static over time, and will adjust it as needed.

This document was created with the input and feedback of the TAB as well
as many current kernel maintainers.

Co-Developed-by: Thomas Gleixner <tglx@linutronix.de>
Co-Developed-by: Olof Johansson <olof@lixom.net>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>
Acked-by: Borislav Petkov <bp@kernel.org>
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Christian Lütke-Stetzkamp <christian@lkamp.de>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Sterba <kdave@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
Acked-by: Hans de Goede <j.w.r.degoede@gmail.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: James Smart <james.smart@broadcom.com>
Acked-by: James Smart <jsmart2021@gmail.com>
Acked-by: Jan Kara <jack@ucw.cz>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Jiri Kosina <jikos@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Johan Hovold <johan@kernel.org>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Lina Iyer <ilina@codeaurora.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Matias Bjørling <mb@lightnvm.io>
Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Mishi Choudhary <mishi@linux.com>
Acked-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Palmer Dabbelt <palmer@dabbelt.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Sean Paul <sean@poorly.run>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Shuah Khan <shuah@kernel.org>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Takashi Iwai <tiwai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Acked-by: Todd Poynor <toddpoynor@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoCode of conduct: Fix wording around maintainers enforcing the code of conduct
Chris Mason [Thu, 11 Oct 2018 16:09:31 +0000 (09:09 -0700)]
Code of conduct: Fix wording around maintainers enforcing the code of conduct

As it was originally worded, this paragraph requires maintainers to
enforce the code of conduct, or face potential repercussions.  It sends
the wrong message, when really we just want maintainers to be part of
the solution and not violate the code of conduct themselves.

Removing it doesn't limit our ability to enforce the code of conduct,
and we can still encourage maintainers to help maintain high standards
for the level of discourse in their subsystem.

Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>
Acked-by: Borislav Petkov <bp@kernel.org>
Acked-by: Christian Lütke-Stetzkamp <christian@lkamp.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Sterba <kdave@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
Acked-by: Hans de Goede <j.w.r.degoede@gmail.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: James Smart <james.smart@broadcom.com>
Acked-by: James Smart <jsmart2021@gmail.com>
Acked-by: Jan Kara <jack@ucw.cz>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Jiri Kosina <jikos@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Johan Hovold <johan@kernel.org>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Lina Iyer <ilina@codeaurora.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Matias Bjørling <mb@lightnvm.io>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Palmer Dabbelt <palmer@dabbelt.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Shuah Khan <shuah@kernel.org>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Takashi Iwai <tiwai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tim Bird <tim.bird@sony.com>
Acked-by: Todd Poynor <toddpoynor@google.com>
Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Greg Kroah-Hartman [Sun, 21 Oct 2018 11:51:36 +0000 (13:51 +0200)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Wolfram writes:
  "i2c for 4.19

   Another driver bugfix and MAINTAINERS addition from I2C."

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: rcar: cleanup DMA for all kinds of failure
  MAINTAINERS: Add entry for Broadcom STB I2C controller

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Greg Kroah-Hartman [Sun, 21 Oct 2018 08:08:38 +0000 (10:08 +0200)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David writes:
  "Networking:

   A few straggler bug fixes:

   1) Fix indexing of multi-pass dumps of ipv6 addresses, from David
      Ahern.

   2) Revert RCU locking change for bonding netpoll, causes worse
      problems than it solves.

   3) pskb_trim_rcsum_slow() doesn't handle odd trim offsets, resulting
      in erroneous bad hw checksum triggers with CHECKSUM_COMPLETE
      devices.  From Dimitris Michailidis.

   4) a revert to some neighbour code changes that adjust notifications
      in a way that confuses some apps."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"
  net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
  net: fix pskb_trim_rcsum_slow() with odd trim offset
  Revert "bond: take rcu lock in netpoll_send_skb_on_dev"

7 years agoRevert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"
Roopa Prabhu [Sun, 21 Oct 2018 01:09:31 +0000 (18:09 -0700)]
Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"

This reverts commit 8e326289e3069dfc9fa9c209924668dd031ab8ef.

This patch results in unnecessary netlink notification when one
tries to delete a neigh entry already in NUD_FAILED state. Found
this with a buggy app that tries to delete a NUD_FAILED entry
repeatedly. While the notification issue can be fixed with more
checks, adding more complexity here seems unnecessary. Also,
recent tests with other changes in the neighbour code have
shown that the INCOMPLETE and PROBE checks are good enough for
the original issue.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
David Ahern [Fri, 19 Oct 2018 17:00:19 +0000 (10:00 -0700)]
net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs

The loop wants to skip previously dumped addresses, so loops until
current index >= saved index. If the message fills it wants to save
the index for the next address to dump - ie., the one that did not
fit in the current message.

Currently, it is incrementing the index counter before comparing to the
saved index, and then the saved index is off by 1 - it assumes the
current address is going to fit in the message.

Change the index handling to increment only after a succesful dump.

Fixes: 502a2ffd7376a ("ipv6: convert idev_list to list macros")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoi2c: rcar: cleanup DMA for all kinds of failure
Wolfram Sang [Fri, 19 Oct 2018 19:15:26 +0000 (21:15 +0200)]
i2c: rcar: cleanup DMA for all kinds of failure

DMA needs to be cleaned up not only on timeout, but on all errors where
it has been setup before.

Fixes: 73e8b0528346 ("i2c: rcar: add DMA support")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
7 years agoMAINTAINERS: Add entry for Broadcom STB I2C controller
Kamal Dasu [Wed, 17 Oct 2018 16:05:09 +0000 (12:05 -0400)]
MAINTAINERS: Add entry for Broadcom STB I2C controller

Add an entry for the Broadcom STB I2C controller in the MAINTAINERS file.

Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
[wsa: fixed sorting and a whitespace error]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
7 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Sat, 20 Oct 2018 13:04:23 +0000 (15:04 +0200)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Ingo writes:
  "x86 fixes:

   It's 4 misc fixes, 3 build warning fixes and 3 comment fixes.

   In hindsight I'd have left out the 3 comment fixes to make the pull
   request look less scary at such a late point in the cycle. :-/"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels
  x86/fpu: Fix i486 + no387 boot crash by only saving FPU registers on context switch if there is an FPU
  x86/fpu: Remove second definition of fpu in __fpu__restore_sig()
  x86/entry/64: Further improve paranoid_entry comments
  x86/entry/32: Clear the CS high bits
  x86/boot: Add -Wno-pointer-sign to KBUILD_CFLAGS
  x86/time: Correct the attribute on jiffies' definition
  x86/entry: Add some paranoid entry/exit CR3 handling comments
  x86/percpu: Fix this_cpu_read()
  x86/tsc: Force inlining of cyc2ns bits

7 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Sat, 20 Oct 2018 13:03:45 +0000 (15:03 +0200)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Ingo writes:
  "scheduler fixes:

   Two fixes: a CFS-throttling bug fix, and an interactivity fix."

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix the min_vruntime update logic in dequeue_entity()
  sched/fair: Fix throttle_list starvation with low CFS quota

7 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Sat, 20 Oct 2018 13:02:51 +0000 (15:02 +0200)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Ingo writes:
  "perf fixes:

   Misc perf tooling fixes."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Stop fallbacking to kallsyms for vdso symbols lookup
  perf tools: Pass build flags to traceevent build
  perf report: Don't crash on invalid inline debug information
  perf cpu_map: Align cpu map synthesized events properly.
  perf tools: Fix tracing_path_mount proper path
  perf tools: Fix use of alternatives to find JDIR
  perf evsel: Store ids for events with their own cpus perf_event__synthesize_event_update_cpus
  perf vendor events intel: Fix wrong filter_band* values for uncore events
  Revert "perf tools: Fix PMU term format max value calculation"
  tools headers uapi: Sync kvm.h copy
  tools arch uapi: Sync the x86 kvm.h copy

7 years agonet: fix pskb_trim_rcsum_slow() with odd trim offset
Dimitris Michailidis [Sat, 20 Oct 2018 00:07:13 +0000 (17:07 -0700)]
net: fix pskb_trim_rcsum_slow() with odd trim offset

We've been getting checksum errors involving small UDP packets, usually
59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
has been complaining that HW is providing bad checksums. Turns out the
problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98d1bb
("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").

The source of the problem is that when the bytes we are trimming start
at an odd address, as in the case of the 1 padding byte above,
skb_checksum() returns a byte-swapped value. We cannot just combine this
with skb->csum using csum_sub(). We need to use csum_block_sub() here
that takes into account the parity of the start address and handles the
swapping.

Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Signed-off-by: Dimitris Michailidis <dmichail@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'drm-fixes-2018-10-20-1' of git://anongit.freedesktop.org/drm/drm
Greg Kroah-Hartman [Sat, 20 Oct 2018 07:23:12 +0000 (09:23 +0200)]
Merge tag 'drm-fixes-2018-10-20-1' of git://anongit.freedesktop.org/drm/drm

Dave writes:
  "drm fixes for 4.19 final (part 2)

   Looked like two stragglers snuck in, one very urgent the pageflipping
   was missing a reference that could result in a GPF on non-i915
   drivers, the other is an overflow in the sun4i dotclock calcs
   resulting in a mode not getting set."

* tag 'drm-fixes-2018-10-20-1' of git://anongit.freedesktop.org/drm/drm:
  drm/sun4i: Fix an ulong overflow in the dotclock driver
  drm: Get ref on CRTC commit object when waiting for flip_done

7 years agoMerge tag 'trace-v4.19-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rosted...
Greg Kroah-Hartman [Sat, 20 Oct 2018 07:20:48 +0000 (09:20 +0200)]
Merge tag 'trace-v4.19-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Steven writes:
  "tracing: A few small fixes to synthetic events

   Masami found some issues with the creation of synthetic events.  The
   first two patches fix handling of unsigned type, and handling of a
   space before an ending semi-colon.

   The third patch adds a selftest to test the processing of synthetic
   events."

* tag 'trace-v4.19-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  selftests: ftrace: Add synthetic event syntax testcase
  tracing: Fix synthetic event to allow semicolon at end
  tracing: Fix synthetic event to accept unsigned modifier

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Greg Kroah-Hartman [Sat, 20 Oct 2018 06:42:56 +0000 (08:42 +0200)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Dmitry writes:
  "Input updates for 4.19-rc8

   Just an addition to elan touchpad driver ACPI table."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15IGM

7 years agoMerge tag 'drm-misc-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Fri, 19 Oct 2018 21:18:12 +0000 (07:18 +1000)]
Merge tag 'drm-misc-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Second pull request for v4.19:
- Fix ulong overflow in sun4i
- Fix a serious GPF in waiting for flip_done from commit_tail().

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/97d1ed42-1d99-fcc5-291e-cd1dc29a4252@linux.intel.com
7 years agoselftests: ftrace: Add synthetic event syntax testcase
Masami Hiramatsu [Thu, 18 Oct 2018 13:13:02 +0000 (22:13 +0900)]
selftests: ftrace: Add synthetic event syntax testcase

Add a testcase to check the syntax and field types for
synthetic_events interface.

Link: http://lkml.kernel.org/r/153986838264.18251.16627517536956299922.stgit@devbox
Acked-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
7 years agotracing: Fix synthetic event to allow semicolon at end
Masami Hiramatsu [Thu, 18 Oct 2018 13:12:34 +0000 (22:12 +0900)]
tracing: Fix synthetic event to allow semicolon at end

Fix synthetic event to allow independent semicolon at end.

The synthetic_events interface accepts a semicolon after the
last word if there is no space.

 # echo "myevent u64 var;" >> synthetic_events

But if there is a space, it returns an error.

 # echo "myevent u64 var ;" > synthetic_events
 sh: write error: Invalid argument

This behavior is difficult for users to understand. Let's
allow the last independent semicolon too.

Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
7 years agotracing: Fix synthetic event to accept unsigned modifier
Masami Hiramatsu [Thu, 18 Oct 2018 13:12:05 +0000 (22:12 +0900)]
tracing: Fix synthetic event to accept unsigned modifier

Fix synthetic event to accept unsigned modifier for its field type
correctly.

Currently, synthetic_events interface returns error for "unsigned"
modifiers as below;

 # echo "myevent unsigned long var" >> synthetic_events
 sh: write error: Invalid argument

This is because argv_split() breaks "unsigned long" into "unsigned"
and "long", but parse_synth_field() doesn't expected it.

With this fix, synthetic_events can handle the "unsigned long"
correctly like as below;

 # echo "myevent unsigned long var" >> synthetic_events
 # cat synthetic_events
 myevent unsigned long var

Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
7 years agoRevert "bond: take rcu lock in netpoll_send_skb_on_dev"
David S. Miller [Fri, 19 Oct 2018 17:45:08 +0000 (10:45 -0700)]
Revert "bond: take rcu lock in netpoll_send_skb_on_dev"

This reverts commit 6fe9487892b32cb1c8b8b0d552ed7222a527fe30.

It is causing more serious regressions than the RCU warning
it is fixing.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'usb-4.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Greg Kroah-Hartman [Fri, 19 Oct 2018 17:25:44 +0000 (19:25 +0200)]
Merge tag 'usb-4.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

I wrote:
  "USB fixes for 4.19-final

   Here are a small number of last-minute USB driver fixes

   Included here are:
     - spectre fix for usb storage gadgets
     - xhci fixes
     - cdc-acm fixes
     - usbip fixes for reported problems

   All of these have been in linux-next with no reported issues."

* tag 'usb-4.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: gadget: storage: Fix Spectre v1 vulnerability
  USB: fix the usbfs flag sanitization for control transfers
  usb: xhci: pci: Enable Intel USB role mux on Apollo Lake platforms
  usb: roles: intel_xhci: Fix Unbalanced pm_runtime_enable
  cdc-acm: correct counting of UART states in serial state notification
  cdc-acm: do not reset notification buffer index upon urb unlinking
  cdc-acm: fix race between reset and control messaging
  usb: usbip: Fix BUG: KASAN: slab-out-of-bounds in vhci_hub_control()
  selftests: usbip: add wait after attach and before checking port status

7 years agoMerge tag 'for-linus-20181019' of git://git.kernel.dk/linux-block
Greg Kroah-Hartman [Fri, 19 Oct 2018 16:51:07 +0000 (18:51 +0200)]
Merge tag 'for-linus-20181019' of git://git.kernel.dk/linux-block

Jens writes:
  "Block fixes for 4.19-final

   Two small fixes that should go into this release."

* tag 'for-linus-20181019' of git://git.kernel.dk/linux-block:
  block: don't deal with discard limit in blkdev_issue_discard()
  nvme: remove ns sibling before clearing path

7 years agodrm/sun4i: Fix an ulong overflow in the dotclock driver
Boris Brezillon [Thu, 18 Oct 2018 10:02:50 +0000 (12:02 +0200)]
drm/sun4i: Fix an ulong overflow in the dotclock driver

The calculated ideal rate can easily overflow an unsigned long, thus
making the best div selection buggy as soon as no ideal match is found
before the overflow occurs.

Fixes: 4731a72df273 ("drm/sun4i: request exact rates to our parents")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181018100250.12565-1-boris.brezillon@bootlin.com
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Greg Kroah-Hartman [Fri, 19 Oct 2018 07:16:20 +0000 (09:16 +0200)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David writes:
  "Networking

   1) Fix gro_cells leak in xfrm layer, from Li RongQing.

   2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that.  From
      Eric Dumazet.

   3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn
      Töpel.

   4) Out of bounds packet access in _decode_session6(), from Alexei
      Starovoitov.

   5) Several ethtool bugs, where we copy a struct into the kernel twice
      and our validations of the values in the first copy can be
      invalidated by the second copy due to asynchronous updates to the
      memory by the user.  From Wenwen Wang.

   6) Missing netlink attribute validation in cls_api, from Davide
      Caratti.

   7) LLC SAP sockets neet to be SOCK_RCU FREE, from Cong Wang.

   8) rxrpc operates on wrong kvec, from Yue Haibing.

   9) A regression was introduced by the disassosciation of route
      neighbour references in rt6_probe(), causing probe for
      neighbourless routes to not be properly rate limited.  Fix from
      Sabrina Dubroca.

   10) Unsafe RCU locking in tipc, from Tung Nguyen.

   11) Use after free in inet6_mc_check(), from Eric Dumazet.

   12) PMTU from icmp packets should update the SCTP transport pathmtu,
       from Xin Long.

   13) Missing peer put on error in rxrpc, from David Howells.

   14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren.

   15) Fix overflowing shift statement in qla3xxx driver, from Nathan
       Chancellor.

   16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva.

   17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return
       value in an inverted manner, fix from Paolo Abeni.

   18) Fix missed unresolved entries in ipmr dumps, from Nikolay
       Aleksandrov.

   19) Fix NAPI handling under high load, we can completely miss events
       when NAPI has to loop more than one time in a cycle.  From Heiner
       Kallweit."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits)
  ip6_tunnel: Fix encapsulation layout
  tipc: fix info leak from kernel tipc_event
  net: socket: fix a missing-check bug
  net: sched: Fix for duplicate class dump
  r8169: fix NAPI handling under high load
  net: ipmr: fix unresolved entry dumps
  net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion()
  sctp: fix the data size calculation in sctp_data_size
  virtio_net: avoid using netif_tx_disable() for serializing tx routine
  udp6: fix encap return code for resubmitting
  mlxsw: core: Fix use-after-free when flashing firmware during init
  sctp: not free the new asoc when sctp_wait_for_connect returns err
  sctp: fix race on sctp_id2asoc
  r8169: re-enable MSI-X on RTL8168g
  net: bpfilter: use get_pid_task instead of pid_task
  ptp: fix Spectre v1 vulnerability
  net: qla3xxx: Remove overflowing shift statement
  geneve, vxlan: Don't set exceptions if skb->len < mtu
  geneve, vxlan: Don't check skb_dst() twice
  sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead
  ...

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Greg Kroah-Hartman [Fri, 19 Oct 2018 07:15:12 +0000 (09:15 +0200)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

David writes:
  "Sparc fixes:

   The main bit here is fixing how fallback system calls are handled in
   the sparc vDSO.

   Unfortunately, I fat fingered the commit and some perf debugging
   hacks slipped into the vDSO fix, which I revert in the very next
   commit."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Revert unintended perf changes.
  sparc: vDSO: Silence an uninitialized variable warning
  sparc: Fix syscall fallback bugs in VDSO.

7 years agoMerge tag 'drm-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm
Greg Kroah-Hartman [Fri, 19 Oct 2018 06:31:22 +0000 (08:31 +0200)]
Merge tag 'drm-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm

Dave writes:
  "drm fixes for 4.19 final

   Just a last set of misc core fixes for final.

   4 fixes, one use after free, one fb integration fix, one EDID fix,
   and one laptop panel quirk,"

* tag 'drm-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm:
  drm/edid: VSDB yCBCr420 Deep Color mode bit definitions
  drm: fix use of freed memory in drm_mode_setcrtc
  drm: fb-helper: Reject all pixel format changing requests
  drm/edid: Add 6 bpc quirk for BOE panel in HP Pavilion 15-n233sl

7 years agoMerge tag 'for-gkh' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Greg Kroah-Hartman [Fri, 19 Oct 2018 06:30:35 +0000 (08:30 +0200)]
Merge tag 'for-gkh' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Doug writes:
  "Really final for-rc pull request for 4.19

   Ok, so last week I thought we had sent our final pull request for
   4.19.  Well, wouldn't ya know someone went and found a couple Spectre
   v1 fixes were needed :-/.  So, a couple *very* small specter patches
   for this (hopefully) final -rc week."

* tag 'for-gkh' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/ucma: Fix Spectre v1 vulnerability
  IB/ucm: Fix Spectre v1 vulnerability

7 years agox86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels
Christoph Hellwig [Sun, 14 Oct 2018 07:52:08 +0000 (09:52 +0200)]
x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels

We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.

Before Linux v4.18 we paper over this fact because the networking code,
the SCSI layer and some random block drivers implemented their own
bounce buffering scheme.

[ mingo: Changelog fixes. ]

Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers")
Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in illegal_highdma")
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Matthew Whitehead <tedheadster@gmail.com>
Cc: konrad.wilk@oracle.com
Cc: iommu@lists.linux-foundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181014075208.2715-1-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoMerge tag 'drm-misc-fixes-2018-10-18' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Fri, 19 Oct 2018 03:51:55 +0000 (13:51 +1000)]
Merge tag 'drm-misc-fixes-2018-10-18' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v4.19:
- Fix use of freed memory in drm_mode_setcrtc.
- Reject pixel format changing requests in fb helper.
- Add 6 bpc quirk for HP Pavilion 15-n233sl
- Fix VSDB yCBCr420 Deep Color mode bit definitions

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/647fe5d0-4ec5-57cc-9f23-a4836b29e278@linux.intel.com
7 years agoip6_tunnel: Fix encapsulation layout
Stefano Brivio [Thu, 18 Oct 2018 19:25:07 +0000 (21:25 +0200)]
ip6_tunnel: Fix encapsulation layout

Commit 058214a4d1df ("ip6_tun: Add infrastructure for doing
encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.

As long as the option didn't actually end up in generated packets, this
wasn't an issue. Then commit 89a23c8b528b ("ip6_tunnel: Fix missing tunnel
encapsulation limit option") fixed sending of this option, and the
resulting layout, e.g. for FoU, is:

.-------------------.------------.----------.-------------------.----- - -
| Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
'-------------------'------------'----------'-------------------'----- - -

Needless to say, FoU and GUE (at least) won't work over IPv6. The option
is appended by default, and I couldn't find a way to disable it with the
current iproute2.

Turn this into a more reasonable:

.-------------------.----------.------------.-------------------.----- - -
| Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
'-------------------'----------'------------'-------------------'----- - -

With this, and with 84dad55951b0 ("udp6: fix encap return code for
resubmitting"), FoU and GUE work again over IPv6.

Fixes: 058214a4d1df ("ip6_tun: Add infrastructure for doing encapsulation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: fix info leak from kernel tipc_event
Jon Maloy [Thu, 18 Oct 2018 15:38:29 +0000 (17:38 +0200)]
tipc: fix info leak from kernel tipc_event

We initialize a struct tipc_event allocated on the kernel stack to
zero to avert info leak to user space.

Reported-by: syzbot+057458894bc8cada4dee@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: socket: fix a missing-check bug
Wenwen Wang [Thu, 18 Oct 2018 14:36:46 +0000 (09:36 -0500)]
net: socket: fix a missing-check bug

In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
statement to see whether it is necessary to pre-process the ethtool
structure, because, as mentioned in the comment, the structure
ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
is allocated through compat_alloc_user_space(). One thing to note here is
that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
partially determined by 'rule_cnt', which is actually acquired from the
user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
get_user(). After 'rxnfc' is allocated, the data in the original user-space
buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
including the 'rule_cnt' field. However, after this copy, no check is
re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user
race to change the value in the 'compat_rxnfc->rule_cnt' between these two
copies. Through this way, the attacker can bypass the previous check on
'rule_cnt' and inject malicious data. This can cause undefined behavior of
the kernel and introduce potential security risk.

This patch avoids the above issue via copying the value acquired by
get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: Fix for duplicate class dump
Phil Sutter [Thu, 18 Oct 2018 08:34:26 +0000 (10:34 +0200)]
net: sched: Fix for duplicate class dump

When dumping classes by parent, kernel would return classes twice:

| # tc qdisc add dev lo root prio
| # tc class show dev lo
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| # tc class show dev lo parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:

This comes from qdisc_match_from_root() potentially returning the root
qdisc itself if its handle matched. Though in that case, root's classes
were already dumped a few lines above.

Fixes: cb395b2010879 ("net: sched: optimize class dumps")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agor8169: fix NAPI handling under high load
Heiner Kallweit [Thu, 18 Oct 2018 17:56:01 +0000 (19:56 +0200)]
r8169: fix NAPI handling under high load

rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.

Fix this by calling rtl_rx() and rtl_tx() independent of the bits
set in the interrupt status register. Both functions will detect
if there's nothing to do for them.

Fixes: da78dbff2e05 ("r8169: remove work from irq handler.")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc: Revert unintended perf changes.
David S. Miller [Thu, 18 Oct 2018 18:32:29 +0000 (11:32 -0700)]
sparc: Revert unintended perf changes.

Some local debugging hacks accidently slipped into the VDSO commit.

Sorry!

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrm: Get ref on CRTC commit object when waiting for flip_done
Leo Li [Mon, 15 Oct 2018 13:46:40 +0000 (09:46 -0400)]
drm: Get ref on CRTC commit object when waiting for flip_done

This fixes a general protection fault, caused by accessing the contents
of a flip_done completion object that has already been freed. It occurs
due to the preemption of a non-blocking commit worker thread W by
another commit thread X. X continues to clear its atomic state at the
end, destroying the CRTC commit object that W still needs. Switching
back to W and accessing the commit objects then leads to bad results.

Worker W becomes preemptable when waiting for flip_done to complete. At
this point, a frequently occurring commit thread X can take over. Here's
an example where W is a worker thread that flips on both CRTCs, and X
does a legacy cursor update on both CRTCs:

        ...
     1. W does flip work
     2. W runs commit_hw_done()
     3. W waits for flip_done on CRTC 1
     4. > flip_done for CRTC 1 completes
     5. W finishes waiting for CRTC 1
     6. W waits for flip_done on CRTC 2

     7. > Preempted by X
     8. > flip_done for CRTC 2 completes
     9. X atomic_check: hw_done and flip_done are complete on all CRTCs
    10. X updates cursor on both CRTCs
    11. X destroys atomic state
    12. X done

    13. > Switch back to W
    14. W waits for flip_done on CRTC 2
    15. W raises general protection fault

The error looks like so:

    general protection fault: 0000 [#1] PREEMPT SMP PTI
    **snip**
    Call Trace:
     lock_acquire+0xa2/0x1b0
     _raw_spin_lock_irq+0x39/0x70
     wait_for_completion_timeout+0x31/0x130
     drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
     amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
     commit_tail+0x3d/0x70 [drm_kms_helper]
     process_one_work+0x212/0x650
     worker_thread+0x49/0x420
     kthread+0xfb/0x130
     ret_from_fork+0x3a/0x50
    Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
    gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
    fb_sys_fops ttm(O) drm(O)

Note that i915 has this issue masked, since hw_done is signaled after
waiting for flip_done. Doing so will block the cursor update from
happening until hw_done is signaled, preventing the cursor commit from
destroying the state.

v2: The reference on the commit object needs to be obtained before
    hw_done() is signaled, since that's the point where another commit
    is allowed to modify the state. Assuming that the
    new_crtc_state->commit object still exists within flip_done() is
    incorrect.

    Fix by getting a reference in setup_commit(), and releasing it
    during default_clear().

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1539611200-6184-1-git-send-email-sunpeng.li@amd.com
7 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Thu, 18 Oct 2018 16:55:08 +0000 (09:55 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2018-10-18

1) Free the xfrm interface gro_cells when deleting the
   interface, otherwise we leak it. From Li RongQing.

2) net/core/flow.c does not exist anymore, so remove it
   from the MAINTAINERS file.

3) Fix a slab-out-of-bounds in _decode_session6.
   From Alexei Starovoitov.

4) Fix RCU protection when policies inserted into
   thei bydst lists. From Florian Westphal.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoblock: don't deal with discard limit in blkdev_issue_discard()
Ming Lei [Fri, 12 Oct 2018 07:53:10 +0000 (15:53 +0800)]
block: don't deal with discard limit in blkdev_issue_discard()

blk_queue_split() does respect this limit via bio splitting, so no
need to do that in blkdev_issue_discard(), then we can align to
normal bio submit(bio_add_page() & submit_bio()).

More importantly, this patch fixes one issue introduced in a22c4d7e34402cc
("block: re-add discard_granularity and alignment checks"), in which
zero discard bio may be generated in case of zero alignment.

Fixes: a22c4d7e34402ccdf3 ("block: re-add discard_granularity and alignment checks")
Cc: stable@vger.kernel.org
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Xiao Ni <xni@redhat.com>
Tested-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
7 years agofscache: Fix out of bound read in long cookie keys
Eric Sandeen [Wed, 17 Oct 2018 14:23:59 +0000 (15:23 +0100)]
fscache: Fix out of bound read in long cookie keys

fscache_set_key() can incur an out-of-bounds read, reported by KASAN:

 BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
 Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615

and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236

  BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
  BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
  Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466

This happens for any index_key_len which is not divisible by 4 and is
larger than the size of the inline key, because the code allocates exactly
index_key_len for the key buffer, but the hashing loop is stepping through
it 4 bytes (u32) at a time in the buf[] array.

Fix this by calculating how many u32 buffers we'll need by using
DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
buffer to hold the index_key, then using that same count as the hashing
index limit.

Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofscache: Fix incomplete initialisation of inline key space
David Howells [Wed, 17 Oct 2018 14:23:45 +0000 (15:23 +0100)]
fscache: Fix incomplete initialisation of inline key space

The inline key in struct rxrpc_cookie is insufficiently initialized,
zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
bytes will end up hashing uninitialized memory because the memcpy only
partially fills the last buf[] element.

Fix this by clearing fscache_cookie objects on allocation rather than using
the slab constructor to initialise them.  We're going to pretty much fill
in the entire struct anyway, so bringing it into our dcache writably
shouldn't incur much overhead.

This removes the need to do clearance in fscache_set_key() (where we aren't
doing it correctly anyway).

Also, we don't need to set cookie->key_len in fscache_set_key() as we
already did it in the only caller, so remove that.

Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Reported-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocachefiles: fix the race between cachefiles_bury_object() and rmdir(2)
Al Viro [Wed, 17 Oct 2018 14:23:26 +0000 (15:23 +0100)]
cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)

the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent.  Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc.  So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked renaming an rmdir'ed subdirectory.

The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.

Cc: stable@vger.kernel.org
Fixes: 9ae326a69004 (CacheFiles: A cache that backs onto a mounted filesystem)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomremap: properly flush TLB before releasing the page
Linus Torvalds [Fri, 12 Oct 2018 22:22:59 +0000 (15:22 -0700)]
mremap: properly flush TLB before releasing the page

Jann Horn points out that our TLB flushing was subtly wrong for the
mremap() case.  What makes mremap() special is that we don't follow the
usual "add page to list of pages to be freed, then flush tlb, and then
free pages".  No, mremap() obviously just _moves_ the page from one page
table location to another.

That matters, because mremap() thus doesn't directly control the
lifetime of the moved page with a freelist: instead, the lifetime of the
page is controlled by the page table locking, that serializes access to
the entry.

As a result, we need to flush the TLB not just before releasing the lock
for the source location (to avoid any concurrent accesses to the entry),
but also before we release the destination page table lock (to avoid the
TLB being flushed after somebody else has already done something to that
page).

This also makes the whole "need_flush" logic unnecessary, since we now
always end up flushing the TLB for every valid entry.

Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLICENSES: Remove CC-BY-SA-4.0 license text
Christoph Hellwig [Thu, 18 Oct 2018 06:22:39 +0000 (08:22 +0200)]
LICENSES: Remove CC-BY-SA-4.0 license text

Using non-GPL licenses for our documentation is rather problematic,
as it can directly include other files, which generally are GPLv2
licensed and thus not compatible.

Remove this license now that the only user (idr.rst) is gone to avoid
people semi-accidentally using it again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoMerge branch 'ida-fixes-4.19-rc8' of git://git.infradead.org/users/willy/linux-dax
Greg Kroah-Hartman [Thu, 18 Oct 2018 09:24:32 +0000 (11:24 +0200)]
Merge branch 'ida-fixes-4.19-rc8' of git://git.infradead.org/users/willy/linux-dax

Matthew writes:
  "IDA/IDR fixes for 4.19

   I have two tiny fixes, one for the IDA test-suite and one for the IDR
   documentation license."

* 'ida-fixes-4.19-rc8' of git://git.infradead.org/users/willy/linux-dax:
  idr: Change documentation license
  test_ida: Fix lockdep warning

7 years agoMerge tag 'perf-urgent-for-mingo-4.19-20181017' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Thu, 18 Oct 2018 05:41:29 +0000 (07:41 +0200)]
Merge tag 'perf-urgent-for-mingo-4.19-20181017' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Stop falling back to kallsyms for vDSO symbols lookup, this wasn't
  being really used and is not valid in arches such as Sparc, where
  user and kernel space don't share the address space, relying only on
  cpumode to figure out what DSOs to lookup (Arnaldo Carvalho de Melo)

- Align CPU map synthesized events properly, fixing SIGBUS in
  CPUs like Sparc (David Miller)

- Fix use of alternatives to find JDIR (Jarod Wilson)

- Store IDs for events with their own CPUs when synthesizing user
  level event details (scale, unit, etc) events, fixing a crash
  when recording a PMU event with a cpumask defined (Jiri Olsa)

- Fix wrong filter_band* values for uncore Intel vendor events (Jiri Olsa)

- Fix detection of tracefs path in systems without tracefs, where
  that path should be the debugfs mountpoint plus "/tracing/" (Jiri Olsa)

- Pass build flags to traceevent build, allowing using alternative
  flags in distro packages, RPM, for instance (Jiri Olsa)

- Fix 'perf report' crash on invalid inline debug information (Milian Wolff)

- Synch KVM UAPI copies (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agonet: ipmr: fix unresolved entry dumps
Nikolay Aleksandrov [Wed, 17 Oct 2018 19:34:34 +0000 (22:34 +0300)]
net: ipmr: fix unresolved entry dumps

If the skb space ends in an unresolved entry while dumping we'll miss
some unresolved entries. The reason is due to zeroing the entry counter
between dumping resolved and unresolved mfc entries. We should just
keep counting until the whole table is dumped and zero when we move to
the next as we have a separate table counter.

Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion()
Gregory CLEMENT [Wed, 17 Oct 2018 15:26:35 +0000 (17:26 +0200)]
net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion()

The ocelot_vlant_wait_for_completion() function is very similar to the
ocelot_mact_wait_for_completion(). It seemed to have be copied but the
comment was not updated, so let's fix it.

Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: fix the data size calculation in sctp_data_size
Xin Long [Wed, 17 Oct 2018 13:11:27 +0000 (21:11 +0800)]
sctp: fix the data size calculation in sctp_data_size

sctp data size should be calculated by subtracting data chunk header's
length from chunk_hdr->length, not just data header.

Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovirtio_net: avoid using netif_tx_disable() for serializing tx routine
Ake Koomsin [Wed, 17 Oct 2018 10:44:12 +0000 (19:44 +0900)]
virtio_net: avoid using netif_tx_disable() for serializing tx routine

Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
introduces netif_tx_disable() after netif_device_detach() in order to
avoid use-after-free of tx queues. However, there are two issues.

1) Its operation is redundant with netif_device_detach() in case the
   interface is running.
2) In case of the interface is not running before suspending and
   resuming, the tx does not get resumed by netif_device_attach().
   This results in losing network connectivity.

It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
serializing tx routine during reset. This also preserves the symmetry
of netif_device_detach() and netif_device_attach().

Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Greg Kroah-Hartman [Thu, 18 Oct 2018 05:29:05 +0000 (07:29 +0200)]
Merge tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Steven writes:
  "tracing: Two fixes for 4.19

   This fixes two bugs:
    - Fix size mismatch of tracepoint array
    - Have preemptirq test module use same clock source of the selftest"

* tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c
  tracepoint: Fix tracepoint array element size mismatch

7 years agoudp6: fix encap return code for resubmitting
Paolo Abeni [Wed, 17 Oct 2018 09:44:04 +0000 (11:44 +0200)]
udp6: fix encap return code for resubmitting

The commit eb63f2964dbe ("udp6: add missing checks on edumux packet
processing") used the same return code convention of the ipv4 counterpart,
but ipv6 uses the opposite one: positive values means resubmit.

This change addresses the issue, using positive return value for
resubmitting. Also update the related comment, which was broken, too.

Fixes: eb63f2964dbe ("udp6: add missing checks on edumux packet processing")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: core: Fix use-after-free when flashing firmware during init
Ido Schimmel [Wed, 17 Oct 2018 08:05:45 +0000 (08:05 +0000)]
mlxsw: core: Fix use-after-free when flashing firmware during init

When the switch driver (e.g., mlxsw_spectrum) determines it needs to
flash a new firmware version it resets the ASIC after the flashing
process. The bus driver (e.g., mlxsw_pci) then registers itself again
with mlxsw_core which means (among other things) that the device
registers itself again with the hwmon subsystem again.

Since the device was registered with the hwmon subsystem using
devm_hwmon_device_register_with_groups(), then the old hwmon device
(registered before the flashing) was never unregistered and was
referencing stale data, resulting in a use-after free.

Fix by removing reliance on device managed APIs in mlxsw_hwmon_init().

Fixes: c86d62cc410c ("mlxsw: spectrum: Reset FW after flash")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: not free the new asoc when sctp_wait_for_connect returns err
Xin Long [Tue, 16 Oct 2018 19:06:12 +0000 (03:06 +0800)]
sctp: not free the new asoc when sctp_wait_for_connect returns err

When sctp_wait_for_connect is called to wait for connect ready
for sp->strm_interleave in sctp_sendmsg_to_asoc, a panic could
be triggered if cpu is scheduled out and the new asoc is freed
elsewhere, as it will return err and later the asoc gets freed
again in sctp_sendmsg.

[  285.840764] list_del corruption, ffff9f0f7b284078->next is LIST_POISON1 (dead000000000100)
[  285.843590] WARNING: CPU: 1 PID: 8861 at lib/list_debug.c:47 __list_del_entry_valid+0x50/0xa0
[  285.846193] Kernel panic - not syncing: panic_on_warn set ...
[  285.846193]
[  285.848206] CPU: 1 PID: 8861 Comm: sctp_ndata Kdump: loaded Not tainted 4.19.0-rc7.label #584
[  285.850559] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  285.852164] Call Trace:
...
[  285.872210]  ? __list_del_entry_valid+0x50/0xa0
[  285.872894]  sctp_association_free+0x42/0x2d0 [sctp]
[  285.873612]  sctp_sendmsg+0x5a4/0x6b0 [sctp]
[  285.874236]  sock_sendmsg+0x30/0x40
[  285.874741]  ___sys_sendmsg+0x27a/0x290
[  285.875304]  ? __switch_to_asm+0x34/0x70
[  285.875872]  ? __switch_to_asm+0x40/0x70
[  285.876438]  ? ptep_set_access_flags+0x2a/0x30
[  285.877083]  ? do_wp_page+0x151/0x540
[  285.877614]  __sys_sendmsg+0x58/0xa0
[  285.878138]  do_syscall_64+0x55/0x180
[  285.878669]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is a similar issue with the one fixed in Commit ca3af4dd28cf
("sctp: do not free asoc when it is already dead in sctp_sendmsg").
But this one can't be fixed by returning -ESRCH for the dead asoc
in sctp_wait_for_connect, as it will break sctp_connect's return
value to users.

This patch is to simply set err to -ESRCH before it returns to
sctp_sendmsg when any err is returned by sctp_wait_for_connect
for sp->strm_interleave, so that no asoc would be freed due to
this.

When users see this error, they will know the packet hasn't been
sent. And it also makes sense to not free asoc because waiting
connect fails, like the second call for sctp_wait_for_connect in
sctp_sendmsg_to_asoc.

Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: fix race on sctp_id2asoc
Marcelo Ricardo Leitner [Tue, 16 Oct 2018 18:18:17 +0000 (15:18 -0300)]
sctp: fix race on sctp_id2asoc

syzbot reported an use-after-free involving sctp_id2asoc.  Dmitry Vyukov
helped to root cause it and it is because of reading the asoc after it
was freed:

        CPU 1                       CPU 2
(working on socket 1)            (working on socket 2)
                         sctp_association_destroy
sctp_id2asoc
   spin lock
     grab the asoc from idr
   spin unlock
                                   spin lock
     remove asoc from idr
   spin unlock
   free(asoc)
   if asoc->base.sk != sk ... [*]

This can only be hit if trying to fetch asocs from different sockets. As
we have a single IDR for all asocs, in all SCTP sockets, their id is
unique on the system. An application can try to send stuff on an id
that matches on another socket, and the if in [*] will protect from such
usage. But it didn't consider that as that asoc may belong to another
socket, it may be freed in parallel (read: under another socket lock).

We fix it by moving the checks in [*] into the protected region. This
fixes it because the asoc cannot be freed while the lock is held.

Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agor8169: re-enable MSI-X on RTL8168g
Heiner Kallweit [Tue, 16 Oct 2018 17:35:17 +0000 (19:35 +0200)]
r8169: re-enable MSI-X on RTL8168g

Similar to d49c88d7677b ("r8169: Enable MSI-X on RTL8106e") after
e9d0ba506ea8 ("PCI: Reprogram bridge prefetch registers on resume")
we can safely assume that this also fixes the root cause of
the issue worked around by 7c53a722459c ("r8169: don't use MSI-X on
RTL8168g"). So let's revert it.

Fixes: 7c53a722459c ("r8169: don't use MSI-X on RTL8168g")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: bpfilter: use get_pid_task instead of pid_task
Taehee Yoo [Tue, 16 Oct 2018 15:35:10 +0000 (00:35 +0900)]
net: bpfilter: use get_pid_task instead of pid_task

pid_task() dereferences rcu protected tasks array.
But there is no rcu_read_lock() in shutdown_umh() routine so that
rcu_read_lock() is needed.
get_pid_task() is wrapper function of pid_task. it holds rcu_read_lock()
then calls pid_task(). if task isn't NULL, it increases reference count
of task.

test commands:
   %modprobe bpfilter
   %modprobe -rv bpfilter

splat looks like:
[15102.030932] =============================
[15102.030957] WARNING: suspicious RCU usage
[15102.030985] 4.19.0-rc7+ #21 Not tainted
[15102.031010] -----------------------------
[15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage!
[15102.031063]
       other info that might help us debug this:

[15102.031332]
       rcu_scheduler_active = 2, debug_locks = 1
[15102.031363] 1 lock held by modprobe/1570:
[15102.031389]  #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter]
[15102.031552]
               stack backtrace:
[15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21
[15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[15102.031628] Call Trace:
[15102.031676]  dump_stack+0xc9/0x16b
[15102.031723]  ? show_regs_print_info+0x5/0x5
[15102.031801]  ? lockdep_rcu_suspicious+0x117/0x160
[15102.031855]  pid_task+0x134/0x160
[15102.031900]  ? find_vpid+0xf0/0xf0
[15102.032017]  shutdown_umh.constprop.1+0x1e/0x53 [bpfilter]
[15102.032055]  stop_umh+0x46/0x52 [bpfilter]
[15102.032092]  __x64_sys_delete_module+0x47e/0x570
[ ... ]

Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoptp: fix Spectre v1 vulnerability
Gustavo A. R. Silva [Tue, 16 Oct 2018 13:06:41 +0000 (15:06 +0200)]
ptp: fix Spectre v1 vulnerability

pin_index can be indirectly controlled by user-space, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/ptp/ptp_chardev.c:253 ptp_ioctl() warn: potential spectre issue
'ops->pin_config' [r] (local cap)

Fix this by sanitizing pin_index before using it to index
ops->pin_config, and before passing it as an argument to
function ptp_set_pinfunc(), in which it is used to index
info->pin_config.

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc: vDSO: Silence an uninitialized variable warning
Dan Carpenter [Sat, 13 Oct 2018 10:26:53 +0000 (13:26 +0300)]
sparc: vDSO: Silence an uninitialized variable warning

Smatch complains that "val" would be uninitialized if kstrtoul() fails.

Fixes: 9a08862a5d2e ("vDSO for sparc")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: qla3xxx: Remove overflowing shift statement
Nathan Chancellor [Sat, 13 Oct 2018 02:14:58 +0000 (19:14 -0700)]
net: qla3xxx: Remove overflowing shift statement

Clang currently warns:

drivers/net/ethernet/qlogic/qla3xxx.c:384:24: warning: signed shift
result (0xF00000000) requires 37 bits to represent, but 'int' only has
32 bits [-Wshift-overflow]
                    ((ISP_NVRAM_MASK << 16) | qdev->eeprom_cmd_data));
                      ~~~~~~~~~~~~~~ ^  ~~
1 warning generated.

The warning is certainly accurate since ISP_NVRAM_MASK is defined as
(0x000F << 16) which is then shifted by 16, resulting in 64424509440,
well above UINT_MAX.

Given that this is the only location in this driver where ISP_NVRAM_MASK
is shifted again, it seems likely that ISP_NVRAM_MASK was originally
defined without a shift and during the move of the shift to the
definition, this statement wasn't properly removed (since ISP_NVRAM_MASK
is used in the statenent right above this). Only the maintainers can
confirm this since this statment has been here since the driver was
first added to the kernel.

Link: https://github.com/ClangBuiltLinux/linux/issues/127
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'geneve-vxlan-mtu'
David S. Miller [Thu, 18 Oct 2018 04:51:14 +0000 (21:51 -0700)]
Merge branch 'geneve-vxlan-mtu'

Stefano Brivio says:

====================
geneve, vxlan: Don't set exceptions if skb->len < mtu

This series fixes the exception abuse described in 2/2, and 1/2
is just a preparatory change to make 2/2 less ugly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agogeneve, vxlan: Don't set exceptions if skb->len < mtu
Stefano Brivio [Fri, 12 Oct 2018 21:53:59 +0000 (23:53 +0200)]
geneve, vxlan: Don't set exceptions if skb->len < mtu

We shouldn't abuse exceptions: if the destination MTU is already higher
than what we're transmitting, no exception should be created.

Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agogeneve, vxlan: Don't check skb_dst() twice
Stefano Brivio [Fri, 12 Oct 2018 21:53:58 +0000 (23:53 +0200)]
geneve, vxlan: Don't check skb_dst() twice

Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids
that we try updating PMTU for a non-existent destination, but didn't clean
up cases where the check was already explicit. Drop those redundant checks.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc: Fix syscall fallback bugs in VDSO.
David S. Miller [Thu, 18 Oct 2018 04:28:01 +0000 (21:28 -0700)]
sparc: Fix syscall fallback bugs in VDSO.

First, the trap number for 32-bit syscalls is 0x10.

Also, only negate the return value when syscall error is indicated by
the carry bit being set.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotracing: Use trace_clock_local() for looping in preemptirq_delay_test.c
Steven Rostedt (VMware) [Tue, 16 Oct 2018 03:31:42 +0000 (23:31 -0400)]
tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c

The preemptirq_delay_test module is used for the ftrace selftest code that
tests the latency tracers. The problem is that it uses ktime for the delay
loop, and then checks the tracer to see if the delay loop is caught, but the
tracer uses trace_clock_local() which uses various different other clocks to
measure the latency. As ktime uses the clock cycles, and the code then
converts that to nanoseconds, it causes rounding errors, and the preemptirq
latency tests are failing due to being off by 1 (it expects to see a delay
of 500000 us, but the delay is only 499999 us). This is happening due to a
rounding error in the ktime (which is totally legit). The purpose of the
test is to see if it can catch the delay, not to test the accuracy between
trace_clock_local() and ktime_get(). Best to use apples to apples, and have
the delay loop use the same clock as the latency tracer does.

Cc: stable@vger.kernel.org
Fixes: f96e8577da102 ("lib: Add module for testing preemptoff/irqsoff latency tracers")
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
7 years agotracepoint: Fix tracepoint array element size mismatch
Mathieu Desnoyers [Sat, 13 Oct 2018 19:10:50 +0000 (15:10 -0400)]
tracepoint: Fix tracepoint array element size mismatch

commit 46e0c9be206f ("kernel: tracepoints: add support for relative
references") changes the layout of the __tracepoint_ptrs section on
architectures supporting relative references. However, it does so
without turning struct tracepoint * const into const int elsewhere in
the tracepoint code, which has the following side-effect:

Setting mod->num_tracepoints is done in by module.c:

    mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs",
                                         sizeof(*mod->tracepoints_ptrs),
                                         &mod->num_tracepoints);

Basically, since sizeof(*mod->tracepoints_ptrs) is a pointer size
(rather than sizeof(int)), num_tracepoints is erroneously set to half the
size it should be on 64-bit arch. So a module with an odd number of
tracepoints misses the last tracepoint due to effect of integer
division.

So in the module going notifier:

        for_each_tracepoint_range(mod->tracepoints_ptrs,
                mod->tracepoints_ptrs + mod->num_tracepoints,
                tp_module_going_check_quiescent, NULL);

the expression (mod->tracepoints_ptrs + mod->num_tracepoints) actually
evaluates to something within the bounds of the array, but miss the
last tracepoint if the number of tracepoints is odd on 64-bit arch.

Fix this by introducing a new typedef: tracepoint_ptr_t, which
is either "const int" on architectures that have PREL32 relocations,
or "struct tracepoint * const" on architectures that does not have
this feature.

Also provide a new tracepoint_ptr_defer() static inline to
encapsulate deferencing this type rather than duplicate code and
ugly idefs within the for_each_tracepoint_range() implementation.

This issue appears in 4.19-rc kernels, and should ideally be fixed
before the end of the rc cycle.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Link: http://lkml.kernel.org/r/20181013191050.22389-1-mathieu.desnoyers@efficios.com
Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
7 years agousb: gadget: storage: Fix Spectre v1 vulnerability
Gustavo A. R. Silva [Tue, 16 Oct 2018 10:16:45 +0000 (12:16 +0200)]
usb: gadget: storage: Fix Spectre v1 vulnerability

num can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/usb/gadget/function/f_mass_storage.c:3177 fsg_lun_make() warn:
potential spectre issue 'fsg_opts->common->luns' [r] (local cap)

Fix this by sanitizing num before using it to index
fsg_opts->common->luns

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Felipe Balbi <felipe.balbi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoperf tools: Stop fallbacking to kallsyms for vdso symbols lookup
Arnaldo Carvalho de Melo [Tue, 16 Oct 2018 20:08:29 +0000 (17:08 -0300)]
perf tools: Stop fallbacking to kallsyms for vdso symbols lookup

David reports that:

<quote>
Perf has this hack where it uses the kernel symbol map as a backup when
a symbol can't be found in the user's symbol table(s).

This causes problems because the tests driving this code path use
machine__kernel_ip(), and that is completely meaningless on Sparc.  On
sparc64 the kernel and user live in physically separate virtual address
spaces, rather than a shared one.  And the kernel lives at a virtual
address that overlaps common userspace addresses.  So this test passes
almost all the time when a user symbol lookup fails.

The consequence of this is that, if the unfound user virtual address in
the sample doesn't match up to a kernel symbol either, we trigger things
like this code in builtin-top.c:

if (al.sym == NULL && al.map != NULL) {
const char *msg = "Kernel samples will not be resolved.\n";
/*
 * As we do lazy loading of symtabs we only will know if the
 * specified vmlinux file is invalid when we actually have a
 * hit in kernel space and then try to load it. So if we get
 * here and there are _no_ symbols in the DSO backing the
 * kernel map, bail out.
 *
 * We may never get here, for instance, if we use -K/
 * --hide-kernel-symbols, even if the user specifies an
 * invalid --vmlinux ;-)
 */
if (!machine->kptr_restrict_warned && !top->vmlinux_warned &&
    __map__is_kernel(al.map) && map__has_symbols(al.map)) {
if (symbol_conf.vmlinux_name) {
char serr[256];
dso__strerror_load(al.map->dso, serr, sizeof(serr));
ui__warning("The %s file can't be used: %s\n%s",
    symbol_conf.vmlinux_name, serr, msg);
} else {
ui__warning("A vmlinux file was not found.\n%s",
    msg);
}

if (use_browser <= 0)
sleep(5);
top->vmlinux_warned = true;
}
}

When I fire up a compilation on sparc, this triggers immediately.

I'm trying to figure out what the "backup to kernel map" code is
accomplishing.

I see some language in the current code and in the changes that have
happened in this area talking about vdso.  Does that really happen?

The vdso is mapped into userspace virtual addresses, not kernel ones.

More history.  This didn't cause problems on sparc some time ago,
because the kernel IP check used to be "ip < 0" :-) Sparc kernel
addresses are not negative.  But now with machine__kernel_ip(), which
works using the symbol table determined kernel address range, it does
trigger.

What it all boils down to is that on architectures like sparc,
machine__kernel_ip() should always return false in this scenerio, and
therefore this kind of logic:

if (cpumode == PERF_RECORD_MISC_USER && machine &&
    mg != &machine->kmaps &&
    machine__kernel_ip(machine, al->addr)) {

is basically invalid.  PERF_RECORD_MISC_USER implies no kernel address
can possibly match for the sample/event in question (no matter how
hard you try!) :-)
</>

So, I thought something had changed and in the past we would somehow
find that address in the kallsyms, but I couldn't find anything to back
that up, the patch introducing this is over a decade old, lots of things
changed, so I was just thinking I was missing something.

I tried a gtod busy loop to generate vdso activity and added a 'perf
probe' at that branch, on x86_64 to see if it ever gets hit:

Made thread__find_map() noinline, as 'perf probe' in lines of inline
functions seems to not be working, only at function start. (Masami?)

  # perf probe -x ~/bin/perf -L thread__find_map:57
  <thread__find_map@/home/acme/git/perf/tools/perf/util/event.c:57>
     57                 if (cpumode == PERF_RECORD_MISC_USER && machine &&
     58                     mg != &machine->kmaps &&
     59                     machine__kernel_ip(machine, al->addr)) {
     60                         mg = &machine->kmaps;
     61                         load_map = true;
     62                         goto try_again;
                        }
                } else {
                        /*
                         * Kernel maps might be changed when loading
                         * symbols so loading
                         * must be done prior to using kernel maps.
                         */
     69                 if (load_map)
     70                         map__load(al->map);
     71                 al->addr = al->map->map_ip(al->map, al->addr);

  # perf probe -x ~/bin/perf thread__find_map:60
  Added new event:
    probe_perf:thread__find_map (on thread__find_map:60 in /home/acme/bin/perf)

  You can now use it in all perf tools, such as:

perf record -e probe_perf:thread__find_map -aR sleep 1

  #

  Then used this to see if, system wide, those probe points were being hit:

  # perf trace -e *perf:thread*/max-stack=8/
  ^C[root@jouet ~]#

  No hits when running 'perf top' and:

  # cat gtod.c
  #include <sys/time.h>

  int main(void)
  {
struct timeval tv;

while (1)
gettimeofday(&tv, 0);

return 0;
  }
  [root@jouet c]# ./gtod
  ^C

  Pressed 'P' in 'perf top' and the [vdso] samples are there:

  62.84%  [vdso]                    [.] __vdso_gettimeofday
   8.13%  gtod                      [.] main
   7.51%  [vdso]                    [.] 0x0000000000000914
   5.78%  [vdso]                    [.] 0x0000000000000917
   5.43%  gtod                      [.] _init
   2.71%  [vdso]                    [.] 0x000000000000092d
   0.35%  [kernel]                  [k] native_io_delay
   0.33%  libc-2.26.so              [.] __memmove_avx_unaligned_erms
   0.20%  [vdso]                    [.] 0x000000000000091d
   0.17%  [i2c_i801]                [k] i801_access
   0.06%  firefox                   [.] free
   0.06%  libglib-2.0.so.0.5400.3   [.] g_source_iter_next
   0.05%  [vdso]                    [.] 0x0000000000000919
   0.05%  libpthread-2.26.so        [.] __pthread_mutex_lock
   0.05%  libpixman-1.so.0.34.0     [.] 0x000000000006d3a7
   0.04%  [kernel]                  [k] entry_SYSCALL_64_trampoline
   0.04%  libxul.so                 [.] style::dom_apis::query_selector_slow
   0.04%  [kernel]                  [k] module_get_kallsym
   0.04%  firefox                   [.] malloc
   0.04%  [vdso]                    [.] 0x0000000000000910

  I added a 'perf probe' to thread__find_map:69, and that surely got tons
  of hits, i.e. for every map found, just to make sure the 'perf probe'
  command was really working.

  In the process I noticed a bug, we're only have records for '[vdso]' for
  pre-existing commands, i.e. ones that are running when we start 'perf top',
  when we will generate the PERF_RECORD_MMAP by looking at /perf/PID/maps.

  I.e. like this, for preexisting processes with a vdso map, again,
  tracing for all the system, only pre-existing processes get a [vdso] map
  (when having one):

  [root@jouet ~]# perf probe -x ~/bin/perf __machine__addnew_vdso
  Added new event:
  probe_perf:__machine__addnew_vdso (on __machine__addnew_vdso in /home/acme/bin/perf)

  You can now use it in all perf tools, such as:

perf record -e probe_perf:__machine__addnew_vdso -aR sleep 1

  [root@jouet ~]# perf trace -e probe_perf:__machine__addnew_vdso/max-stack=8/
     0.000 probe_perf:__machine__addnew_vdso:(568eb3)
                                       __machine__addnew_vdso (/home/acme/bin/perf)
                                       map__new (/home/acme/bin/perf)
                                       machine__process_mmap2_event (/home/acme/bin/perf)
                                       machine__process_event (/home/acme/bin/perf)
                                       perf_event__process (/home/acme/bin/perf)
                                       perf_tool__process_synth_event (/home/acme/bin/perf)
                                       perf_event__synthesize_mmap_events (/home/acme/bin/perf)
                                       __event__synthesize_thread (/home/acme/bin/perf)

The kernel is generating a PERF_RECORD_MMAP for vDSOs, but somehow
'perf top' is not getting those records while 'perf record' is:

  # perf record ~acme/c/gtod
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.076 MB perf.data (1499 samples) ]

  # perf report -D | grep PERF_RECORD_MMAP2
  71293612401913 0x11b48 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x400000(0x1000) @ 0 fd:02 1137 541179306]: r-xp /home/acme/c/gtod
  71293612419012 0x11be0 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x7fa4a2783000(0x227000) @ 0 fd:00 3146370 854107250]: r-xp /usr/lib64/ld-2.26.so
  71293612432110 0x11c50 [0x60]: PERF_RECORD_MMAP2 25484/25484: [0x7ffcdb53a000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
  71293612509944 0x11cb0 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x7fa4a23cd000(0x3b6000) @ 0 fd:00 3149723 262067164]: r-xp /usr/lib64/libc-2.26.so
  #
  # perf script | grep vdso | head
      gtod 25484 71293.612768: 2485554 cycles:ppp:  7ffcdb53a914 [unknown] ([vdso])
      gtod 25484 71293.613576: 2149343 cycles:ppp:  7ffcdb53a917 [unknown] ([vdso])
      gtod 25484 71293.614274: 1814652 cycles:ppp:  7ffcdb53aca8 __vdso_gettimeofday+0x98 ([vdso])
      gtod 25484 71293.614862: 1669070 cycles:ppp:  7ffcdb53acc5 __vdso_gettimeofday+0xb5 ([vdso])
      gtod 25484 71293.615404: 1451589 cycles:ppp:  7ffcdb53acc5 __vdso_gettimeofday+0xb5 ([vdso])
      gtod 25484 71293.615999: 1269941 cycles:ppp:  7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
      gtod 25484 71293.616405: 1177946 cycles:ppp:  7ffcdb53a914 [unknown] ([vdso])
      gtod 25484 71293.616775: 1121290 cycles:ppp:  7ffcdb53ac47 __vdso_gettimeofday+0x37 ([vdso])
      gtod 25484 71293.617150: 1037721 cycles:ppp:  7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
      gtod 25484 71293.617478:  994526 cycles:ppp:  7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
  #

The patch is the obvious one and with it we also continue to resolve
vdso symbols for pre-existing processes in 'perf top' and for all
processes in 'perf record' + 'perf report/script'.

Suggested-by: David Miller <davem@davemloft.net>
Acked-by: David Miller <davem@davemloft.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-cs7skq9pp0kjypiju6o7trse@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoMerge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus
Jens Axboe [Wed, 17 Oct 2018 15:45:49 +0000 (09:45 -0600)]
Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus

Pull single NVMe fix from Christoph.

* 'nvme-4.19' of git://git.infradead.org/nvme:
  nvme: remove ns sibling before clearing path

7 years agoMerge branch 'parisc-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Greg Kroah-Hartman [Wed, 17 Oct 2018 12:01:00 +0000 (14:01 +0200)]
Merge branch 'parisc-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Helge writes:
   "parisc fix:

    Fix an unitialized variable usage in the parisc unwind code."

* 'parisc-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix uninitialized variable usage in unwind.c