www.infradead.org Git - users/jedix/linux-maple.git/log

xfs: Properly retry failed inode items in case of error during buffer writeback

[ Upstream commit d3a304b6292168b83b45d624784f973fdc1ca674 ]

When a buffer fails during writeback, the inode items in it
are kept flush locked and are never resubmitted because of the flush lock.
So, if any buffer fails to be written, the items in the AIL are never
written to disk and never unlocked.

This causes the unmount operation to hang because these items stay flush
locked in the AIL. It also means the items in the AIL are never written
back, even when the IO device comes back to normal.

I've been testing this patch with a DM-thin device, creating a
filesystem larger than the real device.

When writing enough data to fill the DM-thin device, XFS receives ENOSPC
errors from the device and keeps spinning on xfsaild (when the 'retry
forever' configuration is set).

At this point, the filesystem cannot be unmounted because of the flush
locked items in the AIL. Worse, the items in the AIL are never retried at
all (since xfs_inode_item_push() skips items that are flush locked),
even if the underlying DM-thin device is expanded to the proper size.

This patch fixes both cases by retrying any item that has failed
previously, using the infrastructure provided by the previous patch.
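
The retry decision described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: toy_item,
toy_item_push() and the PUSH_* values are hypothetical names, not the XFS
structures or return codes, and the real push path also has to locate and
requeue the backing buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical push outcomes; the names are illustrative only. */
enum push_result {
    PUSH_SUCCESS,   /* item was flush locked and queued for writeback */
    PUSH_FLUSHING,  /* item is already flush locked; skipped */
    PUSH_RESUBMIT   /* item failed earlier; its buffer is queued again */
};

/* Toy model of a log item sitting in the AIL. */
struct toy_item {
    bool flush_locked;  /* set while a flush is in progress */
    bool failed;        /* set when the backing buffer failed writeback */
};

enum push_result toy_item_push(struct toy_item *item)
{
    assert(item != NULL);

    /* Check the failed state first so a flush-locked item can be retried. */
    if (item->failed) {
        item->failed = false;
        return PUSH_RESUBMIT;
    }

    /* Without the check above, a failed item would be skipped here forever. */
    if (item->flush_locked)
        return PUSH_FLUSHING;

    item->flush_locked = true;
    return PUSH_SUCCESS;
}
```

The ordering of the two checks is the point of the sketch: testing the
failed state before the flush lock is what lets a stuck item be retried.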

Orabug: 27609404
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: Add infrastructure needed for error propagation during buffer IO failure

[ Upstream commit 0b80ae6ed13169bd3a244e71169f2cc020b0c57a ]

With the current code, XFS never resubmits a failed buffer for IO,
because the failed item in the buffer is kept in the flush locked state
forever.

To be able to resubmit a log item for IO, we need a way to mark an item
as failed if, for any reason, the buffer to which the item belongs
failed during writeback.

Add a new log item callback to be used after an IO completion failure
and make the needed cleanups.

Orabug: 27609404
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: remove xfs_trans_ail_delete_bulk

[ Upstream commit 27af1bbf524459962d1477a38ac6e0b7f79aaecc ]

xfs_iflush_done uses an on-stack variable length array to pass the log
items to be deleted to xfs_trans_ail_delete_bulk. On-stack VLAs are a
nasty gcc extension that can lead to unbounded stack allocations, but
fortunately we can easily avoid them by simply open coding
xfs_trans_ail_delete_bulk in xfs_iflush_done, which is the only caller
of it except for the single-item xfs_trans_ail_delete.
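
As an illustration of the open-coding idea, the sketch below removes
completed items while walking a list, so no temporary array sized by the
list length is needed. It is a userspace model only: toy_log_item,
toy_ail_delete_one() and toy_flush_done() are hypothetical names, and the
locking the real code needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a log item; not the XFS structure. */
struct toy_log_item {
    struct toy_log_item *next;
    bool in_ail;      /* true while the item is tracked */
    bool flush_done;  /* true once its flush has completed */
};

/* Single-item delete: the caller must pass an item that is still tracked. */
static void toy_ail_delete_one(struct toy_log_item *item)
{
    assert(item->in_ail);
    item->in_ail = false;
}

/*
 * Walk the list once and delete completed items as they are found.
 * Returns the number of items removed.
 */
size_t toy_flush_done(struct toy_log_item *head)
{
    size_t removed = 0;

    for (struct toy_log_item *it = head; it != NULL; it = it->next) {
        if (it->flush_done && it->in_ail) {
            toy_ail_delete_one(it);
            removed++;
        }
    }
    return removed;
}
```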

Orabug: 27609404
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: fix and streamline error handling in xfs_end_io

[ Upstream commit 787eb485509f9d58962bd8b4dbc6a5ac6e2034fe ]

There are two different cases of buffered I/O errors:

- first we can have an already shutdown fs.  In that case we should skip
   any on-disk operations and just clean up the append transaction if
   present and destroy the ioend
- a real I/O error.  In that case we should clean up any lingering COW
   blocks.  This gets skipped in the current code and is fixed by this
   patch.

Orabug: 27609404
Originally-Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick: heavily modified since we don't support cow in uek4...]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: don't leave EFIs on AIL on mount failure

[ Upstream commit f0b2efad16e78623b5a156f6e4e9166907b83155 ]

Log recovery occurs in two phases at mount time. In the first phase,
EFIs and EFDs are processed and potentially cancelled out. EFIs without
EFD objects are inserted into the AIL for processing and recovery in the
second phase. xfs_mountfs() runs various other operations between the
phases and is thus subject to failure. If failure occurs after the first
phase but before the second, pending EFIs sit on the AIL, pin it and
cause the mount to hang.

Update the mount sequence to ensure that pending EFIs are cancelled in
the event of failure. Add a recovery cancellation mechanism to iterate
the AIL and cancel all EFI items when requested. Plumb cancellation
support through the log mount finish helper and update xfs_mountfs() to
invoke cancellation in the event of failure after recovery has started.

Orabug: 27609404
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: use EFI refcount consistently in log recovery

[ Upstream commit e32a1d1fbf6eb2bdc24aa0502e827ff4d2234604 ]

The EFI is initialized with a reference count of 2. One for the EFI to
ensure the item makes it to the AIL and one for the subsequently created
EFD to release the EFI once the EFD is committed. Log recovery uses the
EFI in a similar manner, but implements a hack to remove both references
in one call once the EFD is handled.

Update log recovery to use EFI reference counting in a manner consistent
with the log. When an EFI is encountered during recovery, an EFI item is
allocated and inserted to the AIL directly. Since the EFI reference is
typically dropped when the EFI is unpinned and this is analogous with
AIL insertion, drop the EFI reference at this point.

When a corresponding EFD is encountered in the log, this indicates that
the extents were freed, no processing is required and the EFI can be
dropped. Update xlog_recover_efd_pass2() to simply drop the EFD
reference at this point rather than open code the AIL removal and EFI
free.

Remaining EFIs (i.e., with no corresponding EFD) are processed in
xlog_recover_finish(). An EFD transaction is allocated and the extents
are freed, which transfers ownership of the EFI reference to the EFD
item in the log.

Orabug: 27609404
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: ensure EFD trans aborts on log recovery extent free failure

[ Upstream commit 6bc43af3d5f507254b8de2058ea51f6ec998ae52 ]

Log recovery attempts to free extents with leftover EFIs in the AIL
after initial processing. If the extent free fails (e.g., due to
unrelated fs corruption), the transaction is cancelled, though it
might not be dirtied at the time. If this is the case, the EFD does
not abort and thus does not release the EFI. This can lead to hangs
as the EFI pins the AIL.

Update xlog_recover_process_efi() to log the EFD in the transaction
before xfs_free_extent() errors are handled to ensure the
transaction is dirty, aborts the EFD and releases the EFI on error.
Since this is a requirement for EFD processing (and consistent with
xfs_bmap_finish()), update the EFD logging helper to do the extent
free and unconditionally log the EFD. This encodes the required EFD
logging behavior into the helper and reduces the likelihood of
errors down the road.

[dchinner: re-add xfs_alloc.h to xfs_log_recover.c to fix build
failure.]

Orabug: 27609404
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: fix efi/efd error handling to avoid fs shutdown hangs

[ Upstream commit 8d99fe92fed019e203f458370129fb28b3fb5740 ]

Freeing an extent in XFS involves logging an EFI (extent free
intention), freeing the actual extent, and logging an EFD (extent
free done). The EFI object is created with a reference count of 2:
one for the current transaction and one for the subsequently created
EFD. Under normal circumstances, the first reference is dropped when
the EFI is unpinned and the second reference is dropped when the EFD
is committed to the on-disk log.
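
The normal-case lifetime described above can be modelled in a few lines of
userspace C. This is only a sketch: efi_model, efi_model_init() and
efi_model_put() are hypothetical names, and a plain int stands in for
whatever counter type the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an EFI and its two references. */
struct efi_model {
    int refcount;   /* one for the EFI's own transaction, one for the EFD */
    bool released;  /* true once the last reference is gone */
};

void efi_model_init(struct efi_model *efi)
{
    assert(efi != NULL);
    efi->refcount = 2;
    efi->released = false;
}

/* Drop one reference; returns true when this was the last one. */
bool efi_model_put(struct efi_model *efi)
{
    assert(efi != NULL && efi->refcount > 0);
    if (--efi->refcount == 0)
        efi->released = true;
    return efi->released;
}
```

In this model the first put corresponds to the EFI being unpinned and the
second to the EFD being committed. If either put is skipped on an error
path, the item is never released, which is the hang the commit addresses.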

In the event of errors or filesystem shutdown, there are various
potential cleanup scenarios depending on the state of the EFI/EFD.
The cleanup scenarios are confusing and racy, as demonstrated by the
following test sequence:

# mount $dev $mnt
# fsstress -d $mnt -n 99999 -p 16 -z -f fallocate=1 \
-f punch=1 -f creat=1 -f unlink=1 &
# sleep 5
# killall -9 fsstress; wait
# godown -f $mnt
# umount

... in which the final umount can hang due to the AIL being pinned
indefinitely by one or more EFI items. This can occur due to several
conditions. For example, if the shutdown occurs after the EFI is
committed to the on-disk log and the EFD is committed to the CIL, but
before the EFD is committed to the log, the EFD iop_committed() abort
handler does not drop its reference to the EFI. Alternatively,
manual error injection in the xfs_bmap_finish() codepath shows that
if an error occurs after the EFI transaction is committed but before
the EFD is constructed and logged, the EFI is never released from
the AIL.

Update the EFI/EFD item handling code to use a more straightforward
and reliable approach to error handling. If an error occurs after
the EFI transaction is committed and before the EFD is constructed,
release the EFI explicitly from xfs_bmap_finish(). If the EFI
transaction is cancelled, release the EFI in the unlock handler.

Once the EFD is constructed, it is responsible for releasing the EFI
under any circumstances (including whether the EFI item aborts due
to log I/O error). Update the EFD item handlers to release the EFI
if the transaction is cancelled or aborts due to log I/O error.
Finally, update xfs_bmap_finish() to log at least one EFD extent to
the transaction before xfs_free_extent() errors are handled to
ensure the transaction is dirty and EFD item error handling is
triggered.

Orabug: 27609404
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: return committed status from xfs_trans_roll()

[ Upstream commit d43ac29be7a174f93a3d26cc1e68668fe86b782f ]

Some callers need to make error handling decisions based on whether
the current transaction successfully committed or not. Rename
xfs_trans_roll(), add a new parameter and provide a wrapper to
preserve existing callers.

Orabug: 27609404
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

xfs: disentangle EFI release from the extent count

[ Upstream commit 5e4b5386a2c29429add601c8cfb45bb10d80c490 ]

Release of the EFI either occurs based on the reference count or the
extent count. The extent count used is either the count tracked in
the EFI or EFD, depending on the particular situation. In either
case, the count is initialized to the final value and thus always
matches the current efi_next_extent value once the EFI is completely
constructed. For example, the EFI extent count is increased as the
extents are logged in xfs_bmap_finish() and the full free list is
always completely processed. Therefore, the count is guaranteed to
be complete once the EFI transaction is committed. The EFD uses the
efd_nextents counter to release the EFI. This counter is initialized
to the count of the EFI when the EFD is created. Thus the EFD, as
currently used, has no concept of partial EFI release based on
extent count.

Given that the EFI extent count is always released in whole, use of
the extent count for reference counting is unnecessary. Remove this
level of the API and release the EFI based on the core reference
count. The efi_next_extent counter remains because it is still used
to track the slot to log the next extent to free.

Orabug: 27609404
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: wen.gang.wang@oracle.com
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets

We need to make sure the offsets are not out of range of the
total size.
Also check that they are in ascending order.
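
A minimal sketch of the two checks named above, written as standalone C
rather than the actual ebtables code. offsets_valid() and its parameters
are hypothetical. Accepting an offset equal to the total size and allowing
equal neighbouring offsets are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when every offset lies within 'total' and the offsets
 * never decrease from one entry to the next.
 */
bool offsets_valid(const unsigned int *offsets, size_t n, unsigned int total)
{
    unsigned int prev = 0;

    assert(offsets != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (offsets[i] > total)
            return false;   /* out of range of the total size */
        if (offsets[i] < prev)
            return false;   /* not in ascending order */
        prev = offsets[i];
    }
    return true;
}
```

A caller would bail out of parsing as soon as this returns false instead
of continuing with untrusted values.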

The WARN_ON triggered by syzkaller (it sets panic_on_warn) is
changed to also bail out, no point in continuing parsing.

Briefly tested with a simple ruleset of
-A INPUT --limit 1/s --log
plus jump to custom chains using a 32bit ebtables binary.

Reported-by: <syzbot+845a53d13171abf8bf29@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit b71812168571fa55e44cdd0254471331b9c4c4c6)

Orabug: 27774012
CVE: CVE-2018-1068

Signed-off-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

ACPI / PAD: don't register acpi_pad driver if running as Xen dom0

When running as Xen dom0 a special processor_aggregator driver is
needed. Don't register the standard driver in this case.

Without that check an error message:

"Error: Driver 'processor_aggregator' is already registered,
aborting..."

will be displayed.

Signed-off-by: Juergen Gross <jgross@suse.com>
[ rjw: Minor fixups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e311404f7925f6879817ebf471651c0bb5935604)
Orabug: 27796473
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Sriharsha NS <sriharsha.ns@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

sched/fair: Fix typo in sync_throttle()

We should update cfs_rq->throttled_clock_task, not
pcfs_rq->throttle_clock_task.

The effect of this bug was probably occasionally erratic
group scheduling, particularly in cgroups-intense workloads.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
[ Added changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 55e16d30bd99 ("sched/fair: Rework throttle_count sync")
Link: http://lkml.kernel.org/r/1468050862-18864-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit b8922125e4790fa237a8a4204562ecf457ef54bb)

Orabug: 27787518

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: Mridula Shastry <mridula.c.shastry@oracle.com>

sched/fair: Do not announce throttled next buddy in dequeue_task_fair()

Hierarchy could be already throttled at this point. Throttled next
buddy could trigger a NULL pointer dereference in pick_next_task_fair().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7)

Orabug: 27787518

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: Mridula Shastry <mridula.c.shastry@oracle.com>

sched/fair: Initialize and rework throttle_count for new task-groups

This patch is a combination of the following three patches from mainline:

094f469172e0 sched/fair: Initialize throttle_count for new task-groups lazily

A cgroup created inside a throttled group must inherit the current
throttle_count. A broken throttle_count allows throttled entries to be
nominated as a next buddy, which later leads to a null pointer dereference
in pick_next_task_fair().

This patch initializes cfs_rq->throttle_count at first enqueue: laziness
allows skipping the locking of every rq at group creation. The lazy approach
also allows skipping a full sub-tree scan when throttling a hierarchy (not
in this patch).

8663e24d56dc sched/fair: Reorder cgroup creation code

A future patch needs rq->lock held _after_ we link the task_group into
the hierarchy. In order to avoid taking every rq->lock twice, reorder
things a little and create online_fair_sched_group() to be called
after we link the task_group.

All this code is still run from css_alloc() so css_online() isn't in
fact used for this.

55e16d30bd99 sched/fair: Rework throttle_count sync

Since we already take rq->lock when creating a cgroup, use it to also
sync the throttle_count and avoid the extra state and enqueue path
branch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: linux-kernel@vger.kernel.org
[ Fixed build warning. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

The patches have been combined because applying them separately will
cause a KABI breakage and introduce a dummy function.

Orabug: 27787518

Conflicts:
kernel/sched/fair.c
kernel/sched/core.c
kernel/sched/sched.h

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: Mridula Shastry <mridula.c.shastry@oracle.com>

perf tools: Move syscall number fallbacks from perf-sys.h to tools/arch/x86/include/asm/

And remove the empty tools/arch/x86/include/asm/unistd_{32,64}.h files
introduced by eae7a755ee81 ("perf tools, x86: Build perf on older
user-space as well").

This way we get closer to mirroring the kernel for cases where __NR_
can't be found for some include path/_GNU_SOURCE/whatever scenario.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-kpj6m3mbjw82kg6krk2z529e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit cec07f53c398)

Orabug: 27240053
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: FIPS - allow tests to be disabled in FIPS mode

In FIPS mode, additional restrictions may apply. If these restrictions
are violated, the kernel will panic(). This patch allows test vectors
for symmetric ciphers to be marked as to be skipped in FIPS mode.

Together with the patch, the XTS test vectors where the AES key is
identical to the tweak key are disabled in FIPS mode. These test vectors
violate the FIPS requirement that both keys must be different.

Reported-by: Tapas Sarangi <TSarangi@trustwave.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 10faa8c0d6c3b22466f97713a9533824a2ea1c57)

Orabug: 27809271

Signed-off-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Conflicts:
crypto/testmgr.h

crypto: xts - consolidate sanity check for keys

The patch centralizes the XTS key check logic into the service function
xts_check_key which is invoked from the different XTS implementations.
With this, the XTS implementations in ARM, ARM64, PPC and S390 now have
a sanity check for the XTS keys similar to the other arches.

In addition, this service function received a check to ensure that the
key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the
check is not present in the standards defining XTS, it is only enforced
in FIPS mode of the kernel.
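
The shape of such a centralized check can be sketched in userspace C as
follows. xts_key_check_sketch() and its fips_mode parameter are
hypothetical and are not the kernel helper. The sketch assumes an XTS key
is the cipher key followed by a tweak key of equal length, and it uses
memcmp(), which is not constant-time, purely for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 0 when the key is acceptable, -EINVAL otherwise.
 * The two halves of 'key' are treated as cipher key and tweak key.
 */
int xts_key_check_sketch(const unsigned char *key, size_t keylen,
                         bool fips_mode)
{
    assert(key != NULL);

    /* The key must split into two halves of equal size. */
    if (keylen % 2 != 0)
        return -EINVAL;

    /* Only in FIPS mode: reject a cipher key identical to the tweak key. */
    if (fips_mode && memcmp(key, key + keylen / 2, keylen / 2) == 0)
        return -EINVAL;

    return 0;
}
```

Keeping the FIPS-only comparison behind a flag mirrors the point made
above: the extra rule is applied only when FIPS mode is active.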

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 28856a9e52c7cac712af6c143de04766617535dc)

Orabug: 27809271

Signed-off-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

crypto: rng - Zero seed in crypto_rng_reset

If we allocate a seed on behalf of the user in crypto_rng_reset,
we must ensure that it is zeroed afterwards or the RNG may be
compromised.
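
A minimal userspace sketch of wiping a temporary seed buffer before it is
released. wipe_bytes() is a hypothetical helper for illustration. It writes
through a volatile pointer so that a compiler is less likely to drop the
stores as dead code. Kernel code would use its own facilities instead.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite 'len' bytes at 'buf' with zeros before the buffer is freed. */
void wipe_bytes(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    assert(buf != NULL || len == 0);

    while (len--)
        *p++ = 0;
}
```

The intended pattern is: allocate the seed, hand it to the generator, call
the wipe helper, and only then free the buffer.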

Reported-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b617b702da4e922277806f81c411d3051107d462)

Orabug: 27809271

Signed-off-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

enic: set IG desc cache flag in open

New adapters need the CMD_OPENF_IG_DESCCACHE flag to be set. If this flag
is not set, the firmware flushes the global IG desc cache. This flag is a
no-op on older adapters.

Also increment the driver version.

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5de0c022f1b0bce073cb04dd69ed7982805e5763)
Orabug: 27587345
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Drivers: hv: utils: fix crash when device is removed from host side

The crash is observed when a service is being disabled on the host side
while a userspace daemon is connected to the device:

[   90.244859] general protection fault: 0000 [#1] SMP
...
[   90.800082] Call Trace:
[   90.800082]  [<ffffffff81187008>] __fput+0xc8/0x1f0
[   90.800082]  [<ffffffff8118716e>] ____fput+0xe/0x10
...
[   90.800082]  [<ffffffff81015278>] do_signal+0x28/0x580
[   90.800082]  [<ffffffff81086656>] ? finish_task_switch+0xa6/0x180
[   90.800082]  [<ffffffff81443ebf>] ? __schedule+0x28f/0x870
[   90.800082]  [<ffffffffa01ebbaa>] ? hvt_op_read+0x12a/0x140 [hv_utils]
...

The problem is that hvutil_transport_destroy(), which calls
misc_deregister() and frees the device, is reachable by two paths: module
unload and util_remove(). While the module unload path is protected by
.owner in struct file_operations, the util_remove() path is not. Freeing
the device while someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit 9420098adc50a88d4a441e0f92d54bfa7af44448)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>

Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode

When the Hyper-V host asks us to remove some util driver by closing the
appropriate channel, there is no easy way to force the current file
descriptor holder to hang up, but we can start to respond -EBADF to all
operations, asking it to exit gracefully.

As we're now setting hvt->mode from two separate contexts, we need to use
proper locking.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit a15025660d4703a8b37290a14734cb4a84875770)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>

Drivers: hv: utils: rename outmsg_lock

In preparation for reusing outmsg_lock to protect test-and-set operations
on 'mode', rename it to the more general 'lock'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit a72f3a4ccff22de879a1f599210ecdd9bd483a43)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>

Drivers: hv: utils: fix memory leak on on_msg() failure

inmsg should be freed in case of on_msg() failure to avoid a memory leak.
Preserve the error code from on_msg().
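
The pattern can be sketched in userspace C: free the copied message on
every path and hand the handler's own error code back to the caller. All
names here (toy_op_write(), toy_on_msg_fn and the two sample handlers) are
hypothetical, and malloc()/memcpy() stand in for copying from user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef int (*toy_on_msg_fn)(void *msg, size_t len);

/* Sample handlers used to exercise both outcomes. */
int toy_accept_msg(void *msg, size_t len) { (void)msg; (void)len; return 0; }
int toy_reject_msg(void *msg, size_t len) { (void)msg; (void)len; return -EINVAL; }

/* Returns the number of bytes consumed, or a negative error code. */
long toy_op_write(const void *src, size_t len, toy_on_msg_fn on_msg)
{
    void *inmsg;
    int ret;

    assert(src != NULL && len > 0 && on_msg != NULL);

    inmsg = malloc(len);
    if (inmsg == NULL)
        return -ENOMEM;
    memcpy(inmsg, src, len);

    ret = on_msg(inmsg, len);
    free(inmsg);                    /* freed on success and on failure */

    return ret ? ret : (long)len;   /* keep the handler's error code */
}
```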

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit 1f75338b6fece2bbd42ac3623830c65e2df6e031)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>

Drivers: hv: utils: use memdup_user in hvt_op_write

Use memdup_user to handle OOM.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit b00359642c2427da89dc8f77daa2c9e8a84e6d76)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>

hv: util: checking the wrong variable

We don't catch this allocation failure because there is a typo and we
check the wrong variable.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit 9dd6a06430c94299651d74b9ed5ca8396ab8ff1f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>

net/rds: Avoid copy overhead if send buff is full

Avoid copying the message from user-space if we already
know there's not enough space in the send buffer.

Change-Id: I5ddde7d41bbaeaf25f398c721d02babd3893b73f
Orabug: 27747165
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

ext4: fix ->put_link panic

Orabug: 27498770

The following panic was caught. The storage had a problem and returned an
I/O error; generic_readlink()->ext4_follow_link()->page_follow_link_light()
returned with a NULL page and an error link, and ext4_put_link() then tried
to free the error link and panicked.

[25144440.198756] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
[25144440.332969] Aborting journal on device dm-7-8.
[25144440.338462] Buffer I/O error on dev dm-7, logical block 3702784, lost sync page write
[25144440.342625] Buffer I/O error on dev dm-7, logical block 0, lost sync page write
[25144440.342627] EXT4-fs error (device dm-7): ext4_journal_check_start:56: Detected aborted journal
[25144440.342629] EXT4-fs (dm-7): Remounting filesystem read-only
[25144440.342630] EXT4-fs (dm-7): previous I/O error to superblock detected
[25144440.342634] Buffer I/O error on dev dm-7, logical block 0, lost sync page write
[25144440.390336] JBD2: Error -5 detected when updating journal superblock for dm-7-8.
[25144464.799979] Buffer I/O error on dev dm-7, logical block 1573499, lost async page write
[25144464.809517] Buffer I/O error on dev dm-7, logical block 1573501, lost async page write
[25144464.819048] Buffer I/O error on dev dm-7, logical block 1573659, lost async page write
[25144464.828669] Buffer I/O error on dev dm-7, logical block 1573660, lost async page write
[25144464.838207] Buffer I/O error on dev dm-7, logical block 1573662, lost async page write
[25144464.847798] Buffer I/O error on dev dm-7, logical block 1573675, lost async page write
[25144464.857326] Buffer I/O error on dev dm-7, logical block 1573677, lost async page write
[25144464.866848] Buffer I/O error on dev dm-7, logical block 1573696, lost async page write
[25144464.876383] Buffer I/O error on dev dm-7, logical block 1573698, lost async page write
[25144464.885903] Buffer I/O error on dev dm-7, logical block 1573703, lost async page write
[25144496.335355] ------------[ cut here ]------------
[25144496.341039] kernel BUG at mm/slub.c:3413!
[25144496.345997] invalid opcode: 0000 [#1] SMP
[25144496.351074] Modules linked in: dm_snapshot dm_bufio nfnetlink_queue nfnetlink_log nfnetlink iptable_filter ip_tables oracleacfs(PO) oracleadvm(PO) oracleoks(PO) mpt3sas scsi_transport_sas raid_class nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace ipmi_poweroff ipmi_devintf bonding rds_rdma rds ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad mlx4_ib mlx4_core dm_multipath bnx2i cnic uio cxgb4i libcxgbi cxgb4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 fuse iTCO_wdt iTCO_vendor_support sb_edac edac_core i2c_i801 lpc_ich mfd_core sg ixgbe dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio ipmi_ssif i2c_core ipmi_si ipmi_msghandler wmi acpi_pad ext4 jbd2 mbcache sd_mod ahci libahci megaraid_sas dm_mirror
[25144496.433281]  dm_region_hash dm_log dm_mod
[25144496.436514] CPU: 8 PID: 201734 Comm: tar Tainted: P           O    4.1.12-61.28.1.el6uek.x86_64 #2
[25144496.447204] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38070000 12/16/2016
[25144496.458974] task: ffff888ce26d0e00 ti: ffff888da1c50000 task.ti: ffff888da1c50000
[25144496.467999] RIP: 0010:[<ffffffff811e5de9>]  [<ffffffff811e5de9>] kfree+0x159/0x170
[25144496.477238] RSP: 0018:ffff888da1c53da8  EFLAGS: 00010246
[25144496.483734] RAX: 001fffff80000400 RBX: fffffffffffffffb RCX: ffffffffa00ee860
[25144496.492480] RDX: 0000000000000000 RSI: fffffffffffffffb RDI: ffffea0001ffffc0
[25144496.501116] RBP: ffff888da1c53dc8 R08: 000000000001abc0 R09: ffff885efec07480
[25144496.509753] R10: ffffffff8118b1b7 R11: 0000000000000000 R12: 0000000000000000
[25144496.518425] R13: ffffffffa00ee890 R14: 00000000fffffffb R15: 000000000000005f
[25144496.527061] FS:  00007f949c3957a0(0000) GS:ffff885eff400000(0000) knlGS:0000000000000000
[25144496.536769] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25144496.543674] CR2: 00007f47cdbd8d20 CR3: 0000006093b7b000 CR4: 00000000003406e0
[25144496.552333] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[25144496.560961] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[25144496.569630] Stack:
[25144496.572452]  ffff888da1c53de8 ffff8801e33e7bc0 0000000000000000 ffff888da1c53de8
[25144496.581455]  ffff888da1c53dd8 ffffffffa00ee890 ffff888da1c53eb8 ffffffff8120f7ec
[25144496.590422]  ffff888da1c53eb8 ffffffff812144c3 ffff88bec2658320 ffff8801e33e7bc0
[25144496.599391] Call Trace:
[25144496.602619]  [<ffffffffa00ee890>] ext4_put_link+0x30/0x40 [ext4]
[25144496.609819]  [<ffffffff8120f7ec>] generic_readlink+0x8c/0xb0
[25144496.616627]  [<ffffffff812144c3>] ? user_path_at_empty+0x63/0xa0
[25144496.623816]  [<ffffffff8120a496>] SyS_readlinkat+0x116/0x130
[25144496.630629]  [<ffffffff8120a0eb>] SyS_readlink+0x1b/0x20
[25144496.637065]  [<ffffffff8169d76e>] system_call_fastpath+0x12/0x71
[25144496.644278] Code: ff ff eb 8e 66 f7 07 00 c0 74 20 48 8b 07 31 f6 f6 c4 40 74 03 8b 77 68 e8 f5 d7 fa ff e9 70 ff ff ff 48 8b 7f 30 e9 19 ff ff ff <0f> 0b 0f 1f 44 00 00 eb f9 66 66 66 66 66 2e 0f 1f 84 00 00 00
[25144496.667149] RIP  [<ffffffff811e5de9>] kfree+0x159/0x170
[25144496.673496]  RSP <ffff888da1c53da8>

Mainline/uek5 do not have this issue, as the ->following_link and ->put_link have
been refactored there. The patch set that does this is rather large, so I did not
backport it and instead wrote this small patch to fix the issue.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

KVM/VMX: Clear spec_ctrl status when resetting vcpu

vmx->spec_ctrl was not set to 0 in vmx_vcpu_reset, which could result in
IBRS getting stuck on all the time, even with 'spectre_v2=off' set. This
was most noticeable when rebooting from an older kernel into a newer
retpoline-enabled kernel, which resulted in a CPU performance drop of up to 80%.

OraBug: 27774415

Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Patrick Colp <patrick.colp@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

mlx4: change the ICM table allocations to lowest needed size

The driver currently allocates 256KB of contiguous memory for ICM
tables, which puts pressure on memory management to allocate such
large contiguous pages in a fragmented memory system. Such allocations
themselves contribute to memory fragmentation, and at times user
processes stall for tens of seconds, leading to slow path
dumps with mm lock contention.

This change makes the driver allocate the lowest page size
needed for ICM allocation (8K), which fixes these stalls.

With 4K chunk sizes the QP table size is 4MB, which cannot be allocated
by kmalloc. A larger design change would be necessary to break up the
table. With 8k chunks the same table is 2MB, which can be allocated by
kmalloc. This large table allocation only happens once at driver load
time.
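
The relationship between chunk size and table size can be checked with a
few lines of C. This is only a sketch: the function name icm_table_bytes
is hypothetical, one 8-byte pointer per chunk is assumed, and the 2 GiB
total of ICM memory used below is a value picked because it reproduces the
4MB and 2MB figures above, not one taken from the driver.

```c
#include <stdint.h>

/*
 * Size in bytes of a table holding one pointer per ICM chunk.
 * Assumption: the table is a flat array of 8-byte pointers.
 */
uint64_t icm_table_bytes(uint64_t total_icm_bytes, uint64_t chunk_bytes)
{
	const uint64_t ptr_bytes = 8;	/* assumed pointer size (64-bit) */
	/* Round up so a partial final chunk still gets a table slot. */
	uint64_t chunks = (total_icm_bytes + chunk_bytes - 1) / chunk_bytes;

	return chunks * ptr_bytes;
}
```

With the assumed 2 GiB of ICM, icm_table_bytes(2ULL << 30, 4096) gives
4 MiB and icm_table_bytes(2ULL << 30, 8192) gives 2 MiB: doubling the
chunk size halves the table.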

Orabug: 27718303

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

Revert "Drivers: hv: utils: fix a race on userspace daemons registration"

This reverts commit 3c908e953ef2771f683dae4ef4c5c1f4dbdfe27d. This change has
caused an fcopy failure and turned out to be unnecessary for the original bug.

Orabug: 27673755
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

crypto: af_alg - Avoid sock_graft call warning

The newly added sock_graft warning triggers in af_alg_accept.
It's harmless as we're essentially doing sock->sk = sock->sk.

The sock_graft call is actually redundant because all the work
it does is subsumed by sock_init_data. However, it was added
to placate SELinux as it uses it to initialise its internal state.

This patch avoids the warning by making the SELinux call directly.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2acce6aa9f6569d4e135b2c4cfb56acce95efaeb)

Orabug: 26895616,27426147

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

iscsi-target: Fix initial login PDU asynchronous socket close OOPs

This patch fixes an OOPs originally introduced by:

   commit bb048357dad6d604520c91586334c9c230366a14
   Author: Nicholas Bellinger <nab@linux-iscsi.org>
   Date:   Thu Sep 5 14:54:04 2013 -0700

   iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing an iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed.  For both of these cases, the cleanup of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle the case where iscsi_target_sk_state_change() is
called after the initial PDU processing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 25cdda95fda78d22d44157da15aa7ea34be3c804)

Orabug: 27701211

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

target/iscsi: Fix indentation in iscsi_target_start_negotiation()

This patch avoids that smatch complains about inconsistent
indentation in iscsi_target_start_negotiation().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 1efaa949396b5d9e8d1e6edef7e97e9ce1a97319)

Orabug: 27701211

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race

There is an iscsi-target/tcp login race in LOGIN_FLAGS_READY
state assignment that can result in frequent errors during
iscsi discovery:

"iSCSI Login negotiation failed."

To address this bug, move the initial LOGIN_FLAGS_READY
assignment ahead of iscsi_target_do_login() when handling
the initial iscsi_target_start_negotiation() request PDU
during connection login.

As iscsi_target_do_login_rx() work_struct callback is
clearing LOGIN_FLAGS_READ_ACTIVE after subsequent calls
to iscsi_target_do_login(), the early sk_data_ready
ahead of the first iscsi_target_do_login() expects
LOGIN_FLAGS_READY to also be set for the initial
login request PDU.

As reported by Maged, this was first observed using an
MSFT initiator running across multiple VMWare host
virtual machines with iscsi-target/tcp.

Reported-by: Maged Mokhtar <mmokhtar@binarykinetics.com>
Tested-by: Maged Mokhtar <mmokhtar@binarykinetics.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 8f0dfb3d8b1120c61f6e2cc3729290db10772b2d)

Orabug: 27701211

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

iscsi-target: Fix rx_login_comp hang after login failure

This patch addresses a case where iscsi_target_do_tx_login_io()
fails sending the last login response PDU, after the RX/TX
threads have already been started.

The case centers around iscsi_target_rx_thread() not invoking
allow_signal(SIGINT) before the send_sig(SIGINT, ...) occurs
from the failure path, resulting in RX thread hanging
indefinitely on iscsi_conn->rx_login_comp.

Note this bug is a regression introduced by:

  commit e54198657b65625085834847ab6271087323ffea
  Author: Nicholas Bellinger <nab@linux-iscsi.org>
  Date:   Wed Jul 22 23:14:19 2015 -0700

      iscsi-target: Fix iscsit_start_kthreads failure OOPs

To address this bug, complete ->rx_login_complete for good
measure in the failure path, and immediately return from
RX thread context if connection state did not actually reach
full feature phase (TARG_CONN_STATE_LOGGED_IN).

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit ca82c2bded29b38d36140bfa1e76a7bbfcade390)

Orabug: 27701211

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

KVM: x86: fix singlestepping over syscall

TF is handled a bit differently for syscall and sysret, compared
to the other instructions: TF is checked after the instruction completes,
so that the OS can disable #DB at a syscall by adding TF to FMASK.
When the sysret is executed the #DB is taken "as if" the syscall insn
just completed.

KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
Fix the behavior, otherwise you could get #DB on a user stack which is not
nice. This does not affect Linux guests, as they use an IST or task gate
for #DB.

This fixes CVE-2017-7518.

Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit c8401dda2f0a00cd25c0af6a95ed50e478d25de4)

Orabug: 27669904
CVE: CVE-2017-7518

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Conflicts:
arch/x86/kvm/x86.c
skipped changes for kvm_skip_emulated_instruction()

nfs: system crashes after NFS4ERR_MOVED recovery

nfs4_update_server unconditionally releases the nfs_client for the
source server. If migration fails, this can cause the source server's
nfs_client struct to be left with a low reference count, resulting in
use-after-free. Also, adjust reference count handling for ELOOP.

NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6
WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]()
nfs_put_client+0xfa/0x110 [nfs]
nfs4_run_state_manager+0x30/0x40 [nfsv4]
kthread+0xd8/0xf0

BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
nfs4_xdr_enc_write+0x6b/0x160 [nfsv4]
rpcauth_wrap_req+0xac/0xf0 [sunrpc]
call_transmit+0x18c/0x2c0 [sunrpc]
__rpc_execute+0xa6/0x490 [sunrpc]
rpc_async_schedule+0x15/0x20 [sunrpc]
process_one_work+0x160/0x470
worker_thread+0x112/0x540
? rescuer_thread+0x3f0/0x3f0
kthread+0xd8/0xf0

This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"),
but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops")

Reported-by: Helen Chao <helen.chao@oracle.com>
Fixes: 52442f9b11b7 ("NFS4: Avoid migration loops")
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 27679350
(cherry picked from commit ad86f605c59500da82d196ac312cfbac3daba31d)
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Manjunath Patil <manjunath.b.patil@oracle.com>

NFS: Clean up nfs4_set_client()

If we cut out the dprintk()s, then we can return error codes directly
and cut out the goto.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 27679350
(cherry picked from commit 2dc42c0d60e0104f7cd8beee3871f953565392ff)
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Manjunath Patil <manjunath.b.patil@oracle.com>

NFS4: Avoid migration loops

If a server returns itself as a location while migrating, the client may
end up getting stuck attempting to migrate twice to the same server. Catch
this by checking if the nfs_client found is the same as the existing
client. For the other two callers to nfs4_set_client, the nfs_client will
always be ERR_PTR(-EINVAL).
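
The check can be sketched in plain C as follows. This is an illustration
under stated assumptions, not the NFS client code: the struct and function
names are hypothetical, NULL stands in for the error pointer that
non-migrating callers pass, and returning -ELOOP is an assumption
suggested by the ELOOP handling mentioned in a related commit in this log.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_nfs_client {
	const char *hostname;
};

/*
 * Decide whether a migration target is usable.
 * `found` is the client looked up for the new location, `old` is the
 * client being migrated away from (NULL here for callers that are not
 * migrating).
 */
int sketch_check_migration_target(const struct sketch_nfs_client *found,
				  const struct sketch_nfs_client *old)
{
	/* Same client object: the server referred us back to itself. */
	if (old != NULL && found == old)
		return -ELOOP;
	return 0;
}
```

Comparing the pointers is enough in this sketch because the lookup is
assumed to return the already existing object for a known server.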

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 27679350
(cherry picked from commit 52442f9b11b7e5d4a38d99143011831fd171f8d9)
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Manjunath Patil <manjunath.b.patil@oracle.com>

mstflint: update Makefile and Kconfig

1, fix a typo in Makefile
2, update dependency setting
3, change build option to y (built-in) to
provide easy access in Exadata Secure Boot env

Orabug: 27707445

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Victor Erminpour <victor.erminpour@oracle.com>

target: add inquiry_product module param to override LIO default

Orabug: 27679431

OL6 iscsi target used IET which presented VIRTUAL-DISK for inquiry product.
OL7 uses the LIO iscsi target instead, which presented LIO iblock name.

Exadata targets upgrading from OL6 to OL7 need to present the same
product ID to existing iscsi initiator multipath mappings.

Add target_core_mod parameter inquiry_product for target inquiry product
string override. It defaults to LIO iblock name.

The user will also need to do one of the following in targetcli:
set global export_backstore_name_as_model=false
or for each backstore:
/backstores/<type>/<name> set attribute emulate_model_alias=0

(cherry picked from commit faf91b95fd22dbf0a1a7fd5b18ab71a929385927)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

target: add inquiry_vendor module param to override LIO-ORG

Orabug: 27679431

OL6 iscsi target used IET which presented IET for the inquiry vendor.
OL7 uses LIO iscsi target instead, which presents 'LIO-ORG'.

Exadata targets upgrading from OL6 to OL7 need to present the same
vendor ID to existing iscsi initiator multipath mappings.

Add target_core_mod parameter inquiry_vendor for target inquiry vendor
string override. It defaults to original "LIO-ORG " if not set.

(cherry picked from commit 87e9da042f42aa5bef1a76c4ef84989680e1df2e)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

IB/core: Avoid calling ib_query_device

Use the cached copy of the attributes present on the device, except for
the case of a query originating from user-space, where we have to invoke
the driver query_device entry, so they can fill in their udata.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Orabug: 27687711
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
(cherry-picked from upstream 86bee4c9c126b4f73e3f152cd43c806cac9135ad)
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Conflicts:
  drivers/infiniband/core/uverbs_cmd.c
    o Function "ib_uverbs_get_context":
      In UEK it's "ibdev", taken from "file->device->ib_dev"

      Upstream it's "ib_dev", introduced by:
        commit 057aec0d23f750b27f0bb92d2606871f60417e0a
        Author: Yishai Hadas <yishaih@mellanox.com>
        Date:   Thu Aug 13 18:32:04 2015 +0300

          IB/uverbs: Explicitly pass ib_dev to uverbs commands

      The earliest linux-4.x.y it showed up in was x==3, or rather "linux-4.3.y".

    o Function "ib_uverbs_query_device":
      Just like before, UEK does not have "ib_dev", so using
      "file->device->ib_dev" instead.
      This follows the precedent of the deleted code, where "ib_query_device"
      was called with "file->device->ib_dev" in UEK.

  drivers/infiniband/core/verbs.c
    The handling of "IB_DEVICE_LOCAL_DMA_LKEY" inside "ib_alloc_pd" was introduced with:
      commit 96249d70dd70496084c7ec1465ec449cd032955a
      Author: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Date:   Wed Aug 5 14:14:45 2015 -0600

        IB/core: Guarantee that a local_dma_lkey is available

     The earliest linux-4.x.y it showed up in was x==3, or rather "linux-4.3.y".

     Since UEK is/was based on 4.1, the corresponding code does not exist here.

IB/core: Save the device attributes on the device structure

This way both the IB core and upper level drivers can access these cached
device attributes rather than querying or caching them on their own.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Orabug: 27687711
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
(cherry-picked from upstream 3e153a93a1c12e3354dd38cca414fb51a15136a2)
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

nvme: fix uninitialized prp2 value on small transfers

The value of iod->first_dma ends up as prp2 in NVMe commands. In case
there is not enough data to cross a page boundary, iod->first_dma is
never initialized and contains random data.

Comply with the NVMe specification and fill in 0 in that case.
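
A minimal sketch of the rule, written as standalone C rather than driver
code. The function name sketch_prp2 and the SKETCH_PAGE_SIZE constant are
hypothetical, a 4096-byte page is assumed, and only transfers touching at
most two pages are modelled (longer transfers need a PRP list, which is
left out here).

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed memory page size */

/*
 * Return the PRP2 value for a transfer of `len` bytes starting at DMA
 * address `dma`. Only transfers touching at most two pages are modelled.
 */
uint64_t sketch_prp2(uint64_t dma, uint32_t len)
{
	uint32_t offset = (uint32_t)(dma & (SKETCH_PAGE_SIZE - 1));

	/* Everything fits in the first page: PRP2 is unused, so send 0. */
	if (offset + len <= SKETCH_PAGE_SIZE)
		return 0;

	/* Otherwise PRP2 points at the start of the following page. */
	return (dma - offset) + SKETCH_PAGE_SIZE;
}
```

The point of the explicit 0 is that the field always holds a defined
value, even when the device is expected to ignore it.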

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
(cherry picked from commit 5228b3280b9bb8fa6aef59f891cca64a028e9b36)

Orabug: 27624149

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

bnxt_en: initialize bnxt_pf_wq

Orabug: 27674029

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>

x86/spectre_v2: Fix cpu offlining with IBPB.

We didn't check if tsk->mm is available when a CPU goes down - and
of course - that is exactly when there is no task.

As such we would crash.

OraBug: 27678629
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

retpoline: selectively disable IBRS in disable_ibrs_and_friends()

disable_ibrs_and_friends() is called:
(1) when the boot parameter "spectre_v2=off" is specified.
(2) the CPU is not affected by Spectre V2 and:
- spectre_v2=off
- or spectre_v2=auto
- or the spectre_v2 is not specified
(3) retpoline is selected as the Spectre V2 mitigation.

For (1) and (2) IBRS should be disabled (SPEC_CTRL_IBRS_ADMIN_DISABLED
is set). This prevents setting IBRS in use even if it is the only
Spectre V2 mitigation available.

For (3) IBRS should be set not-in-use but remain enabled in case
it is selected by disable_repoline() as the fall back Spectre V2
mitigation.
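
The three cases can be summarised in a small C sketch. All names below
(the enum, the struct and sketch_disable_ibrs) are hypothetical and only
model the two properties described above: whether IBRS is in use and
whether it has been administratively disabled.

```c
#include <stdbool.h>

enum sketch_caller {
	SKETCH_SPECTRE_V2_OFF,		/* case (1) */
	SKETCH_CPU_NOT_AFFECTED,	/* case (2) */
	SKETCH_RETPOLINE_SELECTED	/* case (3) */
};

struct sketch_ibrs_state {
	bool in_use;		/* IBRS is currently applied */
	bool admin_disabled;	/* IBRS may not be switched on later */
};

struct sketch_ibrs_state sketch_disable_ibrs(enum sketch_caller why)
{
	/* In every case IBRS stops being in use. */
	struct sketch_ibrs_state st = { .in_use = false, .admin_disabled = false };

	/* Cases (1) and (2): rule IBRS out entirely. */
	if (why != SKETCH_RETPOLINE_SELECTED)
		st.admin_disabled = true;

	/* Case (3): not in use, but still available as a fallback. */
	return st;
}
```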

Orabug: 27665263

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

bnxt_en: Add cache line size setting to optimize performance.

Orabug: 27648355, 27648339

The chip supports 64-byte and 128-byte cache line size for more optimal
DMA performance when matched to the CPU cache line size. The default is 64.
If the system is using 128-byte cache line size, set it to 128.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c3480a603773cfc5d8aa44dbbee6c96e0f9d4d9d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

bnxt_en: Forward VF MAC address to the PF.

Orabug: 27648355, 27648339

Forward hwrm_func_vf_cfg command from VF to PF driver, to store
VF MAC address in PF's context. This will allow "ip link show"
to display all VF MAC addresses.

Maintain 2 locations of MAC address in VF info structure, one for
a PF assigned MAC and one for VF assigned MAC.

Display VF assigned MAC in "ip link show", only if PF assigned MAC is
not valid.
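
The selection rule can be sketched as below. The struct layout, the field
names and both function names are hypothetical, and "valid" is assumed
here to mean an address that is neither all zeroes nor multicast.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETH_ALEN 6

struct sketch_vf_info {
	uint8_t pf_assigned_mac[SKETCH_ETH_ALEN];
	uint8_t vf_assigned_mac[SKETCH_ETH_ALEN];
};

/* Valid here means: not all zeroes and not a multicast address. */
bool sketch_mac_is_valid(const uint8_t *mac)
{
	uint8_t any = 0;

	for (int i = 0; i < SKETCH_ETH_ALEN; i++)
		any |= mac[i];
	return any != 0 && (mac[0] & 0x01) == 0;
}

/* Prefer the PF assigned MAC; fall back to the VF assigned one. */
const uint8_t *sketch_mac_to_display(const struct sketch_vf_info *vf)
{
	if (sketch_mac_is_valid(vf->pf_assigned_mac))
		return vf->pf_assigned_mac;
	return vf->vf_assigned_mac;
}
```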

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 91cdda40714178497cbd182261b2ea6ec5cb9276)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Add BCM5745X NPAR device IDs

Orabug: 27648355, 27648339

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 92abef361bd233ea2a99db9e9a637626f523f82e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Expand bnxt_check_rings() to check all resources.

Orabug: 27648355, 27648339

bnxt_check_rings() is called by ethtool, XDP setup, and ndo_setup_tc()
to see if there are enough resources to support the new configuration.
Expand the call to test all resources if the firmware supports the new
API. With the more flexible resource allocation scheme, this call must
be made to check that all resources are available before committing to
allocate the resources.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8f23d638b36b4ff0fe5785cf01f9bdc41afb9c06)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

bnxt_en: Implement new method for the PF to assign SRIOV resources.

Orabug: 27648355, 27648339

Instead of the old method of evenly dividing the resources to the VFs,
use the new firmware API to specify min and max resources for each VF.
This way, there is more flexibility for each VF to allocate more or less
resources.

The min is the absolute minimum for each VF to function.  The max is the
global resources minus the resources used by the PF.  Each VF is
guaranteed the min.  Up to max resources may be available for some VFs.

The PF driver can use one of 2 strategies specified in NVRAM to assign
the resources: the old legacy strategy of evenly dividing the resources,
or the new flexible strategy.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4673d66468b80dc37abd1159a4bd038128173d48)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Reserve resources for RFS.

Orabug: 27648355, 27648339

In bnxt_rfs_capable(), add call to reserve vnic resources to support
NTUPLE. Return true if we can successfully reserve enough vnics.
Otherwise, reserve the minimum 1 VNIC for normal operations not
supporting NTUPLE and return false.

Also, suppress warning message about not enough resources for NTUPLE when
only 1 RX ring is in use. NTUPLE filters by definition require multiple
RX rings.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a1eef5b9079742ecfad647892669bd5fe6b0e3f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Implement new method to reserve rings.

Orabug: 27648355, 27648339

The new method will call firmware to reserve the desired tx, rx, cmpl
rings, ring groups, stats context, and vnic resources.  A second query
call will check the actual resources that firmware is able to reserve.
The driver will then trim and adjust based on the actual resources
provided by firmware.  The driver will then reserve the final resources
in use.

This method is a more flexible way of using hardware resources.  The
resources are not fixed and can be adjusted by firmware.  The driver
adapts to the available resources that the firmware can reserve for
the driver.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 674f50a5b026151f4109992cb594d89f5334adde)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Set initial default RX and TX ring numbers the same in combined mode.

Orabug: 27648355, 27648339

In combined mode, the driver is currently not setting RX and TX ring
numbers the same when firmware can allocate more RX than TX or vice versa.
This will confuse the user as the ethtool convention assumes they are the
same in combined mode. Fix it by adding bnxt_trim_dflt_sh_rings() to trim
RX and TX ring numbers to be the same as the completion ring number in
combined mode.
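
A rough sketch of such a trim, not the driver's implementation: the
struct and the function name sketch_trim_shared_rings are hypothetical,
and it is assumed that in combined mode the usable completion ring count
is the smaller of the RX and TX counts.

```c
struct sketch_rings {
	int rx;
	int tx;
	int cp;	/* completion rings */
};

void sketch_trim_shared_rings(struct sketch_rings *r)
{
	/* A shared completion ring serves one RX and one TX ring. */
	r->cp = r->rx < r->tx ? r->rx : r->tx;

	/* Report the same number for RX and TX, as ethtool expects. */
	r->rx = r->cp;
	r->tx = r->cp;
}
```

For example, 8 RX rings and 4 TX rings would be trimmed to 4 of each.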

Note that if TCs are enabled and/or XDP is enabled, the number of TX rings
will not be the same as RX rings in combined mode.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 58ea801ac4c166cdcaa399ce7f9b3e9095ff2842)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Add the new firmware API to query hardware resources.

Orabug: 27648355, 27648339

The new API HWRM_FUNC_RESOURCE_QCAPS provides min and max hardware
resources. Use the new API when it is supported by firmware.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit be0dd9c4100c9549fe50258e3d928072e6c31590)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h

bnxt_en: Refactor hardware resource data structures.

Orabug: 27648355, 27648339

In preparation for new firmware APIs to allocate hardware resources,
add a new struct bnxt_hw_resc to hold various min, max and reserved
resources. This new structure is common for PFs and VFs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a4f29470569c5a158c1871a2f752ca22e433420)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Restore MSIX after disabling SRIOV.

Orabug: 27648355, 27648339

After SRIOV has been enabled and disabled, the MSIX vectors assigned to
the VFs have to be re-initialized. Otherwise they cannot be re-used by
the PF. For example, increasing the number of PF rings after disabling
SRIOV may fail if the PF uses MSIX vectors previously assigned to the VFs.

To fix this, we add logic in bnxt_restore_pf_fw_resources() to close the
NIC, clear and re-init MSIX, and re-open the NIC.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 80fcaf46c09262a71f32bb577c976814c922f864)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h

bnxt_en: Refactor bnxt_close_nic().

Orabug: 27648355, 27648339

Add a new __bnxt_close_nic() function to do all the work previously done
in bnxt_close_nic() except waiting for SRIOV configuration. The new
function will be used in the next patch as part of SRIOV cleanup.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 86e953db0114f396f916344395160aa267bf2627)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

bnxt_en: Update firmware interface to 1.9.0.

Orabug: 27648355, 27648339

The version has new firmware APIs to allocate PF/VF resources more
flexibly.

New toolchains were used to generate this file, resulting in a one-time
large diffstat.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 894aa69a90932907f3de9d849ab9970884151d0e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.

Orabug: 27648355, 27648339

In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.
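
The difference between the two bounds can be shown in a few lines. The
struct, its field names and the function name are hypothetical; the
sketch only assumes that the VF info array has one entry per created VF.

```c
#include <stdbool.h>

struct sketch_pf {
	int max_vfs;	/* what the device could support */
	int active_vfs;	/* how many VF info entries were allocated */
};

/* A VF id is usable only if an info entry was allocated for it. */
bool sketch_vf_id_valid(const struct sketch_pf *pf, int vf_id)
{
	return vf_id >= 0 && vf_id < pf->active_vfs;
}
```

With max_vfs = 8 and active_vfs = 2, VF id 5 passes a check against
max_vfs but indexes past the end of the allocated array; the check
against active_vfs rejects it.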

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 78f300049335ae81a5cc6b4b232481dc5e1f9d41)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix sources of spurious netpoll warnings

Orabug: 27648355, 27648339

After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and
903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
  bnxt_poll+0x0/0xd0 exceeded budget in poll
  <snip>
  Call Trace:
   [<ffffffff814be5cd>] dump_stack+0x4d/0x70
   [<ffffffff8107e013>] __warn+0xd3/0xf0
   [<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
   [<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
   [<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
   [<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
   [<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
   [<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
   [<ffffffff810c9549>] console_unlock+0x339/0x5b0
   [<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
   [<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
   [<ffffffff81173df5>] printk+0x48/0x50
   [<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
   [<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
   [<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
   [<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
   [<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
   [<ffffffff81095f8b>] process_one_work+0x19b/0x480
   [<ffffffff810967ca>] worker_thread+0x6a/0x520
   [<ffffffff8109c7c4>] kthread+0xe4/0x100
   [<ffffffff81884c52>] ret_from_fork+0x22/0x40

This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.
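
The accounting rule can be sketched as a small helper. This is a
simplified model, not the driver's poll loop: the function name
sketch_account_rx and its parameters are hypothetical, and `rc` is
assumed to be either a non-negative packet count or a negative errno.

```c
#include <errno.h>

/*
 * Return the updated rx_pkts count after one receive attempt.
 * `budget` is the NAPI budget passed to the poll function; netpoll
 * calls it with a budget of 0.
 */
int sketch_account_rx(int rx_pkts, int rc, int budget)
{
	if (rc >= 0)
		return rx_pkts + rc;	/* rc packets were delivered */

	/* A failed packet counts against the budget only if there is one. */
	if ((rc == -ENOMEM || rc == -EIO) && budget)
		return rx_pkts + 1;

	return rx_pkts;		/* zero budget: report no work done */
}
```

With a zero budget the helper never reports work for a failed packet, so
the "exceeded budget" condition cannot be triggered by the error paths.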

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2edbdb3159d6f6bd3a9b6e7f789f2b879699a519)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Don't print "Link speed -1 no longer supported" messages.

Orabug: 27648355, 27648339

On some dual port NICs, the 2 ports have to be configured with compatible
link speeds.  Under some conditions, a port's configured speed may no
longer be supported.  The firmware will send a message to the driver
when this happens.

Improve this logic that prints out the warning by only printing it if
we can determine the link speed that is no longer supported.  If the
speed is unknown or it is in autoneg mode, skip the warning message.

Reported-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a8168b6cee6e9334dfebb4b9108e8d73794f6088)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()

Orabug: 27648355, 27648339

The address of the short_input variable is assigned to a data pointer
that is dereferenced after short_input has gone out of scope. Fix it by
moving the short_input definition to the beginning of the
bnxt_hwrm_do_send_msg() function.

No failure has been reported so far due to this issue.
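
A minimal sketch of the corrected pattern, assuming a made-up message
header type. struct msg_hdr and send_msg() are illustrative names only
and do not reflect the real firmware message layout.

```c
#include <stdint.h>

/* Hypothetical message header, for illustration only. */
struct msg_hdr {
	uint16_t req_type;
	uint16_t len;
};

/* Returns the req_type that would be handed to the device. */
static uint16_t send_msg(const struct msg_hdr *req, int use_short_cmd)
{
	/* Declared at function scope so it outlives the if block below. */
	struct msg_hdr short_input = { 0, 0 };
	const struct msg_hdr *data = req;

	if (use_short_cmd) {
		short_input.req_type = req->req_type;
		short_input.len = (uint16_t)sizeof(short_input);
		data = &short_input;	/* still valid after the block ends */
	}

	return data->req_type;	/* data is read outside the if block */
}
```

Had short_input been declared inside the if block, the final read
through data would be undefined behavior even if it happens to work.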

Fixes: e605db801bde ("bnxt_en: Support for Short Firmware Message")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ebd5818cc5d4847897d7fe872e2d9799d7b7fcbb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown

Orabug: 27648355, 27648339

The current 'bnxt_shutdown' implementation only invokes
'bnxt_ulp_shutdown' to shut down RoCE in the case when the system is in
the path of power off (SYSTEM_POWER_OFF). While this may work in most
cases, it does not work in the smart NIC case, when Linux 'reboot'
command is initiated from the Linux that runs on the ARM cores of the
NIC card. In this particular case, Linux 'reboot' results in a system
'L3' level reset where the entire ARM and associated subsystems are
being reset, but at the same time, Nitro core is being kept in sane state
(to allow external PCIe connected servers to continue to work). Without
properly shutting down RoCE and freeing all associated resources, the
ARM core hangs immediately after the 'reboot'.

Always invoking 'bnxt_ulp_shutdown' in 'bnxt_shutdown' fixes the above
issue.

Fixes: 0efd2fc65c92 ("bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.")
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a7f3f939dd7d8398acebecd1ceb2e9e7ffbe91d2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix an error handling path in 'bnxt_get_module_eeprom()'

Orabug: 27648355, 27648339

Error code returned by 'bnxt_read_sfp_module_eeprom_info()' is handled a
few lines above when reading the A0 portion of the EEPROM.
The same should be done when reading the A2 portion of the EEPROM.

In order to correctly propagate an error, update 'rc' in this 2nd call as
well, otherwise 0 (success) is returned.
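
The pattern can be sketched as below. The reader callback, its
signature, and fake_read_page() are assumptions made for the example;
only the idea of storing the second call's return value in 'rc' comes
from the description above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page reader: returns 0 on success, negative errno on error. */
typedef int (*eeprom_read_fn)(int page, unsigned char *buf, size_t len);

static int read_module_eeprom(eeprom_read_fn read_page,
			      unsigned char *a0, size_t a0_len,
			      unsigned char *a2, size_t a2_len)
{
	int rc = 0;

	if (a0_len) {
		rc = read_page(0xA0, a0, a0_len);
		if (rc)
			return rc;
	}

	if (a2_len)
		rc = read_page(0xA2, a2, a2_len);	/* keep the result */

	return rc;
}

/* Fake reader for illustration: page A2 always fails with -EIO. */
static int fake_read_page(int page, unsigned char *buf, size_t len)
{
	if (page == 0xA2)
		return -EIO;
	memset(buf, 0, len);
	return 0;
}
```

If the second call's result were discarded, the caller would see 0
even though the A2 read failed.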

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dea521a2b9f96e905fa2bb2f95e23ec00c2ec436)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt: fix bnxt_hwrm_fw_set_time for y2038

Orabug: 27648355, 27648339

On 32-bit architectures, rtc_time_to_tm() returns incorrect results
in 2038 or later, and do_gettimeofday() is broken for the same reason.

This changes the code to use ktime_get_real_seconds() and time64_to_tm()
instead. Both are 2038-safe, and we can also get rid of the
CONFIG_RTC_LIB dependency that way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7dfaa7bc99498da1c6c4a48bee8d2d5265161a8c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix IRQ coalescing regression.

Orabug: 27648355, 27648339

Recent IRQ coalescing clean up has removed a guard-rail for the max DMA
buffer coalescing value. This is a 6-bit value and must not be 0. We
already have a check for 0 but 64 is equivalent to 0 and will cause
non-stop interrupts. Fix it by adding the proper check.
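
A small sketch of why 64 is a problem and what a range check looks
like. The helper names are hypothetical, and masking with 0x3f is only
a model of how a 6-bit field would store a wider value.

```c
#include <stdint.h>

#define DMA_AGGR_MAX 63u	/* largest value a 6-bit field can hold */

/* Keep the value inside the usable range 1..63. */
static uint16_t clamp_dma_aggr(uint16_t bufs)
{
	if (bufs < 1)
		return 1;
	if (bufs > DMA_AGGR_MAX)
		return DMA_AGGR_MAX;
	return bufs;
}

/* What a 6-bit field would keep of a wider value (model only). */
static uint16_t truncate_to_6_bits(uint16_t v)
{
	return v & 0x3f;
}
```

Here truncate_to_6_bits(64) yields 0, which is the case a check for 0
alone does not catch; clamping to 63 before programming the field
avoids it.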

Fixes: f8503969d27b ("bnxt_en: Refactor and simplify coalescing code.")
Reported-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b153cbc507946f52d5aa687fd64f45d82cb36a3b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: fix typo in bnxt_set_coalesce

Orabug: 27648355, 27648339

Recent refactoring of coalesce settings contained a typo that prevents
receive settings from being set properly.

Fixes: 18775aa8a91f ("bnxt_en: Reorganize the coalescing parameters.")
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit de4a10ef6eff0eb0ced97a39dc3edd0d3101b6ed)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Refactor and simplify coalescing code.

Orabug: 27648355, 27648339

The mapping of the ethtool coalescing parameters to hardware parameters
is now done in bnxt_hwrm_set_coal_params().  The same function can
handle both RX and TX settings.  The code is now more clear.  Some
adjustments have been made to get better hardware settings.  The
coal_frames setting is now accurately set in hardware.  The max_timer
is set to coal_ticks value.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f8503969d27b2b26ff0adbce4b7d7cf4ba5e43c2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Reorganize the coalescing parameters.

Orabug: 27648355, 27648339

The current IRQ coalescing logic is a little messy.  The ethtool
parameters are mapped to hardware parameters in a way that is difficult
to understand.  The first step is to better organize the parameters
by adding the new structure bnxt_coal.  The structure is used by both
the RX and TX sets of coalescing parameters.

Adjust the default coal_ticks to 14 us and 28 us for RX and TX.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18775aa8a91fcd4cd07c722d575b4b852e3624c3)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h

bnxt_en: Add ethtool reset method

Orabug: 27648355, 27648339

This is a firmware internal reset after driver is unloaded.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 49f7972fd16407b3d1f03c2d447d2f1e1b95e9ba)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Optimize .ndo_set_mac_address() for VFs.

Orabug: 27648355, 27648339

No need to call bnxt_approve_mac() which will send a message to the
PF if the MAC address hasn't changed.
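
The early return can be sketched like this. request_pf_approval() and
vf_set_mac() are stand-ins invented for the example, and the counter
exists only to make the saved message visible.

```c
#include <string.h>

#define ETH_ALEN 6

static int approval_requests;	/* counts messages sent to the PF */

/* Hypothetical stand-in for the VF-to-PF approval message. */
static int request_pf_approval(const unsigned char *mac)
{
	(void)mac;
	approval_requests++;
	return 0;
}

static int vf_set_mac(unsigned char *cur, const unsigned char *new_mac)
{
	int rc;

	if (memcmp(cur, new_mac, ETH_ALEN) == 0)
		return 0;	/* unchanged: no need to bother the PF */

	rc = request_pf_approval(new_mac);
	if (rc)
		return rc;

	memcpy(cur, new_mac, ETH_ALEN);
	return 0;
}
```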

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c1a7bdff17247332ecff7f243e42d269b3f74c65)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Get firmware package version one time.

Orabug: 27648355, 27648339

The current code retrieves the firmware package version from firmware
every time ethtool -i is run. There is no reason to do that as the
firmware will not change while the driver is loaded. Get the version
once at init time.

Also, display the full 4-part firmware version string and remove the
less useful interface spec version.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 431aa1eb20d8ae2674723292adb832b968da868e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Check for zero length value in bnxt_get_nvram_item().

Orabug: 27648355, 27648339

Return -EINVAL if the length is zero instead of proceeding to do
essentially nothing.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e0ad8fc5980b362028cfd63ec037f4b491e726c6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: adding PCI ID for SMARTNIC VF support

Orabug: 27648355, 27648339

Signed-off-by: Rob Miller <rmiller@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 618784e3ee1870e43e50e1c7922cc123cc050566)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Add PCIe device ID for bcm58804

Orabug: 27648355, 27648339

Add new PCIe device ID and chip number for bcm58804.

Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8ed693b7bbd179949f6947adaae5eff2e386a534)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Update firmware interface to 1.8.3.1

Orabug: 27648355, 27648339

Vxlan encap/decap filters are added to this firmware spec.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 57922b0a2f7ef9effbcdbbf7d1f8dad95aa567f7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix possible corruption in DCB parameters from firmware.

Orabug: 27648355, 27648339

hwrm_send_message() is replaced with _hwrm_send_message(), and
hwrm_cmd_lock mutex lock is grabbed for the whole period of
firmware call until the firmware DCB parameters have been copied.
This will prevent possible corruption of the firmware data.

Fixes: 7df4ae9fe855 ("bnxt_en: Implement DCBNL to support host-based DCBX.")
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b1e1a9ce06fd94b563d6c3dd896589231995d89)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix VF resource checking.

Orabug: 27648355, 27648339

In bnxt_sriov_enable(), we calculate to see if we have enough hardware
resources to enable the requested number of VFs. The logic to check
for minimum completion rings and statistics contexts is missing. Add
the required checks so that VF configuration won't fail.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 021570793d8cd86cb62ac038c535f4450586b454)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix VF PCIe link speed and width logic.

Orabug: 27648355, 27648339

The PCIe PCIE_EP_REG_LINK_STATUS_CONTROL register is only defined in PF
config space, so we must read it from the PF.

Fixes: 90c4f788f6c0 ("bnxt_en: Report PCIe link speed and width during driver load")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7ab0760f5178169c4c218852f51646ea90817d7c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Don't use rtnl lock to protect link change logic in workqueue.

Orabug: 27648355, 27648339

As a further improvement to the PF/VF link change logic, use a private
mutex instead of the rtnl lock to protect link change logic.  With the
new mutex, we don't have to take the rtnl lock in the workqueue when
we have to handle link related functions.  If the VF and PF drivers
are running on the same host and both take the rtnl lock and one is
waiting for the other, it will cause a timeout.  This patch fixes these
timeouts.

Fixes: 90c694bb7181 ("bnxt_en: Fix RTNL lock usage on bnxt_update_link().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e2dc9b6e38fa3919e63d6d7905da70ca41cbf908)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Improve VF/PF link change logic.

Orabug: 27648355, 27648339

Link status query firmware messages originating from the VFs are forwarded
to the PF.  The driver handles these interactions in a workqueue for the
VF and PF.  The VF driver waits for the response from the PF in the
workqueue.  If the PF and VF drivers are running on the same host and the
work for both PF and VF is queued on the same workqueue, the VF driver
may not get the response if the PF work item is queued behind it on the
same workqueue.  This will lead to the VF link query message timing out.

To prevent this, we create a private workqueue for PFs instead of using
the common workqueue.  The VF query and PF response will never be on
the same workqueue.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c213eae8d3cd4c026f348ce4fd64f4754b3acf2b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

bnxt_en: Remove redundant unlikely()

Orabug: 27648355, 27648339

IS_ERR() already implies unlikely(), so it can be omitted.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1fac4b2fdbccab69cb781aae68f540be94d5549e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

drivers: net: bnxt: use setup_timer() helper.

Orabug: 27648355, 27648339

Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c43824477c2ac722325ba460c2ce683c48fb76b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Reduce default rings on multi-port cards.

Orabug: 27648355, 27648339

Reduce default rings from 8 to 4 on multi-port cards to reduce memory
usage.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d5430d31ca72ec37fd539fd1c5230859509be4ef)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Improve -ENOMEM logic in NAPI poll loop.

Orabug: 27648355, 27648339

If we cannot allocate RX buffers in the NAPI poll loop when processing
an RX event, the current code does not count that event towards the NAPI
budget. This can cause us to potentially loop forever in NAPI if we
consistently cannot allocate new buffers. Improve it by counting
-ENOMEM event as 1 towards the NAPI budget.

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 903649e718f80da2ba4b65a0adf6930219b4b2e5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt: initialize board_info values with proper enums

Orabug: 27648355, 27648339

Initialize board_info values with proper enums for defensive programming
purposes. This avoids errors where the declared enums do not line up
with the board_info array.
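
The technique is C99 designated initializers indexed by the enum. A
minimal sketch with invented board names (BOARD_A and friends are
placeholders, not the driver's real entries):

```c
enum board_idx {
	BOARD_A,
	BOARD_B,
	BOARD_C,
	BOARD_COUNT
};

static const struct {
	const char *name;
} board_info[] = {
	/* Each entry names its index, so source order no longer matters. */
	[BOARD_A] = { "Example board A" },
	[BOARD_C] = { "Example board C" },
	[BOARD_B] = { "Example board B" },
};

static const char *board_name(enum board_idx idx)
{
	return idx < BOARD_COUNT ? board_info[idx].name : "unknown";
}
```

Reordering or inserting enum values then cannot silently shift which
string belongs to which board.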

Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 27573a7d905a49dc756fda9c0e148372136356e6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt: Add PCIe device IDs for bcm58802/bcm58808

Orabug: 27648355, 27648339

Add PCIe device ID for bcm58802 and bcm58808. Also add chip number
update to declare bcm588xx as chip class phase 4 and later.

Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4a58139b8493624c6c6223b58a9e70ebbdf56338)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: assign CPU affinity hints to bnxt_en IRQs

Orabug: 27648355, 27648339

This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 56f0fd80d1886479a42ac07ed239538eb145a669)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Improve tx ring reservation logic.

Orabug: 27648355, 27648339

When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC.  If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.

The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings.  We fix it as follows:

1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available.  There is a flag in
the firmware message to do that.  If not available, abort and the
current TX rings will not be disabled.

2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved.  If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.
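
The two steps can be modelled in user space as below. struct nic, the
simulated firmware limit, and the -ENOMEM error code are assumptions
for the sketch; the real firmware interface is not shown here.

```c
#include <errno.h>

struct nic {
	int fw_max_tx;		/* rings the simulated firmware can grant */
	int tx_reserved;	/* rings currently reserved with firmware */
	int tx_wanted;		/* rings the configuration asks for */
};

/* Step 1: check-only query; never touches the current reservation. */
static int fw_check_tx_rings(const struct nic *nic, int n)
{
	return n <= nic->fw_max_tx ? 0 : -ENOMEM;
}

static int nic_change_tx_rings(struct nic *nic, int n)
{
	int rc = fw_check_tx_rings(nic, n);

	if (rc)
		return rc;	/* abort; existing rings stay usable */
	nic->tx_wanted = n;
	return 0;
}

/* Step 2: reserve in the open path, only when the count changed. */
static int nic_open(struct nic *nic)
{
	if (nic->tx_wanted != nic->tx_reserved) {
		if (nic->tx_wanted > nic->fw_max_tx)
			return -ENOMEM;
		nic->tx_reserved = nic->tx_wanted;
	}
	return 0;
}
```

A rejected request leaves tx_reserved and tx_wanted untouched, which
is the property the fix is after.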

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 98fdbe73bfb809b1f8eec9f27a36e737caed3a44)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
drivers/net/ethernet/broadcom/bnxt/bnxt.h
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c

bnxt_en: Update firmware interface spec. to 1.8.1.4.

Orabug: 27648355, 27648339

Flow APIs are added in this firmware interface.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a17eb27bf7ece364627fcf16ad50c24b793300b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Do not setup MAC address in bnxt_hwrm_func_qcaps().

Orabug: 27648355, 27648339

bnxt_hwrm_func_qcaps() is called during probe to get all device
resources and it also sets up the factory MAC address. The same function
is called when SRIOV is disabled to reclaim all resources. If
the MAC address has been overridden by a user administered MAC
address, calling this function will overwrite it.

Separate the logic that sets up the default MAC address into a new
function bnxt_init_mac_addr() that is only called during probe time.

Fixes: 4a21b49b34c0 ("bnxt_en: Improve VF resource accounting.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a22a6ac2ff8080c87e446e20592725c064229c71)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Free MSIX vectors when unregistering the device from bnxt_re.

Orabug: 27648355, 27648339

Take back ownership of the MSIX vectors when unregistering the device
from bnxt_re.

Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 146ed3c5b87d8c65ec31bc56df26f027fe624b8f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Fix .ndo_setup_tc() to include XDP rings.

Orabug: 27648355, 27648339

When the number of TX rings is changed in bnxt_setup_tc(), we need to
include the XDP rings in the total TX ring count.

Fixes: 38413406277f ("bnxt_en: Add support for XDP_TX action.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 87e9b3778c94694c9e098c91a0cc05725f0e017f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt: fix unused variable warnings

Orabug: 27648355, 27648339

Fix a couple of warnings where the variable ‘txq’ is set but not used.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 351bac30613378c4684d4673aac0c7917980a652)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt: fix unsigned comparsion with 0

Orabug: 27648355, 27648339

Fixes a warning because location is u32 and can never be negative:
warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
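
A minimal illustration of the remaining bounds check. MAX_FILTERS and
location_is_valid() are hypothetical names; the point is only that an
unsigned value needs an upper-bound test and nothing else.

```c
#include <stdint.h>

#define MAX_FILTERS 1024u	/* hypothetical table size */

static int location_is_valid(uint32_t location)
{
	/*
	 * "location < 0" is always false for an unsigned type, so only
	 * the upper bound needs checking. A caller passing -1 ends up
	 * with a very large value that this test rejects.
	 */
	return location < MAX_FILTERS;
}
```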

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b721cfaf03bcaac0a3abf702c4240326eed9e4b1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

bnxt_en: Use SWITCHDEV_SET_OPS().

Orabug: 27648355, 27648339

Suggested by Jakub Kicinski.

Fixes: c124a62ff2dd ("bnxt_en: add support for port_attr_get and and get_phys_port_name")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc88055ab72c0eaa080926c888628b77d2055513)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c