]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
7 years agosched/fair: Initialize and rework throttle_count for new task-groups
Peter Zijlstra [Wed, 22 Jun 2016 13:14:26 +0000 (15:14 +0200)]
sched/fair: Initialize and rework throttle_count for new task-groups

This patch is a combination of the following three patches from mainline:

094f469172e0 sched/fair: Initialize throttle_count for new task-groups lazily

Cgroup created inside throttled group must inherit current throttle_count.
Broken throttle_count allows to nominate throttled entries as a next buddy,
later this leads to null pointer dereference in pick_next_task_fair().

This patch initialize cfs_rq->throttle_count at first enqueue: laziness
allows to skip locking all rq at group creation. Lazy approach also allows
to skip full sub-tree scan at throttling hierarchy (not in this patch).

8663e24d56dc sched/fair: Reorder cgroup creation code

A future patch needs rq->lock held _after_ we link the task_group into
the hierarchy. In order to avoid taking every rq->lock twice, reorder
things a little and create online_fair_sched_group() to be called
after we link the task_group.

All this code is still ran from css_alloc() so css_online() isn't in
fact used for this.

55e16d30bd99 sched/fair: Rework throttle_count sync

Since we already take rq->lock when creating a cgroup, use it to also
sync the throttle_count and avoid the extra state and enqueue path
branch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: linux-kernel@vger.kernel.org
[ Fixed build warning. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The patches have been combined because applying them separately will
cause a KABI breakage and introduce a dummy function.

Orabug: 27787518

Conflicts:
kernel/sched/fair.c
kernel/sched/core.c
kernel/sched/sched.h

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: Mridula Shastry <mridula.c.shastry@oracle.com>
7 years agoperf tools: Move syscall number fallbacks from perf-sys.h to tools/arch/x86/include...
Arnaldo Carvalho de Melo [Thu, 7 Jul 2016 21:28:43 +0000 (18:28 -0300)]
perf tools: Move syscall number fallbacks from perf-sys.h to tools/arch/x86/include/asm/

And remove the empty tools/arch/x86/include/asm/unistd_{32,64}.h files
introduced by eae7a755ee81 ("perf tools, x86: Build perf on older
user-space as well").

This way we get closer to mirroring the kernel for cases where __NR_
can't be found for some include path/_GNU_SOURCE/whatever scenario.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-kpj6m3mbjw82kg6krk2z529e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit cec07f53c398)

Orabug: 27240053
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agocrypto: FIPS - allow tests to be disabled in FIPS mode
Stephan Mueller [Thu, 25 Aug 2016 13:15:01 +0000 (15:15 +0200)]
crypto: FIPS - allow tests to be disabled in FIPS mode

In FIPS mode, additional restrictions may apply. If these restrictions
are violated, the kernel will panic(). This patch allows test vectors
for symmetric ciphers to be marked as to be skipped in FIPS mode.

Together with the patch, the XTS test vectors where the AES key is
identical to the tweak key is disabled in FIPS mode. This test vector
violates the FIPS requirement that both keys must be different.

Reported-by: Tapas Sarangi <TSarangi@trustwave.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 10faa8c0d6c3b22466f97713a9533824a2ea1c57)

Orabug: 27809271

Signed-off-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Conflicts:
crypto/testmgr.h

7 years agocrypto: xts - consolidate sanity check for keys
Stephan Mueller [Tue, 9 Feb 2016 14:37:47 +0000 (15:37 +0100)]
crypto: xts - consolidate sanity check for keys

The patch centralizes the XTS key check logic into the service function
xts_check_key which is invoked from the different XTS implementations.
With this, the XTS implementations in ARM, ARM64, PPC and S390 have now
a sanity check for the XTS keys similar to the other arches.

In addition, this service function received a check to ensure that the
key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the
check is not present in the standards defining XTS, it is only enforced
in FIPS mode of the kernel.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 28856a9e52c7cac712af6c143de04766617535dc)

Orabug: 27809271

Signed-off-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
7 years agocrypto: rng - Zero seed in crypto_rng_reset
Herbert Xu [Tue, 21 Apr 2015 02:46:49 +0000 (10:46 +0800)]
crypto: rng - Zero seed in crypto_rng_reset

If we allocate a seed on behalf ot the user in crypto_rng_reset,
we must ensure that it is zeroed afterwards or the RNG may be
compromised.

Reported-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b617b702da4e922277806f81c411d3051107d462)

Orabug: 27809271

Signed-off-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
7 years agoenic: set IG desc cache flag in open
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:24 +0000 (11:07 -0800)]
enic: set IG desc cache flag in open

New adapter needs CMD_OPENF_IG_DESCCACHE flag to be set. If this flag is
not set, fw flushes the global IG desc cache. This flag is nop in older
adapter.

Also increment driver version

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5de0c022f1b0bce073cb04dd69ed7982805e5763)
Orabug: 27587345
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoDrivers: hv: utils: fix crash when device is removed from host side
Vitaly Kuznetsov [Tue, 15 Dec 2015 03:01:56 +0000 (19:01 -0800)]
Drivers: hv: utils: fix crash when device is removed from host side

The crash is observed when a service is being disabled host side while
userspace daemon is connected to the device:

[   90.244859] general protection fault: 0000 [#1] SMP
...
[   90.800082] Call Trace:
[   90.800082]  [<ffffffff81187008>] __fput+0xc8/0x1f0
[   90.800082]  [<ffffffff8118716e>] ____fput+0xe/0x10
...
[   90.800082]  [<ffffffff81015278>] do_signal+0x28/0x580
[   90.800082]  [<ffffffff81086656>] ? finish_task_switch+0xa6/0x180
[   90.800082]  [<ffffffff81443ebf>] ? __schedule+0x28f/0x870
[   90.800082]  [<ffffffffa01ebbaa>] ? hvt_op_read+0x12a/0x140 [hv_utils]
...

The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit 9420098adc50a88d4a441e0f92d54bfa7af44448)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
7 years agoDrivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode
Vitaly Kuznetsov [Tue, 15 Dec 2015 03:01:55 +0000 (19:01 -0800)]
Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode

When Hyper-V host asks us to remove some util driver by closing the
appropriate channel there is no easy way to force the current file
descriptor holder to hang up but we can start to respond -EBADF to all
operations asking it to exit gracefully.

As we're setting hvt->mode from two separate contexts now we need to use
a proper locking.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit a15025660d4703a8b37290a14734cb4a84875770)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
7 years agoDrivers: hv: utils: rename outmsg_lock
Vitaly Kuznetsov [Tue, 15 Dec 2015 03:01:54 +0000 (19:01 -0800)]
Drivers: hv: utils: rename outmsg_lock

As a preparation to reusing outmsg_lock to protect test-and-set openrations
on 'mode' rename it the more general 'lock'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit a72f3a4ccff22de879a1f599210ecdd9bd483a43)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
7 years agoDrivers: hv: utils: fix memory leak on on_msg() failure
Vitaly Kuznetsov [Tue, 15 Dec 2015 03:01:53 +0000 (19:01 -0800)]
Drivers: hv: utils: fix memory leak on on_msg() failure

inmsg should be freed in case of on_msg() failure to avoid memory leak.
Preserve the error code from on_msg().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit 1f75338b6fece2bbd42ac3623830c65e2df6e031)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
7 years agoDrivers: hv: utils: use memdup_user in hvt_op_write
Olaf Hering [Tue, 15 Dec 2015 00:01:37 +0000 (16:01 -0800)]
Drivers: hv: utils: use memdup_user in hvt_op_write

Use memdup_user to handle OOM.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit b00359642c2427da89dc8f77daa2c9e8a84e6d76)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
7 years agohv: util: checking the wrong variable
Dan Carpenter [Sat, 1 Aug 2015 23:08:17 +0000 (16:08 -0700)]
hv: util: checking the wrong variable

We don't catch this allocation failure because there is a typo and we
check the wrong variable.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit 9dd6a06430c94299651d74b9ed5ca8396ab8ff1f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
7 years agonet/rds: Avoid copy overhead if send buff is full
Gerd Rausch [Tue, 13 Mar 2018 00:04:52 +0000 (17:04 -0700)]
net/rds: Avoid copy overhead if send buff is full

Avoid copying the message from user-space if we already
know there's not enough space in the send buffer.

Change-Id: I5ddde7d41bbaeaf25f398c721d02babd3893b73f
Orabug: 27747165
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
7 years agoext4: fix ->put_link panic
Junxiao Bi [Thu, 1 Mar 2018 23:33:12 +0000 (15:33 -0800)]
ext4: fix ->put_link panic

Orabug: 27498770

The following panic was caught. Something wrong with the storage and io error
was returned, generic_readlink()->ext4_follow_link()->page_follow_link_light()
returned with NULL page and error link, then ext4_put_link() tried to free the
error link and panic.

[25144440.198756] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
[25144440.332969] Aborting journal on device dm-7-8.
[25144440.338462] Buffer I/O error on dev dm-7, logical block 3702784, lost sync page write
[25144440.342625] Buffer I/O error on dev dm-7, logical block 0, lost sync page write
[25144440.342627] EXT4-fs error (device dm-7): ext4_journal_check_start:56: Detected aborted journal
[25144440.342629] EXT4-fs (dm-7): Remounting filesystem read-only
[25144440.342630] EXT4-fs (dm-7): previous I/O error to superblock detected
[25144440.342634] Buffer I/O error on dev dm-7, logical block 0, lost sync page write
[25144440.390336] JBD2: Error -5 detected when updating journal superblock for dm-7-8.
[25144464.799979] Buffer I/O error on dev dm-7, logical block 1573499, lost async page write
[25144464.809517] Buffer I/O error on dev dm-7, logical block 1573501, lost async page write
[25144464.819048] Buffer I/O error on dev dm-7, logical block 1573659, lost async page write
[25144464.828669] Buffer I/O error on dev dm-7, logical block 1573660, lost async page write
[25144464.838207] Buffer I/O error on dev dm-7, logical block 1573662, lost async page write
[25144464.847798] Buffer I/O error on dev dm-7, logical block 1573675, lost async page write
[25144464.857326] Buffer I/O error on dev dm-7, logical block 1573677, lost async page write
[25144464.866848] Buffer I/O error on dev dm-7, logical block 1573696, lost async page write
[25144464.876383] Buffer I/O error on dev dm-7, logical block 1573698, lost async page write
[25144464.885903] Buffer I/O error on dev dm-7, logical block 1573703, lost async page write
[25144496.335355] ------------[ cut here ]------------
[25144496.341039] kernel BUG at mm/slub.c:3413!
[25144496.345997] invalid opcode: 0000 [#1] SMP
[25144496.351074] Modules linked in: dm_snapshot dm_bufio nfnetlink_queue nfnetlink_log nfnetlink iptable_filter ip_tables oracleacfs(PO) oracleadvm(PO) oracleoks(PO) mpt3sas scsi_transport_sas raid_class nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace ipmi_poweroff ipmi_devintf bonding rds_rdma rds ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad mlx4_ib mlx4_core dm_multipath bnx2i cnic uio cxgb4i libcxgbi cxgb4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 fuse iTCO_wdt iTCO_vendor_support sb_edac edac_core i2c_i801 lpc_ich mfd_core sg ixgbe dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio ipmi_ssif i2c_core ipmi_si ipmi_msghandler wmi acpi_pad ext4 jbd2 mbcache sd_mod ahci libahci megaraid_sas dm_mirror
[25144496.433281]  dm_region_hash dm_log dm_mod
[25144496.436514] CPU: 8 PID: 201734 Comm: tar Tainted: P           O    4.1.12-61.28.1.el6uek.x86_64 #2
[25144496.447204] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38070000 12/16/2016
[25144496.458974] task: ffff888ce26d0e00 ti: ffff888da1c50000 task.ti: ffff888da1c50000
[25144496.467999] RIP: 0010:[<ffffffff811e5de9>]  [<ffffffff811e5de9>] kfree+0x159/0x170
[25144496.477238] RSP: 0018:ffff888da1c53da8  EFLAGS: 00010246
[25144496.483734] RAX: 001fffff80000400 RBX: fffffffffffffffb RCX: ffffffffa00ee860
[25144496.492480] RDX: 0000000000000000 RSI: fffffffffffffffb RDI: ffffea0001ffffc0
[25144496.501116] RBP: ffff888da1c53dc8 R08: 000000000001abc0 R09: ffff885efec07480
[25144496.509753] R10: ffffffff8118b1b7 R11: 0000000000000000 R12: 0000000000000000
[25144496.518425] R13: ffffffffa00ee890 R14: 00000000fffffffb R15: 000000000000005f
[25144496.527061] FS:  00007f949c3957a0(0000) GS:ffff885eff400000(0000) knlGS:0000000000000000
[25144496.536769] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25144496.543674] CR2: 00007f47cdbd8d20 CR3: 0000006093b7b000 CR4: 00000000003406e0
[25144496.552333] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[25144496.560961] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[25144496.569630] Stack:
[25144496.572452]  ffff888da1c53de8 ffff8801e33e7bc0 0000000000000000 ffff888da1c53de8
[25144496.581455]  ffff888da1c53dd8 ffffffffa00ee890 ffff888da1c53eb8 ffffffff8120f7ec
[25144496.590422]  ffff888da1c53eb8 ffffffff812144c3 ffff88bec2658320 ffff8801e33e7bc0
[25144496.599391] Call Trace:
[25144496.602619]  [<ffffffffa00ee890>] ext4_put_link+0x30/0x40 [ext4]
[25144496.609819]  [<ffffffff8120f7ec>] generic_readlink+0x8c/0xb0
[25144496.616627]  [<ffffffff812144c3>] ? user_path_at_empty+0x63/0xa0
[25144496.623816]  [<ffffffff8120a496>] SyS_readlinkat+0x116/0x130
[25144496.630629]  [<ffffffff8120a0eb>] SyS_readlink+0x1b/0x20
[25144496.637065]  [<ffffffff8169d76e>] system_call_fastpath+0x12/0x71
[25144496.644278] Code: ff ff eb 8e 66 f7 07 00 c0 74 20 48 8b 07 31 f6 f6 c4 40 74 03 8b 77 68 e8 f5 d7 fa ff e9 70 ff ff ff 48 8b 7f 30 e9 19 ff ff ff <0f> 0b 0f 1f 44 00 00 eb f9 66 66 66 66 66 2e 0f 1f 84 00 00 00
[25144496.667149] RIP  [<ffffffff811e5de9>] kfree+0x159/0x170
[25144496.673496]  RSP <ffff888da1c53da8>

Mainline/uek5 not have this issue, as the ->following_link and ->put_link have
been refactored there. The patche set to do that is a little big, so I don't
bother to backport them, just write this small patch to fix the issue.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
7 years agoKVM/VMX: Clear spec_ctrl status when resetting vcpu
Patrick Colp [Wed, 28 Mar 2018 01:30:49 +0000 (18:30 -0700)]
KVM/VMX: Clear spec_ctrl status when resetting vcpu

vmx->spec_ctrl was not set to 0 in vmx_vcpu_reset, which could result in
IBRS getting stuck on all the time, even with 'spectre_v2=off' set. This
was most notable when rebooting from an older kernel into a newer
retpoline-enabled kernel resulted in up to 80% CPU performance drop.

OraBug: 27774415

Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Patrick Colp <patrick.colp@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agomlx4: change the ICM table allocations to lowest needed size
Daniel Jurgens [Tue, 6 Mar 2018 21:49:08 +0000 (13:49 -0800)]
mlx4: change the ICM table allocations to lowest needed size

The driver currently allocates 256KB contig memort for ICM
tables which puts pressure on memory management to allocate such
large contig page in fragmented memory system. Such allocation
itself contributes to memory fragmentation and at times user
process stalls for 10's of seconds leading to slow path
dumps with mm lock contention.

This change makes the driver allocate lowest page size
needed for ICM allocation(8K), which fixes these stalls.

With 4K chunk sizes the QP table size is 4MB, which cannot be allocated
by kmalloc. A larger design change would be neccesary to break up the
table. With 8k chunks the same table is 2MB, which can be allocated by
kmalloc. This large table allocation only happens once at driver load
time.

Orabug: 27718303

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
7 years agoRevert "Drivers: hv: utils: fix a race on userspace daemons registration"
Jack Vogel [Mon, 26 Mar 2018 03:55:27 +0000 (20:55 -0700)]
Revert "Drivers: hv: utils: fix a race on userspace daemons registration"

This reverts commit 3c908e953ef2771f683dae4ef4c5c1f4dbdfe27d. This change has
caused an fcopy failure and turned out to be unnecessary for the original bug.

Orabug: 27673755
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>
7 years agocrypto: af_alg - Avoid sock_graft call warning
Herbert Xu [Mon, 10 Jul 2017 14:00:48 +0000 (22:00 +0800)]
crypto: af_alg - Avoid sock_graft call warning

The newly added sock_graft warning triggers in af_alg_accept.
It's harmless as we're essentially doing sock->sk = sock->sk.

The sock_graft call is actually redundant because all the work
it does is subsumed by sock_init_data.  However, it was added
to placate SELinux as it uses it to initialise its internal state.

This patch avoisd the warning by making the SELinux call directly.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2acce6aa9f6569d4e135b2c4cfb56acce95efaeb)

Orabug: 26895616,27426147

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agoiscsi-target: Fix initial login PDU asynchronous socket close OOPs
Nicholas Bellinger [Thu, 25 May 2017 04:47:09 +0000 (21:47 -0700)]
iscsi-target: Fix initial login PDU asynchronous socket close OOPs

This patch fixes a OOPs originally introduced by:

   commit bb048357dad6d604520c91586334c9c230366a14
   Author: Nicholas Bellinger <nab@linux-iscsi.org>
   Date:   Thu Sep 5 14:54:04 2013 -0700

   iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed.  For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 25cdda95fda78d22d44157da15aa7ea34be3c804)

Orabug: 27701211

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agotarget/iscsi: Fix indentation in iscsi_target_start_negotiation()
Bart Van Assche [Fri, 23 Dec 2016 11:45:27 +0000 (12:45 +0100)]
target/iscsi: Fix indentation in iscsi_target_start_negotiation()

This patch avoids that smatch complains about inconsistent
indentation in iscsi_target_start_negotiation().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 1efaa949396b5d9e8d1e6edef7e97e9ce1a97319)

Orabug: 27701211

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agoiscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
Nicholas Bellinger [Sun, 28 Feb 2016 02:15:46 +0000 (18:15 -0800)]
iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race

There is a iscsi-target/tcp login race in LOGIN_FLAGS_READY
state assignment that can result in frequent errors during
iscsi discovery:

      "iSCSI Login negotiation failed."

To address this bug, move the initial LOGIN_FLAGS_READY
assignment ahead of iscsi_target_do_login() when handling
the initial iscsi_target_start_negotiation() request PDU
during connection login.

As iscsi_target_do_login_rx() work_struct callback is
clearing LOGIN_FLAGS_READ_ACTIVE after subsequent calls
to iscsi_target_do_login(), the early sk_data_ready
ahead of the first iscsi_target_do_login() expects
LOGIN_FLAGS_READY to also be set for the initial
login request PDU.

As reported by Maged, this was first obsered using an
MSFT initiator running across multiple VMWare host
virtual machines with iscsi-target/tcp.

Reported-by: Maged Mokhtar <mmokhtar@binarykinetics.com>
Tested-by: Maged Mokhtar <mmokhtar@binarykinetics.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 8f0dfb3d8b1120c61f6e2cc3729290db10772b2d)

Orabug: 27701211

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agoiscsi-target: Fix rx_login_comp hang after login failure
Nicholas Bellinger [Thu, 5 Nov 2015 22:11:59 +0000 (14:11 -0800)]
iscsi-target: Fix rx_login_comp hang after login failure

This patch addresses a case where iscsi_target_do_tx_login_io()
fails sending the last login response PDU, after the RX/TX
threads have already been started.

The case centers around iscsi_target_rx_thread() not invoking
allow_signal(SIGINT) before the send_sig(SIGINT, ...) occurs
from the failure path, resulting in RX thread hanging
indefinately on iscsi_conn->rx_login_comp.

Note this bug is a regression introduced by:

  commit e54198657b65625085834847ab6271087323ffea
  Author: Nicholas Bellinger <nab@linux-iscsi.org>
  Date:   Wed Jul 22 23:14:19 2015 -0700

      iscsi-target: Fix iscsit_start_kthreads failure OOPs

To address this bug, complete ->rx_login_complete for good
measure in the failure path, and immediately return from
RX thread context if connection state did not actually reach
full feature phase (TARG_CONN_STATE_LOGGED_IN).

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit ca82c2bded29b38d36140bfa1e76a7bbfcade390)

Orabug: 27701211

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agoKVM: x86: fix singlestepping over syscall
Paolo Bonzini [Wed, 7 Jun 2017 13:13:14 +0000 (15:13 +0200)]
KVM: x86: fix singlestepping over syscall

TF is handled a bit differently for syscall and sysret, compared
to the other instructions: TF is checked after the instruction completes,
so that the OS can disable #DB at a syscall by adding TF to FMASK.
When the sysret is executed the #DB is taken "as if" the syscall insn
just completed.

KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
Fix the behavior, otherwise you could get #DB on a user stack which is not
nice.  This does not affect Linux guests, as they use an IST or task gate
for #DB.

This fixes CVE-2017-7518.

Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit c8401dda2f0a00cd25c0af6a95ed50e478d25de4)

Orabug: 27669904
CVE: CVE-2017-7518

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
 Conflicts:
arch/x86/kvm/x86.c
skipped changes for kvm_skip_emulated_instruction()

7 years agonfs: system crashes after NFS4ERR_MOVED recovery
Bill.Baker@oracle.com [Wed, 21 Feb 2018 18:46:43 +0000 (12:46 -0600)]
nfs: system crashes after NFS4ERR_MOVED recovery

nfs4_update_server unconditionally releases the nfs_client for the
source server. If migration fails, this can cause the source server's
nfs_client struct to be left with a low reference count, resulting in
use-after-free.  Also, adjust reference count handling for ELOOP.

NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6
WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]()
nfs_put_client+0xfa/0x110 [nfs]
nfs4_run_state_manager+0x30/0x40 [nfsv4]
kthread+0xd8/0xf0

BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
nfs4_xdr_enc_write+0x6b/0x160 [nfsv4]
rpcauth_wrap_req+0xac/0xf0 [sunrpc]
call_transmit+0x18c/0x2c0 [sunrpc]
__rpc_execute+0xa6/0x490 [sunrpc]
rpc_async_schedule+0x15/0x20 [sunrpc]
process_one_work+0x160/0x470
worker_thread+0x112/0x540
? rescuer_thread+0x3f0/0x3f0
kthread+0xd8/0xf0

This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"),
but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops")

Reported-by: Helen Chao <helen.chao@oracle.com>
Fixes: 52442f9b11b7 ("NFS4: Avoid migration loops")
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 27679350
(cherry picked from commit ad86f605c59500da82d196ac312cfbac3daba31d)
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Manjunath Patil <manjunath.b.patil@oracle.com>
7 years agoNFS: Clean up nfs4_set_client()
Anna Schumaker [Fri, 7 Apr 2017 18:15:16 +0000 (14:15 -0400)]
NFS: Clean up nfs4_set_client()

If we cut out the dprintk()s, then we can return error codes directly
and cut out the goto.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 27679350
(cherry picked from commit 2dc42c0d60e0104f7cd8beee3871f953565392ff)
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Manjunath Patil <manjunath.b.patil@oracle.com>
7 years agoNFS4: Avoid migration loops
Benjamin Coddington [Tue, 30 Aug 2016 13:20:32 +0000 (09:20 -0400)]
NFS4: Avoid migration loops

If a server returns itself as a location while migrating, the client may
end up getting stuck attempting to migrate twice to the same server.  Catch
this by checking if the nfs_client found is the same as the existing
client.  For the other two callers to nfs4_set_client, the nfs_client will
always be ERR_PTR(-EINVAL).

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 27679350
(cherry picked from commit 52442f9b11b7e5d4a38d99143011831fd171f8d9)
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Manjunath Patil <manjunath.b.patil@oracle.com>
7 years agomstflint: update Makefile and Kconfig
Qing Huang [Thu, 8 Mar 2018 23:29:02 +0000 (15:29 -0800)]
mstflint: update Makefile and Kconfig

1, fix a typo in Makefile
2, update dependecy setting
3, change build option to y (built-in) to
   provide easy access in Exadata Secure Boot env

Orabug: 27707445

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Victor Erminpour <victor.erminpour@oracle.com>
7 years agotarget: add inquiry_product module param to override LIO default
Kyle Fortin [Tue, 13 Mar 2018 17:57:50 +0000 (13:57 -0400)]
target: add inquiry_product module param to override LIO default

Orabug: 27679431

OL6 iscsi target used IET which presented VIRTUAL-DISK for inquiry product.
OL7 uses the LIO iscsi target instead, which presented LIO iblock name.

Exadata targets upgrading from OL6 to OL7 need to present the same
product ID to existing iscsi initiator multipath mappings.

Add target_core_mod parameter inquiry_product for target inquiry vendor
string override.  It defaults to LIO iblock name.

The user will also need to do one of the following in targetcli:
set global export_backstore_name_as_model=false
or for each backstore:
/backstores/<type>/<name> set attribute emulate_model_alias=0

(cherry picked from commit faf91b95fd22dbf0a1a7fd5b18ab71a929385927)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agotarget: add inquiry_vendor module param to override LIO-ORG
Kyle Fortin [Tue, 13 Mar 2018 17:44:33 +0000 (13:44 -0400)]
target: add inquiry_vendor module param to override LIO-ORG

Orabug: 27679431

OL6 iscsi target used IET which presented IET for the inquiry vendor.
OL7 uses LIO iscsi target instead, which presents 'LIO-ORG'.

Exadata targets upgrading from OL6 to OL7 need to present the same
vendor ID to existing iscsi initiator multipath mappings.

Add target_core_mod parameter inquiry_vendor for target inquiry vendor
string override.  It defaults to original "LIO-ORG " if not set.

(cherry picked from commit 87e9da042f42aa5bef1a76c4ef84989680e1df2e)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agoIB/core: Avoid calling ib_query_device
Or Gerlitz [Fri, 18 Dec 2015 08:59:45 +0000 (10:59 +0200)]
IB/core: Avoid calling ib_query_device

Use the cached copy of the attributes present on the device, except for
the case of a query originating from user-space, where we have to invoke
the driver query_device entry, so they can fill in their udata.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Orabug: 27687711
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
(cherry-picked from upstream 86bee4c9c126b4f73e3f152cd43c806cac9135ad)
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Conflicts:
  drivers/infiniband/core/uverbs_cmd.c
    o Function "ib_uverbs_get_context":
      In UEK it's "ibdev", taken from "file->device->ib_dev"

      Upstream it's "ib_dev", introduced by:
        commit 057aec0d23f750b27f0bb92d2606871f60417e0a
        Author: Yishai Hadas <yishaih@mellanox.com>
        Date:   Thu Aug 13 18:32:04 2015 +0300

          IB/uverbs: Explicitly pass ib_dev to uverbs commands

      The earliest linux-4.x.y it showed up in was x==3, or rather "linux-4.3.y".

    o Function "ib_uverbs_query_device":
      Just like before, UEK does not have "ib_dev", so using
      "file->device->ib_dev" instead.
      This follows precedence of the deleted code, where "ib_query_device"
      was called with "file->device->ib_dev" in UEK.

  drivers/infiniband/core/verbs.c
    The handling of "IB_DEVICE_LOCAL_DMA_LKEY" inside "ib_alloc_pd" was introduced with:
      commit 96249d70dd70496084c7ec1465ec449cd032955a
      Author: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Date:   Wed Aug 5 14:14:45 2015 -0600

        IB/core: Guarantee that a local_dma_lkey is available

     The earliest linux-4.x.y it showed up in was x==3, or rather "linux-4.3.y".

     Since UEK is/was based on 4.1, the corresponding code does not exist here.

7 years agoIB/core: Save the device attributes on the device structure
Ira Weiny [Fri, 18 Dec 2015 08:59:44 +0000 (10:59 +0200)]
IB/core: Save the device attributes on the device structure

This way both the IB core and upper level drivers can access these cached
device attributes rather than querying or caching them on their own.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Orabug: 27687711
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
(cherry-picked from upstream 3e153a93a1c12e3354dd38cca414fb51a15136a2)
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
7 years agonvme: fix uninitialized prp2 value on small transfers
Jan H. Schönherr [Sun, 27 Aug 2017 13:56:37 +0000 (15:56 +0200)]
nvme: fix uninitialized prp2 value on small transfers

The value of iod->first_dma ends up as prp2 in NVMe commands. In case
there is not enough data to cross a page boundary, iod->first_dma is
never initialized and contains random data.

Comply with the NVMe specification and fill in 0 in that case.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
(cherry picked from commit 5228b3280b9bb8fa6aef59f891cca64a028e9b36)

Orabug: 27624149

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
7 years agobnxt_en: initialize bnxt_pf_wq
Brian Maly [Wed, 14 Mar 2018 19:54:16 +0000 (15:54 -0400)]
bnxt_en: initialize bnxt_pf_wq

Orabug: 27674029

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
7 years agox86/spectre_v2: Fix cpu offlining with IPBP.
Konrad Rzeszutek Wilk [Mon, 12 Mar 2018 15:29:19 +0000 (11:29 -0400)]
x86/spectre_v2: Fix cpu offlining with IPBP.

We didn't check if tsk->mm is available when an CPU goes down - and
of course - that is exactly when there is no task.

As such we would crash.

OraBug: 27678629
Reveiwed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agoretpoline: selectively disable IBRS in disable_ibrs_and_friends()
Chuck Anderson [Wed, 7 Mar 2018 05:29:14 +0000 (21:29 -0800)]
retpoline: selectively disable IBRS in disable_ibrs_and_friends()

disable_ibrs_and_friends() is called:
(1) when the boot parameter "spectre_v2=off" is specified.
(2) the CPU is not affected by Spectre V2 and:
- spectre_v2=off
- or spectre_v2=auto
- or the spectre_v2 is not specified
(3) retpoline is selected as the Spectre V2 mitigation.

For (1) and (2) IBRS should be disabled (SPEC_CTRL_IBRS_ADMIN_DISABLED
is set).  This prevents setting IBRS in use even if it is the only
Spectre V2 mitigation available.

For (3) IBRS should be set not-in-use but remain enabled in case
it is selected by disable_repoline() as the fall back Spectre V2
mitigation.

Orabug: 27665263

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agobnxt_en: Add cache line size setting to optimize performance.
Michael Chan [Wed, 17 Jan 2018 08:21:15 +0000 (03:21 -0500)]
bnxt_en: Add cache line size setting to optimize performance.

Orabug: 2764835527648339

The chip supports 64-byte and 128-byte cache line size for more optimal
DMA performance when matched to the CPU cache line size.  The default is 64.
If the system is using 128-byte cache line size, set it to 128.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c3480a603773cfc5d8aa44dbbee6c96e0f9d4d9d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

7 years agobnxt_en: Forward VF MAC address to the PF.
Vasundhara Volam [Wed, 17 Jan 2018 08:21:14 +0000 (03:21 -0500)]
bnxt_en: Forward VF MAC address to the PF.

Orabug: 2764835527648339

Forward hwrm_func_vf_cfg command from VF to PF driver, to store
VF MAC address in PF's context.  This will allow "ip link show"
to display all VF MAC addresses.

Maintain 2 locations of MAC address in VF info structure, one for
a PF assigned MAC and one for VF assigned MAC.

Display VF assigned MAC in "ip link show", only if PF assigned MAC is
not valid.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 91cdda40714178497cbd182261b2ea6ec5cb9276)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Add BCM5745X NPAR device IDs
Vasundhara Volam [Wed, 17 Jan 2018 08:21:13 +0000 (03:21 -0500)]
bnxt_en: Add BCM5745X NPAR device IDs

Orabug: 2764835527648339

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 92abef361bd233ea2a99db9e9a637626f523f82e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Expand bnxt_check_rings() to check all resources.
Michael Chan [Wed, 17 Jan 2018 08:21:12 +0000 (03:21 -0500)]
bnxt_en: Expand bnxt_check_rings() to check all resources.

Orabug: 2764835527648339

bnxt_check_rings() is called by ethtool, XDP setup, and ndo_setup_tc()
to see if there are enough resources to support the new configuration.
Expand the call to test all resources if the firmware supports the new
API.  With the more flexible resource allocation scheme, this call must
be made to check that all resources are available before committing to
allocate the resources.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8f23d638b36b4ff0fe5785cf01f9bdc41afb9c06)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

7 years agobnxt_en: Implement new method for the PF to assign SRIOV resources.
Michael Chan [Wed, 17 Jan 2018 08:21:11 +0000 (03:21 -0500)]
bnxt_en: Implement new method for the PF to assign SRIOV resources.

Orabug: 2764835527648339

Instead of the old method of evenly dividing the resources to the VFs,
use the new firmware API to specify min and max resources for each VF.
This way, there is more flexibility for each VF to allocate more or less
resources.

The min is the absolute minimum for each VF to function.  The max is the
global resources minus the resources used by the PF.  Each VF is
guaranteed the min.  Up to max resources may be available for some VFs.

The PF driver can use one of 2 strategies specified in NVRAM to assign
the resources.  The old legacy strategy of evenly dividing the resources
or the new flexible strategy.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4673d66468b80dc37abd1159a4bd038128173d48)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Reserve resources for RFS.
Michael Chan [Wed, 17 Jan 2018 08:21:10 +0000 (03:21 -0500)]
bnxt_en: Reserve resources for RFS.

Orabug: 2764835527648339

In bnxt_rfs_capable(), add call to reserve vnic resources to support
NTUPLE.  Return true if we can successfully reserve enough vnics.
Otherwise, reserve the minimum 1 VNIC for normal operations not
supporting NTUPLE and return false.

Also, suppress warning message about not enough resources for NTUPLE when
only 1 RX ring is in use.  NTUPLE filters by definition require multiple
RX rings.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a1eef5b9079742ecfad647892669bd5fe6b0e3f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Implement new method to reserve rings.
Michael Chan [Wed, 17 Jan 2018 08:21:09 +0000 (03:21 -0500)]
bnxt_en: Implement new method to reserve rings.

Orabug: 2764835527648339

The new method will call firmware to reserve the desired tx, rx, cmpl
rings, ring groups, stats context, and vnic resources.  A second query
call will check the actual resources that firmware is able to reserve.
The driver will then trim and adjust based on the actual resources
provided by firmware.  The driver will then reserve the final resources
in use.

This method is a more flexible way of using hardware resources.  The
resources are not fixed and can by adjusted by firmware.  The driver
adapts to the available resources that the firmware can reserve for
the driver.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 674f50a5b026151f4109992cb594d89f5334adde)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Set initial default RX and TX ring numbers the same in combined mode.
Michael Chan [Wed, 17 Jan 2018 08:21:08 +0000 (03:21 -0500)]
bnxt_en: Set initial default RX and TX ring numbers the same in combined mode.

Orabug: 2764835527648339

In combined mode, the driver is currently not setting RX and TX ring
numbers the same when firmware can allocate more RX than TX or vice versa.
This will confuse the user as the ethtool convention assumes they are the
same in combined mode.  Fix it by adding bnxt_trim_dflt_sh_rings() to trim
RX and TX ring numbers to be the same as the completion ring number in
combined mode.

Note that if TCs are enabled and/or XDP is enabled, the number of TX rings
will not be the same as RX rings in combined mode.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 58ea801ac4c166cdcaa399ce7f9b3e9095ff2842)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Add the new firmware API to query hardware resources.
Michael Chan [Wed, 17 Jan 2018 08:21:07 +0000 (03:21 -0500)]
bnxt_en: Add the new firmware API to query hardware resources.

Orabug: 2764835527648339

The new API HWRM_FUNC_RESOURCE_QCAPS provides min and max hardware
resources.  Use the new API when it is supported by firmware.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit be0dd9c4100c9549fe50258e3d928072e6c31590)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h

7 years agobnxt_en: Refactor hardware resource data structures.
Michael Chan [Wed, 17 Jan 2018 08:21:06 +0000 (03:21 -0500)]
bnxt_en: Refactor hardware resource data structures.

Orabug: 2764835527648339

In preparation for new firmware APIs to allocate hardware resources,
add a new struct bnxt_hw_resc to hold various min, max and reserved
resources.  This new structure is common for PFs and VFs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a4f29470569c5a158c1871a2f752ca22e433420)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Restore MSIX after disabling SRIOV.
Michael Chan [Wed, 17 Jan 2018 08:21:05 +0000 (03:21 -0500)]
bnxt_en: Restore MSIX after disabling SRIOV.

Orabug: 2764835527648339

After SRIOV has been enabled and disabled, the MSIX vectors assigned to
the VFs have to be re-initialized.  Otherwise they cannot be re-used by
the PF.  For example, increasing the number of PF rings after disabling
SRIOV may fail if the PF uses MSIX vectors previously assigned to the VFs.

To fix this, we add logic in bnxt_restore_pf_fw_resources() to close the
NIC, clear and re-init MSIX, and re-open the NIC.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 80fcaf46c09262a71f32bb577c976814c922f864)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h

7 years agobnxt_en: Refactor bnxt_close_nic().
Michael Chan [Wed, 17 Jan 2018 08:21:04 +0000 (03:21 -0500)]
bnxt_en: Refactor bnxt_close_nic().

Orabug: 2764835527648339

Add a new __bnxt_close_nic() function to do all the work previously done
in bnxt_close_nic() except waiting for SRIOV configuration.  The new
function will be used in the next patch as part of SRIOV cleanup.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 86e953db0114f396f916344395160aa267bf2627)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

7 years agobnxt_en: Update firmware interface to 1.9.0.
Michael Chan [Wed, 17 Jan 2018 08:21:03 +0000 (03:21 -0500)]
bnxt_en: Update firmware interface to 1.9.0.

Orabug: 2764835527648339

The version has new firmware APIs to allocate PF/VF resources more
flexibly.

New toolchains were used to generate this file, resulting in a one-time
large diffstat.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 894aa69a90932907f3de9d849ab9970884151d0e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
Venkat Duvvuru [Thu, 4 Jan 2018 23:46:55 +0000 (18:46 -0500)]
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.

Orabug: 2764835527648339

In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 78f300049335ae81a5cc6b4b232481dc5e1f9d41)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix sources of spurious netpoll warnings
Calvin Owens [Fri, 8 Dec 2017 17:05:26 +0000 (09:05 -0800)]
bnxt_en: Fix sources of spurious netpoll warnings

Orabug: 2764835527648339

After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and
903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
  bnxt_poll+0x0/0xd0 exceeded budget in poll
  <snip>
  Call Trace:
   [<ffffffff814be5cd>] dump_stack+0x4d/0x70
   [<ffffffff8107e013>] __warn+0xd3/0xf0
   [<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
   [<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
   [<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
   [<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
   [<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
   [<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
   [<ffffffff810c9549>] console_unlock+0x339/0x5b0
   [<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
   [<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
   [<ffffffff81173df5>] printk+0x48/0x50
   [<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
   [<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
   [<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
   [<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
   [<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
   [<ffffffff81095f8b>] process_one_work+0x19b/0x480
   [<ffffffff810967ca>] worker_thread+0x6a/0x520
   [<ffffffff8109c7c4>] kthread+0xe4/0x100
   [<ffffffff81884c52>] ret_from_fork+0x22/0x40

This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2edbdb3159d6f6bd3a9b6e7f789f2b879699a519)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Don't print "Link speed -1 no longer supported" messages.
Michael Chan [Wed, 6 Dec 2017 22:31:22 +0000 (17:31 -0500)]
bnxt_en: Don't print "Link speed -1 no longer supported" messages.

Orabug: 2764835527648339

On some dual port NICs, the 2 ports have to be configured with compatible
link speeds.  Under some conditions, a port's configured speed may no
longer be supported.  The firmware will send a message to the driver
when this happens.

Improve this logic that prints out the warning by only printing it if
we can determine the link speed that is no longer supported.  If the
speed is unknown or it is in autoneg mode, skip the warning message.

Reported-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a8168b6cee6e9334dfebb4b9108e8d73794f6088)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
Vasundhara Volam [Fri, 1 Dec 2017 08:13:05 +0000 (03:13 -0500)]
bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()

Orabug: 2764835527648339

short_input variable is assigned to another data pointer which is
referred out of its scope. Fix it by moving short_input definition
to the beginning of bnxt_hwrm_do_send_msg() function.

No failure has been reported so far due to this issue.

Fixes: e605db801bde ("bnxt_en: Support for Short Firmware Message")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ebd5818cc5d4847897d7fe872e2d9799d7b7fcbb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
Ray Jui [Fri, 1 Dec 2017 08:13:02 +0000 (03:13 -0500)]
bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown

Orabug: 2764835527648339

The current 'bnxt_shutdown' implementation only invokes
'bnxt_ulp_shutdown' to shut down RoCE in the case when the system is in
the path of power off (SYSTEM_POWER_OFF). While this may work in most
cases, it does not work in the smart NIC case, when Linux 'reboot'
command is initiated from the Linux that runs on the ARM cores of the
NIC card. In this particular case, Linux 'reboot' results in a system
'L3' level reset where the entire ARM and associated subsystems are
being reset, but at the same time, Nitro core is being kept in sane state
(to allow external PCIe connected servers to continue to work). Without
properly shutting down RoCE and freeing all associated resources, it
results in the ARM core to hang immediately after the 'reboot'

By always invoking 'bnxt_ulp_shutdown' in 'bnxt_shutdown', it fixes the
above issue

Fixes: 0efd2fc65c92 ("bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.")
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a7f3f939dd7d8398acebecd1ceb2e9e7ffbe91d2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix an error handling path in 'bnxt_get_module_eeprom()'
Christophe JAILLET [Tue, 21 Nov 2017 19:46:49 +0000 (20:46 +0100)]
bnxt_en: Fix an error handling path in 'bnxt_get_module_eeprom()'

Orabug: 2764835527648339

Error code returned by 'bnxt_read_sfp_module_eeprom_info()' is handled a
few lines above when reading the A0 portion of the EEPROM.
The same should be done when reading the A2 portion of the EEPROM.

In order to correctly propagate an error, update 'rc' in this 2nd call as
well, otherwise 0 (success) is returned.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dea521a2b9f96e905fa2bb2f95e23ec00c2ec436)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt: fix bnxt_hwrm_fw_set_time for y2038
Arnd Bergmann [Wed, 7 Mar 2018 21:30:59 +0000 (16:30 -0500)]
bnxt: fix bnxt_hwrm_fw_set_time for y2038

Orabug: 2764835527648339

On 32-bit architectures, rtc_time_to_tm() returns incorrect results
in 2038 or later, and do_gettimeofday() is broken for the same reason.

This changes the code to use ktime_get_real_seconds() and time64_to_tm()
instead, both of them are 2038-safe, and we can also get rid of the
CONFIG_RTC_LIB dependency that way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7dfaa7bc99498da1c6c4a48bee8d2d5265161a8c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix IRQ coalescing regression.
Michael Chan [Fri, 3 Nov 2017 07:32:39 +0000 (03:32 -0400)]
bnxt_en: Fix IRQ coalescing regression.

Orabug: 2764835527648339

Recent IRQ coalescing clean up has removed a guard-rail for the max DMA
buffer coalescing value.  This is a 6-bit value and must not be 0.  We
already have a check for 0 but 64 is equivalent to 0 and will cause
non-stop interrupts.  Fix it by adding the proper check.

Fixes: f8503969d27b ("bnxt_en: Refactor and simplify coalescing code.")
Reported-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b153cbc507946f52d5aa687fd64f45d82cb36a3b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: fix typo in bnxt_set_coalesce
Andy Gospodarek [Fri, 3 Nov 2017 07:32:38 +0000 (03:32 -0400)]
bnxt_en: fix typo in bnxt_set_coalesce

Orabug: 2764835527648339

Recent refactoring of coalesce settings contained a typo that prevents
receive settings from being set properly.

Fixes: 18775aa8a91f ("bnxt_en: Reorganize the coalescing parameters.")
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit de4a10ef6eff0eb0ced97a39dc3edd0d3101b6ed)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Refactor and simplify coalescing code.
Michael Chan [Thu, 26 Oct 2017 15:51:28 +0000 (11:51 -0400)]
bnxt_en: Refactor and simplify coalescing code.

Orabug: 2764835527648339

The mapping of the ethtool coalescing parameters to hardware parameters
is now done in bnxt_hwrm_set_coal_params().  The same function can
handle both RX and TX settings.  The code is now more clear.  Some
adjustments have been made to get better hardware settings.  The
coal_frames setting is now accurately set in hardware.  The max_timer
is set to coal_ticks value.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f8503969d27b2b26ff0adbce4b7d7cf4ba5e43c2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Reorganize the coalescing parameters.
Michael Chan [Thu, 26 Oct 2017 15:51:27 +0000 (11:51 -0400)]
bnxt_en: Reorganize the coalescing parameters.

Orabug: 2764835527648339

The current IRQ coalescing logic is a little messy.  The ethtool
parameters are mapped to hardware parameters in a way that is difficult
to understand.  The first step is to better organize the parameters
by adding the new structure bnxt_coal.  The structure is used by both
the RX and TX sets of coalescing parameters.

Adjust the default coal_ticks to 14 us and 28 us for RX and TX.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18775aa8a91fcd4cd07c722d575b4b852e3624c3)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h

7 years agobnxt_en: Add ethtool reset method
Vasundhara Volam [Thu, 26 Oct 2017 15:51:26 +0000 (11:51 -0400)]
bnxt_en: Add ethtool reset method

Orabug: 2764835527648339

This is a firmware internal reset after driver is unloaded.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 49f7972fd16407b3d1f03c2d447d2f1e1b95e9ba)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Optimize .ndo_set_mac_address() for VFs.
Michael Chan [Thu, 26 Oct 2017 15:51:24 +0000 (11:51 -0400)]
bnxt_en: Optimize .ndo_set_mac_address() for VFs.

Orabug: 2764835527648339

No need to call bnxt_approve_mac() which will send a message to the
PF if the MAC address hasn't changed.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c1a7bdff17247332ecff7f243e42d269b3f74c65)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Get firmware package version one time.
Michael Chan [Thu, 26 Oct 2017 15:51:23 +0000 (11:51 -0400)]
bnxt_en: Get firmware package version one time.

Orabug: 2764835527648339

The current code retrieves the firmware package version from firmware
everytime ethtool -i is run.  There is no reason to do that as the
firmware will not change while the driver is loaded.  Get the version
once at init time.

Also, display the full 4-part firmware version string and remove the
less useful interface spec version.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 431aa1eb20d8ae2674723292adb832b968da868e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Check for zero length value in bnxt_get_nvram_item().
Michael Chan [Thu, 26 Oct 2017 15:51:22 +0000 (11:51 -0400)]
bnxt_en: Check for zero length value in bnxt_get_nvram_item().

Orabug: 2764835527648339

Return -EINVAL if the length is zero and not proceed to do essentially
nothing.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e0ad8fc5980b362028cfd63ec037f4b491e726c6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: adding PCI ID for SMARTNIC VF support
Rob Miller [Thu, 26 Oct 2017 15:51:21 +0000 (11:51 -0400)]
bnxt_en: adding PCI ID for SMARTNIC VF support

Orabug: 2764835527648339

Signed-off-by: Rob Miller <rmiller@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 618784e3ee1870e43e50e1c7922cc123cc050566)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Add PCIe device ID for bcm58804
Ray Jui [Thu, 26 Oct 2017 15:51:20 +0000 (11:51 -0400)]
bnxt_en: Add PCIe device ID for bcm58804

Orabug: 2764835527648339

Add new PCIe device ID and chip number for bcm58804

Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8ed693b7bbd179949f6947adaae5eff2e386a534)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Update firmware interface to 1.8.3.1
Michael Chan [Thu, 26 Oct 2017 15:51:19 +0000 (11:51 -0400)]
bnxt_en: Update firmware interface to 1.8.3.1

Orabug: 2764835527648339

Vxlan encap/decap filters are added to this firmware spec.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 57922b0a2f7ef9effbcdbbf7d1f8dad95aa567f7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix possible corruption in DCB parameters from firmware.
Sankar Patchineelam [Sat, 14 Oct 2017 01:09:34 +0000 (21:09 -0400)]
bnxt_en: Fix possible corruption in DCB parameters from firmware.

Orabug: 2764835527648339

hwrm_send_message() is replaced with _hwrm_send_message(), and
hwrm_cmd_lock mutex lock is grabbed for the whole period of
firmware call until the firmware DCB parameters have been copied.
This will prevent possible corruption of the firmware data.

Fixes: 7df4ae9fe855 ("bnxt_en: Implement DCBNL to support host-based DCBX.")
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b1e1a9ce06fd94b563d6c3dd896589231995d89)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix VF resource checking.
Michael Chan [Sat, 14 Oct 2017 01:09:32 +0000 (21:09 -0400)]
bnxt_en: Fix VF resource checking.

Orabug: 2764835527648339

In bnxt_sriov_enable(), we calculate to see if we have enough hardware
resources to enable the requested number of VFs.  The logic to check
for minimum completion rings and statistics contexts is missing.  Add
the required checks so that VF configuration won't fail.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 021570793d8cd86cb62ac038c535f4450586b454)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix VF PCIe link speed and width logic.
Vasundhara Volam [Sat, 14 Oct 2017 01:09:31 +0000 (21:09 -0400)]
bnxt_en: Fix VF PCIe link speed and width logic.

Orabug: 2764835527648339

PCIE PCIE_EP_REG_LINK_STATUS_CONTROL register is only defined in PF
config space, so we must read it from the PF.

Fixes: 90c4f788f6c0 ("bnxt_en: Report PCIe link speed and width during driver load")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7ab0760f5178169c4c218852f51646ea90817d7c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Don't use rtnl lock to protect link change logic in workqueue.
Michael Chan [Sat, 14 Oct 2017 01:09:30 +0000 (21:09 -0400)]
bnxt_en: Don't use rtnl lock to protect link change logic in workqueue.

Orabug: 2764835527648339

As a further improvement to the PF/VF link change logic, use a private
mutex instead of the rtnl lock to protect link change logic.  With the
new mutex, we don't have to take the rtnl lock in the workqueue when
we have to handle link related functions.  If the VF and PF drivers
are running on the same host and both take the rtnl lock and one is
waiting for the other, it will cause timeout.  This patch fixes these
timeouts.

Fixes: 90c694bb7181 ("bnxt_en: Fix RTNL lock usage on bnxt_update_link().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e2dc9b6e38fa3919e63d6d7905da70ca41cbf908)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Improve VF/PF link change logic.
Michael Chan [Sat, 14 Oct 2017 01:09:29 +0000 (21:09 -0400)]
bnxt_en: Improve VF/PF link change logic.

Orabug: 2764835527648339

Link status query firmware messages originating from the VFs are forwarded
to the PF.  The driver handles these interactions in a workqueue for the
VF and PF.  The VF driver waits for the response from the PF in the
workqueue.  If the PF and VF driver are running on the same host and the
work for both PF and VF are queued on the same workqueue, the VF driver
may not get the response if the PF work item is queued behind it on the
same workqueue.  This will lead to the VF link query message timing out.

To prevent this, we create a private workqueue for PFs instead of using
the common workqueue.  The VF query and PF response will never be on
the same workqueue.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c213eae8d3cd4c026f348ce4fd64f4754b3acf2b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

7 years agobnxt_en: Remove redundant unlikely()
Tobias Klauser [Tue, 26 Sep 2017 13:12:26 +0000 (15:12 +0200)]
bnxt_en: Remove redundant unlikely()

Orabug: 2764835527648339

IS_ERR() already implies unlikely(), so it can be omitted.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1fac4b2fdbccab69cb781aae68f540be94d5549e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agodrivers: net: bnxt: use setup_timer() helper.
Allen Pais [Thu, 21 Sep 2017 17:05:08 +0000 (22:35 +0530)]
drivers: net: bnxt: use setup_timer() helper.

Orabug: 2764835527648339

Use setup_timer function instead of initializing timer with the
    function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c43824477c2ac722325ba460c2ce683c48fb76b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Reduce default rings on multi-port cards.
Michael Chan [Mon, 28 Aug 2017 17:40:31 +0000 (13:40 -0400)]
bnxt_en: Reduce default rings on multi-port cards.

Orabug: 2764835527648339

Reduce default rings from 8 to 4 on multi-port cards to reduce memory
usage.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d5430d31ca72ec37fd539fd1c5230859509be4ef)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Improve -ENOMEM logic in NAPI poll loop.
Michael Chan [Mon, 28 Aug 2017 17:40:30 +0000 (13:40 -0400)]
bnxt_en: Improve -ENOMEM logic in NAPI poll loop.

Orabug: 2764835527648339

If we cannot allocate RX buffers in the NAPI poll loop when processing
an RX event, the current code does not count that event towards the NAPI
budget.  This can cause us to potentially loop forever in NAPI if we
consistently cannot allocate new buffers.  Improve it by counting
-ENOMEM event as 1 towards the NAPI budget.

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 903649e718f80da2ba4b65a0adf6930219b4b2e5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt: initialize board_info values with proper enums
Scott Branden [Mon, 28 Aug 2017 17:40:29 +0000 (13:40 -0400)]
bnxt: initialize board_info values with proper enums

Orabug: 2764835527648339

initialize board_info values with proper enums for defensive programming
purposes.  This will avoid any errors of the enums being declared not
lining up with the board_info array.

Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 27573a7d905a49dc756fda9c0e148372136356e6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt: Add PCIe device IDs for bcm58802/bcm58808
Ray Jui [Mon, 28 Aug 2017 17:40:28 +0000 (13:40 -0400)]
bnxt: Add PCIe device IDs for bcm58802/bcm58808

Orabug: 2764835527648339

Add PCIe device ID for bcm58802 and bcm58808. Also add chip number
update to declare bcm588xx as chip class phase 4 and later

Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4a58139b8493624c6c6223b58a9e70ebbdf56338)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: assign CPU affinity hints to bnxt_en IRQs
Vasundhara Volam [Mon, 28 Aug 2017 17:40:27 +0000 (13:40 -0400)]
bnxt_en: assign CPU affinity hints to bnxt_en IRQs

Orabug: 2764835527648339

This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 56f0fd80d1886479a42ac07ed239538eb145a669)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Improve tx ring reservation logic.
Michael Chan [Mon, 28 Aug 2017 17:40:26 +0000 (13:40 -0400)]
bnxt_en: Improve tx ring reservation logic.

Orabug: 2764835527648339

When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC.  If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.

The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings.  We fix it as follows:

1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available.  There is a flag in
the firmware message to do that.  If not available, abort and the
current TX rings will not be disabled.

2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved.  If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 98fdbe73bfb809b1f8eec9f27a36e737caed3a44)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
drivers/net/ethernet/broadcom/bnxt/bnxt.h
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c

7 years agobnxt_en: Update firmware interface spec. to 1.8.1.4.
Michael Chan [Mon, 28 Aug 2017 17:40:25 +0000 (13:40 -0400)]
bnxt_en: Update firmware interface spec. to 1.8.1.4.

Orabug: 2764835527648339

Flow APIs are added in this firmware interface.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a17eb27bf7ece364627fcf16ad50c24b793300b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Do not setup MAC address in bnxt_hwrm_func_qcaps().
Michael Chan [Wed, 23 Aug 2017 23:34:05 +0000 (19:34 -0400)]
bnxt_en: Do not setup MAC address in bnxt_hwrm_func_qcaps().

Orabug: 2764835527648339

bnxt_hwrm_func_qcaps() is called during probe to get all device
resources and it also sets up the factory MAC address.  The same function
is called when SRIOV is disabled to reclaim all resources.  If
the MAC address has been overridden by a user administered MAC
address, calling this function will overwrite it.

Separate the logic that sets up the default MAC address into a new
function bnxt_init_mac_addr() that is only called during probe time.

Fixes: 4a21b49b34c0 ("bnxt_en: Improve VF resource accounting.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a22a6ac2ff8080c87e446e20592725c064229c71)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Free MSIX vectors when unregistering the device from bnxt_re.
Michael Chan [Wed, 23 Aug 2017 23:34:04 +0000 (19:34 -0400)]
bnxt_en: Free MSIX vectors when unregistering the device from bnxt_re.

Orabug: 2764835527648339

Take back ownership of the MSIX vectors when unregistering the device
from bnxt_re.

Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 146ed3c5b87d8c65ec31bc56df26f027fe624b8f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Fix .ndo_setup_tc() to include XDP rings.
Michael Chan [Wed, 23 Aug 2017 23:34:03 +0000 (19:34 -0400)]
bnxt_en: Fix .ndo_setup_tc() to include XDP rings.

Orabug: 2764835527648339

When the number of TX rings is changed in bnxt_setup_tc(), we need to
include the XDP rings in the total TX ring count.

Fixes: 38413406277f ("bnxt_en: Add support for XDP_TX action.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 87e9b3778c94694c9e098c91a0cc05725f0e017f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt: fix unused variable warnings
stephen hemminger [Mon, 24 Jul 2017 17:25:19 +0000 (10:25 -0700)]
bnxt: fix unused variable warnings

Orabug: 2764835527648339

Fix a couple of warnings where variable ‘txq’ set but not used

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>v, i);
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 351bac30613378c4684d4673aac0c7917980a652)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt: fix unsigned comparsion with 0
stephen hemminger [Mon, 24 Jul 2017 17:25:18 +0000 (10:25 -0700)]
bnxt: fix unsigned comparsion with 0

Orabug: 2764835527648339

Fixes warning because location is u32 and can never be netative
warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b721cfaf03bcaac0a3abf702c4240326eed9e4b1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobnxt_en: Use SWITCHDEV_SET_OPS().
David S. Miller [Tue, 25 Jul 2017 04:20:16 +0000 (21:20 -0700)]
bnxt_en: Use SWITCHDEV_SET_OPS().

Orabug: 2764835527648339

Suggested by Jakub Kicinski.

Fixes: c124a62ff2dd ("bnxt_en: add support for port_attr_get and and get_phys_port_name")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc88055ab72c0eaa080926c888628b77d2055513)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

7 years agobnxt_en: Set ETS min_bw parameter for older firmware.
Michael Chan [Mon, 24 Jul 2017 16:34:26 +0000 (12:34 -0400)]
bnxt_en: Set ETS min_bw parameter for older firmware.

Orabug: 2764835527648339

In addition to the ETS weight, older firmware also requires the min_bw
parameter to be set for it to work properly.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 70098a47bbf131b65c64ca935c2480e64c9c7c51)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agodccp/tcp: fix routing redirect race
Jon Maxwell [Fri, 10 Mar 2017 05:40:33 +0000 (16:40 +1100)]
dccp/tcp: fix routing redirect race

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Orabug: 27661864

Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0)

Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
7 years agoRevert "RDS: don't commit to queue till transport connection is up"
Santosh Shilimkar [Thu, 22 Feb 2018 19:16:38 +0000 (11:16 -0800)]
Revert "RDS: don't commit to queue till transport connection is up"

This reverts commit 238a807df5e57afb4b1e13ba87015093e3212247.

This change was introduced to address the RDS internal sendQ
occupancy for the messages targeted to dead/non-existing nodes. It was
discovered as part of a customer issue where remote node was shut down
and RDS attempted to repeatedly establish a connection without success.
Same issue can be exploited by sending messages to non-existing node
too since RDS forms connection as part of sendmsg if it doesn't
exist already.

While at that time sending EAGAIN instead of adding messages to
sendQ when remote connection not up, looked straightforward, it
has undesired effect on application to keep spinning even though
there is space to write on socket buffer. And application has
no notion of underneath connections, so RDS needs to handle this
problem internally and transparently. Application will automatically
move to POLL OUT once its own socket buffer is full and will
avoid the CPU tight spinning.

To address draining the internal sendQ messages targeted to
dead nodes or non-existing nodes, one possible way is to
retire/destroy those connections, after some large timeout. That
will also drop those messages from sendQ. This change will be
addressed separately.

Orabug: 27606911

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
7 years agobe2net: locking/atomics: COCCINELLE/treewide: Convert trivial
Mark Rutland [Wed, 7 Mar 2018 00:39:51 +0000 (19:39 -0500)]
be2net: locking/atomics: COCCINELLE/treewide: Convert trivial
  ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Orabug: 27615319

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobe2net: Handle transmit completion errors in Lancer
Suresh Reddy [Tue, 6 Feb 2018 13:52:42 +0000 (08:52 -0500)]
be2net: Handle transmit completion errors in Lancer

Orabug: 27615319

If the driver receives a TX CQE with status as 0x1 or 0x9 or 0xb,
the completion indexes should not be used. The driver must stop
consuming CQEs from this TXQ/CQ. The TXQ from this point on-wards
to be in a bad state. Driver should destroy and recreate the TXQ.

0x1: LANCER_TX_COMP_LSO_ERR
0x9 LANCER_TX_COMP_SGE_ERR
0xb: LANCER_TX_COMP_PARITY_ERR

Reset the adapter if driver sees this error in TX completion. Also
adding sge error counter in ethtool stats.

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ffc39620102dfe62711fadb9a297b66aee816013)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobe2net: Fix HW stall issue in Lancer
Suresh Reddy [Tue, 6 Feb 2018 13:52:41 +0000 (08:52 -0500)]
be2net: Fix HW stall issue in Lancer

Orabug: 27615319

Lancer HW cannot handle a TSO packet with a single segment.
Disable TSO/GSO for such packets.

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3df40aad1a864af124bd50a1371ef16089ac9af2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobe2net: remove redundant initialization of 'head' and pointer txq
Colin Ian King [Wed, 31 Jan 2018 16:14:25 +0000 (16:14 +0000)]
be2net: remove redundant initialization of 'head' and pointer txq

Orabug: 27615319

Variable head is initialized to a value that is never read and is
being updated to a new value a few lines later, hence this
initialization is redundant and can be safely removed as well
as the now unused pointer txq.

Cleans up clang warning:
drivers/net/ethernet/emulex/benet/be_main.c:996:6: warning: Value
stored to 'head' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2e85283dabc22f4715b136e8a7426bd9bef4ce69)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobe2net: networking block comments don't use an empty /* line
Rohit Visavalia [Thu, 25 Jan 2018 12:58:24 +0000 (18:28 +0530)]
be2net: networking block comments don't use an empty /* line

Orabug: 27615319

Resolved Warning: networking block comments don't use an empty /* line,
use /* Comment...
Issue found by checkpatch.

Signed-off-by: Rohit Visavalia <rohit.visavalia@softnautics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5f834cf4b7c50d2172d9f2307499e6b64b7504ac)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobe2net: restore properly promisc mode after queues reconfiguration
Ivan Vecera [Fri, 19 Jan 2018 19:23:50 +0000 (20:23 +0100)]
be2net: restore properly promisc mode after queues reconfiguration

Orabug: 27615319

The commit 622190669403 ("be2net: Request RSS capability of Rx interface
depending on number of Rx rings") modified be_update_queues() so the
IFACE (HW representation of the netdevice) is destroyed and then
re-created. This causes a regression because potential promiscuous mode
is not restored properly during be_open() because the driver thinks
that the HW has promiscuous mode already enabled.

Note that Lancer is not affected by this bug because RX-filter flags are
disabled during be_close() for this chipset.

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 52acf06451930eb4cefabd5ecea56e2d46c32f76)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agobe2net: use ARRAY_SIZE for array sizing calculation on array cmd_priv_map
Colin Ian King [Sun, 7 Jan 2018 23:45:08 +0000 (23:45 +0000)]
be2net: use ARRAY_SIZE for array sizing calculation on array cmd_priv_map

Orabug: 27615319

Use the ARRAY_SIZE macro on array cmd_priv_map to determine size of the
array.  Improvement suggested by coccinelle.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agoRDS: IB: Fix null pointer issue
Guanglei Li [Tue, 6 Feb 2018 02:43:21 +0000 (10:43 +0800)]
RDS: IB: Fix null pointer issue

Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_hanlder is null */
                                      id_priv->id.event_handler

Orabug: 27636711

Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2c0aa08631b86a4678dbc93b9caa5248014b4458)
Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
7 years agoxen/acpi: upload _PSD info for non-dom0 CPUs too
Joao Martins [Mon, 5 Mar 2018 13:29:25 +0000 (13:29 +0000)]
xen/acpi: upload _PSD info for non-dom0 CPUs too

All uploaded PM data from non-dom0 CPUs takes the info from CPU 0 with a
different acpi_id. For processors which P-state coordination type is
HW_ALL (0xFD) it is OK to upload bogus P-state dependency information
(_PSD), because Xen will ignore any domains created for past CPUs.

Albeit for platforms which expose coordination types as SW_ANY or
SW_ALL, this will have some unintended side effects. Effectively, it
will look at the P-state domain existence and *if it already exists* it
will skip the acpi-cpufreq initialization and thus inherit the policy
from the first CPU in the cpufreq domain. Finally it and won't change
the original cpu target freq to P0 other than the first in the domain.

Which will make turbo boost not getting enabled (e.g. for 'performance'
governor) for all cpus and instead only those with unique P-state
domains.

This patch fixes that, by also evaluating _PSD when enumerate all ACPI
procesors and uploading that instead.

Orabug: 27655759
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Shih-Yu Huang <shih-yu.huang@oracle.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agoscsi: lpfc: Update 11.4.0.7 modified files for 2018 Copyright
James Smart [Tue, 30 Jan 2018 23:59:03 +0000 (15:59 -0800)]
scsi: lpfc: Update 11.4.0.7 modified files for 2018 Copyright

Orabug: 27631736

Updated Copyright in files updated 11.4.0.7

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 128bddacc4dd7c86070e1e0534687e3083a89d52)
Signed-off-by: Dick dkennedy <dick.kennedy@broadcom.com>
 Conflicts:
drivers/scsi/lpfc/lpfc_nvme.c
drivers/scsi/lpfc/lpfc_nvmet.c
drivers/scsi/lpfc/lpfc_nvmet.h
 I had to remover these from the patch
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years agoscsi: lpfc: update driver version to 11.4.0.7
James Smart [Tue, 30 Jan 2018 23:59:02 +0000 (15:59 -0800)]
scsi: lpfc: update driver version to 11.4.0.7

Orabug: 27631736

Update the driver version to 11.4.0.7

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6e9d2f1667ea12bd2f997a7529fb41cce8e0036d)
Signed-off-by: Dick dkennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>