Peter Zijlstra [Wed, 22 Jun 2016 13:14:26 +0000 (15:14 +0200)]
sched/fair: Initialize and rework throttle_count for new task-groups
This patch is a combination of the following three patches from mainline:
094f469172e0 sched/fair: Initialize throttle_count for new task-groups lazily
Cgroup created inside throttled group must inherit current throttle_count.
A broken throttle_count allows throttled entries to be nominated as the next
buddy, which later leads to a NULL pointer dereference in pick_next_task_fair().
This patch initializes cfs_rq->throttle_count at first enqueue: laziness
allows skipping the locking of every rq at group creation. The lazy approach
also allows skipping a full sub-tree scan when throttling the hierarchy (not in this patch).
A future patch needs rq->lock held _after_ we link the task_group into
the hierarchy. In order to avoid taking every rq->lock twice, reorder
things a little and create online_fair_sched_group() to be called
after we link the task_group.
All this code is still run from css_alloc(), so css_online() isn't in
fact used for this.
Since we already take rq->lock when creating a cgroup, use it to also
sync the throttle_count and avoid the extra state and enqueue path
branch.
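A minimal sketch of the inheritance rule described above, assuming a heavily simplified cfs_rq with only a parent pointer and a throttle_count. The struct and function names are hypothetical and are not the scheduler's real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a per-CPU cfs_rq; the real
 * struct cfs_rq has many more fields. */
struct cfs_rq_sketch {
	struct cfs_rq_sketch *parent; /* NULL for the root task group */
	int throttle_count;           /* nonzero while this rq or an ancestor is throttled */
};

/* Called after the new group has been linked into the hierarchy (in the
 * kernel, with rq->lock already held). Copying the parent's count means a
 * group created under a throttled parent starts out throttled as well, so
 * its entities cannot be nominated as a buddy by mistake. */
static void sync_throttle_sketch(struct cfs_rq_sketch *cfs_rq)
{
	cfs_rq->throttle_count = cfs_rq->parent ? cfs_rq->parent->throttle_count : 0;
}
```

Doing the copy once, while rq->lock is held for group creation anyway, avoids both a second pass over every rq->lock and an extra branch in the enqueue path.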
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: linux-kernel@vger.kernel.org
[ Fixed build warning. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
The patches have been combined because applying them separately will
cause a KABI breakage and introduce a dummy function.
perf tools: Move syscall number fallbacks from perf-sys.h to tools/arch/x86/include/asm/
And remove the empty tools/arch/x86/include/asm/unistd_{32,64}.h files
introduced by eae7a755ee81 ("perf tools, x86: Build perf on older
user-space as well").
This way we get closer to mirroring the kernel for cases where __NR_
can't be found for some include path/_GNU_SOURCE/whatever scenario.
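A minimal sketch of the `#ifndef` fallback pattern for a syscall number. It only supplies the number when the system headers did not. The values 298 (x86_64) and 336 (i386) are the x86 perf_event_open numbers. The exact contents of the tools/arch/x86/include/asm/ files may differ, and the helper function is hypothetical.

```c
#include <assert.h>
#include <sys/syscall.h> /* normally provides __NR_perf_event_open */

/* Only define the number if the system headers did not provide it. */
#ifndef __NR_perf_event_open
# if defined(__x86_64__)
#  define __NR_perf_event_open 298
# elif defined(__i386__)
#  define __NR_perf_event_open 336
# else
#  error "no fallback syscall number for this architecture"
# endif
#endif

/* Hypothetical helper so callers never touch the macro directly. */
static long perf_event_open_nr(void)
{
	return __NR_perf_event_open;
}
```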
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-kpj6m3mbjw82kg6krk2z529e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit cec07f53c398)
Orabug: 27240053 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Stephan Mueller [Thu, 25 Aug 2016 13:15:01 +0000 (15:15 +0200)]
crypto: FIPS - allow tests to be disabled in FIPS mode
In FIPS mode, additional restrictions may apply. If these restrictions
are violated, the kernel will panic(). This patch allows test vectors
for symmetric ciphers to be marked as skipped in FIPS mode.
Together with the patch, the XTS test vector where the AES key is
identical to the tweak key is disabled in FIPS mode. This test vector
violates the FIPS requirement that both keys must be different.
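A minimal sketch of how a per-vector skip flag can be honoured by a self-test loop. The struct, the `fips_skip` field name and the loop are illustrative assumptions, not a copy of testmgr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test-vector record; the field name fips_skip is illustrative. */
struct cipher_testvec_sketch {
	const char *name;
	bool fips_skip; /* vector violates a FIPS rule, e.g. key == tweak key */
};

/* Run all vectors, skipping the flagged ones when fips_enabled is set.
 * Returns how many vectors were actually executed. */
static size_t run_vectors(const struct cipher_testvec_sketch *v, size_t n,
			  bool fips_enabled)
{
	size_t ran = 0;

	for (size_t i = 0; i < n; i++) {
		/* In FIPS mode such a vector would be rejected, and a failing
		 * self-test there ends in panic(). */
		if (fips_enabled && v[i].fips_skip)
			continue;
		/* ... encrypt/decrypt and compare with the expected output ... */
		ran++;
	}
	return ran;
}
```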
Reported-by: Tapas Sarangi <TSarangi@trustwave.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 10faa8c0d6c3b22466f97713a9533824a2ea1c57)
Stephan Mueller [Tue, 9 Feb 2016 14:37:47 +0000 (15:37 +0100)]
crypto: xts - consolidate sanity check for keys
The patch centralizes the XTS key check logic into the service function
xts_check_key which is invoked from the different XTS implementations.
With this, the XTS implementations in ARM, ARM64, PPC and S390 now have
a sanity check for the XTS keys similar to the other arches.
In addition, this service function received a check to ensure that the
key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the
check is not present in the standards defining XTS, it is only enforced
in FIPS mode of the kernel.
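A userspace sketch of the two checks described above, assuming the XTS key is laid out as the data key followed by the tweak key. The function name and the `fips_enabled` parameter are stand-ins. Kernel code should use a constant-time comparison such as crypto_memneq() rather than memcmp().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* XTS key = data key || tweak key, so the length must be even; in FIPS mode
 * the two halves must also differ (FIPS 140-2 IG A.9). */
static int xts_check_key_sketch(const unsigned char *key, unsigned int keylen,
				bool fips_enabled)
{
	if (keylen % 2)
		return -EINVAL; /* cannot split into two equal-length keys */

	/* memcmp() is not constant-time; it is only used here for brevity. */
	if (fips_enabled && memcmp(key, key + keylen / 2, keylen / 2) == 0)
		return -EINVAL; /* data key == tweak key */

	return 0;
}
```

Outside FIPS mode, identical halves are accepted, which matches the note that the standards defining XTS do not require this check.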
Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 28856a9e52c7cac712af6c143de04766617535dc)
Herbert Xu [Tue, 21 Apr 2015 02:46:49 +0000 (10:46 +0800)]
crypto: rng - Zero seed in crypto_rng_reset
If we allocate a seed on behalf of the user in crypto_rng_reset,
we must ensure that it is zeroed afterwards or the RNG may be
compromised.
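A userspace sketch of "allocate, use, wipe, then free" for a temporary seed. The RNG state, the placeholder entropy source and all function names are hypothetical. The volatile-pointer wipe keeps the compiler from discarding the stores as dead before free().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

static unsigned char rng_state[32]; /* hypothetical RNG internal state */

/* Overwrite through a volatile pointer so the stores are not optimized away
 * just because the buffer is freed right afterwards. */
static void secure_zero(void *p, size_t n)
{
	volatile unsigned char *b = p;

	while (n--)
		*b++ = 0;
}

/* Placeholder entropy, NOT secure; stands in for get_random_bytes(). */
static void fill_seed_placeholder(unsigned char *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		buf[i] = (unsigned char)(i * 31u + 7u);
}

/* Allocate a seed on the caller's behalf, mix it in, then wipe and free it. */
static int rng_reset_sketch(size_t slen)
{
	unsigned char *seed;

	if (slen == 0)
		return -EINVAL;
	seed = malloc(slen);
	if (!seed)
		return -ENOMEM;
	fill_seed_placeholder(seed, slen);
	for (size_t i = 0; i < sizeof(rng_state); i++)
		rng_state[i] ^= seed[i % slen];
	secure_zero(seed, slen); /* the step the fix is about */
	free(seed);
	return 0;
}
```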
Reported-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b617b702da4e922277806f81c411d3051107d462)
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:24 +0000 (11:07 -0800)]
enic: set IG desc cache flag in open
New adapters need the CMD_OPENF_IG_DESCCACHE flag to be set. If this flag is
not set, the fw flushes the global IG desc cache. This flag is a no-op on older
adapters.
Also increment the driver version.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5de0c022f1b0bce073cb04dd69ed7982805e5763)
Orabug: 27587345 Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While the module unload path is protected by .owner in
struct file_operations, the util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.
In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure.
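A single-threaded sketch of deferring the free until the last open fd is released. The struct, the counters and the function names are hypothetical, and the real hvutil_transport code also needs locking around these paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified transport object. */
struct transport_sketch {
	int open_fds;   /* file descriptors currently open on the device */
	bool destroyed; /* destroy() was called (module unload or util_remove()) */
};

static int frees; /* counts actual frees, for illustration */

static void transport_free(struct transport_sketch *t)
{
	free(t);
	frees++;
}

/* Called from either teardown path: free now only if nobody holds an fd. */
static void transport_destroy(struct transport_sketch *t)
{
	t->destroyed = true;
	if (t->open_fds == 0)
		transport_free(t);
}

/* ->release() of the last fd performs the free that destroy() deferred. */
static void transport_release(struct transport_sketch *t)
{
	if (--t->open_fds == 0 && t->destroyed)
		transport_free(t);
}
```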
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit 9420098adc50a88d4a441e0f92d54bfa7af44448) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
When Hyper-V host asks us to remove some util driver by closing the
appropriate channel there is no easy way to force the current file
descriptor holder to hang up but we can start to respond -EBADF to all
operations asking it to exit gracefully.
As we're now setting hvt->mode from two separate contexts, we need to use
proper locking.
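A sketch of the "answer -EBADF once the host closed the channel" behaviour, with the mode protected by a tiny C11 spinlock standing in for the kernel lock. The mode values, struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical modes; the real hvt->mode values differ. */
enum mode_sketch { MODE_INIT, MODE_ACTIVE, MODE_DESTROY };

struct hvt_sketch {
	atomic_flag lock; /* minimal spinlock standing in for the kernel lock */
	enum mode_sketch mode;
};

static void hvt_lock(struct hvt_sketch *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		; /* spin */
}

static void hvt_unlock(struct hvt_sketch *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Channel-close context: from now on every fd operation fails. */
static void hvt_channel_closed(struct hvt_sketch *h)
{
	hvt_lock(h);
	h->mode = MODE_DESTROY;
	hvt_unlock(h);
}

/* fd-operation context (read/write/poll): -EBADF asks the daemon to exit. */
static int hvt_op(struct hvt_sketch *h)
{
	int ret = 0;

	hvt_lock(h);
	if (h->mode == MODE_DESTROY)
		ret = -EBADF;
	hvt_unlock(h);
	return ret;
}
```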
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426102
(cherry picked from commit a15025660d4703a8b37290a14734cb4a84875770) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
The following panic was caught. Something went wrong with the storage and an I/O error
was returned: generic_readlink()->ext4_follow_link()->page_follow_link_light()
returned a NULL page and an error link, then ext4_put_link() tried to free the
error link and panicked.
Mainline/UEK5 do not have this issue, as ->follow_link and ->put_link have
been refactored there. The patch set that does that is rather big, so I didn't
backport it and just wrote this small patch to fix the issue.
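A userspace sketch of the guard this kind of fix needs: a put_link-style callback must not free a cookie that is an error pointer. ERR_PTR()/IS_ERR() are re-implemented here after the kernel's error-pointer convention. put_link_sketch and the counter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace copies of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int pages_released;

/* The cookie may be an ERR_PTR() when follow_link failed (e.g. -EIO), so it
 * must never be treated as a real page. */
static void put_link_sketch(void *cookie)
{
	if (!cookie || IS_ERR(cookie))
		return;       /* nothing was mapped; freeing would crash */
	free(cookie);         /* stands in for releasing the mapped page */
	pages_released++;
}
```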
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Patrick Colp [Wed, 28 Mar 2018 01:30:49 +0000 (18:30 -0700)]
KVM/VMX: Clear spec_ctrl status when resetting vcpu
vmx->spec_ctrl was not set to 0 in vmx_vcpu_reset, which could result in
IBRS getting stuck on all the time, even with 'spectre_v2=off' set. This
was most notable when rebooting from an older kernel into a newer
retpoline-enabled kernel resulted in up to 80% CPU performance drop.
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Patrick Colp <patrick.colp@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Daniel Jurgens [Tue, 6 Mar 2018 21:49:08 +0000 (13:49 -0800)]
mlx4: change the ICM table allocations to lowest needed size
The driver currently allocates 256KB of contiguous memory for ICM
tables, which puts pressure on memory management to allocate such
a large contiguous page in a fragmented memory system. Such allocation
itself contributes to memory fragmentation, and at times user
processes stall for tens of seconds, leading to slow path
dumps with mm lock contention.
This change makes the driver allocate the lowest page size
needed for ICM allocation (8K), which fixes these stalls.
With 4K chunk sizes the QP table size is 4MB, which cannot be allocated
by kmalloc. A larger design change would be necessary to break up the
table. With 8K chunks the same table is 2MB, which can be allocated by
kmalloc. This large table allocation happens only once, at driver load
time.
Herbert Xu [Mon, 10 Jul 2017 14:00:48 +0000 (22:00 +0800)]
crypto: af_alg - Avoid sock_graft call warning
The newly added sock_graft warning triggers in af_alg_accept.
It's harmless as we're essentially doing sock->sk = sock->sk.
The sock_graft call is actually redundant because all the work
it does is subsumed by sock_init_data. However, it was added
to placate SELinux as it uses it to initialise its internal state.
This patch avoids the warning by making the SELinux call directly.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2acce6aa9f6569d4e135b2c4cfb56acce95efaeb)
iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.
To address this issue, this patch makes the following changes.
First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.
Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running. For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().
The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing an iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or by iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (e.g., iscsi_target_do_login returned 0) but the socket has
closed. For both of these cases, the cleanup of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.
Finally, to handle the case where iscsi_target_sk_state_change() is
called after the initial PDU processing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.
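A single-threaded sketch of the two branches in the state-change callback described above. The flag values, struct and function names are illustrative, and the real code performs the check-and-set atomically under a lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits named after the text; values are illustrative. */
#define LOGIN_FLAGS_INITIAL_PDU (1u << 0)
#define LOGIN_FLAGS_CLOSED      (1u << 1)

struct login_sketch {
	unsigned int flags;
	int work_scheduled; /* counts stand-in schedule_delayed_work() calls */
};

/* Socket dropped. If iscsi_np is still processing the first PDU, only mark
 * the connection closed; that context notices and cleans up. Otherwise
 * hand cleanup to the login work item. */
static void sk_state_change_sketch(struct login_sketch *l)
{
	if (l->flags & LOGIN_FLAGS_INITIAL_PDU) {
		l->flags |= LOGIN_FLAGS_CLOSED;
		return;
	}
	l->work_scheduled++;
}

/* Check made before moving to full feature phase or asking for more PDUs. */
static bool login_should_abort(const struct login_sketch *l)
{
	return l->flags & LOGIN_FLAGS_CLOSED;
}
```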
Reported-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Tested-by: Mike Christie <mchristi@redhat.com> Cc: Mike Christie <mchristi@redhat.com> Reported-by: Hannes Reinecke <hare@suse.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Varun Prakash <varun@chelsio.com> Cc: <stable@vger.kernel.org> # v3.12+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 25cdda95fda78d22d44157da15aa7ea34be3c804)
Bart Van Assche [Fri, 23 Dec 2016 11:45:27 +0000 (12:45 +0100)]
target/iscsi: Fix indentation in iscsi_target_start_negotiation()
This patch prevents smatch from complaining about inconsistent
indentation in iscsi_target_start_negotiation().
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 1efaa949396b5d9e8d1e6edef7e97e9ce1a97319)
Nicholas Bellinger [Sun, 28 Feb 2016 02:15:46 +0000 (18:15 -0800)]
iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
There is an iscsi-target/tcp login race in LOGIN_FLAGS_READY
state assignment that can result in frequent errors during
iscsi discovery:
"iSCSI Login negotiation failed."
To address this bug, move the initial LOGIN_FLAGS_READY
assignment ahead of iscsi_target_do_login() when handling
the initial iscsi_target_start_negotiation() request PDU
during connection login.
As iscsi_target_do_login_rx() work_struct callback is
clearing LOGIN_FLAGS_READ_ACTIVE after subsequent calls
to iscsi_target_do_login(), the early sk_data_ready
ahead of the first iscsi_target_do_login() expects
LOGIN_FLAGS_READY to also be set for the initial
login request PDU.
As reported by Maged, this was first observed using an
MSFT initiator running across multiple VMware host
virtual machines with iscsi-target/tcp.
Reported-by: Maged Mokhtar <mmokhtar@binarykinetics.com> Tested-by: Maged Mokhtar <mmokhtar@binarykinetics.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 8f0dfb3d8b1120c61f6e2cc3729290db10772b2d)
Nicholas Bellinger [Thu, 5 Nov 2015 22:11:59 +0000 (14:11 -0800)]
iscsi-target: Fix rx_login_comp hang after login failure
This patch addresses a case where iscsi_target_do_tx_login_io()
fails sending the last login response PDU, after the RX/TX
threads have already been started.
The case centers around iscsi_target_rx_thread() not invoking
allow_signal(SIGINT) before the send_sig(SIGINT, ...) occurs
from the failure path, resulting in the RX thread hanging
indefinitely on iscsi_conn->rx_login_comp.
To address this bug, complete ->rx_login_complete for good
measure in the failure path, and immediately return from
RX thread context if connection state did not actually reach
full feature phase (TARG_CONN_STATE_LOGGED_IN).
Cc: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit ca82c2bded29b38d36140bfa1e76a7bbfcade390)
Paolo Bonzini [Wed, 7 Jun 2017 13:13:14 +0000 (15:13 +0200)]
KVM: x86: fix singlestepping over syscall
TF is handled a bit differently for syscall and sysret, compared
to the other instructions: TF is checked after the instruction completes,
so that the OS can disable #DB at a syscall by adding TF to FMASK.
When the sysret is executed the #DB is taken "as if" the syscall insn
just completed.
KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
Fix the behavior, otherwise you could get #DB on a user stack which is not
nice. This does not affect Linux guests, as they use an IST or task gate
for #DB.
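A simplified model of when a single-step #DB is due, contrasting ordinary instructions with syscall. This is not KVM's code, and the function name is hypothetical. It relies on SYSCALL computing the new RFLAGS as RFLAGS AND NOT(IA32_FMASK). An OS that puts TF into FMASK therefore suppresses the #DB at the syscall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF 0x100u /* trap flag, bit 8 of RFLAGS */

/* Ordinary instruction: the #DB follows if TF was set before it executed.
 * syscall: TF is checked after completion, i.e. after FMASK was applied. */
static bool singlestep_db_after(bool is_syscall, uint64_t rflags_before,
				uint64_t fmask)
{
	if (is_syscall) {
		uint64_t rflags_after = rflags_before & ~fmask;

		return rflags_after & X86_EFLAGS_TF;
	}
	return rflags_before & X86_EFLAGS_TF;
}
```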
This fixes CVE-2017-7518.
Cc: stable@vger.kernel.org Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit c8401dda2f0a00cd25c0af6a95ed50e478d25de4)
Bill.Baker@oracle.com [Wed, 21 Feb 2018 18:46:43 +0000 (12:46 -0600)]
nfs: system crashes after NFS4ERR_MOVED recovery
nfs4_update_server unconditionally releases the nfs_client for the
source server. If migration fails, this can cause the source server's
nfs_client struct to be left with a low reference count, resulting in
use-after-free. Also, adjust reference count handling for ELOOP.
NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6
WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]()
nfs_put_client+0xfa/0x110 [nfs]
nfs4_run_state_manager+0x30/0x40 [nfsv4]
kthread+0xd8/0xf0
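A sketch of the reference-count rule implied above. The old client's reference is dropped only once migration succeeds, because on failure the server keeps using the old client. The struct, the `migrate_err` parameter (the outcome of a hypothetical migration attempt) and the function names are assumptions, not the NFS client's code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified client with a plain reference count. */
struct nfs_client_sketch {
	int refcount;
};

static void put_client(struct nfs_client_sketch *c) { c->refcount--; }

/* Switch the server to new_client only on success; on failure leave the
 * old client and its reference untouched. */
static int update_server_sketch(struct nfs_client_sketch **server_client,
				struct nfs_client_sketch *new_client,
				int migrate_err)
{
	struct nfs_client_sketch *old = *server_client;

	if (migrate_err)
		return migrate_err; /* no put: the server still uses old */
	new_client->refcount++;
	*server_client = new_client;
	put_client(old);
	return 0;
}
```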
Benjamin Coddington [Tue, 30 Aug 2016 13:20:32 +0000 (09:20 -0400)]
NFS4: Avoid migration loops
If a server returns itself as a location while migrating, the client may
end up getting stuck attempting to migrate twice to the same server. Catch
this by checking if the nfs_client found is the same as the existing
client. For the other two callers to nfs4_set_client, the nfs_client will
always be ERR_PTR(-EINVAL).
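A sketch of the loop check: if the lookup for the "new" location returns the client already in use, migrating would go nowhere. Returning -ELOOP is an assumption here, consistent with the ELOOP handling mentioned in the previous commit. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

struct nfs_client_sketch { int id; }; /* hypothetical */

/* Reject a migration whose target resolves to the client already in use. */
static int check_migration_target(const struct nfs_client_sketch *found,
				  const struct nfs_client_sketch *in_use)
{
	if (found == in_use)
		return -ELOOP;
	return 0;
}
```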
OL6 iscsi target used IET, which presented VIRTUAL-DISK as the inquiry product.
OL7 uses the LIO iscsi target instead, which presents the LIO iblock name.
Exadata targets upgrading from OL6 to OL7 need to present the same
product ID to existing iscsi initiator multipath mappings.
Add target_core_mod parameter inquiry_product for target inquiry product
string override. It defaults to LIO iblock name.
The user will also need to do one of the following in targetcli:
set global export_backstore_name_as_model=false
or for each backstore:
/backstores/<type>/<name> set attribute emulate_model_alias=0
(cherry picked from commit faf91b95fd22dbf0a1a7fd5b18ab71a929385927) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
OL6 iscsi target used IET, which presented IET as the inquiry vendor.
OL7 uses the LIO iscsi target instead, which presents 'LIO-ORG'.
Exadata targets upgrading from OL6 to OL7 need to present the same
vendor ID to existing iscsi initiator multipath mappings.
Add target_core_mod parameter inquiry_vendor for target inquiry vendor
string override. It defaults to original "LIO-ORG " if not set.
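A sketch of filling a fixed-width INQUIRY identification field from an optional override string. In standard INQUIRY data the vendor identification is 8 bytes and the product identification 16 bytes, both space-padded ASCII, which is why the default vendor is written as "LIO-ORG " with a trailing space. The helper is hypothetical and is not target_core_mod's code.

```c
#include <assert.h>
#include <string.h>

#define INQUIRY_VENDOR_LEN  8  /* VENDOR IDENTIFICATION field */
#define INQUIRY_PRODUCT_LEN 16 /* PRODUCT IDENTIFICATION field */

/* Use the override when non-empty, else the default; truncate to the field
 * width and pad the remainder with ASCII spaces (no NUL terminator). */
static void fill_inquiry_field(char *field, size_t len,
			       const char *override, const char *dflt)
{
	const char *s = (override && override[0]) ? override : dflt;
	size_t n = strlen(s);

	if (n > len)
		n = len;
	memcpy(field, s, n);
	memset(field + n, ' ', len - n);
}
```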
(cherry picked from commit 87e9da042f42aa5bef1a76c4ef84989680e1df2e) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Or Gerlitz [Fri, 18 Dec 2015 08:59:45 +0000 (10:59 +0200)]
IB/core: Avoid calling ib_query_device
Use the cached copy of the attributes present on the device, except for
the case of a query originating from user-space, where we have to invoke
the driver query_device entry, so they can fill in their udata.
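A sketch of the dispatch rule described above. Kernel consumers read the attributes cached on the device. Only a user-space request, which carries udata for the driver to fill in, goes down to the driver's query_device hook. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct dev_attr_sketch { int max_qp; };

/* Hypothetical device: attrs is the copy cached at registration time. */
struct ib_device_sketch {
	struct dev_attr_sketch attrs;
	int driver_queries; /* times the driver's query_device hook ran */
};

static void driver_query_device(struct ib_device_sketch *dev,
				struct dev_attr_sketch *out, void *udata)
{
	(void)udata; /* a real driver would fill in its private udata here */
	dev->driver_queries++;
	*out = dev->attrs;
}

static void get_attrs(struct ib_device_sketch *dev,
		      struct dev_attr_sketch *out, void *udata)
{
	if (udata)
		driver_query_device(dev, out, udata); /* user-space path */
	else
		*out = dev->attrs;                    /* cached copy */
}
```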
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Orabug: 27687711 Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
(cherry-picked from upstream 86bee4c9c126b4f73e3f152cd43c806cac9135ad) Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Conflicts:
drivers/infiniband/core/uverbs_cmd.c
o Function "ib_uverbs_get_context":
In UEK it's "ibdev", taken from "file->device->ib_dev"
IB/uverbs: Explicitly pass ib_dev to uverbs commands
The earliest linux-4.x.y it showed up in was x==3, or rather "linux-4.3.y".
o Function "ib_uverbs_query_device":
Just like before, UEK does not have "ib_dev", so using
"file->device->ib_dev" instead.
This follows the precedent of the deleted code, where "ib_query_device"
was called with "file->device->ib_dev" in UEK.
drivers/infiniband/core/verbs.c
The handling of "IB_DEVICE_LOCAL_DMA_LKEY" inside "ib_alloc_pd" was introduced with:
commit 96249d70dd70496084c7ec1465ec449cd032955a
Author: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Wed Aug 5 14:14:45 2015 -0600
IB/core: Guarantee that a local_dma_lkey is available
The earliest linux-4.x.y it showed up in was x==3, or rather "linux-4.3.y".
Since UEK is/was based on 4.1, the corresponding code does not exist here.
Jan H. Schönherr [Sun, 27 Aug 2017 13:56:37 +0000 (15:56 +0200)]
nvme: fix uninitialized prp2 value on small transfers
The value of iod->first_dma ends up as prp2 in NVMe commands. In case
there is not enough data to cross a page boundary, iod->first_dma is
never initialized and contains random data.
Comply with the NVMe specification and fill in 0 in that case.
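A simplified sketch of the PRP2 decision. PRP1 describes the bytes from the buffer start to the end of its memory page. If the transfer fits in that span, no page boundary is crossed and PRP2 is set to 0 instead of being left uninitialized. PRP list construction is ignored, and the function name and `second_entry` parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* page_size must be a power of two. second_entry is what PRP2 would hold
 * when needed (second page address or PRP list pointer). */
static uint64_t prp2_for(uint64_t dma_addr, uint32_t len, uint32_t page_size,
			 uint64_t second_entry)
{
	uint32_t offset = (uint32_t)(dma_addr & (page_size - 1));
	uint32_t first = page_size - offset; /* bytes PRP1 can describe */

	if (len <= first)
		return 0; /* no page boundary crossed: never random data */
	return second_entry;
}
```

For example, a 512-byte transfer starting at offset 0xf00 of a 4 KiB page has only 256 bytes left in the first page, so PRP2 is needed.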
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
(cherry picked from commit 5228b3280b9bb8fa6aef59f891cca64a028e9b36)
Chuck Anderson [Wed, 7 Mar 2018 05:29:14 +0000 (21:29 -0800)]
retpoline: selectively disable IBRS in disable_ibrs_and_friends()
disable_ibrs_and_friends() is called:
(1) when the boot parameter "spectre_v2=off" is specified.
(2) when the CPU is not affected by Spectre V2 and:
- spectre_v2=off
- or spectre_v2=auto
- or spectre_v2 is not specified
(3) when retpoline is selected as the Spectre V2 mitigation.
For (1) and (2) IBRS should be disabled (SPEC_CTRL_IBRS_ADMIN_DISABLED
is set). This prevents setting IBRS in use even if it is the only
Spectre V2 mitigation available.
For (3) IBRS should be set not-in-use but remain enabled in case
it is selected by disable_repoline() as the fallback Spectre V2
mitigation.
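A sketch summarizing the three cases above as a decision function. IBRS_DISABLED corresponds to setting SPEC_CTRL_IBRS_ADMIN_DISABLED. The enum, the function and its parameters are hypothetical simplifications, and the function assumes it is only reached from one of the three listed call sites.

```c
#include <assert.h>
#include <stdbool.h>

enum ibrs_state { IBRS_DISABLED, IBRS_ENABLED_NOT_IN_USE };

static enum ibrs_state ibrs_after_disable(bool spectre_v2_off, bool cpu_affected)
{
	if (spectre_v2_off || !cpu_affected)
		return IBRS_DISABLED;       /* cases (1) and (2) */
	/* Remaining caller: retpoline selected (case 3); keep IBRS
	 * available as the fallback mitigation. */
	return IBRS_ENABLED_NOT_IN_USE;
}
```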
The chip supports 64-byte and 128-byte cache line sizes for better
DMA performance when matched to the CPU cache line size. The default is 64.
If the system is using a 128-byte cache line size, set it to 128.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c3480a603773cfc5d8aa44dbbee6c96e0f9d4d9d) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
Forward hwrm_func_vf_cfg command from VF to PF driver, to store
VF MAC address in PF's context. This will allow "ip link show"
to display all VF MAC addresses.
Maintain 2 locations of MAC address in VF info structure, one for
a PF assigned MAC and one for VF assigned MAC.
Display VF assigned MAC in "ip link show", only if PF assigned MAC is
not valid.
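A sketch of the reporting rule above: show the PF-assigned MAC when it is valid, otherwise the VF-assigned one. The validity test mirrors the kernel's is_valid_ether_addr(), which requires the address to be neither multicast nor all-zero. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Not multicast (bit 0 of the first octet clear) and not all-zero. */
static bool mac_valid(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Hypothetical VF info with the two MAC slots described above. */
struct vf_info_sketch {
	uint8_t pf_mac[ETH_ALEN]; /* assigned by the PF */
	uint8_t vf_mac[ETH_ALEN]; /* set by the VF, forwarded to the PF */
};

static const uint8_t *vf_reported_mac(const struct vf_info_sketch *vf)
{
	return mac_valid(vf->pf_mac) ? vf->pf_mac : vf->vf_mac;
}
```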
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 91cdda40714178497cbd182261b2ea6ec5cb9276) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 92abef361bd233ea2a99db9e9a637626f523f82e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
bnxt_check_rings() is called by ethtool, XDP setup, and ndo_setup_tc()
to see if there are enough resources to support the new configuration.
Expand the call to test all resources if the firmware supports the new
API. With the more flexible resource allocation scheme, this call must
be made to check that all resources are available before committing to
allocate the resources.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8f23d638b36b4ff0fe5785cf01f9bdc41afb9c06) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
Instead of the old method of evenly dividing the resources to the VFs,
use the new firmware API to specify min and max resources for each VF.
This way, there is more flexibility for each VF to allocate more or less
resources.
The min is the absolute minimum for each VF to function. The max is the
global resources minus the resources used by the PF. Each VF is
guaranteed the min. Up to max resources may be available for some VFs.
The PF driver can use one of 2 strategies, specified in NVRAM, to assign
the resources: the old legacy strategy of evenly dividing the resources,
or the new flexible strategy.
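A sketch of the arithmetic behind the two strategies for one resource type. The legacy scheme splits whatever the PF leaves over evenly. The flexible scheme guarantees each VF a minimum and caps it at everything the PF is not using. The struct, function names and example numbers are hypothetical.

```c
#include <assert.h>

struct vf_limits { int min, max; };

/* Legacy: divide the PF's leftovers evenly; min == max. */
static struct vf_limits vf_limits_legacy(int global, int pf_used, int num_vfs)
{
	assert(num_vfs > 0);
	int each = (global - pf_used) / num_vfs;
	struct vf_limits l = { each, each };

	return l;
}

/* Flexible: guaranteed absolute minimum, up to all non-PF resources. */
static struct vf_limits vf_limits_flexible(int global, int pf_used, int vf_min)
{
	struct vf_limits l = { vf_min, global - pf_used };

	return l;
}
```

With 128 rings, 32 used by the PF and 8 VFs, the legacy split gives each VF exactly 12. The flexible scheme lets any VF go up to 96 while guaranteeing only the minimum.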
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4673d66468b80dc37abd1159a4bd038128173d48) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
In bnxt_rfs_capable(), add call to reserve vnic resources to support
NTUPLE. Return true if we can successfully reserve enough vnics.
Otherwise, reserve the minimum 1 VNIC for normal operations not
supporting NTUPLE and return false.
Also, suppress warning message about not enough resources for NTUPLE when
only 1 RX ring is in use. NTUPLE filters by definition require multiple
RX rings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a1eef5b9079742ecfad647892669bd5fe6b0e3f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The new method will call firmware to reserve the desired tx, rx, cmpl
rings, ring groups, stats context, and vnic resources. A second query
call will check the actual resources that firmware is able to reserve.
The driver will then trim and adjust based on the actual resources
provided by firmware. The driver will then reserve the final resources
in use.
This method is a more flexible way of using hardware resources. The
resources are not fixed and can be adjusted by firmware. The driver
adapts to the available resources that the firmware can reserve for
the driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 674f50a5b026151f4109992cb594d89f5334adde) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
In combined mode, the driver is currently not setting RX and TX ring
numbers the same when firmware can allocate more RX than TX or vice versa.
This will confuse the user as the ethtool convention assumes they are the
same in combined mode. Fix it by adding bnxt_trim_dflt_sh_rings() to trim
RX and TX ring numbers to be the same as the completion ring number in
combined mode.
Note that if TCs are enabled and/or XDP is enabled, the number of TX rings
will not be the same as RX rings in combined mode.
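A sketch of the trimming step for the plain combined-mode case (no TCs, no XDP). RX and TX are trimmed to the smaller of the two, and that value also becomes the completion ring count. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical ring counts. */
struct rings { int rx, tx, cp; };

static int min_int(int a, int b) { return a < b ? a : b; }

/* Make RX, TX and completion ring counts equal, never growing either. */
static void trim_combined(struct rings *r)
{
	r->cp = min_int(r->rx, r->tx);
	r->rx = r->cp;
	r->tx = r->cp;
}
```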
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 58ea801ac4c166cdcaa399ce7f9b3e9095ff2842) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The new API HWRM_FUNC_RESOURCE_QCAPS provides min and max hardware
resources. Use the new API when it is supported by firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit be0dd9c4100c9549fe50258e3d928072e6c31590) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
In preparation for new firmware APIs to allocate hardware resources,
add a new struct bnxt_hw_resc to hold various min, max and reserved
resources. This new structure is common for PFs and VFs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a4f29470569c5a158c1871a2f752ca22e433420) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
After SRIOV has been enabled and disabled, the MSIX vectors assigned to
the VFs have to be re-initialized. Otherwise they cannot be re-used by
the PF. For example, increasing the number of PF rings after disabling
SRIOV may fail if the PF uses MSIX vectors previously assigned to the VFs.
To fix this, we add logic in bnxt_restore_pf_fw_resources() to close the
NIC, clear and re-init MSIX, and re-open the NIC.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 80fcaf46c09262a71f32bb577c976814c922f864) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
Add a new __bnxt_close_nic() function to do all the work previously done
in bnxt_close_nic() except waiting for SRIOV configuration. The new
function will be used in the next patch as part of SRIOV cleanup.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 86e953db0114f396f916344395160aa267bf2627) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
The version has new firmware APIs to allocate PF/VF resources more
flexibly.
New toolchains were used to generate this file, resulting in a one-time
large diffstat.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 894aa69a90932907f3de9d849ab9970884151d0e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.
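A sketch of the corrected bounds check. Only the created VFs have bnxt_vf_info entries, so a VF id must be validated against that count rather than against max_vfs. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

struct pf_sketch {
	int max_vfs;    /* hardware limit */
	int active_vfs; /* VFs actually created == entries allocated */
};

/* Checking against max_vfs would accept ids in [active_vfs, max_vfs) that
 * index past the allocated array. */
static int vf_ndo_prep_sketch(const struct pf_sketch *pf, int vf_id)
{
	if (vf_id < 0 || vf_id >= pf->active_vfs)
		return -EINVAL;
	return 0;
}
```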
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 78f300049335ae81a5cc6b4b232481dc5e1f9d41) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and 903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:
This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.
Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2edbdb3159d6f6bd3a9b6e7f789f2b879699a519) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
On some dual port NICs, the 2 ports have to be configured with compatible
link speeds. Under some conditions, a port's configured speed may no
longer be supported. The firmware will send a message to the driver
when this happens.
Improve this logic that prints out the warning by only printing it if
we can determine the link speed that is no longer supported. If the
speed is unknown or it is in autoneg mode, skip the warning message.
Reported-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a8168b6cee6e9334dfebb4b9108e8d73794f6088) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The short_input variable is assigned to another data pointer, which is
referenced outside of short_input's scope. Fix it by moving the short_input
definition to the beginning of the bnxt_hwrm_do_send_msg() function.
No failure has been reported so far due to this issue.
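A sketch of the scoping rule behind this fix. A pointer may only refer to a block-scoped object while that block is live. So the object has to be declared at function scope if the pointer is used after the `if` block. The struct layout, the function name and the two-word "write" are hypothetical stand-ins for the firmware message path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct short_input_sketch { uint32_t req_type, size; }; /* hypothetical */

/* Fixed layout: short_input lives for the whole function, so "data" may
 * point at it until the final copy. Declaring it inside the if-block (the
 * bug) would leave "data" dangling once that block ends. */
static void send_msg_sketch(const uint32_t msg[2], uint32_t len, int use_short,
			    uint32_t out[2])
{
	struct short_input_sketch short_input = { 0, 0 };
	const void *data = msg;

	if (use_short) {
		short_input.req_type = msg[0];
		short_input.size = len;
		data = &short_input;
	}
	/* Stand-in for writing the request to the device. */
	memcpy(out, data, 2 * sizeof(uint32_t));
}
```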
Fixes: e605db801bde ("bnxt_en: Support for Short Firmware Message") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ebd5818cc5d4847897d7fe872e2d9799d7b7fcbb) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The current 'bnxt_shutdown' implementation only invokes
'bnxt_ulp_shutdown' to shut down RoCE in the case when the system is in
the path of power off (SYSTEM_POWER_OFF). While this may work in most
cases, it does not work in the smart NIC case, when the Linux 'reboot'
command is initiated from the Linux that runs on the ARM cores of the
NIC card. In this particular case, Linux 'reboot' results in a system
'L3' level reset where the entire ARM and associated subsystems are
being reset, but at the same time, the Nitro core is kept in a sane state
(to allow external PCIe connected servers to continue to work). Without
properly shutting down RoCE and freeing all associated resources, the
ARM core hangs immediately after the 'reboot'.
Always invoking 'bnxt_ulp_shutdown' in 'bnxt_shutdown' fixes the
above issue.
Fixes: 0efd2fc65c92 ("bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.") Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a7f3f939dd7d8398acebecd1ceb2e9e7ffbe91d2) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The error code returned by 'bnxt_read_sfp_module_eeprom_info()' is handled a
few lines above when reading the A0 portion of the EEPROM.
The same should be done when reading the A2 portion of the EEPROM.
In order to correctly propagate an error, update 'rc' in this 2nd call as
well, otherwise 0 (success) is returned.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dea521a2b9f96e905fa2bb2f95e23ec00c2ec436) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
On 32-bit architectures, rtc_time_to_tm() returns incorrect results
in 2038 or later, and do_gettimeofday() is broken for the same reason.
This changes the code to use ktime_get_real_seconds() and time64_to_tm()
instead, both of them are 2038-safe, and we can also get rid of the
CONFIG_RTC_LIB dependency that way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7dfaa7bc99498da1c6c4a48bee8d2d5265161a8c) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Recent IRQ coalescing clean up has removed a guard-rail for the max DMA
buffer coalescing value. This is a 6-bit value and must not be 0. We
already have a check for 0 but 64 is equivalent to 0 and will cause
non-stop interrupts. Fix it by adding the proper check.
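As a minimal sketch of this guard-rail, assume a hypothetical helper that validates a value destined for a 6-bit hardware field. Because the field keeps only the low 6 bits, 64 is stored as 0, so the valid range is 1..63. The helper names and the clamping policy are illustrative, not the bnxt_en code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Width of the (assumed) hardware field holding the DMA buffer count. */
#define COAL_BUFS_FIELD_BITS 6
#define COAL_BUFS_FIELD_MAX ((1u << COAL_BUFS_FIELD_BITS) - 1) /* 63 */

/* Hypothetical check: writing 64 into a 6-bit field stores 0, and a value
 * of 0 causes non-stop interrupts, so only 1..63 may be programmed. */
static bool coal_bufs_valid(uint32_t bufs)
{
    return bufs >= 1 && bufs <= COAL_BUFS_FIELD_MAX;
}

/* Hypothetical clamp: force any requested value into the valid 1..63 range. */
static uint32_t coal_bufs_clamp(uint32_t bufs)
{
    if (bufs == 0)
        return 1;
    if (bufs > COAL_BUFS_FIELD_MAX)
        return COAL_BUFS_FIELD_MAX;
    return bufs;
}
```

Checking only for 0 misses 64 because the truncation to 6 bits happens after the check, in the hardware field itself.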
Fixes: f8503969d27b ("bnxt_en: Refactor and simplify coalescing code.") Reported-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b153cbc507946f52d5aa687fd64f45d82cb36a3b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Recent refactoring of coalesce settings contained a typo that prevents
receive settings from being set properly.
Fixes: 18775aa8a91f ("bnxt_en: Reorganize the coalescing parameters.") Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit de4a10ef6eff0eb0ced97a39dc3edd0d3101b6ed) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The mapping of the ethtool coalescing parameters to hardware parameters
is now done in bnxt_hwrm_set_coal_params(). The same function can
handle both RX and TX settings. The code is now clearer. Some
adjustments have been made to get better hardware settings. The
coal_frames setting is now accurately set in hardware. The max_timer
is set to coal_ticks value.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f8503969d27b2b26ff0adbce4b7d7cf4ba5e43c2) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The current IRQ coalescing logic is a little messy. The ethtool
parameters are mapped to hardware parameters in a way that is difficult
to understand. The first step is to better organize the parameters
by adding the new structure bnxt_coal. The structure is used by both
the RX and TX sets of coalescing parameters.
Adjust the default coal_ticks to 14 us and 28 us for RX and TX.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18775aa8a91fcd4cd07c722d575b4b852e3624c3) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
This is a firmware internal reset after the driver is unloaded.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 49f7972fd16407b3d1f03c2d447d2f1e1b95e9ba) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
There is no need to call bnxt_approve_mac(), which sends a message to the
PF, if the MAC address hasn't changed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c1a7bdff17247332ecff7f243e42d269b3f74c65) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The current code retrieves the firmware package version from firmware
every time ethtool -i is run. There is no reason to do that as the
firmware will not change while the driver is loaded. Get the version
once at init time.
Also, display the full 4-part firmware version string and remove the
less useful interface spec version.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 431aa1eb20d8ae2674723292adb832b968da868e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Return -EINVAL if the length is zero instead of proceeding to do
essentially nothing.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e0ad8fc5980b362028cfd63ec037f4b491e726c6) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Rob Miller <rmiller@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 618784e3ee1870e43e50e1c7922cc123cc050566) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Add new PCIe device ID and chip number for bcm58804
Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8ed693b7bbd179949f6947adaae5eff2e386a534) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Vxlan encap/decap filters are added to this firmware spec.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 57922b0a2f7ef9effbcdbbf7d1f8dad95aa567f7) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
hwrm_send_message() is replaced with _hwrm_send_message(), and
hwrm_cmd_lock mutex lock is grabbed for the whole period of
firmware call until the firmware DCB parameters have been copied.
This will prevent possible corruption of the firmware data.
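The locking change can be illustrated with a userspace sketch using POSIX threads. It assumes a hypothetical shared response buffer that every command reuses; holding the same mutex from the send until the caller has copied its data out closes the window in which another command could overwrite it. The names hwrm_lock, send_locked() and query_dcb_params() are illustrative stand-ins, not the bnxt_en functions.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical shared response buffer, reused by every firmware command. */
static pthread_mutex_t hwrm_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char resp_buf[64];

/* Caller must hold hwrm_lock: "sends" a command and fills resp_buf. */
static int send_locked(unsigned char cmd)
{
    memset(resp_buf, cmd, sizeof(resp_buf)); /* simulated firmware reply */
    return 0;
}

/* Holds the lock across the call AND the copy, so no other command can
 * overwrite resp_buf before the results have been copied out. */
static int query_dcb_params(unsigned char cmd, unsigned char *out, size_t len)
{
    int rc;

    if (len > sizeof(resp_buf))
        return -1;
    pthread_mutex_lock(&hwrm_lock);
    rc = send_locked(cmd);
    if (rc == 0)
        memcpy(out, resp_buf, len);
    pthread_mutex_unlock(&hwrm_lock);
    return rc;
}
```

The unsafe variant would release the lock inside the send routine and copy afterwards, leaving resp_buf exposed between the two steps.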
Fixes: 7df4ae9fe855 ("bnxt_en: Implement DCBNL to support host-based DCBX.") Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b1e1a9ce06fd94b563d6c3dd896589231995d89) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
In bnxt_sriov_enable(), we calculate to see if we have enough hardware
resources to enable the requested number of VFs. The logic to check
for minimum completion rings and statistics contexts is missing. Add
the required checks so that VF configuration won't fail.
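A minimal sketch of this kind of check, assuming a hypothetical struct of free PF resources and a fixed per-VF minimum for each resource type. Completion rings and statistics contexts are checked alongside the rings, and the VF count is reduced until everything fits. The structure, field names and the per-VF minimum are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical pool of resources still available on the PF. */
struct hw_resources {
    int tx_rings;
    int rx_rings;
    int cp_rings;   /* completion rings */
    int stat_ctxs;  /* statistics contexts */
};

/* Hypothetical minimum each VF needs of every resource type. */
#define VF_MIN_PER_TYPE 1

/* Returns the largest number of VFs (<= requested) that fits. */
static int max_vfs_for_resources(const struct hw_resources *avail, int requested)
{
    int n;

    for (n = requested; n > 0; n--) {
        if (avail->tx_rings >= n * VF_MIN_PER_TYPE &&
            avail->rx_rings >= n * VF_MIN_PER_TYPE &&
            avail->cp_rings >= n * VF_MIN_PER_TYPE &&  /* added check */
            avail->stat_ctxs >= n * VF_MIN_PER_TYPE)   /* added check */
            break;
    }
    return n;
}
```

Without the two added conditions, the loop would accept 4 VFs below even though only 2 completion rings are free, and VF configuration would fail later.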
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 021570793d8cd86cb62ac038c535f4450586b454) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The PCIe PCIE_EP_REG_LINK_STATUS_CONTROL register is only defined in the PF
config space, so we must read it from the PF.
Fixes: 90c4f788f6c0 ("bnxt_en: Report PCIe link speed and width during driver load") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7ab0760f5178169c4c218852f51646ea90817d7c) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
As a further improvement to the PF/VF link change logic, use a private
mutex instead of the rtnl lock to protect link change logic. With the
new mutex, we don't have to take the rtnl lock in the workqueue when
we have to handle link related functions. If the VF and PF drivers
are running on the same host and both take the rtnl lock and one is
waiting for the other, it will cause timeout. This patch fixes these
timeouts.
Fixes: 90c694bb7181 ("bnxt_en: Fix RTNL lock usage on bnxt_update_link().") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e2dc9b6e38fa3919e63d6d7905da70ca41cbf908) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Link status query firmware messages originating from the VFs are forwarded
to the PF. The driver handles these interactions in a workqueue for the
VF and PF. The VF driver waits for the response from the PF in the
workqueue. If the PF and VF driver are running on the same host and the
work for both PF and VF is queued on the same workqueue, the VF driver
may not get the response if the PF work item is queued behind it on the
same workqueue. This will lead to the VF link query message timing out.
To prevent this, we create a private workqueue for PFs instead of using
the common workqueue. The VF query and PF response will never be on
the same workqueue.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c213eae8d3cd4c026f348ce4fd64f4754b3acf2b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
IS_ERR() already implies unlikely(), so it can be omitted.
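For context, IS_ERR() already wraps its comparison in unlikely(), so an outer unlikely() adds nothing. The following is a simplified userspace re-creation, assuming GCC/Clang's __builtin_expect and the kernel's MAX_ERRNO of 4095; it is a sketch, not the kernel header itself.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Error pointers occupy the top MAX_ERRNO addresses; the branch hint is
 * already part of the test. */
#define IS_ERR_VALUE(x) \
    unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return IS_ERR_VALUE((unsigned long)ptr);
}

/* Redundant:  if (unlikely(IS_ERR(p))) ...
 * Sufficient: if (IS_ERR(p)) ...                 */
```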
Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1fac4b2fdbccab69cb781aae68f540be94d5549e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Use the setup_timer() function instead of initializing the timer with the
function and data fields.
Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c43824477c2ac722325ba460c2ce683c48fb76b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Reduce default rings from 8 to 4 on multi-port cards to reduce memory
usage.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d5430d31ca72ec37fd539fd1c5230859509be4ef) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
If we cannot allocate RX buffers in the NAPI poll loop when processing
an RX event, the current code does not count that event towards the NAPI
budget. This can cause us to potentially loop forever in NAPI if we
consistently cannot allocate new buffers. Improve it by counting
-ENOMEM event as 1 towards the NAPI budget.
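A simplified model of the budget accounting, assuming a hypothetical event array where each RX event either delivers a packet or fails to allocate a replacement buffer (-ENOMEM). Counting a failed event as 1 means every event consumes budget, so the poll always reaches its limit and returns even if every allocation fails. The function and its arguments are illustrative, not the driver's poll routine.

```c
#include <errno.h>

/* Hypothetical per-event results: > 0 = packets delivered,
 * -ENOMEM = could not allocate a replacement RX buffer. */
static int poll_rx(const int *event_results, int nr_events, int budget)
{
    int work_done = 0;
    int i;

    for (i = 0; i < nr_events && work_done < budget; i++) {
        int rc = event_results[i];

        if (rc > 0)
            work_done += rc;     /* packets delivered */
        else if (rc == -ENOMEM)
            work_done += 1;      /* count the failed event toward the budget */
    }
    return work_done;            /* == budget tells NAPI to poll again later */
}
```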
Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 903649e718f80da2ba4b65a0adf6930219b4b2e5) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Initialize board_info values with the proper enums for defensive
programming purposes. This avoids errors where the order in which the
enums are declared does not line up with the board_info array.
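Designated array initializers are the standard C idiom for this: each entry is bound to its enum index explicitly, so reordering the enum cannot silently misalign the table. A minimal sketch with hypothetical enum values and board names (not the driver's actual list):

```c
/* Hypothetical board identifiers; their declaration order no longer matters. */
enum board_idx {
    BOARD_A,
    BOARD_B,
    BOARD_C,
    NUM_BOARDS,
};

struct board_info {
    const char *name;
};

/* Each entry is tied to its enum value, not to its position in the list. */
static const struct board_info board_info[NUM_BOARDS] = {
    [BOARD_C] = { "Board C 10Gb Ethernet" },
    [BOARD_A] = { "Board A 25Gb Ethernet" },
    [BOARD_B] = { "Board B 50Gb Ethernet" },
};
```

Even though the entries are listed out of order, board_info[BOARD_A] still refers to the "Board A" entry.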
Signed-off-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 27573a7d905a49dc756fda9c0e148372136356e6) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Add PCIe device IDs for bcm58802 and bcm58808. Also add a chip number
update to declare bcm588xx as chip class phase 4 and later.
Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4a58139b8493624c6c6223b58a9e70ebbdf56338) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.
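The selection order can be modelled in userspace as below. This is a simplified re-creation of the described policy (the i-th IRQ goes first to CPUs on the device's local NUMA node, then to the remaining nodes), assuming a flat array that maps each online CPU to its node; it is not the kernel's cpumask_local_spread() implementation.

```c
/* cpu_node[c] is the (assumed) NUMA node of online CPU c, 0 <= c < nr_cpus.
 * Returns the CPU for the i-th IRQ (i >= 0): near CPUs first, then far. */
static int local_spread(int i, const int *cpu_node, int nr_cpus, int node)
{
    int cpu;

    if (nr_cpus <= 0 || i < 0)
        return -1;
    i %= nr_cpus;            /* wrap around when IRQs outnumber CPUs */

    /* Pass 1: CPUs on the device's own (near) NUMA node. */
    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (cpu_node[cpu] == node && i-- == 0)
            return cpu;

    /* Pass 2: the remaining (far) CPUs, in CPU order. */
    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (cpu_node[cpu] != node && i-- == 0)
            return cpu;

    return -1;               /* not reached for 0 <= i < nr_cpus */
}
```

With CPUs 0..3 on nodes {0, 1, 0, 1} and the device on node 1, IRQs 0..3 land on CPUs 1, 3, 0, 2.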
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 56f0fd80d1886479a42ac07ed239538eb145a669) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC. If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.
The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings. We fix it as follows:
1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available. There is a flag in
the firmware message to do that. If not available, abort and the
current TX rings will not be disabled.
2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved. If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.
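The two-step flow can be sketched as follows, assuming a hypothetical firmware model with a fixed pool of TX rings. The check-only query never touches the rings in use; the real reservation happens only on open, and only when the wanted count differs from what is already reserved. All names are illustrative; the driver itself uses a flag in the firmware message for the check.

```c
#include <stdbool.h>

/* Hypothetical firmware model: a fixed pool of TX rings. */
struct fw_model {
    int tx_rings_total;
};

struct nic {
    struct fw_model *fw;
    int tx_rings_wanted;     /* requested via ethtool -L, XDP, ... */
    int tx_rings_reserved;   /* last count successfully reserved */
};

/* Step 1: ask whether 'n' rings are available; nothing is changed. */
static bool fw_check_tx_rings(const struct fw_model *fw, int n)
{
    return n <= fw->tx_rings_total;
}

/* Step 2: reserve for real on open, only if the count changed. */
static int nic_open_reserve(struct nic *nic)
{
    if (nic->tx_rings_wanted == nic->tx_rings_reserved)
        return 0;                          /* already reserved */
    if (!fw_check_tx_rings(nic->fw, nic->tx_rings_wanted))
        return -1;
    nic->tx_rings_reserved = nic->tx_rings_wanted;
    return 0;
}

/* Changing the ring count: abort early and keep the current rings. */
static int nic_set_tx_rings(struct nic *nic, int n)
{
    if (!fw_check_tx_rings(nic->fw, n))
        return -1;                         /* current rings stay intact */
    nic->tx_rings_wanted = n;
    return nic_open_reserve(nic);          /* stands in for close + re-open */
}
```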
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 98fdbe73bfb809b1f8eec9f27a36e737caed3a44) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
drivers/net/ethernet/broadcom/bnxt/bnxt.h
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6a17eb27bf7ece364627fcf16ad50c24b793300b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
bnxt_hwrm_func_qcaps() is called during probe to get all device
resources and it also sets up the factory MAC address. The same function
is called when SRIOV is disabled to reclaim all resources. If
the MAC address has been overridden by a user administered MAC
address, calling this function will overwrite it.
Separate the logic that sets up the default MAC address into a new
function bnxt_init_mac_addr() that is only called during probe time.
Fixes: 4a21b49b34c0 ("bnxt_en: Improve VF resource accounting.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a22a6ac2ff8080c87e446e20592725c064229c71) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Take back ownership of the MSIX vectors when unregistering the device
from bnxt_re.
Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 146ed3c5b87d8c65ec31bc56df26f027fe624b8f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
When the number of TX rings is changed in bnxt_setup_tc(), we need to
include the XDP rings in the total TX ring count.
Fixes: 38413406277f ("bnxt_en: Add support for XDP_TX action.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 87e9b3778c94694c9e098c91a0cc05725f0e017f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Fix a couple of warnings where the variable ‘txq’ is set but not used.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 351bac30613378c4684d4673aac0c7917980a652) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Fix a warning: location is a u32 and can never be negative.
warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b721cfaf03bcaac0a3abf702c4240326eed9e4b1) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Fixes: c124a62ff2dd ("bnxt_en: add support for port_attr_get and and get_phys_port_name") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc88055ab72c0eaa080926c888628b77d2055513) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
In addition to the ETS weight, older firmware also requires the min_bw
parameter to be set for it to work properly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 70098a47bbf131b65c64ca935c2480e64c9c7c51) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Jon Maxwell [Fri, 10 Mar 2017 05:40:33 +0000 (16:40 +1100)]
dccp/tcp: fix routing redirect race
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:
But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.
All the vmcores showed 2 significant clues:
- Remote hosts behind the default gateway had always been redirected to a
different gateway. An rtable/dst_entry is added for each such host, creating
more dst_entrys with lower reference counts and making this race more probable.
- All vmcores showed a positive LockDroppedIcmps value, e.g.:
LockDroppedIcmps 267
A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:
do_redirect()->__sk_dst_check()-> dst_release().
This leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.
To fix this, skip do_redirect() if user space has the socket locked. Instead, let
the redirect take place later when user space does not have the socket
locked.
The dccp/IPv6 code is very similar in this respect, so fix it there too.
As Eric Garver pointed out, the following commit now invalidates routes,
which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL and
triggers the dst_release().
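The shape of the fix can be modelled in userspace. The sketch assumes a hypothetical socket with an "owned by user" flag and a cached route holding one reference; the ICMP redirect handler only drops the cached route when user space does not hold the socket lock (in the kernel, this is the sock_owned_by_user() test). It does not reproduce the real do_redirect() or dst reference counting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dst_entry and struct sock. */
struct route { int refcnt; };
struct sock_model {
    bool owned_by_user;       /* user space currently holds the socket lock */
    struct route *dst_cache;  /* cached route, holds one reference */
};

static void route_release(struct route *rt) { rt->refcnt--; }

/* ICMP redirect handler: only touch the cached route when user space does
 * not own the socket; otherwise leave it for a later, race-free update. */
static void handle_redirect(struct sock_model *sk)
{
    if (sk->owned_by_user)
        return;               /* skip: user context may be using dst_cache */
    if (sk->dst_cache != NULL) {
        route_release(sk->dst_cache);
        sk->dst_cache = NULL; /* drop our pointer with our reference */
    }
}
```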
Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0)
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
This change was introduced to address the RDS internal sendQ
occupancy for the messages targeted to dead/non-existing nodes. It was
discovered as part of a customer issue where the remote node was shut down
and RDS attempted to repeatedly establish a connection without success.
The same issue can be exploited by sending messages to a non-existing node
too, since RDS forms a connection as part of sendmsg if it doesn't
exist already.
Although returning EAGAIN instead of adding messages to the sendQ when
the remote connection is not up looked straightforward at the time, it
has the undesired effect of making the application keep spinning even
though there is space to write in the socket buffer. The application has
no notion of the underlying connections, so RDS needs to handle this
problem internally and transparently. The application will automatically
move to POLLOUT once its own socket buffer is full and will thus
avoid tight CPU spinning.
To address draining the internal sendQ messages targeted to
dead nodes or non-existing nodes, one possible way is to
retire/destroy those connections, after some large timeout. That
will also drop those messages from sendQ. This change will be
addressed separately.
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
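For readers unfamiliar with the accessors, the core idea can be shown with simplified userspace definitions based on volatile casts, assuming GCC/Clang's __typeof__. The kernel's real READ_ONCE()/WRITE_ONCE() handle more cases (for example, non-scalar types), so treat this only as an illustration of the read/write split the script produces.

```c
/* Simplified: the volatile access must be emitted exactly as written,
 * so the compiler cannot merge, repeat, or omit it. */
#define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

static int shared_flag;

/* Before: ACCESS_ONCE(shared_flag) = 1;   val = ACCESS_ONCE(shared_flag); */
static void set_flag(void)  { WRITE_ONCE(shared_flag, 1); }
static int  read_flag(void) { return READ_ONCE(shared_flag); }
```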
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
If the driver receives a TX CQE with status 0x1, 0x9 or 0xb,
the completion indexes should not be used. The driver must stop
consuming CQEs from this TXQ/CQ. From this point onwards, the TXQ
is in a bad state, and the driver should destroy and recreate it.
Reset the adapter if the driver sees this error in a TX completion. Also
add an SGE error counter to the ethtool stats.
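A minimal sketch of the completion check, assuming a hypothetical helper that receives the already-extracted status field of a TX CQE. The three status codes are the ones listed above; the helper names, the error counter and the reset flag are illustrative and not the be2net code.

```c
#include <stdbool.h>
#include <stdint.h>

/* TX completion statuses that leave the TXQ in a bad state. */
static bool tx_cqe_status_is_fatal(uint32_t status)
{
    return status == 0x1 || status == 0x9 || status == 0xb;
}

struct txq_state {
    unsigned long tx_errors;   /* hypothetical count of fatal completions */
    bool need_reset;           /* stop consuming CQEs, reset the adapter */
};

/* Returns true if the caller may keep consuming CQEs from this TXQ. */
static bool handle_tx_cqe(struct txq_state *q, uint32_t status)
{
    if (!tx_cqe_status_is_fatal(status))
        return true;
    q->tx_errors++;
    q->need_reset = true;      /* completion indexes must not be used */
    return false;
}
```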
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ffc39620102dfe62711fadb9a297b66aee816013) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Lancer HW cannot handle a TSO packet with a single segment.
Disable TSO/GSO for such packets.
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3df40aad1a864af124bd50a1371ef16089ac9af2) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Variable head is initialized to a value that is never read and is
being updated to a new value a few lines later, hence this
initialization is redundant and can be safely removed as well
as the now unused pointer txq.
Cleans up clang warning:
drivers/net/ethernet/emulex/benet/be_main.c:996:6: warning: Value
stored to 'head' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2e85283dabc22f4715b136e8a7426bd9bef4ce69) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Resolved Warning: networking block comments don't use an empty /* line,
use /* Comment...
Issue found by checkpatch.
Signed-off-by: Rohit Visavalia <rohit.visavalia@softnautics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5f834cf4b7c50d2172d9f2307499e6b64b7504ac) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The commit 622190669403 ("be2net: Request RSS capability of Rx interface
depending on number of Rx rings") modified be_update_queues() so the
IFACE (HW representation of the netdevice) is destroyed and then
re-created. This causes a regression: a previously enabled promiscuous mode
is not restored properly during be_open() because the driver thinks
that the HW already has promiscuous mode enabled.
Note that Lancer is not affected by this bug because RX-filter flags are
disabled during be_close() for this chipset.
Cc: Sathya Perla <sathya.perla@broadcom.com> Cc: Ajit Khaparde <ajit.khaparde@broadcom.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Cc: Somnath Kotur <somnath.kotur@broadcom.com> Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 52acf06451930eb4cefabd5ecea56e2d46c32f76) Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Use the ARRAY_SIZE macro on array cmd_priv_map to determine size of the
array. Improvement suggested by coccinelle.
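ARRAY_SIZE() derives the element count from the array's own size, so the count stays correct when entries are added or removed. Below is a simplified version of the macro (the kernel's variant additionally rejects pointers at compile time) applied to a hypothetical stand-in for cmd_priv_map; the entries are illustrative.

```c
#include <stddef.h>

/* Simplified ARRAY_SIZE: total bytes divided by bytes per element.
 * Only valid on real arrays, not on pointers. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical command-to-privilege table standing in for cmd_priv_map. */
struct cmd_priv {
    int opcode;
    int privileges;
};

static const struct cmd_priv cmd_priv_map[] = {
    { 0x10, 1 },
    { 0x22, 3 },
    { 0x31, 2 },
};

/* Look up an opcode; the loop bound tracks the table automatically. */
static int cmd_privileges(int opcode)
{
    size_t i;

    for (i = 0; i < ARRAY_SIZE(cmd_priv_map); i++)
        if (cmd_priv_map[i].opcode == opcode)
            return cmd_priv_map[i].privileges;
    return -1;
}
```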
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Joao Martins [Mon, 5 Mar 2018 13:29:25 +0000 (13:29 +0000)]
xen/acpi: upload _PSD info for non-dom0 CPUs too
All uploaded PM data from non-dom0 CPUs takes the info from CPU 0 with a
different acpi_id. For processors whose P-state coordination type is
HW_ALL (0xFD), it is OK to upload bogus P-state dependency information
(_PSD), because Xen will ignore any domains created for past CPUs.
However, for platforms which expose coordination types as SW_ANY or
SW_ALL, this will have some unintended side effects. Effectively, it
will look at the P-state domain existence and *if it already exists* it
will skip the acpi-cpufreq initialization and thus inherit the policy
from the first CPU in the cpufreq domain. Finally, it won't change the
target frequency to P0 for any CPU other than the first in the domain.
As a result, turbo boost does not get enabled (e.g. for the 'performance'
governor) for all CPUs, but only for those with unique P-state
domains.
This patch fixes that by also evaluating _PSD when enumerating all ACPI
processors and uploading that instead.
Orabug: 27655759 Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Tested-by: Shih-Yu Huang <shih-yu.huang@oracle.com> Reviewed-by: Ross Philipson <ross.philipson@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 128bddacc4dd7c86070e1e0534687e3083a89d52) Signed-off-by: Dick dkennedy <dick.kennedy@broadcom.com>
Conflicts:
drivers/scsi/lpfc/lpfc_nvme.c
drivers/scsi/lpfc/lpfc_nvmet.c
drivers/scsi/lpfc/lpfc_nvmet.h
I had to remove these from the patch. Signed-off-by: Dan Duval <dan.duval@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6e9d2f1667ea12bd2f997a7529fb41cce8e0036d) Signed-off-by: Dick dkennedy <dick.kennedy@broadcom.com> Signed-off-by: Dan Duval <dan.duval@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>