www.infradead.org Git - users/jedix/linux-maple.git/log

btrfs: make use of btrfs_find_device_by_user_input()

btrfs_rm_device() has a section of the code which can be replaced
btrfs_find_device_by_user_input()

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Orabug: 26287586

(Cherry picked from commit 24fc572fe456c02ff4136c07861a3edd4b8de683)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

btrfs: create helper btrfs_find_device_by_user_input()

The patch renames btrfs_dev_replace_find_srcdev() to
btrfs_find_device_by_user_input() and moves it to volumes.c, so that
delete device can use it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Orabug: 26287586

(Cherry picked from commit 24e0474b59538cdb9d2b7318ec7c7ae9f6faf85d)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

btrfs: clean up and optimize __check_raid_min_device()

__check_raid_min_device() which was pealed from btrfs_rm_device()
maintianed its original code to show the block move. This patch cleans up
__check_raid_min_device().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Orabug: 26287586

(Cherry picked from commit bd45ffbcb1f082e3c5a0bd56b1d7310d8b707ffb,
drop the second argument of the btrfs_dev_replace_lock/unlock to
match with the current kernel api)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

btrfs: create helper function __check_raid_min_devices()

move a section of btrfs_rm_device() code to check for min number of the
devices into the function __check_raid_min_devices()

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Orabug: 26287586

(Cherry picked from commit f1fa7f264250f2cb60119aca3fd114c3f47070c2,
drop the second argument of the btrfs_dev_replace_lock/unlock to
match with the current kernel api)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

SPARC64: Correct ATU IOTSB binding flow

Any PCIe device attempting to use ATU for 64-bit DMA must be successfully
bound to IOTSB otherwise DMA access from device will be rejected by
PCIe root complex.

In the current code, ATU initialization and PCIe device binding to IOTSB
is done during system boot when PCI Bus Module (PBM) is initialized.
(PBM driver traverse through the PCI tree and binds each PCI device to
IOTSB).

In case if user/system add SRIOV VFs later on (either after system is up
or when PF driver is loaded), the new VF devices never get bounded to
IOTSB. So when attempt is made from VF devices to access DMA region, ATU
HW will flag it as invalid access; generates CTE_Invalid errors and VF
driver/device fails to perform DMA.

This patch fixes the aforementioned issue by delaying PCIe device
binding to IOTSB only when driver requests DMA setting from kernel.
Doing it this way makes sure every PCIe device requesting 64bit DMA mask
(and therefore attempting to use ATU) will get bind to IOTSB and can do
successful DMA operations.

Orabug: 25177689
Orabug: 25450353

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Govinda Tatti <Govinda.Tatti@Oracle.COM>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: Introduce IOMMU BYPASS method

This change adds IOMMU BYPASS HV API that will allow PCIe devices to
bypass IOMMU during DMA map/unmap.

Orabug: 25573557

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

i40e: Revert i40e temporary workaround

ATU and Hypervisor IOMMU v2 APIs enable 64bit DMA addressing so
i40e temporary workaround no longer needed.

This patch revert uek4 commit f59e8082fb57c0009
("i40e: Temporary workaround for DMA map issue").

Orabug: 23239179

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Enable 64-bit DMA

ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities
to use ATU for 64bit DMA.

Orabug: 23239179

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Enable sun4v dma ops to use IOMMU v2 APIs

Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and
enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with
64bit DMA mask.

Orabug: 23239179

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Bind PCIe devices to use IOMMU v2 service

In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe
device has to be bound to IOTSB using HV API pci_iotsb_bind().

Orabug: 23239179

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Initialize iommu_map_table and iommu_pool

Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU.
This change initializes iommu_map_table and iommu_pool for ATU.

Orabug: 23239179

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Add ATU (new IOMMU) support

ATU (Address Translation Unit) is a new IOMMU in SPARC supported with
Hypervisor IOMMU v2 APIs.

Current SPARC IOMMU supports only 32bit address ranges and one TSB
per PCIe root complex that has a 2GB per root complex DVMA space
limit. The limit has become a scalability bottleneck nowadays that
a typical 10G/40G NIC can consume 300MB-500MB DVMA space per
instance. When DVMA resource is exhausted, devices will not be usable
since the driver can't allocate DVMA.

ATU removes bottleneck by allowing guest os to create IOTSB of size
32G (or more) with 64bit address ranges available in ATU HW. 32G is
more than enough DVMA space to be shared by all PCIe devices under
root complex contrast to 2G space provided by legacy IOMMU.

ATU allows PCIe devices to use 64bit DMA addressing. Devices
which choose to use 32bit DMA mask will continue to work with the
existing legacy IOMMU.

Orabug: 23239179

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Make FORCE_MAX_ZONEORDER to 13 for ATU

This change allows ATU (new IOMMU) in SPARC systems to request
large (32M) contiguous memory during boot for creating IOTSB backing
store.

Orabug: 23239179

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

Revert "sparc64: bypass iommu to use 64bit address space"

This reverts commit 34edb40e5e13cba76ad13646d3a1823d877851c6.

Conflicts:
arch/sparc/kernel/pci_sun4v.c

Orabug: 21149316

Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

i40e: fix annoying message

The driver was printing a message about not being able
to assign VMDq because of a lack of MSI-X vectors.

This was because a line was missing that initialized a variable,
simply a merge error.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26409501
(cherry picked from commit e9e53662d8130dd950885e37dc1d97008e1283f9)
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

watchdog: Move hardlockup detector to separate file

Separate hardlockup code from watchdog.c and move it to watchdog_hld.c.
It is mostly straight forward. Remove everything inside
CONFIG_HARDLOCKUP_DETECTORS. This code will go to file watchdog_hld.c.
Also update the makefile accordigly.

Orabug: 24796651

Reviewed-by: Karl Volz <karl.volz@oracle.com>
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

watchdog: Move shared definitions to nmi.h

Move shared macros and definitions to nmi.h so that watchdog.c,
new file watchdog_hld.c or any other architecture specific handler
can use those definitions.

Orabug: 24796651

Reviewed-by: Karl Volz <karl.volz@oracle.com>
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Suppress kmalloc (DAX driver) warning due to allocation failure

dax_alloc is used by libdax to allocate 4MB chunks of physically
contiguous memory using kmalloc. It is normal for dax_alloc to fail
when kmalloc runs out of memory and this should not be treated as a warning.
This failure should be caught in libdax and attempted to allocate memory
from another source (Eg: mmap hugepage).

Orabug: 26224254

Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

i40evf: Use le32_to_cpu before evaluating HW desc fields.

i40e hardware descriptor fields are in little-endian format. Driver
must use le32_to_cpu while evaluating these fields otherwise on
big-endian arch we end up evaluating incorrect values, cause errors
like:

i40evf 0001:04:02.0: Expected response 0 from PF, received 285212672
i40evf 0001:04:02.1: Expected response 0 from PF, received 285212672

Orabug: 25577233

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: revert pause instruction patch for atomic backoff and cpu_relax()

This patch reverts the commit e9b9eb59ffcdee09ec96b040f85c919618f4043e
("sparc64: Use pause instruction when available").

This all started with our TPCC results on UEK4. During T7 testing, the TPCC results
were much lower compared to UEK2. The atomic calls like atomic_add and atomic_sub
were showing top on perf results. Karl found out that this was caused by
the upstream commit e9b9eb59ffcdee09ec96b040f85c919618f4043e
(sparc64: Use pause instruction when available). After reverting this commit on UEK4,
the TPCC numbers were back to UEK2 level. However, things changed after Atish's
scheduler fixes on UEK4. The TPCC numbers improved and the upstream commit
(sparc64: Use pause instruction when available) did not seem make any difference.
So, Karl's "revert pause instruction" patch was removed from UEK4.

Now again with T8 testing, we are seeing the same old behaviour. The atomic calls
like atomic_add and atomic_sub are showing top on perf results. After trying with
Karl's patch(revert pause instruction patch for atomic backoff) the TPCC numbers
improved(about %25 better than T7) and atomic calls are not showing on top in perf.

So, we are adding this patch back again. This is a temporary fix. Long term solution
is still in the discussion. The original patch is from Karl.
http://ca-git.us.oracle.com/?p=linux-uek-sparc.git;a=commit;h=f214eebf2223d23a2b1499be5b54719bdd7651e3

All the credit should go to Karl. Rebased it on latest sparc tree.

Orabug: 26306832

Signed-off-by: Karl Volz <karl.volz@Oracle.com>
Reviewed-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: Henry Willard <henry.willard@oracle.com>
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Karl Volz <karl.volz@Oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

be2net: Update the driver version to 11.4.0.0

Orabug: 26403655

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: Fix UE detection logic for BE3

On certain platforms BE3 chips may indicate spurious UEs (unrecoverable
error). Because of the UE detection logic was disabled in the driver
for BE3 chips. Because of this, even in cases of a real UE,
a failover will not occur. This patch re-enables UE detection on BE3
and if a UE is detected, reads the POST register. If the POST register,
reports either a FAT_LOG_STATE or a ARMFW_UE, then it means that a valid
UE occurred in the chip.

Orabug: 26403655

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: Fix offload features for Q-in-Q packets

At least some of the be2net cards do not seem to be capabled
of performing checksum offload computions on Q-in-Q packets.
In these case, the recevied checksum on the remote is invalid
and TCP syn packets are dropped.

This patch adds a call to check disbled acceleration features
on Q-in-Q tagged traffic.

Orabug: 26403655

CC: Sathya Perla <sathya.perla@broadcom.com>
CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
CC: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

benet: Use time_before_eq for time comparison

Use time_before_eq for time comparison more safe and dealing
with timer wrapping to be future-proof.

Orabug: 26403655

Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: Fix endian issue in logical link config command

Use cpu_to_le32() for link_config variable in set_logical_link_config
command as this variable is of type u32.

Orabug: 26403655

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: fix initial MAC setting

Recent commit 34393529163a ("be2net: fix MAC addr setting on privileged
BE3 VFs") allows privileged BE3 VFs to set its MAC address during
initialization. Although the initial MAC for such VFs is already
programmed by parent PF the subsequent setting performed by VF is OK,
but in certain cases (after fresh boot) this command in VF can fail.

The MAC should be initialized only when:
1) no MAC is programmed (always except BE3 VFs during first init)
2) programmed MAC is different from requested (e.g. MAC is set when
   interface is down). In this case the initial MAC programmed by PF
   needs to be deleted.

The adapter->dev_mac contains MAC address currently programmed in HW so
it should be zeroed when the MAC is deleted from HW and should not be
filled when MAC is set when interface is down in be_mac_addr_set() as
no programming is performed in this case.

Example of failure without the fix (immediately after fresh boot):

be2net 0000:01:00.0 eth0: Link is Up

...
be2net 0000:01:04.0: Emulex OneConnect(be3): VF  port 0

be2net 0000:01:04.0: opcode 59-1 failed:status 1-76
RTNETLINK answers: Input/output error

iommu: Removing device 0000:01:04.0 from group 33
...

iommu: Removing device 0000:01:04.0 from group 33
...

be2net 0000:01:04.0 eth8: Link is Up

Initialization is now OK.

v2 - Corrected the comment and condition check suggested by Suresh & Harsha

Orabug: 26403655

Fixes: 34393529163a ("be2net: fix MAC addr setting on privileged BE3 VFs")
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

drivers: net: generalize napi_complete_done()

napi_complete_done() allows to opt-in for gro_flush_timeout,
added back in linux-3.19, commit 3b47d30396ba
("net: gro: add a per device gro flush timer")

This allows for more efficient GRO aggregation without
sacrifying latencies.

Orabug: 26403655

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: fix MAC addr setting on privileged BE3 VFs

During interface opening MAC address stored in netdev->dev_addr is
programmed in the HW with exception of BE3 VFs where the initial
MAC is programmed by parent PF. This is OK when MAC address is not
changed when an interfaces is down. In this case the requested MAC is
stored to netdev->dev_addr and later is stored into HW during opening.
But this is not done for all BE3 VFs so the NIC HW does not know
anything about this change and all traffic is filtered.

This is the case of bonding if fail_over_mac == 0 where the MACs of
the slaves are changed while they are down.

The be2net behavior is too restrictive because if a BE3 VF has
the FILTMGMT privilege then it is able to modify its MAC without
any restriction.

To solve the described problem the driver should take care about these
privileged BE3 VFs so the MAC is programmed during opening. And by
contrast unpriviled BE3 VFs should not be allowed to change its MAC
in any case.

Orabug: 26403655

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: fix unicast list filling

The adapter->pmac_id[0] item is used for primary MAC address but
this is not true for adapter->uc_list[0] as is assumed in
be_set_uc_list(). There are N UC addresses copied first from net_device
to adapter->uc_list[1..N] and then N UC addresses from
adapter->uc_list[0..N-1] are sent to HW. So the last UC address is never
stored into HW and address 00:00:00:00;00:00 (from uc_list[0]) is used
instead.

Orabug: 26403655

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Fixes: b717241 be2net: replace polling with sleeping in the FW completion path
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: fix accesses to unicast list

Commit 988d44b "be2net: Avoid redundant addition of mac address in HW"
introduced be_dev_mac_add & be_uc_mac_add helpers that incorrectly
access adapter->uc_list as an array of bytes instead of an array of
be_eth_addr. Consequently NIC is not filled with valid data so unicast
filtering is broken.

Orabug: 26403655

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Fixes: 988d44b be2net: Avoid redundant addition of mac address in HW
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: fix non static symbol warnings

Fixes the following sparse warnings:

drivers/net/ethernet/emulex/benet/be_main.c:47:25: warning:
symbol 'be_err_recovery_workq' was not declared. Should it be static?
drivers/net/ethernet/emulex/benet/be_main.c:63:25: warning:
symbol 'be_wq' was not declared. Should it be static?

Orabug: 26403655

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: Avoid redundant addition of mac address in HW

If a mac address is added to the uc_list and later the same mac address
is added via ndo_set_mac_address() or vice versa, the driver does not
detect this condition and tries to add it again. This results in a mac
address collision error when the FW rejects it.

Fix this by checking if the given mac address is present in uc_list while
setting the device mac address and vice versa. Similarly skip deletion if
the address is still in use in the other form.

Orabug: 26403655

Signed-off-by: Suresh Reddy <Suresh.Reddy@broadcom.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: Support UE recovery in BEx/Skyhawk adapters

This patch supports recovery from UEs caused due to Transient Parity
Errors (TPE), in BE2, BE3 and Skyhawk adapters. This change avoids
system reboot when such errors occur. The driver recovers from these
errors such that the adapter resumes full operational status as prior
to the UE.

Following is the list of changes in the driver to support this:

o The driver registers its UE recoverable capability with ARM FW at init
time. This also allows the driver to know if the feature is supported in
the FW.

o As the UE recovery requires precise time bound processing, the driver
creates its own error recovery work queue with a single worker thread (per
module, shared across functions).

o Each function runs an error detection task at an interval of 1 second as
required by the FW. The error detection logic already exists for BEx/SH,
but it now runs in the context of a separate worker thread.

o When an error is detected by the task, if it is recoverable, the PF0
driver instance initiates a soft reset, while other PF driver instances
wait for the reset to complete and the chip to become ready. Once
the chip is ready, all driver instances including PF0, resume to
reinitialize the respective functions.

o The PF0 driver checks for some recovery criteria, to determine if the
recovery can be initiated. If the criteria is not met, the PF0 driver does
not initiate a soft reset, it retains the existing behavior to stop
further processing and requires a reboot to get the chip to operational
state again.

o To allow each function to share the workq, while also making progress in
its recovery process, a per-function recovery state machine is used.
The per-function tasks avoid blocking operations like msleep() while in
this state machine (until reinit state) and instead reschedule for the
required delay.

o With these changes, the existing error recovery code for Lancer also
runs in the context of the new worker thread.

Orabug: 26403655

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: replace polling with sleeping in the FW completion path

The ndo_set_rx_mode() and ndo_add/del_vxlan_port() calls may be called with
BHs disabled. The driver currently issues the required cmds to the FW in
these contexts and polls on completions from the FW, while BHs remain
disabled. This can cause either packet loss or packet reception to be
delayed on that CPU.

This patch defers processing of the above cmds to a separate workqueue.
With this change, FW cmds are now issued only in process context.
Now that the FW cmds are issued only in process context, they can sleep
waiting for a completion instead of polling. All the spin_lock_bh(mcc_lock)
calls are now replaced with mutex calls.

Also a new rx_filter_lock is now needed to protect the RX filtering fields
like vids[] between be_vlan_add/rem_vid() and __be_set_rx_mode() contexts.

Orabug: 26403655

Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

be2net: support asymmetric rx/tx queue counts

be2net so far supported creation of RX/TX queues only in pairs.
On configs where rx and tx queue counts are different, creation of only
the lesser number of queues has been supported.

This patch now allows a combination of RX/TX-only channels along with
combined channels. N TX-queues and M RX-queues can be created with the
following cmds:
ethtool -L ethX combined N rx M-N (when N < M)
ethtool -L ethX combined M tx N-M (when M < N)

Setting both RX-only and TX-only channels is still not supported.
It is mandatory to create atleast one combined channel.

Orabug: 26403655

Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

net/rds: Add mutex exclusion for vector_load

Several TOS connections may invoke ibdev_get_unused_vector()
concurrently. Hence, mutual exclusion is required to protect
rds_ibdev->vector_load.

Changes from v1:
- Reworded commit message and commentary based on Avinash's feedback
- Added r-b from Wei Lin and Avinash

Orabug: 26406492

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>

net: properly release sk_frag.page

I mistakenly added the code to release sk->sk_frag in
sk_common_release() instead of sk_destruct()

TCP sockets using sk->sk_allocation == GFP_ATOMIC do no call
sk_common_release() at close time, thus leaking one (order-3) page.

iSCSI is using such sockets.

Fixes: 5640f7685831 ("net: use a per task frag allocator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26409538

(cherry picked from commit 22a0e18eac7a9e986fec76c60fa4a2926d1291e2)

Signed-off-by: Ghazale Hosseinabadi <ghazale.hosseinabadi@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

i40e: Correct the macros for setting the DMA attributes

The I40E_DMA_ATTRS() macro was not setting the WEAK_ORDERING as
intended, this change corrects the issue.

Orabug: 26379170
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

bnxt_en: Fix netpoll handling.

Orabug: 26402533, 26325599, 26366387

To handle netpoll properly, the driver must only handle TX packets
during NAPI. Handling RX events cause warnings and errors in
netpoll mode. The ndo_poll_controller() method should call
napi_schedule() directly so that a NAPI weight of zero will be used
during netpoll mode.

The bnxt_en driver supports 2 ring modes: combined, and separate rx/tx.
In separate rx/tx mode, the ndo_poll_controller() method will only
process the tx rings. In combined mode, the rx and tx completion
entries are mixed in the completion ring and we need to drop the rx
entries and recycle the rx buffers.

Add a function bnxt_force_rx_discard() to handle this in netpoll mode
when we see rx entries in combined ring mode.

Reported-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2270bc5da34979454e6f2eb133d800b635156174)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add missing logic to handle TPA end error conditions.

Orabug: 26402533, 26325599, 26366387

When we get a TPA_END completion to handle a completed LRO packet, it
is possible that hardware would indicate errors. The current code is
not checking for the error condition. Define the proper error bits and
the macro to check for this error and abort properly.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 69c149e2e39e8d66437c9034bb4926ef2c1f7c23)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Fix xmit_more with BQL.

Orabug: 26402533, 26325599, 26366387

We need to write the doorbell if BQL has stopped the queue and
skb->xmit_more is set. Otherwise it is possible for the tx queue to
rot and cause tx timeout.

Fixes: 4d172f21cefe ("bnxt_en: Implement xmit_more.")
Suggested-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ffe406457753a7ca2061ecc8c4d3971623066911)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Pass in sh parameter to bnxt_set_dflt_rings().

Orabug: 26402533, 26325599, 26366387

In the existing code, the local variable sh is hardcoded to true to
calculate default rings for shared ring configuration. It is better
to have the caller determine the value of sh.

Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 702c221ca64060b81af4461553be19cba275da8b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Implement xmit_more.

Orabug: 26402533, 26325599, 26366387

Do not write the TX doorbell if skb->xmit_more is set unless the TX
queue is full.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4d172f21cefe896df8477940269b8d52129f8c87)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Optimize doorbell write operations for newer chips.

Orabug: 26402533, 26325599, 26366387

Older chips require the doorbells to be written twice, but newer chips
do not. Add a new common function bnxt_db_write() to write all
doorbells appropriately depending on the chip. Eliminating the extra
doorbell on newer chips has a significant performance improvement
on pktgen.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 434c975a8fe2f70b70ac09ea5ddd008e0528adfa)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
drivers/net/ethernet/broadcom/bnxt/bnxt.h
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add additional chip ID definitions.

Orabug: 26402533, 26325599, 26366387

Add additional chip definitions and macros for all supported chips.
Add a new macro BNXT_CHIP_P4_PLUS for the newer generation of chips and
use the macro to properly determine the features supported by these
newer chips.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3284f9e1ab505b41fa604c81e4b3271c6b88cdcb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.

Orabug: 26402533, 26325599, 26366387

When bnxt_en gets a PCI shutdown call, we need to have a new callback
to inform the RDMA driver to do proper shutdown and removal.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0efd2fc65c922dff207ff10a776a7a33e0e3c7c5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add PCI IDs for BCM57454 VF devices.

Orabug: 26402533, 26325599, 26366387

Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c7ef35eb0c8d0b58d2d5ae5be599e6aa730361b2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Support for Short Firmware Message

Orabug: 26402533, 26325599, 26366387

The new short message format is used on the new BCM57454 VFs. Each
firmware message is a fixed 16-byte message sent using the standard
firmware communication channel. The short message has a DMA address
pointing to the legacy long firmware message.

Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e605db801bdeb9d94cccbd4a2f641030067ef008)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.

Orabug: 26402533, 26325599, 26366387

Otherwise, all the host based DCBX settings from lldpad will fail if the
firmware DCBX agent is running.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f667724b99ad1afc91f16064d8fb293d2805bd57)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.

Orabug: 26402533, 26325599, 26366387

In the current code, bnxt_dcb_init() is called too early before we
determine if the firmware DCBX agent is running or not. As a result,
we are not setting the DCB_CAP_DCBX_HOST and DCB_CAP_DCBX_LLD_MANAGED
flags properly to report to DCBNL.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 87fe603274aa9889c05cca3c3e45675e1997cb13)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt: add dma mapping attributes

On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING
attribute in our dma mapping in order to get the expected performance
out of the receive path. Setting this boosts a simple iperf receive
session from 2 Gbe to 23.4 Gbe.

Orabug: 26366387

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: allocate enough space for ->ntp_fltr_bmap

Orabug: 26402533, 26325599, 26366387

We have the number of longs, but we need to calculate the number of
bytes required.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ac45bd93a5035c2f39c9862b8b6ed692db0fdc87)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Restrict a PF in Multi-Host mode from changing port PHY configuration

Orabug: 26402533, 26325599, 26366387

This change restricts the PF in multi-host mode from setting any port
level PHY configuration. The settings are controlled by firmware in
Multi-Host mode.

Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9e54e322ded40f424dcb5a13508e2556919ce12a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Check the FW_LLDP_AGENT flag before allowing DCBX host agent.

Orabug: 26402533, 26325599, 26366387

Check the additional flag in bnxt_hwrm_func_qcfg() before allowing
DCBX to be done in host mode.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7d63818a35851cf00867248d5ab50a8fe8df5943)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add 100G link speed reporting for BCM57454 ASIC in ethtool

Orabug: 26402533, 26325599, 26366387

Added support for 100G link speed reporting for Broadcom BCM57454
ASIC in ethtool command.

Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 38a21b34aacd4db7b7b74c61afae42ea6718448d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Fix VF attributes reporting.

Orabug: 26402533, 26325599, 26366387

The .ndo_get_vf_config() is returning the wrong qos attribute. Fix
the code that checks and reports the qos and spoofchk attributes. The
BNXT_VF_QOS and BNXT_VF_LINK_UP flags should not be set by default
during init. time.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f0249056eaf2b9a17b2b76a6e099e9b7877e187d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Pass DCB RoCE app priority to firmware.

Orabug: 26402533, 26325599, 26366387

When the driver gets the RoCE app priority set/delete call through DCBNL,
the driver will send the information to the firmware to set up the
priority VLAN tag for RDMA traffic.

[ New version using the common ETH_P_IBOE constant in if_ether.h ]

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a82fba8dbfb522bd19b1644bf599135680fd0122)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Cap the msix vector with the max completion rings.

Orabug: 26402533, 26325599, 26366387

The current code enables up to the maximum MSIX vectors in the PCIE
config space without considering the max completion rings available.
An MSIX vector is only useful when it has an associated completion
ring, so it is better to cap it.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 68a946bb81e07ed0e59a99e0c068d091ed42cc1b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add interrupt test to ethtool -t selftest.

Orabug: 26402533, 26325599, 26366387

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 67fea463fd873492ab641459a6d1af0e9ea3c9ce)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add PHY loopback to ethtool self-test.

Orabug: 26402533, 26325599, 26366387

It is necessary to disable autoneg before enabling PHY loopback,
otherwise link won't come up.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 91725d89b97acea168a94c577d999801c3b3bcfb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add ethtool mac loopback self test.

Orabug: 26402533, 26325599, 26366387

The mac loopback self test operates in polling mode.  To support that,
we need to add functions to open and close the NIC half way.  The half
open mode allows the rings to operate without IRQ and NAPI.  We
use the XDP transmit function to send the loopback packet.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f7dc1ea6c4c1f31371b7098d6fae0d49dc6cdff1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add basic ethtool -t selftest support.

Orabug: 26402533, 26325599, 26366387

Add the basic infrastructure and only firmware tests initially.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit eb51365846bc418687af4c4f41b68b6e84cdd449)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add suspend/resume callbacks.

Orabug: 26402533, 26325599, 26366387

Add suspend/resume callbacks using the newer dev_pm_ops method.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f65a2044a8c988adf16788c51c04ac10dbbdb494)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add ethtool set_wol method.

Orabug: 26402533, 26325599, 26366387

And add functions to set and free magic packet filter.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5282db6c794fed3ea8b399bc5305c4078e084f7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add ethtool get_wol method.

Orabug: 26402533, 26325599, 26366387

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8e202366dd752564d7f090ba280cc51cbf7bbbd9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add pci shutdown method.

Orabug: 26402533, 26325599, 26366387

Add pci shutdown method to put device in the proper WoL and power state.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d196ece740bf337aa25731cd8cb44660a2a227dd)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Add basic WoL infrastructure.

Orabug: 26402533, 26325599, 26366387

Add code to driver probe function to check if the device is WoL capable
and if Magic packet WoL filter is currently set.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c1ef146a5bd3b286d5c3eb2c9f631b38647c76d3)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Update firmware interface spec to 1.7.6.2.

Orabug: 26402533, 26325599, 26366387

Features added include WoL and selftest.

Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8eb992e876a88de7539b1b9e132dd171d865cd2f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Fix DMA unmapping of the RX buffers in XDP mode during shutdown.

Orabug: 26402533, 26325599, 26366387

In bnxt_free_rx_skbs(), which is called to free up all RX buffers during
shutdown, we need to unmap the page if we are running in XDP mode.

Fixes: c61fb99cae51 ("bnxt_en: Add RX page mode support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3ed3a83e3f3871c57b18cef09b148e96921236ed)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

bnxt_en: Correct the order of arguments to netdev_err() in bnxt_set_tpa()

Orabug: 26402533, 26325599, 26366387

Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 23e12c893489ed12ecfccbf866fc62af1bead4b0)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Fix NULL pointer dereference in reopen failure path

Orabug: 26402533, 26325599, 26366387

Net device reset can fail when the h/w or f/w is in a bad state.
Subsequent netdevice open fails in bnxt_hwrm_stat_ctx_alloc().
The cleanup invokes bnxt_hwrm_resource_free() which inturn
calls bnxt_disable_int(). In this routine, the code segment

if (ring->fw_ring_id != INVALID_HW_RING_ID)
BNXT_CP_DB(cpr->cp_doorbell, cpr->cp_raw_cons);

results in NULL pointer dereference as cpr->cp_doorbell is not yet
initialized, and fw_ring_id is zero.

The fix is to initialize cpr fw_ring_id to INVALID_HW_RING_ID before
bnxt_init_chip() is invoked.

Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2247925f0942dc4e7c09b1cde45ca18461d94c5f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Ignore 0 value in autoneg supported speed from firmware.

Orabug: 26402533, 26325599, 26366387

In some situations, the firmware will return 0 for autoneg supported
speed.  This may happen if the firmware detects no SFP module, for
example.  The driver should ignore this so that we don't end up with
an invalid autoneg setting with nothing advertised.  When SFP module
is inserted, we'll get the updated settings from firmware at that time.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 520ad89a54edea84496695d528f73ddcf4a52ea4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Check if firmware LLDP agent is running.

Orabug: 26402533, 26325599, 26366387

Set DCB_CAP_DCBX_HOST capability flag only if the firmware LLDP agent
is not running.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc39f885a9c3bdbff0a96ecaf07b162a78eff6e4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Call bnxt_ulp_stop() during tx timeout.

Orabug: 26402533, 26325599, 26366387

If we call bnxt_reset_task() due to tx timeout, we should call
bnxt_ulp_stop() to inform the RDMA driver about the error and the
impending reset.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b386cd362ffea09d05c56bfa85d104562e860647)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

bnxt_en: Perform function reset earlier during probe.

Orabug: 26402533, 26325599, 26366387

The firmware call to do function reset is done too late. It is causing
the rings that have been reserved to be freed. In NPAR mode, this bug
is causing us to run out of rings.

Fixes: 391be5c27364 ("bnxt_en: Implement new scheme to reserve tx rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3c2217a675bac22afb149166e0de71809189850d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

Initialize fiblink list head during fib initialization

The fiblink pointer is used extensively for the sync mode driver, and has never been initialized
Initialize this during the fib_setup operation, and for copied fibs

Orabug: 26291289

Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

aacraid: Update scsi_host_template to use tagged commands

Most of the non-multi-queue scsi drivers were updated to include
.use_blk_tags in the scsi_host_template, however the aacraid driver
was left out. This will cause the inbox driver to fail to
allocate a tagged fib.

Update the scsi_host_template to include .use_blk_tags

Orabug: 26291289

Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

IB/mlx4: Suppress warning for not handled portmgmt event subtype

The new CX3 firmware can generate port management event sl to vl
table change which we are not handling yet in the driver.

Suppress the warning.

(This commit should be reverted when sl to vl table change
handling is added)

Orabug: 26409594

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

dccp/tcp: do not inherit mc_list from parent

Orabug: 26408144
CVE: CVE-2017-8890

syzkaller found a way to trigger double frees from ip_mc_drop_socket()

It turns out that leave a copy of parent mc_list at accept() time,
which is very bad.

Very similar to commit 8b485ce69876 ("tcp: do not inherit
fastopen_req from parent")

Initial report from Pray3r, completed by Andrey one.
Thanks a lot to them !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pray3r <pray3r.z@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/ipv4/inet_connection_sock.c

xen-blkfront: fix mq start/stop race

Orabug: 26397415

When ring buf full, hw queue will be stopped. While blkif interrupt consume
request and make free space in ring buf, hw queue will be started again.
But since start queue is protected by spin lock while stop not, that will
cause a race.

interrupt:                                      process:
blkif_interrupt()                               blkif_queue_rq()
kick_pending_request_queues_locked()
  blk_mq_start_stopped_hw_queues()
   clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
                                                 blk_mq_stop_hw_queue(hctx)
   blk_mq_run_hw_queue(hctx, async)

If ring buf is made empty in this case, interrupt will never come, then the
hw queue will be stopped forever, all processes waiting for the pending io
in the queue will hung.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

blk-mq: remap queues when adding/removing hardware queues

Orabug: 26397427

blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the
behavior that drivers expect. However, commit 4e68a011428a changed
blk_mq_queue_reinit() to not remap queues for the case of CPU
hotplugging, inadvertently making blk_mq_update_nr_hw_queues() not remap
queues as well. This breaks, for example, NBD's multi-connection mode,
leaving the added hardware queues unused. Fix it by making
blk_mq_update_nr_hw_queues() explicitly remap the queues.

Fixes: 4e68a011428a ("blk-mq: don't redistribute hardware queues on a CPU hotplug event")
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ebe8bddb6e30d7a02775b9972099271fc5910f37)

Conflicts:

block/blk-mq.c

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

blk-mq: don't redistribute hardware queues on a CPU hotplug event

Orabug: 26397427

Currently blk-mq will totally remap hardware context when a CPU hotplug
even happened, which causes major havoc for drivers, as they are never
told about this remapping. E.g. any carefully sorted out CPU affinity
will just be completely messed up.

The rebuild also doesn't really help for the common case of cpu
hotplug, which is soft onlining / offlining of cpus - in this case we
should just leave the queue and irq mapping as is. If it actually
worked it would have helped in the case of physical cpu hotplug,
although for that we'd need a way to actually notify the driver.
Note that drivers may already be able to accommodate such a topology
change on their own, e.g. using the reset_controller sysfs file in NVMe
will cause the driver to get things right for this case.

With the rebuild removed we will simplify retain the queue mapping for
a soft offlined CPU that will work when it comes back online, and will
map any newly onlined CPU to queue 0 until the driver initiates
a rebuild of the queue map.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4e68a011428af3211facd932b4003b3fa3ef4faa)

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: qedi: Fix memory leak in tmf response processing.

Orabug: 26403604

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: qedi: fix build error without DEBUG_FS

Orabug: 26403604

Without CONFIG_DEBUG_FS, we run into a link error:

drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll':
qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect':
qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect':
qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work':
qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover'

This defines the symbol as a constant in this case, as there is no way to
set it to anything other than zero without DEBUG_FS. In addition, I'm renaming
it to qedi_do_not_recover in order to put it into a driver specific namespace,
as "do_not_recover" is a really bad name for a kernel-wide global identifier
when it is used only in one driver.

Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: qedi: fix missing return error code check on call to qedi_setup_int

Orabug: 26403604

The call to qedi_setup_int is not updating the return code rc yet rc
is being checked for an error. Fix this by assigning rc to the return
code from the call to qedi_setup_int.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: qedi: Fix possible memory leak in qedi_iscsi_update_conn()

Orabug: 26403604

'conn_info' is malloced in qedi_iscsi_update_conn() and should be freed
before leaving from the error handling cases, otherwise it will cause
memory leak.

Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: qedi: return via va_end to match corresponding va_start

Orabug: 26403604

Although on most systems va_end is a no-op, it is good practice to use
va_end on the function return path, especially since the va_start
documenation states:

"Each invocation of va_start() must be matched by a corresponding
invocation of va_end() in the same function."

Found with static analysis by CoverityScan, CIDs 1389477-1389479

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: qedi: fix build, depends on UIO

Orabug: 26403604

Fix build of SCSI qedi driver. It uses uio interfaces so it should
depend on UIO.

ERROR: "uio_unregister_device" [drivers/scsi/qedi/qedi.ko] undefined!
ERROR: "uio_event_notify" [drivers/scsi/qedi/qedi.ko] undefined!
ERROR: "__uio_register_device" [drivers/scsi/qedi/qedi.ko] undefined!

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: QLogic-Storage-Upstream@cavium.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.

Orabug: 26403604

The QLogic FastLinQ Driver for iSCSI (qedi) is the iSCSI specific module
for 41000 Series Converged Network Adapters by QLogic.

This patch consists of following changes:

  - MAINTAINERS Makefile and Kconfig changes for qedi,
  - PCI driver registration,
  - iSCSI host level initialization,
  - Debugfs and log level infrastructure.

The following indiviual changes are merged into this commit:

  qedi: Add LL2 iSCSI interface for offload iSCSI.
  qedi: Add support for iSCSI session management.
  qedi: Add support for data path.

Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Signed-off-by: Adheer Chandravanshi <adheer.chandravanshi@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Saurav Kashyap <saurav.kashyap@cavium.com>
Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xen/grant-table: log the lack of grants

Orabug: 26324349

log a message when we enter this situation:
1) we already allocated the max number of available grants from hypervisor
and
2) we still need more (but the request fails because of 1)).

Sometimes the lack of grants causes IO hangs in xen_blkfront devices.
Adding this log would help debuging.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>

macsec: dynamically allocate space for sglist

Orabug: 25953290
CVE: CVE-2017-7477

We call skb_cow_data, which is good anyway to ensure we can actually
modify the skb as such (another error from prior). Now that we have the
number of fragments required, we can safely allocate exactly that amount
of memory.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5294b83086cc1c35b4efeca03644cf9d12282e5b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/macsec.c

macsec: avoid heap overflow in skb_to_sgvec

Orabug: 25953290
CVE: CVE-2017-7477

While this may appear as a humdrum one line change, it's actually quite
important. An sk_buff stores data in three places:

1. A linear chunk of allocated memory in skb->data. This is the easiest
   one to work with, but it precludes using scatterdata since the memory
   must be linear.
2. The array skb_shinfo(skb)->frags, which is of maximum length
   MAX_SKB_FRAGS. This is nice for scattergather, since these fragments
   can point to different pages.
3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff,
   which in turn can have data in either (1) or (2).

The first two are rather easy to deal with, since they're of a fixed
maximum length, while the third one is not, since there can be
potentially limitless chains of fragments. Fortunately dealing with
frag_list is opt-in for drivers, so drivers don't actually have to deal
with this mess. For whatever reason, macsec decided it wanted pain, and
so it explicitly specified NETIF_F_FRAGLIST.

Because dealing with (1), (2), and (3) is insane, most users of sk_buff
doing any sort of crypto or paging operation calls a convenient function
called skb_to_sgvec (which happens to be recursive if (3) is in use!).
This takes a sk_buff as input, and writes into its output pointer an
array of scattergather list items. Sometimes people like to declare a
fixed size scattergather list on the stack; othertimes people like to
allocate a fixed size scattergather list on the heap. However, if you're
doing it in a fixed-size fashion, you really shouldn't be using
NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its
frag_list children arent't shared and then you check the number of
fragments in total required.)

Macsec specifically does this:

        size += sizeof(struct scatterlist) * (MAX_SKB_FRAGS + 1);
        tmp = kmalloc(size, GFP_ATOMIC);
        *sg = (struct scatterlist *)(tmp + sg_offset);
...
        sg_init_table(sg, MAX_SKB_FRAGS + 1);
        skb_to_sgvec(skb, sg, 0, skb->len);

Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're
using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will
overflow the heap, and disaster ensues.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: stable@vger.kernel.org
Cc: security@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4d6fa57b4dab0d77f4d8e9d9c73d1e63f6fe8fee)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/macsec.c

nfsd: check for oversized NFSv2/v3 arguments

Orabug: 25917857
CVE: CVE-2017-7645

A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply.  This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE).  But a client that sends an incorrectly long reply
can violate those assumptions.  This was observed to cause crashes.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

We may also consider rejecting calls that have any extra garbage
appended.  That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
(cherry picked from commit e6838a29ecb484c97e4efef9429643b9851fba6e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal

On slave list updates, the bonding driver computes its hard_header_len
as the maximum of all enslaved devices's hard_header_len.
If the slave list is empty, e.g. on last enslaved device removal,
ETH_HLEN is used.

Since the bonding header_ops are set only when the first enslaved
device is attached, the above can lead to header_ops->create()
being called with the wrong skb headroom in place.

If bond0 is configured on top of ipoib devices, with the
following commands:

ifup bond0
for slave in $BOND_SLAVES_LIST; do
ip link set dev $slave nomaster
done
ping -c 1 <ip on bond0 subnet>

we will obtain a skb_under_panic() with a similar call trace:
skb_push+0x3d/0x40
push_pseudo_header+0x17/0x30 [ib_ipoib]
ipoib_hard_header+0x4e/0x80 [ib_ipoib]
arp_create+0x12f/0x220
arp_send_dst.part.19+0x28/0x50
arp_solicit+0x115/0x290
neigh_probe+0x4d/0x70
__neigh_event_send+0xa7/0x230
neigh_resolve_output+0x12e/0x1c0
ip_finish_output2+0x14b/0x390
ip_finish_output+0x136/0x1e0
ip_output+0x76/0xe0
ip_local_out+0x35/0x40
ip_send_skb+0x19/0x40
ip_push_pending_frames+0x33/0x40
raw_sendmsg+0x7d3/0xb50
inet_sendmsg+0x31/0xb0
sock_sendmsg+0x38/0x50
SYSC_sendto+0x102/0x190
SyS_sendto+0xe/0x10
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25

This change addresses the issue avoiding updating the bonding device
hard_header_len when the slaves list become empty, forbidding to
shrink it below the value used by header_ops->create().

The bug is there since commit 54ef31371407 ("[PATCH] bonding: Handle large
hard_header_len") but the panic can be triggered only since
commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard
header").

Orabug: 26087204

Reported-by: Norbert P <noe@physik.uzh.ch>
Fixes: 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len")
Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 19cdead3e2ef8ed765c5d1ce48057ca9d97b5094)

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>
Tested-by: Qing Huang <qing.huang@oracle.com>

aacraid: initialize scsi shared tag map

Need to initialize shared tag map when initialize the aacraid adapter.

Orabug: 26367701

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

[PATCH] RDS: Print failed rdma op details if failure is remote access

Improves diagnosability when RDMA op fails allowing this print to be
matched with prints on the responder side; which prints RDMA keys
which are prematurely purged by the owning process.

Signed-off-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Hakon Bugge <haakon.bugge@oracle.com>
Orabug: 26277933

[PATCH] RDS: When RDS socket is closed, print unreleased MR's

Improves diagnosability when the requester RDMA operation fails with
remote access error. This commit prints RDMA credentials which are
prematurely purged by owning process.

Signed-off-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Hakon Bugge <haakon.bugge@oracle.com>
Orabug: 26276427

RDMA/core: not to set page dirty bit if it's already set.

This change will optimize kernel memory deregistration operations.
__ib_umem_release() used to call set_page_dirty_lock() against every
writable page in its memory region. Its purpose is to keep data
synced between CPU and DMA device when swapping happens after mem
deregistration ops. Now we choose not to set page dirty bit if it's
already set by kernel prior to calling __ib_umem_release(). This
reduces memory deregistration time by half or even more when we ran
application simulation test program.

Orabug: 24313031

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from upstream commit 53376fedb9da54c0d3b0bd3a6edcbeb681692909)
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

net/rds: Reduce memory footprint in rds_sendmsg

Orabug: 26151323

8K + 256 (8448B) is an important message size for the RDBMS workload. Since
Infiniband supports scatter-gather in hardware, there is no reason to
fragment each RDS message into PAGE_SIZE work requests. Hence, RDS fragment
sizes up to 16K has been introduced.

Fixes: 23f90cccfba4 ("RDS: fix the sg allocation based on actual msg sz")
Previous behavior was allocating a contiguous memory buffer, corresponding
to the size of the RDS message. Although this was functional correct, it
introduced hard pressure on the memory allocation system, which was
not needed.

This commit fixes the drawback introduced by only allocating
the buffer according to RDS_MAX_FRAG_SIZE.

Orabug: 26350949

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Tested-by: John Sobecki <john.sobecki@oracle.com>

xprtrdma: Don't require LOCAL_DMA_LKEY support for fastreg

There is no need to require LOCAL_DMA_LKEY support as the
PD allocation makes sure that there is a local_dma_lkey. Also
correctly set a return value in error path.

This caused a NULL pointer dereference in mlx5 which removed
the support for LOCAL_DMA_LKEY.

Fixes: bb6c96d72879 ("xprtrdma: Replace global lkey with lkey local to PD")
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Orabug: 26151481

(cherry picked from commit f022fa88ce6abc83c6e678a890df5c1e4b0eaf89)
Signed-off-by: Ghazale Hosseinabadi <ghazale.hosseinabadi@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>