Santosh Shilimkar [Tue, 20 Oct 2015 01:19:31 +0000 (18:19 -0700)]
Merge branch 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek: (30 commits)
Linux 4.1.10
hp-wmi: limit hotkey enable
zram: fix possible use after free in zcomp_create()
netlink: Replace rhash_portid with bound
netlink: Fix autobind race condition that leads to zero port ID
mvneta: use inband status only when explicitly enabled
of_mdio: add new DT property 'managed' to specify the PHY management type
net: phy: fixed_phy: handle link-down case
net: dsa: bcm_sf2: Do not override speed settings
fib_rules: fix fib rule dumps across multiple skbs
net: revert "net_sched: move tp->root allocation into fw_init()"
tcp: add proper TS val into RST packets
openvswitch: Zero flows on allocation.
macvtap: fix TUNSETSNDBUF values > 64k
net/mlx4_en: really allow to change RSS key
bridge: fix igmpv3 / mldv2 report parsing
sctp: fix race on protocol/netns initialization
netlink, mmap: transform mmap skb into full skb on taps
net: dsa: bcm_sf2: Fix 64-bits register writes
ipv6: fix multipath route replace error recovery
...
Santosh Shilimkar [Tue, 20 Oct 2015 01:19:19 +0000 (18:19 -0700)]
Merge branch 'topic/uek-4.1/sparc' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/sparc' of git://ca-git.us.oracle.com/linux-uek:
PCI: Restore pref MMIO allocation logic for host bridge without mmio64
PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64
PCI: Add has_mem64 for struct host_bridge
PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64
PCI: Check pref compatible bit for mem64 resource of PCIe device
OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource
PCI: kill wrong quirk about M7101
sparc/PCI: Keep resource idx order with bridge register number
sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
sparc/PCI: Reserve legacy mmio after PCI mmio
sparc/PCI: Unify pci_register_region()
sparc/PCI: Use correct bus address to resource offset
sparc/PCI: Add mem64 resource parsing for root bus
sparc: Revert commits that broke ixgbe and igb drivers on T7
sparc: vdso: lockdep fixes
SPARC64: LDoms suspend domain service.
Santosh Shilimkar [Tue, 20 Oct 2015 01:18:46 +0000 (18:18 -0700)]
Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek: (122 commits)
be2net: bump up the driver version to 10.6.0.4
enic: do hang reset only in case of tx timeout
enic: handle spurious error interrupt
enic: reduce ioread in devcmd2
enic: Fix build failure with SRIOV disabled.
enic: Fix namespace pollution causing build errors.
enic: Fix sparse warning in vnic_devcmd_init().
enic: add devcmd2
enic: add devcmd2 resources
enic: use netdev_<foo> or dev_<foo> instead of pr_<foo>
enic: move struct definition from .c to .h file
enic: allow adaptive coalesce setting for msi/legacy intr
enic: add adaptive coalescing intr for intx and msi poll
enic: fix issues in enic_poll
enic: use atomic_t instead of spin_lock in busy poll
drivers/net: remove all references to obsolete Ethernet-HOWTO
enic: Grammar s/an negative/a negative/
qla2xxx: Update driver version to 8.07.00.26.39.0-k.
be2net: remove vlan promisc capability from VF's profile descriptors
be2net: set pci_func_num while issuing GET_PROFILE_CONFIG cmd
...
Quentin Casasnovas [Mon, 19 Oct 2015 21:22:27 +0000 (14:22 -0700)]
RDS: fix race condition when sending a message on unbound socket.
Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket. The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket. This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().
Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.
I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.
Complete earlier incomplete fix to CVE-2015-6937:
74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection")
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Chien Yen <chien.yen@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: stable@vger.kernel.org Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Yinghai Lu [Thu, 8 Oct 2015 21:38:34 +0000 (14:38 -0700)]
PCI: Restore pref MMIO allocation logic for host bridge without mmio64
From 5b2854155 (PCI: Restrict 64-bit prefetchable bridge windows to 64-bit
resources), we change the logic for pref mmio allocation:
When bridge pref support mmio64, we will only put children pref
that support mmio64 into it, and will put children pref mmio32
into bridge's non-pref mmio32.
That could leave bridge pref bar not used when that pref bar is mmio64,
and children res only has mmio32.
Also could have allocation failure when non-pref mmio32 is not big
enough space for those children pref mmio32.
That is not rational when the host bridge does not 64bit mmio above 4g
at all.
The patch restore to old logic:
when host bridge does not have has_mem64, put children pref mmio64 and
pref mmio32 all under bridges pref bars.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Orabug: 21826746
Yinghai Lu [Thu, 8 Oct 2015 21:38:32 +0000 (14:38 -0700)]
PCI: Add has_mem64 for struct host_bridge
Add has_mem64 for struct host_bridge, on root bus that does not support
mmio64 above 4g, will not set that.
We will use that info next two following patches:
1. Don't treat non-pref mmio64 as pref mmio, so will not put
it under bridge's pref range when rescan the devices
2. will keep pref mmio64 and pref mmio32 under bridge pref bar.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Orabug: 21826746
Yinghai Lu [Thu, 8 Oct 2015 21:38:31 +0000 (14:38 -0700)]
PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64
If any bridge up to root only have 32bit pref mmio, We don't need to
treat device non-pref mmio64 as as pref mmio64.
We need to move pci_bridge_check_ranges calling early.
for parent bridges pref mmio BAR may not allocated by BIOS, res flags
is still 0, we need to have it correct set before we check them for
child device resources.
-v2: check all bus resources instead of just res[15].
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Orabug: 21826746
All the bridges 64-bit resource have pref bit, but the device resource does not
have pref set, then we can not find parent for the device resource,
as we can not put non-pref mem under pref mem.
According to pcie spec errta
https://www.pcisig.com/specifications/pciexpress/base2/PCIe_Base_r2.1_Errata_08Jun10.pdf
page 13, in some case it is ok to mark some as pref.
Mark if the entire path from the host to the adapter is over PCI Express.
Then set pref compatible bit for claim/sizing/assign for 64bit mem resource
on that pcie device.
-v2: set pref for mmio 64 when whole path is PCI Express, according to David Miller.
-v3: don't set pref directly, change to UNDER_PREF, and set PREF before
sizing and assign resource, and cleart PREF afterwards. requested by BenH.
-v4: use on_all_pcie_path device flag instead.
Yinghai Lu [Thu, 8 Oct 2015 21:38:29 +0000 (14:38 -0700)]
OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource
For device resource PREF bit setting under bridge 64-bit pref resource,
we need to make sure only set PREF for 64bit resource, so set
IORESOUCE_MEM_64 for 64bit resource during OF device resource flags
parsing.
Yinghai Lu [Thu, 8 Oct 2015 21:38:26 +0000 (14:38 -0700)]
PCI: kill wrong quirk about M7101
Meelis reported that qla2000 driver does not get loaded on one sparc system.
schizo f00732d0: PCI host bridge to bus 0001:00
pci_bus 0001:00: root bus resource [io 0x7fe01000000-0x7fe01ffffff] (bus address [0x0000-0xffffff])
pci 0001:00:06.0: quirk: [io 0x7fe01000800-0x7fe0100083f] claimed by ali7101 ACPI
pci 0001:00:06.0: quirk: [io 0x7fe01000600-0x7fe0100061f] claimed by ali7101 SMB
pci 0001:00:07.0: can't claim BAR 0 [io 0x7fe01000000-0x7fe0100ffff]: address conflict with 0001:00:06.0 [io 0x7fe01000600-0x7fe0100061f]
So the quirk for M7101 claim the io range early.
According to spec with M7101 in M1543 page 103/104,
http://www.versalogic.com/Support/Downloads/pdf/ali1543.pdf
0xe0, and 0xe2 do not include address info for acpi/smb.
and we already had pref_compat support that add extra pref bit for device
resource.
It turns out that pci_resource_compatible()/pci_up_path_over_pref_mem64()
just check resource with bridge pref mmio register idx 15, and we have put
resource to use mmio register idx 14 during of_scan_pci_bridge()
as the bridge does not mmio resource.
We already fix pci_up_path_over_pref_mem64() to check all bus resources.
And at the same time, this patch will make resource to consistent sequence
like other arch or directly from pci_read_bridge_bases(),
even non-pref mmio is missing, or out of ordering in firmware reporting.
So hold i = 1 for non pref mmio, and i =2 for pref mmio.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Orabug: 21826746
Yinghai Lu [Thu, 8 Oct 2015 21:38:24 +0000 (14:38 -0700)]
sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
For device resource PREF bit setting under bridge 64-bit pref resource,
we need to make sure only set PREF for 64bit resource, so set
IORESOUCE_MEM_64 for 64bit resource during of device resource flags
parsing.
Yinghai Lu [Thu, 8 Oct 2015 21:38:20 +0000 (14:38 -0700)]
sparc/PCI: Add mem64 resource parsing for root bus
Found "no compatible bridge window" warning in boot log from T5-8.
pci 0000:00:01.0: can't claim BAR 15 [mem 0x100000000-0x4afffffff pref]: no compatible bridge window
That resource is above 4G, but does not get offset correctly as
root bus only report io and mem32.
pci_sun4v f02dbcfc: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x804000000000-0x80400fffffff] (bus address [0x0000-0xfffffff])
pci_bus 0000:00: root bus resource [mem 0x800000000000-0x80007effffff] (bus address [0x00000000-0x7effffff])
pci_bus 0000:00: root bus resource [bus 00-77]
Add mem64 handling in pci_common for sparc, so we can have 64bit resource
registered for root bus at first.
After patch, will have:
pci_sun4v f02dbcfc: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x804000000000-0x80400fffffff] (bus address [0x0000-0xfffffff])
pci_bus 0000:00: root bus resource [mem 0x800000000000-0x80007effffff] (bus address [0x00000000-0x7effffff])
pci_bus 0000:00: root bus resource [mem 0x800100000000-0x8007ffffffff] (bus address [0x100000000-0x7ffffffff])
pci_bus 0000:00: root bus resource [bus 00-77]
-v2: mem64_space should use mem_space.start as offset.
-v3: add IORESOURCE_MEM_64 flag
-v4: set name for mem64_space, otherwise /proc/iomem will have <bad> for name
Sujith Sankar [Tue, 13 Oct 2015 10:04:02 +0000 (15:34 +0530)]
enic: do hang reset only in case of tx timeout
The current code invokes hang reset in case of error interrupt. We should
hang reset only in case of tx timeout. This because of the way hang reset
is implemented in firmware. Hang reset takes more firmware resources than
soft reset. Adaptor does not generate error interrupt in case of tx
timeout.
Hang reset only in case of tx timeout, in .ndo_tx_timeout. Do soft reset
otherwise. Introduce deferred work, enic_tx_hang_reset, to do hang reset.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 10:02:31 +0000 (15:32 +0530)]
enic: handle spurious error interrupt
Some of the enic adaptors are know to generate spurious interrupts. When
error interrupt is generated, driver just resets the device. This patch
resets the device only when an error is occurred.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/built-in.o: In function `.vnic_wq_devcmd2_alloc':
(.text+0x49fe40): multiple definition of `.vnic_wq_devcmd2_alloc'
drivers/scsi/built-in.o:(.text+0xb4318): first defined here
drivers/net/built-in.o:(.opd+0x2af00): multiple definition of `vnic_wq_devcmd2_alloc'
drivers/scsi/built-in.o:(.opd+0xad70): first defined here
drivers/net/built-in.o: In function `.vnic_wq_init_start':
(.text+0x49f9c0): multiple definition of `.vnic_wq_init_start'
drivers/scsi/built-in.o:(.text+0xb3b58): first defined here
drivers/net/built-in.o:(.opd+0x2ae88): multiple definition of `vnic_wq_init_start'
drivers/scsi/built-in.o:(.opd+0xace0): first defined here
Rename these to 'enic_*' to avoid the conflict with the functiosn of
the same name in the snic scsi driver.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:55:27 +0000 (15:25 +0530)]
enic: add devcmd2
devcmd is an interface for driver to communicate with fw/adaptor. It
involves writing data to hardware registers and waiting for the result.
This mechanism does not scale well. The queuing of "no wait" devcmds is
done in firmware memory rather than on the host. Firmware memory is a
rather more scarce and valuable resource than host memory. A devcmd storm
from one vf can disrupt the service on other pf/vf. The lack of flow
control allows for possible denial of server from one VM to another.
Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
allows better flow control.
Initialize devcmd2, if fails we fall back to devcmd1.
Also change the driver version.
Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:49:59 +0000 (15:19 +0530)]
enic: add devcmd2 resources
Add devcmd resources to vnic_res_type. Add data types used by devcmd.
Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:47:36 +0000 (15:17 +0530)]
enic: use netdev_<foo> or dev_<foo> instead of pr_<foo>
pr_info does not give any details about the interface involved. This patch
uses netdev_info for printing the message. Use dev_info where netdev is not
ready.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:46:19 +0000 (15:16 +0530)]
enic: move struct definition from .c to .h file
Some of the structure definitions are in .c file to make them private to
that file. This patch moves the struct definition to .h file, So that their
definitions are accessible from other files.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:44:45 +0000 (15:14 +0530)]
enic: allow adaptive coalesce setting for msi/legacy intr
* Allow setting of adaptive coalescing setting for all types of interrupt.
* In msi & legacy intr, we use single interrupt for rx & tx. In this case
tx_coalesce_usecs is invalid. We should use only rx_coalesce_usecs.
Do not display tx_coal values for msi/intx. And do not allow user to set
this as well.
* Driver supports only tx/rx_coalesce_usec and adaptive coalesce settings.
For other values, driver does not return error. So ethtool succeeds for
unsupported values. Introduce enic_coalesce_valid() function to validate
the coalescing values.
* If user requests for coalesce value greater than what adaptor supports,
driver uses the max value. We should at least log this.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:42:52 +0000 (15:12 +0530)]
enic: add adaptive coalescing intr for intx and msi poll
Adaptive interrupt coalescing is available for msix. This patch adds the support
for msi poll. Interface for adaptive interrupt coalescing is already added in
driver. We just did not enable it for legacy intr & msi.
enic_calc_int_moderation() & enic_set_int_moderation() are defined as static
after enic_poll. Since enic_poll needs it, move both of these function
definitions above enic_poll. No change in functionality.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:32:18 +0000 (15:02 +0530)]
enic: fix issues in enic_poll
In enic_poll, we clean tx and rx queues, when low latency busy socket polling
is happening, enic_poll will only clean tx queue. After cleaning tx, it should
return total budget for re-poll.
There is a small window between vnic_intr_unmask() and enic_poll_unlock_napi().
In this window if an irq occurs and napi is scheduled on different cpu, it tries
to acquire enic_poll_lock_napi() and fails. Unlock napi_poll before unmasking
the interrupt.
v2:
Do not change tx wonk done behaviour. Consider only rx work done for completing
napi.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:27:34 +0000 (14:57 +0530)]
enic: use atomic_t instead of spin_lock in busy poll
We use spinlock to access a single flag. We can avoid spin_locks by using
atomic variable and atomic_cmpxchg(). Use atomic_cmpxchg to set the flag
for idle to poll. And a simple atomic_set to unlock (set idle from poll).
In napi poll, if gro is enabled, we call napi_gro_receive() to deliver the
packets. Before we call napi_complete(), i.e while re-polling, if low
latency busy poll is called, we use netif_receive_skb() to deliver the packets.
At this point if there are some skb's held in GRO, busy poll could deliver the
packets out of order. So we call napi_gro_flush() to flush skbs before we
move the napi poll to idle.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Tue, 13 Oct 2015 09:21:06 +0000 (14:51 +0530)]
drivers/net: remove all references to obsolete Ethernet-HOWTO
This howto made sense in the 1990s when users had to manually configure
ISA cards with jumpers or vendor utilities, but with the implementation
of PCI it became increasingly less and less relevant, to the point where
it has been well over a decade since I last updated it. And there is
no value in anyone else taking over updating it either.
However the references to it continue to spread as boiler plate text
from one Kconfig file into the next. We are not doing end users any
favours by pointing them at this old document, so lets kill it with
fire, once and for all, to hopefully stop any further spread.
No code is changed in this commit, just Kconfig help text.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Mon, 12 Oct 2015 07:47:21 +0000 (03:47 -0400)]
be2net: remove vlan promisc capability from VF's profile descriptors
The commit 435452aa8847 ("Prevent VFs from enabling VLAN promiscuous mode")
fixed the PF driver to not include the VLAN promisc capability while
provisioning the interface for a VF. But the fix did not remove this
capability from the profile descriptor of the VF. This causes the VF
driver to request this capability when it tries to create it's interface
at probe time. This could potentailly cause the VF probe to fail if the
FW enforces strict checking of the flags based on what was provisoned
by the PF. This strict checking is not being done by FW currently but
will be fixed in a future version. This patch fixes this issue by updating
the VF's profile descriptor so that they match the interface capability
flags provisioned by the PF.
Fixes: 435452aa8847 ("Prevent VFs from enabling VLAN promiscuous mode") Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Somnath Kotur [Mon, 12 Oct 2015 07:47:20 +0000 (03:47 -0400)]
be2net: set pci_func_num while issuing GET_PROFILE_CONFIG cmd
The FW requires the pf_num field in the cmd hdr to be set for it to return
the specific function's descriptors in the GET_PROFILE_CONFIG cmd. If not
set, the FW returns the descriptors of all the functions on the device.
If the first descriptor is not what is being queried for, the driver will
read wrong data. This patch fixes this issue by using the GET_CNTL_ATTRIB
cmd to query the real pci_func_num of a function and then uses it in the
GET_PROFILE_CONFIG cmd.
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Suresh Reddy [Mon, 12 Oct 2015 07:47:19 +0000 (03:47 -0400)]
be2net: pad skb to meet minimum TX pkt size in BE3
On BE3 chips in SRIOV configs, the TX path stalls when a packet less
than 32B is received from the host. A workaround to pad such packets
already exists for the Skyhawk and Lancer chips. Use the same workaround
for BE3 chips too.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Mon, 12 Oct 2015 07:47:18 +0000 (03:47 -0400)]
be2net: release mcc-lock in a failure case in be_cmd_notify_wait()
The mcc/mbox lock is not being released when be_cmd_copy() returns
an error.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Mon, 12 Oct 2015 07:47:17 +0000 (03:47 -0400)]
be2net: fix BE3-R FW download compatibility check
In the BE3 FW image, unlike Skyhawk's, the "asic_type_rev" field doesn't
track the asic_rev of chip it is compatible with. When asic_type_rev
is 0 the image is compatible only with pre-BE3-R chips (asic_rev < 0x10).
Fix the current compatibility check to take care of this.
We hit this issue when we try to flash old BE3 images (used prior to the
release of BE3-R) on pre-BE3-R adapters.
Fixes: a6e6ff6eee12f3e ("be2net: simplify UFI compatibility checking") Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Ivan Vecera [Fri, 14 Aug 2015 20:30:01 +0000 (22:30 +0200)]
be2net: avoid vxlan offloading on multichannel configs
VxLAN offloading is not functional if the NIC is running in multichannel
mode (UMC, FLEX-10, VNIC...). Enabling this additionally kills whole
connectivity through the NIC and the device needs to be down and up to
restore it. The firmware should take care about it and does not allow
the conversion of interface to tunnel type (be_cmd_manage_iface) or should
support VxLAN offloading if multichannel config is enabled.
I have tested this on the latest available firmware (10.6.144.21).
Result:
[root@sm-04 ~]# ip link set enp5s0f0 up[root@sm-04 ~]# ip addr add 172.30.10.50/24 dev enp5s0f0
[root@sm-04 ~]# ping -c 3 172.30.10.254PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.317 ms
64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.187 ms
64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.188 ms
--- 172.30.10.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.187/0.230/0.317/0.063 ms
[root@sm-04 ~]# ip link add link enp5s0f0 vxlan10 type vxlan id 10 remote 172.30.10.60 dstport 4789
[root@sm-04 ~]# ip link set vxlan10 up
[ 7900.442811] be2net 0000:05:00.0: Enabled VxLAN offloads for UDP port 4789
[ 7900.455722] be2net 0000:05:00.1: Enabled VxLAN offloads for UDP port 4789
[ 7900.468635] be2net 0000:05:00.2: Enabled VxLAN offloads for UDP port 4789
[ 7900.481553] be2net 0000:05:00.3: Enabled VxLAN offloads for UDP port 4789
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
[root@sm-04 ~]# ip link set vxlan10 down
[ 7959.434093] be2net 0000:05:00.0: Disabled VxLAN offloads for UDP port 4789
[ 7959.444792] be2net 0000:05:00.1: Disabled VxLAN offloads for UDP port 4789
[ 7959.455592] be2net 0000:05:00.2: Disabled VxLAN offloads for UDP port 4789
[ 7959.466416] be2net 0000:05:00.3: Disabled VxLAN offloads for UDP port 4789
[root@sm-04 ~]# ip link del vxlan10
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
[root@sm-04 ~]# ip link set enp5s0f0 down
[root@sm-04 ~]# ip link set enp5s0f0 up
[ 8071.019003] be2net 0000:05:00.0 enp5s0f0: Link is Up
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.318 ms
64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.196 ms
64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.194 ms
--- 172.30.10.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.194/0.236/0.318/0.057 ms
Cc: Sathya Perla <sathya.perla@avagotech.com> Cc: Ajit Khaparde <ajit.khaparde@avagotech.com> Cc: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Ajit Khaparde <ajit.khaparde@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Wed, 5 Aug 2015 07:27:50 +0000 (03:27 -0400)]
be2net: protect eqo->affinity_mask from getting freed twice
There are paths in the driver such as an unrecoverable error (UE) detection
followed by a driver unload wherein be_clear() is invoked twice.
Individual data structures are reset so that they are not cleaned/freed
twice. This patch does the same for eqo->affinity_mask. It is freed only
if EQs haven't yet been destroyed. This fixes a possible crash when
affinity_mask is freed twice.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Wed, 5 Aug 2015 07:27:49 +0000 (03:27 -0400)]
be2net: post buffers before destroying RXQs in Lancer
An RX stall issue was seen on Lancer adapters, when RXQs are destroyed
while they are in an "out of buffer" state. This patch fixes this issue
by posting 64 buffers to each RXQ before destroying them in the close path.
This is done after ensuring that no more new packets are selected for
transfer to the RXQs by disabling interface filters.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Wed, 5 Aug 2015 07:27:48 +0000 (03:27 -0400)]
be2net: enable IFACE filters only after creating RXQs
HW issues were observed on Lancer adapters if IFACE filters
(flags, mac addrs etc) are enabled *before* creating RXQs. This patch
changes the driver design by enabling filters in be_open() --
instead of be_setup() -- after RXQs are created and buffers posted.
Two new wrapper functions, be_enable_if_filters() and
be_disable_if_filters() are introduced to enable/disable IFACE filters in
be_open()/be_close() respectively. In be_setup() the IFACE is now created
only with the RSS flag.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds vxlan offload specific counters to ethtool stats. We
provide tx/rx queue counters to show the number of vxlan offload pkts
sent/received.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add be_get_phys_port_id() function to report physical port id. The port id
should be unique across different be2net devices in the system. We use the
chip serial number along with the physical port number for this.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The SET_LOOPBACK_MODE command is always issued from ethtool only in a
process context. So, while waiting for the cmd to complete, the driver
can sleep instead of holding spin_lock_bh() on the mcc_lock. This is done
by calling be_mcc_notify() instead of be_mcc_notify_wait() (that returns
only after the cmd completes while the MCCQ is locked).
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the adapter is in error state, return error from be_mcc_notify()
so that the caller routines need not sleep waiting for a response.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: convert dest field in udp-hdr to host-endian
The "dest" field in the UDP-hdr of a TX skb is in network endian format.
Convert it to host endian before accessing it. The os2bmc patch,
mentioned below introduced this code.
Fixes: 760c295e0e8d ("be2net: Support for OS2BMC") Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: fix wrong return value in be_check_ufi_compatibility()
In the commit a6e6ff6eee12f3e
("be2net: simplify UFI compatibility checking"), a return value of "-1"
was incorrectly used in place of "false". This patch fixes it.
Fixes: a6e6ff6eee12f3e ("be2net: simplify UFI compatibility checking") Signed-off-by: Vasundhara Volam <vasundhara.volam@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
pci_enable_device() call sets device power state to D0; there is no need
doing it again.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The current code assumes that bridge functionality (EVB) in the adapter
is enabled only when SR-IOV is enabled. This is not always true.
This patch uses the GET_HSW_CONFIG FW cmd to query this from the FW.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This change will make be_setup_wol() routine more compact and readable
by removing some duplicate code.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 18 May 2015 21:06:45 +0000 (23:06 +0200)]
be2net: make hwmon interface optional
The hwmon interface in the be2net driver causes a link error when
be2net is built-in while the hwmon subsystem is a loadable module:
drivers/built-in.o: In function `be_probe':
drivers/net/ethernet/emulex/benet/be_main.c:5761: undefined reference to `devm_hwmon_device_register_with_groups'
This adds a new Kconfig symbol, following the example of multiple
other drivers that have the same problem. The new CONFIG_BE2NET_HWMON
will not be available when (BE2NET=y && HWMON=m) to avoid this
problem.
We have to also mark be_hwmon_show_temp as 'static' to ensure the
compiler can optimize out all the unused code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 29e9122b3a ("be2net: Export board temperature using hwmon-sysfs interface.") Signed-off-by: David S. Miller <davem@davemloft.net>
Venkata Duvvuru [Wed, 13 May 2015 07:30:14 +0000 (13:00 +0530)]
be2net: Support for OS2BMC.
OS2BMC feature will allow the server to communicate with the on-board
BMC/idrac (Baseboard Management Controller) over the LOM via
standard Ethernet.
When OS2BMC feature is enabled, the LOM will filter traffic coming
from the host. If the destination MAC address matches the iDRAC MAC
address, it will forward the packet to the NC-SI side band interface
for iDRAC processing. Otherwise, it would send it out on the wire to
the external network. Broadcast and multicast packets are sent on the
side-band NC-SI channel and on the wire as well. Some of the packet
filters are not supported in the NIC and hence driver will identify
such packets and will hint the NIC to send those packets to the BMC.
This is done by duplicating packets on the management ring. Packets
are sent to the management ring, by setting mgmt bit in the wrb header.
The NIC will forward the packets on the management ring to the BMC
through the side-band NC-SI channel.
Please refer to this online document for more details,
http://www.dell.com/downloads/global/products/pedge/
os_to_bmc_passthrough_a_new_chapter_in_system_management.pdf
Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Venkata Duvvuru [Wed, 13 May 2015 07:30:13 +0000 (13:00 +0530)]
be2net: Report a "link down" to the stack when a fatal error or fw reset happens.
When an error (related to HW or FW) is detected on a function, the driver
must pro-actively report a "link down" to the stack so that a possible
failover can be initiated. This is being done currently only for some
HW errors. This patch reports a "link down" even for fatal FW errors and
EEH errors.
Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Venkata Duvvuru [Wed, 13 May 2015 07:30:12 +0000 (13:00 +0530)]
be2net: Export board temperature using hwmon-sysfs interface.
Ethtool statistics is not the right place to display board temperature.
This patch adds support to export die temperature of devices supported
by be2net driver via the sysfs hwmon interface.
Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Wed, 6 May 2015 09:30:39 +0000 (05:30 -0400)]
be2net: update copyright year to 2015
Signed-off-by: Vasundhara Volam <vasundhara.volam@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Wed, 6 May 2015 09:30:37 +0000 (05:30 -0400)]
be2net: simplify UFI compatibility checking
The code in be_check_ufi_compatibility() checks to see if a UFI file meant
for a lower rev of a chip is being flashed on a higher rev, which is
disallowed. This patch re-writes the code needed for this check in a much
simpler manner.
Signed-off-by: Vasundhara Volam <vasundhara.volam@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Wed, 6 May 2015 09:30:36 +0000 (05:30 -0400)]
be2net: post full RXQ on interface enable
When an RXQ is created in be_open(), the driver currently posts only
64 buffers. This sometimes results in packet drops when there is a traffic
burst as soon as the interface is enabled.
This patch fixes this problem by posting the full RXQ on interface enable.
Signed-off-by: Suresh Reddy <Suresh.Reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Wed, 6 May 2015 09:30:35 +0000 (05:30 -0400)]
be2net: check for INSUFFICIENT_VLANS error
When the FW runs out of vlan filters it can either return an
INSUFFICIENT_RESOURCES error or an INSUFFICIENT_VLANS error.
The driver currently checks only for the former error value.
This patch adds a check for the latter value too.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Somnath Kotur [Wed, 6 May 2015 09:30:34 +0000 (05:30 -0400)]
be2net: receive pkts with L3, L4 errors on VFs
Currently pkts with L3 or L4 errors received on PFs are not dropped
by the adapter, but instead sent to the stack. This helps the network stack
to better reflect error statistics. This was not being done on BE3 VFs.
This patch fixes this for BE3 VFs.
Signed-off-by: Somnath Kotur <somnath.kotur@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Padmanabh Ratnakar [Wed, 6 May 2015 09:30:33 +0000 (05:30 -0400)]
be2net: set interrupt moderation for Skyhawk-R using EQ-DB
Currently adaptive interrupt moderation is set by calculating
and configuring an EQ-delay every second. This is done via
a FW-cmd. But, on Skyhawk-R a "re-arm to interrupt" delay
can be set while ringing the EQ-DB. This patch uses this
facility to calculate and set the interrupt delay every 1ms.
This helps moderating interrupts better when the traffic
is bursty.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Wed, 6 May 2015 09:30:32 +0000 (05:30 -0400)]
be2net: add support for spoofchk setting
This patch adds support for spoofchk configuration for VFs.
When it is enabled, "spoof checking" is done for both MAC-address and VLAN.
For each VF, the HW ensures that the source MAC address (or vlan) of
every outgoing packet exists in the MAC-list (or vlan-list) configured
for RX filtering for that VF. If not, the packet is dropped and an error
is reported to the driver in the TX completion; this is reflected in the
"tx_spoof_check_err" ethtool counter.
This feature is supported in Skyhawk FW version 10.6.31.0 and above.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
On certain conditions, login failures will just invoke
qla2x00_mark_device_lost() with the intend to do login again;
but if login_retry has been set already, that would fail to set the
relogin needed flag which is required to wakeup the DPC to retry.
Signed-off-by: Arun Easi <arun.easi@qlogic.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
If an SRB is NULL but the handle is in range just drop the
command instead of also resetting the adapter. If the handle
is in range then the command was valid at some point and may
have been aborted. Resetting the adapter can lead to extended
recovery times in this case.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
Aovid crashing the system in the scenario where firmware
just completes the command and it can not find the command
during abort mailbox processing. This scenario can lead to
sp reference counter being zero. Instead of crashing the
system, use WARN_ON to print warning in log file.
Signed-off-by: Hiral Patel <hiral.patel@qlogic.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
The "return QLA_SUCCESS" statement just above the "fw_load_failed"
label cannot be reached, hence remove it. Additionally remove the
"else" keyword since the code block below the if-statement ends
with a return statement.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
Whether htonl() or __constant_htonl() is used, if the argument
is a constant the conversion happens at compile time. Hence leave
out the __constant_ prefix for this and other endianness
conversion functions. This improves source code readability.
[jejb: checkpatch fixes] Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
Replace the QLA82XX_ADDR_IN_RANGE() and QLA8044_ADDR_IN_RANGE() macros
with the inline function addr_in_range(). This avoids that the compiler
reports the following warning when building with W=1: comparison of
unsigned expression >= 0 is always true.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
Jana Iyengar found an interesting issue on CUBIC :
The epoch is only updated/reset initially and when experiencing losses.
The delta "t" of now - epoch_start can be arbitrary large after app idle
as well as the bic_target. Consequentially the slope (inverse of
ca->cnt) would be really large, and eventually ca->cnt would be
lower-bounded in the end to 2 to have delayed-ACK slow-start behavior.
This particularly shows up when slow_start_after_idle is disabled
as a dangerous cwnd inflation (1.5 x RTT) after few seconds of idle
time.
Jana initial fix was to reset epoch_start if app limited,
but Neal pointed out it would ask the CUBIC algorithm to recalculate the
curve so that we again start growing steeply upward from where cwnd is
now (as CUBIC does just after a loss). Ideally we'd want the cwnd growth
curve to be the same shape, just shifted later in time by the amount of
the idle period.
Reported-by: Jana Iyengar <jri@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Sangtae Ha <sangtae.ha@gmail.com> Cc: Lawrence Brakmo <lawrence@brakmo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 30927520dbae297182990bb21d08762bcc35ce1d)