In qed code, sb_id means 2 different things:
- An interrupt vector [usually when received as a parameter from
a protocol driver, but not only] that's associated with a status
block.
- An index to a status block entity existing in HW.
This patch renames the references to the HW entity, adding an 'igu_'
prefix to allow an easier distinction.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit d031548e9194714dc2e8cb928d9f671432c8a342 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
As a first step for relaxing various assumptions done by driver
about the IGU mapping, the driver is now going to read the entire
IGU into a shadow copy, and mark in its database each status block
that's relevant for it.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit d749dd0dc117e7b02fa3a169c431476d59d18950 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Current implementation lacks the logic for providing management
firmware with RDMA-related statistics; [much] worse than that -
it logs such events by default to system logs.
Since the statistics' gathering is done periodically, using sufficiently
new management firmware the system logs would get filled with these
unnecessary prints.
For now, reduce the verbosity of the log so that it would not be
logged by default.
Fixes: 6c75424612a7 ("qed: Add support for NCSI statistics") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 512c7840cd692fdac0333684249753ebf3c819f9 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Parities might exhibit a flood behavior since we re-enable the
attention line without preventing the parity from re-triggering the
assertion.
Mask the source in AEU until the parity would be handled.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 9790c35e9682e0e158653108cc6950f2be196c80 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
In strucuture reflecting the AEU hw block some entries
represent multiple HW bits, and the associated name is in fact
a pattern.
Today, whenever such an attention would be asserted the resulted
prints would show the pattern string instead of indicating which
of the possible bits was set.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 6010179da3a27f4622eb40a731337fbdb8bbc713 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
There are 4 attention bits in AEU that have different meaning
for QL45xxx and QL41xxx adapters.
Instead of doing a massive infrastructure change in favor of these
bits, we implement a point fix where only those four would change
meaning dependent on the adapter involved.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit ba36f718c7fedbf0b083faec5e3606d98b846cb7 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
We have almost all the necessary information regarding attentions
in the logic employed for taking register dumps.
Add some more and get rid of the seperate implementation we have today
for identifying & printing various attention sources.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 0ebbd1c8d9424a341a21eb18170f4eff1f1f0670 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The QL41xxx adapters' PCI allows a single configuration for the
MSI-x table size of all child VFs of a given PF.
The existing code wouldn't cause the management firmware to set
that value, meaning the VFs would retain the default MSI-x table
size.
Introduce a new scheme so that whenever a VF is enabled, driver
would set the number of MSI-x to be the maximum over the various
VFs' needs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 88072fd4002a9976063d8f2babd3d030bd6ae0f9 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Older firmware used by device didn't distinguish between RoCE and RoCE
V2 from DCBx configuration perspective, and as a result we've used to
take a the RoCE-related configuration and apply to it for both.
Since we now support configuring each its own values, there's no reason
to reflect [& configure] that both are using the same.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 38b23e43ee6f0903de989913884a2142bf8b3d7c ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Today, driver has a synchronization point while closing
the device which synchronizes its slowpath interrupt line.
However, that's insufficient as that ISR would schedule the
slowpath-tasklet - so even after ISR is over it's possible the
handling of the interrupt has not completed.
By doing a disable/enable on the taskelt we guarantee that all
HW events that should no longer be genereated from that point
onward in the flow are truly behind us.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 06892f2ea2bd6b146707e4ab367aa5b20eac0ba7 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Solves the following warning in qede -
- Several cases of missing cpu_to_le16() conversions
- Adds 'static' to one function declaration
- Removes dcbnl operation that's currently getting populated twice
Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 48848a0690a36d0248255f6c3b7b6fd2a9948a57 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The management firmware HSI contains masks which are already
shifted to their right place, so QED_MFW_SET_FIELD() is clearing
incorrect fields by shifting the mask by the offset.
Luckily, today we set the fields in an incrementing order [so we're
not erasing any previously set fields], but this still needs fixing.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit b19601bbf1a1a230beb35ea77acbbfb5bbf542fa ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
If too many CQs are requested, qed would print the available
number as if it's a resource and not a feature leading to the
wrong print.
Fixes: 08737a3fa30a ("qed: Inform qedi the number of possible CQs") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 88fa95278503523df5fbb18b4e98526e61e13218 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Driver maintains its primary MAC in a private field which
gets updated when ndo_dev_set_mac() gets called.
However, there are flows where the primary MAC of the device can change
without said NDO being called [bond device in TLB mode configuring
slaves' addresses], resulting in a configuration where there's a mismatch
between what's apparent to user [the netdevice's value] and what's
configured in the HW [the private value].
As we don't have any real motivation of maintaining this
private field, simply remove it and start using the netdevice's
field instead.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 492a1d9811cbd17c833bd0af18bfaff00cd3ac85 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
When management firmware declares that the device is WoL-capable,
the default driver behavior would be to allow the management firmware
to take the decision of whether it's actually needed or not.
Problem is ethtool interface doesn't have a 'default' kind
of option, and user would see the interface WoL as disabled,
which doesn't accurately reflect the actual configuration.
More-so, if the user actually wants to explicitly disable WoL he'd have
to first enable it [otherwise ethtool would block the command].
Instead of allowing management to make the decision, enable WoL by
default on all devices capable of it.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit ba798b5b6d067baa7ca7be3cdfd1f37a89da873f ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This pushes qed [and as result, all qed* drivers] into using 8.20.0.0
firmware. The changes are mostly contained in qed with minor changes
to qedi due to some HSI changes.
Content-wise, the firmware contains fixes to various issues exposed
since the release of the previous firmware, including:
- Corrects iSCSI fast retransmit when data digest is enabled.
- Stop draining packets when receiving several consecutive PFCs.
- Prevent possible assertion when consecutively opening/closing
many connections.
- Prevent possible assertion due to too long BDQ fetch time.
In addition, the new firmware would allow us to later add iWARP support
in qed and qedr.
Changes from previous version
-----------------------------
- V2: Fix warning in qed_debug.c
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com> Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 7b6859fbdcc4a590c8ef03bcc00d770b42d41c42 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Current memset is using incorrect type of variable, causing the
upper-half of the strucutre to be left uninitialized and causing:
ethernet/qlogic/qed/qed_init_fw_funcs.c: In function 'qed_set_rfs_mode_disable':
ethernet/qlogic/qed/qed_init_fw_funcs.c:993:3: error: '*((void *)&ramline+4)' is used uninitialized in this function [-Werror=uninitialized]
PFs and VFs share the same structure of NDOs today,
and the VFs explicitly fails the ndo_xdp() callback stating
it doesn't support XDP.
This results in lots of:
[qede_xdp:1032(enp131s2)]VFs don't support XDP
------------[ cut here ]------------
WARNING: CPU: 4 PID: 1426 at net/core/rtnetlink.c:1637 rtnl_dump_ifinfo+0x354/0x3c0
...
Call Trace:
? __alloc_skb+0x9b/0x1d0
netlink_dump+0x122/0x290
netlink_recvmsg+0x27d/0x430
sock_recvmsg+0x3d/0x50
...
As every dump request for the VF interface info would fail due to
rtnl_xdp_fill() returning an error code.
To resolve this, introduce a subset of the NDOs meant for the VF
in a seperate structure and register that one instead for VFs,
and omit the ndo_xdp initialization.
Fixes: 40b8c45492ef ("qede: Prevent VFs from using XDP") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit be47c5555778fa3354950731023deb034a9e445e ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
When configuring the doorbell DPI address, driver aligns the start
address to 4KB [HW-pages] instead of host PAGE_SIZE.
As a result, RoCE applications might receive addresses which are
unaligned to pages [when PAGE_SIZE > 4KB], which is a security risk.
Fixes: 51ff17251c9c ("qed: Add support for RoCE hw init") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit a82dadbce47395747824971db08a128130786fdc ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Driver doesn't pass the number of tasks to the QM init logic
which would cause back-pressure in scenarios requiring many tasks
[E.g., using max MRs] and thus reduced performance.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit c9f0523bb3d1e70fbfd3245842de855096194925 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Driver currently uses advertised-autoneg value to populate the
supported-autoneg field. When advertised field is updated, user gets
the same value for supported field. Supported-autoneg value need to be
populated from the link capabilities value returned by the MFW.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 34f9199ce7b7e5c641b96e928bd60e086bf7f278 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
PTP hardware filter configuration performed by the driver for a given
user requested config is not correct for some of the PTP modes.
Following changes are needed for PTP config-filter implementation.
1. NIG_REG_TX_PTP_EN register - Bits 0/1/2 respectively enables
TimeSync/"V1 frame format support"/"V2 frame format support" on
the TX side. Set the associated bits based on the user request.
2. ptp4l application fails to operate in Peer Delay mode. Following
changes are needed to fix this,
a. Driver should enable (set to 0) DA #1-related bits for IPv4,
IPv6 and MAC destination addresses in these registers:
NIG_REG_TX_LLH_PTP_RULE_MASK
NIG_REG_LLH_PTP_RULE_MASK
b. NIG_REG_LLH_PTP_PARAM_MASK/NIG_REG_TX_LLH_PTP_PARAM_MASK should
be set to 0x0 in all modes.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 8d3f87d8cd0a16c58ae7e4410938528866c1c0db ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
After removing the PTP related initialization from slowpath start,
the remaining PTT entry is required only in case CONFIG_RFS_ACCEL is set.
Otherwise, it leads to a warning due to it being unused.
Fixes: d179bd1699fc ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 07ff2ed03bb874a5bb97361a5a07ee28f1afa574 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
When calculating doorbell BAR partitioning round up the number of
CPUs to the nearest power of 2 so the size of the DPI (per user
section) configured in the hardware will be stored properly and
not truncated.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 107392b75ffc96a2418d5382e52b08c598575e1b ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The patch adds necessary changes to the driver to use qed resource
locking functionality. Currently the ptp initialization is spread
between driver probe/open implementations, associated APIs are
qede_ptp_register_phc()/qede_ptp_start(). Clubbed this functionality
into single API qed_ptp_enable() to simplify the usage of qed resource
locking implementation. The new API will be invoked in the probe path.
Similarly the ptp clean-up code is moved to qede_ptp_disable() which
gets invoked in the driver unload path.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 035744975aecf9b8e02920d93027a432c51062d1 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The patch adds support for per-port resource lock in favour of PTP.
PTP module acquires/releases the MFW resource lock while enabling/
disabling the PTP on the interface. The PF instance which has the
ownership of this resource lock will get the exclusive access to the
PTP clock functionality on the port.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit db82f70e4c3e0901ba1e5c0eecbd913133261985 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch adds hardware channel APIs support between
VF and PF for tunnelling configuration for the VFs.
According to that configuration VFs can run VXLAN/GENEVE/GRE
tunnels over it with tunnel features offloaded.
Using these APIs VF can also request for UDP ports configuration
to the PF, although PF and it's child VFs share the same port.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit eaf3c0c6b4e307e5c7e6cbeb8c5a17be7feee249 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch configures UDP ports locally instead of
configuring them in deferred context which would be
helpful in synchronizing UDP ports configuration for VFs
which will be enabled in further patches.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 327a2b750c486c8e8f390dcff888881ad54d2f23 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch disables tunnel offloads via ndo_features_check()
if given UDP port is not offloaded to hardware. This in turn
allows to run multiple tunnel interfaces using different UDP ports.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 369bfd4ec77f1668e48d395e95849d29fccaa4c3 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch changes the tunnel APIs to use per tunnel
info instead of using bitmasks for all tunnels and also
uses single struct to hold the data to prepare multiple
variant of tunnel configuration ramrods to be sent to the hardware.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 1996843012629825e4a2c339fedef1f7eade87bc ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The patch adds driver support for static/local dcbx mode. In this mode
adapter brings up the dcbx link with locally configured parameters
instead of performing the dcbx negotiation with the peer. The feature
is useful when peer device/switch doesn't support dcbx.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 49632b5822ea2af0e9531f8d20dcd5fb786093a9 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
In the older firmware there was no distinction between RoCE and RoCEv2
whereas the newer firmware (8.15.3.0) allows us to configure each
independently. Driver need to populate the RoCEv2 data in its specific
structure.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 449ad505e9d2f420b7bf590a708c101ff587593e ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
qed_dcbnl_get_dcbx() API uses kmalloc in GFT_KERNEL mode. The API gets
invoked in the interrupt context by qed_dcbnl_getdcbx callback. Need
to invoke this kmalloc in atomic mode.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 62289ba27558553871fd047baadaaeda886c6a63 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
PFC error-mask value is not supported by MFW, but this bit could be
set in the pfc bit-map of the operational parameters if remote device
supports it. These operational parameters are used as basis for
populating the dcbx config parameters. User provided configs will be
applied on top of these parameters and then send them to MFW when
requested. Driver need to clear the error-mask bit before sending the
config parameters to MFW.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 6cf75f1cebb048cfc1424b4b8ac9bbc08d5f9f66 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch adds necessary APIs to interface with
qede aRFS support in successive patch.
It also reserves separate PTT entry for aRFS,
[as being in fastpath flow] for hardware access instead of
trying to acquire it at run time from the ptt pool.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit d51e4af5c2092c48a06ceaf2323b13a39a2df4ee ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Nick Alcock [Tue, 20 Jun 2017 18:27:37 +0000 (19:27 +0100)]
uek-rpm: build: sign modules in parallel
Assuming you have enough entropy, this gives a speedup in the
signing phase almost precisely proportional to the core count.
e.g. on a 20-core box (with %_smp_ncpus_max forced to 0 in
/etc/rpm/macros, but this will be irrelevant shortly, and even with the
present RPM configuration a 16-fold speedup can be seen):
When using the indirect buffers feature, 'desc' is allocated in
virtqueue_add() but isn't freed before leaving on a ring full error,
causing a memory leak.
For example, it seems rather clear that this can trigger
with virtio net if mergeable buffers are not used.
Cc: stable@vger.kernel.org Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 58625edf9e2515ed41dac2a24fa8004030a87b87) Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tomas Jedlicka [Fri, 28 Jul 2017 11:46:53 +0000 (07:46 -0400)]
dtrace: modules provide called from rcu atomic section
The per-module provide callback is called from within RCU read
critical section. This results of running proivder code in
atomic context which can cause troubles. Mainly due to sleeping
allocations calls in this code path.
Tomas Jedlicka [Thu, 9 Mar 2017 14:48:56 +0000 (09:48 -0500)]
dtrace: Implement high precision walltimestamp
There are lock-free implementations for other timers (mono & raw) but
lock-free access to realtime clock is missing. This patch allows DTrace
to provide CLOCK_REALTIME_COARSE time via walltimestamp without taking
any lock.
Michael S. Tsirkin [Wed, 29 Mar 2017 16:09:14 +0000 (19:09 +0300)]
virtio_net: clear MTU when out of range
virtio attempts to clear the MTU feature bit if the value is out of the
supported range, but this has no real effect since FEATURES_OK has
already been set.
Fix this up by checking the MTU in the new validate callback.
Fixes: 14de9d114a82 ("virtio-net: Add initial MTU advice feature") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit fe36cbe0671e868cbd2f534a50ac60273fa5acf2)
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/net/virtio_net.c
Due to the lack of commit d0c2c9973ecd ("net: use core MTU range
checking in virt drivers") the MTU size check is still done in the
virtio_net.
Michael S. Tsirkin [Wed, 8 Mar 2017 00:14:25 +0000 (02:14 +0200)]
virtio_net: enable big packets for large MTU values
If one enables e.g. jumbo frames without mergeable
buffers, packets won't fit in 1500 byte buffers
we use. Switch to big packet mode instead.
TODO: make sizing more exact, possibly extend small
packet mode to use larger pages.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/net/virtio_net.c
Due to the lack of commit d0c2c9973ecd ("net: use core MTU range
checking in virt drivers") the MTU size check is still done in the
virtio_net.
Michael S. Tsirkin [Wed, 29 Mar 2017 16:06:20 +0000 (19:06 +0300)]
virtio: allow drivers to validate features
Some drivers can't support all features in all configurations. At the
moment we blindly set FEATURES_OK and later FAILED. Support this better
by adding a callback drivers can use to do some early checks.
Aaron Conole [Fri, 3 Jun 2016 20:57:12 +0000 (16:57 -0400)]
virtio-net: Add initial MTU advice feature
This commit adds the feature bit and associated mtu device entry for the
virtio network device. When a virtio device comes up, it checks the
feature bit for the VIRTIO_NET_F_MTU feature. If such feature bit is
enabled, the driver will read the advised MTU and use it as the initial
value.
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 14de9d114a82a564b94388c95af79a701dc93134)
Jason Wang [Tue, 13 Dec 2016 06:23:05 +0000 (14:23 +0800)]
virtio-net: correctly enable multiqueue
Commit 4490001029012539937ff02778fe6180613fa949 ("virtio-net: enable
multiqueue by default") blindly set the affinity instead of queues
during probe which can cause a mismatch of #queues between guest and
host. This patch fixes it by setting queues.
Reported-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Theodore Ts'o <tytso@mit.edu> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Michael S. Tsirkin <mst@redhat.com> Fixes: 49000102901 ("virtio-net: enable multiqueue by default") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a220871be66f99d8957c693cf22ec67ecbd9c23a)
Jason Wang [Fri, 25 Nov 2016 04:37:26 +0000 (12:37 +0800)]
virtio-net: enable multiqueue by default
We use single queue even if multiqueue is enabled and let admin to
enable it through ethtool later. This is used to avoid possible
regression (small packet TCP stream transmission). But looks like an
overkill since:
- single queue user can disable multiqueue when launching qemu
- brings extra troubles for the management since it needs extra admin
tool in guest to enable multiqueue
- multiqueue performs much better than single queue in most of the
cases
So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable as much as queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.
Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Cc: Jeremy Eder <jeder@redhat.com> Cc: Marko Myllynen <myllynen@redhat.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4490001029012539937ff02778fe6180613fa949)
Eric Dumazet [Fri, 31 Jul 2015 16:25:17 +0000 (18:25 +0200)]
virtio_net: add gro capability
Straightforward patch to add GRO processing to virtio_net.
napi_complete_done() usage allows more aggressive aggregation,
opted-in by setting /sys/class/net/xxx/gro_flush_timeout
Tested:
Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec,
Rick Jones reported following results.
One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs
and Intel 82599ES-based NICs. So, two "before" and two "after" VMs.
The OpenStack compute nodes were running OpenStack Kilo, with VxLAN
encapsulation being used through OVS so no GRO coming-up the host
stack. The compute nodes themselves were running a 3.14-based kernel.
Single-stream netperf, CPU utilizations and thus service demands are
based on intra-guest reported CPU.
Throughput Mbit/s, bigger is better
Min Median Average Max
4.2.0-rc3+ 1364 1686 1678 1938
4.2.0-rc3+flush1k 1824 2269 2275 2647
Send Service Demand, smaller is better
Min Median Average Max
4.2.0-rc3+ 0.236 0.558 0.524 0.802
4.2.0-rc3+flush1k 0.176 0.503 0.471 0.738
Receive Service Demand, smaller is better.
Min Median Average Max
4.2.0-rc3+ 1.906 2.188 2.191 2.531
4.2.0-rc3+flush1k 0.448 0.529 0.533 0.692
Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Rick Jones <rick.jones2@hp.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0fbd050a7d262b74527a289ae75a33626d1060a8)
dtrace: fix lquantize for 32-bit overflow on values
Fix dtrace_aggregate_lquantize() so that it no longer truncates
value to or computes bin index in 32 bits. Linux bug is 26268136 dtrace_aggregate_lquantize() suffers from 32-bit overflow
It references a corresponding Solaris bug.
K. Den [Mon, 31 Jul 2017 16:05:39 +0000 (01:05 +0900)]
gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP
In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.
However, since b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.
This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.
Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO") Signed-off-by: Koichiro Den <den@klaipeden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 1bff8a0c1f8c236209ee369b7952751c04eaa71a) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
K. Den [Mon, 31 Jul 2017 16:05:20 +0000 (01:05 +0900)]
vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP
In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.
However, since b7fe10e5ebac("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.
This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.
Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO") Signed-off-by: Koichiro Den <den@klaipeden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit be73b3043bf465455d4c9b88f68e03b6447bcfb0) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Tom Herbert [Thu, 20 Aug 2015 00:07:34 +0000 (17:07 -0700)]
fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks
Do WARN_ON_ONCE instead of WARN_ON in gue_gro_receive when the offload
callcaks are bad (either don't exist or gro_receive is not specified).
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 270136613bf7306e2b83457628e2b2f6c6be3989) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Tom Herbert [Thu, 20 Aug 2015 00:07:33 +0000 (17:07 -0700)]
vxlan: GRO support at tunnel layer
Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel.
Testing:
Ran 200 netperf TCP_STREAM instance
- With fix (GRO enabled on VXLAN interface)
Verify GRO is happening.
9084 MBps tput
3.44% CPU utilization
- Without fix (GRO disabled on VXLAN interface)
Verified no GRO is happening.
9084 MBps tput
5.54% CPU utilization
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 58ce31cca1ffe057f4744c3f671e3e84606d3d4a) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Tom Herbert [Thu, 20 Aug 2015 00:07:32 +0000 (17:07 -0700)]
gro: Fix remcsum offload to deal with frags in GRO
The remote checksum offload GRO did not consider the case that frag0
might be in use. This patch fixes that by accessing headers using the
skb_gro functions and not saving offsets relative to skb->head.
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit b7fe10e5ebac2a3f37e95535e616494b65fa020f) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
NFSv4.1: Don't deadlock the state manager on the SEQUENCE status flags
As described in RFC5661, section 18.46, some of the status flags exist
in order to tell the client when it needs to acknowledge the existence of
revoked state on the server and/or to recover state.
Those flags will then remain set until the recovery procedure is done.
In order to avoid looping, the client therefore needs to ignore
those particular flags while recovering.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit 0a014a44a50839a8064618e959fae5bbc44c2fd5)
Trond Myklebust [Sun, 28 Aug 2016 14:28:25 +0000 (10:28 -0400)]
NFSv4.1: Defer bumping the slot sequence number until we free the slot
For operations like OPEN or LAYOUTGET, which return recallable state
(i.e. delegations and layouts) we want to enable the mechanism for
resolving recall races in RFC5661 Section 2.10.6.3.
To do so, we will want to defer bumping the slot's sequence number until
we have finished processing the RPC results.