When calculating doorbell BAR partitioning, round up the number of
CPUs to the nearest power of 2 so that the size of the DPI (per-user
section) configured in the hardware is stored properly and not
truncated.
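A minimal user-space sketch of the rounding step, assuming a CPU count between 1 and 2^31. The helper name round_up_pow2() is hypothetical; in-kernel code would normally use roundup_pow_of_two() from <linux/log2.h>.

```c
#include <stdint.h>

/*
 * Round n up to the nearest power of two (valid for 1 <= n <= 2^31).
 * Smears the highest set bit of (n - 1) into every lower bit, then
 * adds one.
 */
static uint32_t round_up_pow2(uint32_t n)
{
	n--;
	n |= n >> 1;
	n |= n >> 2;
	n |= n >> 4;
	n |= n >> 8;
	n |= n >> 16;
	return n + 1;
}
```

With this rounding, a 20-CPU system is partitioned as if it had 32 CPUs, while a 16-CPU system is left at 16.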
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 107392b75ffc96a2418d5382e52b08c598575e1b ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The patch adds the changes needed for the driver to use the qed resource
locking functionality. Currently the PTP initialization is spread
across the driver probe/open implementations, with the associated APIs
qede_ptp_register_phc()/qede_ptp_start(). This functionality is combined
into a single API, qed_ptp_enable(), to simplify the use of the qed
resource locking implementation. The new API is invoked in the probe path.
Similarly, the PTP clean-up code is moved to qede_ptp_disable(), which
is invoked in the driver unload path.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 035744975aecf9b8e02920d93027a432c51062d1 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The patch adds support for a per-port resource lock for PTP.
The PTP module acquires/releases the MFW resource lock while enabling/
disabling PTP on the interface. The PF instance that owns this
resource lock gets exclusive access to the PTP clock functionality
on the port.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit db82f70e4c3e0901ba1e5c0eecbd913133261985 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch adds hardware channel API support between
VF and PF for configuring tunnelling for the VFs.
Based on that configuration, VFs can run VXLAN/GENEVE/GRE
tunnels with tunnel features offloaded.
Using these APIs, a VF can also request UDP port configuration
from the PF, even though the PF and its child VFs share the same port.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit eaf3c0c6b4e307e5c7e6cbeb8c5a17be7feee249 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch configures UDP ports locally instead of
in a deferred context, which helps to synchronize
the UDP port configuration for VFs, which will be
enabled in later patches.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 327a2b750c486c8e8f390dcff888881ad54d2f23 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch disables tunnel offloads via ndo_features_check()
if the given UDP port is not offloaded to hardware. This in turn
allows running multiple tunnel interfaces that use different UDP ports.
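The check can be modelled in user-space C as below. This is a sketch under stated assumptions: FEAT_CSUM_OFFLOAD, FEAT_GSO_OFFLOAD, struct hw_udp_ports and tunnel_features_check() are hypothetical stand-ins for the netdev feature masks and the driver's record of offloaded ports, not the qede code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the netdev checksum/GSO masks. */
#define FEAT_CSUM_OFFLOAD (1u << 0)
#define FEAT_GSO_OFFLOAD  (1u << 1)

/* Hypothetical record of the UDP ports currently programmed into the NIC
 * (0 means "not configured"). */
struct hw_udp_ports {
	uint16_t vxlan_port;
	uint16_t geneve_port;
};

static bool port_offloaded(const struct hw_udp_ports *hw, uint16_t dport)
{
	return dport != 0 &&
	       (dport == hw->vxlan_port || dport == hw->geneve_port);
}

/* Keep tunnel offloads only when the outer UDP destination port of the
 * encapsulated packet is one the hardware knows about; otherwise fall
 * back to software checksum and segmentation. */
static uint32_t tunnel_features_check(const struct hw_udp_ports *hw,
				      uint16_t outer_udp_dport,
				      uint32_t features)
{
	if (!port_offloaded(hw, outer_udp_dport))
		features &= ~(FEAT_CSUM_OFFLOAD | FEAT_GSO_OFFLOAD);
	return features;
}
```

For example, with ports 4789 (VXLAN) and 6081 (GENEVE) offloaded, a second VXLAN interface on port 8472 still works, just without hardware offloads.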
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 369bfd4ec77f1668e48d395e95849d29fccaa4c3 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch changes the tunnel APIs to use per-tunnel
info instead of bitmasks covering all tunnels, and
uses a single struct to hold the data needed to prepare the
multiple variants of tunnel configuration ramrods sent to the hardware.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 1996843012629825e4a2c339fedef1f7eade87bc ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The patch adds driver support for static/local dcbx mode. In this mode
the adapter brings up the dcbx link with locally configured parameters
instead of performing dcbx negotiation with the peer. The feature
is useful when the peer device/switch doesn't support dcbx.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 49632b5822ea2af0e9531f8d20dcd5fb786093a9 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
In older firmware there was no distinction between RoCE and RoCEv2,
whereas newer firmware (8.15.3.0) allows each to be configured
independently. The driver needs to populate the RoCEv2 data in its own
structure.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 449ad505e9d2f420b7bf590a708c101ff587593e ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The qed_dcbnl_get_dcbx() API uses kmalloc in GFP_KERNEL mode, but the API
gets invoked in interrupt context by the qed_dcbnl_getdcbx callback. This
kmalloc therefore needs to use GFP_ATOMIC.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 62289ba27558553871fd047baadaaeda886c6a63 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The PFC error-mask value is not supported by the MFW, but this bit can be
set in the PFC bitmap of the operational parameters if the remote device
supports it. These operational parameters are used as the basis for
populating the dcbx config parameters; user-provided configs are
applied on top of them, and the result is sent to the MFW when
requested. The driver therefore needs to clear the error-mask bit before
sending the config parameters to the MFW.
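The order of operations described above can be sketched as plain bit manipulation. PFC_ERROR_MASK_BIT and its position are hypothetical (the real bit is defined by the qed/MFW headers), as is build_pfc_bitmap(); the point is only that the bit is cleared last, after user changes are layered on the operational bitmap.

```c
#include <stdint.h>

/* Hypothetical bit position; the real error-mask bit comes from the qed/MFW headers. */
#define PFC_ERROR_MASK_BIT (1u << 31)

/*
 * Build the PFC bitmap to send to the MFW: start from the operational
 * parameters, apply the user's changes, then drop the error-mask bit,
 * which the MFW does not support.
 */
static uint32_t build_pfc_bitmap(uint32_t oper_bitmap, uint32_t user_enable,
				 uint32_t user_disable)
{
	uint32_t cfg = oper_bitmap;

	cfg |= user_enable;
	cfg &= ~user_disable;
	cfg &= ~PFC_ERROR_MASK_BIT;
	return cfg;
}
```

Clearing the bit as the final step means it cannot leak through regardless of whether it arrived from the remote peer or from the user's overrides.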
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 6cf75f1cebb048cfc1424b4b8ac9bbc08d5f9f66 ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This patch adds the APIs needed to interface with
the qede aRFS support added in a subsequent patch.
It also reserves a separate PTT entry for aRFS hardware
access (since aRFS is in the fastpath flow) instead of
acquiring one at run time from the PTT pool.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit d51e4af5c2092c48a06ceaf2323b13a39a2df4ee ] Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Nick Alcock [Tue, 20 Jun 2017 18:27:37 +0000 (19:27 +0100)]
uek-rpm: build: sign modules in parallel
Assuming you have enough entropy, this gives a speedup in the
signing phase almost precisely proportional to the core count.
e.g. on a 20-core box (with %_smp_ncpus_max forced to 0 in
/etc/rpm/macros, but this will be irrelevant shortly, and even with the
present RPM configuration a 16-fold speedup can be seen):
When using the indirect buffers feature, 'desc' is allocated in
virtqueue_add() but isn't freed before returning on a ring-full error,
causing a memory leak.
For example, it seems rather clear that this can trigger
with virtio net if mergeable buffers are not used.
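A minimal user-space model of the leak and its fix. struct ring, alloc_table(), free_table() and ring_add() are hypothetical simplifications, not the real vring code; the live_tables counter exists only so the leak is observable. The key line is the free_table() call on the -ENOSPC path.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified ring; not the real struct vring_virtqueue. */
struct ring {
	unsigned int num_free;  /* free descriptor slots */
	void *pending;          /* indirect table held until completion (one outstanding buffer) */
};

static int live_tables;         /* indirect tables currently allocated */

static void *alloc_table(unsigned int n)
{
	void *p = calloc(n, 16);        /* a virtio descriptor is 16 bytes */

	if (p)
		live_tables++;
	return p;
}

static void free_table(void *p)
{
	if (p) {
		live_tables--;
		free(p);
	}
}

static int ring_add(struct ring *r, unsigned int total_sg)
{
	void *desc = NULL;
	unsigned int descs_used = total_sg;

	/* With an indirect table, the whole chain takes a single ring slot. */
	if (total_sg > 1) {
		desc = alloc_table(total_sg);
		if (desc)
			descs_used = 1;
	}

	if (r->num_free < descs_used) {
		free_table(desc);       /* the fix: release the table before bailing out */
		return -ENOSPC;
	}

	r->num_free -= descs_used;
	r->pending = desc;              /* ring owns the table until the device completes it */
	return 0;
}
```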
Cc: stable@vger.kernel.org Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 58625edf9e2515ed41dac2a24fa8004030a87b87) Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tomas Jedlicka [Fri, 28 Jul 2017 11:46:53 +0000 (07:46 -0400)]
dtrace: modules provide called from rcu atomic section
The per-module provide callback is called from within an RCU read-side
critical section. This results in provider code running in atomic
context, which can cause problems, mainly due to sleeping allocation
calls in this code path.
Tomas Jedlicka [Thu, 9 Mar 2017 14:48:56 +0000 (09:48 -0500)]
dtrace: Implement high precision walltimestamp
There are lock-free implementations for other timers (mono & raw) but
lock-free access to realtime clock is missing. This patch allows DTrace
to provide CLOCK_REALTIME_COARSE time via walltimestamp without taking
any lock.
Michael S. Tsirkin [Wed, 29 Mar 2017 16:09:14 +0000 (19:09 +0300)]
virtio_net: clear MTU when out of range
virtio attempts to clear the MTU feature bit if the value is out of the
supported range, but this has no real effect since FEATURES_OK has
already been set.
Fix this up by checking the MTU in the new validate callback.
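The timing matters because the feature bit can only be dropped before FEATURES_OK is written. A sketch of such a validate step follows; struct vdev_sketch and net_validate_sketch() are hypothetical, VIRTIO_NET_F_MTU is bit 3 in the virtio spec, and only a lower bound of 68 bytes (ETH_MIN_MTU) is checked here as a simplifying assumption.

```c
#include <stdint.h>

#define VIRTIO_NET_F_MTU 3      /* feature bit number from the virtio spec */
#define SKETCH_MIN_MTU   68     /* assumed lower bound (ETH_MIN_MTU) */

/* Hypothetical stand-in for struct virtio_device plus its config space. */
struct vdev_sketch {
	uint64_t features;      /* features offered, not yet accepted */
	uint16_t config_mtu;    /* MTU advertised in device config space */
};

/*
 * Runs before FEATURES_OK is written, so clearing the bit here actually
 * removes the feature from negotiation instead of being ignored.
 */
static int net_validate_sketch(struct vdev_sketch *vdev)
{
	if ((vdev->features & (1ULL << VIRTIO_NET_F_MTU)) &&
	    vdev->config_mtu < SKETCH_MIN_MTU)
		vdev->features &= ~(1ULL << VIRTIO_NET_F_MTU);
	return 0;
}
```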
Fixes: 14de9d114a82 ("virtio-net: Add initial MTU advice feature") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit fe36cbe0671e868cbd2f534a50ac60273fa5acf2)
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/net/virtio_net.c
Due to the lack of commit d0c2c9973ecd ("net: use core MTU range
checking in virt drivers") the MTU size check is still done in the
virtio_net driver.
Michael S. Tsirkin [Wed, 8 Mar 2017 00:14:25 +0000 (02:14 +0200)]
virtio_net: enable big packets for large MTU values
If one enables e.g. jumbo frames without mergeable
buffers, packets won't fit in the 1500-byte buffers
we use. Switch to big packet mode instead.
TODO: make sizing more exact, possibly extend small
packet mode to use larger pages.
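The resulting choice of receive mode can be sketched as below. The enum and pick_rx_mode() are hypothetical; the sketch assumes mergeable buffers take precedence when negotiated and that guest GSO features already forced big packets before this change.

```c
#include <stdbool.h>

#define SKETCH_ETH_DATA_LEN 1500   /* standard Ethernet payload size */

enum rx_mode { RX_SMALL, RX_BIG, RX_MERGEABLE };

/*
 * Pick the receive buffer mode. Mergeable buffers win when negotiated;
 * otherwise guest GSO or an MTU above 1500 needs big packets, because
 * such a frame would not fit in a small 1500-byte buffer.
 */
static enum rx_mode pick_rx_mode(bool mrg_rxbuf, bool guest_gso,
				 unsigned int mtu)
{
	if (mrg_rxbuf)
		return RX_MERGEABLE;
	if (guest_gso || mtu > SKETCH_ETH_DATA_LEN)
		return RX_BIG;
	return RX_SMALL;
}
```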
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/net/virtio_net.c
Due to the lack of commit d0c2c9973ecd ("net: use core MTU range
checking in virt drivers") the MTU size check is still done in the
virtio_net driver.
Michael S. Tsirkin [Wed, 29 Mar 2017 16:06:20 +0000 (19:06 +0300)]
virtio: allow drivers to validate features
Some drivers can't support all features in all configurations. At the
moment we blindly set FEATURES_OK and later FAILED. Support this better
by adding a callback drivers can use to do some early checks.
Aaron Conole [Fri, 3 Jun 2016 20:57:12 +0000 (16:57 -0400)]
virtio-net: Add initial MTU advice feature
This commit adds the feature bit and associated mtu device entry for the
virtio network device. When a virtio device comes up, it checks the
feature bit for the VIRTIO_NET_F_MTU feature. If this feature bit is
enabled, the driver will read the advised MTU and use it as the initial
value.
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 14de9d114a82a564b94388c95af79a701dc93134)
Jason Wang [Tue, 13 Dec 2016 06:23:05 +0000 (14:23 +0800)]
virtio-net: correctly enable multiqueue
Commit 4490001029012539937ff02778fe6180613fa949 ("virtio-net: enable
multiqueue by default") blindly set the affinity instead of the queues
during probe, which can cause a mismatch in #queues between guest and
host. This patch fixes it by setting the queues.
Reported-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Theodore Ts'o <tytso@mit.edu> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Michael S. Tsirkin <mst@redhat.com> Fixes: 49000102901 ("virtio-net: enable multiqueue by default") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a220871be66f99d8957c693cf22ec67ecbd9c23a)
Jason Wang [Fri, 25 Nov 2016 04:37:26 +0000 (12:37 +0800)]
virtio-net: enable multiqueue by default
We use a single queue even if multiqueue is enabled and let the admin
enable it through ethtool later. This is done to avoid a possible
regression (small-packet TCP stream transmission). But this looks like
overkill, since:
- a single-queue user can disable multiqueue when launching qemu
- it brings extra trouble for management, since an extra admin tool is
needed in the guest to enable multiqueue
- multiqueue performs much better than single queue in most cases
So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable all the queue pairs; if #queues is greater than
#vcpu, enable #vcpu queue pairs.
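The rule above is simply the minimum of the two counts. A one-function sketch, with the name default_queue_pairs() being hypothetical:

```c
/*
 * Default number of queue pairs: all of them if there are no more
 * queue pairs than vCPUs, otherwise one per vCPU.
 */
static unsigned int default_queue_pairs(unsigned int max_queue_pairs,
					unsigned int num_vcpus)
{
	return max_queue_pairs <= num_vcpus ? max_queue_pairs : num_vcpus;
}
```

For example, a device exposing 16 queue pairs in an 8-vCPU guest starts with 8 enabled, while 4 queue pairs in the same guest are all enabled.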
Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Cc: Jeremy Eder <jeder@redhat.com> Cc: Marko Myllynen <myllynen@redhat.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4490001029012539937ff02778fe6180613fa949)
Eric Dumazet [Fri, 31 Jul 2015 16:25:17 +0000 (18:25 +0200)]
virtio_net: add gro capability
Straightforward patch to add GRO processing to virtio_net.
napi_complete_done() usage allows more aggressive aggregation,
opted-in by setting /sys/class/net/xxx/gro_flush_timeout
Tested:
Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec,
Rick Jones reported the following results.
One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs
and Intel 82599ES-based NICs, so two "before" and two "after" VMs.
The OpenStack compute nodes were running OpenStack Kilo, with VxLAN
encapsulation being used through OVS, so no GRO coming up the host
stack. The compute nodes themselves were running a 3.14-based kernel.
Single-stream netperf, CPU utilizations and thus service demands are
based on intra-guest reported CPU.
Throughput Mbit/s, bigger is better
                     Min     Median  Average  Max
4.2.0-rc3+           1364    1686    1678     1938
4.2.0-rc3+flush1k    1824    2269    2275     2647
Send Service Demand, smaller is better
                     Min     Median  Average  Max
4.2.0-rc3+           0.236   0.558   0.524    0.802
4.2.0-rc3+flush1k    0.176   0.503   0.471    0.738
Receive Service Demand, smaller is better
                     Min     Median  Average  Max
4.2.0-rc3+           1.906   2.188   2.191    2.531
4.2.0-rc3+flush1k    0.448   0.529   0.533    0.692
Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Rick Jones <rick.jones2@hp.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0fbd050a7d262b74527a289ae75a33626d1060a8)
dtrace: fix lquantize for 32-bit overflow on values
Fix dtrace_aggregate_lquantize() so that it no longer truncates the
value to 32 bits or computes the bin index in 32 bits. The Linux bug is
26268136, "dtrace_aggregate_lquantize() suffers from 32-bit overflow";
it references a corresponding Solaris bug.
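A sketch of the bin computation done entirely in 64-bit arithmetic. It follows the usual lquantize() layout (bin 0 for values below base, bins 1..levels for the linear buckets, bin levels + 1 for overflow); the function name and signature are hypothetical, and it assumes step > 0 and that val - base does not overflow int64_t.

```c
#include <stdint.h>

/*
 * Compute the lquantize() bin for val without truncating to 32 bits.
 * Bin 0 counts values below base, bins 1..levels are the linear
 * buckets, and bin levels + 1 counts values at or above
 * base + levels * step.
 */
static uint32_t lquantize_bin(int64_t val, int64_t base, int64_t step,
			      uint16_t levels)
{
	int64_t level;

	if (val < base)
		return 0;
	level = (val - base) / step;    /* 64-bit division, no truncation */
	if (level < levels)
		return (uint32_t)level + 1;
	return (uint32_t)levels + 1;
}
```

With base = 2^32 and step = 1, the value 2^32 + 3 lands in bin 4; truncating it to 32 bits first would turn it into 3 and misfile it as an underflow.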
K. Den [Mon, 31 Jul 2017 16:05:39 +0000 (01:05 +0900)]
gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP
In the case where GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point (which could occur, for instance, if the packet
comes from another Linux guest on the same Linux host), we have to
either do remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset to account for RCO.
However, since b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags
in GRO"), that barrier can be skipped in this case when GRO is turned on,
so we pass over it and the inner L4 validation mistakenly treats it as a
bad csum.
This patch resets remcsum_offload at the same time as the GRO remcsum
cleanup, so that this case works as it did before.
Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO") Signed-off-by: Koichiro Den <den@klaipeden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 1bff8a0c1f8c236209ee369b7952751c04eaa71a) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
K. Den [Mon, 31 Jul 2017 16:05:20 +0000 (01:05 +0900)]
vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP
In the case where GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point (which could occur, for instance, if the packet
comes from another Linux guest on the same Linux host), we have to
either do remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset to account for RCO.
However, since b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags
in GRO"), that barrier can be skipped in this case when GRO is turned on,
so we pass over it and the inner L4 validation mistakenly treats it as a
bad csum.
This patch resets remcsum_offload at the same time as the GRO remcsum
cleanup, so that this case works as it did before.
Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO") Signed-off-by: Koichiro Den <den@klaipeden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit be73b3043bf465455d4c9b88f68e03b6447bcfb0) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Tom Herbert [Thu, 20 Aug 2015 00:07:34 +0000 (17:07 -0700)]
fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks
Do WARN_ON_ONCE instead of WARN_ON in gue_gro_receive when the offload
callbacks are bad (either they don't exist or gro_receive is not specified).
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 270136613bf7306e2b83457628e2b2f6c6be3989) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Tom Herbert [Thu, 20 Aug 2015 00:07:33 +0000 (17:07 -0700)]
vxlan: GRO support at tunnel layer
Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel.
Testing:
Ran 200 netperf TCP_STREAM instances
- With fix (GRO enabled on VXLAN interface)
Verified GRO is happening.
9084 MBps tput
3.44% CPU utilization
- Without fix (GRO disabled on VXLAN interface)
Verified no GRO is happening.
9084 MBps tput
5.54% CPU utilization
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 58ce31cca1ffe057f4744c3f671e3e84606d3d4a) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Tom Herbert [Thu, 20 Aug 2015 00:07:32 +0000 (17:07 -0700)]
gro: Fix remcsum offload to deal with frags in GRO
The remote checksum offload GRO did not consider the case that frag0
might be in use. This patch fixes that by accessing headers using the
skb_gro functions and not saving offsets relative to skb->head.
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit b7fe10e5ebac2a3f37e95535e616494b65fa020f) Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
NFSv4.1: Don't deadlock the state manager on the SEQUENCE status flags
As described in RFC5661, section 18.46, some of the status flags exist
in order to tell the client when it needs to acknowledge the existence of
revoked state on the server and/or to recover state.
Those flags will then remain set until the recovery procedure is done.
In order to avoid looping, the client therefore needs to ignore
those particular flags while recovering.
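A sketch of the idea, under loudly stated assumptions: the flag values below are the SEQUENCE status flags from RFC 5661, section 18.46, but which of them count as "sticky" during recovery is an assumption for illustration, and seq_flags_to_handle() is hypothetical, not the NFS client's function.

```c
#include <stdbool.h>
#include <stdint.h>

/* SEQUENCE status flag values from RFC 5661, section 18.46. */
#define SEQ4_STATUS_CB_PATH_DOWN               0x00000001
#define SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED  0x00000008
#define SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED 0x00000010
#define SEQ4_STATUS_ADMIN_STATE_REVOKED        0x00000020
#define SEQ4_STATUS_RECALLABLE_STATE_REVOKED   0x00000040
#define SEQ4_STATUS_RESTART_RECLAIM_NEEDED     0x00000100

/* Assumed set of flags that stay asserted until recovery finishes. */
#define SKETCH_STICKY_RECOVERY_FLAGS \
	(SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED | \
	 SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED | \
	 SEQ4_STATUS_ADMIN_STATE_REVOKED | \
	 SEQ4_STATUS_RECALLABLE_STATE_REVOKED | \
	 SEQ4_STATUS_RESTART_RECLAIM_NEEDED)

/*
 * Return the flags the client should act on. While the state manager is
 * already recovering, the sticky flags are ignored so that they cannot
 * schedule yet another recovery pass and loop forever.
 */
static uint32_t seq_flags_to_handle(uint32_t flags, bool recovering)
{
	if (recovering)
		flags &= ~(uint32_t)SKETCH_STICKY_RECOVERY_FLAGS;
	return flags;
}
```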
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit 0a014a44a50839a8064618e959fae5bbc44c2fd5)
Trond Myklebust [Sun, 28 Aug 2016 14:28:25 +0000 (10:28 -0400)]
NFSv4.1: Defer bumping the slot sequence number until we free the slot
For operations like OPEN or LAYOUTGET, which return recallable state
(i.e. delegations and layouts) we want to enable the mechanism for
resolving recall races in RFC5661 Section 2.10.6.3.
To do so, we will want to defer bumping the slot's sequence number until
we have finished processing the RPC results.
Revert "RDMA CM: Add reason code for IB_CM_REJ_CONSUMER_DEFINED"
This reverts commit 5e86bae96237 ("RDMA CM: Add reason code for
IB_CM_REJ_CONSUMER_DEFINED") because rolling downgrade has been
de-featured. Thus, the underlying rdma_cm/ib_cm changes can be removed.
Revert "RDS: base connection dependency needed for rolling downgrade from version 4.1 to 3.1"
This commit partially reverts commit 5e86bae96237 ("RDMA CM: Add reason code
for IB_CM_REJ_CONSUMER_DEFINED"), as it contains two changes in a single
commit. Rolling downgrade support from 4.1 to 3.1 is no longer needed
because there is no practical use case for this feature. This commit removes
the base connection dependency for rolling downgrade, i.e. it changes the code
back to its original form before 5e86bae96237. The clean-up of the RDS
backward compatibility issue is not addressed in this patch; please
refer to bugdb #26772473 for more details.
Wei Lin Guay [Wed, 30 Aug 2017 08:11:34 +0000 (10:11 +0200)]
Revert "RDS: Ensure non-zero SL uses correct path before lane 0 connection is dropped"
This reverts commit 5fe5f2d6e883 ("RDS: Ensure non-zero SL uses correct
path before lane 0 connection is dropped") because RDS specific path record
caching has been removed in commit 81be7fc4f495 ("net/rds: remove the RDS
specific path record caching"). Thus, there is no dependency that TOS
connections can only be re-established after the base connection (lane 0)
is up.
Wei Lin Guay [Thu, 10 Aug 2017 20:25:45 +0000 (22:25 +0200)]
Revert "rds: make sure base connection is up on both sides"
This reverts commit cf80f396af3a ("rds: make sure base connection is up on
both sides") because RDS specific path record caching has been removed in
commit 81be7fc4f495 ("net/rds: remove the RDS specific path record
caching"). Thus, there is no dependency that TOS connections can only be
re-established after the base connection (lane 0) is up.
Wei Lin Guay [Tue, 30 May 2017 14:11:29 +0000 (16:11 +0200)]
net/rds: remove the RDS specific path record caching
This patch partially reverts commit b12826152417 ("RDS: SA query
optimization"), which contains RDS-specific path record caching, and uses
the underlying ibacm path record caching instead. ibacm treats all
<source,dest,N> entries as the same path record (N is the TOS). Thus, RDS
needs to update the SL manually during QP creation. RDS also assumes a
1:1 TOS-to-SL mapping.
During testing, I discovered that __generic_file_splice_read() returns
0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first
page of a single/multi-page splice read operation. This EOF return code
causes the userspace test to (correctly) report a zero-length read error
when it was expecting otherwise.
The current strategy of returning a partial non-zero read when ->readpage
returns AOP_TRUNCATED_PAGE works only when the failed page is not the
first of the lot being processed.
This patch attempts to retry lookup and call ->readpage again on pages
that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my
tests pass and I haven't noticed any unwanted side effects.
This version removes the thrice-retry loop and instead indefinitely
retries lookups on AOP_TRUNCATED_PAGE errors from ->readpage. This
behavior is now similar to do_generic_file_read().
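The retry behavior can be modelled in user space as below. SKETCH_AOP_TRUNCATED_PAGE is a stand-in for the kernel's AOP_TRUNCATED_PAGE (defined in include/linux/fs.h), and readpage_fn, read_page_retrying() and demo_readpage() are hypothetical; a real implementation would redo the page-cache lookup before each retry.

```c
/* Stand-in for the kernel's AOP_TRUNCATED_PAGE (include/linux/fs.h). */
#define SKETCH_AOP_TRUNCATED_PAGE 0x80001

/* Hypothetical ->readpage: 0 on success, SKETCH_AOP_TRUNCATED_PAGE when
 * the page lock was dropped and the caller must look the page up again. */
typedef int (*readpage_fn)(void *ctx, unsigned long index);

/* Retry indefinitely on SKETCH_AOP_TRUNCATED_PAGE instead of returning
 * a short or zero-length read; any other result is passed through. */
static int read_page_retrying(readpage_fn readpage, void *ctx,
			      unsigned long index)
{
	for (;;) {
		/* A real implementation would redo the page-cache lookup here. */
		int err = readpage(ctx, index);

		if (err != SKETCH_AOP_TRUNCATED_PAGE)
			return err;
	}
}

/* Demo reader: reports a truncated page while *remaining > 0. */
static int demo_readpage(void *ctx, unsigned long index)
{
	int *remaining = ctx;

	(void)index;
	if (*remaining > 0) {
		(*remaining)--;
		return SKETCH_AOP_TRUNCATED_PAGE;
	}
	return 0;
}
```

Even when the very first page fails twice, the caller gets a successful read rather than the premature EOF described above.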
Signed-off-by: Abhi Das <adas@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
AOP_TRUNCATED_PAGE is not used much in the kernel now, but ocfs2 uses it to
avoid deadlocks. Specifically, ocfs2_readpage() fails the read and returns
AOP_TRUNCATED_PAGE in order to avoid deadlock on page lock with the
downconvert thread, if it fails to get the inode cluster lock. It also uses
this return value to avoid livelock on the ip_alloc_sem semaphore. This is
done with the expectation that the VFS will check for this return value and
retry the read on the page, which is exactly what do_generic_file_read() does.
However, in the case of splice read, __generic_file_splice_read() fails the read
and returns a partial/zero-length read. This causes upper layers that use
splice read (such as NFS) to return EIO or other failures to userspace. Saar
ran into this issue while testing database workloads over knfs with ocfs2 as
the backend fs on the NFS server. This patch fixes the issue.
(cherry picked from commit 90330e689c32e5105265c461c54af6ecec3373fa) Tested-by: Saar Maoz <saar.maoz@oracle.com> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Jack Vogel [Thu, 14 Sep 2017 23:18:26 +0000 (16:18 -0700)]
Remove dma_unmap_single_attrs call.
Mistaken addition of dma_unmap call in bnxt_free_rx_skbs
was causing panics in some circumstances, remove the call.
Orabug: 26713916 Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ken Brucker <ken.brucker@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Tomas Jedlicka [Fri, 25 Aug 2017 13:46:26 +0000 (09:46 -0400)]
dtrace: cyclics taking lock in atomic context
The spin_lock_irqsave() call makes the CPU notifier registration, which
may block, run in atomic context. The fix is to move the registration to
module init time, which runs only once and before DTrace becomes active
during kernel boot.
Tomas Jedlicka [Tue, 25 Jul 2017 09:54:48 +0000 (05:54 -0400)]
dtrace: should not sleep in idr code paths
idr_preload() leaves the thread running in atomic context.
Taking a mutex in this code path may lead to deadlock or scheduler
problems. This fix alters the locking scheme in a way that may cause
some latency issues in the DTrace framework but keeps the host safe.
dtrace: implement tracemem optional third arg (dyn size)
Solaris DTrace added a third, optional argument, dynamic size.
Here we implement this for Linux, including introducing the
new user-visible arguments
DTRACE_TRACEMEM_STATIC
DTRACE_TRACEMEM_DYNAMIC
DTRACE_TRACEMEM_SIZE
DTRACE_TRACEMEM_SSIZE
as well as refactoring the raw label in dt_print_bytes()
into its own function, dt_print_rawbytes().
DTrace offers a histogram aggregation, quantize(), whose bins are base-2
logarithmic. It also has a linear function lquantize().
Linux DTrace should also implement a log-linear function llquantize().
Such functionality is supported by Solaris DTrace and other tracing tools.
Motivations for such a function include:
- a logarithmic aggregation with base other than 2 (e.g. base 10)
- finer control than simply logarithmic
- greater dynamic range than simply linear
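To make "log-linear" concrete, here is one plausible bucketing scheme, not the DTrace implementation: every order of magnitude [factor^m, factor^(m+1)) between low and high is split into nsteps linear steps of size factor^(m+1) / nsteps. The parameter names echo the Solaris llquantize() arguments, but the exact semantics, the ll_kind enum and llbucket() are simplified assumptions.

```c
#include <stdint.h>

typedef enum { LL_UNDERFLOW, LL_BUCKET, LL_OVERFLOW } ll_kind;

/* factor^n for small n; overflow is not checked in this sketch. */
static int64_t ipow(int64_t factor, unsigned int n)
{
	int64_t r = 1;

	while (n--)
		r *= factor;
	return r;
}

/*
 * Hypothetical log-linear bucketing. Each order of magnitude
 * [factor^m, factor^(m+1)) with low <= m <= high is split into linear
 * steps of size factor^(m+1) / nsteps. On LL_BUCKET, *lower receives the
 * bucket's lower bound. Assumes factor >= 2 and that nsteps divides
 * factor^(low+1).
 */
static ll_kind llbucket(int64_t value, int64_t factor, unsigned int low,
			unsigned int high, int64_t nsteps, int64_t *lower)
{
	int64_t order = ipow(factor, low);

	if (value < order)
		return LL_UNDERFLOW;

	for (unsigned int m = low; m <= high; m++) {
		int64_t next = order * factor;

		if (value < next) {
			int64_t step = next / nsteps;

			*lower = value / step * step;
			return LL_BUCKET;
		}
		order = next;
	}
	return LL_OVERFLOW;
}
```

With factor 10, low 0, high 2 and nsteps 10, the value 7 falls in the bucket starting at 7, 42 in the one starting at 40, 999 in the one starting at 900, and 1000 overflows: base-10 magnitudes with linear resolution inside each.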
Signed-off-by: Nicolas Droux <nicolas.droux@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
dtrace: failing to allocate more ECB space can cause a crash
The existing code was not taking into consideration that when the
table of ECBs needs to be expanded, the memory allocation can fail.
This could lead to a NULL pointer access, and a kernel crash. We
now check the result of the allocation, and bail out if it fails.
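The general pattern is shown below as a user-space sketch. struct ecb_table, ecb_table_add() and the doubling growth policy are hypothetical, not DTrace's actual ECB code; what matters is that a failed allocation returns an error and leaves the existing table intact.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable table of ECB pointers. */
struct ecb_table {
	void **ecbs;
	size_t count;   /* entries in use */
	size_t size;    /* entries allocated */
};

/*
 * Append an ECB, doubling the table when it is full. If the allocation
 * fails, return -ENOMEM and leave the table untouched instead of
 * writing through a NULL pointer.
 */
static int ecb_table_add(struct ecb_table *t, void *ecb)
{
	if (t->count == t->size) {
		size_t newsize = t->size ? t->size * 2 : 16;
		void **grown = calloc(newsize, sizeof(*grown));

		if (grown == NULL)
			return -ENOMEM;
		if (t->count)
			memcpy(grown, t->ecbs, t->count * sizeof(*grown));
		free(t->ecbs);
		t->ecbs = grown;
		t->size = newsize;
	}
	t->ecbs[t->count++] = ecb;
	return 0;
}
```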
Orabug: 26503342 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
As part of the Y2038 development, __getnstimeofday is not supposed to be
used any more, so it is replaced with ktime_get_ns. The Jitter RNG uses
the time stamp to measure the execution time of a given code path and
tries to detect variations in the execution time. Therefore, the only
requirement the Jitter RNG has is a sufficiently high resolution to
detect these variations.
The change was tested on x86 and showed behavior identical to RDTSC.
The test code used simply measures the execution time of the heart of
the RNG:
When building the jitterentropy driver by itself, we get a link error
when CRYPTO_RNG is not enabled as well:
crypto/built-in.o: In function `jent_mod_init':
jitterentropy-kcapi.c:(.init.text+0x98): undefined reference to `crypto_register_rng'
crypto/built-in.o: In function `jent_mod_exit':
jitterentropy-kcapi.c:(.exit.text+0x60): undefined reference to `crypto_unregister_rng'
This adds a 'select CRYPTO_RNG' to CRYPTO_JITTERENTROPY to ensure the API
is always there when it's used, not just when DRBG is also enabled.
CRYPTO_DRBG would set it implicitly through CRYPTO_JITTERENTROPY now,
but this leaves it in place to make it explicit what the driver does.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 2f313e029020f1fa5f58f38f48ff6988d67fc3c1) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The clocksource subsystem no longer provides the clocksource_register() function since commit f893598 ("clocksource: Mostly kill clocksource_register()"), so
remove the unnecessary information about this function from a comment.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f5128432b08c3e263e1a7ce709d686b1ded51131) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Since the API for jent_panic() does not include format string parameters,
adjust the call to panic() to use a literal format string, to prevent any
future callers from leaking format strings into the panic message.
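The same pattern in user-space C, using snprintf() in place of panic(); format_message() is hypothetical. Passing the caller's text as an argument to a literal "%s" format is the usual way to achieve what the commit describes.

```c
#include <stdio.h>

/*
 * Copy an arbitrary message into buf. The message is passed as an
 * argument to a literal "%s" format, so any '%' it contains is printed
 * as-is instead of being parsed as a conversion specifier.
 */
static int format_message(char *buf, size_t len, const char *msg)
{
	return snprintf(buf, len, "%s", msg);
}
```

Calling snprintf(buf, len, msg) directly would instead interpret a message such as "50% %n" as a format, reading or writing through nonexistent arguments.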
Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0c5f0aa5dd92a36a2c6491695abcb95196b88ef6) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The core of the Jitter RNG is intended to be compiled with -O0. To
ensure that the Jitter RNG can be compiled on all architectures,
separate out the RNG core into a stand-alone C file that can be compiled
with -O0 and does not depend on any kernel include file.
As no kernel includes can be used in the C file implementing the core
RNG, any dependencies on kernel code must be extracted.
A second file provides the link to the kernel and the kernel crypto API
that can be compiled with the regular compile options of the kernel.
Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit dfc9fa91938bd0cd5597a3da33d613986149a1e6) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The patch removes the use of timekeeping_valid_for_hres, which is now
marked as internal to the timekeeping subsystem. The jitterentropy RNG
does not really require this verification, as a coarse timer (when
random_get_entropy is absent) is detected by the initialization test
in jent_entropy_init, which would cause the Jitter RNG not to load in
that case.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit cf58fcb1bea9e0fcf3447bdb959ef5bcd22cfbcb) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
The CPU Jitter RNG provides a source of good entropy by
collecting CPU execution time jitter. The entropy in the CPU
execution time jitter is magnified by the CPU Jitter Random
Number Generator. The CPU Jitter Random Number Generator uses
the CPU execution timing jitter to generate a bit stream
which complies with different statistical measurements that
determine the bit stream is random.
The CPU Jitter Random Number Generator delivers entropy which
follows information-theoretic requirements. Based on these
studies and the implementation, the caller can assume that
one bit of data extracted from the CPU Jitter Random Number
Generator holds one bit of entropy.
The CPU Jitter Random Number Generator provides a decentralized
source of entropy, i.e. every caller can operate on a private
state of the entropy pool.
The RNG does not have any dependencies on any other service
in the kernel. The RNG only needs a high-resolution time
stamp.
Further design details, the cryptographic assessment and a
large array of test results are documented at
http://www.chronox.de/jent.html.
CC: Andreas Steffen <andreas.steffen@strongswan.org> CC: Theodore Ts'o <tytso@mit.edu> CC: Sandy Harris <sandyinchina@gmail.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit bb5530e4082446aac3a3d69780cd4dbfa4520013) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
This patch adds the function crypto_rng_set_entropy. It is only
meant to be used by testmgr when testing RNG implementations by
providing fixed entropy data in order to verify test vectors.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7ca99d814821e8a8ac6d7c48b2ccfc24bda27b1f) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
There is no reason why crypto_rng_reset should modify the seed,
so this patch marks it as const. Since our algorithms don't
export a const seed function yet, we have to go through some
contortions for now.
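One way such a contortion can look (an assumption for illustration, not necessarily what this patch does) is to copy the const seed into a scratch buffer before handing it to a backend callback that still takes a non-const pointer. The names rng_reset_const and demo_legacy_reset are hypothetical; the demo backend deliberately clobbers its input to show that the caller's seed survives.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical legacy backend callback: non-const seed, may modify it. */
typedef int (*legacy_reset_fn)(unsigned char *seed, unsigned int slen);

/* Const-correct front end: the legacy callback only ever sees a copy. */
int rng_reset_const(legacy_reset_fn reset, const unsigned char *seed,
		    unsigned int slen)
{
	unsigned char *buf = NULL;
	int err;

	assert(seed != NULL || slen == 0);
	if (slen) {
		buf = malloc(slen);
		if (!buf)
			return -1;  /* a kernel version would return -ENOMEM */
		memcpy(buf, seed, slen);
	}
	err = reset(buf, slen);
	free(buf);
	return err;
}

/* Demo backend: records the byte sum, then scribbles over its input. */
static unsigned int demo_last_sum;

static int demo_legacy_reset(unsigned char *seed, unsigned int slen)
{
	demo_last_sum = 0;
	for (unsigned int i = 0; i < slen; i++)
		demo_last_sum += seed[i];
	if (slen)
		memset(seed, 0xff, slen);
	return 0;
}
```

The extra allocation and copy is the price of keeping the public signature const while the backends are not yet converted.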
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 3c5d8fa9f56ad0928e7a1f06003e5034f5eedb52) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
This patch adds the new top-level function crypto_rng_generate
which generates random numbers with additional input. It also
extends the mid-level rng_gen_random function to take additional
data as input.
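A userspace mock of the layering described above, assuming hypothetical toy_* names: the top-level generate call takes optional additional input (src, slen) and forwards it to a mid-level hook, and a plain "get bytes" call becomes generate with no additional input. The backend is a deterministic toy that folds the additional input into its state so the effect is observable; it is not a real RNG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_rng {
	/* Mid-level hook that accepts optional additional input. */
	int (*generate)(struct toy_rng *rng, const uint8_t *src,
			unsigned int slen, uint8_t *dst, unsigned int dlen);
	uint8_t state;
};

/* Top-level entry point with additional input. */
static int toy_rng_generate(struct toy_rng *rng, const uint8_t *src,
			    unsigned int slen, uint8_t *dst, unsigned int dlen)
{
	assert(dst != NULL || dlen == 0);
	return rng->generate(rng, src, slen, dst, dlen);
}

/* Plain byte request: generate without additional input. */
static int toy_rng_get_bytes(struct toy_rng *rng, uint8_t *dst,
			     unsigned int dlen)
{
	return toy_rng_generate(rng, NULL, 0, dst, dlen);
}

/* NOT a real RNG: XORs additional input into the state, then counts. */
static int toy_backend(struct toy_rng *rng, const uint8_t *src,
		       unsigned int slen, uint8_t *dst, unsigned int dlen)
{
	for (unsigned int i = 0; i < slen; i++)
		rng->state ^= src[i];
	for (unsigned int i = 0; i < dlen; i++)
		dst[i] = rng->state++;
	return 0;
}
```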
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ff030b099a21a4753af575b4304249e88400e506) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
This patch converts the top-level crypto_rng to the "new" style.
It was the last algorithm type added before we switched over
to the new way of doing things exemplified by shash.
All users will automatically switch over to the new interface.
Note that this patch does not touch the low-level interface to
rng implementations.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit d0e83059a6c9b04f00264a74b8f6439948de4613) Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is
disallowed. However, on x86, the first 1MB was always allowed for BIOS
and similar things, regardless of it actually being System RAM. It was
possible for the heap to end up allocated in the low 1MB of RAM and then
read by tools like x86info or dd, which would trip hardened usercopy.
This changes the x86 exception for the low 1MB by reading back zeros for
System RAM areas instead of blindly allowing them. More work is needed to
extend this to mmap, but currently mmap doesn't go through usercopy, so
hardened usercopy won't Oops the kernel.
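A userspace model of the read policy described above, roughly following the shape of the x86 change: the per-page check returns 2 for System RAM in the low 1MB, and the read path then fills the user buffer with zeros instead of copying. The names demo_page_is_ram, devmem_policy and demo_read_page are hypothetical, the memory map is made up, and the iomem_is_exclusive() handling is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOW_1MB_PAGES 256UL  /* 1 MiB / 4 KiB pages */

enum devmem_access { DEVMEM_DENY = 0, DEVMEM_ALLOW = 1, DEVMEM_ZERO = 2 };

/* Hypothetical stub: pages below 0x9f and from 0x100 up are System RAM. */
static bool demo_page_is_ram(unsigned long pfn)
{
	return pfn < 0x9f || pfn >= 0x100;
}

static enum devmem_access devmem_policy(unsigned long pfn)
{
	if (demo_page_is_ram(pfn))
		/* RAM below 1MB reads back as zeros; above it is denied. */
		return pfn < LOW_1MB_PAGES ? DEVMEM_ZERO : DEVMEM_DENY;
	return DEVMEM_ALLOW;  /* BIOS/ROM areas etc. (iomem check omitted) */
}

/* Copy len bytes of "physical" data for page pfn into buf, or zeros. */
static int demo_read_page(unsigned long pfn, const unsigned char *phys,
			  unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	switch (devmem_policy(pfn)) {
	case DEVMEM_DENY:
		return -1;               /* a kernel would return -EPERM */
	case DEVMEM_ZERO:
		memset(buf, 0, len);     /* never expose the RAM contents */
		return 0;
	default:
		memcpy(buf, phys, len);
		return 0;
	}
}
```

Returning zeros rather than an error keeps tools that scan the whole low 1MB working, while no heap contents reach userspace.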
Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com> Tested-by: Tommi Rantala <tommi.t.rantala@nokia.com> Signed-off-by: Kees Cook <keescook@chromium.org>
(cherry picked from commit a4866aa812518ed1a37d8ea0c881dc946409de94)
This commit restores commit 5acb959ad599 ("RDS: add reconnect retry scheme
for stalled connections"). Even though this retry scheme "workaround"
causes a long brownout time in the OVM configuration, it is needed to avoid
RDS loopback connection stalls after a switch reboot on bare-metal
systems. For now, the plan agreed with Exadata is to restore this commit
first and keep a similar code path across QU6, QU5 and QU4.
Wei Lin Guay [Thu, 10 Aug 2017 09:59:35 +0000 (11:59 +0200)]
Revert "net/rds: prioritize the base connection establishment"
This reverts commit 1bc87d23681a ("net/rds: prioritize the base connection
establishment"). This patch is reverted because the RDS path record caching
is replaced by the underlying ibacm path record caching. As a result, all
the TOS connections can be established without depending on their base
connection to perform the SA query. Thus, it is not necessary to prioritize
the base connection establishment.
Wei Lin Guay [Thu, 10 Aug 2017 09:57:34 +0000 (11:57 +0200)]
Revert "net/rds: determine active/passive connection with IP addresses"
This reverts commit 1f2ea7a020a1 ("net/rds: determine active/passive
connection with IP addresses"). The plan is to use the original,
well-tested one-sided reconnection that was reverted in "812c02791: RDS: restore
the exponential back-off scheme".
Wei Lin Guay [Thu, 10 Aug 2017 09:36:03 +0000 (11:36 +0200)]
Revert "net/rds: use different workqueue for base_conn"
This reverts commit ad7312bb8b8e ("net/rds: use different workqueue for
base_conn"). The RDS path record caching will be replaced by ibacm. Thus,
the TOS connections no longer rely on their base connection to perform the
SA query. As a result, RDS does not need a separate "priority" workqueue for
the base connection.
Shaohua Li [Fri, 8 May 2015 17:51:31 +0000 (10:51 -0700)]
blk-mq: avoid re-initialize request which is failed in direct dispatch
If we directly issue a request and it fails, we use
blk_mq_merge_queue_io(). But the bio has already been assigned to the
request in blk_mq_bio_to_request, so blk_mq_merge_queue_io shouldn't run
blk_mq_bio_to_request again.
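A toy model of the invariant this fix restores, assuming hypothetical toy_* names that only loosely mirror blk_mq_bio_to_request and the direct-issue path: the bio is attached to the request exactly once, and when direct dispatch fails the fallback only queues the already-initialized request instead of running the bio-to-request step again.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_request {
	int nr_bios;   /* how many times a bio was attached */
	bool queued;   /* placed on the software queue */
};

static int bio_to_request_calls;

static void toy_bio_to_request(struct toy_request *rq)
{
	bio_to_request_calls++;
	rq->nr_bios++;
}

/* Pretend the driver is busy so direct dispatch always fails. */
static bool toy_direct_issue(struct toy_request *rq)
{
	(void)rq;
	return false;
}

/* Fallback: just queue the request; it must already be initialized. */
static void toy_insert_request(struct toy_request *rq)
{
	assert(rq->nr_bios > 0);
	rq->queued = true;
}

static void toy_make_request(struct toy_request *rq)
{
	toy_bio_to_request(rq);  /* bio attached exactly once */
	if (toy_direct_issue(rq))
		return;
	/* The buggy variant went through a merge helper here that called
	 * toy_bio_to_request() a second time. */
	toy_insert_request(rq);
}
```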
Orabug: 26339553 Suggested-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 239ad215f0d8388cbe6c09a0fab8ad8ff5dba420) Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
This avoids duplicating the same logic in multiple places. Also, it is unnecessary
to zero the buffer in .get_ethtool_stats() because it is already zeroed
by the caller.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5c8227d0d3b1eb1ad8f98d0b6dc619d70f2cfa04) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This allows users to set the hardware bridging mode to VEB or VEPA. Only
a single-function PF can change the bridging mode.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 39d8ba2e71fbdde686d7e31ad141a01994dc0793) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
Retrieve and store the hardware bridge mode, so that we can implement
ndo_bridge_{get|set}link methods in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 32e8239c9138a050bc1feeea7cf41f27d79e6664) Signed-off-by: Brian Maly <brian.maly@oracle.com>
VF representors and PTP are added features in the new firmware spec.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit acb2005463612930b07723e852b2483d669ff856) Signed-off-by: Brian Maly <brian.maly@oracle.com>