Eric Dumazet [Sat, 16 Nov 2019 01:55:54 +0000 (17:55 -0800)]
selftests: net: avoid ptl lock contention in tcp_mmap
tcp_mmap is used as a reference program for TCP rx zerocopy,
so it is important to point out some potential issues.
If multiple threads are concurrently using getsockopt(...
TCP_ZEROCOPY_RECEIVE), there is a chance the low-level mm
functions compete on a shared ptl lock, if the vmas are arbitrarily placed.
Instead of letting the mm layer place the chunks back to back,
this patch enforces an alignment so that each thread uses
a different ptl lock.
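
The alignment idea can be sketched in a few lines of C. This is only an illustration, not the code from the patch: the helper names align_up() and same_window() are hypothetical, and it assumes that one page-table lock covers one alignment-sized window (for example a huge page).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round addr up to the next multiple of align.
 * align must be a non-zero power of two (e.g. the huge page size). */
uintptr_t align_up(uintptr_t addr, size_t align)
{
	return (addr + (uintptr_t)align - 1) & ~((uintptr_t)align - 1);
}

/* Hypothetical helper: non-zero if both addresses fall into the same
 * align-sized window, which is where lock sharing is assumed to occur. */
int same_window(uintptr_t a, uintptr_t b, size_t align)
{
	return (a / align) == (b / align);
}
```

A caller would typically map chunk_size + align bytes and start using the region at align_up() of the returned address, so that chunks belonging to different threads never share a window.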
Performance measured on a 100 Gbit NIC, with 8 tcp_mmap clients
launched at the same time :
$ for f in {1..8}; do ./tcp_mmap -H 2002:a05:6608:290:: & done
In the following run, we reproduce the old behavior by requesting no alignment :
$ tcp_mmap -sz -C $((128*1024)) -a 4096
received 32768 MB (100 % mmap'ed) in 9.69532 s, 28.3516 Gbit
cpu usage user:0.08634 sys:3.86258, 120.511 usec per MB, 171839 c-switches
received 32768 MB (100 % mmap'ed) in 25.4719 s, 10.7914 Gbit
cpu usage user:0.055268 sys:21.5633, 659.745 usec per MB, 9065 c-switches
received 32768 MB (100 % mmap'ed) in 28.5419 s, 9.63069 Gbit
cpu usage user:0.057401 sys:23.8761, 730.392 usec per MB, 14987 c-switches
received 32768 MB (100 % mmap'ed) in 28.655 s, 9.59268 Gbit
cpu usage user:0.059689 sys:23.8087, 728.406 usec per MB, 18509 c-switches
received 32768 MB (100 % mmap'ed) in 28.7808 s, 9.55074 Gbit
cpu usage user:0.066042 sys:23.4632, 718.056 usec per MB, 24702 c-switches
received 32768 MB (100 % mmap'ed) in 28.8259 s, 9.5358 Gbit
cpu usage user:0.056547 sys:23.6628, 723.858 usec per MB, 23518 c-switches
received 32768 MB (100 % mmap'ed) in 28.8808 s, 9.51767 Gbit
cpu usage user:0.059357 sys:23.8515, 729.703 usec per MB, 14691 c-switches
received 32768 MB (100 % mmap'ed) in 28.8879 s, 9.51534 Gbit
cpu usage user:0.047115 sys:23.7349, 725.769 usec per MB, 21773 c-switches
New behavior (automatic alignment based on Hugepagesize),
we can see the system overhead being dramatically reduced.
$ tcp_mmap -sz -C $((128*1024))
received 32768 MB (100 % mmap'ed) in 13.5339 s, 20.3103 Gbit
cpu usage user:0.122644 sys:3.4125, 107.884 usec per MB, 168567 c-switches
received 32768 MB (100 % mmap'ed) in 16.0335 s, 17.1439 Gbit
cpu usage user:0.132428 sys:3.55752, 112.608 usec per MB, 188557 c-switches
received 32768 MB (100 % mmap'ed) in 17.5506 s, 15.6621 Gbit
cpu usage user:0.155405 sys:3.24889, 103.891 usec per MB, 226652 c-switches
received 32768 MB (100 % mmap'ed) in 19.1924 s, 14.3222 Gbit
cpu usage user:0.135352 sys:3.35583, 106.542 usec per MB, 207404 c-switches
received 32768 MB (100 % mmap'ed) in 22.3649 s, 12.2906 Gbit
cpu usage user:0.142429 sys:3.53187, 112.131 usec per MB, 250225 c-switches
received 32768 MB (100 % mmap'ed) in 22.5336 s, 12.1986 Gbit
cpu usage user:0.140654 sys:3.61971, 114.757 usec per MB, 253754 c-switches
received 32768 MB (100 % mmap'ed) in 22.5483 s, 12.1906 Gbit
cpu usage user:0.134035 sys:3.55952, 112.718 usec per MB, 252997 c-switches
received 32768 MB (100 % mmap'ed) in 22.6442 s, 12.139 Gbit
cpu usage user:0.126173 sys:3.71251, 117.147 usec per MB, 253728 c-switches
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Arjun Roy <arjunroy@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 15 Nov 2019 21:38:25 +0000 (22:38 +0100)]
r8169: load firmware for RTL8168fp/RTL8117
Load Realtek-provided firmware for RTL8168fp/RTL8117. Unlike the
firmware for other chip versions which is for the PHY, firmware for
RTL8168fp/RTL8117 is for the MAC.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 15 Nov 2019 20:35:22 +0000 (21:35 +0100)]
r8169: improve conditional firmware loading for RTL8168d
Using constant MII_EXPANSION is misleading here because register 0x06
has a different meaning on page 0x0005. Here a proprietary PHY
parameter is read by writing the parameter id to register 0x05 on page
0x0005, followed by reading the parameter value from register 0x06.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 15 Nov 2019 20:05:45 +0000 (20:05 +0000)]
net: phylink: update to use phy_support_asym_pause()
Use phy_support_asym_pause() rather than open-coding it.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Matteo Croce [Fri, 15 Nov 2019 11:10:37 +0000 (12:10 +0100)]
bonding: symmetric ICMP transmit
A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports
to balance packets between slaves. With some network errors, we receive an ICMP
error packet from the remote host or a router. If sent by a router, the source IP
can differ from the remote host one. Additionally the ICMP protocol has no port
numbers, so a layer3+4 bonding will get a different hash than the previous one.
These two conditions could let the packet go through a different interface than
the other packets of the same flow:
# tcpdump -qltnni veth0 |sed 's/^/0: /' &
# tcpdump -qltnni veth1 |sed 's/^/1: /' &
# hping3 -2 192.168.0.2 -p 9
0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
An ICMP error packet contains the header of the packet which caused the network
error, so inspect it and match the flow against it, so we can send the ICMP via
the same interface as the previous packets of the flow.
Move the IP and port dissect code into a generic function bond_flow_ip() and if
we are dissecting an ICMP error packet, call it again with the adjusted offset.
# hping3 -2 192.168.0.2 -p 9
1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0
0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0
0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
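
The effect of hashing on the embedded header can be shown with a toy model. The structure, the XOR hash and every name below are assumptions made for illustration; this is not the bonding driver's hash nor the bond_flow_ip() code.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy flow key; field names are illustrative, not the kernel's. */
struct toy_flow {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Direction-independent toy hash: XOR makes A->B and B->A collide. */
uint32_t toy_flow_hash(const struct toy_flow *f)
{
	return (f->saddr ^ f->daddr) ^ (uint32_t)(f->sport ^ f->dport);
}

/* For an ICMP error, hash the embedded header of the offending packet
 * (if the caller found one) instead of the outer, port-less header. */
uint32_t toy_xmit_hash(const struct toy_flow *outer,
		       const struct toy_flow *embedded)
{
	return toy_flow_hash(embedded != NULL ? embedded : outer);
}
```

With the outer header alone, the ICMP error hashes differently from the UDP flow because it carries no ports. With the embedded header, both hash to the same value and would pick the same slave.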
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Fri, 15 Nov 2019 10:11:15 +0000 (11:11 +0100)]
net: mscc: ocelot: omit error check from of_get_phy_mode
The commit 0c65b2b90d13c ("net: of_get_phy_mode: Change API to solve
int/unit warnings") updated the function of_get_phy_mode declaration.
Now it returns an error code and in case the node doesn't contain the
property 'phy-mode' or 'phy-connection-type' it returns -EINVAL and would
set the phy_interface_t to PHY_INTERFACE_MODE_NA.
Ocelot VSC7514 has 4 internal phys which have the phy interface
PHY_INTERFACE_MODE_NA. So because of_get_phy_mode would assign
PHY_INTERFACE_MODE_NA to phy_mode when there is an error, there is no need
to add the error check.
Updates for v2:
- drop error check because of_get_phy_mode already assigns phy_interface
to PHY_INTERFACE_MODE_NA in case of error.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Lobakin [Fri, 15 Nov 2019 09:11:35 +0000 (12:11 +0300)]
net: core: allow fast GRO for skbs with Ethernet header in head
Commit 78d3fd0b7de8 ("gro: Only use skb_gro_header for completely
non-linear packets") back in May'09 (v2.6.31-rc1) has changed the
original condition '!skb_headlen(skb)' to
'skb->mac_header == skb->tail' in gro_reset_offset() saying: "Since
the drivers that need this optimisation all provide completely
non-linear packets" (note that this condition has become the current
'skb_mac_header(skb) == skb_tail_pointer(skb)' later with commit
ced14f6804a9 ("net: Correct comparisons and calculations using
skb->tail and skb->transport_header") without any functional changes).
For now, we have the following rough statistics for v5.4-rc7:
1) napi_gro_frags: 14
2) napi_gro_receive with skb->head containing (most of) payload: 83
3) napi_gro_receive with skb->head containing all the headers: 20
4) napi_gro_receive with skb->head containing only Ethernet header: 2
With the current condition, fast GRO with the usage of
NAPI_GRO_CB(skb)->frag0 is available only in the [1] case.
Packets pushed by [2] and [3] go through the 'slow' path, but
it's not a problem for them as they already contain all the needed
headers in skb->head, so pskb_may_pull() only moves skb->data.
The layout of skbs in the fourth [4] case at the moment of
dev_gro_receive() is identical to skbs that have come through [1],
as napi_frags_skb() pulls Ethernet header to skb->head. The only
difference is that the mentioned condition is always false for them,
because skb_put() and friends irreversibly alter the tail pointer.
They also go through the 'slow' path, but now every single
pskb_may_pull() in every single .gro_receive() will call the *really*
slow __pskb_pull_tail() to pull headers to head. This significantly
decreases the overall performance for no visible reasons.
The only two users of method [4] are:
* drivers/staging/qlge
* drivers/net/wireless/iwlwifi (all three variants: dvm, mvm, mvm-mq)
Note that in case with wireless drivers we can't use [1]
(napi_gro_frags()) at least for now and mac80211 stack always
performs pushes and pulls anyways, so performance hit is unavoidable.
At the moment of v2.6.31 the mentioned change was necessary (that's
why I don't add the "Fixes:" tag), but it became obsolete since
skb_gro_mac_header() has gone in commit a50e233c50db ("net-gro:
restore frag0 optimization"), so we can simply revert the condition
in gro_reset_offset() to allow skbs from [4] to go through the 'fast'
path just like in case [1].
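
The difference between the two conditions can be modelled in user space. The struct below is a toy stand-in for the linear part of an skb, with offsets relative to the head buffer; all names are hypothetical and the 14-byte Ethernet header length is the only real constant.

```c
#include <stddef.h>

/* Toy stand-in for the linear part of an sk_buff (not the kernel struct). */
struct toy_skb {
	size_t data;        /* start of not yet consumed linear data */
	size_t tail;        /* end of linear data */
	size_t mac_header;  /* where the Ethernet header starts */
};

size_t toy_headlen(const struct toy_skb *skb)
{
	return skb->tail - skb->data;
}

/* Original (and restored) condition: nothing left in the linear area. */
int toy_cond_headlen(const struct toy_skb *skb)
{
	return toy_headlen(skb) == 0;
}

/* Condition used since v2.6.31: head must never have been written to. */
int toy_cond_mac_tail(const struct toy_skb *skb)
{
	return skb->mac_header == skb->tail;
}

/* Completely non-linear packet: the head is empty. */
struct toy_skb toy_empty_head(void)
{
	struct toy_skb skb = { 0, 0, 0 };
	return skb;
}

/* Method [4]: only the Ethernet header was put into the head and then
 * pulled, so the tail moved but no linear data remains. */
struct toy_skb toy_eth_in_head(void)
{
	struct toy_skb skb = { 0, 0, 0 };
	skb.tail += 14;  /* skb_put()-like step */
	skb.data += 14;  /* pull past the Ethernet header */
	return skb;
}
```

In this model both conditions hold for the empty head, while for method [4] only the head-length condition holds, which is why reverting to it lets such skbs take the fast path.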
This was tested on a 600 MHz MIPS CPU and a custom driver and this
patch gave boosts up to 40 Mbps to method [4] in both directions
compared to net-next, which made overall performance relatively
close to [1] (without it, [4] is the slowest).
v2:
- Add more references and explanations to commit message
- Fix some typos ibid
- No functional changes
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 16 Nov 2019 20:50:57 +0000 (12:50 -0800)]
Merge branch 'bnx2x-Remove-function-casts'
Kees Cook says:
====================
bnx2x: Remove function casts
In order to make the entire kernel usable under Clang's Control Flow
Integrity protections, function prototype casts need to be avoided
because this will trip CFI checks at runtime (i.e. a mismatch between
the caller's expected function prototype and the destination function's
prototype). Many of these cases can be found with -Wcast-function-type,
which found that bnx2x had a bunch of needless (or at least confusing)
function casts. This series removes them all.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 15 Nov 2019 05:07:15 +0000 (21:07 -0800)]
bnx2x: Remove hw_reset_t function casts
All .hw_reset callbacks except bnx2x_84833_hw_reset_phy() use a
void return type. No callers of .hw_reset check a return value and
bnx2x_84833_hw_reset_phy() unconditionally returns 0. Remove all
hw_reset_t casts and fix the return type to void.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 15 Nov 2019 05:07:14 +0000 (21:07 -0800)]
bnx2x: Remove format_fw_ver_t function casts
The return values for format_fw_ver_t callbacks are supposed to be
"int", not "u8". Ultimately, the top-level caller doesn't actually check
the return value at all, but just clean this all up anyway and fix the
prototypes so that casts are no longer needed.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 15 Nov 2019 05:07:12 +0000 (21:07 -0800)]
bnx2x: Remove read_status_t function casts
The function casts for .read_status callbacks end up casting some int
return values to u8. This seems to be bug-prone (-EINVAL being returned
into something that appears to be true/false), but fixing the function
prototypes doesn't change the existing behavior. Fix the return values
to remove the casts.
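
Why an int status squeezed into a u8 is bug-prone can be seen with a small user-space sketch. The helper name status_as_u8() is hypothetical; it only models the truncation that a u8 return type implies, not the bnx2x callbacks.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: model an int status passing through a u8 return
 * type. Only the low 8 bits survive the conversion. */
uint8_t status_as_u8(int status)
{
	return (uint8_t)status;
}
```

With EINVAL equal to 22, as on Linux, status_as_u8(-EINVAL) yields 234: a non-zero value that a caller testing for true/false would treat as "true" rather than as an error code.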
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 15 Nov 2019 05:07:11 +0000 (21:07 -0800)]
bnx2x: Drop redundant callback function casts
NULL is already "void *" so it will auto-cast in assignments and
initializers. Additionally, all the callbacks for .link_reset,
.config_loopback, .set_link_led, and .phy_specific_func are already
correct. No casting is needed for these, so remove them.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Po Liu [Fri, 15 Nov 2019 03:33:41 +0000 (03:33 +0000)]
enetc: update TSN Qbv PSPEED set according to adjust link speed
ENETC has a PSPEED register that indicates the link speed of the hardware.
The PSPEED field needs to be updated with the port speed for Qbv
scheduling purposes. Otherwise, if PSPEED and the PHY speed do not match,
there is a chance that a gate slot is not freed by the frame occupying
the MAC. So update PSPEED when the link is adjusted. This is
implemented in adjust_link.
Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Po Liu [Fri, 15 Nov 2019 03:33:33 +0000 (03:33 +0000)]
enetc: Configure the Time-Aware Scheduler via tc-taprio offload
ENETC supports time-based egress shaping in hardware according
to IEEE 802.1Qbv. This patch implements Qbv enablement through the
hardware offload method of the tc-taprio qdisc.
Also update the cbdr writeback to the upper level, since the control BD
ring may write data back to the control BD ring.
Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Lemon [Thu, 14 Nov 2019 22:13:00 +0000 (14:13 -0800)]
page_pool: do not release pool until inflight == 0.
The page pool keeps track of the number of pages in flight, and
it isn't safe to remove the pool until all pages are returned.
Disallow removing the pool until all pages are back, so the pool
is always available for page producers.
Make the page pool responsible for its own delayed destruction
instead of relying on XDP, so the page pool can be used without
the xdp memory model.
When all pages are returned, free the pool and notify xdp if the
pool is registered with the xdp memory system. Have the callback
perform a table walk since some drivers (cpsw) may share the pool
among multiple xdp_rxq_info.
Note that the increment of pages_state_release_cnt may result in
inflight == 0, resulting in the pool being released.
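
The in-flight accounting can be sketched with two free-running counters. The struct and function names below are hypothetical, and the destroy_requested flag is an assumption standing in for "the driver asked for the pool to be removed"; the real page pool code differs in detail.

```c
#include <stdint.h>

/* Toy pool: two free-running 32-bit counters plus a destroy request flag. */
struct toy_pool {
	uint32_t hold_cnt;       /* pages handed out so far */
	uint32_t release_cnt;    /* pages returned so far */
	int destroy_requested;   /* assumption: set when removal was asked */
};

/* Pages still in flight. The unsigned subtraction is wraparound-safe;
 * the signed result would expose a release without a matching hold. */
int32_t toy_inflight(const struct toy_pool *p)
{
	return (int32_t)(p->hold_cnt - p->release_cnt);
}

/* Account one returned page. Returns 1 if this release was the last
 * outstanding one and the pool may now be freed, else 0. */
int toy_release_page(struct toy_pool *p)
{
	p->release_cnt++;
	return p->destroy_requested && toy_inflight(p) == 0;
}
```

The point of the sketch is the last function: the release path itself can be the place where the in-flight count reaches zero, so it is also the place where the pool may end up being freed.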
Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
last part of termination improvements
Patches 1 and 2 finish the set of termination patches, introducing
a reboot handler that terminates all link groups. Patch 3 adds an
rcu_barrier before the module is unloaded, and patch 4 is cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Sat, 16 Nov 2019 16:47:32 +0000 (17:47 +0100)]
net/smc: remove unused constant
Constant SMC_CLOSE_WAIT_LISTEN_CLCSOCK_TIME is defined, but since
commit 3d502067599f ("net/smc: simplify wait when closing listen socket")
no longer used. Remove it.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Sat, 16 Nov 2019 16:47:31 +0000 (17:47 +0100)]
net/smc: use rcu_barrier() on module unload
Add rcu_barrier() to make sure no RCU readers or callbacks are
pending when the module is unloaded.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Sat, 16 Nov 2019 16:47:30 +0000 (17:47 +0100)]
net/smc: guarantee removal of link groups in reboot
When rebooting it should be guaranteed all link groups are cleaned
up and freed.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Sat, 16 Nov 2019 16:47:29 +0000 (17:47 +0100)]
net/smc: introduce bookkeeping of SMCR link groups
If the smc module is unloaded, return control from the exit routine only
if all link groups are freed.
If an IB device is thrown away, return control from device removal only
if all link groups belonging to this device are freed.
Counters for the total number of SMCR link groups and for the total
number of SMCR links per IB device are introduced. smc module unloading
continues only if the total number of SMCR link groups is zero. IB device
removal continues only if the total number of SMCR links per IB device
has decreased to zero.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petar Penkov [Thu, 14 Nov 2019 17:52:09 +0000 (09:52 -0800)]
tun: fix data-race in gro_normal_list()
There is a race in the TUN driver between napi_busy_loop and
napi_gro_frags. This commit resolves the race by adding the NAPI struct
via netif_tx_napi_add, instead of netif_napi_add, which disables polling
for the NAPI struct.
KCSAN reported:
BUG: KCSAN: data-race in gro_normal_list.part.0 / napi_busy_loop
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 11168 Comm: syz-executor.0 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 943170998b20 ("tun: enable NAPI for TUN/TAP driver") Signed-off-by: Petar Penkov <ppenkov@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 14 Nov 2019 16:43:27 +0000 (08:43 -0800)]
selftests: net: tcp_mmap should create detached threads
Since we do not plan to use pthread_join() in the server do_accept()
loop, we had better create detached threads, or risk increasing the
memory footprint over time.
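
Creating a detached thread with POSIX threads looks roughly as follows. The helper spawn_detached() and the placeholder noop_worker() are hypothetical names for this sketch and are not taken from tcp_mmap.

```c
#include <pthread.h>

/* Start fn(arg) in a detached thread, so its resources are reclaimed
 * at thread exit without pthread_join(). Returns 0 on success, else
 * the error number reported by the pthread call that failed. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
	pthread_attr_t attr;
	pthread_t tid;
	int err;

	err = pthread_attr_init(&attr);
	if (err)
		return err;

	err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	if (!err)
		err = pthread_create(&tid, &attr, fn, arg);

	pthread_attr_destroy(&attr);
	return err;
}

/* Placeholder worker used only to exercise the helper. */
void *noop_worker(void *arg)
{
	return arg;
}
```

An accept loop would call spawn_detached() once per connection. Calling pthread_detach() on an already created thread is an alternative with the same effect.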
Fixes: 192dc405f308 ("selftests: net: add tcp_mmap program") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
nla_total_size returns the total length of attribute
including padding.
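
The size computation can be reproduced in user space as a sketch. The toy_ names are hypothetical re-implementations for illustration, assuming the usual netlink attribute layout of a 4-byte header and 4-byte alignment.

```c
#include <stddef.h>

#define TOY_NLA_ALIGNTO 4u
#define TOY_NLA_HDRLEN  4u  /* u16 length + u16 type, already aligned */

/* Round len up to the attribute alignment. */
size_t toy_nla_align(size_t len)
{
	return (len + TOY_NLA_ALIGNTO - 1) & ~(size_t)(TOY_NLA_ALIGNTO - 1);
}

/* Header plus payload, without trailing padding. */
size_t toy_nla_attr_size(size_t payload)
{
	return TOY_NLA_HDRLEN + payload;
}

/* Header plus payload plus padding up to the next aligned boundary. */
size_t toy_nla_total_size(size_t payload)
{
	return toy_nla_align(toy_nla_attr_size(payload));
}
```

For a 1-byte payload this gives 8 bytes: 4 for the header, 1 for the payload and 3 of padding.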
Cc: Joe Stringer <joe@ovn.org> Cc: William Tu <u9012063@gmail.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
DSA driver for Vitesse Felix switch
This series builds upon the previous "Accomodate DSA front-end into
Ocelot" topic and does the following:
- Reworks the Ocelot (VSC7514) driver to support one more switching core
(VSC9959), used in NPI mode. Some code which was thought to be
SoC-specific (ocelot_board.c) wasn't, and vice versa, so it is being
accordingly moved.
- Exports ocelot driver structures and functions to include/soc/mscc.
- Adds a DSA ocelot front-end for VSC9959, which is a PCI device and
uses the exported ocelot functionality for hardware configuration.
- Adds a tagger driver for the Vitesse injection/extraction DSA headers.
This is known to be compatible with at least Ocelot and Felix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:30 +0000 (17:03 +0200)]
net: dsa: ocelot: add driver for Felix switch family
This supports an Ethernet switching core from Vitesse / Microsemi /
Microchip (VSC9959) which is part of the Ocelot family (a brand name),
and whose code name is Felix. The switch can be (and is) integrated on
different SoCs as a PCIe endpoint device.
The functionality is provided by the core of the Ocelot switch driver
(drivers/net/ethernet/mscc). In this regard, the current driver is an
instance of Microsemi's Ocelot core driver, with a DSA front-end. It
inherits its name from VSC9959's code name, to distinguish itself from
the switchdev ocelot driver.
The patch adds the logic for probing a PCI device and defines the
register map for the VSC9959 switch core, since it has some differences
in register addresses and bitfield mappings compared to the other Ocelot
switches (VSC7511, VSC7512, VSC7513, VSC7514).
The Felix driver declares the register map as part of the "instance
table". Currently the VSC9959 inside NXP LS1028A is the only instance,
but presumably it can support other switches in the Ocelot family, when
used in DSA mode (Linux running on the external CPU, and not on the
embedded MIPS).
In a few cases, some h/w operations have to be done differently on
VSC9959 due to missing bitfields. This is the case for the switch core
reset and init. Because for this operation Ocelot uses some bits that
are not present on Felix, the latter has to use a register from the
global registers block (GCB) instead.
Although it is a PCI driver, it relies on DT bindings for compatibility
with DSA (CPU port link, PHY library). It does not have any custom
device tree bindings though, since we would like to minimize its
dependency on device tree.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:29 +0000 (17:03 +0200)]
net: dsa: ocelot: add tagger for Ocelot/Felix switches
While it is entirely possible that this tagger format is in fact more
generic than just these 2 switch families, I don't have that knowledge.
The Seville switch in NXP T1040 has a similar frame format, but there
are enough differences (e.g. DEST field starts at bit 57 instead of 56)
that calling this file tag_vitesse.c is a bit of a stretch at the
moment. The frame format has been listed in a comment so that people who
add support for further Vitesse switches can rework this tagger while
keeping compatibility with Felix.
The "ocelot" name was chosen instead of "felix" because even the Ocelot
switch can act as a DSA device when it is used in NPI mode, and the Felix
tagger format is almost identical. Currently it is only used for the
Felix switch embedded in the NXP LS1028A chip.
The ABI for this tagger should be considered "not stable" at the moment.
The DSA tag is always placed before the Ethernet header and therefore,
we are using the long prefix for RX tags to avoid putting the DSA master
port in promiscuous mode. Once there is an API in DSA for drivers
to request DSA masters to be in promiscuous mode unconditionally, we
will switch to the "no prefix" extraction frame header, which will save
16 padding bytes for each RX frame.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:28 +0000 (17:03 +0200)]
net: mscc: ocelot: publish ocelot_sys.h to include/soc/mscc
The Felix DSA driver needs to write to SYS_RAM_INIT_RAM_INIT for its own
chip initialization process.
Also update the MAINTAINERS file such that the headers exported by the
ocelot driver are under the same maintainers' umbrella as the driver
itself.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:27 +0000 (17:03 +0200)]
net: mscc: ocelot: publish structure definitions to include/soc/mscc/ocelot.h
We will be registering another switch driver based on ocelot, which
lives under drivers/net/dsa.
Make sure the Felix DSA front-end has the necessary abstractions to
implement a new Ocelot driver instantiation. This includes the function
prototypes for implementing DSA callbacks.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:26 +0000 (17:03 +0200)]
net: mscc: ocelot: separate the implementation of switch reset
The Felix switch has a different reset procedure, so a function pointer
needs to be created and added to the ocelot_ops structure.
The reset procedure has been moved into ocelot_init.
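
The general pattern of a per-chip reset hook in an ops structure can be sketched as below. Everything here is a hypothetical miniature (toy_switch, toy_switch_ops and the two reset flavours); it does not reproduce the ocelot_ops structure or either real reset sequence.

```c
#include <stddef.h>

struct toy_switch;

/* Ops table with a chip-specific reset hook. */
struct toy_switch_ops {
	int (*reset)(struct toy_switch *sw);
};

struct toy_switch {
	const struct toy_switch_ops *ops;
	int reset_flavour;  /* records which reset ran, for illustration */
};

int toy_reset_a(struct toy_switch *sw)
{
	sw->reset_flavour = 1;
	return 0;
}

int toy_reset_b(struct toy_switch *sw)
{
	sw->reset_flavour = 2;
	return 0;
}

const struct toy_switch_ops toy_ops_a = { .reset = toy_reset_a };
const struct toy_switch_ops toy_ops_b = { .reset = toy_reset_b };

/* Common init path: run the chip-specific reset if one is provided. */
int toy_switch_init(struct toy_switch *sw)
{
	if (sw->ops != NULL && sw->ops->reset != NULL)
		return sw->ops->reset(sw);
	return 0;
}
```

The common init code stays identical for both chips. Only the ops table chosen at probe time decides which reset sequence runs.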
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:25 +0000 (17:03 +0200)]
net: mscc: ocelot: adjust MTU on the CPU port in NPI mode
When using the NPI port, the DSA tag is passed through Ethernet, so the
switch's MAC needs to accept it as it comes from the DSA master. Increase
the MTU on the external CPU port to account for the length of the
injection header.
Without this patch, MTU-sized frames are dropped by the switch's CPU
port on xmit, which is especially obvious in TCP sessions.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:24 +0000 (17:03 +0200)]
net: mscc: ocelot: export a constant for the tag length in bytes
This constant will be used in a future patch to increase the MTU on NPI
ports, and will also be used in the tagger driver for Felix.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:23 +0000 (17:03 +0200)]
net: mscc: ocelot: create a helper for changing the port MTU
Since in an NPI/DSA setup, not all ports will have the same MTU, we need
to make sure the watermarks for pause frames and/or tail dropping logic
that existed in the driver are still coherent for the new MTU values.
We need to do this because the NPI (aka external CPU) port needs an
increased MTU for the DSA tag. This will be done in a future patch.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 14 Nov 2019 15:03:22 +0000 (17:03 +0200)]
net: mscc: ocelot: move invariant configs out of adjust_link
It doesn't make sense to rewrite all these registers every time the PHY
library notifies us about a link state change.
In a future patch we will customize the MTU for the CPU port, and since
the MTU was previously configured from adjust_link, if we don't make
this change, its value would have got overridden.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 14 Nov 2019 15:03:21 +0000 (17:03 +0200)]
net: mscc: ocelot: filter out ocelot SoC specific PCS config from common path
The adjust_link routine should be generic enough to be (re)used by
any SoC that integrates a switch core compatible with the Ocelot
core switch driver. Currently all configurations are generic except
for the PCS settings that are SoC specific. Move these out to the
Ocelot SoC/board instance.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 14 Nov 2019 15:03:20 +0000 (17:03 +0200)]
net: mscc: ocelot: move resource ioremap and regmap init to common code
Let's make this ioremap and regmap init code common. It should not
be platform dependent as it should be usable by PCI devices too.
Use better names where necessary to avoid clashes.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Part 3 of the SMC termination patches improves the link group
termination processing and introduces the ability to immediately
terminate a link group.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 14 Nov 2019 12:02:47 +0000 (13:02 +0100)]
net/smc: immediate termination for SMCR link groups
If the SMC module is unloaded or an IB device is thrown away, the
immediate link group freeing introduced for SMCD is exploited for SMCR
as well. That means SMCR-specifics are added to smc_conn_kill().
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 14 Nov 2019 12:02:46 +0000 (13:02 +0100)]
net/smc: wait for tx completions before link freeing
Make sure all pending work requests are completed before freeing
a link.
Dismiss tx pending slots already when terminating a link group to
exploit termination shortcut in tx completion queue handler.
And kill the completion queue tasklets after destroying the
completion queues, otherwise there is a time window in which an
already killed tasklet can be scheduled again.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 14 Nov 2019 12:02:45 +0000 (13:02 +0100)]
net/smc: abnormal termination without orderly flag
For abnormal termination issue an LLC DELETE_LINK without the
orderly flag.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 14 Nov 2019 12:02:44 +0000 (13:02 +0100)]
net/smc: no WR buffer wait for terminating link group
Avoid waiting for a free work request buffer, if the link group
is already terminating.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 14 Nov 2019 12:02:43 +0000 (13:02 +0100)]
net/smc: introduce bookkeeping of SMCD link groups
If the ism module is unloaded, return control from the exit routine only
if all link groups are freed.
If an ISM device is thrown away, return control from device removal only
if all link groups belonging to this device are freed.
A counter for the total number of SMCD link groups per ISM device is
introduced. ism module unloading continues only if the total number of
SMCD link groups for all ISM devices is zero. ISM device
removal continues only if the total number of SMCD link groups per ISM
device has decreased to zero.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 14 Nov 2019 12:02:42 +0000 (13:02 +0100)]
net/smc: abnormal termination of SMCD link groups
A final cleanup due to SMCD device removal means immediate freeing
of all link groups belonging to this device in interrupt context.
This patch introduces a separate SMCD link group termination routine,
which terminates all link groups of an SMCD device.
This new routine smcd_terminate_all() is reused if the smc module is
unloaded.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 14 Nov 2019 12:02:41 +0000 (13:02 +0100)]
net/smc: immediate termination for SMCD link groups
SMCD link group termination is called when the peer signals shutdown
of its corresponding link group. For regular shutdowns no connections
exist anymore. For abnormal shutdowns connections must be killed and
their DMBs must be unregistered immediately. That means the SMCR method
of delaying the link group freeing by several seconds does not fit.
This patch adds immediate termination of a link group and its SMCD
connections and makes sure all SMCD link group related cleanup steps
are finished.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 14 Nov 2019 12:02:40 +0000 (13:02 +0100)]
net/smc: fix final cleanup sequence for SMCD devices
If peer announces shutdown, use the link group terminate worker for
local cleanup of link groups and connections to terminate link group
in proper context.
Make sure link groups are cleaned up first before destroying the
event queue of the SMCD device, because link group cleanup may
raise events.
Send signal shutdown only if the peer has not done it already.
Send socket abort or close only if the peer has not already announced
shutdown.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Thu, 14 Nov 2019 11:42:50 +0000 (12:42 +0100)]
net: stmmac: Rework TX Coalesce logic
Coalesce logic currently increments the number of packets and sets the
IC bit when the coalesced packets have passed a given limit. This does
not reflect very well what coalesce was meant for as we can have a large
number of packets that are coalesced and then a single one, sent later
on that has the IC bit.
Rework the logic so that it coalesces only up to a limit of packets and
sets the IC bit for a large number of packets.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
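As a rough illustration of frame-count based coalescing, the sketch below requests a completion interrupt (the IC bit) only once a configured number of frames has accumulated. It is a simplified model, not the stmmac code: `TxCoalescer`, `frame_limit` and `queue()` are hypothetical names, and timers and descriptor handling are left out.

```python
class TxCoalescer:
    """Hypothetical model: decide per submission whether to set the IC bit."""

    def __init__(self, frame_limit):
        self.frame_limit = frame_limit  # frames to accumulate per interrupt
        self.pending = 0                # frames queued since the last IC bit

    def queue(self, nframes=1):
        """Account for nframes; return True if the IC bit should be set."""
        self.pending += nframes
        if self.pending >= self.frame_limit:
            self.pending = 0
            return True
        return False
```

With a limit of 4, every fourth queued frame would carry the IC bit, so a burst is completed by one interrupt rather than one per frame.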
Jose Abreu [Thu, 14 Nov 2019 11:42:48 +0000 (12:42 +0100)]
net: stmmac: xgmac: Remove uneeded computation for RFA/RFD
RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO
space we have, the later we can activate Flow Control. Let's use
hard-coded values for RFA and RFD for all FIFO sizes with the exception
of 4k, which is a special case.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Thu, 14 Nov 2019 11:42:47 +0000 (12:42 +0100)]
net: stmmac: gmac4+: Remove uneeded computation for RFA/RFD
RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO
space we have, the later we can activate Flow Control. Let's use
hard-coded values for RFA and RFD for all FIFO sizes with the exception
of 4k, which is a special case.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Thu, 14 Nov 2019 11:42:46 +0000 (12:42 +0100)]
net: stmmac: Setup a default RX Coalesce value instead of the minimum
For performance reasons, using the minimum RX Coalesce value is
sometimes not optimal. Let's set up a default value that is optimal in
most use cases.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 14 Nov 2019 09:54:19 +0000 (11:54 +0200)]
mlxsw: spectrum_router: Allocate discard adjacency entry when needed
Commit 0c3cbbf96def ("mlxsw: Add specific trap for packets routed via
invalid nexthops") allocated an adjacency entry during driver
initialization whose purpose is to discard packets hitting the route
pointing to it.
These adjacency entries are allocated from a resource called KVD linear
(KVDL). There are situations in which the user can decide to set the
size of this resource (via devlink-resource) to 0, in which case the
driver will not be able to load.
Therefore, instead of pre-allocating this adjacency entry, simply
allocate it only when needed. A variable indicating the validity of the
entry is added and is used to ensure it is only allocated and written
once and that it is freed after all the routes were flushed.
Fixes: 0c3cbbf96def ("mlxsw: Add specific trap for packets routed via invalid nexthops") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
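The allocate-on-demand pattern with a validity flag can be sketched as follows. This is only a model of the idea in the commit message: `DiscardAdjacency`, `kvdl_size`, `get()` and `put_after_flush()` are hypothetical names, and the KVDL resource is reduced to a free-entry counter.

```python
class DiscardAdjacency:
    """Hypothetical model of a lazily allocated discard adjacency entry."""

    def __init__(self, kvdl_size):
        # Nothing is allocated at init, so a KVDL size of 0 does not
        # prevent initialization in this model.
        self.kvdl_free = kvdl_size
        self.valid = False
        self.index = None

    def get(self):
        # Allocate and "write" the entry once, on first use.
        if not self.valid:
            if self.kvdl_free == 0:
                raise MemoryError("no KVDL entries available")
            self.kvdl_free -= 1
            self.index = 0  # placeholder index, an assumption of the sketch
            self.valid = True
        return self.index

    def put_after_flush(self):
        # Free the entry after all routes were flushed.
        if self.valid:
            self.kvdl_free += 1
            self.valid = False
            self.index = None
```

The flag guarantees the entry is allocated at most once and freed at most once, regardless of how many routes ask for it.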
YueHaibing [Thu, 14 Nov 2019 07:39:46 +0000 (15:39 +0800)]
net/tls: Fix unused function warning
If PROC_FS is not set, gcc warns:
net/tls/tls_proc.c:23:12: warning:
'tls_statistics_seq_show' defined but not used [-Wunused-function]
Use #ifdef to guard this.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Mon, 11 Nov 2019 03:34:27 +0000 (03:34 +0000)]
rtw88: remove duplicated include from ps.c
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Zheng Yongjun [Sun, 10 Nov 2019 10:49:55 +0000 (18:49 +0800)]
rtl8xxxu: Remove set but not used variable 'rsr'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c: In function rtl8xxxu_gen2_config_channel:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:1266:13: warning: variable rsr set but not used [-Wunused-but-set-variable]
rsr is never used, so remove it.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Chris Chiu <chiu@endlessm.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Johannes Berg [Fri, 15 Nov 2019 07:28:31 +0000 (09:28 +0200)]
iwlwifi: mvm: fix non-ACPI function
The code now compiles without ACPI, but there's a warning since
iwl_mvm_get_ppag_table() isn't used, and iwl_mvm_ppag_init() must
not unconditionally fail but return success instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Johannes Berg [Fri, 15 Nov 2019 07:28:28 +0000 (09:28 +0200)]
iwlwifi: 22000: fix some indentation
Somehow two tabs snuck into this file where just one should be
used, fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
This is dead code, nothing uses the IWL_DEVICE_22560 macro and
thus nothing ever uses IWL_DEVICE_FAMILY_22560. Remove it all.
While at it, remove some code and definitions used only in this
case, and clean up some comments/names that still refer to it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Mordechay Goodstein [Fri, 15 Nov 2019 07:28:20 +0000 (09:28 +0200)]
iwlwifi: mvm: start CTDP budget from 2400mA
The current budget of 2000mA is preventing us from reaching maximum
throughput. According to our system engineers, we can increase the
maximum budget to 2400mA to solve this problem.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Haim Dreyfuss [Fri, 15 Nov 2019 07:28:17 +0000 (09:28 +0200)]
iwlwifi: mvm: don't skip mgmt tid when flushing all tids
There are various flows which require flushing tids
(disconnection, suspend, etc.).
Currently, when the driver instructs the FW to flush,
it masks all the data tids (0-7).
However, the driver doesn't set the management tid (#15),
which causes the FW not to flush it.
When the FW then tries to remove the mgmt queue, it throws an assert
since the queue is not empty.
Instead of setting only the data tids, set everything and let
the FW ignore the invalid tids.
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
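The difference between the two flush masks can be shown with simple bit arithmetic. The tid numbers (data tids 0-7, management tid 15) come from the commit message; the 16-bit width of the "everything" mask and the helper `tid_mask()` are assumptions made for this sketch.

```python
DATA_TIDS = range(0, 8)  # data tids 0-7, per the commit message
MGMT_TID = 15            # management tid, per the commit message


def tid_mask(tids):
    """Build a bitmask with one bit set per tid (hypothetical helper)."""
    mask = 0
    for tid in tids:
        mask |= 1 << tid
    return mask


# Old behaviour: only the data tids are flushed.
OLD_FLUSH_MASK = tid_mask(DATA_TIDS)

# New behaviour: set every bit and let the FW ignore invalid tids.
# A 16-bit mask is assumed here.
ALL_TIDS_MASK = 0xFFFF


def covers_mgmt(mask):
    """True if the mask asks the FW to flush the management tid."""
    return bool(mask & (1 << MGMT_TID))
```

In this model `covers_mgmt(OLD_FLUSH_MASK)` is False, which corresponds to the non-empty management queue described above, while `covers_mgmt(ALL_TIDS_MASK)` is True.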
Shahar S Matityahu [Fri, 15 Nov 2019 07:28:14 +0000 (09:28 +0200)]
iwlwifi: mvm: scan: enable adaptive dwell in p2p
Align to the requirement update and support adaptive dwell in p2p scan.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Ihab Zhaika [Fri, 15 Nov 2019 07:28:11 +0000 (09:28 +0200)]
iwlwifi: refactor the SAR tables from mvm to acpi
Move the SAR-related functions from iwlmvm to acpi
so that they can be shared between different opmodes.
In addition, remove the unused variable ppag_rev.
Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Shahar S Matityahu [Fri, 15 Nov 2019 07:28:08 +0000 (09:28 +0200)]
iwlwifi: scan: support scan req cmd ver 12
Implement scan request command version 12.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Shahar S Matityahu [Fri, 15 Nov 2019 07:28:05 +0000 (09:28 +0200)]
iwlwifi: scan: make new scan req versioning flow
Implement a new versioning handling flow supported from version 11
onwards.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Tested with a Wireless AC 7265 for ~6 months, confirmed to fix the
problem. No other unaligned accesses are spotted yet.
Signed-off-by: Wang Xuerui <wangxuerui@qiniu.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Colin Ian King [Fri, 15 Nov 2019 07:27:59 +0000 (09:27 +0200)]
iwlwifi: remove redundant assignment to variable bufsz
The variable bufsz is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Johannes Berg [Fri, 15 Nov 2019 07:27:54 +0000 (09:27 +0200)]
iwlwifi: FW API: reference enum in docs of modify_mask
Add a reference to the correct enum rather than showing
the pattern of the actual constants.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Tova Mussai [Fri, 15 Nov 2019 07:27:48 +0000 (09:27 +0200)]
iwlwifi: scan: adapt the code to use api ver 11
FW scan API version 11 adds support for some new features.
In this version the FW also did some cleanup in the API,
which prevents the driver from using the
current scan req struct.
Therefore, this patch adds a new version of the scan command
code to the driver.
Signed-off-by: Tova Mussai <tova.mussai@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Shahar S Matityahu [Fri, 15 Nov 2019 07:27:40 +0000 (09:27 +0200)]
iwlwifi: dbg_ini: support dump collection upon assert during D3
Add an assert time point in the D3 resume flow, in case there was an
assert during D3.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Mordechay Goodstein [Fri, 15 Nov 2019 07:27:34 +0000 (09:27 +0200)]
iwlwifi: mvm: in VHT connection use only VHT capabilities
mac80211 limits the A-MSDU size to the minimum of the HT and VHT
capabilities, but since we don't transmit HT frames in a VHT connection,
we can discard the HT limits.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
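The effect of dropping the HT limit can be sketched with two small functions. The function names are hypothetical; the example lengths are the standard maximum A-MSDU sizes (3839 or 7935 bytes for HT; 3895, 7991 or 11454 bytes for VHT), used here only as sample inputs.

```python
def combined_limit(ht_max_amsdu, vht_max_amsdu):
    """Limit when both HT and VHT capabilities are applied (the minimum)."""
    return min(ht_max_amsdu, vht_max_amsdu)


def vht_connection_limit(ht_max_amsdu, vht_max_amsdu, is_vht):
    """Sketch of the new rule: in a VHT connection only VHT caps apply."""
    if is_vht:
        return vht_max_amsdu
    return combined_limit(ht_max_amsdu, vht_max_amsdu)
```

For a peer advertising 3839 bytes in HT and 11454 bytes in VHT, the combined limit is 3839, while the VHT-only rule allows 11454 on a VHT connection.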
Luca Coelho [Fri, 15 Nov 2019 07:27:25 +0000 (09:27 +0200)]
iwlwifi: mvm: fix support for single antenna diversity
When the single antenna diversity support was sent upstream, only some
definitions were sent, due to a bad revert.
Fix this by adding the actual code.
Fixes: 5952e0ec3f05 ("iwlwifi: mvm: add support for single antenna diversity") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Bjorn Andersson [Wed, 13 Nov 2019 23:35:58 +0000 (15:35 -0800)]
ath10k: qmi: Sleep for a while before assigning MSA memory
Unless we sleep for a while before transitioning the MSA memory to WLAN
the MPSS.AT.4.0.c2-01184-SDM845_GEN_PACK-1 firmware triggers a security
violation fairly reliably. Unfortunately, recovering from this failure
always results in the entire system freezing.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
please apply the following qeth patches to net-next.
Along with the usual cleanups, this
(1) reduces collateral packet loss in the RX path when dealing with
bad packets and/or allocation errors, and
(2) simplifies how the L3 driver deals with mcast IP addresses.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 14 Nov 2019 10:19:21 +0000 (11:19 +0100)]
s390/qeth: consolidate L3 mcast registration code
Current code processes each (VLAN) device twice - once to inspect the
IPv4 mcast addresses, and then a second time to walk the IPv6 mcast
addresses. Unify all this into a single helper, thus removing some
checks and a duplicated VLAN lookup.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 14 Nov 2019 10:19:20 +0000 (11:19 +0100)]
s390/qeth: remove gratuitious RX modeset
Trust the IPv4/IPv6 code to properly remove its mcast addresses when a
VLAN device is unregistered, and then also trigger an RX modeset
whenever it's needed.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 14 Nov 2019 10:19:18 +0000 (11:19 +0100)]
s390/qeth: clean up error path in qeth_core_probe_device()
qeth_core_free_card() is meant to be the counterpart of
qeth_alloc_card() - but unfortunately was also picked as the place
to free the QDIO queues.
This gets messy when qeth_core_probe_device() fails during
qeth_add_dbf_entry(). At this point the card->qdio.state is not initialized
yet, so qeth_free_qdio_queues() ends up operating on uninitialized data.
Luckily for now, the whole qeth_card struct is zero-allocated and the value
of the QETH_QDIO_UNINITIALIZED enum is 0 as well. So there's no real impact
from this bug at the moment, it's just really fragile.
Clean this up by moving the qeth_free_qdio_queues() call up one level in
the hierarchy. This way it doesn't get called from the error path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 14 Nov 2019 10:19:17 +0000 (11:19 +0100)]
s390/qeth: handle skb allocation error gracefully
When current code fails to allocate an skb in the RX path, it drops the
whole RX buffer. Considering the large number of packets that a single
RX buffer might contain, this is quite drastic.
Skip over the packet instead, and try to extract the next packet from
the RX buffer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 14 Nov 2019 10:19:16 +0000 (11:19 +0100)]
s390/qeth: drop unwanted packets earlier in RX path
Packets with an unexpected HW format are currently first extracted from
the RX buffer, passed upwards to the layer-specific driver and only then
finally dropped.
Enhance the RX path so that we can drop such packets before even
allocating an skb. For this, add some additional logic so that when a
packet is meant to be dropped, we can still walk along the packet's data
chunks in the RX buffer. This allows us to extract the following
packet(s) from the buffer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
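The idea of skipping an unwanted packet while still walking past its data, so that following packets can be extracted, can be modelled in user space. The header layout below (one format byte plus a 16-bit length) and the set `KNOWN_FORMATS` are invented for the sketch and do not match the real qeth HW header; copying the payload stands in for allocating an skb.

```python
import struct

# Hypothetical header: 1-byte format id, 2-byte payload length (big endian).
HDR = struct.Struct("!BH")
KNOWN_FORMATS = {1, 2}  # assumed set of formats the driver accepts


def extract_packets(buf):
    """Walk an RX buffer; return (accepted payloads, number dropped)."""
    packets = []
    dropped = 0
    offset = 0
    while offset + HDR.size <= len(buf):
        fmt, length = HDR.unpack_from(buf, offset)
        offset += HDR.size
        if offset + length > len(buf):
            break  # truncated packet, stop walking
        if fmt in KNOWN_FORMATS:
            # Only now is the payload copied (the "skb allocation").
            packets.append(bytes(buf[offset:offset + length]))
        else:
            # Drop without copying, but still account for its length
            # so the next packet can be found.
            dropped += 1
        offset += length
    return packets, dropped
```

A buffer holding a good packet, a packet with an unknown format and another good packet yields both good payloads and a drop count of one, instead of losing everything after the bad packet.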
Julian Wiedmann [Thu, 14 Nov 2019 10:19:15 +0000 (11:19 +0100)]
s390/qeth: support per-frame invalidation
Each RX buffer may contain up to 64KB worth of data. In case the device
needs to discard a packet _after_ already having reserved space for it
in the buffer, the whole buffer gets set to ERROR state. As the buffer
might contain any number of good packets, this can result in collateral
packet loss.
qeth can provide relief by enabling per-frame invalidation. The RX
buffer is then presented as usual, we just need to spot & drop any
individual packet that was flagged as invalid.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 14 Nov 2019 10:19:14 +0000 (11:19 +0100)]
s390/qeth: gather more detailed RX dropped/error statistics
Where available, use the fine-grained counters in rtnl_link_stats64 to
indicate different RX error causes. For drop reasons, use driver-private
ethtool counters.
In particular this patch allows us to keep track of driver-side drops due
to unknown/unsupported HW descriptor format.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Nov 2019 02:12:19 +0000 (18:12 -0800)]
Merge branch 'vsock-add-multi-transports-support'
Stefano Garzarella says:
====================
vsock: add multi-transports support
Most of the patches are reviewed by Dexuan, Stefan, and Jorgen.
The following patches need reviews:
- [11/15] vsock: add multi-transports support
- [12/15] vsock/vmci: register vmci_transport only when VMCI guest/host
are active
- [15/15] vhost/vsock: refuse CID assigned to the guest->host transport
v1 -> v2:
- Patch 11:
+ vmci_transport: sent reset when vsock_assign_transport() fails
[Jorgen]
+ fixed loopback in the guests, checking if the remote_addr is the
same as transport_g2h->get_local_cid()
+ virtio_transport_common: updated space available while creating
the new child socket during a connection request
- Patch 12:
+ removed 'features' variable in vmci_transport_init() [Stefan]
+ added a flag to register only once the host [Jorgen]
- Added patch 15 to refuse CID assigned to the guest->host transport in
the vhost_transport
This series adds the multi-transports support to vsock, following
this proposal: https://www.spinics.net/lists/netdev/msg575792.html
With the multi-transports support, we can use VSOCK with nested VMs
(using also different hypervisors) loading both guest->host and
host->guest transports at the same time.
Before this series, vmci_transport supported this behavior but only
using VMware hypervisor on L0, L1, etc.
The first 9 patches are cleanups and preparations, maybe some of
these can go regardless of this series.
Patch 10 changes hvs_remote_addr_init(), setting
VMADDR_CID_HOST as the remote CID instead of VMADDR_CID_ANY so that
the choice of the transport to be used works properly.
Patch 11 adds multi-transports support.
Patch 12 slightly changes vmci_transport and the vmci driver
to register vmci_transport only when there is an active host/guest.
Patch 13 prevents the transport modules unloading while sockets are
assigned to them.
Patch 14 fixes an issue in the bind() logic discoverable only with
the new multi-transport support.
Patch 15 refuses CID assigned to the guest->host transport in the
vhost_transport.
I've tested this series with nested KVM (vsock-transport [L0,L1],
virtio-transport[L1,L2]) and with VMware (L0) + KVM (L1)
(vmci-transport [L0,L1], vhost-transport [L1], virtio-transport[L2]).
Dexuan successfully tested the RFC series on HyperV with a Linux guest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
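How a socket might be mapped to one of two loaded transports can be sketched as a pure function of the remote CID. This is an assumption-laden model, not the code from the series: `pick_transport()` and its string return values are hypothetical, and the exact condition is inferred from the cover letter (remote CIDs up to VMADDR_CID_HOST go guest->host, and a remote CID equal to the guest's own CID is treated as loopback). VMADDR_CID_HOST is 2 in the Linux UAPI.

```python
VMADDR_CID_HOST = 2  # well-known CID of the host


def pick_transport(remote_cid, g2h_local_cid=None):
    """Return "g2h" or "h2g" for a stream socket (hypothetical helper).

    g2h_local_cid is the CID reported by the guest->host transport,
    or None if no such transport is loaded.
    """
    if remote_cid <= VMADDR_CID_HOST:
        return "g2h"  # talking to the host (or hypervisor)
    if g2h_local_cid is not None and remote_cid == g2h_local_cid:
        return "g2h"  # loopback inside the guest
    return "h2g"      # talking to one of our own guests
```

In a nested setup, an L1 guest with local CID 5 would use the guest->host transport to reach L0 (CID 2) or itself (CID 5), and the host->guest transport to reach an L2 guest with, say, CID 7.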