Igor Mitsyanko [Tue, 31 Oct 2017 01:04:53 +0000 (18:04 -0700)]
qtnfmac: configure and start AP interface with a single command
Current logic artificially divides "start AP" procedure into three
stages:
- generic interface configuration (security, channel etc)
- IE's processing
- enable AP mode on interface
This separation does not allow proper device configuration, because the
first stage needs information from IEs that are only processed in the
second stage. This means the first and second stages have to be merged.
In that case there is no point anymore to keep third stage either, so
merge all three into a single command.
This new command carries all the same info as contained in
"struct cfg80211_ap_settings".
Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Igor Mitsyanko [Tue, 31 Oct 2017 01:04:51 +0000 (18:04 -0700)]
qtnfmac: SCAN results: retrieve frame type information from "IE set" TLV
"IE set" TLV carries the same information as
qlink_event_scan_result::frame_type. Convert the event to make use of
TLV and drop frame_type member.
While at it, make qlink_event_scan_result structure alignment-safe.
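As a rough illustration of reading a value out of a TLV instead of a fixed struct member, the sketch below walks a buffer of TLVs and returns the first one of a wanted type. The layout (16-bit little-endian type and length followed by the value, with no padding) and all sketch_* names are assumptions made for this example, not the qlink definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 16-bit LE type, 16-bit LE length, then the value. */
#define SKETCH_TLV_HDR_LEN 4u

uint16_t sketch_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Return a pointer to the value of the first TLV of the wanted type,
 * or NULL if none is found. On success *val_len holds the value length.
 */
const uint8_t *sketch_tlv_find(const uint8_t *buf, size_t len,
			       uint16_t wanted, size_t *val_len)
{
	assert(buf != NULL && val_len != NULL);

	while (len >= SKETCH_TLV_HDR_LEN) {
		uint16_t type = sketch_get_le16(buf);
		size_t vlen = sketch_get_le16(buf + 2);

		if (vlen > len - SKETCH_TLV_HDR_LEN)
			return NULL;	/* truncated TLV: stop parsing */
		if (type == wanted) {
			*val_len = vlen;
			return buf + SKETCH_TLV_HDR_LEN;
		}
		buf += SKETCH_TLV_HDR_LEN + vlen;
		len -= SKETCH_TLV_HDR_LEN + vlen;
	}
	return NULL;
}
```

Reading the header byte by byte avoids unaligned 16-bit loads, which is one way to keep such parsing alignment-safe.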
Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Igor Mitsyanko [Tue, 31 Oct 2017 01:04:47 +0000 (18:04 -0700)]
qtnfmac: use per-band HT/VHT info from wireless device
HT/VHT capabilities must be reported for each band supported by a radio,
not for all bands on a radio. Furthermore, the driver should not assume
any capabilities and should just use whatever is reported by the device itself.
To support this, convert "get channels" command into "get band info"
command. Difference is that it may also carry HT/VHT capabilities along
with channels information.
While at it, also add "num_bitrates" field to "get band info" command,
for future use.
Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arend Van Spriel [Wed, 8 Nov 2017 13:36:37 +0000 (14:36 +0100)]
brcmfmac: move configuration of probe request IEs
The configuration of the IEs for probe requests was done in a P2P
related function, which is not very obvious. Move it to the
.scan callback function, i.e. brcmf_cfg80211_scan().
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arend Van Spriel [Wed, 8 Nov 2017 13:36:36 +0000 (14:36 +0100)]
brcmfmac: get rid of struct brcmf_cfg80211_info::active_scan field
The field struct brcmf_cfg80211_info::active_scan is set to true upon
initializing the driver instance, but it is never changed so simply
get rid of it.
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arend Van Spriel [Wed, 8 Nov 2017 13:36:35 +0000 (14:36 +0100)]
brcmfmac: get rid of brcmf_cfg80211_escan() function
The function brcmf_cfg80211_escan() is only called by brcmf_cfg80211_scan()
so there is no reason to split it into two functions, especially since the
latter does not do an awful lot.
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arend Van Spriel [Wed, 8 Nov 2017 13:36:33 +0000 (14:36 +0100)]
brcmfmac: cleanup brcmf_cfg80211_escan() function
The function brcmf_cfg80211_escan() was always called with a non-null
request parameter and null pointer for this_ssid parameter. Clean up
the function removing the dead code path.
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Franky Lin [Wed, 8 Nov 2017 13:36:32 +0000 (14:36 +0100)]
brcmfmac: disable packet filtering in promiscuous mode
Disable ARP and ND offload so that all packets are sent to the host.
Reported-by: Phil Elwell <phil@raspberrypi.org> Tested-by: Phil Elwell <phil@raspberrypi.org> Reviewed-by: Arend Van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arend Van Spriel [Wed, 8 Nov 2017 13:36:31 +0000 (14:36 +0100)]
brcmfmac: handle FWHALT mailbox indication
The firmware uses a mailbox to communicate to the host what is going
on. In the driver we validate the bit received. Various people have seen
the following message:
brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012
Bit 4 is the cause of this message, but this actually indicates the firmware
has halted. Handle this bit by giving a more meaningful error message.
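A minimal sketch of the check, assuming only what the message above states (bit 4 of the mailbox data signals a firmware halt). The macro and function names are hypothetical and are not the driver's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: the text only says that bit 4 means "firmware halted". */
#define SKETCH_HMB_FWHALT (1u << 4)

/* Return true when the mailbox data reports a halted firmware. */
bool sketch_mailbox_fw_halted(uint32_t hmb_data)
{
	return (hmb_data & SKETCH_HMB_FWHALT) != 0;
}
```

With the value from the log line, 0x40012, the low byte is 0x12 (binary 10010), so bit 4 is set and the helper reports a halt.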
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Colin Ian King [Sat, 4 Nov 2017 19:37:59 +0000 (19:37 +0000)]
rtlwifi: remove redundant initialization to cfg_cmd
cfg_cmd is initialized to zero and this value is never read; instead
it is overwritten at the start of a do-while loop. Remove the
redundant initialization. Cleans up clang warning:
drivers/net/wireless/realtek/rtlwifi/core.c:1750:22: warning: Value
stored to 'cfg_cmd' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arnd Bergmann [Mon, 6 Nov 2017 13:55:36 +0000 (14:55 +0100)]
rtlwifi: use ktime_get_real_seconds() for suspend time
do_gettimeofday() is deprecated and slower than necessary for the purpose
of reading the seconds. This changes rtl_op_suspend/resume to use
ktime_get_real_seconds() instead, which is simpler and avoids confusion
about whether it is y2038-safe or not.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arnd Bergmann [Mon, 6 Nov 2017 13:55:35 +0000 (14:55 +0100)]
rtlwifi: fix uninitialized rtlhal->last_suspend_sec time
We set rtlhal->last_suspend_sec to an uninitialized stack variable,
but unfortunately gcc never warned about this, I only found it
while working on another patch. I opened a gcc bug for this.
Presumably the value of rtlhal->last_suspend_sec is not all that
important, but it does get used, so we probably want the
patch backported to stable kernels.
Cc: stable@vger.kernel.org Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82839 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Larry Finger [Wed, 1 Nov 2017 15:29:18 +0000 (10:29 -0500)]
rtlwifi: rtl_pci: Simplify some code by eliminating extraneous variables
In several places, the code assigns a variable inside an "if" or "case"
block, but uses it only once. The code is simplified by eliminating
the extraneous variable. With this change, one level of indenting is
saved.
This patch does not cause any functional changes in the binary code.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Ping-Ke Shih <pkshih@realtek.com> Cc: Yan-Hsuan Chuang <yhchuang@realtek.com> Cc: Birming Chiu <birming@realtek.com> Cc: Shaofu <shaofu@realtek.com> Cc: Steven Ting <steventing@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Kalle Valo [Mon, 6 Nov 2017 10:31:07 +0000 (12:31 +0200)]
Merge tag 'iwlwifi-next-for-kalle-2017-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
iwlwifi updates
* Some new PCI IDs;
* A bunch of cleanups;
* The timers update by Kees;
* Add more register dump call-sites;
* A fix for a locking issue in the TX flush code;
* Actual implementation of the TX flush code for A000;
* An optimization to drop RX frames during restart to avoid BA issues;
This patchset introduces an eBPF-based device controller for cgroup v2.
Patches (1) and (2) are a preparational work required to share some code
with the existing device controller implementation.
Patch (3) is the main patch, which introduces a new bpf prog type
and all necessary infrastructure.
Patch (4) moves cgroup_helpers.c/h so that patch (5) can use them.
Patch (5) implements an example of eBPF program which controls access
to device files and corresponding userspace test.
v3:
Renamed constants introduced by patch (3) to BPF_DEVCG_*
Roman Gushchin [Sun, 5 Nov 2017 13:15:34 +0000 (08:15 -0500)]
selftests/bpf: add a test for device cgroup controller
Add a test for device cgroup controller.
The test loads a simple bpf program which logs all
device access attempts using trace_printk() and forbids
all operations except operations with /dev/zero and
/dev/urandom.
Then the test creates and joins a test cgroup, and attaches
the bpf program to it.
Then it tries to perform some simple device operations
and checks the result:
create /dev/null (should fail)
create /dev/zero (should pass)
copy data from /dev/urandom to /dev/zero (should pass)
copy data from /dev/urandom to /dev/full (should fail)
copy data from /dev/random to /dev/zero (should fail)
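The policy exercised by the test can be modelled in plain C as below. This is only a user-space model of the decision, not a BPF program; the sketch_* names are hypothetical, and the device numbers are the usual Linux assignments (/dev/zero is char 1:5, /dev/urandom is char 1:9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_dev_type { SKETCH_DEV_BLOCK, SKETCH_DEV_CHAR };

/*
 * Plain C model of the policy: allow only /dev/zero and /dev/urandom.
 * A real program would be a BPF program attached to the cgroup.
 */
bool sketch_dev_allowed(enum sketch_dev_type type, uint32_t major,
			uint32_t minor)
{
	if (type != SKETCH_DEV_CHAR || major != 1)
		return false;
	/* char 1:5 is /dev/zero, char 1:9 is /dev/urandom */
	return minor == 5 || minor == 9;
}
```

Under this model /dev/null (1:3), /dev/full (1:7) and /dev/random (1:8) are all rejected, which matches the expected results listed above.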
Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Sun, 5 Nov 2017 13:15:33 +0000 (08:15 -0500)]
bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftests/bpf/
The purpose of this move is to use these files in bpf tests.
Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Sun, 5 Nov 2017 13:15:32 +0000 (08:15 -0500)]
bpf, cgroup: implement eBPF-based device controller for cgroup v2
Cgroup v2 lacks the device controller, provided by cgroup v1.
This patch adds a new eBPF program type, which in combination
with the previously added ability to attach multiple eBPF programs
to a cgroup, will provide similar functionality, but with some
additional flexibility.
This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Sun, 5 Nov 2017 13:15:31 +0000 (08:15 -0500)]
device_cgroup: prepare code for bpf-based device controller
This is a non-functional change to prepare the device cgroup code
for adding an eBPF-based controller for cgroup v2.
The patch performs the following changes:
1) __devcgroup_inode_permission() and devcgroup_inode_mknod()
are moved to device-cgroup.h and converted into static inline functions.
2) __devcgroup_check_permission() is exported.
3) devcgroup_check_permission() wrapper is introduced to be used
by both existing and new bpf-based implementations.
Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Sun, 5 Nov 2017 13:15:30 +0000 (08:15 -0500)]
device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants
Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.
The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.
Signed-off-by: Roman Gushchin <guro@fb.com> Cc: David S. Miller <davem@davemloft.net> Cc: Tejun Heo <tj@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 5 Nov 2017 14:25:02 +0000 (23:25 +0900)]
Merge tag 'mlx5-updates-2017-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-11-04
This series includes:
From Huy: dscp to priority mapping for Ethernet packet.
===================================================
First six patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.
Firmware interface:
Mellanox firmware provides two control knobs for this feature:
QPTS register allows changing the trust state between dscp and
pcp mode. The default is pcp mode. Once in dscp mode, firmware will
route the packet based on its dscp value if the dscp field exists.
QPDPM register allows mapping a specific dscp (0 to 63) to a
specific priority (0 to 7). By default, all the dscps are mapped to
priority zero.
Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In the mlx5 driver, we add support for net
dcb's getapp and setapp callbacks. The mlx5 driver only handles the selector
id 5 application entry (dscp application priority application entry).
If user sends multiple dscp to priority APP TLV entries on the same
dscp, the last one sent will take effect. All previously sent entries
will be deleted.
The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.
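The bookkeeping described above (64 dscp values mapped to 8 priorities, last write wins, trust state derived from the number of entries) can be sketched as follows. This is a simplified user-space model under those stated rules; all sketch_* names are hypothetical and no firmware register access is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NUM_DSCP 64
#define SKETCH_NUM_PRIO 8

enum sketch_trust { SKETCH_TRUST_PCP, SKETCH_TRUST_DSCP };

struct sketch_dscp_map {
	uint8_t prio[SKETCH_NUM_DSCP];	/* dscp -> priority, default 0 */
	bool set[SKETCH_NUM_DSCP];	/* does an APP entry exist? */
	unsigned int entries;		/* number of dscp APP entries */
};

void sketch_map_init(struct sketch_dscp_map *m)
{
	memset(m, 0, sizeof(*m));
}

/* Add or overwrite the entry for one dscp; the last write wins. */
void sketch_setapp(struct sketch_dscp_map *m, uint8_t dscp, uint8_t prio)
{
	assert(dscp < SKETCH_NUM_DSCP && prio < SKETCH_NUM_PRIO);
	if (!m->set[dscp]) {
		m->set[dscp] = true;
		m->entries++;
	}
	m->prio[dscp] = prio;
}

/* Delete the entry for one dscp and restore the default priority. */
void sketch_delapp(struct sketch_dscp_map *m, uint8_t dscp)
{
	assert(dscp < SKETCH_NUM_DSCP);
	if (m->set[dscp]) {
		m->set[dscp] = false;
		m->prio[dscp] = 0;
		m->entries--;
	}
}

/* First entry switches to dscp trust; removing the last returns to pcp. */
enum sketch_trust sketch_trust_state(const struct sketch_dscp_map *m)
{
	return m->entries ? SKETCH_TRUST_DSCP : SKETCH_TRUST_PCP;
}
```

Deriving the trust state from the entry count keeps it from going stale, at the cost of recomputing it on every query.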
When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.
When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq, icosq are not modified.
===================================================
In addition to the dscp series, some small misc changes are included as well:
From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic
From Or Gerlitz, Enlarge the NIC TC offload table size
From Rabie, Initialize destination_flow struct to 0
From Feras, Add inner TTC table to IPoIB flow steering
From Tal, Enable CQE based moderation on TX CQ
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Sat, 4 Nov 2017 15:49:00 +0000 (16:49 +0100)]
nfp: implement ethtool FEC mode settings
Add support in the driver ethtool ops to modify the NFP FEC modes.
The FEC modes can be set for vNIC associated with physical ports or
for MAC representor netdevs.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Sat, 4 Nov 2017 15:48:59 +0000 (16:48 +0100)]
nfp: add helpers for FEC support
Implement helpers to determine and modify FEC modes via the NSP.
The NSP advertises FEC capabilities on a per port basis and provides
support for:
* Auto mode selection
* Reed Solomon
* BaseR
* None/Off
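One way to picture per-port FEC capabilities is as a bitmask that a requested mode is checked against. The sketch below assumes such an encoding purely for illustration; the flag values and names are hypothetical and do not reflect the NSP interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag encoding of the four modes listed above. */
enum sketch_fec_mode {
	SKETCH_FEC_AUTO		= 1u << 0,
	SKETCH_FEC_REED_SOLOMON	= 1u << 1,
	SKETCH_FEC_BASER	= 1u << 2,
	SKETCH_FEC_OFF		= 1u << 3,
};

/* A request is valid when it names exactly one advertised mode. */
bool sketch_fec_can_set(unsigned int supported, unsigned int requested)
{
	bool single_bit = requested != 0 &&
			  (requested & (requested - 1)) == 0;

	return single_bit && (supported & requested) != 0;
}
```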
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Sat, 4 Nov 2017 15:48:58 +0000 (16:48 +0100)]
nfp: add get/set link settings ndos to representors
Since it is now safe to modify link settings for representors, we can
attach the get/set link settings ndos to it. The get/set link settings
are nfp_port based operations.
If a port becomes invalid, the representor will be removed in the same
way a vnic would be.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Sat, 4 Nov 2017 15:48:57 +0000 (16:48 +0100)]
nfp: resync repr state when port table sync
If the NSP port table has been refreshed, resync the representor state
with the new port information. At the moment, this only entails looking
for invalid ports and killing off representors associated with them.
The repr instance becomes NULL which is safe since the app accessor
function for reprs returns NULL when it cannot access a repr.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Sat, 4 Nov 2017 15:48:56 +0000 (16:48 +0100)]
nfp: refactor nfp_app_reprs_set
The criteria that reprs cannot be replaced with another new set of reprs
has been removed. This check is not needed since the only use case that
could exercise this at the moment, would be to modify the number of
SRIOV VFs without first disabling them. This case is explicitly
disallowed in any case and subsequent patches in this series
need to be able to replace the running set of reprs.
All cases where the return code used to be checked for the
nfp_app_reprs_set function have been removed.
As stated above, it is not possible for the current code to encounter a
case where reprs exist and need to be replaced.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 4 Nov 2017 15:48:55 +0000 (16:48 +0100)]
nfp: make use of MAC reinit
Recent management FW images can perform full reinit of MAC cores
without requiring a reboot. When loading the driver check if there
are changes pending and if so call NSP MAC reinit. Full application
FW reload is still required, and all MACs need to be reinited at the
same time (not only the ones which have been reconfigured, and thus
potentially causing disruption to unrelated netdevs) therefore for
now changing MAC config without reloading the driver still remains
future work.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 4 Nov 2017 15:48:54 +0000 (16:48 +0100)]
nfp: don't depend on compiler constant propagation
Matthias reports:
nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to
identify the 'mask' parameter as known to be constant at compile time,
which is required to use the FIELD_GET() macro.
The forced inlining does the trick for gcc, but for kernel builds with
clang it results in undefined symbols:
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.o: In function
`__nfp_eth_set_aneg':
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x787):
undefined reference to `__compiletime_assert_492'
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x7b1):
undefined reference to `__compiletime_assert_496'
These __compiletime_assert_xyx() calls would have been optimized away if
the compiler had seen 'mask' as a constant.
Add a macro to extract the mask and shift and pass those to
nfp_eth_set_bit_config() separately.
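The idea can be sketched in user-space C: a wrapper macro sees the literal mask and derives the shift from it, so the helper itself no longer depends on inlining or constant propagation. The names below are hypothetical, the macro assumes a non-zero mask, and __builtin_ctzll is a GCC/Clang builtin used here in place of the kernel's bitfield helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Out-of-line helper: mask and shift arrive as ordinary arguments. */
void sketch_set_bit_config(uint64_t *reg, uint64_t mask,
			   unsigned int shift, uint64_t val)
{
	assert(reg != NULL);
	*reg = (*reg & ~mask) | ((val << shift) & mask);
}

/*
 * The macro expands at the call site, where the mask is a literal, so the
 * shift is computed from a compile-time constant. The mask must be non-zero.
 */
#define SKETCH_SET_BIT_CONFIG(reg, mask, val) \
	sketch_set_bit_config((reg), (mask), \
			      (unsigned int)__builtin_ctzll(mask), (val))
```

For example, writing 0xA into the field selected by mask 0x0F00 of a register holding 0xFFFF leaves 0xFAFF.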
Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Priyaranjan Jha [Fri, 3 Nov 2017 23:38:48 +0000 (16:38 -0700)]
tcp: higher throughput under reordering with adaptive RACK reordering wnd
Currently TCP RACK loss detection does not work well if packets are
being reordered beyond its static reordering window (min_rtt/4). Under
such reordering it may falsely trigger loss recoveries and reduce TCP
throughput significantly.
This patch improves that by increasing and reducing the reordering
window based on DSACK, which is now supported in major TCP implementations.
It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries.
- If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded
by srtt), since there is possibility that spurious retransmission was
due to reordering delay longer than reo_wnd.
- Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16)
no. of successful recoveries (accounts for full DSACK-based loss
recovery undo). After that, reset it to default (min_rtt/4).
- reo_wnd is incremented at most once per rtt, so that the new
DSACK we are reacting to is due to a spurious retx (approx)
that happened after reo_wnd was last updated.
- reo_wnd is tracked in terms of steps (of min_rtt/4), rather than
absolute value to account for change in rtt.
In our internal testing, we observed significant increase in throughput,
in scenarios where reordering exceeds min_rtt/4 (previous static value).
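The rules above can be modelled in a few lines of C. This is a simplified sketch under the stated rules, not the kernel code: the struct, the function names, the boolean "an RTT has passed" input and the 0xFF cap on the step count are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RECOVERY_THRESH 16u	/* TCP_RACK_RECOVERY_THRESH in the text */

struct sketch_rack {
	uint32_t steps;		/* window size in units of min_rtt/4 */
	uint32_t persist;	/* recoveries left before steps is reset */
};

void sketch_rack_init(struct sketch_rack *r)
{
	r->steps = 1;
	r->persist = 0;
}

/* Grow the window by one step per DSACK, at most once per RTT. */
void sketch_rack_on_dsack(struct sketch_rack *r, bool rtt_passed_since_update)
{
	if (!rtt_passed_since_update)
		return;
	if (r->steps < 0xFF)	/* assumed cap, keeps the counter small */
		r->steps++;
	r->persist = SKETCH_RECOVERY_THRESH;
}

/* After enough successful recoveries, fall back to the default window. */
void sketch_rack_on_recovery(struct sketch_rack *r)
{
	if (r->persist > 0)
		r->persist--;
	if (r->persist == 0)
		r->steps = 1;
}

/* reo_wnd = steps * min_rtt/4, upper bounded by srtt (microseconds). */
uint32_t sketch_rack_reo_wnd_us(const struct sketch_rack *r,
				uint32_t min_rtt_us, uint32_t srtt_us)
{
	uint64_t wnd = (uint64_t)(min_rtt_us / 4) * r->steps;

	return wnd < srtt_us ? (uint32_t)wnd : srtt_us;
}
```

Tracking steps rather than an absolute value means the window rescales automatically when min_rtt changes, as the last bullet above notes.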
Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 5 Nov 2017 13:31:39 +0000 (22:31 +0900)]
Merge branch 'dsa-parsing-stage'
Vivien Didelot says:
====================
net: dsa: parsing stage
When registering a DSA switch, there are basically two stages.
The first stage is the parsing of the switch device, from either device
tree or platform data. It fetches the DSA tree to which it belongs, and
validates its ports. The switch device is then added to the tree, and
the second stage is called if this was the last switch of the tree.
The second stage is the setup of the tree, which validates that the tree
is complete, sets up the routing tables, the default CPU port for user
ports, sets up the switch drivers and finally the master interfaces,
which makes the whole switch fabric functional.
This patch series covers the first parsing stage. It fixes the type of
the switch and tree indexes to unsigned int, simplifies the tree
reference counting and the switch and CPU ports parsing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:27 +0000 (19:05 -0400)]
net: dsa: rework switch parsing
When parsing a switch, we have to identify to which tree it belongs and
parse its ports. Provide two functions to separate the OF and platform
data specific paths.
Also use the of_property_read_variable_u32_array function to parse the
OF member array instead of calling of_property_read_u32_index twice.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:25 +0000 (19:05 -0400)]
net: dsa: rework switch addition and removal
This patch removes the unnecessary index argument from the
dsa_dst_add_ds and dsa_dst_del_ds functions and renames them to
dsa_tree_add_switch and dsa_tree_remove_switch respectively.
In addition to a more explicit scope, we now check the presence of an
existing switch with the same index directly within dsa_tree_add_switch.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:24 +0000 (19:05 -0400)]
net: dsa: provide a find or new tree helper
Rename dsa_get_dst to dsa_tree_find since it doesn't increment the
reference counter, rename dsa_add_dst to dsa_tree_alloc for symmetry
with dsa_tree_free, and provide a convenient dsa_tree_touch function to
find or allocate a new tree.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:23 +0000 (19:05 -0400)]
net: dsa: get and put tree reference counting
Provide convenient dsa_tree_get and dsa_tree_put functions scoping a DSA
tree used to increment and decrement its reference counter, instead of
directly poking its kref structure.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:22 +0000 (19:05 -0400)]
net: dsa: simplify tree reference counting
DSA trees have a refcount used to automatically free the dsa_switch_tree
structure once there are no switch devices inside of it.
The refcount is incremented when a switch is added to the tree, and
decremented when it is removed from it.
But because of kref_init, the refcount is also incremented at
initialization, and when looking up the tree from the list for symmetry.
Thus the current code stores the number of switches plus one, and makes
the switch registration more complex.
To simplify the switch registration function, we reset the refcount to
zero after initialization and don't increment it when looking up a tree.
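The resulting invariant (the counter equals the number of switches in the tree) can be shown with a toy reference counter. This is a non-atomic user-space sketch with hypothetical names; it is not the kref API and ignores locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_tree {
	unsigned int refcount;	/* equals the number of switches in the tree */
};

struct sketch_tree *sketch_tree_alloc(void)
{
	/* calloc zeroes the counter: an empty tree holds no references. */
	return calloc(1, sizeof(struct sketch_tree));
}

/* Called when a switch is added to the tree. */
void sketch_tree_get(struct sketch_tree *t)
{
	t->refcount++;
}

/* Called when a switch is removed; frees the tree on the last put. */
bool sketch_tree_put(struct sketch_tree *t)
{
	assert(t->refcount > 0);
	if (--t->refcount == 0) {
		free(t);
		return true;
	}
	return false;
}
```

Because lookups do not take a reference in this scheme, the registration path only has to pair one get with one put per switch.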
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:21 +0000 (19:05 -0400)]
net: dsa: make tree index unsigned
Similarly to a DSA switch and port, rename the tree index from "tree" to
"index" and make it an unsigned int because it isn't supposed to be less
than 0.
u32 is an OF-specific data type used to retrieve the value and has no need to
be propagated up to the tree index.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:20 +0000 (19:05 -0400)]
net: dsa: make switch index unsigned
Define the DSA switch index as an unsigned int, because it will never be
less than 0.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Intiyaz Basha [Fri, 3 Nov 2017 21:32:33 +0000 (14:32 -0700)]
liquidio: do not consider packets dropped by network stack as driver Rx dropped
netdev->rx_dropped was including packets dropped by napi_gro_receive.
If a packet is dropped by network stack, it should not be counted under
driver Rx dropped.
Made necessary changes to not include network stack drops under
netdev->rx_dropped.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Fri, 3 Nov 2017 20:59:07 +0000 (13:59 -0700)]
tools: bpftool: move p_err() and p_info() from main.h to common.c
The two functions were declared as static inline in a header file. There
is no particular reason why they should be inlined, they just happened to
remain in the same header file when they were turned from macros to
functions in a previous commit.
Make them non-inlined functions and move them to common.c file instead.
Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
bpf: add offload as a first class citizen
This series is my stab at what was discussed at a recent IOvisor
bi-weekly call. The idea is to make the device translator run at
the program load time. This makes the offload more explicit to
the user space. It also makes it easy for the device translator
to insert information into the original verifier log.
v2:
- include linux/bug.h instead of asm/bug.h;
- rebased on top of Craig's verifier fix (no changes, the last patch
just removes more code now). I checked the set doesn't conflict
with Jiri's, Josef's or Roman's patches, but missed Craig's fix :(
v1:
- rename the ifindex member on load;
- improve commit messages;
- split nfp patches more.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:30 +0000 (13:56 -0700)]
bpf: remove old offload/analyzer
Thanks to the ability to load a program for a specific device,
running verifier twice is no longer needed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:29 +0000 (13:56 -0700)]
nfp: bpf: move to new BPF program offload infrastructure
Following steps are taken in the driver to offload an XDP program:
XDP_SETUP_PROG:
* prepare:
- allocate program state;
- run verifier (bpf_analyzer());
- run translation;
* load:
- stop old program if needed;
- load program;
- enable BPF if not enabled;
* clean up:
- free program image.
With new infrastructure the flow will look like this:
BPF_OFFLOAD_VERIFIER_PREP:
- allocate program state;
BPF_OFFLOAD_TRANSLATE:
- run translation;
XDP_SETUP_PROG:
- stop old program if needed;
- load program;
- enable BPF if not enabled;
BPF_OFFLOAD_DESTROY:
- free program image.
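The new flow amounts to a driver callback that dispatches on the requested stage. The sketch below models that with a hypothetical enum and state struct; the ordering checks are an assumption added for illustration and the real driver callbacks are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stage names mirroring the four steps of the new flow. */
enum sketch_stage {
	SKETCH_VERIFIER_PREP,
	SKETCH_TRANSLATE,
	SKETCH_SETUP_PROG,
	SKETCH_DESTROY,
};

struct sketch_prog {
	bool state_allocated;
	bool translated;
	bool loaded;
};

/* Returns 0 on success, -1 when a stage is invoked out of order. */
int sketch_offload(struct sketch_prog *p, enum sketch_stage stage)
{
	assert(p != NULL);

	switch (stage) {
	case SKETCH_VERIFIER_PREP:	/* allocate program state */
		p->state_allocated = true;
		return 0;
	case SKETCH_TRANSLATE:		/* run translation */
		if (!p->state_allocated)
			return -1;
		p->translated = true;
		return 0;
	case SKETCH_SETUP_PROG:		/* load program, enable BPF */
		if (!p->translated)
			return -1;
		p->loaded = true;
		return 0;
	case SKETCH_DESTROY:		/* model only: forget all state */
		*p = (struct sketch_prog){ 0 };
		return 0;
	}
	return -1;
}
```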
Take advantage of the new infrastructure. Allocation of driver
metadata has to be moved from jit.c to offload.c since it's now
done at a different stage. Since there is no separate driver
private data for verification step, move temporary nfp_meta
pointer into nfp_prog. We will now use user space context
offsets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:28 +0000 (13:56 -0700)]
nfp: bpf: move translation prepare to offload.c
struct nfp_prog is currently only used internally by the translator.
This means there is a lot of parameter passing going on, between
the translator and different stages of offload. Simplify things
by allocating nfp_prog in offload.c already.
We will now use kmalloc() to allocate the program area and only
DMA map it for the time of loading (instead of allocating DMA
coherent memory upfront).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:27 +0000 (13:56 -0700)]
nfp: bpf: move program prepare and free into offload.c
Most of offload/translation prepare logic will be moved to
offload.c. To help git generate more reasonable diffs
move nfp_prog_prepare() and nfp_prog_free() functions
there as a first step.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:26 +0000 (13:56 -0700)]
nfp: bpf: require seamless reload for program replace
Firmware has supported live replacement of programs for quite some
time now. Remove the software-fallback related logic and
depend on the FW for program replace. Seamless reload will
become a requirement if maps are present, anyway.
Load and start stages have to be split now, since replace
only needs a load; start has already been done on add.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:25 +0000 (13:56 -0700)]
nfp: bpf: refactor offload logic
We currently create a fake cls_bpf offload object when we want
to offload XDP. Simplify and clarify the code by moving the
TC/XDP specific logic out of common offload code. This is easy
now that we don't support legacy TC actions. We only need the
bpf program and state of the skip_sw flag.
Temporarily set @code to NULL in nfp_net_bpf_offload(), since compilers
seem to have trouble recognizing it's always initialized. Next
patches will eliminate that variable.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:24 +0000 (13:56 -0700)]
nfp: bpf: remove unnecessary include of nfp_net.h
BPF offload's main header does not need to include nfp_net.h.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:23 +0000 (13:56 -0700)]
nfp: bpf: remove the register renumbering leftovers
The register renumbering was removed and will not be coming back
in its old, naive form, given that it would be fundamentally
incompatible with calling functions. Remove the leftovers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:22 +0000 (13:56 -0700)]
nfp: bpf: drop support for cls_bpf with legacy actions
Only support BPF_PROG_TYPE_SCHED_CLS programs in direct
action mode. This simplifies preparing the offload since
there will now be only one mode of operation for that type
of program. We need to know the attachment mode type of
cls_bpf programs, because exit codes are interpreted
differently for legacy vs DA mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
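To illustrate why the attachment mode matters, here is a rough sketch of how the same program return value reads differently in the two modes. It assumes the usual cls_bpf convention (legacy mode: 0 means no match, -1 means match with the default classid, anything else overrides the classid; direct action mode: the value is a TC_ACT_* verdict). The enum and function names are hypothetical.

```c
#include <assert.h>

enum cls_mode { MODE_LEGACY, MODE_DIRECT_ACTION };

enum outcome {
	OUT_NO_MATCH,         /* legacy: try the next filter */
	OUT_DEFAULT_CLASSID,  /* legacy: match, keep configured classid */
	OUT_OVERRIDE_CLASSID, /* legacy: match, code is the new classid */
	OUT_VERDICT,          /* direct action: code is a TC_ACT_* value */
};

/* Classify what a program's exit code means in the given mode. */
enum outcome interpret_exit_code(enum cls_mode mode, int code)
{
	if (mode == MODE_DIRECT_ACTION)
		return OUT_VERDICT;
	if (code == 0)
		return OUT_NO_MATCH;
	if (code == -1)
		return OUT_DEFAULT_CLASSID;
	return OUT_OVERRIDE_CLASSID;
}
```

An exit code of 0, for example, is "no match" in legacy mode but a verdict in direct action mode, so a translator that does not know the mode cannot pick the right behaviour.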
Jakub Kicinski [Fri, 3 Nov 2017 20:56:21 +0000 (13:56 -0700)]
cls_bpf: allow attaching programs loaded for specific device
If TC program is loaded with skip_sw flag, we should allow
the device-specific programs to be accepted.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:20 +0000 (13:56 -0700)]
xdp: allow attaching programs loaded for specific device
Pass the netdev pointer to bpf_prog_get_type(). This way
BPF code can decide whether the device matches what the
code was loaded/translated for.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
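The device check described above might look roughly like the following. This is a sketch under assumptions: struct netdev_stub, struct prog_stub and prog_may_attach() are hypothetical user-space stand-ins, and the kernel's actual helper and its signature are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device. */
struct netdev_stub {
	int ifindex;
};

/* Hypothetical stand-in for a loaded program. */
struct prog_stub {
	/* Device the program was loaded for, NULL if not device-bound. */
	const struct netdev_stub *offload_dev;
};

/*
 * Decide whether a program may be attached at an attachment point.
 * dev is the device the caller attaches to, or NULL when the attachment
 * point has no device to offer.
 */
bool prog_may_attach(const struct prog_stub *prog,
		     const struct netdev_stub *dev)
{
	assert(prog);

	if (!prog->offload_dev)
		return true;              /* ordinary host program */
	return prog->offload_dev == dev;  /* must match the bound device */
}
```

A device-bound program is refused both when no device is passed and when a different device is passed.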
Jakub Kicinski [Fri, 3 Nov 2017 20:56:19 +0000 (13:56 -0700)]
bpftool: print program device bound info
If program is bound to a device, print the name of the relevant
interface or unknown if the netdev has since been removed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
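A minimal sketch of the "name or unknown" behaviour, using the standard if_indextoname() call. The function name format_dev_bound() and the exact output string are assumptions made for illustration; they are not claimed to match bpftool's code.

```c
#include <assert.h>
#include <net/if.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the device a program is bound to. Falls back to "unknown"
 * when the ifindex no longer resolves to an interface name.
 * Returns the snprintf() result.
 */
int format_dev_bound(char *out, size_t len, unsigned int ifindex)
{
	char name[IF_NAMESIZE];

	assert(out && len > 0);

	if (if_indextoname(ifindex, name))
		return snprintf(out, len, "dev %s", name);
	return snprintf(out, len, "dev unknown");
}
```

Calling it with an ifindex that does not exist (0 is never a valid interface index) yields "dev unknown".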
Jakub Kicinski [Fri, 3 Nov 2017 20:56:18 +0000 (13:56 -0700)]
bpf: report offload info to user space
Extend struct bpf_prog_info to contain information about program
being bound to a device. Since the netdev may get destroyed while
program still exists we need a flag to indicate the program is
loaded for a device, even if the device is gone.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:17 +0000 (13:56 -0700)]
bpf: offload: add infrastructure for loading programs for a specific netdev
The fact that we don't know which device the program is going
to be used on is quite limiting in current eBPF infrastructure.
We have to reverse or limit the changes which kernel makes to
the loaded bytecode if we want it to be offloaded to a networking
device. We also have to invent new APIs for debugging and
troubleshooting support.
Make it possible to load programs for a specific netdev. This
helps us to bring the debug information closer to the core
eBPF infrastructure (e.g. we will be able to reuse the verifier
log in device JIT). It allows device JITs to perform translation
on the original bytecode.
__bpf_prog_get() when called to get a reference for an attachment
point will now refuse to give it if program has a device assigned.
Following patches will add a version of that function which passes
the expected netdev in. @type argument in __bpf_prog_get() is
renamed to attach_type to make it clearer that it's only set on
attachment.
All calls to ndo_bpf are protected by rtnl, only verifier callbacks
are not. We need a wait queue to make sure netdev doesn't get
destroyed while verifier is still running and calling its driver.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:16 +0000 (13:56 -0700)]
net: bpf: rename ndo_xdp to ndo_bpf
ndo_xdp is a control path callback for setting up XDP in the
driver. We can reuse it for other forms of communication
between the eBPF stack and the drivers. Rename the callback
and associated structures and definitions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Nov 2017 13:18:59 +0000 (06:18 -0700)]
tcp: do not clear again skb->csum in tcp_init_nondata_skb()
tcp_init_nondata_skb() is fed with freshly allocated skbs.
They already have a cleared csum field, no need to clear it again.
This is based on Neal's review on commit 3b11775033dc ("tcp: do not mangle
skb->cb[] in tcp_make_synack()"), noticing I did not clear skb->csum.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We can now build this driver on ARM, so I ran into a randconfig build
warning that presumably had existed on powerpc already.
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c: In function 'sg_fd_to_skb':
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:1712:18: error: 'skb' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I'm slightly changing the logic here, to make it obvious to the
compiler that 'skb' is always initialized.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 5 Nov 2017 12:49:17 +0000 (21:49 +0900)]
Merge branch 'openvswitch-netns'
Flavio Leitner says:
====================
Allow openvswitch to query ports in another netns.
Today Open vSwitch users are moving internal ports to other namespaces and
although packets are flowing OK, the userspace daemon can't find out basic
information like if the port is UP or DOWN, for instance.
This patchset extends openvswitch API to retrieve the current netnsid of
a port. It will be used by the userspace daemon to find out in which netns
the port is located.
This patchset also extends the rtnetlink getlink call to accept and operate
on a given netnsid. More details are available in each patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 2 Nov 2017 19:04:38 +0000 (17:04 -0200)]
rtnetlink: use netnsid to query interface
Currently, when an application gets netnsid from the kernel (for example as
the result of an RTM_GETLINK call on one end of a veth pair), it's not very
useful. There's no reliable way to get to the netns fd from the netnsid, nor
does any kernel API accept netnsid.
Extend the RTM_GETLINK call to also accept netnsid. It will operate on the
netns with the given netnsid in such case. Of course, the calling process
needs to have enough capabilities in the target name space; for now, require
CAP_NET_ADMIN. This can be relaxed in the future.
To signal to the calling process that the kernel understood the new
IFLA_IF_NETNSID attribute in the query, it will include it in the response.
This is needed to detect older kernels, as they will just ignore
IFLA_IF_NETNSID and query in the current name space.
This patch implements IFLA_IF_NETNSID only for get and dump. For set
operations, this can be extended later.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
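The detection scheme described above (check whether the reply echoes the attribute back) boils down to walking the reply's attributes. Below is a simplified sketch: struct attr_hdr only models the length/type header and 4-byte alignment of netlink attributes, and reply_has_attr() is a hypothetical helper, so real code would use the rtnetlink headers and macros instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header: total length (header included) and type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

/* Attributes are padded to a 4-byte boundary. */
#define ATTR_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Return true if an attribute of type 'wanted' appears in the buffer. */
bool reply_has_attr(const unsigned char *buf, size_t buflen, uint16_t wanted)
{
	size_t off = 0;

	assert(buf);

	while (buflen - off >= sizeof(struct attr_hdr)) {
		struct attr_hdr h;
		size_t step;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > buflen - off)
			return false;     /* malformed attribute */
		if (h.type == wanted)
			return true;
		step = ATTR_ALIGN(h.len);
		if (step > buflen - off)
			break;            /* padding runs past the buffer */
		off += step;
	}
	return false;
}
```

A caller would send the query with the netnsid attribute set and treat a reply without that attribute as coming from an older kernel that answered for the current name space.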
Jiri Benc [Thu, 2 Nov 2017 19:04:37 +0000 (17:04 -0200)]
openvswitch: reliable interface identification in port dumps
This patch allows reliable identification of netdevice interfaces connected
to openvswitch bridges. In particular, user space queries the netdev
interfaces belonging to the ports for statistics, up/down state, etc.
Datapath dump needs to provide enough information for the user space to be
able to do that.
Currently, only interface names are returned. This is not sufficient, as
openvswitch allows its ports to be in different name spaces and the
interface name is valid only in its name space. What is needed, and generally
used in other netlink APIs, is the ifindex+netnsid pair.
The solution is to add the ifindex+netnsid pair (or only ifindex if in
the same name space) to the vport get/dump operation.
On request side, ideally the ifindex+netnsid pair could be used to
get/set/del the corresponding vport. This is not implemented by this patch
and can be added later if needed.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tal Gilboa [Tue, 26 Sep 2017 13:20:43 +0000 (16:20 +0300)]
net/mlx5e: Enable CQE based moderation on TX CQ
By using CQE based moderation on TX CQ we can reduce the TX
interrupt rate. Besides the benefit of fewer interrupts, this also
allows the kernel to better utilize TSO. Since TSO has some CPU overhead,
it might not aggregate when CPU is under high stress. By reducing the
interrupt rate and the CPU utilization, we can get better aggregation
and better overall throughput.
The feature is enabled by default and has a private flag in ethtool
for control.
Or Gerlitz [Thu, 12 Jan 2017 14:19:29 +0000 (16:19 +0200)]
net/mlx5: Enlarge the NIC TC offload table size
The NIC TC offload table size was hard coded to 1k. Change it to be
min(max NIC RX table size,
min(max flow counters, 64k) * num flow groups)
where the max values are read from the firmware and the number of
flow groups is hard-coded as before this change.
We don't know upfront the division of flows to groups (== different masks).
This setup allows each group to be of size up to where we want to go
(when supported, all offloaded flows use counters). Thus, we don't expect
multiple occurrences for a group which in turn would add steering hops.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
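The sizing formula above can be written out as a small helper. This is a sketch, assuming "64k" means 65536; the function and parameter names are hypothetical and the firmware queries that supply the maxima are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two values. */
static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * min(max NIC RX table size,
 *     min(max flow counters, 64k) * num flow groups)
 */
uint64_t tc_table_size(uint64_t max_rx_table_size,
		       uint64_t max_flow_counters,
		       uint64_t num_flow_groups)
{
	uint64_t per_group = min_u64(max_flow_counters, 64 * 1024);

	return min_u64(max_rx_table_size, per_group * num_flow_groups);
}
```

With a 16M-entry RX table limit, 1M flow counters and 4 groups, the counter term is capped at 64k per group, giving 262144 entries.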
net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ
If the port is in DSCP trust state, packets are placed in the right
priority queue based on the dscp value. This is done by selecting
the transmit queue based on the dscp of the skb.
Until now, select_queue honored priority only from the vlan header.
However, that is not sufficient in cases where the port trust state is DSCP
mode, as a packet might not even contain a vlan header. Therefore if the port
is in dscp trust state and vport's min inline mode is not NONE,
copy the IP header to the eseg's inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features, such
as xdpsq and icosq, is not modified.
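As a rough illustration of DSCP-based selection: the DSCP is the upper six bits of the IPv4 TOS byte, and a 64-entry table maps it to a priority. The helper names and the table layout below are assumptions for the sketch; how the driver turns the priority into an actual transmit queue index is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define DSCP_MAX 64 /* DSCP is a 6-bit field */

/* The DSCP occupies the upper six bits of the IPv4 TOS byte. */
uint8_t ipv4_tos_to_dscp(uint8_t tos)
{
	return (uint8_t)(tos >> 2);
}

/* Look up the priority for a packet from its TOS byte. */
uint8_t dscp_to_priority(const uint8_t dscp2prio[DSCP_MAX], uint8_t tos)
{
	uint8_t dscp = ipv4_tos_to_dscp(tos);

	assert(dscp < DSCP_MAX);
	return dscp2prio[dscp];
}
```

For example, a TOS byte of 0xb8 carries DSCP 46; the two low bits (ECN) do not affect the lookup.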
This patch implements dcbnl hooks to set and delete DSCP to priority map
as defined by the DCB subsystem. Device maintains internal trust state
which needs to be set to DSCP state for performing DSCP to priority mapping.
When the first dscp to priority APP entry is added by the user, the
trust state is changed to dscp.
When the last dscp to priority APP entry is deleted by the user, the
trust state is changed to pcp.
If the user sends multiple dscp to priority APP entries for the same dscp,
the last one sent will take effect. All previously sent entries will be
deleted.
The dscp to priority APP entries are added and deleted in the net/dcb
APP database using dcb_ieee_setapp/getapp.
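The trust state rules above (first entry switches to dscp, last deletion switches back to pcp, last write for a dscp wins) can be sketched as follows. struct dscp_app_table and both functions are hypothetical; the real code goes through the dcbnl hooks and the net/dcb APP database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSCP_MAX 64

enum trust_state { TRUST_PCP = 0, TRUST_DSCP = 1 };

/* Hypothetical in-memory model of the dscp to priority APP entries. */
struct dscp_app_table {
	bool present[DSCP_MAX];
	uint8_t prio[DSCP_MAX];
	unsigned int count;      /* number of dscp values with an entry */
	enum trust_state trust;
};

/* Add or replace the entry for a dscp; the last one sent wins. */
void dscp_app_add(struct dscp_app_table *t, uint8_t dscp, uint8_t prio)
{
	assert(t && dscp < DSCP_MAX);

	if (!t->present[dscp]) {
		t->present[dscp] = true;
		t->count++;
	}
	t->prio[dscp] = prio;
	t->trust = TRUST_DSCP;   /* at least one entry exists now */
}

/* Delete the entry for a dscp; fall back to pcp when none remain. */
void dscp_app_del(struct dscp_app_table *t, uint8_t dscp)
{
	assert(t && dscp < DSCP_MAX);

	if (!t->present[dscp])
		return;
	t->present[dscp] = false;
	t->prio[dscp] = 0;
	if (--t->count == 0)
		t->trust = TRUST_PCP;
}
```

Starting from a zeroed table (pcp trust), adding an entry moves the table to dscp trust, and removing the only entry moves it back.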
net/mlx5: QPTS and QPDPM register firmware command support
The QPTS register allows changing the priority trust state between pcp and
dscp. Add support to get/set trust state from device. When the port is
in pcp/dscp trust state, packet is routed by hardware to matching priority
based on its pcp/dscp value respectively.
The QPDPM register allows changing the dscp to priority mapping. Add support
to get/set dscp to priority mapping from device.
Note that to change a dscp mapping, the "e" bit of this dscp structure
must be set in the QPDPM firmware command.
Eric Dumazet [Sat, 4 Nov 2017 15:27:14 +0000 (08:27 -0700)]
pktgen: do not abuse IN6_ADDR_HSIZE
pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an
IPv6 address.
Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old
bug is hitting us.
Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
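The class of bug described above is using an unrelated constant that merely happened to equal the address size. A user-space sketch of the safer pattern, deriving the bound from the type itself, might look like this. It is not the actual pktgen change; ipv6_addr_is_zero() is a hypothetical helper.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Check whether an IPv6 address is all zeros. The loop bound comes from
 * the address type, so it cannot drift if some other constant changes.
 */
bool ipv6_addr_is_zero(const struct in6_addr *addr)
{
	size_t i;

	assert(addr);

	for (i = 0; i < sizeof(addr->s6_addr); i++)
		if (addr->s6_addr[i])
			return false;
	return true;
}
```

An IPv6 address is 16 bytes, so a bound of 16 would work by coincidence, while a bound of 256 would read far past the end of the address.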
Colin Ian King [Fri, 3 Nov 2017 08:09:45 +0000 (08:09 +0000)]
net: sched: cls_u32: use bitwise & rather than logical && on n->flags
Currently n->flags is being operated on by a logical && operator rather
than a bitwise & operator. This looks incorrect as these should be bit
flag operations. Fix this.
Detected by CoverityScan, CID#1460398 ("Logical vs. bitwise operator")
Fixes: 245dc5121a9b ("net: sched: cls_u32: call block callbacks for offload") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
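The difference between the two operators is easy to show in isolation. In this sketch the flag names and values are hypothetical, chosen only to make the two results diverge; the functions are not the cls_u32 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags. */
#define FLAG_A 0x1u
#define FLAG_B 0x2u

/* Buggy: logical AND is true whenever both operands are nonzero. */
bool has_flag_logical(unsigned int flags, unsigned int flag)
{
	return flags && flag;
}

/* Correct: bitwise AND tests whether the specific bit is set. */
bool has_flag_bitwise(unsigned int flags, unsigned int flag)
{
	return (flags & flag) != 0;
}
```

With flags equal to FLAG_B, testing for FLAG_A returns true in the logical version even though the bit is clear, while the bitwise version returns false.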