Ping-Ke Shih [Fri, 17 Nov 2023 02:40:28 +0000 (10:40 +0800)]
wifi: rtw89: 8922a: read efuse content via efuse map struct from logic map
Define efuse map struct of RTW89_EFUSE_BLOCK_RF block, and read needed
data from efuse logic map into driver. Also, with efuse power-on state,
read MAC address via register interface according to HCI interface.
Ping-Ke Shih [Fri, 17 Nov 2023 02:40:27 +0000 (10:40 +0800)]
wifi: rtw89: 8852c: read RX gain offset from efuse for 6GHz channels
Read calibration values of RX gain offset from efuse, and set them to
registers to normalize RX gain for all hardware modules. Then, PHY dynamic
mechanism can get expected values to adjust hardware parameters to yield
expected performance.
Ping-Ke Shih [Fri, 17 Nov 2023 02:40:26 +0000 (10:40 +0800)]
wifi: rtw89: mac: add to access efuse for WiFi 7 chips
MAC address, hardware type, calibration values and etc are stored in efuse,
so we read them at probe stage and use them as capabilities to register
hardware.
There are two physical efuse -- one is the main efuse for digital hardware
part, and the other is for analog part. Because they are very similar, we
only describe the main efuse below.
The main efuse is split into two regions -- one is for logic map, and the
other is for physical map. For both regions, we use the same method to read
data, but need additional parser to get logic map. To allow reading
operation, we need to convert power state to active, and turn to idle state
after reading.
For WiFi 7 chips, we introduce efuse blocks to define feature group easier,
and these blocks are discontinue. For example, RF block is from 0x1_0000 ~
0x1_0240, and the next block PCIE_SDIO is starting from 0x2_0000.
Comparing to old one used by WiFi 6 chips, there is only single one logic
map, it would be a little hard to add an new field to a group if we don't
reserve a room in advance.
The relationship between efuse, region and block is shown as below:
The parser converting from raw data to logic map is to decode block page,
block page offset, and word_en bits. Each word_en bit indicates two
following bytes as data of logic map, so total four word_en bits can
represent eight bytes. Thus, block page offset is 8-byte alignment.
The layout of a tuple is shown as below
Ping-Ke Shih [Fri, 17 Nov 2023 02:40:24 +0000 (10:40 +0800)]
wifi: rtw89: 8922a: add 8922A basic chip info
8922A is a 802.11be chip that can support 2/5/6 GHz bands 160MHz bandwidth.
Introduce the basic info such as firmware file name, some hardware address
and size, supported spatial stream, TX descriptor and so on, and then
we can add more attributes by later patches.
Zong-Zhe Yang [Tue, 14 Nov 2023 09:13:58 +0000 (17:13 +0800)]
wifi: rtw89: regd: handle policy of 6 GHz according to BIOS
According to BIOS configuration of Realtek ACPI DSM function 4,
RTW89_ACPI_DSM_FUNC_6G_BP, we handle the regd policy of 6 GHz.
Policy defines two modes as below.
1. `BLOCK` mode:
The countries in configured list are blocked.
2. `ALLOW` mode:
_Only_ the countries in configured list are allowed.
(i.e. others are all blocked.)
Then, when receiving regulatory notification at runtime, if 6 GHz
is blocked on the country, 6 GHz channels will be disabled.
Zong-Zhe Yang [Tue, 14 Nov 2023 09:13:57 +0000 (17:13 +0800)]
wifi: rtw89: acpi: process 6 GHz band policy from DSM
Realtek ACPI DSM func 4, RTW89_ACPI_DSM_FUNC_6G_BP,
accepts a configuration via ACPI buffer as below.
| index | description |
-------------------------
| [0-2] | signature |
| [3] | reserved |
| [4] | policy mode |
| [5] | country count |
| [6-] | country list |
Through this function, BIOS can indicate to allow/block
6 GHz on some specific countries. Still, driver should
follow regd first before taking this configuration into
account.
Dmitry Antipov [Mon, 13 Nov 2023 14:47:30 +0000 (17:47 +0300)]
wifi: rtlwifi: simplify rtl_action_proc() and rtl_tx_agg_start()
Since 'drv_priv' is an in-place member allocated at the end of
'struct ieee80211_sta', it can't be NULL and so relevant checks
in 'rtl_action_proc()' and 'rtl_tx_agg_start()' may be dropped.
Compile tested only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Ping-Ke Shih [Fri, 10 Nov 2023 01:23:19 +0000 (09:23 +0800)]
wifi: rtw89: pci: update interrupt mitigation register for 8922AE
To reduce interrupts, configure the mitigation setting of 8922AE when
bringing interface up, and then check situations to decide turning on or
off the function. With this, interrupt count decreases to 20,141 from
202,141 in period of 20 seconds.
Ping-Ke Shih [Fri, 10 Nov 2023 01:23:18 +0000 (09:23 +0800)]
wifi: rtw89: pci: correct interrupt mitigation register for 8852CE
To reduce interrupt count, configure mitigation register with thresholds
of time and packet count. We missed that 8852CE uses different register
address, so correct it. Then, interrupt counts down to 30,763 from 229,825
during stress test in 20 seconds.
Ping-Ke Shih [Fri, 10 Nov 2023 01:23:15 +0000 (09:23 +0800)]
wifi: rtw89: pci: add pre_deinit to be called after probe complete
At probe stage, we only do partial initialization to enable ability to
download firmware and read capabilities. After that, we use this pre_deinit
to disable HCI to save power.
Zong-Zhe Yang [Fri, 10 Nov 2023 01:23:14 +0000 (09:23 +0800)]
wifi: rtw89: pci: stop/start DMA for level 1 recovery according to chip gen
Level 1 recovery is to recover TX/RX rings, so it needs PCI to stop/start
DMA. But, different chip gen have different implementations, either
register address/mask or function flow. So, configure callback of
stop/start DMA by chip gen.
Zong-Zhe Yang [Fri, 10 Nov 2023 01:23:13 +0000 (09:23 +0800)]
wifi: rtw89: pci: reset BDRAM according to chip gen
Configure callback of reset BDRAM (buffer descriptor RAM) by chip gen.
Refine the one of 802.11ax chip gen and drop a redundant duplicate of it
in 802.11ax chip gen. Then, assign right callback of rst_bdram for HCI ops
which needs to do callback according to chip gen.
Shiji Yang [Thu, 9 Nov 2023 04:38:51 +0000 (12:38 +0800)]
wifi: rt2x00: correct wrong BBP register in RxDCOC calibration
Refer to Mediatek vendor driver RxDCOC_Calibration() function, when
performing gainfreeze calibration, we should write register 140
instead of 141. This fix can reduce the total calibration time from
6 seconds to 1 second.
Shiji Yang [Sat, 4 Nov 2023 08:58:00 +0000 (16:58 +0800)]
wifi: rt2x00: restart beacon queue when hardware reset
When a hardware reset is triggered, all registers are reset, so all
queues are forced to stop in hardware interface. However, mac80211
will not automatically stop the queue. If we don't manually stop the
beacon queue, the queue will be deadlocked and unable to start again.
This patch fixes the issue where Apple devices cannot connect to the
AP after calling ieee80211_restart_hw().
Shiji Yang [Sat, 4 Nov 2023 08:57:59 +0000 (16:57 +0800)]
wifi: rt2x00: disable RTS threshold for rt2800 by default
rt2800 has a lot of registers to control the RTS enable/disable
status for different rates. And the driver control them via
rt2800_set_rts_threshold(). When RTS was disabled in user
interface, this function won't be called at all. This means that
the RTS is still 'on' for CCK and OFDM rates. So we'd better to
disable them by default because it should be like this. The RTS
for HT20 and HT40 is already default off so we don't need to
touch them. If we toggle the RTS status, these register bits
will be enable/disable again by rt2800_set_rts_threshold().
Shiji Yang [Sat, 4 Nov 2023 08:57:58 +0000 (16:57 +0800)]
wifi: rt2x00: introduce DMA busy check watchdog for rt2800
When I tried to fix the watchdog of rt2800, I found that sometimes
the watchdog can not reset the hung device. This is because the
queue is not completely stuck, it just becomes very slow. The MTK
vendor driver for the new chip MT7603/MT7612 has a DMA busy watchdog
to detect device hangs by checking DMA busy status. This watchdog
implementation is something similar to it. To reduce unnecessary
reset, we can check the INT_SOURCE_CSR register together as I found
that when the radio hung, the RX/TX coherent interrupt will always
stuck at triggered state.
The 'watchdog' module parameter has been extended to control all
watchdogs(0=disabled, 1=hang watchdog, 2=DMA watchdog, 3=both). This
new watchdog function is a slight schedule and it won't affect the
transmission speed. So we can turn on it by default. Due to the
INT_SOURCE_CSR register is invalid on rt2800 USB NICs, the DMA busy
watchdog will be automatically disabled for them.
Chih-Kang Chang [Fri, 3 Nov 2023 02:08:51 +0000 (10:08 +0800)]
wifi: rtw88: fix RX filter in FIF_ALLMULTI flag
The broadcast packets will be filtered in the FIF_ALLMULTI flag in
the original code, which causes beacon packets to be filtered out
and disconnection. Therefore, we fix it.
Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver") Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231103020851.102238-1-pkshih@realtek.com
Dmitry Antipov [Thu, 2 Nov 2023 11:55:55 +0000 (14:55 +0300)]
wifi: rtw88: simplify __rtw_tx_work()
Since 'ieee80211_txq_get_depth()' allows NULL for 2nd and
3rd arguments, simplify '__rtw_tx_work()' by dropping unused
'byte_cnt'. Compile tested only.
Ping-Ke Shih [Thu, 2 Nov 2023 00:37:16 +0000 (08:37 +0800)]
wifi: rtw89: coex: use struct assignment to replace memcpy() to append TDMA content
To notify firmware TDMA timeslot assignment, append TDMA parameters when
sending policy H2C firmware command. However, compiler warns we do memcpy()
data to val[] field of TLV struct. To avoid this, assign the struct value
with simple '=' instead. Compile tested only.
rtw89/coex.c: In function '_append_tdma':
drivers/net/wireless/realtek/rtw89/coex.c:1585:17:
warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
1585 | memcpy(&v3->tdma, &dm->tdma, sizeof(v3->tdma));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/net/wireless/realtek/rtw89/coex.h:8,
from drivers/net/wireless/realtek/rtw89/coex.c:5:
drivers/net/wireless/realtek/rtw89/core.h:2703:37:
note: at offset [5714, 71249] into destination object 'ver' of size 8
2703 | const struct rtw89_btc_ver *ver;
| ^~~
drivers/net/wireless/realtek/rtw89/coex.c:1579:17:
warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
1579 | memcpy(v, &dm->tdma, sizeof(*v));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/wireless/realtek/rtw89/core.h:2703:37:
note: at offset [5710, 71245] into destination object 'ver' of size 8
2703 | const struct rtw89_btc_ver *ver;
| ^~~
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310301908.Wrj0diqe-lkp@intel.com/ Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231102003716.25815-1-pkshih@realtek.com
Ping-Ke Shih [Wed, 1 Nov 2023 07:21:47 +0000 (15:21 +0800)]
wifi: rtw89: pci: implement PCI mac_pre_init for WiFi 7 chips
Call this function when doing MAC initialization at probe stage. It does
partial initial registers only, because we only need basic ability to
download firmware. The function to clear index is the sub-function,
so set its pointer as well.
Ping-Ke Shih [Wed, 1 Nov 2023 07:21:46 +0000 (15:21 +0800)]
wifi: rtw89: pci: use gen_def pointer to configure mac_{pre,post}_init and clear PCI ring index
Use gen_def pointer to call three WiFi 6 specific functions, and add _ax
suffix to them. Then, we will implement functions for WiFi 7 chips later.
The mac_{pre,post}_init are used to initialize PCI during doing MAC
initialization, and clear PCI ring index to make index consistent between
driver, firmware and hardware.
Dmitry Antipov [Tue, 31 Oct 2023 17:13:24 +0000 (20:13 +0300)]
wifi: wilc1000: simplify wilc_scan()
Simplify 'wilc_scan()' assuming 'struct wilc_priv *' is the only data
passed to '(*scan_result)()' callback and thus avoid typeless 'void *'
pointers in related code, including 'struct wilc_user_scan_req'.
Compile tested only.
Dmitry Antipov [Tue, 31 Oct 2023 17:13:23 +0000 (20:13 +0300)]
wifi: wilc1000: cleanup struct wilc_conn_info
Remove set but otherwise unused 'ch' member of 'struct wilc_conn_info'
and avoid typeless 'void *' pointers in '(*conn_result)()' callback.
Likewise for 'wilc_parse_join_bss_param()'. Compile tested only.
Arnd Bergmann [Mon, 23 Oct 2023 13:19:51 +0000 (15:19 +0200)]
wifi: remove orphaned rndis_wlan driver
Wireless RNDIS USB is a new-style CFG80211 driver for 802.11b and
802.11g USB hardware from around 2004 to 2006. This makes it more modern
than any of the others, but Kalle already classified it as "legacy"
in commit 298e50ad8eb8f ("wifi: move raycs, wl3501 and rndis_wlan to
legacy directory").
Jussi Kivilinna worked on this driver between 2008 and 2012, and it has
only seen cosmetic updates after that.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@kernel.org>
Arnd Bergmann [Mon, 23 Oct 2023 13:19:49 +0000 (15:19 +0200)]
wifi: remove orphaned ray_cs driver
Aviator/Raytheon is an early PCMCIA driver, apparently predating 802.11b
and only supporting wireless extensions.
The driver has been orphaned since 2010 and only seen cosmetic updates
long before than. Jean Tourrilhes pointed out in a 2005 changelog that
he tested a change on actual hardware, which was apparently already
noteworthy back then.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@kernel.org>
Arnd Bergmann [Mon, 23 Oct 2023 13:19:48 +0000 (15:19 +0200)]
wifi: remove orphaned orinoco driver
Orinoco is a PIO-only ISA/PCMCIA 802.11b device with extra bus interface
connections for PCI/Cardbus/mini-PCI and a few pre-2002 Apple PowerMac
variants. It supports both wireless extensions and CFG80211, but I could
not tell if it requires using both.
This device used to be one of the most common ones 20 years ago, but
has been orphaned for most of the time since then, and the conversion
to cfg80211 has stalled in 2010.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@kernel.org>
Arnd Bergmann [Mon, 23 Oct 2023 13:19:46 +0000 (15:19 +0200)]
wifi: remove obsolete hostap driver
HostAP is an ISA/PCMCIA style 802.11b driver supporting only
wireless extensions, and some custom ioctls (already removed).
Some devices include a legacy PCI bridge but no DMA.
The driver was marked obsolete in 2016 and is highly unlikely
to still have any users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@kernel.org>
Arnd Bergmann [Mon, 23 Oct 2023 13:19:43 +0000 (15:19 +0200)]
wifi: libertas: drop 16-bit PCMCIA support
With all the other PCMCIA WLAN adapters gone from the kernel, this is now
the last remaining device with this interface, but as far as I can tell,
all the actual libertas users were actually using either SDIO or USB.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@kernel.org>
Ping-Ke Shih [Fri, 27 Oct 2023 01:50:59 +0000 (09:50 +0800)]
wifi: rtw89: extend PHY status parser to support WiFi 7 chips
PHY status IEs is used to have more information about received packets,
such as RSSI and EVM. For each PHY IE type, it has different predefined
PHY IE length, and the length are changed, so add them for WiFi 7 chips
accordingly.
Ping-Ke Shih [Fri, 27 Oct 2023 01:50:58 +0000 (09:50 +0800)]
wifi: rtw89: consider RX info for WiFi 7 chips
For WiFi 7 chips, some fields and predefined length are changed, so
add them accordingly.
The mac_id field is used to identify peer that send a packet, and we can
use it to know RSSI and traffic from the peer. For WiFi 7 chips,
RXWD.mac_id of PPDU status packet is not set by hardware. Instead, we
should fill it by rxinfo_user[].mac_id of PPDU status content.
Also, filter out invalid reports to prevent warning messages keep showing.
Justin Stitt [Thu, 26 Oct 2023 23:19:18 +0000 (23:19 +0000)]
wifi: airo: replace deprecated strncpy with strscpy_pad
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
`extra` is clearly supposed to be NUL-terminated which is evident by the
manual NUL-byte assignment as well as its immediate usage with strlen().
Moreover, let's NUL-pad since there is deliberate effort (48 instances)
made elsewhere to zero-out buffers in these getters and setters:
6050 | memset(local->config.nodeName, 0, sizeof(local->config.nodeName));
6130 | memset(local->config.rates, 0, 8);
6139 | memset(local->config.rates, 0, 8);
6414 | memset(key.key, 0, MAX_KEY_SIZE);
6497 | memset(extra, 0, 16);
(to be clear, strncpy also NUL-padded -- we are matching that behavior)
Considering the above, a suitable replacement is `strscpy_pad` due to
the fact that it guarantees both NUL-termination and NUL-padding on the
destination buffer.
We can also replace the hard-coded size of "16" to IW_ESSID_MAX_SIZE
because this function is a wext handler.
Dmitry Antipov [Thu, 26 Oct 2023 14:10:11 +0000 (17:10 +0300)]
wifi: wilc1000: simplify remain on channel support
For 'struct wilc_remain_ch', drop set but otherwise unused 'duration'
field and adjust 'expired' callback assuming that the only data passed
to it is 'struct wilc_vif *', thus making 'wilc_remain_on_channel()'
a bit simpler as well. Compile tested only.
Ping-Ke Shih [Thu, 26 Oct 2023 12:00:48 +0000 (20:00 +0800)]
wifi: rtw89: pci: add new RX ring design to determine full RX ring efficiently
To make hardware efficient to determine if RX ring is full, introduce new
design that checks if reading and writing indices are equal. Comparing
to old design, initial indices of both reading and writing indices are 0
that means empty, and hardware checks full by "writing index + 1 ==
reading index". The "+1" has extra cost for hardware, so new design is
to avoid this.
Take ring size is 256 as an example, the initial reading and writing
indices are 255 and 0 respectively; the initial values mean empty. If two
indices are the same, for example 5 and 5, it means ring is full.
wp rp used_cnt state
255 0 0 initial (ring is empty)
255 1 1 receive 1st packet
255 2 2 receive 2nd packet
0 2 1 driver read 1st packet
1 2 0 driver read 2nd packet (ring is empty)
:
5 5 255 ring is full
Ping-Ke Shih [Thu, 26 Oct 2023 12:00:47 +0000 (20:00 +0800)]
wifi: rtw89: pci: define PCI ring address for WiFi 7 chips
PCI rings are used to DMA TX/RX packets. The address of WiFi 7 chips are
different from previous ones, so add them according to hardware design.
Another difference is that driver doesn't need to configure BD (buffer
descriptor) RAM table, which is used by hardware to fetch BD ahead before
fetching whole TX data.
A TX ring contains numbers of TX BD (e.g. 512):
TX BD (buffer descriptor; continual memory)
+---+---+---+---+ +---+
| | | | | ... | |
+-|-+---+---+---+ +---+
|
| point to TX WD (WiFi descriptor; metadata of a skb data)
v
+------+
| |
| |
+-|----+
|
| point to a skb data
v
+------+
| |
| |
+------+
Ping-Ke Shih [Thu, 26 Oct 2023 12:00:46 +0000 (20:00 +0800)]
wifi: rtw89: 8922ae: add 8922AE PCI entry and basic info
8922AE is a PCIE 802.11be wireless adapter with PID 0x8922. We add basic
configurations including PCI DMA mode, PCI parameters, register address to
control TX/RX rings and etc.
Dmitry Antipov [Tue, 24 Oct 2023 14:31:33 +0000 (17:31 +0300)]
wifi: rtw89: fix timeout calculation in rtw89_roc_end()
Since 'rtw89_core_tx_kick_off_and_wait()' assumes timeout
(actually RTW89_ROC_TX_TIMEOUT) in milliseconds, I suppose
that RTW89_ROC_IDLE_TIMEOUT is in milliseconds as well. If
so, 'msecs_to_jiffies()' should be used in a call to
'ieee80211_queue_delayed_work()' from 'rtw89_roc_end()'.
Compile tested only.
Dmitry Antipov [Mon, 23 Oct 2023 09:17:06 +0000 (12:17 +0300)]
wifi: rtlwifi: rtl92ee_dm_dynamic_primary_cca_check(): fix typo in function name
For rtl8192ee, change 'rtl92ee_dm_dynamic_primary_cca_ckeck()'
to 'rtl92ee_dm_dynamic_primary_cca_check()' and so adjust
'rtl92ee_dm_watchdog()' as well. Compile tested only.
Dmitry Antipov [Mon, 23 Oct 2023 09:17:05 +0000 (12:17 +0300)]
wifi: rtlwifi: cleanup struct rtl_phy
Remove unused and read by otherwise unused 'h2c_box_num',
'rfpienable', 'reserve_0', 'reserve_1', 'iqk_in_progress',
'apk_done' and 'hw_rof_enable' fields of 'struct rtl_phy',
adjust related code. Compile tested only.
Dmitry Antipov [Mon, 23 Oct 2023 09:17:04 +0000 (12:17 +0300)]
wifi: rtlwifi: cleanup struct rtl_hal
Remove unused and set but otherwise unused 'bbrf_ready', 'external_pa',
'pa_mode', 'rx_tag', 'rts_en', 'wow_enable' and 'wow_enabled' fields
of 'struct rtl_hal', adjust related code. Compile tested only.
Justin Stitt [Tue, 17 Oct 2023 20:11:29 +0000 (20:11 +0000)]
wifi: brcmsmac: replace deprecated strncpy with memcpy
Let's move away from using strncpy and instead use the more obvious
interface for this context.
For wlc->pub->srom_ccode, we're just copying two bytes from ccode into
wlc->pub->srom_ccode with no expectation that srom_ccode be
NUL-terminated:
wlc->pub->srom_ccode is only used in regulatory_hint():
1193 | if (wl->pub->srom_ccode[0] &&
1194 | regulatory_hint(wl->wiphy, wl->pub->srom_ccode))
1195 | wiphy_err(wl->wiphy, "%s: regulatory hint failed\n", __func__);
We can see that only index 0 and index 1 are accessed.
3307 | int regulatory_hint(struct wiphy *wiphy, const char *alpha2)
3308 | {
... | ...
3322 | request->alpha2[0] = alpha2[0];
3323 | request->alpha2[1] = alpha2[1];
... | ...
3332 | }
Since this is just a simple byte copy with correct lengths, let's use
memcpy(). There should be no functional change.
In a similar boat, both wlc->country_default and
wlc->autocountry_default are just simple byte copies so let's use
memcpy. However, FWICT they aren't used anywhere. (they should be
used or removed -- not in scope of my patch, though).
Justin Stitt [Tue, 17 Oct 2023 20:11:28 +0000 (20:11 +0000)]
wifi: brcm80211: replace deprecated strncpy with strscpy
Let's move away from using strncpy and instead favor a less ambiguous
and more robust interface.
For ifp->ndev->name, we expect ifp->ndev->name to be NUL-terminated based
on its use in format strings within core.c:
67 | char *brcmf_ifname(struct brcmf_if *ifp)
68 | {
69 | if (!ifp)
70 | return "<if_null>";
71 |
72 | if (ifp->ndev)
73 | return ifp->ndev->name;
74 |
75 | return "<if_none>";
76 | }
...
288 | static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
289 | struct net_device *ndev) {
...
330 | brcmf_dbg(INFO, "%s: insufficient headroom (%d)\n",
331 | brcmf_ifname(ifp), head_delta);
...
336 | bphy_err(drvr, "%s: failed to expand headroom\n",
337 | brcmf_ifname(ifp));
For di->name, we expect di->name to be NUL-terminated based on its usage
with format strings:
| brcms_dbg_dma(di->core,
| "%s: DMA64 tx doesn't have AE set\n",
| di->name);
Looking at its allocation we can see that it is already zero-allocated
which means NUL-padding is not required:
| di = kzalloc(sizeof(struct dma_info), GFP_ATOMIC);
For wlc->modulecb[i].name, we expect each name in wlc->modulecb to be
NUL-terminated based on their usage with strcmp():
| if (!strcmp(wlc->modulecb[i].name, name) &&
NUL-padding is not required as wlc is zero-allocated in:
brcms_c_attach_malloc() ->
| wlc = kzalloc(sizeof(struct brcms_c_info), GFP_ATOMIC);
For all these cases, a suitable replacement is `strscpy` due to the fact
that it guarantees NUL-termination on the destination buffer without
unnecessarily NUL-padding.
====================
Intel Wired LAN Driver Updates for 2023-10-25 (ice)
This series extends the ice driver with basic support for the E830 device
line. It does not include support for all device features, but enables basic
functionality to load and pass traffic.
Alice adds the 200G speed and PHY types supported by E830 hardware.
Dan extends the DDP package logic to support the E830 package segment.
Paul adds the basic registers and macros used by E830 hardware, and adds
support for handling variable length link status information from firmware.
Pawel removes some redundant zeroing of the PCI IDs list, and extends the
list to include the E830 device IDs.
====================
Pawel Chmielewski [Wed, 25 Oct 2023 21:41:57 +0000 (14:41 -0700)]
ice: Hook up 4 E830 devices by adding their IDs
As the previous patches provide support for E830 hardware, add E830
specific IDs to the PCI device ID table, so these devices can now be
probed by the kernel.
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025214157.1222758-7-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Nowlin [Wed, 25 Oct 2023 21:41:55 +0000 (14:41 -0700)]
ice: Add support for E830 DDP package segment
Add support for E830 DDP package segment. For the E830 package,
signature buffers will not be included inline in the configuration
buffers. Instead, the signature buffers will be located in a
signature segment.
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Co-developed-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025214157.1222758-5-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Greenwalt [Wed, 25 Oct 2023 21:41:54 +0000 (14:41 -0700)]
ice: Add ice_get_link_status_datalen
The Get Link Status data length can vary with different versions of
ice_aqc_get_link_status_data. Add ice_get_link_status_datalen() to return
datalen for the specific ice_aqc_get_link_status_data version.
Add new link partner fields to ice_aqc_get_link_status_data; PHY type,
FEC, and flow control.
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025214157.1222758-4-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alice Michael [Wed, 25 Oct 2023 21:41:53 +0000 (14:41 -0700)]
ice: Add 200G speed/phy type use
Add the support for 200G phy speeds and the mapping for their
advertisement in link. Add the new PHY type bits for AQ command, as
needed for 200G E830 controllers.
Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025214157.1222758-3-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Greenwalt [Wed, 25 Oct 2023 21:41:52 +0000 (14:41 -0700)]
ice: Add E830 device IDs, MAC type and registers
E830 is the 200G NIC family which uses the ice driver.
Add specific E830 registers. Embed macros to use proper register based on
(hw)->mac_type & name those macros to [ORIGINAL]_BY_MAC(hw). Registers
only available on one of the macs will need to be explicitly referred to
as E800_NAME instead of just NAME. PTP is not yet supported.
Co-developed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Co-developed-by: Scott Taylor <scott.w.taylor@intel.com> Signed-off-by: Scott Taylor <scott.w.taylor@intel.com> Co-developed-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025214157.1222758-2-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We've added 51 non-merge commits during the last 10 day(s) which contain
a total of 75 files changed, 5037 insertions(+), 200 deletions(-).
The main changes are:
1) Add open-coded task, css_task and css iterator support.
One of the use cases is customizable OOM victim selection via BPF,
from Chuyi Zhou.
2) Fix BPF verifier's iterator convergence logic to use exact states
comparison for convergence checks, from Eduard Zingerman,
Andrii Nakryiko and Alexei Starovoitov.
3) Add BPF programmable net device where bpf_mprog defines the logic
of its xmit routine. It can operate in L3 and L2 mode,
from Daniel Borkmann and Nikolay Aleksandrov.
4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking
for global per-CPU allocator, from Hou Tao.
5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section
was going to be present whenever a binary has SHT_GNU_versym section,
from Andrii Nakryiko.
6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into
atomic_set_release(), from Paul E. McKenney.
7) Add a warning if NAPI callback missed xdp_do_flush() under
CONFIG_DEBUG_NET which helps checking if drivers were missing
the former, from Sebastian Andrzej Siewior.
8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing
a warning under sleepable programs, from Yafang Shao.
9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before
checking map_locked, from Song Liu.
10) Make BPF CI linked_list failure test more robust,
from Kumar Kartikeya Dwivedi.
11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik.
12) Fix xsk starving when multiple xsk sockets were associated with
a single xsk_buff_pool, from Albert Huang.
13) Clarify the signed modulo implementation for the BPF ISA standardization
document that it uses truncated division, from Dave Thaler.
14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider
signed bounds knowledge, from Andrii Nakryiko.
15) Add an option to XDP selftests to use multi-buffer AF_XDP
xdp_hw_metadata and mark used XDP programs as capable to use frags,
from Larysa Zaremba.
16) Fix bpftool's BTF dumper wrt printing a pointer value and another
one to fix struct_ops dump in an array, from Manu Bretelle.
* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits)
netkit: Remove explicit active/peer ptr initialization
selftests/bpf: Fix selftests broken by mitigations=off
samples/bpf: Allow building with custom bpftool
samples/bpf: Fix passing LDFLAGS to libbpf
samples/bpf: Allow building with custom CFLAGS/LDFLAGS
bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free
selftests/bpf: Add selftests for netkit
selftests/bpf: Add netlink helper library
bpftool: Extend net dump with netkit progs
bpftool: Implement link show support for netkit
libbpf: Add link-based API for netkit
tools: Sync if_link uapi header
netkit, bpf: Add bpf programmable net device
bpf: Improve JEQ/JNE branch taken logic
bpf: Fold smp_mb__before_atomic() into atomic_set_release()
bpf: Fix unnecessary -EBUSY from htab_lock_bucket
xsk: Avoid starving the xsk further down the list
bpf: print full verifier states on infinite loop detection
selftests/bpf: test if state loops are detected in a tricky case
bpf: correct loop detection for iterators convergence
...
====================
Alexey Makhalov [Wed, 25 Oct 2023 23:19:31 +0000 (16:19 -0700)]
MAINTAINERS: Maintainer change for ptp_vmw driver
Deep has decided to transfer the maintainership of the VMware virtual
PTP clock driver (ptp_vmw) to Jeff. Update the MAINTAINERS file to
reflect this change.
Michael Chan [Thu, 26 Oct 2023 01:32:31 +0000 (18:32 -0700)]
bnxt_en: Fix 2 stray ethtool -S counters
The recent firmware interface change has added 2 counters in struct
rx_port_stats_ext. This caused 2 stray ethtool counters to be
displayed.
Since new counters are added from time to time, fix it so that the
ethtool logic will only display up to the maximum known counters.
These 2 counters are not used by production firmware yet.
Fixes: 754fbf604ff6 ("bnxt_en: Update firmware interface to 1.10.2.171") Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231026013231.53271-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 25 Oct 2023 18:27:39 +0000 (11:27 -0700)]
tools: ynl-gen: respect attr-cnt-name at the attr set level
Davide reports that we look for the attr-cnt-name in the wrong
object. We try to read it from the family, but the schema only
allows for it to exist at attr-set level.
Jakub Kicinski [Wed, 25 Oct 2023 16:22:53 +0000 (09:22 -0700)]
netlink: specs: support conditional operations
Page pool code is compiled conditionally, but the operations
are part of the shared netlink family. We can handle this
by reporting empty list of pools or -EOPNOTSUPP / -ENOSYS
but the cleanest way seems to be removing the ops completely
at compilation time. That way user can see that the page
pool ops are not present using genetlink introspection.
Same way they'd check if the kernel is "new enough" to
support the ops.
Extend the specs with the ability to specify the config
condition under which op (and its policies, etc.) should
be hidden.
Jakub Kicinski [Wed, 25 Oct 2023 16:22:04 +0000 (09:22 -0700)]
netlink: make range pointers in policies const
struct nla_policy is usually constant itself, but unless
we make the ranges inside constant we won't be able to
make range structs const. The ranges are not modified
by the core.
Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231025162204.132528-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs")
Linus Torvalds [Thu, 26 Oct 2023 17:41:27 +0000 (07:41 -1000)]
Merge tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from WiFi and netfilter.
Most regressions addressed here come from quite old versions, with the
exceptions of the iavf one and the WiFi fixes. No known outstanding
reports or investigation.
Fixes to fixes:
- eth: iavf: in iavf_down, disable queues when removing the driver
Previous releases - regressions:
- sched: act_ct: additional checks for outdated flows
- tcp: do not leave an empty skb in write queue
- tcp: fix wrong RTO timeout when received SACK reneging
- wifi: cfg80211: pass correct pointer to rdev_inform_bss()
- eth: i40e: sync next_to_clean and next_to_process for programming
status desc
- eth: iavf: initialize waitqueues before starting watchdog_task
Previous releases - always broken:
- eth: r8169: fix data-races
- eth: igb: fix potential memory leak in igb_add_ethtool_nfc_entry
- eth: r8152: avoid writing garbage to the adapter's registers
- eth: gtp: fix fragmentation needed check with gso"
* tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
iavf: in iavf_down, disable queues when removing the driver
vsock/virtio: initialize the_virtio_vsock before using VQs
net: ipv6: fix typo in comments
net: ipv4: fix typo in comments
net/sched: act_ct: additional checks for outdated flows
netfilter: flowtable: GC pushes back packets to classic path
i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR
gtp: fix fragmentation needed check with gso
gtp: uapi: fix GTPA_MAX
Fix NULL pointer dereference in cn_filter()
sfc: cleanup and reduce netlink error messages
net/handshake: fix file ref count in handshake_nl_accept_doit()
wifi: mac80211: don't drop all unprotected public action frames
wifi: cfg80211: fix assoc response warning on failed links
wifi: cfg80211: pass correct pointer to rdev_inform_bss()
isdn: mISDN: hfcsusb: Spelling fix in comment
tcp: fix wrong RTO timeout when received SACK reneging
r8152: Block future register access if register access fails
r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE
r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en()
...
Yafang Shao [Wed, 25 Oct 2023 03:11:44 +0000 (03:11 +0000)]
selftests/bpf: Fix selftests broken by mitigations=off
When we configure the kernel command line with 'mitigations=off' and set
the sysctl knob 'kernel.unprivileged_bpf_disabled' to 0, the commit bc5bc309db45 ("bpf: Inherit system settings for CPU security mitigations")
causes issues in the execution of `test_progs -t verifier`. This is
because 'mitigations=off' bypasses Spectre v1 and Spectre v4 protections.
Currently, when a program requests to run in unprivileged mode
(kernel.unprivileged_bpf_disabled = 0), the BPF verifier may prevent
it from running due to the following conditions not being enabled:
While 'mitigations=off' enables the first two conditions, it does not
enable the latter two. As a result, some test cases in
'test_progs -t verifier' that were expected to fail to run may run
successfully, while others still fail but with different error messages.
This makes it challenging to address them comprehensively.
Moreover, in the future, we may introduce more fine-grained control over
CPU mitigations, such as enabling only bypass_spec_v1 or bypass_spec_v4.
Given the complexity of the situation, rather than fixing each broken test
case individually, it's preferable to skip them when 'mitigations=off' is
in effect and introduce specific test cases for the new 'mitigations=off'
scenario. For instance, we can introduce new BTF declaration tags like
'__failure__nospec', '__failure_nospecv1' and '__failure_nospecv4'.
In this patch, the approach is to simply skip the broken test cases when
'mitigations=off' is enabled. The result of `test_progs -t verifier` as
follows after this commit,
Viktor Malik [Wed, 25 Oct 2023 06:19:14 +0000 (08:19 +0200)]
samples/bpf: Allow building with custom bpftool
samples/bpf build its own bpftool boostrap to generate vmlinux.h as well
as some BPF objects. This is a redundant step if bpftool has been
already built, so update samples/bpf/Makefile such that it accepts a
path to bpftool passed via the BPFTOOL variable. The approach is
practically the same as tools/testing/selftests/bpf/Makefile uses.
Viktor Malik [Wed, 25 Oct 2023 06:19:13 +0000 (08:19 +0200)]
samples/bpf: Fix passing LDFLAGS to libbpf
samples/bpf/Makefile passes LDFLAGS=$(TPROGS_LDFLAGS) to libbpf build
without surrounding quotes, which may cause compilation errors when
passing custom TPROGS_USER_LDFLAGS.
Viktor Malik [Wed, 25 Oct 2023 06:19:12 +0000 (08:19 +0200)]
samples/bpf: Allow building with custom CFLAGS/LDFLAGS
Currently, it is not possible to specify custom flags when building
samples/bpf. The flags are defined in TPROGS_CFLAGS/TPROGS_LDFLAGS
variables, however, when trying to override those from the make command,
compilation fails.
For example, when trying to build with PIE:
$ make -C samples/bpf TPROGS_CFLAGS="-fpie" TPROGS_LDFLAGS="-pie"
This is because samples/bpf/Makefile updates these variables, especially
appends include paths to TPROGS_CFLAGS and these updates are overridden
by setting the variables from the make command.
This patch introduces variables TPROGS_USER_CFLAGS/TPROGS_USER_LDFLAGS
for this purpose, which can be set from the make command and their
values are propagated to TPROGS_CFLAGS/TPROGS_LDFLAGS.
Beniamino Galvani [Wed, 25 Oct 2023 09:44:41 +0000 (11:44 +0200)]
bareudp: use ports to lookup route
The source and destination ports should be taken into account when
determining the route destination; they can affect the result, for
example in case there are routing rules defined.
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231025094441.417464-1-b.galvani@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 26 Oct 2023 10:20:35 +0000 (12:20 +0200)]
Merge tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next. Mostly
nf_tables updates with two patches for connlabel and br_netfilter.
1) Rename function name to perform on-demand GC for rbtree elements,
and replace async GC in rbtree by sync GC. Patches from Florian Westphal.
2) Use commit_mutex for NFT_MSG_GETRULE_RESET to ensure that two
concurrent threads invoking this command do not underrun stateful
objects. Patches from Phil Sutter.
3) Use single hook to deal with IP and ARP packets in br_netfilter.
Patch from Florian Westphal.
4) Use atomic_t in netns->connlabel use counter instead of using a
spinlock, also patch from Florian.
5) Cleanups for stateful objects infrastructure in nf_tables.
Patches from Phil Sutter.
6) Flush path uses opaque set element offered by the iterator, instead of
calling pipapo_deactivate() which looks up for it again.
7) Set backend .flush interface always succeeds, make it return void
instead.
8) Add struct nft_elem_priv placeholder structure and use it by replacing
void * to pass opaque set element representation from backend to frontend
which defeats compiler type checks.
9) Shrink memory consumption of set element transactions, by reducing
struct nft_trans_elem object size and reducing stack memory usage.
10) Use struct nft_elem_priv also for set backend .insert operation too.
11) Carry reset flag in nft_set_dump_ctx structure, instead of passing it
as a function argument, from Phil Sutter.
netfilter pull request 23-10-25
* tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx
netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST
netfilter: nf_tables: shrink memory consumption of set elements
netfilter: nf_tables: expose opaque set element as struct nft_elem_priv
netfilter: nf_tables: set backend .flush always succeeds
netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flush
netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx
netfilter: nf_tables: nft_obj_filter fits into cb->ctx
netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx
netfilter: nf_tables: A better name for nft_obj_filter
netfilter: nf_tables: Unconditionally allocate nft_obj_filter
netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj
netfilter: conntrack: switch connlabels to atomic_t
br_netfilter: use single forward hook for ip and arp
netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests
netfilter: nf_tables: Introduce nf_tables_getrule_single()
netfilter: nf_tables: Open-code audit log call in nf_tables_getrule()
netfilter: nft_set_rbtree: prefer sync gc to async worker
netfilter: nft_set_rbtree: rename gc deactivate+erase function
====================
Alex Henrie [Tue, 24 Oct 2023 21:23:08 +0000 (15:23 -0600)]
net: ipv6/addrconf: clamp preferred_lft to the minimum required
If the preferred lifetime was less than the minimum required lifetime,
ipv6_create_tempaddr would error out without creating any new address.
On my machine and network, this error happened immediately with the
preferred lifetime set to 1 second, after a few minutes with the
preferred lifetime set to 4 seconds, and not at all with the preferred
lifetime set to 5 seconds. During my investigation, I found a Stack
Exchange post from another person who seems to have had the same
problem: They stopped getting new addresses if they lowered the
preferred lifetime below 3 seconds, and they didn't really know why.
The preferred lifetime is a preference, not a hard requirement. The
kernel does not strictly forbid new connections on a deprecated address,
nor does it guarantee that the address will be disposed of the instant
its total valid lifetime expires. So rather than disable IPv6 privacy
extensions altogether if the minimum required lifetime swells above the
preferred lifetime, it is more in keeping with the user's intent to
increase the temporary address's lifetime to the minimum necessary for
the current network conditions.
With these fixes, setting the preferred lifetime to 3 or 4 seconds "just
works" because the extra fraction of a second is practically
unnoticeable. It's even possible to reduce the time before deprecation
to 1 or 2 seconds by also disabling duplicate address detection (setting
/proc/sys/net/ipv6/conf/*/dad_transmits to 0). I realize that that is a
pretty niche use case, but I know at least one person who would gladly
sacrifice performance and convenience to be sure that they are getting
the maximum possible level of privacy.
Alex Henrie [Tue, 24 Oct 2023 21:23:07 +0000 (15:23 -0600)]
net: ipv6/addrconf: clamp preferred_lft to the maximum allowed
Without this patch, there is nothing to stop the preferred lifetime of a
temporary address from being greater than its valid lifetime. If that
was the case, the valid lifetime was effectively ignored.
====================
ipv6: avoid atomic fragment on GSO output
When the ipv6 stack output a GSO packet, if its gso_size is larger than
dst MTU, then all segments would be fragmented. However, it is possible
for a GSO packet to have a trailing segment with smaller actual size
than both gso_size as well as the MTU, which leads to an "atomic
fragment". Atomic fragments are considered harmful in RFC-8021. An
Existing report from APNIC also shows that atomic fragments are more
likely to be dropped even it is equivalent to a no-op [1].
The series contains following changes:
* drop feature RTAX_FEATURE_ALLFRAG, which has been broken. This helps
simplifying other changes in this set.
* refactor __ip6_finish_output code to separate GSO and non-GSO packet
processing, mirroring IPv4 side logic.
* avoid generating atomic fragment on GSO packets.
Yan Zhai [Tue, 24 Oct 2023 14:26:40 +0000 (07:26 -0700)]
ipv6: avoid atomic fragment on GSO packets
When the ipv6 stack output a GSO packet, if its gso_size is larger than
dst MTU, then all segments would be fragmented. However, it is possible
for a GSO packet to have a trailing segment with smaller actual size
than both gso_size as well as the MTU, which leads to an "atomic
fragment". Atomic fragments are considered harmful in RFC-8021. An
Existing report from APNIC also shows that atomic fragments are more
likely to be dropped even it is equivalent to a no-op [1].
Add an extra check in the GSO slow output path. For each segment from
the original over-sized packet, if it fits with the path MTU, then avoid
generating an atomic fragment.
Yan Zhai [Tue, 24 Oct 2023 14:26:37 +0000 (07:26 -0700)]
ipv6: refactor ip6_finish_output for GSO handling
Separate GSO and non-GSO packets handling to make the logic cleaner. For
GSO packets, frag_max_size check can be omitted because it is only
useful for packets defragmented by netfilter hooks. Both local output
and GRO logic won't produce GSO packets when defragment is needed. This
also mirrors what IPv4 side code is doing.
The feature would send packets to the fragmentation path if a box
receives a PMTU value with less than 1280 byte. However, since commit 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such
message would be simply discarded. The feature flag is neither supported
in iproute2 utility. In theory one can still manipulate it with direct
netlink message, but it is not ideal because it was based on obsoleted
guidance of RFC-2460 (replaced by RFC-8200).
The feature would always test false at the moment, so remove related
code or mark them as unused.
Michal Schmidt [Wed, 25 Oct 2023 18:32:13 +0000 (11:32 -0700)]
iavf: in iavf_down, disable queues when removing the driver
In iavf_down, we're skipping the scheduling of certain operations if
the driver is being removed. However, the IAVF_FLAG_AQ_DISABLE_QUEUES
request must not be skipped in this case, because iavf_close waits
for the transition to the __IAVF_DOWN state, which happens in
iavf_virtchnl_completion after the queues are released.
Without this fix, "rmmod iavf" takes half a second per interface that's
up and prints the "Device resources not yet released" warning.
Fixes: c8de44b577eb ("iavf: do not process adminq tasks when __IAVF_IN_REMOVE_TASK is set") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025183213.874283-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 25 Oct 2023 23:02:06 +0000 (16:02 -0700)]
Merge tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This patch contains two late Netfilter's flowtable fixes for net:
1) Flowtable GC pushes back packets to classic path in every GC run,
ie. every second. This is because NF_FLOW_HW_ESTABLISHED is only
used by sched/act_ct (never set) and IPS_SEEN_REPLY might be unset
by the time the flow is offloaded (this status bit is only reliable
in the sched/act_ct datapath).
2) sched/act_ct logic to push back packets to classic path to reevaluate
if UDP flow is unidirectional only applies if IPS_HW_OFFLOAD_BIT is
set on and no hardware offload request is pending to be handled.
From Vlad Buslov.
These two patches fixes two problems that were introduced in the
previous 6.5 development cycle.
* tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
net/sched: act_ct: additional checks for outdated flows
netfilter: flowtable: GC pushes back packets to classic path
====================