Alex Elder [Fri, 26 Mar 2021 15:11:17 +0000 (10:11 -0500)]
net: ipa: move ipa_resource_type definition
Most platforms have the same set of source and destination resource
types. But some older platforms have some additional ones, and it's
possible different resources will be used in the future.
Move the definition of the ipa_resource_type enumerated type so it
is defined for each platform in its configuration data file. This
permits each to have a distinct set of resources.
Shorten the data files slightly, by putting the min and max limit
values on the same line.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 26 Mar 2021 15:11:16 +0000 (10:11 -0500)]
net: ipa: index resource limits with type
Remove the type field from the ipa_resource_src and ipa_resource_dst
structures, and instead use that value as the index into the arrays
of source and destination resources.
Change ipa_resource_config_src() and ipa_resource_config_dst() so
the resource type is passed in as an argument.
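As a rough illustration of the indexing scheme described above, the sketch below drops the per-entry type field and lets the array position identify the resource type. All names and limit values here are hypothetical and are not taken from the IPA driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource types; the first member must have value 0
 * because the values are used directly as array indices.
 */
enum ex_resource_type {
	EX_RESOURCE_SRC_PKT_CONTEXTS = 0,
	EX_RESOURCE_SRC_DESCRIPTOR_BUFF,
	EX_RESOURCE_SRC_COUNT,		/* number of source resource types */
};

struct ex_resource_limits {
	uint32_t min;
	uint32_t max;
};

/* No "type" field: the position in the array identifies the type.
 * The limit values are made up for the example.
 */
static const struct ex_resource_limits ex_src_limits[EX_RESOURCE_SRC_COUNT] = {
	[EX_RESOURCE_SRC_PKT_CONTEXTS]    = { .min = 1, .max = 63 },
	[EX_RESOURCE_SRC_DESCRIPTOR_BUFF] = { .min = 3, .max = 15 },
};

/* The type is passed in as an argument and used as the index. */
static const struct ex_resource_limits *
ex_resource_limits_get(enum ex_resource_type type)
{
	if ((size_t)type >= EX_RESOURCE_SRC_COUNT)
		return NULL;

	return &ex_src_limits[type];
}
```

Designated initializers keep each entry tied to its symbolic type even if the enumerators are later reordered.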
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 26 Mar 2021 15:11:15 +0000 (10:11 -0500)]
net: ipa: combine resource type definitions
Combine the ipa_resource_type_src and ipa_resource_type_dst
enumerated types into a single enumerated type, ipa_resource_type.
Assign value 0 to the first element for the source and destination
types, so their numeric values are preserved. Add some additional
commentary where these are defined, stating explicitly that code
assumes the first source and first destination member must have
numeric value 0.
Fix the kerneldoc comments for the ipa_gsi_endpoint_data structure.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 26 Mar 2021 15:11:14 +0000 (10:11 -0500)]
net: ipa: add some missing resource limits
Currently, the SDM845 configuration data defines resource limits for
the first two resource groups (for both source and destination
resource types). The hardware supports additional resource groups,
and we should program the resource limits for those groups as well.
Even the "unused" destination resource group (number 2) should have
non-zero limits programmed in some cases, to ensure correct operation.
Add these missing resource group limit definitions to the SDM845
configuration data.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 26 Mar 2021 15:11:13 +0000 (10:11 -0500)]
net: ipa: identify resource groups
Define a new ipa_resource_group_id enumerated type, whose members
have numeric values that match the resource group number used when
programming the hardware. Each platform supports a different number
of source and destination resource groups, so define the type
separately for each platform in its configuration data file.
Use these new symbolic values when specifying the resource group an
endpoint is associated with. And use them to index the limits
arrays for source and destination resources, making it clearer how
these values are used.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 26 Mar 2021 15:11:12 +0000 (10:11 -0500)]
net: ipa: fix bug in resource group limit programming
If the number of resource groups supported by the hardware is less
than a certain number, we return early in ipa_resource_config_src()
and ipa_resource_config_dst() (to avoid programming resource limits
for non-existent groups).
Unfortunately, these checks are off by one. Fix this problem in the
four places it occurs.
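A minimal sketch of this kind of boundary check, assuming each limit register covers a pair of groups (register n holds groups 2n and 2n + 1). The helper names are hypothetical, and this is not the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: register n holds the limits for groups 2n and 2n + 1. */
static bool ex_pair_register_needed(uint32_t group_count, uint32_t n)
{
	/* Register n is needed only if group 2n exists.  Testing
	 * "group_count >= 2 * n" instead would be off by one: with two
	 * groups (0 and 1) it would still program the register for
	 * groups 2 and 3.
	 */
	return group_count > 2 * n;
}

/* Count how many pair registers get programmed before returning early. */
static uint32_t ex_pair_registers_programmed(uint32_t group_count,
					     uint32_t reg_max)
{
	uint32_t n;

	for (n = 0; n < reg_max; n++)
		if (!ex_pair_register_needed(group_count, n))
			break;

	return n;
}
```

With two groups only the first register is programmed; a third group is what makes the second register necessary.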
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Guojia Liao [Fri, 26 Mar 2021 01:36:28 +0000 (09:36 +0800)]
net: hns3: split out hclge_tm_vport_tc_info_update()
hclge_tm_vport_tc_info_update() is bloated, so split it into
separate functions for readability and maintainability.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Fri, 26 Mar 2021 01:36:27 +0000 (09:36 +0800)]
net: hns3: split function hclge_reset_rebuild()
hclge_reset_rebuild() is a bit too long. So add a new function
hclge_update_reset_level() to improve readability.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 26 Mar 2021 01:36:23 +0000 (09:36 +0800)]
net: hns3: remove unused parameter from hclge_set_vf_vlan_common()
Parameter vf in hclge_set_vf_vlan_common() is unused now,
so remove it.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiaran Zhang [Fri, 26 Mar 2021 01:36:22 +0000 (09:36 +0800)]
net: hns3: remove redundant query in hclge_config_tm_hw_err_int()
According to the HW manual, the query operation is unnecessary
when the TM QCN error event is enabled, so remove it.
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 26 Mar 2021 01:36:21 +0000 (09:36 +0800)]
net: hns3: remove redundant blank lines
Remove some redundant blank lines.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 26 Mar 2021 01:36:20 +0000 (09:36 +0800)]
net: hns3: remove unused code of vmdq
Vmdq is not supported yet, so num_vmdq_vport is always 0. This makes
num_vport a bit confusing to use, so remove the unused vmdq code.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 26 Mar 2021 21:50:34 +0000 (14:50 -0700)]
Merge tag 'mlx5-updates-2021-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-03-24
mlx5e netdev driver updates:
1) Some cleanups from Colin, Tariq and Saeed.
2) Aya made some trivial refactoring to clean up and generalize
PTP and RQ (Receive Queue) creation and management.
Mostly code decoupling and reducing dependencies between the different
RX objects in the netdev driver.
This is a preparation series for upcoming PTP special RQ creation which
will allow coexistence of CQE compression (important performance feature,
especially in Multihost systems) and HW TS PTP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Aya Levin [Sun, 17 Jan 2021 13:25:27 +0000 (15:25 +0200)]
net/mlx5e: Cleanup PTP
Reduce the scope of mlx5e_ptp_params and move it to its C file. Remove
unneeded variables from mlx5e_ptp_open and the state bitmap from the PTP
channel. In addition, remove the channel index from the PTP channel, since
it is set to a hard-coded value; use a define instead.
Aya Levin [Sun, 7 Mar 2021 13:41:27 +0000 (15:41 +0200)]
net/mlx5e: Generalize PTP implementation
The following patches in the set add support for RX PTP. Rename the PTP
prefix (%s/port_ptp/ptp/g) so that it covers RX PTP too.
In addition, rename the indication (used in the statistics context) that
the PTP-SQ was opened: %s/port_ptp_opened/tx_ptp_opened/g. This will
simplify adding an indication that the PTP-RQ was opened.
Aya Levin [Thu, 25 Feb 2021 15:46:25 +0000 (17:46 +0200)]
net/mlx5e: Generalize direct-TIRs and direct-RQTs API
Add input parameter indicating the size of direct-TIRs/direct-RQTs array
to be created/destroyed. This allows next patches in the patch-set to
handle a single direct-TIR pointing to a direct-RQT with a single entry.
Aya Levin [Mon, 8 Feb 2021 18:56:02 +0000 (20:56 +0200)]
net/mlx5e: Generalize close RQ
Allow different flavours of RQ to use the same close flow. Add validity
checks to support different RQ types which do not necessarily initialize
all the RQ's functionality.
Aya Levin [Mon, 8 Feb 2021 16:25:56 +0000 (18:25 +0200)]
net/mlx5e: Generalize RQ activation
Support RQ activation for RQs without an ICOSQ in the main flow, like
existing trap-RQ and like PTP-RQ that will be introduced in the coming
patches in the patchset.
With this patch, remove the wrapper in traps to deactivate the trap-RQ.
Aya Levin [Sun, 7 Mar 2021 13:29:53 +0000 (15:29 +0200)]
net/mlx5e: Generalize open RQ
Unify RQ creation for different RQ types. For each RQ type add a
separate open helper which initializes the RQ-specific values and
triggers a call to the generic open RQ function. Avoid passing the
mlx5e_channel pointer to the generic open RQ as a container, since the
RQ may reside under a different type of channel.
Aya Levin [Mon, 8 Feb 2021 14:00:36 +0000 (16:00 +0200)]
net/mlx5e: Allow creating mpwqe info without channel
Change the signature of mlx5e_rq_alloc_mpwqe_info so that it receives
the NUMA node instead of a channel pointer. This allows creating
mpwqe_info in the context of different channel types.
Tariq Toukan [Wed, 10 Mar 2021 12:46:59 +0000 (14:46 +0200)]
net/mlx5e: Restrict usage of mlx5e_priv in params logic functions
Do not use generic struct mlx5e_priv as a parameter to param
functions, as it is too generic. All calculations of the channel's
param should be mainly based on struct mlx5_core_dev and
struct mlx5e_params. Additional info can be explicitly passed.
Tariq Toukan [Sun, 7 Mar 2021 13:13:23 +0000 (15:13 +0200)]
net/mlx5e: Pass q_counter indentifier as parameter to rq_param builders
Pass the q_counter identifier, instead of reading it from the mlx5e_priv
parameter.
This is a step towards removing the mlx5e_priv parameter from all
params functions and logic in the next patches of the series.
Pablo Neira Ayuso [Thu, 25 Mar 2021 21:10:16 +0000 (22:10 +0100)]
docs: nf_flowtable: fix compilation and warnings
... cannot be used in a block quote; it breaks compilation, so remove it.
Fix warnings due to missing blank line such as:
net-next/Documentation/networking/nf_flowtable.rst:142: WARNING: Block quote ends without a blank line; unexpected unindent.
Fixes: 143490cde566 ("docs: nf_flowtable: update documentation with enhancements") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 25 Mar 2021 18:08:17 +0000 (11:08 -0700)]
tcp: convert elligible sysctls to u8
Many tcp sysctls are either bools or small ints that can fit into u8.
Reducing space taken by sysctls can save a few cache line misses
when sending/receiving data while cpu caches are empty,
for example after cpu idle period.
This is hard to measure with typical network performance tests,
but after this patch, struct netns_ipv4 has shrunk
by three cache lines.
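To make the size argument concrete, here is a small sketch that compares a block of knobs stored as int with the same block stored as u8. The knob count and the 64-byte cache line are assumptions chosen for illustration; they do not reflect the real layout of struct netns_ipv4.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common but not universal. */
#define EX_CACHE_LINE_SIZE	64

/* Hypothetical count of small knobs; not the real number of sysctls. */
#define EX_KNOB_COUNT		48

struct ex_sysctls_int {
	int knob[EX_KNOB_COUNT];
};

struct ex_sysctls_u8 {
	uint8_t knob[EX_KNOB_COUNT];
};

/* Number of cache lines needed to hold "bytes" bytes, rounded up. */
static size_t ex_cache_lines(size_t bytes)
{
	return (bytes + EX_CACHE_LINE_SIZE - 1) / EX_CACHE_LINE_SIZE;
}
```

With a 4-byte int, the int variant spans three cache lines while the u8 variant fits in one, so a cold-cache lookup of several knobs touches fewer lines.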
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset adds support for multi MSI interrupts in addition to
the current single common interrupt implementation. Each MSI interrupt is
tied to a newly introduced interrupt service routine (ISR). Hence, each
interrupt will only go through the corresponding ISR.
In order to increase the efficiency, enabling multi MSI interrupt will
automatically select the interrupt mode configuration INTM=1. When INTM=1,
the TX/RX transfer complete signal will only be asserted on the
corresponding sbd_perch_tx_intr_o[] or sbd_perch_rx_intr_o[] without
asserting the signal on the common sbd_intr_o. Hence, for each TX/RX
interrupt, only the corresponding ISR will be triggered.
Every vendor might have different MSI vector assignment. So, this patchset
only includes multi-vector MSI assignment for Intel platform.
Changes:
v1 -> v2
patch 2/5
-Remove defensive check for invalid dev pointer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wong, Vee Khee [Thu, 25 Mar 2021 17:39:16 +0000 (01:39 +0800)]
net: stmmac: use interrupt mode INTM=1 for multi-MSI
For interrupt mode INTM=0, TX/RX transfer complete will trigger signal
not only on sbd_perch_[tx|rx]_intr_o (Transmit/Receive Per Channel) but
also on the sbd_intr_o (Common).
As for multi-MSI implementation, setting interrupt mode INTM=1 is more
efficient as each TX intr and RX intr (TI/RI) will be handled by TX/RX ISR
without the need of calling the common MAC ISR.
Update the TX/RX NORMAL interrupt status checking process, as the
NIS status bit is not asserted for any RI/TI events for INTM=1.
Signed-off-by: Wong, Vee Khee <vee.khee.wong@intel.com> Co-developed-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
During probe(), the driver starts by requesting allocation of
multi-vector interrupts. If that fails, it automatically falls back
to requesting allocation of a single interrupt.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Co-developed-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Now we introduce MSI interrupt service routines and hook these routines
up if stmmac_open() sees a valid irq line being requested:
stmmac_mac_interrupt() :- MAC (dev->irq), WOL (wol_irq), LPI (lpi_irq)
stmmac_safety_interrupt() :- Safety Feat Correctable Error (sfty_ce_irq)
& Uncorrectable Error (sfty_ue_irq)
stmmac_msi_intr_rx() :- For all RX MSI irq (rx_irq)
stmmac_msi_intr_tx() :- For all TX MSI irq (tx_irq)
Each IRQ will have a unique name so that we can differentiate
them easily under /proc/interrupts.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Thu, 25 Mar 2021 17:39:13 +0000 (01:39 +0800)]
net: stmmac: make stmmac_interrupt() function more friendly to MSI
Refactor stmmac_interrupt() by introducing stmmac_common_interrupt()
so that we prepare the ISR operation to be friendly to MSI later.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Thu, 25 Mar 2021 17:39:12 +0000 (01:39 +0800)]
net: stmmac: introduce DMA interrupt status masking per traffic direction
In preparation to make stmmac support multi-vector MSI, we introduce the
interrupt status masking according to RX, TX or RXTX. Default to use RXTX
inside stmmac_dma_interrupt(), so there is no run-time logic difference
now.
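The idea of masking the status by traffic direction can be sketched as follows. The direction enum, status bits and masks are hypothetical stand-ins, not the stmmac definitions.

```c
#include <stdint.h>

/* Hypothetical traffic directions for an interrupt handler. */
enum ex_dma_dir {
	EX_DMA_DIR_RX,
	EX_DMA_DIR_TX,
	EX_DMA_DIR_RXTX,
};

/* Hypothetical status bits; real hardware layouts differ. */
#define EX_STATUS_TX_DONE	(1u << 0)
#define EX_STATUS_RX_DONE	(1u << 6)
#define EX_STATUS_FATAL		(1u << 13)	/* relevant to both directions */

#define EX_STATUS_MSK_RX	(EX_STATUS_RX_DONE | EX_STATUS_FATAL)
#define EX_STATUS_MSK_TX	(EX_STATUS_TX_DONE | EX_STATUS_FATAL)

/* Keep only the status bits that the given direction should handle. */
static uint32_t ex_dma_status_mask(uint32_t status, enum ex_dma_dir dir)
{
	switch (dir) {
	case EX_DMA_DIR_RX:
		return status & EX_STATUS_MSK_RX;
	case EX_DMA_DIR_TX:
		return status & EX_STATUS_MSK_TX;
	default:
		return status;	/* RXTX: no masking, same as before */
	}
}
```

Passing the RXTX direction leaves the status untouched, which is why defaulting to it introduces no run-time difference.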
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov [Thu, 25 Mar 2021 14:52:45 +0000 (15:52 +0100)]
net: change netdev_unregister_timeout_secs min value to 1
netdev_unregister_timeout_secs=0 can lead to printing the
"waiting for dev to become free" message every jiffy.
This is too frequent and unnecessary.
Set the min value to 1 second.
Also fix the merge issue introduced by
"net: make unregister netdev warning timeout configurable":
it changed "refcnt != 1" to "refcnt".
Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout configurable") Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 26 Mar 2021 00:22:30 +0000 (17:22 -0700)]
Merge branch 'ipa-reg-versions'
Alex Elder says:
====================
net: ipa: update registers for other versions
This series updates IPA and GSI register definitions to permit more
versions of IPA hardware to be supported. Most of the updates are
informational, updating comments to indicate which IPA versions
support each register and field. But some registers are new and
others are deprecated. In a few cases register fields are laid
out differently, and in these cases the changes are a little more
substantive.
I won't claim the result is 100% correct, but it's close, and should
allow all IPA versions 3.x through 4.x to be supported by the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Thu, 25 Mar 2021 14:44:37 +0000 (09:44 -0500)]
net: ipa: expand GSI channel types
IPA v4.5 (GSI v2.5) supports a larger set of channel protocols, and
adds an additional field to hold the most-significant bits of the
protocol identifier on a channel.
Add an inline function that encodes the protocol (including the
extra bits for newer versions of IPA), and define some additional
protocols. At this point we still use only GPI protocol.
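A sketch of how a protocol identifier might be split across a low field and a separate most-significant-bits field. The field positions and widths below are assumptions for illustration only, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical field layout; the real register layout may differ. */
#define EX_PROTOCOL_SHIFT	0
#define EX_PROTOCOL_BITS	3
#define EX_PROTOCOL_MSB_SHIFT	13
#define EX_PROTOCOL_MSB_BITS	1

/* Encode a protocol identifier.  When has_msb_field is zero (older
 * hardware), only the low bits are encoded; otherwise the remaining
 * upper bits go into the separate MSB field.
 */
static uint32_t ex_protocol_encode(int has_msb_field, uint32_t protocol)
{
	uint32_t low_mask = (1u << EX_PROTOCOL_BITS) - 1;
	uint32_t msb_mask = (1u << EX_PROTOCOL_MSB_BITS) - 1;
	uint32_t val;

	val = (protocol & low_mask) << EX_PROTOCOL_SHIFT;
	if (!has_msb_field)
		return val;

	/* Encode the upper bit(s) as well */
	val |= ((protocol >> EX_PROTOCOL_BITS) & msb_mask)
			<< EX_PROTOCOL_MSB_SHIFT;

	return val;
}
```

Small protocol values encode identically on both kinds of hardware, so existing users of a low-numbered protocol are unaffected.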
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Thu, 25 Mar 2021 14:44:36 +0000 (09:44 -0500)]
net: ipa: update GSI ring size registers
Each GSI channel has a CNTXT_1 register that encodes the size of its
ring buffer. The size of the field that records that is increased
starting at IPA v4.9. Replace the use of a fixed-size field mask
with a new inline function that encodes that size value.
Similarly, the size of GSI event rings can be larger starting with
IPA v4.9, so create a function to encode that as well.
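One way to picture replacing a fixed-size field mask with an encoding function is the sketch below. The version enum and the 16-bit and 20-bit widths are assumptions, not values confirmed by the commit message.

```c
#include <stdint.h>

/* Hypothetical, ordered hardware versions. */
enum ex_ipa_version {
	EX_IPA_VERSION_4_5,
	EX_IPA_VERSION_4_7,
	EX_IPA_VERSION_4_9,
	EX_IPA_VERSION_4_11,
};

/* Encode a ring size into its register field.  The field is assumed
 * to be 16 bits wide before v4.9 and 20 bits wide from v4.9 on.
 * Bits that do not fit in the field are dropped.
 */
static uint32_t ex_ring_size_encode(enum ex_ipa_version version, uint32_t size)
{
	uint32_t bits = version < EX_IPA_VERSION_4_9 ? 16 : 20;
	uint32_t mask = (1u << bits) - 1;

	return size & mask;
}
```

Callers pass the version once and no longer need to know which mask applies.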
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Thu, 25 Mar 2021 14:44:35 +0000 (09:44 -0500)]
net: ipa: GSI register cleanup
The main purpose of this is to extend these GSI register definitions
to support additional IPA versions.
This patch makes some minor updates to "gsi_reg.h":
- Define a DB_IN_BYTES field in the channel QOS register
- Add some comments clarifying when certain fields are valid
- Add the definition of GSI_CH_DB_STOP channel command
- Add a couple of blank lines
- Move one comment and indent another
- Delete two unused register definitions at the end.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Thu, 25 Mar 2021 14:44:34 +0000 (09:44 -0500)]
net: ipa: support IPA interrupt addresses for IPA v4.7
Starting with IPA v4.7, registers related to IPA interrupts are
located at a fixed offset 0x1000 above the addresses used for
earlier versions. Define and use functions to provide the offset to
use for these registers based on IPA version.
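A version-dependent register offset of this kind could look like the following sketch. The enum, the function name and the base offset used in the usage note are hypothetical; only the 0x1000 delta comes from the description above.

```c
#include <stdint.h>

/* Hypothetical, ordered hardware versions. */
enum ex_ipa_version {
	EX_IPA_VERSION_4_2,
	EX_IPA_VERSION_4_5,
	EX_IPA_VERSION_4_7,
	EX_IPA_VERSION_4_9,
};

/* Interrupt registers sit 0x1000 higher starting with v4.7. */
#define EX_IRQ_REG_DELTA	0x1000

/* Return the offset to use for an interrupt register, given the
 * offset it had on versions before v4.7.
 */
static uint32_t ex_irq_reg_offset(enum ex_ipa_version version, uint32_t base)
{
	if (version < EX_IPA_VERSION_4_7)
		return base;

	return base + EX_IRQ_REG_DELTA;
}
```

For example, a register assumed to live at 0x3008 on v4.5 would be looked up at 0x4008 on v4.7 and later.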
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Thu, 25 Mar 2021 14:44:32 +0000 (09:44 -0500)]
net: ipa: update IPA register comments
Add and update IPA register definitions. Extend these definitions
to incorporate a fairly small number of new symbols (register
offsets and fields) to support IPA v3.0, v3.1, v3.5, v4.0, v4.1,
v4.7, v4.9, and v4.11, and have the comments reflect when they are
valid. None of the added symbols require changes elsewhere in the
code.
Update rsrc_grp_encoded() to support these other IPA versions.
Add kerneldoc comments for the IPA IRQ numbers and sequencer type.
Fix a few spots where the version check should be less restrictive
(missed by an earlier patch).
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/mediatek/mtk_ppe_debugfs.c:80:9: warning:
variable 'count' set but not used [-Wunused-but-set-variable]
80 | int i, count;
| ^~~~~
This variable is not used in the function, so remove it to fix the
warning.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Qiheng Lin <linqiheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 26 Mar 2021 00:07:58 +0000 (17:07 -0700)]
Merge branch 'gve-cleanups'
Daode Huang says:
====================
net: gve: make cleanup for gve
This patch set replaces the deprecated strlcpy with strscpy and removes
the repeated word "allowed" in the gve driver.
For more details, please refer to each patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daode Huang [Thu, 25 Mar 2021 07:56:32 +0000 (15:56 +0800)]
net: gve: remove duplicated allowed
Fix the "Possible repeated word: 'allowed'" WARNING.
Signed-off-by: Daode Huang <huangdaode@huawei.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daode Huang <huangdaode@huawei.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wan Jiabing [Thu, 25 Mar 2021 06:35:55 +0000 (14:35 +0800)]
drivers: net: ethernet: struct sk_buff is declared duplicately
struct sk_buff has been declared. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Yongjun [Thu, 25 Mar 2021 03:29:32 +0000 (11:29 +0800)]
net: bcmgenet: remove unused including <linux/version.h>
Remove the unneeded include of <linux/version.h>.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Yongjun [Thu, 25 Mar 2021 03:29:28 +0000 (11:29 +0800)]
qede: remove unused including <linux/version.h>
Remove the unneeded include of <linux/version.h>.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Yongjun [Thu, 25 Mar 2021 02:51:08 +0000 (10:51 +0800)]
net: usb: lan78xx: remove unused including <linux/version.h>
Remove the unneeded include of <linux/version.h>.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Mar 2021 23:46:53 +0000 (16:46 -0700)]
Merge branch 'ethtool-FEC'
Jakub Kicinski says:
====================
ethtool: clarify the ethtool FEC interface
Our FEC configuration interface is one of the more confusing.
It also lacks any error checking in the core. This certainly
shows in the varying implementations across the drivers.
Improve the documentation and add most basic checks. Sadly, it's
probably too late now to try to enforce much more uniformity.
Any thoughts & suggestions welcome. Next step is to add netlink
for FEC, then stats.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 25 Mar 2021 01:12:00 +0000 (18:12 -0700)]
ethtool: clarify the ethtool FEC interface
The definition of the FEC driver interface is quite unclear.
Improve the documentation.
This is based on current driver and user space code, as well
as the discussions about the interface:
RFC v1 (24 Oct 2016): https://lore.kernel.org/netdev/1477363849-36517-1-git-send-email-vidya@cumulusnetworks.com/
- this version has the autoneg field
- no active_fec field
- none vs off confusion is already present
Jakub Kicinski [Thu, 25 Mar 2021 01:11:59 +0000 (18:11 -0700)]
ethtool: fec: sanitize ethtool_fecparam->fec
Reject NONE on set, this mode means device does not support
FEC so it's a little out of place in the set interface.
This should be safe to do - user space ethtool does not allow
the use of NONE on set. A few drivers treat it the same as OFF,
but none use it instead of OFF.
Similarly reject an empty FEC mask. The common user space tool
will not send such requests and most drivers correctly reject
it already.
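The two rejection rules can be sketched as a small validation helper. The constants below are local copies written for illustration, following the bit order of the ethtool FEC encodings as I understand it; the function name is hypothetical and this is not the core's actual code.

```c
#include <errno.h>
#include <stdint.h>

/* Local illustrative copies of the FEC encoding bits (assumed order). */
#define EX_FEC_NONE	(1u << 0)	/* device does not support FEC */
#define EX_FEC_AUTO	(1u << 1)
#define EX_FEC_OFF	(1u << 2)
#define EX_FEC_RS	(1u << 3)
#define EX_FEC_BASER	(1u << 4)

/* Validate the requested FEC mask of a SET request.
 * Returns 0 if acceptable, -EINVAL otherwise.
 */
static int ex_fec_validate_set(uint32_t fec)
{
	/* An empty mask requests nothing, and NONE only makes sense as
	 * a reported capability, so reject both.
	 */
	if (!fec || (fec & EX_FEC_NONE))
		return -EINVAL;

	return 0;
}
```

Doing the check once in the core means individual drivers no longer have to agree on how to treat these two cases.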
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
struct ethtool_fecparam::active_fec is a GET-only field,
all in-tree drivers correctly ignore it on SET. Clear
the field on SET to avoid any confusion. Again, we can't
reject non-zero now since ethtool user space does not
zero-init the param correctly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 25 Mar 2021 01:11:57 +0000 (18:11 -0700)]
ethtool: fec: sanitize ethtool_fecparam->reserved
struct ethtool_fecparam::reserved is never looked at by the core.
Make sure it's actually 0. Unfortunately we can't return an error
because old ethtool doesn't zero-initialize the structure for SET.
On GET we can be more verbose, there are no in tree (ab)users.
Fix up the kdoc on the structure. Remove the mention of FEC
bypass. Seems like a niche thing to configure in the first
place.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 25 Mar 2021 01:11:56 +0000 (18:11 -0700)]
ethtool: fec: remove long structure description
Digging through the mailing list archive, @autoneg was part
of the first version of the RFC. This leftover comment was
pointed out twice in review but wasn't removed.
The sentence is an exact copy-paste from pauseparam.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 25 Mar 2021 18:43:43 +0000 (11:43 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: mm (hugetlb, kasan, gup,
selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64,
gcov, and mailmap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mailmap: update Andrey Konovalov's email address
mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
mm: memblock: fix section mismatch warning again
kfence: make compatible with kmemleak
gcov: fix clang-11+ support
ia64: fix format strings for err_inject
ia64: mca: allocate early mca with GFP_ATOMIC
squashfs: fix xattr id and id lookup sanity checks
squashfs: fix inode lookup sanity checks
z3fold: prevent reclaim/free race for headless pages
selftests/vm: fix out-of-tree build
mm/mmu_notifiers: ensure range_end() is paired with range_start()
kasan: fix per-page tags for non-page_alloc pages
hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
Linus Torvalds [Thu, 25 Mar 2021 18:23:35 +0000 (11:23 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Not much going on, just some small bug fixes:
- Typo causing a regression in mlx5 devx
- Regression in the recent hns rework causing the HW to get out of
sync
- Long-standing cxgb4 adaptor crash when destroying cm ids"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server
RDMA/hns: Fix bug during CMDQ initialization
RDMA/mlx5: Fix typo in destroy_mkey inbox
Linus Torvalds [Thu, 25 Mar 2021 18:07:40 +0000 (11:07 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Minor fixes all over, ranging from typos to tests to errata
workarounds:
- Fix possible memory hotplug failure with KASLR
- Fix FFR value in SVE kselftest
- Fix backtraces reported in /proc/$pid/stack
- Disable broken CnP implementation on NVIDIA Carmel
- Typo fixes and ACPI documentation clarification
- Fix some W=1 warnings"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kernel: disable CNP on Carmel
arm64/process.c: fix Wmissing-prototypes build warnings
kselftest/arm64: sve: Do not use non-canonical FFR register value
arm64: mm: correct the inside linear map range during hotplug check
arm64: kdump: update ppos when reading elfcorehdr
arm64: cpuinfo: Fix a typo
Documentation: arm64/acpi : clarify arm64 support of IBFT
arm64: stacktrace: don't trace arch_stack_walk()
arm64: csum: cast to the proper type
Ira Weiny [Thu, 25 Mar 2021 04:37:53 +0000 (21:37 -0700)]
mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
The kernel test robot found that __kmap_local_sched_out() was not
correctly skipping the guard pages when DEBUG_KMAP_LOCAL_FORCE_MAP was
set.[1] This was due to DEBUG_HIGHMEM check being used.
Mike Rapoport [Thu, 25 Mar 2021 04:37:50 +0000 (21:37 -0700)]
mm: memblock: fix section mismatch warning again
Commit 34dc2efb39a2 ("memblock: fix section mismatch warning") marked
memblock_bottom_up() and memblock_set_bottom_up() as __init, but they
could be referenced from non-init functions like
memblock_find_in_range_node() on architectures that enable
CONFIG_ARCH_KEEP_MEMBLOCK.
For such builds kernel test robot reports:
WARNING: modpost: vmlinux.o(.text+0x74fea4): Section mismatch in reference from the function memblock_find_in_range_node() to the function .init.text:memblock_bottom_up()
The function memblock_find_in_range_node() references the function __init memblock_bottom_up().
This is often because memblock_find_in_range_node lacks a __init annotation or the annotation of memblock_bottom_up is wrong.
Replace __init annotations with __init_memblock annotations so that the
appropriate section will be selected depending on
CONFIG_ARCH_KEEP_MEMBLOCK.
Link: https://lore.kernel.org/lkml/202103160133.UzhgY0wt-lkp@intel.com Link: https://lkml.kernel.org/r/20210316171347.14084-1-rppt@kernel.org Fixes: 34dc2efb39a2 ("memblock: fix section mismatch warning") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marco Elver [Thu, 25 Mar 2021 04:37:47 +0000 (21:37 -0700)]
kfence: make compatible with kmemleak
Because memblock allocations are registered with kmemleak, the KFENCE
pool was seen by kmemleak as one large object. Later allocations
through kfence_alloc() that were registered with kmemleak via
slab_post_alloc_hook() would then overlap and trigger a warning.
Therefore, once the pool is initialized, we can remove (free) it from
kmemleak again, since it should be treated as allocator-internal and be
seen as "free memory".
The second problem is that kmemleak is passed the rounded size, and not
the originally requested size, which is also the size of KFENCE objects.
To avoid kmemleak scanning past the end of an object and triggering a
KFENCE out-of-bounds error, fix the size if it is a KFENCE object.
For simplicity, to avoid a call to kfence_ksize() in
slab_post_alloc_hook() (and avoid new IS_ENABLED(CONFIG_DEBUG_KMEMLEAK)
guard), just call kfence_ksize() in mm/kmemleak.c:create_object().
Link: https://lkml.kernel.org/r/20210317084740.3099921-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reported-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Luis Henriques <lhenriques@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Desaulniers [Thu, 25 Mar 2021 04:37:44 +0000 (21:37 -0700)]
gcov: fix clang-11+ support
LLVM changed the expected function signatures for llvm_gcda_start_file()
and llvm_gcda_emit_function() in the clang-11 release. Users of
clang-11 or newer may have noticed their kernels failing to boot due to
a panic when enabling CONFIG_GCOV_KERNEL=y and CONFIG_GCOV_PROFILE_ALL=y.
Fix up the function signatures so calling these functions doesn't panic
the kernel.
Sergei Trofimovich [Thu, 25 Mar 2021 04:37:41 +0000 (21:37 -0700)]
ia64: fix format strings for err_inject
Fix warning with %lx / u64 mismatch:
arch/ia64/kernel/err_inject.c: In function 'show_resources':
arch/ia64/kernel/err_inject.c:62:22: warning:
format '%lx' expects argument of type 'long unsigned int',
but argument 3 has type 'u64' {aka 'long long unsigned int'}
62 | return sprintf(buf, "%lx", name[cpu]); \
| ^~~~~~~
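A common way to silence this class of warning is to print the value with %llx and cast it to unsigned long long. The following is a minimal userspace sketch of that idea, not the actual patch: format_u64_hex() is a hypothetical helper, and uint64_t stands in for the kernel's u64.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from err_inject.c: formats a 64-bit
 * value as hex. uint64_t stands in for the kernel's u64 here.
 */
static int format_u64_hex(char *buf, size_t len, uint64_t val)
{
	/*
	 * %llx expects unsigned long long. The explicit cast keeps the
	 * call correct however the platform defines its 64-bit type.
	 */
	return snprintf(buf, len, "%llx", (unsigned long long)val);
}
```

The cast matters because a 64-bit type may be defined as either unsigned long or unsigned long long depending on the architecture, and only the cast makes the argument match the format on both.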
Sergei Trofimovich [Thu, 25 Mar 2021 04:37:38 +0000 (21:37 -0700)]
ia64: mca: allocate early mca with GFP_ATOMIC
A sleep warning is triggered at early boot, right at secondary CPU
activation:
smp: Bringing up secondary CPUs ...
BUG: sleeping function called from invalid context at mm/page_alloc.c:4942
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.0-rc2-00007-g79e228d0b611-dirty #99
..
Call Trace:
show_stack+0x90/0xc0
dump_stack+0x150/0x1c0
___might_sleep+0x1c0/0x2a0
__might_sleep+0xa0/0x160
__alloc_pages_nodemask+0x1a0/0x600
alloc_page_interleave+0x30/0x1c0
alloc_pages_current+0x2c0/0x340
__get_free_pages+0x30/0xa0
ia64_mca_cpu_init+0x2d0/0x3a0
cpu_init+0x8b0/0x1440
start_secondary+0x60/0x700
start_ap+0x750/0x780
Fixed BSP b0 value from CPU 1
As I understand it, interrupts are not enabled yet and the system has a
lot of memory. There is little chance of sleeping, so the switch to
GFP_ATOMIC should be a no-op.
Thomas Hebb [Thu, 25 Mar 2021 04:37:29 +0000 (21:37 -0700)]
z3fold: prevent reclaim/free race for headless pages
Commit ca0246bb97c2 ("z3fold: fix possible reclaim races") introduced
the PAGE_CLAIMED flag "to avoid racing on a z3fold 'headless' page
release." By atomically testing and setting the bit in each of
z3fold_free() and z3fold_reclaim_page(), a double-free was avoided.
However, commit dcf5aedb24f8 ("z3fold: stricter locking and more careful
reclaim") appears to have unintentionally broken this behavior by moving
the PAGE_CLAIMED check in z3fold_reclaim_page() to after the page lock
gets taken, which only happens for non-headless pages. For headless
pages, the check is now skipped entirely and races can occur again.
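The claim protocol described above can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: struct demo_page and the demo_*() functions are hypothetical and do not reflect the real z3fold data layout or locking, and the sketch is exercised single-threaded.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a headless page; not the z3fold layout. */
struct demo_page {
	atomic_flag claimed;	/* plays the role of PAGE_CLAIMED */
	int times_freed;	/* counts releases so a double free is visible */
};

/*
 * Returns true if the caller won the claim. atomic_flag_test_and_set()
 * returns the previous value, so exactly one caller observes "false".
 */
static bool demo_claim(struct demo_page *page)
{
	return !atomic_flag_test_and_set(&page->claimed);
}

/* Models the free path: release only if this caller claimed the page. */
static void demo_free(struct demo_page *page)
{
	if (demo_claim(page))
		page->times_freed++;
}

/*
 * Models the reclaim path: the claim check runs unconditionally, before
 * any step that headless pages would skip.
 */
static void demo_reclaim(struct demo_page *page)
{
	if (!demo_claim(page))
		return;		/* another path owns the release */
	page->times_freed++;
}
```

Whichever of demo_free() and demo_reclaim() runs first wins the claim, so the page is released exactly once.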
Sean Christopherson [Thu, 25 Mar 2021 04:37:23 +0000 (21:37 -0700)]
mm/mmu_notifiers: ensure range_end() is paired with range_start()
If one or more notifiers fails .invalidate_range_start(), invoke
.invalidate_range_end() for "all" notifiers. If there are multiple
notifiers, those that did not fail are expecting _start() and _end() to
be paired, e.g. KVM's mmu_notifier_count would become imbalanced.
Disallow notifiers that can fail _start() from implementing _end() so
that it's unnecessary to either track which notifiers rejected _start(),
or had already succeeded prior to a failed _start().
Note, the existing behavior of calling _start() on all notifiers even
after a previous notifier failed _start() was an unintended "feature".
Make it canon now that the behavior is depended on for correctness.
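The pairing rule can be illustrated with a small model. This is a sketch under assumptions, not the kernel code: struct demo_notifier, its callbacks, and demo_invalidate_start() are hypothetical names, and the real mmu_notifier_ops interface is different.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical notifier model; the real mmu_notifier_ops differs. */
struct demo_notifier {
	int (*range_start)(struct demo_notifier *n);	/* 0 on success */
	void (*range_end)(struct demo_notifier *n);	/* may be NULL */
	int count;				/* in-progress invalidations */
};

static int demo_invalidate_start(struct demo_notifier **list, size_t nr)
{
	int ret = 0;
	size_t i;

	/* Call _start() on every notifier, even after one has failed. */
	for (i = 0; i < nr; i++) {
		int err = list[i]->range_start(list[i]);

		if (err && !ret)
			ret = err;
	}

	/*
	 * On failure, call _end() on all notifiers that implement it.
	 * Notifiers that can fail _start() do not implement _end(), so
	 * there is no need to track which ones succeeded.
	 */
	if (ret) {
		for (i = 0; i < nr; i++)
			if (list[i]->range_end)
				list[i]->range_end(list[i]);
	}
	return ret;
}

/* A notifier that counts invalidations, loosely like KVM's counter. */
static int demo_start_ok(struct demo_notifier *n) { n->count++; return 0; }
static void demo_end(struct demo_notifier *n) { n->count--; }

/* A notifier that rejects non-blocking invalidation. */
static int demo_start_fail(struct demo_notifier *n) { (void)n; return -EAGAIN; }
```

With one counting notifier and one failing notifier in the list, the counting notifier's count returns to zero after a failed demo_invalidate_start(), regardless of the order in which the two are registered.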
As of today, the bug is likely benign:
1. The only caller of the non-blocking notifier is OOM kill.
2. The only notifiers that can fail _start() are the i915 and Nouveau
drivers.
3. The only notifiers that utilize _end() are the SGI UV GRU driver
and KVM.
4. The GRU driver will never coincide with the i915/Nouveau drivers.
5. An imbalanced kvm->mmu_notifier_count only causes a soft lockup in the
_guest_, and the guest is already doomed due to being an OOM victim.
Fix the bug now to play nice with future usage, e.g. KVM has a
potential use case for blocking memslot updates in KVM while an
invalidation is in-progress, and failure to unblock would result in said
updates being blocked indefinitely and hanging.
Found by inspection. Verified by adding a second notifier in KVM that
periodically returns -EAGAIN on non-blockable ranges, triggering OOM,
and observing that KVM exits with an elevated notifier count.
Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com
Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Konovalov [Thu, 25 Mar 2021 04:37:20 +0000 (21:37 -0700)]
kasan: fix per-page tags for non-page_alloc pages
To allow performing tag checks on page_alloc addresses obtained via
page_address(), tag-based KASAN modes store tags for page_alloc
allocations in page->flags.
Currently, the default tag value stored in page->flags is 0x00.
Therefore, page_address() returns a 0x00ffff... address for pages that
were not allocated via page_alloc.
This might cause problems. A particular case we encountered is a
conflict with KFENCE. If a KFENCE-allocated slab object is being freed
via kfree(page_address(page) + offset), the address passed to kfree()
will get tagged with 0x00 (as slab pages keep the default per-page
tags). This leads to is_kfence_address() check failing, and a KFENCE
object ending up in normal slab freelist, which causes memory
corruptions.
This patch changes the way KASAN stores tags in page->flags: they are now
stored xor'ed with 0xff. This way, KASAN doesn't need to initialize
per-page flags for every created page, which might be slow.
With this change, page_address() returns natively-tagged (with 0xff)
pointers for pages that didn't have tags set explicitly.
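The XOR encoding can be sketched as follows. This is an illustration only: DEMO_TAG_SHIFT, DEMO_TAG_MASK, and the demo_*() helpers are hypothetical, and the bit position of the tag field is an assumption rather than the kernel's actual page->flags layout.

```c
#include <stdint.h>

#define DEMO_TAG_SHIFT	56	/* hypothetical position of the tag field */
#define DEMO_TAG_MASK	0xffULL

/* Store the tag XOR'ed with 0xff so that all-zero flags decode to 0xff. */
static uint64_t demo_set_tag(uint64_t flags, uint8_t tag)
{
	flags &= ~(DEMO_TAG_MASK << DEMO_TAG_SHIFT);
	flags |= ((uint64_t)(tag ^ 0xff) & DEMO_TAG_MASK) << DEMO_TAG_SHIFT;
	return flags;
}

/* Undo the XOR on read; untouched (zeroed) flags yield the 0xff tag. */
static uint8_t demo_get_tag(uint64_t flags)
{
	return (uint8_t)(((flags >> DEMO_TAG_SHIFT) & DEMO_TAG_MASK) ^ 0xff);
}
```

Because zeroed flags already decode to 0xff, pages that never had a tag set need no explicit initialization, which is the point made above.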
This patch fixes the encountered conflict with KFENCE and prevents more
similar issues that can occur in the future.
Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 25 Mar 2021 04:37:17 +0000 (21:37 -0700)]
hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
The current implementation of hugetlb_cgroup for shared mappings can
behave inconsistently. Consider the following two scenarios:
1. Assume initial css reference count of hugetlb_cgroup is 1:
  1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
      count is 2 associated with 1 file_region.
  1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
      count is 3 associated with 2 file_regions.
  1.3 coalesce_file_region will coalesce these two file_regions into
      one. So css reference count is 3 associated with 1 file_region
      now.
2. Assume initial css reference count of hugetlb_cgroup is 1 again:
  2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
      count is 2 associated with 1 file_region.
Therefore, we might have one file_region while holding one or more css
references. This inconsistency could lead to an imbalanced css_get()
and css_put() pair. If we do css_put one by one (e.g. hole punch case),
scenario 2 would put one more css reference. If we do css_put all
together (e.g. truncate case), scenario 1 will leak one css reference.
The imbalanced css_get() and css_put() pair would result in a non-zero
reference when we try to destroy the hugetlb cgroup. The hugetlb cgroup
directory is removed __but__ the associated resource is not freed. Under
a busy workload, this might ultimately result in OOM or in being unable
to create a new hugetlb cgroup.
In order to fix this, we have to make sure that each file_region holds
exactly one css reference. So in the coalesce_file_region case, we
should release one css reference before coalescing. Also, only put the
css reference when the entire file_region is removed.
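The two scenarios and the effect of the fix can be replayed with a small bookkeeping model. This is a sketch under assumptions: struct demo_resv and the demo_*() helpers are hypothetical and only count references and regions; they do not model the kernel's resv_map or file_region structures.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping model; not the kernel's resv_map structures. */
struct demo_resv {
	int css_refs;	/* css reference count, starts at 1 as in the scenarios */
	int regions;	/* number of file_regions */
};

/* Each reservation call adds one file_region and takes one css reference. */
static void demo_reserve(struct demo_resv *r)
{
	r->css_refs++;
	r->regions++;
}

/*
 * Merge two adjacent file_regions into one. With put_on_merge set, one
 * css reference is dropped so the surviving region holds exactly one.
 */
static void demo_coalesce(struct demo_resv *r, bool put_on_merge)
{
	r->regions--;
	if (put_on_merge)
		r->css_refs--;
}
```

Replaying scenario 1 with put_on_merge false ends at 3 references for 1 region, while scenario 2 ends at 2 references for 1 region. With put_on_merge true, scenario 1 also ends at 2 references, so both paths agree.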
The last thing to note is that the caller of region_add() will only hold
one reference to h_cg->css for the whole contiguous reservation region.
But this area might be scattered when some file_regions already reside
in it. As a result, many file_regions may share only one h_cg->css
reference. In order to ensure that each file_region holds exactly one
css reference, we should do css_get() for each file_region and release
the reference held by the caller when they are done.
[linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings]
Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com
Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: kernel test robot <lkp@intel.com> (auto build test ERROR)
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server
Not setting the ipv6 bit while destroying ipv6 listening servers may
result in fatal adapter errors due to lookup engine memory hash
errors. Therefore, always set the ipv6 field while destroying ipv6
listening servers.
Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
Link: https://lore.kernel.org/r/20210324190453.8171-1-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Rich Wiley [Wed, 24 Mar 2021 00:28:09 +0000 (17:28 -0700)]
arm64: kernel: disable CNP on Carmel
On NVIDIA Carmel cores, CNP behaves differently than it does on standard
ARM cores. On Carmel, if two cores have CNP enabled and share an L2 TLB
entry created by core0 for a specific ASID, a non-shareable TLBI from
core1 may still see the shared entry. On standard ARM cores, that TLBI
will invalidate the shared entry as well.
This causes issues with patchsets that attempt to do local TLBIs based
on cpumasks instead of broadcast TLBIs. Avoid these issues by disabling
CNP support for NVIDIA Carmel cores.