Bob Pearson [Thu, 30 Jun 2022 19:04:26 +0000 (14:04 -0500)]
RDMA/rxe: Replace __rxe_do_task by rxe_run_task
In rxe_req.c replace calls to __rxe_do_task() by calls to rxe_run_task(..,
0). Using __rxe_do_task is an error because the completer tasklet is not
designed to be re-entrant and __rxe_do_task() should only be called when
it is clear that no one else could be calling the completer tasklet as is
the case in rxe_qp.c where this call is used in safe environments.
Bob Pearson [Thu, 30 Jun 2022 19:04:25 +0000 (14:04 -0500)]
RDMA/rxe: Limit the number of calls to each tasklet
Limit the maximum number of calls to each tasklet from rxe_do_task()
before yielding the cpu. When the limit is reached reschedule the tasklet
and exit the calling loop. This patch prevents one tasklet from consuming
100% of a cpu core and causing a deadlock or soft lockup.
Bob Pearson [Thu, 30 Jun 2022 19:04:22 +0000 (14:04 -0500)]
RDMA/rxe: Fix rnr retry behavior
Currently the completer tasklet when retransmit timer or the rnr timer
fires the same flag (qp->req.need_retry) is set so that if either timer
fires it will attempt to perform a retry flow on the send queue. This has
the effect of responding to an RNR NAK at the first retransmit timer event
which might not allow the requested rnr timeout.
This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set,
prevents a retry flow until the rnr nak timer fires.
This patch fixes rnr retry errors which can be observed by running the
pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this
patch applied they do not occur.
Bob Pearson [Thu, 30 Jun 2022 19:04:18 +0000 (14:04 -0500)]
RDMA/rxe: Add rxe_is_fenced() subroutine
The code thc that decides whether to defer execution of a wqe in
rxe_requester.c is isolated into a subroutine rxe_is_fenced() and removed
from the call to req_next_wqe(). The condition whether a wqe should be
fenced is changed to comply with the IBA. Currently an operation is fenced
if the fence bit is set in the wqe flags and the last wqe has not
completed. For normal operations the IBA actually only requires that the
last read or atomic operation is complete.
RDMA/rxe: For invalidate compare according to set keys in mr
The 'rkey' input can be an lkey or rkey, and in rxe the lkey or rkey have
the same value, including the variant bits.
So, if mr->rkey is set, compare the invalidate key with it, otherwise
compare with the mr->lkey.
Since we already did a lookup on the non-varient bits to get this far, the
check's only purpose is to confirm that the wqe has the correct variant
bits.
Fixes: 001345339f4c ("RDMA/rxe: Separate HW and SW l/rkeys") Link: https://lore.kernel.org/r/20220707073006.328737-1-haris.phnx@gmail.com Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Thu, 14 Jul 2022 20:46:20 +0000 (15:46 -0500)]
RDMA/rxe: Fix mw bind to allow any consumer key portion
The current implementation of rxe_check_bind_mw() in rxe_mw.c is incorrect
since it requires the new key portion provided by the mw consumer to be
different than the previous key portion. This is not required by the
IBA. Remove the test.
Mark Bloch [Sun, 3 Jul 2022 20:54:07 +0000 (13:54 -0700)]
RDMA/mlx5: Expose steering anchor to userspace
Expose a steering anchor per priority to allow users to re-inject
packets back into default NIC pipeline for additional processing.
MLX5_IB_METHOD_STEERING_ANCHOR_CREATE returns a flow table ID which
a user can use to re-inject packets at a specific priority.
A FTE (flow table entry) can be created and the flow table ID
used as a destination.
When a packet is taken into a RDMA-controlled steering domain (like
software steering) there may be a need to insert the packet back into
the default NIC pipeline. This exposes a flow table ID to the user that can
be used as a destination in a flow table entry.
With this new method priorities that are exposed to users via
MLX5_IB_METHOD_FLOW_MATCHER_CREATE can be reached from a non-zero UID.
As user-created flow tables (via RDMA DEVX) are created with a non-zero UID
thus it's impossible to point to a NIC core flow table (core driver flow tables
are created with UID value of zero) from userspace.
Create flow tables that are exposed to users with the shared UID, this
allows users to point to default NIC flow tables.
Steering loops are prevented at FW level as FW enforces that no flow
table at level X can point to a table at level lower than X.
Mark Bloch [Sun, 3 Jul 2022 20:54:06 +0000 (13:54 -0700)]
RDMA/mlx5: Refactor get flow table function
_get_flow_table() requires the entire matcher being passed
while all it needs is the priority and namespace type.
Pass the priority and namespace type directly instead.
Leon Romanovsky [Tue, 19 Jul 2022 17:40:28 +0000 (20:40 +0300)]
Merge branch 'mlx5-next' into wip/leon-for-next
Mark Bloch Says:
================
Expose steering anchor
Expose a steering anchor per priority to allow users to re-inject
packets back into default NIC pipeline for additional processing.
MLX5_IB_METHOD_STEERING_ANCHOR_CREATE returns a flow table ID which
a user can use to re-inject packets at a specific priority.
A FTE (flow table entry) can be created and the flow table ID
used as a destination.
When a packet is taken into a RDMA-controlled steering domain (like
software steering) there may be a need to insert the packet back into
the default NIC pipeline. This exposes a flow table ID to the user that can
be used as a destination in a flow table entry.
With this new method priorities that are exposed to users via
MLX5_IB_METHOD_FLOW_MATCHER_CREATE can be reached from a non-zero UID.
As user-created flow tables (via RDMA DEVX) are created with a non-zero UID
thus it's impossible to point to a NIC core flow table (core driver flow tables
are created with UID value of zero) from userspace.
Create flow tables that are exposed to users with the shared UID, this
allows users to point to default NIC flow tables.
Steering loops are prevented at FW level as FW enforces that no flow
table at level X can point to a table at level lower than X.
Jianglei Nie [Mon, 11 Jul 2022 07:07:18 +0000 (15:07 +0800)]
RDMA/hfi1: fix potential memory leak in setup_base_ctxt()
setup_base_ctxt() allocates a memory chunk for uctxt->groups with
hfi1_alloc_ctxt_rcv_groups(). When init_user_ctxt() fails, uctxt->groups
is not released, which will lead to a memory leak.
We should release the uctxt->groups with hfi1_free_ctxt_rcv_groups()
when init_user_ctxt() fails.
Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized") Link: https://lore.kernel.org/r/20220711070718.2318320-1-niejianglei2021@163.com Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Xiao Yang [Tue, 5 Jul 2022 14:52:11 +0000 (22:52 +0800)]
RDMA/rxe: Add common rxe_prepare_res()
It's redundant to prepare resources for Read and Atomic
requests by different functions. Replace them by a common
rxe_prepare_res() with different parameters. In addition,
the common rxe_prepare_res() can also be used by new Flush
and Atomic Write requests in the future.
RDMA/rxe: Fix BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup
The function rxe_create_qp calls rxe_qp_from_init. If some error
occurs, the error handler of function rxe_qp_from_init will set
both scq and rcq to NULL.
Then rxe_create_qp calls rxe_put to handle qp. In the end,
rxe_qp_do_cleanup is called by rxe_put. rxe_qp_do_cleanup directly
accesses scq and rcq before checking them. This will cause
null-ptr-deref error.
The call graph is as below:
rxe_create_qp {
...
rxe_qp_from_init {
...
err1:
...
qp->rcq = NULL; <---rcq is set to NULL
qp->scq = NULL; <---scq is set to NULL
...
}
qp_init:
rxe_put{
...
rxe_qp_do_cleanup {
...
atomic_dec(&qp->scq->num_wq); <--- scq is accessed
...
atomic_dec(&qp->rcq->num_wq); <--- rcq is accessed
}
}
Fixes: 4703b4f0d94a ("RDMA/rxe: Enforce IBA C11-17") Link: https://lore.kernel.org/r/20220705225414.315478-1-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY
in this case. This may trigger a call trace in iw_cm. A simple way to
trigger this:
server: ib_send_lat
client: ib_send_lat -R <server_ip>
Since ECC memory maintains a memory system immune to single-bit errors,
add support for correcting the 1bit-ECC error, which prevents a 1bit-ECC
error become an uncorrected type error. When a 1bit-ECC error happens in
the internal ram of the ROCE engine, such as the QPC table, as a 1bit-ECC
error caused by reading, the ROCE engine only corrects those 1bit ECC
errors by writing.
RDMA/hns: Fix incorrect clearing of interrupt status register
The driver will clear all the interrupts in the same area
when the driver handles the interrupt of type AEQ overflow.
It should only set the interrupt status bit of type AEQ overflow.
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Link: https://lore.kernel.org/r/20220714134353.16700-4-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
RDMA/hns: Remove unused abnormal interrupt of type RAS
The HNS NIC driver receives and handles the abnormal interrupt of the RAS
type generated by ROCEE, and the HNS RDMA driver does not need to handle
this type of interrupt. Therefore, delete unused codes in the HNS RDMA
driver.
Jianglei Nie [Thu, 14 Jul 2022 06:15:05 +0000 (14:15 +0800)]
RDMA/qedr: Fix potential memory leak in __qedr_alloc_mr()
__qedr_alloc_mr() allocates a memory chunk for "mr->info.pbl_table" with
init_mr_info(). When rdma_alloc_tid() and rdma_register_tid() fail, "mr"
is released while "mr->info.pbl_table" is not released, which will lead
to a memory leak.
We should release the "mr->info.pbl_table" with qedr_free_pbl() when error
occurs to fix the memory leak.
Fixes: e0290cce6ac0 ("qedr: Add support for memory registeration verbs") Link: https://lore.kernel.org/r/20220714061505.2342759-1-niejianglei2021@163.com Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Acked-by: Michal KalderonĀ <michal.kalderon@marvell.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Both hfi1 and UML depend on x86_64, this can trigger build errors.
This driver must depends on !UML because it accesses x86_64
features that are not supported by UML.
Jack Wang [Tue, 12 Jul 2022 10:31:13 +0000 (12:31 +0200)]
RDMA/rtrs-srv: Do not use mempool for page allocation
The mempool is for guaranteed memory allocation during
extreme VM load (see the header of mempool.c of the kernel).
But rtrs-srv allocates pages only when creating new session.
There is no need to use the mempool.
With the removal of mempool, rtrs-server no longer need to reserve
huge mount of memory, this will avoid error like this:
https://lore.kernel.org/lkml/20220620020727.GA3669@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/r/20220712103113.617754-6-haris.iqbal@ionos.com Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
RDMA/rtrs-clt: Replace list_next_or_null_rr_rcu with an inline function
removes list_next_or_null_rr_rcu macro to fix below warnings.
That macro is used only twice.
CHECK:MACRO_ARG_REUSE: Macro argument reuse 'head' - possible side-effects?
CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
CHECK:MACRO_ARG_REUSE: Macro argument reuse 'memb' - possible side-effects?
Replaces that macro with an inline function.
Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Cc: jinpu.wang@ionos.com Link: https://lore.kernel.org/r/20220712103113.617754-5-haris.iqbal@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
RDMA/rtrs-srv: Use per-cpu variables for rdma stats
Convert server stat counters from atomic to per-cpu variables.
Link: https://lore.kernel.org/r/20220712103113.617754-4-haris.iqbal@ionos.com Signed-off-by: Santosh Kumar Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Jack Wang [Tue, 12 Jul 2022 10:31:09 +0000 (12:31 +0200)]
RDMA/rtrs-srv: Fix modinfo output for stringify
stringify works with define, not enum.
Fixes: 91fddedd439c ("RDMA/rtrs: private headers with rtrs protocol structs and helpers") Cc: jinpu.wang@ionos.com Link: https://lore.kernel.org/r/20220712103113.617754-2-haris.iqbal@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mustafa Ismail [Tue, 5 Jul 2022 23:08:15 +0000 (18:08 -0500)]
RDMA/irdma: Fix setting of QP context err_rq_idx_valid field
Setting err_rq_idx_valid field in QP context when the AE source of the
AEQE is not associated with an RQ causes the firmware flush to fail.
Set err_rq_idx_valid field in QP context only if it is associated with an
RQ. Additionally, cleanup the redundant setting of this field in
irdma_process_aeq.
Mustafa Ismail [Tue, 5 Jul 2022 23:08:14 +0000 (18:08 -0500)]
RDMA/irdma: Fix VLAN connection with wildcard address
When an application listens on a wildcard address, and there are VLAN and
non-VLAN IP addresses, iWARP connection establishemnt can fail if the listen
node VLAN ID does not match.
Fix this by checking the vlan_id only if not a wildcard listen node.
Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://lore.kernel.org/r/20220705230815.265-7-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mustafa Ismail [Tue, 5 Jul 2022 23:08:13 +0000 (18:08 -0500)]
RDMA/irdma: Fix a window for use-after-free
During a destroy CQ an interrupt may cause processing of a CQE after CQ
resources are freed by irdma_cq_free_rsrc(). Fix this by moving the call
to irdma_cq_free_rsrc() after the irdma_sc_cleanup_ceqes(), which is
called under the cq_lock.
RDMA/irdma: Make resource distribution algorithm more QP oriented
Adapt the resource distribution algorithm in irdma_cfg_fpm_val to be more
QP oriented. If the configuration is too big for the available memory,
trim the MR and PBLE's first before trimming the QPs. This also avoids
having to double QPs requested as input to algorithm for GEN1 devices.
Jakub Kicinski [Tue, 5 Jul 2022 23:02:08 +0000 (16:02 -0700)]
ipoib: switch to netif_napi_add_weight()
We want to remove the weight argument from the basic
netif_napi_add() API and just default to 64.
Switch ipoib to the new API for explicitly specifying
the weight.
Jakub Kicinski [Tue, 5 Jul 2022 23:02:07 +0000 (16:02 -0700)]
IB/hfi1: switch to netif_napi_add_weight()
Since we'll remove the last argument from netif_napi_add()
soon switch this RDMA driver to netif_napi_add_weight()
for now to avoid cross-tree patches.
Bob Pearson [Thu, 30 Jun 2022 19:04:20 +0000 (14:04 -0500)]
RDMA/rxe: Remove unnecessary include statement
rxe_verbs.h includes the file <rdma/rdma_user_rxe.h>. It should have been
<uapi/rdma/rdma_user_rxe.h>, however, it is not used and not required in
this file.
Bob Pearson [Thu, 30 Jun 2022 19:04:21 +0000 (14:04 -0500)]
RDMA/rxe: Replace include statement
rxe_queue.h currently includes <uapi/rdma/rdma_user_rxe.h> for a
definition of struct rxe_queue_buf. But it is only used as a pointer so
the definition is not needed.
This patch replaces the include statement with the declaration
Bob Pearson [Thu, 30 Jun 2022 19:04:19 +0000 (14:04 -0500)]
RDMA/rxe: Convert pr_warn/err to pr_debug in pyverbs
The pyverbs test suite generates a few dmesg traces from intentional error
tests. This patch replaces those messages with pr_debug() calls which
improves the usefullness of the tests.
Bob Pearson [Mon, 23 May 2022 22:32:52 +0000 (17:32 -0500)]
RDMA/rxe: Fix deadlock in rxe_do_local_ops()
When a local operation (invalidate mr, reg mr, bind mw) is finished there
will be no ack packet coming from a responder to cause the wqe to be
completed. This may happen anyway if a subsequent wqe performs
IO. Currently if the wqe is signalled the completer tasklet is scheduled
immediately but not otherwise.
This leads to a deadlock if the next wqe has the fence bit set in send
flags and the operation is not signalled. This patch removes the condition
that the wqe must be signalled in order to schedule the completer tasklet
which is the simplest fix for this deadlock and is fairly low cost. This
is the analog for local operations of always setting the ackreq bit in all
last or only request packets even if the operation is not signalled.
Bob Pearson [Mon, 6 Jun 2022 14:38:37 +0000 (09:38 -0500)]
RDMA/rxe: Merge normal and retry atomic flows
Make the execution of the atomic operation in rxe_atomic_reply()
conditional on res->replay and make duplicate_request() call into
rxe_atomic_reply() to merge the two flows. This is modeled on the behavior
of read reply. Delete the skb from the atomic responder resource since it
is no longer used. Adjust the reference counting of the qp in
send_atomic_ack() for this flow.
Bob Pearson [Mon, 6 Jun 2022 14:38:36 +0000 (09:38 -0500)]
RDMA/rxe: Move atomic original value to res
Move the saved original value to the atomic responder resource. This
replaces saving it in the qp. In preparation for merging the normal and
retry atomic responder flows.
Bob Pearson [Mon, 6 Jun 2022 14:38:35 +0000 (09:38 -0500)]
RDMA/rxe: Move atomic responder res to atomic_reply
Move the allocation of the atomic responder resource up into
rxe_atomic_reply() from send_atomic_ack(). In preparation for merging the
normal and retry atomic responder flows.
Bob Pearson [Mon, 6 Jun 2022 14:38:34 +0000 (09:38 -0500)]
RDMA/rxe: Add a responder state for atomic reply
Add a responder state for atomic reply similar to read reply and rename
process_atomic() rxe_atomic_reply(). In preparation for merging the normal
and retry atomic responder flows.
Bob Pearson [Mon, 6 Jun 2022 14:38:33 +0000 (09:38 -0500)]
RDMA/rxe: Move code to rxe_prepare_atomic_res()
Separate the code that prepares the atomic responder resource into a
subroutine. This is preparation for merging the normal and retry atomic
responder flows.
Bob Pearson [Sun, 12 Jun 2022 22:34:34 +0000 (17:34 -0500)]
RDMA/rxe: Stop lookup of partially built objects
Currently the rdma_rxe driver has a security weakness due to giving
objects which are partially initialized indices allowing external actors
to gain access to them by sending packets which refer to their
index (e.g. qpn, rkey, etc) causing unpredictable results.
This patch adds a new API rxe_finalize(obj) which enables looking up pool
objects from indices using rxe_pool_get_index() for AH, QP, MR, and
MW. They are added in create verbs only after the objects are fully
initialized.
It also adds wait for completion to destroy/dealloc verbs to assure that
all references have been dropped before returning to rdma_core by
implementing a new rxe_pool API rxe_cleanup() which drops a reference to
the object and then waits for all other references to be dropped. When
the last reference is dropped the object is completed by kref. After that
it cleans up the object and if locally allocated frees the memory. In the
special case of address handle objects the delay is implemented separately
if the destroy_ah call is not sleepable.
Combined with deferring cleanup code to type specific cleanup routines
this allows all pending activity referring to objects to complete before
returning to rdma_core.
Leon Romanovsky [Thu, 16 Jun 2022 17:09:11 +0000 (20:09 +0300)]
Merge branch 'mlx5-next' into wip/leon-for-next
This merge commit includes shared updates to net-next and rdma-next
for upcoming mlx5 features.
1) Updated HW bits and definitions for upcoming features
1.1) vport debug counters
1.2) flow meter
1.3) Execute ASO action for flow entry
1.4) enhanced CQE compression
2) Add ICM header-modify-pattern RDMA API
Leon Says
=========
SW steering manipulates packet's header using "modifying header" actions.
Many of these actions do the same operation, but use different data each time.
Currently we create and keep every one of these actions, which use expensive
and limited resources.
Now we introduce a new mechanism - pattern and argument, which splits
a modifying action into two parts:
1. action pattern: contains the operations to be applied on packet's header,
mainly set/add/copy of fields in the packet
2. action data/argument: contains the data to be used by each operation
in the pattern.
This way we reuse same patterns with different arguments to create new
modifying actions, and since many actions share the same operations, we end
up creating a small number of patterns that we keep in a dedicated cache.
These modify header patterns are implemented as new type of ICM memory,
so the following kernel patch series add the support for this new ICM type.
==========
Dongliang Mu [Thu, 9 Jun 2022 07:06:56 +0000 (15:06 +0800)]
RDMA/rxe: fix xa_alloc_cycle() error return value check again
Currently rxe_alloc checks ret to indicate error, but 1 is also a valid
return and just indicates that the allocation succeeded with a wrap.
Fix this by modifying the check to be < 0.
Link: https://lore.kernel.org/r/20220609070656.1446121-1-dzm91@hust.edu.cn Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Patrisious Haddad [Tue, 7 Jun 2022 11:32:44 +0000 (14:32 +0300)]
RDMA/core: Add a netevent notifier to cma
Add a netevent callback for cma, mainly to catch NETEVENT_NEIGH_UPDATE.
Previously, when a system with failover MAC mechanism change its MAC address
during a CM connection attempt, the RDMA-CM would take a lot of time till
it disconnects and timesout due to the incorrect MAC address.
Now when we get a NETEVENT_NEIGH_UPDATE we check if it is due to a failover
MAC change and if so, we instantly destroy the CM and notify the user in order
to spare the unnecessary waiting for the timeout.
Patrisious Haddad [Tue, 7 Jun 2022 11:32:43 +0000 (14:32 +0300)]
RDMA/core: Add an rb_tree that stores cm_ids sorted by ifindex and remote IP
Add to the cma, a tree that keeps track of all rdma_id_private channels
that were created while in RoCE mode.
The IDs are sorted first according to their netdevice ifindex then their
destination IP. And for IDs with matching IP they would be at the same node
in the tree, since the tree data is a list of all ids with matching destination IP.
The tree allows fast and efficient lookup of ids using an ifindex and
IP address which is useful for identifying relevant net_events promptly.
Ofer Levi [Wed, 8 Jun 2022 20:04:52 +0000 (13:04 -0700)]
net/mlx5: Add bits and fields to support enhanced CQE compression
Expose ifc bits and add needed structure fields and methods to
support enhanced CQE compression feature.
The enhanced CQE compression feature improves cpu utiliziation with
better packet latency from nic to host.
Signed-off-by: Ofer Levi <oferle@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Shay Drory [Wed, 8 Jun 2022 20:04:51 +0000 (13:04 -0700)]
net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK
Remove not used MLX5_CAP_BITS_RW_MASK.
While at it, remove CAP_MASK, MLX5_CAP_OFF_CMDIF_CSUM
and MLX5_DEV_CAP_FLAG_*, since MLX5_CAP_BITS_RW_MASK
was their only user.
Shay Drory [Wed, 8 Jun 2022 20:04:50 +0000 (13:04 -0700)]
net/mlx5: group fdb cleanup to single function
Currently, the allocation of fdb software objects are done is single
function, oppose to the cleanup of them.
Group the cleanup of fdb software objects to single function.
Linus Torvalds [Mon, 6 Jun 2022 00:14:03 +0000 (17:14 -0700)]
Merge tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull file descriptor fix from Al Viro:
"Fix for breakage in #work.fd this window"
* tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix the breakage in close_fd_get_file() calling conventions change
Linus Torvalds [Sun, 5 Jun 2022 18:51:48 +0000 (11:51 -0700)]
bluetooth: don't use bitmaps for random flag accesses
The bluetooth code uses our bitmap infrastructure for the two bits (!)
of connection setup flags, and in the process causes odd problems when
it converts between a bitmap and just the regular values of said bits.
It's completely pointless to do things like bitmap_to_arr32() to convert
a bitmap into a u32. It shoudln't have been a bitmap in the first
place. The reason to use bitmaps is if you have arbitrary number of
bits you want to manage (not two!), or if you rely on the atomicity
guarantees of the bitmap setting and clearing.
The code could use an "atomic_t" and use "atomic_or/andnot()" to set and
clear the bit values, but considering that it then copies the bitmaps
around with "bitmap_to_arr32()" and friends, there clearly cannot be a
lot of atomicity requirements.
So just use a regular integer.
In the process, this avoids the warnings about erroneous use of
bitmap_from_u64() which were triggered on 32-bit architectures when
conversion from a u64 would access two words (and, surprise, surprise,
only one word is needed - and indeed overkill - for a 2-bit bitmap).
That was always problematic, but the compiler seems to notice it and
warn about the invalid pattern only after commit 0a97953fd221 ("lib: add
bitmap_{from,to}_arr64") changed the exact implementation details of
'bitmap_from_u64()', as reported by Sudip Mukherjee and Stephen Rothwell.
Al Viro [Sun, 5 Jun 2022 18:01:42 +0000 (14:01 -0400)]
fix the breakage in close_fd_get_file() calling conventions change
It used to grab an extra reference to struct file rather than
just transferring to caller the one it had removed from descriptor
table. New variant doesn't, and callers need to be adjusted.
Reported-and-tested-by: syzbot+47dd250f527cb7bebf24@syzkaller.appspotmail.com Fixes: 6319194ec57b ("Unify the primitives for file descriptor closing") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sun, 5 Jun 2022 18:00:43 +0000 (11:00 -0700)]
Merge tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX fix from Thomas Gleixner:
"A single fix for x86/SGX to prevent that memory which is allocated for
an SGX enclave is accounted to the wrong memory control group"
* tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Set active memcg prior to shmem allocation
Linus Torvalds [Sun, 5 Jun 2022 17:55:23 +0000 (10:55 -0700)]
Merge tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Thomas Gleixner:
- Disable late microcode loading by default. Unless the HW people get
their act together and provide a required minimum version in the
microcode header for making a halfways informed decision its just
lottery and broken.
- Warn and taint the kernel when microcode is loaded late
- Remove the old unused microcode loader interface
- Remove a redundant perf callback from the microcode loader
* tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Remove unnecessary perf callback
x86/microcode: Taint and warn on late loading
x86/microcode: Default-disable late loading
x86/microcode: Rip out the OLD_INTERFACE
Linus Torvalds [Sun, 5 Jun 2022 17:53:41 +0000 (10:53 -0700)]
Merge tag 'x86-cleanups-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Thomas Gleixner:
"A set of small x86 cleanups:
- Remove unused headers in the IDT code
- Kconfig indendation and comment fixes
- Fix all 'the the' typos in one go instead of waiting for bots to
fix one at a time"
* tag 'x86-cleanups-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix all occurences of the "the the" typo
x86/idt: Remove unused headers
x86/Kconfig: Fix indentation of arch/x86/Kconfig.debug
x86/Kconfig: Fix indentation and add endif comments to arch/x86/Kconfig
Linus Torvalds [Sun, 5 Jun 2022 17:47:06 +0000 (10:47 -0700)]
Merge tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clockevent/clocksource updates from Thomas Gleixner:
- Device tree bindings for MT8186
- Tell the kernel that the RISC-V SBI timer stops in deeper power
states
- Make device tree parsing in sp804 more robust
- Dead code removal and tiny fixes here and there
- Add the missing SPDX identifiers
* tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
clocksource/drivers/timer-ti-dm: Remove unnecessary NULL check
clocksource/drivers/timer-sun5i: Convert to SPDX identifier
clocksource/drivers/timer-sun4i: Convert to SPDX identifier
clocksource/drivers/pistachio: Convert to SPDX identifier
clocksource/drivers/orion: Convert to SPDX identifier
clocksource/drivers/lpc32xx: Convert to SPDX identifier
clocksource/drivers/digicolor: Convert to SPDX identifier
clocksource/drivers/armada-370-xp: Convert to SPDX identifier
clocksource/drivers/mips-gic-timer: Convert to SPDX identifier
clocksource/drivers/jcore: Convert to SPDX identifier
clocksource/drivers/bcm_kona: Convert to SPDX identifier
clocksource/drivers/sp804: Avoid error on multiple instances
clocksource/drivers/riscv: Events are stopped during CPU suspend
clocksource/drivers/ixp4xx: Drop boardfile probe path
dt-bindings: timer: Add compatible for Mediatek MT8186
Linus Torvalds [Sun, 5 Jun 2022 17:40:31 +0000 (10:40 -0700)]
Merge tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
- Make the ICL event constraints match reality
- Remove a unused local variable
* tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Remove unused local variable
perf/x86/intel: Fix event constraints for ICL
Linus Torvalds [Sun, 5 Jun 2022 16:45:27 +0000 (09:45 -0700)]
Merge tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Thomas Gleixner:
- Handle __ubsan_handle_builtin_unreachable() correctly and treat it as
noreturn
- Allow architectures to select uaccess validation
- Use the non-instrumented bit test for test_cpu_has() to prevent
escape from non-instrumentable regions
- Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape
from non-instrumentable regions
- Mark a few tiny inline as __always_inline to prevent GCC from
bringing them out of line and instrumenting them
- Mark the empty stub context_tracking_enabled() as always inline as
GCC brings them out of line and instruments the empty shell
- Annotate ex_handler_msr_mce() as dead end
* tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/extable: Annotate ex_handler_msr_mce() as a dead end
context_tracking: Always inline empty stubs
x86: Always inline on_thread_stack() and current_top_of_stack()
jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds
x86/cpu: Elide KCSAN for cpu_has() and friends
objtool: Mark __ubsan_handle_builtin_unreachable() as noreturn
objtool: Add CONFIG_HAVE_UACCESS_VALIDATION