]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
2 years agotools: ynl-gen: don't generate forward declarations for policies
Jakub Kicinski [Wed, 7 Jun 2023 20:24:00 +0000 (13:24 -0700)]
tools: ynl-gen: don't generate forward declarations for policies

Now that all nested types have structs and are sorted topologically
there should be no need to generate forward declarations for policies.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl-gen: walk nested types in depth
Jakub Kicinski [Wed, 7 Jun 2023 20:23:59 +0000 (13:23 -0700)]
tools: ynl-gen: walk nested types in depth

So far we had only created structures for nested types nested
directly in messages (second level of attrs so to speak).
Walk types in depth to support deeper nesting.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl-gen: inherit struct use info
Jakub Kicinski [Wed, 7 Jun 2023 20:23:58 +0000 (13:23 -0700)]
tools: ynl-gen: inherit struct use info

We only render parse and netlink generation helpers as needed,
to avoid generating dead code. Propagate the information from
first- and second-layer attribute sets onto all children.
Otherwise devlink won't work, it has a lot more levels of nesting.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl-gen: try to sort the types more intelligently
Jakub Kicinski [Wed, 7 Jun 2023 20:23:57 +0000 (13:23 -0700)]
tools: ynl-gen: try to sort the types more intelligently

We need to sort the structures to avoid the need for forward
declarations. While at it remove the sort of structs when
rendering, it doesn't do anything.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl-gen: enable code gen for directional specs
Jakub Kicinski [Wed, 7 Jun 2023 20:23:56 +0000 (13:23 -0700)]
tools: ynl-gen: enable code gen for directional specs

I think that user space code gen for directional specs
works after recent changes. Let them through.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl-gen: refactor strmap helper generation
Jakub Kicinski [Wed, 7 Jun 2023 20:23:55 +0000 (13:23 -0700)]
tools: ynl-gen: refactor strmap helper generation

Move generating strmap lookup function to a helper.
No functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl-gen: use enum names in op strmap more carefully
Jakub Kicinski [Wed, 7 Jun 2023 20:23:54 +0000 (13:23 -0700)]
tools: ynl-gen: use enum names in op strmap more carefully

In preparation for supporting families which use different msg
ids to and from the kernel - make sure the ids in op strmap
are correct. The map is expected to be used mostly for notifications,
don't generate a separate map for the "to kernel" direction.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetlink: specs: devlink: fill in some details important for C
Jakub Kicinski [Wed, 7 Jun 2023 20:23:53 +0000 (13:23 -0700)]
netlink: specs: devlink: fill in some details important for C

Python YNL is much more forgiving than the C code gen in terms
of the spec completeness. Fill in a handful of devlink details
to make the spec usable in C.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 8 Jun 2023 18:34:28 +0000 (11:34 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

net/sched/sch_taprio.c
  d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
  dced11ef84fb ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()")

net/ipv4/sysctl_net_ipv4.c
  e209fee4118f ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294")
  ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear")
https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 8 Jun 2023 16:27:19 +0000 (09:27 -0700)]
Merge tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can, wifi, netfilter, bluetooth and ebpf.

  Current release - regressions:

   - bpf: sockmap: avoid potential NULL dereference in
     sk_psock_verdict_data_ready()

   - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()

   - phylink: actually fix ksettings_set() ethtool call

   - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3

  Current release - new code bugs:

   - wifi: mt76: fix possible NULL pointer dereference in
     mt7996_mac_write_txwi()

  Previous releases - regressions:

   - netfilter: fix NULL pointer dereference in nf_confirm_cthelper

   - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS

   - openvswitch: fix upcall counter access before allocation

   - bluetooth:
      - fix use-after-free in hci_remove_ltk/hci_remove_irk
      - fix l2cap_disconnect_req deadlock

   - nic: bnxt_en: prevent kernel panic when receiving unexpected
     PHC_UPDATE event

  Previous releases - always broken:

   - core: annotate rfs lockless accesses

   - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values

   - netfilter: add null check for nla_nest_start_noflag() in
     nft_dump_basechain_hook()

   - bpf: fix UAF in task local storage

   - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294

   - ipv6: rpl: fix route of death.

   - tcp: gso: really support BIG TCP

   - mptcp: fixes for user-space PM address advertisement

   - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT

   - can: avoid possible use-after-free when j1939_can_rx_register fails

   - batman-adv: fix UaF while rescheduling delayed work

   - eth: qede: fix scheduling while atomic

   - eth: ice: make writes to /dev/gnssX synchronous"

* tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
  bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
  bnxt_en: Skip firmware fatal error recovery if chip is not accessible
  bnxt_en: Query default VLAN before VNIC setup on a VF
  bnxt_en: Don't issue AP reset during ethtool's reset operation
  bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()
  net: bcmgenet: Fix EEE implementation
  eth: ixgbe: fix the wake condition
  eth: bnxt: fix the wake condition
  lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
  bpf: Add extra path pointer check to d_path helper
  net: sched: fix possible refcount leak in tc_chain_tmplt_add()
  net: sched: act_police: fix sparse errors in tcf_police_dump()
  net: openvswitch: fix upcall counter access before allocation
  net: sched: move rtm_tca_policy declaration to include file
  ice: make writes to /dev/gnssX synchronous
  net: sched: add rcu annotations around qdisc->qdisc_sleeping
  rfs: annotate lockless accesses to RFS sock flow table
  rfs: annotate lockless accesses to sk->sk_rxhash
  virtio_net: use control_buf for coalesce params
  ...

2 years agoMerge tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 8 Jun 2023 15:46:58 +0000 (08:46 -0700)]
Merge tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Dave Chinner:
 "These are a set of regression fixes discovered on recent kernels. I
  was hoping to send this to you a week and half ago, but events out of
  my control delayed finalising the changes until early this week.

  Whilst the diffstat looks large for this stage of the merge window, a
  large chunk of it comes from moving the guts of one function from one
  file to another i.e. it's the same code, it is just run in a different
  context where it is safe to hold a specific lock. Otherwise the
  individual changes are relatively small and straigtht forward.

  Summary:

   - Propagate unlinked inode list corruption back up to log recovery
     (regression fix)

   - improve corruption detection for AGFL entries, AGFL indexes and
     XEFI extents (syzkaller fuzzer oops report)

   - Avoid double perag reference release (regression fix)

   - Improve extent merging detection in scrub (regression fix)

   - Fix a new undefined high bit shift (regression fix)

   - Fix for AGF vs inode cluster buffer deadlock (regression fix)"

* tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: collect errors from inodegc for unlinked inode recovery
  xfs: validate block number being freed before adding to xefi
  xfs: validity check agbnos on the AGFL
  xfs: fix agf/agfl verification on v4 filesystems
  xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()
  xfs: fix broken logic when detecting mergeable bmap records
  xfs: Fix undefined behavior of shift into sign bit
  xfs: fix AGF vs inode cluster buffer deadlock
  xfs: defered work could create precommits
  xfs: restore allocation trylock iteration
  xfs: buffer pins need to hold a buffer reference

2 years agoMerge branch 'crypto-splice-net-make-af_alg-handle-sendmsg-msg_splice_pages'
Paolo Abeni [Thu, 8 Jun 2023 11:42:54 +0000 (13:42 +0200)]
Merge branch 'crypto-splice-net-make-af_alg-handle-sendmsg-msg_splice_pages'

David Howells says:

====================
crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)

Here are patches to make AF_ALG handle the MSG_SPLICE_PAGES internal
sendmsg flag.  MSG_SPLICE_PAGES is an internal hint that tells the protocol
that it should splice the pages supplied if it can.  The sendpage functions
are then turned into wrappers around that.

This set consists of the following parts:

 (1) Move netfs_extract_iter_to_sg() to somewhere more general and rename
     it to drop the "netfs" prefix.  We use this to extract directly from
     an iterator into a scatterlist.

 (2) Make AF_ALG use iov_iter_extract_pages().  This has the additional
     effect of pinning pages obtained from userspace rather than taking
     refs on them.  Pages from kernel-backed iterators would not be pinned,
     but AF_ALG isn't really meant for use by kernel services.

 (3) Change AF_ALG still further to use extract_iter_to_sg().

 (4) Make af_alg_sendmsg() support MSG_SPLICE_PAGES support and make
     af_alg_sendpage() just a wrapper around sendmsg().  This has to take
     refs on the pages pinned for the moment.

 (5) Make hash_sendmsg() support MSG_SPLICE_PAGES by simply ignoring it.
     hash_sendpage() is left untouched to be removed later, after the
     splice core has been changed to call sendmsg().
====================

Link: https://lore.kernel.org/r/20230606130856.1970660-1-dhowells@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocrypto: af_alg/hash: Support MSG_SPLICE_PAGES
David Howells [Tue, 6 Jun 2023 13:08:56 +0000 (14:08 +0100)]
crypto: af_alg/hash: Support MSG_SPLICE_PAGES

Make AF_ALG sendmsg() support MSG_SPLICE_PAGES in the hashing code.  This
causes pages to be spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocrypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES
David Howells [Tue, 6 Jun 2023 13:08:55 +0000 (14:08 +0100)]
crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES

Convert af_alg_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocrypto: af_alg: Support MSG_SPLICE_PAGES
David Howells [Tue, 6 Jun 2023 13:08:54 +0000 (14:08 +0100)]
crypto: af_alg: Support MSG_SPLICE_PAGES

Make AF_ALG sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocrypto: af_alg: Indent the loop in af_alg_sendmsg()
David Howells [Tue, 6 Jun 2023 13:08:53 +0000 (14:08 +0100)]
crypto: af_alg: Indent the loop in af_alg_sendmsg()

Put the loop in af_alg_sendmsg() into an if-statement to indent it to make
the next patch easier to review as that will add another branch to handle
MSG_SPLICE_PAGES to the if-statement.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocrypto: af_alg: Use extract_iter_to_sg() to create scatterlists
David Howells [Tue, 6 Jun 2023 13:08:52 +0000 (14:08 +0100)]
crypto: af_alg: Use extract_iter_to_sg() to create scatterlists

Use extract_iter_to_sg() to decant the destination iterator into a
scatterlist in af_alg_get_rsgl().  af_alg_make_sg() can then be removed.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocrypto: af_alg: Pin pages rather than ref'ing if appropriate
David Howells [Tue, 6 Jun 2023 13:08:51 +0000 (14:08 +0100)]
crypto: af_alg: Pin pages rather than ref'ing if appropriate

Convert AF_ALG to use iov_iter_extract_pages() instead of
iov_iter_get_pages().  This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMove netfs_extract_iter_to_sg() to lib/scatterlist.c
David Howells [Tue, 6 Jun 2023 13:08:50 +0000 (14:08 +0100)]
Move netfs_extract_iter_to_sg() to lib/scatterlist.c

Move netfs_extract_iter_to_sg() to lib/scatterlist.c as it's going to be
used by more than just network filesystems (AF_ALG, for example).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoWrap lines at 80
David Howells [Tue, 6 Jun 2023 13:08:49 +0000 (14:08 +0100)]
Wrap lines at 80

Wrap a line at 80 to stop checkpatch complaining.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Simon Horman <simon.horman@corigine.com>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoFix a couple of spelling mistakes
David Howells [Tue, 6 Jun 2023 13:08:48 +0000 (14:08 +0100)]
Fix a couple of spelling mistakes

Fix a couple of spelling mistakes in a comment.

Suggested-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZHH2mSRqeL4Gs1ft@corigine.com/
Link: https://lore.kernel.org/r/ZHH1nqZWOGzxlidT@corigine.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoDrop the netfs_ prefix from netfs_extract_iter_to_sg()
David Howells [Tue, 6 Jun 2023 13:08:47 +0000 (14:08 +0100)]
Drop the netfs_ prefix from netfs_extract_iter_to_sg()

Rename netfs_extract_iter_to_sg() and its auxiliary functions to drop the
netfs_ prefix.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'txgbe-phylink-support'
Paolo Abeni [Thu, 8 Jun 2023 11:25:14 +0000 (13:25 +0200)]
Merge branch 'txgbe-phylink-support'

Jiawen Wu says:

====================
TXGBE PHYLINK support

Implement I2C, SFP, GPIO and PHYLINK to setup TXGBE link.

Because our I2C and PCS are based on Synopsys Designware IP-core, extend
the i2c-designware and pcs-xpcs driver to realize our functions.
====================

Link: https://lore.kernel.org/r/20230606092107.764621-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: txgbe: Support phylink MAC layer
Jiawen Wu [Tue, 6 Jun 2023 09:21:07 +0000 (17:21 +0800)]
net: txgbe: Support phylink MAC layer

Add phylink support to Wangxun 10Gb Ethernet controller for the 10GBASE-R
interface.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: txgbe: Implement phylink pcs
Jiawen Wu [Tue, 6 Jun 2023 09:21:06 +0000 (17:21 +0800)]
net: txgbe: Implement phylink pcs

Register MDIO bus for PCS layer to use Synopsys designware XPCS, support
10GBASE-R interface to the controller.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: pcs: Add 10GBASE-R mode for Synopsys Designware XPCS
Jiawen Wu [Tue, 6 Jun 2023 09:21:05 +0000 (17:21 +0800)]
net: pcs: Add 10GBASE-R mode for Synopsys Designware XPCS

Add basic support for XPCS using 10GBASE-R interface. This mode will
be extended to use interrupt, so set pcs.poll false. And avoid soft
reset so that the device using this mode is in the default configuration.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: txgbe: Support GPIO to SFP socket
Jiawen Wu [Tue, 6 Jun 2023 09:21:04 +0000 (17:21 +0800)]
net: txgbe: Support GPIO to SFP socket

Register GPIO chip and handle GPIO IRQ for SFP socket.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: txgbe: Add SFP module identify
Jiawen Wu [Tue, 6 Jun 2023 09:21:03 +0000 (17:21 +0800)]
net: txgbe: Add SFP module identify

Register SFP platform device to get modules information.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: txgbe: Register I2C platform device
Jiawen Wu [Tue, 6 Jun 2023 09:21:02 +0000 (17:21 +0800)]
net: txgbe: Register I2C platform device

Register the platform device to use Designware I2C bus master driver.
Use regmap to read/write I2C device region from given base offset.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: txgbe: Register fixed rate clock
Jiawen Wu [Tue, 6 Jun 2023 09:21:01 +0000 (17:21 +0800)]
net: txgbe: Register fixed rate clock

In order for I2C to be able to work in standard mode, register a fixed
rate clock for each I2C device.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: txgbe: Add software nodes to support phylink
Jiawen Wu [Tue, 6 Jun 2023 09:21:00 +0000 (17:21 +0800)]
net: txgbe: Add software nodes to support phylink

Register software nodes for GPIO, I2C, SFP and PHYLINK. Define the
device properties.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'bnxt_en-bug-fixes'
Paolo Abeni [Thu, 8 Jun 2023 08:52:48 +0000 (10:52 +0200)]
Merge branch 'bnxt_en-bug-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes

This patchset has the following fixes for bnxt_en:

1. Add missing VNIC ID parameter in the FW message when getting an
updated RSS configuration from the FW.

2. Fix a warning when doing ethtool reset on newer chips.

3. Fix VLAN issue on a VF when a default VLAN is assigned.

4. Fix a problem during DPC (Downstream Port containment) scenario.

5. Fix a NULL pointer dereference when receiving a PTP event from FW.

6. Fix VXLAN/Geneve UDP port delete/add with newer FW.
====================

Link: https://lore.kernel.org/r/20230607075409.228450-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
Somnath Kotur [Wed, 7 Jun 2023 07:54:09 +0000 (00:54 -0700)]
bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks

As per the new udp tunnel framework, drivers which need to know the
details of a port entry (i.e. port type) when it gets deleted should
use the .set_port / .unset_port callbacks.

Implementing the current .udp_tunnel_sync callback would mean that the
deleted tunnel port entry would be all zeros.  This used to work on
older firmware because it would not check the input when deleting a
tunnel port.  With newer firmware, the delete will now fail and
subsequent tunnel port allocation will fail as a result.

Fixes: 442a35a5a7aa ("bnxt: convert to new udp_tunnel_nic infra")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
Pavan Chebbi [Wed, 7 Jun 2023 07:54:08 +0000 (00:54 -0700)]
bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event

The firmware can send PHC_RTC_UPDATE async event on a PF that may not
have PTP registered. In such a case, there will be a null pointer
deference for bp->ptp_cfg when we try to handle the event.

Fix it by not registering for this event with the firmware if !bp->ptp_cfg.
Also, check that bp->ptp_cfg is valid before proceeding when we receive
the event.

Fixes: 8bcf6f04d4a5 ("bnxt_en: Handle async event when the PHC is updated in RTC mode")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: Skip firmware fatal error recovery if chip is not accessible
Vikas Gupta [Wed, 7 Jun 2023 07:54:07 +0000 (00:54 -0700)]
bnxt_en: Skip firmware fatal error recovery if chip is not accessible

Driver starts firmware fatal error recovery by detecting
heartbeat failure or fw reset count register changing.  But
these checks are not reliable if the device is not accessible.
This can happen while DPC (Downstream Port containment) is in
progress.  Skip firmware fatal recovery if pci_device_is_present()
returns false.

Fixes: acfb50e4e773 ("bnxt_en: Add FW fatal devlink_health_reporter.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: Query default VLAN before VNIC setup on a VF
Somnath Kotur [Wed, 7 Jun 2023 07:54:06 +0000 (00:54 -0700)]
bnxt_en: Query default VLAN before VNIC setup on a VF

We need to call bnxt_hwrm_func_qcfg() on a VF to query the default
VLAN that may be setup by the PF.  If a default VLAN is enabled,
the VF cannot support VLAN acceleration on the receive side and
the VNIC must be setup to strip out the default VLAN tag.  If a
default VLAN is not enabled, the VF can support VLAN acceleration
on the receive side.  The VNIC should be set up to strip or not
strip the VLAN based on the RX VLAN acceleration setting.

Without this call to determine the default VLAN before calling
bnxt_setup_vnic(), the VNIC may not be set up correctly.  For
example, bnxt_setup_vnic() may set up to strip the VLAN tag based
on stale default VLAN information.  If RX VLAN acceleration is
not enabled, the VLAN tag will be incorrectly stripped and the
RX data path will not work correctly.

Fixes: cf6645f8ebc6 ("bnxt_en: Add function for VF driver to query default VLAN.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: Don't issue AP reset during ethtool's reset operation
Sreekanth Reddy [Wed, 7 Jun 2023 07:54:05 +0000 (00:54 -0700)]
bnxt_en: Don't issue AP reset during ethtool's reset operation

Only older NIC controller's firmware uses the PROC AP reset type.
Firmware on 5731X/5741X and newer chips does not support this reset
type.  When bnxt_reset() issues a series of resets, this PROC AP
reset may actually fail on these newer chips because the firmware
is not ready to accept this unsupported command yet.  Avoid this
unnecessary error by skipping this reset type on chips that don't
support it.

Fixes: 7a13240e3718 ("bnxt_en: fix ethtool_reset_flags ABI violations")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()
Pavan Chebbi [Wed, 7 Jun 2023 07:54:04 +0000 (00:54 -0700)]
bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()

We must specify the vnic id of the vnic in the input structure of this
firmware message.  Otherwise we will get an error from the firmware.

Fixes: 98a4322b70e8 ("bnxt_en: update RSS config using difference algorithm")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge
Jakub Kicinski [Thu, 8 Jun 2023 04:56:01 +0000 (21:56 -0700)]
Merge tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:

 - fix a broken sync while rescheduling delayed work,
   by Vladislav Efanov

* tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge:
  batman-adv: Broken sync while rescheduling delayed work
====================

Link: https://lore.kernel.org/r/20230607155515.548120-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: bcmgenet: Fix EEE implementation
Florian Fainelli [Tue, 6 Jun 2023 21:43:47 +0000 (14:43 -0700)]
net: bcmgenet: Fix EEE implementation

We had a number of short comings:

- EEE must be re-evaluated whenever the state machine detects a link
  change as wight be switching from a link partner with EEE
  enabled/disabled

- tx_lpi_enabled controls whether EEE should be enabled/disabled for the
  transmit path, which applies to the TBUF block

- We do not need to forcibly enable EEE upon system resume, as the PHY
  state machine will trigger a link event that will do that, too

Fixes: 6ef398ea60d9 ("net: bcmgenet: add EEE support")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230606214348.2408018-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoeth: ixgbe: fix the wake condition
Jakub Kicinski [Wed, 7 Jun 2023 01:08:26 +0000 (18:08 -0700)]
eth: ixgbe: fix the wake condition

Flip the netif_carrier_ok() condition in queue wake logic.
When I moved it to inside __netif_txq_completed_wake()
I missed negating it.

This made the condition ineffective and could probably
lead to crashes.

Fixes: 301f227fc860 ("net: piggy back on the memory barrier in bql when waking queues")
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230607010826.960226-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoeth: bnxt: fix the wake condition
Jakub Kicinski [Wed, 7 Jun 2023 01:08:25 +0000 (18:08 -0700)]
eth: bnxt: fix the wake condition

The down condition should be the negation of the wake condition,
IOW when I moved it from:

 if (cond && wake())
to
 if (__netif_txq_completed_wake(cond))

Cond should have been negated. Flip it now.

This bug leads to occasional crashes with netconsole.
It may also lead to queue never waking up in case BQL is not enabled.

Reported-by: David Wei <davidhwei@meta.com>
Fixes: 08a096780d92 ("bnxt: use new queue try_stop/try_wake macros")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230607010826.960226-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 8 Jun 2023 04:47:11 +0000 (21:47 -0700)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-06-07

We've added 7 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 112 insertions(+), 7 deletions(-).

The main changes are:

1) Fix a use-after-free in BPF's task local storage, from KP Singh.

2) Make struct path handling more robust in bpf_d_path, from Jiri Olsa.

3) Fix a syzbot NULL-pointer dereference in sockmap, from Eric Dumazet.

4) UAPI fix for BPF_NETFILTER before final kernel ships,
   from Florian Westphal.

5) Fix map-in-map array_map_gen_lookup code generation where elem_size was
   not being set for inner maps, from Rhys Rustad-Elliott.

6) Fix sockopt_sk selftest's NETLINK_LIST_MEMBERSHIPS assertion,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add extra path pointer check to d_path helper
  selftests/bpf: Fix sockopt_sk selftest
  bpf: netfilter: Add BPF_NETFILTER bpf_attach_type
  selftests/bpf: Add access_inner_map selftest
  bpf: Fix elem_size not being set for inner maps
  bpf: Fix UAF in task local storage
  bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready()
====================

Link: https://lore.kernel.org/r/20230607220514.29698-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x
Michal Smulski [Mon, 5 Jun 2023 17:44:42 +0000 (10:44 -0700)]
net: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x

Enable USXGMII mode for mv88e6393x chips. Tested on Marvell 88E6191X.

Signed-off-by: Michal Smulski <michal.smulski@ooma.com>
Link: https://lore.kernel.org/r/20230605174442.12493-1-msmulski2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agolib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
Ben Hutchings [Fri, 2 Jun 2023 18:28:15 +0000 (20:28 +0200)]
lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()

irq_cpu_rmap_release() calls cpu_rmap_put(), which may free the rmap.
So we need to clear the pointer to our glue structure in rmap before
doing that, not after.

Fixes: 4e0473f1060a ("lib: cpu_rmap: Avoid use after free on rmap->obj array entries")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZHo0vwquhOy3FaXc@decadent.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'input-for-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor...
Linus Torvalds [Wed, 7 Jun 2023 20:49:42 +0000 (13:49 -0700)]
Merge tag 'input-for-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - a fix for unbalanced open count for inhibited input devices

 - fixups in Elantech PS/2 and Cyppress TTSP v5 drivers

 - a quirk to soc_button_array driver to make it work with Lenovo
   Yoga Book X90F / X90L

 - a removal of erroneous entry from xpad driver

* tag 'input-for-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xpad - delete a Razer DeathAdder mouse VID/PID entry
  Input: psmouse - fix OOB access in Elantech protocol
  Input: soc_button_array - add invalid acpi_index DMI quirk handling
  Input: fix open count when closing inhibited device
  Input: cyttsp5 - fix array length

2 years agoMerge branch 'followup-fixes-for-the-dwmac-and-altera-lynx-conversion'
Jakub Kicinski [Wed, 7 Jun 2023 20:30:30 +0000 (13:30 -0700)]
Merge branch 'followup-fixes-for-the-dwmac-and-altera-lynx-conversion'

Maxime Chevallier says:

====================
Followup fixes for the dwmac and altera lynx conversion

Here's yet another version of the cleanup series for the TSE PCS replacement
by PCS Lynx. It includes Kconfig fixups, some missing initialisations
and a slight rework suggested by Russell for the dwmac cleanup sequence,
along with more explicit zeroing of local structures as per MAciej's
review.
====================

Link: https://lore.kernel.org/r/20230607135941.407054-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dwmac_socfpga: initialize local data for mdio regmap configuration
Maxime Chevallier [Wed, 7 Jun 2023 13:59:41 +0000 (15:59 +0200)]
net: dwmac_socfpga: initialize local data for mdio regmap configuration

Explicitly zero-ize the local mdio_regmap_config data, and explicitly
set the .autoscan parameter, as we only have a PCS on this bus.

Fixes: 5d1f3fe7d2d5 ("net: stmmac: dwmac-sogfpga: use the lynx pcs driver")
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: altera_tse: explicitly disable autoscan on the regmap-mdio bus
Maxime Chevallier [Wed, 7 Jun 2023 13:59:40 +0000 (15:59 +0200)]
net: altera_tse: explicitly disable autoscan on the regmap-mdio bus

Set the .autoscan flag to false on the regmap-mdio bus, to avoid using a
random uninitialized value. We don't want autoscan in this case as the
mdio device is a PCS and not a PHY.

Fixes: db48abbaa18e ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx")
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: stmmac: make the pcs_lynx cleanup sequence specific to dwmac_socfpga
Maxime Chevallier [Wed, 7 Jun 2023 13:59:39 +0000 (15:59 +0200)]
net: stmmac: make the pcs_lynx cleanup sequence specific to dwmac_socfpga

So far, only the dwmac_socfpga variant of stmmac uses PCS Lynx. Use a
dedicated cleanup sequence for dwmac_socfpga instead of using the
generic stmmac one.

Fixes: 5d1f3fe7d2d5 ("net: stmmac: dwmac-sogfpga: use the lynx pcs driver")
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: altera_tse: Use the correct Kconfig option for the PCS_LYNX dependency
Maxime Chevallier [Wed, 7 Jun 2023 13:59:38 +0000 (15:59 +0200)]
net: altera_tse: Use the correct Kconfig option for the PCS_LYNX dependency

Use the correct Kconfig dependency for altera_tse as PCS_ALTERA_TSE was
replaced by PCS_LYNX.

Fixes: db48abbaa18e ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: altera-tse: Initialize local structs before using it
Maxime Chevallier [Wed, 7 Jun 2023 13:59:37 +0000 (15:59 +0200)]
net: altera-tse: Initialize local structs before using it

The regmap_config and mdio_regmap_config objects needs to be zeroed before
using them. This will cause spurious errors at probe time as config->pad_bits
is containing random uninitialized data.

Fixes: db48abbaa18e ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'tools-ynl-generate-code-for-the-handshake-family'
Jakub Kicinski [Wed, 7 Jun 2023 19:53:13 +0000 (12:53 -0700)]
Merge branch 'tools-ynl-generate-code-for-the-handshake-family'

Jakub Kicinski says:

====================
tools: ynl: generate code for the handshake family

Add necessary features and generate user space C code for serializing
/ deserializing messages of the handshake family.

In addition to basics already present in netdev and fou, handshake
has nested attrs and multi-attr u32.
====================

Link: https://lore.kernel.org/r/20230606194302.919343-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl: generate code for the handshake family
Jakub Kicinski [Tue, 6 Jun 2023 19:43:02 +0000 (12:43 -0700)]
tools: ynl: generate code for the handshake family

Generate support for the handshake family.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl-gen: improve unwind on parsing errors
Jakub Kicinski [Tue, 6 Jun 2023 19:43:01 +0000 (12:43 -0700)]
tools: ynl-gen: improve unwind on parsing errors

When parsing multi-attr we count the objects and then allocate
an array to hold the parsed objects. If an attr space has multiple
multi-attr objects, however, if parsing the first array fails
we'll leave the object count for the second even tho the second
array was never allocated.

This may cause crashes when freeing objects on error.

Count attributes to a variable on the stack and only set the count
in the object once the memory was allocated.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl-gen: fill in support for MultiAttr scalars
Jakub Kicinski [Tue, 6 Jun 2023 19:43:00 +0000 (12:43 -0700)]
tools: ynl-gen: fill in support for MultiAttr scalars

The handshake family needs support for MultiAttr scalars.
Right now we only support code gen for MultiAttr nested
types.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMAINTAINERS: Add entry for debug objects
Thomas Gleixner [Tue, 6 Jun 2023 23:14:05 +0000 (01:14 +0200)]
MAINTAINERS: Add entry for debug objects

This is overdue and an oversight.

Add myself to this file deespite the fact that I'm trying to reduce the
number of entries in this file which have my name attached, but in the
hope that patches wont get picked up elsewhere completely unreviewed and
unnoticed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoafs: Fix setting of mtime when creating a file/dir/symlink
David Howells [Wed, 7 Jun 2023 08:47:13 +0000 (09:47 +0100)]
afs: Fix setting of mtime when creating a file/dir/symlink

kafs incorrectly passes a zero mtime (ie. 1st Jan 1970) to the server when
creating a file, dir or symlink because the mtime recorded in the
afs_operation struct gets passed to the server by the marshalling routines,
but the afs_mkdir(), afs_create() and afs_symlink() functions don't set it.

This gets masked if a file or directory is subsequently modified.

Fix this by filling in op->mtime before calling the create op.

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agobpf: Add extra path pointer check to d_path helper
Jiri Olsa [Tue, 6 Jun 2023 18:17:14 +0000 (11:17 -0700)]
bpf: Add extra path pointer check to d_path helper

Anastasios reported crash on stable 5.15 kernel with following
BPF attached to lsm hook:

  SEC("lsm.s/bprm_creds_for_exec")
  int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm)
  {
          struct path *path = &bprm->executable->f_path;
          char p[128] = { 0 };

          bpf_d_path(path, p, 128);
          return 0;
  }

But bprm->executable can be NULL, so bpf_d_path call will crash:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
  ...
  RIP: 0010:d_path+0x22/0x280
  ...
  Call Trace:
   <TASK>
   bpf_d_path+0x21/0x60
   bpf_prog_db9cf176e84498d9_bprm_creds_for_exec+0x94/0x99
   bpf_trampoline_6442506293_0+0x55/0x1000
   bpf_lsm_bprm_creds_for_exec+0x5/0x10
   security_bprm_creds_for_exec+0x29/0x40
   bprm_execve+0x1c1/0x900
   do_execveat_common.isra.0+0x1af/0x260
   __x64_sys_execve+0x32/0x40

It's problem for all stable trees with bpf_d_path helper, which was
added in 5.9.

This issue is fixed in current bpf code, where we identify and mark
trusted pointers, so the above code would fail even to load.

For the sake of the stable trees and to workaround potentially broken
verifier in the future, adding the code that reads the path object from
the passed pointer and verifies it's valid in kernel space.

Fixes: 6e22ab9da793 ("bpf: Add d_path helper")
Reported-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230606181714.532998-1-jolsa@kernel.org
2 years agonet: sched: fix possible refcount leak in tc_chain_tmplt_add()
Hangyu Hua [Wed, 7 Jun 2023 02:23:01 +0000 (10:23 +0800)]
net: sched: fix possible refcount leak in tc_chain_tmplt_add()

try_module_get will be called in tcf_proto_lookup_ops. So module_put needs
to be called to drop the refcount if ops don't implement the required
function.

Fixes: 9f407f1768d3 ("net: sched: introduce chain templates")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: txgbe: Avoid passing uninitialised parameter to pci_wake_from_d3()
Simon Horman [Tue, 6 Jun 2023 13:49:45 +0000 (15:49 +0200)]
net: txgbe: Avoid passing uninitialised parameter to pci_wake_from_d3()

txgbe_shutdown() relies on txgbe_dev_shutdown() to initialise
wake by passing it by reference. However, txgbe_dev_shutdown()
doesn't use this parameter at all.

wake is then passed uninitialised by txgbe_dev_shutdown()
to pci_wake_from_d3().

Resolve this problem by:
* Removing the unused parameter from txgbe_dev_shutdown()
* Removing the uninitialised variable wake from txgbe_dev_shutdown()
* Passing false to pci_wake_from_d3() - this assumes that
  although uninitialised wake was in practice false (0).

I'm not sure that this counts as a bug, as I'm not sure that
it manifests in any unwanted behaviour. But in any case, the issue
was introduced by:

  3ce7547e5b71 ("net: txgbe: Add build support for txgbe")

Flagged by Smatch as:

  .../txgbe_main.c:486 txgbe_shutdown() error: uninitialized symbol 'wake'.

No functional change intended.
Compile tested only.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: act_police: fix sparse errors in tcf_police_dump()
Eric Dumazet [Tue, 6 Jun 2023 13:13:04 +0000 (13:13 +0000)]
net: sched: act_police: fix sparse errors in tcf_police_dump()

Fixes following sparse errors:

net/sched/act_police.c:360:28: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:368:28: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression

Fixes: d1967e495a8d ("net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: qca8k: remove unnecessary (void*) conversions
Atin Bainada [Tue, 6 Jun 2023 12:23:08 +0000 (12:23 +0000)]
net: dsa: qca8k: remove unnecessary (void*) conversions

Pointer variables of (void*) type do not require type cast.

Signed-off-by: Atin Bainada <hi@atinb.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: openvswitch: fix upcall counter access before allocation
Eelco Chaudron [Tue, 6 Jun 2023 11:56:35 +0000 (13:56 +0200)]
net: openvswitch: fix upcall counter access before allocation

Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated resulting in a kernel
Oops.

Here is an example:

  PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
   #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
   #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
   #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
   #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
   #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
   #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
   #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
   #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
   ...

  PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
   #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
   #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
   #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
   #3 [ffff80000a5d3120] die at ffffb70f0628234c
   #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
   #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
   #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
   #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
   #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
   #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
  #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
  #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
  #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
  #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
  #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
  #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
  #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
  #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90

We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.

Fixes: 95637d91fefd ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: liquidio: fix mixed module-builtin object
Masahiro Yamada [Tue, 6 Jun 2023 17:18:49 +0000 (02:18 +0900)]
net: liquidio: fix mixed module-builtin object

With CONFIG_LIQUIDIO=m and CONFIG_LIQUIDIO_VF=y (or vice versa),
$(common-objs) are linked to a module and also to vmlinux even though
the expected CFLAGS are different between builtins and modules.

This is the same situation as fixed by commit 637a642f5ca5 ("zstd:
Fixing mixed module-builtin objects").

Introduce the new module, liquidio-core, to provide the common functions
to liquidio and liquidio-vf.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: move rtm_tca_policy declaration to include file
Eric Dumazet [Tue, 6 Jun 2023 11:42:33 +0000 (11:42 +0000)]
net: sched: move rtm_tca_policy declaration to include file

rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c,
thus should be declared in an include file.

This fixes the following sparse warning:
net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static?

Fixes: e331473fee3d ("net/sched: cls_api: add missing validation of netlink attributes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: fix formatting in sysctl_net_ipv4.c
David Morley [Tue, 6 Jun 2023 18:12:33 +0000 (18:12 +0000)]
tcp: fix formatting in sysctl_net_ipv4.c

Fix incorrectly formatted tcp_syn_linear_timeouts sysctl in the
ipv4_net_table.

Fixes: ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear")
Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoice: make writes to /dev/gnssX synchronous
Michal Schmidt [Tue, 6 Jun 2023 17:12:53 +0000 (10:12 -0700)]
ice: make writes to /dev/gnssX synchronous

The current ice driver's GNSS write implementation buffers writes and
works through them asynchronously in a kthread. That's bad because:
 - The GNSS write_raw operation is supposed to be synchronous[1][2].
 - There is no upper bound on the number of pending writes.
   Userspace can submit writes much faster than the driver can process,
   consuming unlimited amounts of kernel memory.

A patch that's currently on review[3] ("[v3,net] ice: Write all GNSS
buffers instead of first one") would add one more problem:
 - The possibility of waiting for a very long time to flush the write
   work when doing rmmod, softlockups.

To fix these issues, simplify the implementation: Drop the buffering,
the write_work, and make the writes synchronous.

I tested this with gpsd and ubxtool.

[1] https://events19.linuxfoundation.org/wp-content/uploads/2017/12/The-GNSS-Subsystem-Johan-Hovold-Hovold-Consulting-AB.pdf
    "User interface" slide.
[2] A comment in drivers/gnss/core.c:gnss_write():
        /* Ignoring O_NONBLOCK, write_raw() is synchronous. */
[3] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20230217120541.16745-1-karol.kolacinski@intel.com/

Fixes: d6b98c8d242a ("ice: add write functionality for GNSS TTY")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: add rcu annotations around qdisc->qdisc_sleeping
Eric Dumazet [Tue, 6 Jun 2023 11:19:29 +0000 (11:19 +0000)]
net: sched: add rcu annotations around qdisc->qdisc_sleeping

syzbot reported a race around qdisc->qdisc_sleeping [1]

It is time we add proper annotations to reads and writes to/from
qdisc->qdisc_sleeping.

[1]
BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu

read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1:
qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331
__tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174
tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547
rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0:
dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115
qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103
tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693
rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023

Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: ocelot: unlock on error in vsc9959_qos_port_tas_set()
Dan Carpenter [Tue, 6 Jun 2023 08:24:37 +0000 (11:24 +0300)]
net: dsa: ocelot: unlock on error in vsc9959_qos_port_tas_set()

This error path needs call mutex_unlock(&ocelot->tas_lock) before
returning.

Fixes: 2d800bc500fb ("net/sched: taprio: replace tc_taprio_qopt_offload :: enable with a "cmd" enum")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'rfs-lockless-annotate'
David S. Miller [Wed, 7 Jun 2023 09:09:05 +0000 (10:09 +0100)]
Merge branch 'rfs-lockless-annotate'

Eric Dumazet says:

====================
rfs: annotate lockless accesses

rfs runs without locks held, so we should annotate
read and writes to shared variables.

It should prevent compilers forcing writes
in the following situation:

  if (var != val)
     var = val;

A compiler could indeed simply avoid the conditional:

    var = val;

This matters if var is shared between many cpus.

v2: aligns one closing bracket (Simon)
    adds Fixes: tags (Jakub)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agorfs: annotate lockless accesses to RFS sock flow table
Eric Dumazet [Tue, 6 Jun 2023 07:41:15 +0000 (07:41 +0000)]
rfs: annotate lockless accesses to RFS sock flow table

Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table.

This also prevents a (smart ?) compiler to remove the condition in:

if (table->ents[index] != newval)
        table->ents[index] = newval;

We need the condition to avoid dirtying a shared cache line.

Fixes: fec5e652e58f ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agorfs: annotate lockless accesses to sk->sk_rxhash
Eric Dumazet [Tue, 6 Jun 2023 07:41:14 +0000 (07:41 +0000)]
rfs: annotate lockless accesses to sk->sk_rxhash

Add READ_ONCE()/WRITE_ONCE() on accesses to sk->sk_rxhash.

This also prevents a (smart ?) compiler to remove the condition in:

if (sk->sk_rxhash != newval)
sk->sk_rxhash = newval;

We need the condition to avoid dirtying a shared cache line.

Fixes: fec5e652e58f ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'realtek-external-phy-clock'
David S. Miller [Wed, 7 Jun 2023 08:52:25 +0000 (09:52 +0100)]
Merge branch 'realtek-external-phy-clock'

Detlev Casanova says:

====================
net: phy: realtek: Support external PHY clock

Some PHYs can use an external clock that must be enabled before
communicating with them.

Changes since v3:
 * Do not call genphy_suspend if WoL is enabled.
Changes since v2:
 * Reword documentation commit message
Changes since v1:
 * Remove the clock name as it is not guaranteed to be identical across
   different PHYs
 * Disable/Enable the clock when suspending/resuming
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: realtek: Disable clock on suspend
Detlev Casanova [Mon, 5 Jun 2023 15:40:10 +0000 (11:40 -0400)]
net: phy: realtek: Disable clock on suspend

For PHYs that call rtl821x_probe() where an external clock can be
configured, make sure that the clock is disabled
when ->suspend() is called and enabled on resume.

The PHY_ALWAYS_CALL_SUSPEND is added to ensure that the suspend function
is actually always called.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: phy: Document support for external PHY clk
Detlev Casanova [Mon, 5 Jun 2023 15:40:09 +0000 (11:40 -0400)]
dt-bindings: net: phy: Document support for external PHY clk

Ethern PHYs can have external an clock that needs to be activated before
communicating with the PHY.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: realtek: Add optional external PHY clock
Detlev Casanova [Mon, 5 Jun 2023 15:40:08 +0000 (11:40 -0400)]
net: phy: realtek: Add optional external PHY clock

In some cases, the PHY can use an external clock source instead of a
crystal.

Add an optional clock in the phy node to make sure that the clock source
is enabled, if specified, before probing.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agohv_netvsc: Allocate rx indirection table size dynamically
Shradha Gupta [Mon, 5 Jun 2023 11:30:06 +0000 (04:30 -0700)]
hv_netvsc: Allocate rx indirection table size dynamically

Allocate the size of rx indirection table dynamically in netvsc
from the value of size provided by OID_GEN_RECEIVE_SCALE_CAPABILITIES
query instead of using a constant value of ITAB_NUM.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Tested-on: Ubuntu22 (azure VM, SKU size: Standard_F72s_v2)
Testcases:
1. ethtool -x eth0 output
2. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic
3. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotipc: replace open-code bearer rcu_dereference access in bearer.c
Xin Long [Mon, 5 Jun 2023 14:40:44 +0000 (10:40 -0400)]
tipc: replace open-code bearer rcu_dereference access in bearer.c

Replace these open-code bearer rcu_dereference access with bearer_get(),
like other places in bearer.c. While at it, also use tipc_net() instead
of net_generic(net, tipc_net_id) to get "tn" in bearer.c.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/1072588a8691f970bda950c7e2834d1f2983f58e.1685976044.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Wed, 7 Jun 2023 04:36:57 +0000 (21:36 -0700)]
Merge tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fixes to debugfs registration
 - Fix use-after-free in hci_remove_ltk/hci_remove_irk
 - Fixes to ISO channel support
 - Fix missing checks for invalid L2CAP DCID
 - Fix l2cap_disconnect_req deadlock
 - Add lock to protect HCI_UNREGISTER

* tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Add missing checks for invalid DCID
  Bluetooth: ISO: use correct CIS order in Set CIG Parameters event
  Bluetooth: ISO: don't try to remove CIG if there are bound CIS left
  Bluetooth: Fix l2cap_disconnect_req deadlock
  Bluetooth: hci_qca: fix debugfs registration
  Bluetooth: fix debugfs registration
  Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER
  Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
  Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIG
  Bluetooth: ISO: consider right CIS when removing CIG at cleanup
====================

Link: https://lore.kernel.org/r/20230606003454.2392552-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Wed, 7 Jun 2023 04:23:49 +0000 (21:23 -0700)]
Merge tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Missing nul-check in basechain hook netlink dump path, from Gavrilov Ilia.

2) Fix bitwise register tracking, from Jeremy Sowden.

3) Null pointer dereference when accessing conntrack helper,
   from Tijs Van Buggenhout.

4) Add schedule point to ipset's call_ad, from Kuniyuki Iwashima.

5) Incorrect boundary check when building chain blob.

* tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: out-of-bound check in chain blob
  netfilter: ipset: Add schedule point in call_ad().
  netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
  netfilter: nft_bitwise: fix register tracking
  netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()
====================

Link: https://lore.kernel.org/r/20230606225851.67394-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Wed, 7 Jun 2023 04:16:52 +0000 (21:16 -0700)]
Merge tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.4

Both rtw88 and rtw89 have a 802.11 powersave fix for a regression
introduced in v6.0. mt76 fixes a race and a null pointer dereference.
iwlwifi fixes an issue where not enough memory was allocated for a
firmware event. And finally the stack has several smaller fixes all
over.

* tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: cfg80211: fix locking in regulatory disconnect
  wifi: cfg80211: fix locking in sched scan stop work
  wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
  wifi: mac80211: fix switch count in EMA beacons
  wifi: mac80211: don't translate beacon/presp addrs
  wifi: mac80211: mlme: fix non-inheritence element
  wifi: cfg80211: reject bad AP MLD address
  wifi: mac80211: use correct iftype HE cap
  wifi: mt76: mt7996: fix possible NULL pointer dereference in mt7996_mac_write_txwi()
  wifi: rtw89: remove redundant check of entering LPS
  wifi: rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
  wifi: rtw88: correct PS calculation for SUPPORTS_DYNAMIC_PS
  wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll
====================

Link: https://lore.kernel.org/r/20230606150817.EC133C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'ipv4-remove-rt_conn_flags-calls-in-flowi4_init_output'
Jakub Kicinski [Wed, 7 Jun 2023 04:13:04 +0000 (21:13 -0700)]
Merge branch 'ipv4-remove-rt_conn_flags-calls-in-flowi4_init_output'

Guillaume Nault says:

====================
ipv4: Remove RT_CONN_FLAGS() calls in flowi4_init_output().

Remove a few RT_CONN_FLAGS() calls used inside flowi4_init_output().
These users can be easily converted to set the scope properly, instead
of overloading the tos parameter with scope information as done by
RT_CONN_FLAGS().

The objective is to eventually remove RT_CONN_FLAGS() entirely, which
will then allow to also remove RTO_ONLINK and to finally convert
->flowi4_tos to dscp_t.
====================

Link: https://lore.kernel.org/r/cover.1685999117.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Set route scope properly in cookie_v4_check().
Guillaume Nault [Mon, 5 Jun 2023 21:55:30 +0000 (23:55 +0200)]
tcp: Set route scope properly in cookie_v4_check().

RT_CONN_FLAGS(sk) overloads flowi4_tos with the RTO_ONLINK bit when
sk has the SOCK_LOCALROUTE flag set. This allows
ip_route_output_key_hash() to eventually adjust flowi4_scope.

Instead of relying on special handling of the RTO_ONLINK bit, we can
just set the route scope correctly. This will eventually allow to avoid
special interpretation of tos variables and to convert ->flowi4_tos to
dscp_t.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv4: Set correct scope in inet_csk_route_*().
Guillaume Nault [Mon, 5 Jun 2023 21:55:25 +0000 (23:55 +0200)]
ipv4: Set correct scope in inet_csk_route_*().

RT_CONN_FLAGS(sk) overloads the tos parameter with the RTO_ONLINK bit
when sk has the SOCK_LOCALROUTE flag set. This is only useful for
ip_route_output_key_hash() to eventually adjust the route scope.

Let's drop RTO_ONLINK and set the correct scope directly to avoid this
special case in the future and to allow converting ->flowi4_tos to
dscp_t.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agovirtio_net: use control_buf for coalesce params
Brett Creeley [Mon, 5 Jun 2023 19:59:25 +0000 (12:59 -0700)]
virtio_net: use control_buf for coalesce params

Commit 699b045a8e43 ("net: virtio_net: notifications coalescing
support") added coalescing command support for virtio_net. However,
the coalesce commands are using buffers on the stack, which is causing
the device to see DMA errors. There should also be a complaint from
check_for_stack() in debug_dma_map_xyz(). Fix this by adding and using
coalesce params from the control_buf struct, which aligns with other
commands.

Cc: stable@vger.kernel.org
Fixes: 699b045a8e43 ("net: virtio_net: notifications coalescing support")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20230605195925.51625-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agopds_core: Fix FW recovery detection
Brett Creeley [Mon, 5 Jun 2023 19:51:16 +0000 (12:51 -0700)]
pds_core: Fix FW recovery detection

Commit 523847df1b37 ("pds_core: add devcmd device interfaces") included
initial support for FW recovery detection. Unfortunately, the ordering
in pdsc_is_fw_good() was incorrect, which was causing FW recovery to be
undetected by the driver. Fix this by making sure to update the cached
fw_status by calling pdsc_is_fw_running() before setting the local FW
gen.

Fixes: 523847df1b37 ("pds_core: add devcmd device interfaces")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605195116.49653-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'move-ksz9477-errata-handling-to-phy-driver'
Jakub Kicinski [Wed, 7 Jun 2023 04:08:39 +0000 (21:08 -0700)]
Merge branch 'move-ksz9477-errata-handling-to-phy-driver'

Robert Hancock says:

====================
Move KSZ9477 errata handling to PHY driver

Patches to move handling for KSZ9477 PHY errata register fixes from
the DSA switch driver into the corresponding PHY driver, for more
proper layering and ordering.
====================

Link: https://lore.kernel.org/r/20230605153943.1060444-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: microchip: remove KSZ9477 PHY errata handling
Robert Hancock [Mon, 5 Jun 2023 15:39:43 +0000 (09:39 -0600)]
net: dsa: microchip: remove KSZ9477 PHY errata handling

The KSZ9477 PHY errata handling code has now been moved into the Micrel
PHY driver, so it is no longer needed inside the DSA switch driver.
Remove it.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: phy: micrel: Move KSZ9477 errata fixes to PHY driver
Robert Hancock [Mon, 5 Jun 2023 15:39:42 +0000 (09:39 -0600)]
net: phy: micrel: Move KSZ9477 errata fixes to PHY driver

The ksz9477 DSA switch driver is currently updating some MMD registers
on the internal port PHYs to address some chip errata. However, these
errata are really a property of the PHY itself, not the switch they are
part of, so this is kind of a layering violation. It makes more sense for
these writes to be done inside the driver which binds to the PHY and not
the driver for the containing device.

This also addresses some issues where the ordering of when these writes
are done may have been incorrect, causing the link to erratically fail to
come up at the proper speed or at all. Doing this in the PHY driver
during config_init ensures that they happen before anything else tries to
change the state of the PHY on the port.

The new code also ensures that autonegotiation is disabled during the
register writes and re-enabled afterwards, as indicated by the latest
version of the errata documentation from Microchip.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: gso: really support BIG TCP
Eric Dumazet [Mon, 5 Jun 2023 16:16:47 +0000 (16:16 +0000)]
tcp: gso: really support BIG TCP

We missed that tcp_gso_segment() was assuming skb->len was smaller than 65535 :

oldlen = (u16)~skb->len;

This part came with commit 0718bcc09b35 ("[NET]: Fix CHECKSUM_HW GSO problems.")

This leads to wrong TCP checksum.

Adapt the code to accept arbitrary packet length.

v2:
  - use two csum_add() instead of csum_fold() (Alexander Duyck)
  - Change delta type to __wsum to reduce casts (Alexander Duyck)

Fixes: 09f3d1a3a52c ("ipv6/gso: remove temporary HBH/jumbo header")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605161647.3624428-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv6: rpl: Fix Route of Death.
Kuniyuki Iwashima [Mon, 5 Jun 2023 18:06:17 +0000 (11:06 -0700)]
ipv6: rpl: Fix Route of Death.

A remote DoS vulnerability of RPL Source Routing is assigned CVE-2023-2156.

The Source Routing Header (SRH) has the following format:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |  Next Header  |  Hdr Ext Len  | Routing Type  | Segments Left |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | CmprI | CmprE |  Pad  |               Reserved                |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                                                               |
  .                                                               .
  .                        Addresses[1..n]                        .
  .                                                               .
  |                                                               |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The originator of an SRH places the first hop's IPv6 address in the IPv6
header's IPv6 Destination Address and the second hop's IPv6 address as
the first address in Addresses[1..n].

The CmprI and CmprE fields indicate the number of prefix octets that are
shared with the IPv6 Destination Address.  When CmprI or CmprE is not 0,
Addresses[1..n] are compressed as follows:

  1..n-1 : (16 - CmprI) bytes
       n : (16 - CmprE) bytes

Segments Left indicates the number of route segments remaining.  When the
value is not zero, the SRH is forwarded to the next hop.  Its address
is extracted from Addresses[n - Segment Left + 1] and swapped with IPv6
Destination Address.

When Segment Left is greater than or equal to 2, the size of SRH is not
changed because Addresses[1..n-1] are decompressed and recompressed with
CmprI.

OTOH, when Segment Left changes from 1 to 0, the new SRH could have a
different size because Addresses[1..n-1] are decompressed with CmprI and
recompressed with CmprE.

Let's say CmprI is 15 and CmprE is 0.  When we receive SRH with Segment
Left >= 2, Addresses[1..n-1] have 1 byte for each, and Addresses[n] has
16 bytes.  When Segment Left is 1, Addresses[1..n-1] is decompressed to
16 bytes and not recompressed.  Finally, the new SRH will need more room
in the header, and the size is (16 - 1) * (n - 1) bytes.

Here the max value of n is 255 as Segment Left is u8, so in the worst case,
we have to allocate 3825 bytes in the skb headroom.  However, now we only
allocate a small fixed buffer that is IPV6_RPL_SRH_WORST_SWAP_SIZE (16 + 7
bytes).  If the decompressed size overflows the room, skb_push() hits BUG()
below [0].

Instead of allocating the fixed buffer for every packet, let's allocate
enough headroom only when we receive SRH with Segment Left 1.

[0]:
skbuff: skb_under_panic: text:ffffffff81c9f6e2 len:576 put:576 head:ffff8880070b5180 data:ffff8880070b4fb0 tail:0x70 end:0x140 dev:lo
kernel BUG at net/core/skbuff.c:200!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 154 Comm: python3 Not tainted 6.4.0-rc4-00190-gc308e9ec0047 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_panic (net/core/skbuff.c:200)
Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 80 6e 77 82 e8 ad 8b 60 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc90000003da0 EFLAGS: 00000246
RAX: 0000000000000085 RBX: ffff8880058a6600 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88807dc1c540 RDI: ffff88807dc1c540
RBP: ffffc90000003e48 R08: ffffffff82b392c8 R09: 00000000ffffdfff
R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888005b1c800
R13: ffff8880070b51b8 R14: ffff888005b1ca18 R15: ffff8880070b5190
FS:  00007f4539f0b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055670baf3000 CR3: 0000000005b0e000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <IRQ>
 skb_push (net/core/skbuff.c:210)
 ipv6_rthdr_rcv (./include/linux/skbuff.h:2880 net/ipv6/exthdrs.c:634 net/ipv6/exthdrs.c:718)
 ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
 ip6_input_finish (./include/linux/rcupdate.h:805 net/ipv6/ip6_input.c:483)
 __netif_receive_skb_one_core (net/core/dev.c:5494)
 process_backlog (./include/linux/rcupdate.h:805 net/core/dev.c:5934)
 __napi_poll (net/core/dev.c:6496)
 net_rx_action (net/core/dev.c:6565 net/core/dev.c:6696)
 __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
 do_softirq (kernel/softirq.c:472 kernel/softirq.c:459)
 </IRQ>
 <TASK>
 __local_bh_enable_ip (kernel/softirq.c:396)
 __dev_queue_xmit (net/core/dev.c:4272)
 ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:134)
 rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
 sock_sendmsg (net/socket.c:724 net/socket.c:747)
 __sys_sendto (net/socket.c:2144)
 __x64_sys_sendto (net/socket.c:2156 net/socket.c:2152 net/socket.c:2152)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f453a138aea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffcc212a1c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffcc212a288 RCX: 00007f453a138aea
RDX: 0000000000000060 RSI: 00007f4539084c20 RDI: 0000000000000003
RBP: 00007f4538308e80 R08: 00007ffcc212a300 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f4539712d1b
 </TASK>
Modules linked in:

Fixes: 8610c7c6e3bd ("net: ipv6: add support for rpl sr exthdr")
Reported-by: Max VA
Closes: https://www.interruptlabs.co.uk/articles/linux-ipv6-route-of-death
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230605180617.67284-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetlink: specs: ethtool: fix random typos
Jakub Kicinski [Mon, 5 Jun 2023 23:32:57 +0000 (16:32 -0700)]
netlink: specs: ethtool: fix random typos

Working on the code gen for C reveals typos in the ethtool spec
as the compiler tries to find the names in the existing uAPI
header. Fix the mistakes.

Fixes: a353318ebf24 ("tools: ynl: populate most of the ethtool spec")
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230605233257.843977-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetfilter: nf_tables: out-of-bound check in chain blob
Pablo Neira Ayuso [Tue, 6 Jun 2023 14:32:44 +0000 (16:32 +0200)]
netfilter: nf_tables: out-of-bound check in chain blob

Add current size of rule expressions to the boundary check.

Fixes: 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: ipset: Add schedule point in call_ad().
Kuniyuki Iwashima [Thu, 18 May 2023 17:33:00 +0000 (10:33 -0700)]
netfilter: ipset: Add schedule point in call_ad().

syzkaller found a repro that causes Hung Task [0] with ipset.  The repro
first creates an ipset and then tries to delete a large number of IPs
from the ipset concurrently:

  IPSET_ATTR_IPADDR_IPV4 : 172.20.20.187
  IPSET_ATTR_CIDR        : 2

The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET)
held, and other threads wait for it to be released.

Previously, the same issue existed in set->variant->uadt() that could run
so long under ip_set_lock(set).  Commit 5e29dc36bd5e ("netfilter: ipset:
Rework long task execution when adding/deleting entries") tried to fix it,
but the issue still exists in the caller with another mutex.

While adding/deleting many IPs, we should release the CPU periodically to
prevent someone from abusing ipset to hang the system.

Note we need to increment the ipset's refcnt to prevent the ipset from
being destroyed while rescheduling.

[0]:
INFO: task syz-executor174:268 blocked for more than 143 seconds.
      Not tainted 6.4.0-rc1-00145-gba79e9a73284 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor174 state:D stack:0     pid:268   ppid:260    flags:0x0000000d
Call trace:
 __switch_to+0x308/0x714 arch/arm64/kernel/process.c:556
 context_switch kernel/sched/core.c:5343 [inline]
 __schedule+0xd84/0x1648 kernel/sched/core.c:6669
 schedule+0xf0/0x214 kernel/sched/core.c:6745
 schedule_preempt_disabled+0x58/0xf0 kernel/sched/core.c:6804
 __mutex_lock_common kernel/locking/mutex.c:679 [inline]
 __mutex_lock+0x6fc/0xdb0 kernel/locking/mutex.c:747
 __mutex_lock_slowpath+0x14/0x20 kernel/locking/mutex.c:1035
 mutex_lock+0x98/0xf0 kernel/locking/mutex.c:286
 nfnl_lock net/netfilter/nfnetlink.c:98 [inline]
 nfnetlink_rcv_msg+0x480/0x70c net/netfilter/nfnetlink.c:295
 netlink_rcv_skb+0x1c0/0x350 net/netlink/af_netlink.c:2546
 nfnetlink_rcv+0x18c/0x199c net/netfilter/nfnetlink.c:658
 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
 netlink_unicast+0x664/0x8cc net/netlink/af_netlink.c:1365
 netlink_sendmsg+0x6d0/0xa4c net/netlink/af_netlink.c:1913
 sock_sendmsg_nosec net/socket.c:724 [inline]
 sock_sendmsg net/socket.c:747 [inline]
 ____sys_sendmsg+0x4b8/0x810 net/socket.c:2503
 ___sys_sendmsg net/socket.c:2557 [inline]
 __sys_sendmsg+0x1f8/0x2a4 net/socket.c:2586
 __do_sys_sendmsg net/socket.c:2595 [inline]
 __se_sys_sendmsg net/socket.c:2593 [inline]
 __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
 invoke_syscall+0x84/0x270 arch/arm64/kernel/syscall.c:52
 el0_svc_common+0x134/0x24c arch/arm64/kernel/syscall.c:142
 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
 el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:637
 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

Reported-by: syzkaller <syzkaller@googlegroups.com>
Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
Tijs Van Buggenhout [Thu, 25 May 2023 10:25:26 +0000 (12:25 +0200)]
netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper

An nf_conntrack_helper from nf_conn_help may become NULL after DNAT.

Observed when TCP port 1720 (Q931_PORT), associated with h323 conntrack
helper, is DNAT'ed to another destination port (e.g. 1730), while
nfqueue is being used for final acceptance (e.g. snort).

This happenned after transition from kernel 4.14 to 5.10.161.

Workarounds:
 * keep the same port (1720) in DNAT
 * disable nfqueue
 * disable/unload h323 NAT helper

$ linux-5.10/scripts/decode_stacktrace.sh vmlinux < /tmp/kernel.log
BUG: kernel NULL pointer dereference, address: 0000000000000084
[..]
RIP: 0010:nf_conntrack_update (net/netfilter/nf_conntrack_core.c:2080 net/netfilter/nf_conntrack_core.c:2134) nf_conntrack
[..]
nfqnl_reinject (net/netfilter/nfnetlink_queue.c:237) nfnetlink_queue
nfqnl_recv_verdict (net/netfilter/nfnetlink_queue.c:1230) nfnetlink_queue
nfnetlink_rcv_msg (net/netfilter/nfnetlink.c:241) nfnetlink
[..]

Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again")
Signed-off-by: Tijs Van Buggenhout <tijs.van.buggenhout@axsguard.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nft_bitwise: fix register tracking
Jeremy Sowden [Thu, 25 May 2023 14:07:24 +0000 (15:07 +0100)]
netfilter: nft_bitwise: fix register tracking

At the end of `nft_bitwise_reduce`, there is a loop which is intended to
update the bitwise expression associated with each tracked destination
register.  However, currently, it just updates the first register
repeatedly.  Fix it.

Fixes: 34cc9e52884a ("netfilter: nf_tables: cancel tracking for clobbered destination registers")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechai...
Gavrilov Ilia [Wed, 24 May 2023 12:25:27 +0000 (12:25 +0000)]
netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()

The nla_nest_start_noflag() function may fail and return NULL;
the return value needs to be checked.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agoMerge branch 'tools-ynl-user-space-c'
Jakub Kicinski [Tue, 6 Jun 2023 19:31:33 +0000 (12:31 -0700)]
Merge branch 'tools-ynl-user-space-c'

Jakub Kicinski says:

====================
tools: ynl: user space C

Use the code gen which is already in tree to generate a user space
library for a handful of simple families. I find YNL C quite useful
in some WIP projects, and I think others may find it useful, too.
I was hoping someone will pick this work up and finish it...
but it seems that Python YNL has largely stolen the thunder.
Python may not be great for selftest, tho, and actually this lib
is more fully-featured. The Python script was meant as a quick demo,
funny how those things go.

v2: https://lore.kernel.org/all/20230604175843.662084-1-kuba@kernel.org/
v1: https://lore.kernel.org/all/20230603052547.631384-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230605190108.809439-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotools: ynl: add sample for netdev
Jakub Kicinski [Mon, 5 Jun 2023 19:01:08 +0000 (12:01 -0700)]
tools: ynl: add sample for netdev

Add a sample application using the C library.
My main goal is to make writing selftests easier but until
I have some of those ready I think it's useful to show off
the functionality and let people poke and tinker.

Sample outputs - dump:

$ ./netdev
Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 0
      lo[1] 0:
  enp1s0[2] 23: basic redirect rx-sg

Notifications (watching veth pair getting added and deleted):

$ ./netdev
Select ifc ($ifindex; or 0 = dump; or -2 ntf check): -2
[53] 0: (ntf: dev-add-ntf)
[54] 0: (ntf: dev-add-ntf)
[54] 23: basic redirect rx-sg (ntf: dev-change-ntf)
[53] 23: basic redirect rx-sg (ntf: dev-change-ntf)
[53] 23: basic redirect rx-sg (ntf: dev-del-ntf)
[54] 23: basic redirect rx-sg (ntf: dev-del-ntf)

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>