From b454abfab52543c44b581afc807b9f97fc1e7a3a Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 5 Dec 2024 16:55:18 +0200 Subject: [PATCH 01/16] net: mscc: ocelot: be resilient to loss of PTP packets during transmission The Felix DSA driver presents unique challenges that make the simplistic ocelot PTP TX timestamping procedure unreliable: any transmitted packet may be lost in hardware before it ever leaves our local system. This may happen because there is congestion on the DSA conduit, the switch CPU port or even user port (Qdiscs like taprio may delay packets indefinitely by design). The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(), runs out of timestamp IDs eventually, because it never detects that packets are lost, and keeps the IDs of the lost packets on hold indefinitely. The manifestation of the issue once the entire timestamp ID range becomes busy looks like this in dmesg: mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp At the surface level, we need a timeout timer so that the kernel knows a timestamp ID is available again. But there is a deeper problem with the implementation, which is the monotonically increasing ocelot_port->ts_id. In the presence of packet loss, it will be impossible to detect that and reuse one of the holes created in the range of free timestamp IDs. What we actually need is a bitmap of 63 timestamp IDs tracking which one is available. That is able to use up holes caused by packet loss, but also gives us a unique opportunity to not implement an actual timer_list for the timeout timer (very complicated in terms of locking). We could only declare a timestamp ID stale on demand (lazily), aka when there's no other timestamp ID available. There are pros and cons to this approach: the implementation is much more simple than per-packet timers would be, but most of the stale packets would be quasi-leaked - not really leaked, but blocked in driver memory, since this algorithm sees no reason to free them. An improved technique would be to check for stale timestamp IDs every time we allocate a new one. Assuming a constant flux of PTP packets, this avoids stale packets being blocked in memory, but of course, packets lost at the end of the flux are still blocked until the flux resumes (nobody left to kick them out). Since implementing per-packet timers is way too complicated, this should be good enough. Testing procedure: Persistently block traffic class 5 and try to run PTP on it: $ tc qdisc replace dev swp3 parent root taprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 sched-entry S 0xdf 100000 flags 0x2 [ 126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS $ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3 ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE [ 70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[70.406]: timed out while polling for tx timestamp ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[70.406]: port 1 (swp3): send peer delay response failed ptp4l[70.407]: port 1 (swp3): clearing fault immediately ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1 ptp4l[71.400]: timed out while polling for tx timestamp ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[71.401]: port 1 (swp3): send peer delay response failed ptp4l[71.401]: port 1 (swp3): clearing fault immediately [ 72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2 ptp4l[72.401]: timed out while polling for tx timestamp ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[72.402]: port 1 (swp3): send peer delay response failed ptp4l[72.402]: port 1 (swp3): clearing fault immediately ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3 ptp4l[73.400]: timed out while polling for tx timestamp ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[73.400]: port 1 (swp3): send peer delay response failed ptp4l[73.400]: port 1 (swp3): clearing fault immediately [ 74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4 ptp4l[74.400]: timed out while polling for tx timestamp ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[74.401]: port 1 (swp3): send peer delay response failed ptp4l[74.401]: port 1 (swp3): clearing fault immediately ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost [ 75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[75.410]: timed out while polling for tx timestamp ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[75.411]: port 1 (swp3): send peer delay response failed ptp4l[75.411]: port 1 (swp3): clearing fault immediately (...) Remove the blocking condition and see that the port recovers: $ same tc command as above, but use "sched-entry S 0xff" instead $ same ptp4l command as above ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE [ 100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost [ 100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost [ 100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost [ 100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost [ 100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost [ 100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d ptp4l[104.966]: port 1 (swp3): assuming the grand master role ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER [ 105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0 (...) Notice that in this new usage pattern, a non-congested port should basically use timestamp ID 0 all the time, progressing to higher numbers only if there are unacknowledged timestamps in flight. Compare this to the old usage, where the timestamp ID used to monotonically increase modulo OCELOT_MAX_PTP_ID. In terms of implementation, this simplifies the bookkeeping of the ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse the list of two-step timestampable skbs for each new packet anyway, the information can already be computed and does not need to be stored. Also, ocelot_port->tx_skbs is always accessed under the switch-wide ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's lock and can use the unlocked primitives safely. This problem was actually detected using the tc-taprio offload, and is causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959) supports but Ocelot (VSC7514) does not. Thus, I've selected the commit to blame as the one adding initial timestamping support for the Felix switch. Fixes: c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix") Signed-off-by: Vladimir Oltean Link: https://patch.msgid.link/20241205145519.1236778-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mscc/ocelot_ptp.c | 134 +++++++++++++++---------- include/linux/dsa/ocelot.h | 1 + include/soc/mscc/ocelot.h | 2 - 3 files changed, 80 insertions(+), 57 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot_ptp.c b/drivers/net/ethernet/mscc/ocelot_ptp.c index d732f99e6391..7eb01d1e1ecd 100644 --- a/drivers/net/ethernet/mscc/ocelot_ptp.c +++ b/drivers/net/ethernet/mscc/ocelot_ptp.c @@ -14,6 +14,8 @@ #include #include "ocelot.h" +#define OCELOT_PTP_TX_TSTAMP_TIMEOUT (5 * HZ) + int ocelot_ptp_gettime64(struct ptp_clock_info *ptp, struct timespec64 *ts) { struct ocelot *ocelot = container_of(ptp, struct ocelot, ptp_info); @@ -603,34 +605,88 @@ int ocelot_get_ts_info(struct ocelot *ocelot, int port, } EXPORT_SYMBOL(ocelot_get_ts_info); -static int ocelot_port_add_txtstamp_skb(struct ocelot *ocelot, int port, +static struct sk_buff *ocelot_port_dequeue_ptp_tx_skb(struct ocelot *ocelot, + int port, u8 ts_id, + u32 seqid) +{ + struct ocelot_port *ocelot_port = ocelot->ports[port]; + struct sk_buff *skb, *skb_tmp, *skb_match = NULL; + struct ptp_header *hdr; + + spin_lock(&ocelot->ts_id_lock); + + skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) { + if (OCELOT_SKB_CB(skb)->ts_id != ts_id) + continue; + + /* Check that the timestamp ID is for the expected PTP + * sequenceId. We don't have to test ptp_parse_header() against + * NULL, because we've pre-validated the packet's ptp_class. + */ + hdr = ptp_parse_header(skb, OCELOT_SKB_CB(skb)->ptp_class); + if (seqid != ntohs(hdr->sequence_id)) + continue; + + __skb_unlink(skb, &ocelot_port->tx_skbs); + ocelot->ptp_skbs_in_flight--; + skb_match = skb; + break; + } + + spin_unlock(&ocelot->ts_id_lock); + + return skb_match; +} + +static int ocelot_port_queue_ptp_tx_skb(struct ocelot *ocelot, int port, struct sk_buff *clone) { struct ocelot_port *ocelot_port = ocelot->ports[port]; + DECLARE_BITMAP(ts_id_in_flight, OCELOT_MAX_PTP_ID); + struct sk_buff *skb, *skb_tmp; + unsigned long n; spin_lock(&ocelot->ts_id_lock); - if (ocelot_port->ptp_skbs_in_flight == OCELOT_MAX_PTP_ID || - ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) { + /* To get a better chance of acquiring a timestamp ID, first flush the + * stale packets still waiting in the TX timestamping queue. They are + * probably lost. + */ + skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) { + if (time_before(OCELOT_SKB_CB(skb)->ptp_tx_time + + OCELOT_PTP_TX_TSTAMP_TIMEOUT, jiffies)) { + dev_warn_ratelimited(ocelot->dev, + "port %d invalidating stale timestamp ID %u which seems lost\n", + port, OCELOT_SKB_CB(skb)->ts_id); + __skb_unlink(skb, &ocelot_port->tx_skbs); + kfree_skb(skb); + ocelot->ptp_skbs_in_flight--; + } else { + __set_bit(OCELOT_SKB_CB(skb)->ts_id, ts_id_in_flight); + } + } + + if (ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) { spin_unlock(&ocelot->ts_id_lock); return -EBUSY; } - skb_shinfo(clone)->tx_flags |= SKBTX_IN_PROGRESS; - /* Store timestamp ID in OCELOT_SKB_CB(clone)->ts_id */ - OCELOT_SKB_CB(clone)->ts_id = ocelot_port->ts_id; - - ocelot_port->ts_id++; - if (ocelot_port->ts_id == OCELOT_MAX_PTP_ID) - ocelot_port->ts_id = 0; + n = find_first_zero_bit(ts_id_in_flight, OCELOT_MAX_PTP_ID); + if (n == OCELOT_MAX_PTP_ID) { + spin_unlock(&ocelot->ts_id_lock); + return -EBUSY; + } - ocelot_port->ptp_skbs_in_flight++; + /* Found an available timestamp ID, use it */ + OCELOT_SKB_CB(clone)->ts_id = n; + OCELOT_SKB_CB(clone)->ptp_tx_time = jiffies; ocelot->ptp_skbs_in_flight++; - - skb_queue_tail(&ocelot_port->tx_skbs, clone); + __skb_queue_tail(&ocelot_port->tx_skbs, clone); spin_unlock(&ocelot->ts_id_lock); + dev_dbg_ratelimited(ocelot->dev, "port %d timestamp id %lu\n", port, n); + return 0; } @@ -686,12 +742,14 @@ int ocelot_port_txtstamp_request(struct ocelot *ocelot, int port, if (!(*clone)) return -ENOMEM; - err = ocelot_port_add_txtstamp_skb(ocelot, port, *clone); + /* Store timestamp ID in OCELOT_SKB_CB(clone)->ts_id */ + err = ocelot_port_queue_ptp_tx_skb(ocelot, port, *clone); if (err) { kfree_skb(*clone); return err; } + skb_shinfo(*clone)->tx_flags |= SKBTX_IN_PROGRESS; OCELOT_SKB_CB(skb)->ptp_cmd = ptp_cmd; OCELOT_SKB_CB(*clone)->ptp_class = ptp_class; } @@ -727,26 +785,14 @@ static void ocelot_get_hwtimestamp(struct ocelot *ocelot, spin_unlock_irqrestore(&ocelot->ptp_clock_lock, flags); } -static bool ocelot_validate_ptp_skb(struct sk_buff *clone, u16 seqid) -{ - struct ptp_header *hdr; - - hdr = ptp_parse_header(clone, OCELOT_SKB_CB(clone)->ptp_class); - if (WARN_ON(!hdr)) - return false; - - return seqid == ntohs(hdr->sequence_id); -} - void ocelot_get_txtstamp(struct ocelot *ocelot) { int budget = OCELOT_PTP_QUEUE_SZ; while (budget--) { - struct sk_buff *skb, *skb_tmp, *skb_match = NULL; struct skb_shared_hwtstamps shhwtstamps; u32 val, id, seqid, txport; - struct ocelot_port *port; + struct sk_buff *skb_match; struct timespec64 ts; val = ocelot_read(ocelot, SYS_PTP_STATUS); @@ -762,36 +808,14 @@ void ocelot_get_txtstamp(struct ocelot *ocelot) txport = SYS_PTP_STATUS_PTP_MESS_TXPORT_X(val); seqid = SYS_PTP_STATUS_PTP_MESS_SEQ_ID(val); - port = ocelot->ports[txport]; - - spin_lock(&ocelot->ts_id_lock); - port->ptp_skbs_in_flight--; - ocelot->ptp_skbs_in_flight--; - spin_unlock(&ocelot->ts_id_lock); - /* Retrieve its associated skb */ -try_again: - spin_lock(&port->tx_skbs.lock); - - skb_queue_walk_safe(&port->tx_skbs, skb, skb_tmp) { - if (OCELOT_SKB_CB(skb)->ts_id != id) - continue; - __skb_unlink(skb, &port->tx_skbs); - skb_match = skb; - break; - } - - spin_unlock(&port->tx_skbs.lock); - - if (WARN_ON(!skb_match)) + skb_match = ocelot_port_dequeue_ptp_tx_skb(ocelot, txport, id, + seqid); + if (!skb_match) { + dev_warn_ratelimited(ocelot->dev, + "port %d received TX timestamp (seqid %d, ts id %u) for packet previously declared stale\n", + txport, seqid, id); goto next_ts; - - if (!ocelot_validate_ptp_skb(skb_match, seqid)) { - dev_err_ratelimited(ocelot->dev, - "port %d received stale TX timestamp for seqid %d, discarding\n", - txport, seqid); - kfree_skb(skb); - goto try_again; } /* Get the h/w timestamp */ diff --git a/include/linux/dsa/ocelot.h b/include/linux/dsa/ocelot.h index 6fbfbde68a37..620a3260fc08 100644 --- a/include/linux/dsa/ocelot.h +++ b/include/linux/dsa/ocelot.h @@ -15,6 +15,7 @@ struct ocelot_skb_cb { struct sk_buff *clone; unsigned int ptp_class; /* valid only for clones */ + unsigned long ptp_tx_time; /* valid only for clones */ u32 tstamp_lo; u8 ptp_cmd; u8 ts_id; diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 462c653e1017..2db9ae0575b6 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -778,7 +778,6 @@ struct ocelot_port { phy_interface_t phy_mode; - unsigned int ptp_skbs_in_flight; struct sk_buff_head tx_skbs; unsigned int trap_proto; @@ -786,7 +785,6 @@ struct ocelot_port { u16 mrp_ring_id; u8 ptp_cmd; - u8 ts_id; u8 index; -- 2.51.0 From 43a4166349a254446e7a3db65f721c6a30daccf3 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 5 Dec 2024 16:55:19 +0200 Subject: [PATCH 02/16] net: mscc: ocelot: perform error cleanup in ocelot_hwstamp_set() An unsupported RX filter will leave the port with TX timestamping still applied as per the new request, rather than the old setting. When parsing the tx_type, don't apply it just yet, but delay that until after we've parsed the rx_filter as well (and potentially returned -ERANGE for that). Similarly, copy_to_user() may fail, which is a rare occurrence, but should still be treated by unwinding what was done. Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean Link: https://patch.msgid.link/20241205145519.1236778-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mscc/ocelot_ptp.c | 59 ++++++++++++++++++-------- 1 file changed, 42 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot_ptp.c b/drivers/net/ethernet/mscc/ocelot_ptp.c index 7eb01d1e1ecd..808ce8e68d39 100644 --- a/drivers/net/ethernet/mscc/ocelot_ptp.c +++ b/drivers/net/ethernet/mscc/ocelot_ptp.c @@ -497,6 +497,28 @@ static int ocelot_traps_to_ptp_rx_filter(unsigned int proto) return HWTSTAMP_FILTER_NONE; } +static int ocelot_ptp_tx_type_to_cmd(int tx_type, int *ptp_cmd) +{ + switch (tx_type) { + case HWTSTAMP_TX_ON: + *ptp_cmd = IFH_REW_OP_TWO_STEP_PTP; + break; + case HWTSTAMP_TX_ONESTEP_SYNC: + /* IFH_REW_OP_ONE_STEP_PTP updates the correctionField, + * what we need to update is the originTimestamp. + */ + *ptp_cmd = IFH_REW_OP_ORIGIN_PTP; + break; + case HWTSTAMP_TX_OFF: + *ptp_cmd = 0; + break; + default: + return -ERANGE; + } + + return 0; +} + int ocelot_hwstamp_get(struct ocelot *ocelot, int port, struct ifreq *ifr) { struct ocelot_port *ocelot_port = ocelot->ports[port]; @@ -523,30 +545,19 @@ EXPORT_SYMBOL(ocelot_hwstamp_get); int ocelot_hwstamp_set(struct ocelot *ocelot, int port, struct ifreq *ifr) { struct ocelot_port *ocelot_port = ocelot->ports[port]; + int ptp_cmd, old_ptp_cmd = ocelot_port->ptp_cmd; bool l2 = false, l4 = false; struct hwtstamp_config cfg; + bool old_l2, old_l4; int err; if (copy_from_user(&cfg, ifr->ifr_data, sizeof(cfg))) return -EFAULT; /* Tx type sanity check */ - switch (cfg.tx_type) { - case HWTSTAMP_TX_ON: - ocelot_port->ptp_cmd = IFH_REW_OP_TWO_STEP_PTP; - break; - case HWTSTAMP_TX_ONESTEP_SYNC: - /* IFH_REW_OP_ONE_STEP_PTP updates the correctional field, we - * need to update the origin time. - */ - ocelot_port->ptp_cmd = IFH_REW_OP_ORIGIN_PTP; - break; - case HWTSTAMP_TX_OFF: - ocelot_port->ptp_cmd = 0; - break; - default: - return -ERANGE; - } + err = ocelot_ptp_tx_type_to_cmd(cfg.tx_type, &ptp_cmd); + if (err) + return err; switch (cfg.rx_filter) { case HWTSTAMP_FILTER_NONE: @@ -571,13 +582,27 @@ int ocelot_hwstamp_set(struct ocelot *ocelot, int port, struct ifreq *ifr) return -ERANGE; } + old_l2 = ocelot_port->trap_proto & OCELOT_PROTO_PTP_L2; + old_l4 = ocelot_port->trap_proto & OCELOT_PROTO_PTP_L4; + err = ocelot_setup_ptp_traps(ocelot, port, l2, l4); if (err) return err; + ocelot_port->ptp_cmd = ptp_cmd; + cfg.rx_filter = ocelot_traps_to_ptp_rx_filter(ocelot_port->trap_proto); - return copy_to_user(ifr->ifr_data, &cfg, sizeof(cfg)) ? -EFAULT : 0; + if (copy_to_user(ifr->ifr_data, &cfg, sizeof(cfg))) { + err = -EFAULT; + goto out_restore_ptp_traps; + } + + return 0; +out_restore_ptp_traps: + ocelot_setup_ptp_traps(ocelot, port, old_l2, old_l4); + ocelot_port->ptp_cmd = old_ptp_cmd; + return err; } EXPORT_SYMBOL(ocelot_hwstamp_set); -- 2.51.0 From 09310cfd4ea5c3ab2c7a610420205e0a1660bf7e Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 6 Dec 2024 15:32:52 +0300 Subject: [PATCH 03/16] rtnetlink: fix error code in rtnl_newlink() If rtnl_get_peer_net() fails, then propagate the error code. Don't return success. Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()") Signed-off-by: Dan Carpenter Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/a2d20cd4-387a-4475-887c-bb7d0e88e25a@stanley.mountain Signed-off-by: Jakub Kicinski --- net/core/rtnetlink.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index ab5f201bf0ab..ebcfc2debf1a 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3972,8 +3972,10 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, if (ops->peer_type) { peer_net = rtnl_get_peer_net(ops, data, extack); - if (IS_ERR(peer_net)) + if (IS_ERR(peer_net)) { + ret = PTR_ERR(peer_net); goto put_ops; + } if (peer_net) rtnl_nets_add(&rtnl_nets, peer_net); } -- 2.51.0 From 1cd7523f4baaf03026974553978210dc39e96665 Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Thu, 5 Dec 2024 14:54:24 +0100 Subject: [PATCH 04/16] net: lan969x: fix cyclic dependency reported by depmod Depmod reports a cyclic dependency between modules sparx5-switch.ko and lan969x-switch.ko: depmod: ERROR: Cycle detected: lan969x_switch -> sparx5_switch -> lan969x_switch depmod: ERROR: Found 2 modules in dependency cycles! make[2]: *** [scripts/Makefile.modinst:132: depmod] Error 1 make: *** [Makefile:224: __sub-make] Error 2 This makes sense, as they both require symbols from each other. Fix this by compiling lan969x support into the sparx5-switch.ko module. In order to do this, in a sensible way, we move the lan969x/ dir into the sparx5/ dir and do some code cleanup of code that is no longer required. After this patch, depmod will no longer complain, as lan969x support is compiled into the sparx5-swicth.ko module, and can no longer be compiled as a standalone module. Fixes: 98a01119608d ("net: sparx5: add compatible string for lan969x") Signed-off-by: Daniel Machon Signed-off-by: David S. Miller --- MAINTAINERS | 2 +- drivers/net/ethernet/microchip/Kconfig | 1 - drivers/net/ethernet/microchip/Makefile | 1 - drivers/net/ethernet/microchip/lan969x/Kconfig | 5 ----- drivers/net/ethernet/microchip/lan969x/Makefile | 13 ------------- drivers/net/ethernet/microchip/sparx5/Kconfig | 6 ++++++ drivers/net/ethernet/microchip/sparx5/Makefile | 6 ++++++ .../microchip/{ => sparx5}/lan969x/lan969x.c | 5 ----- .../microchip/{ => sparx5}/lan969x/lan969x.h | 0 .../{ => sparx5}/lan969x/lan969x_calendar.c | 0 .../microchip/{ => sparx5}/lan969x/lan969x_regs.c | 0 .../{ => sparx5}/lan969x/lan969x_vcap_ag_api.c | 0 .../{ => sparx5}/lan969x/lan969x_vcap_impl.c | 0 .../net/ethernet/microchip/sparx5/sparx5_calendar.c | 2 -- drivers/net/ethernet/microchip/sparx5/sparx5_main.c | 4 ++-- drivers/net/ethernet/microchip/sparx5/sparx5_ptp.c | 1 - 16 files changed, 15 insertions(+), 31 deletions(-) delete mode 100644 drivers/net/ethernet/microchip/lan969x/Kconfig delete mode 100644 drivers/net/ethernet/microchip/lan969x/Makefile rename drivers/net/ethernet/microchip/{ => sparx5}/lan969x/lan969x.c (98%) rename drivers/net/ethernet/microchip/{ => sparx5}/lan969x/lan969x.h (100%) rename drivers/net/ethernet/microchip/{ => sparx5}/lan969x/lan969x_calendar.c (100%) rename drivers/net/ethernet/microchip/{ => sparx5}/lan969x/lan969x_regs.c (100%) rename drivers/net/ethernet/microchip/{ => sparx5}/lan969x/lan969x_vcap_ag_api.c (100%) rename drivers/net/ethernet/microchip/{ => sparx5}/lan969x/lan969x_vcap_impl.c (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 686109008d8e..f84ec3572a5d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15343,7 +15343,7 @@ M: Daniel Machon M: UNGLinuxDriver@microchip.com L: netdev@vger.kernel.org S: Maintained -F: drivers/net/ethernet/microchip/lan969x/* +F: drivers/net/ethernet/microchip/sparx5/lan969x/* MICROCHIP LCDFB DRIVER M: Nicolas Ferre diff --git a/drivers/net/ethernet/microchip/Kconfig b/drivers/net/ethernet/microchip/Kconfig index 73832fb2bc32..ee046468652c 100644 --- a/drivers/net/ethernet/microchip/Kconfig +++ b/drivers/net/ethernet/microchip/Kconfig @@ -59,7 +59,6 @@ config LAN743X source "drivers/net/ethernet/microchip/lan865x/Kconfig" source "drivers/net/ethernet/microchip/lan966x/Kconfig" -source "drivers/net/ethernet/microchip/lan969x/Kconfig" source "drivers/net/ethernet/microchip/sparx5/Kconfig" source "drivers/net/ethernet/microchip/vcap/Kconfig" source "drivers/net/ethernet/microchip/fdma/Kconfig" diff --git a/drivers/net/ethernet/microchip/Makefile b/drivers/net/ethernet/microchip/Makefile index 7770df82200f..3c65baed9fd8 100644 --- a/drivers/net/ethernet/microchip/Makefile +++ b/drivers/net/ethernet/microchip/Makefile @@ -11,7 +11,6 @@ lan743x-objs := lan743x_main.o lan743x_ethtool.o lan743x_ptp.o obj-$(CONFIG_LAN865X) += lan865x/ obj-$(CONFIG_LAN966X_SWITCH) += lan966x/ -obj-$(CONFIG_LAN969X_SWITCH) += lan969x/ obj-$(CONFIG_SPARX5_SWITCH) += sparx5/ obj-$(CONFIG_VCAP) += vcap/ obj-$(CONFIG_FDMA) += fdma/ diff --git a/drivers/net/ethernet/microchip/lan969x/Kconfig b/drivers/net/ethernet/microchip/lan969x/Kconfig deleted file mode 100644 index c5c6122ae2ec..000000000000 --- a/drivers/net/ethernet/microchip/lan969x/Kconfig +++ /dev/null @@ -1,5 +0,0 @@ -config LAN969X_SWITCH - bool "Lan969x switch driver" - depends on SPARX5_SWITCH - help - This driver supports the lan969x family of network switch devices. diff --git a/drivers/net/ethernet/microchip/lan969x/Makefile b/drivers/net/ethernet/microchip/lan969x/Makefile deleted file mode 100644 index 316405cbbc71..000000000000 --- a/drivers/net/ethernet/microchip/lan969x/Makefile +++ /dev/null @@ -1,13 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0-only -# -# Makefile for the Microchip lan969x network device drivers. -# - -obj-$(CONFIG_SPARX5_SWITCH) += lan969x-switch.o - -lan969x-switch-y := lan969x_regs.o lan969x.o lan969x_calendar.o \ - lan969x_vcap_ag_api.o lan969x_vcap_impl.o - -# Provide include files -ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/fdma -ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/vcap diff --git a/drivers/net/ethernet/microchip/sparx5/Kconfig b/drivers/net/ethernet/microchip/sparx5/Kconfig index 3f04992eace6..35b057c9d0cb 100644 --- a/drivers/net/ethernet/microchip/sparx5/Kconfig +++ b/drivers/net/ethernet/microchip/sparx5/Kconfig @@ -24,3 +24,9 @@ config SPARX5_DCB DSCP and PCP. If unsure, set to Y. + +config LAN969X_SWITCH + bool "Lan969x switch driver" + depends on SPARX5_SWITCH + help + This driver supports the lan969x family of network switch devices. diff --git a/drivers/net/ethernet/microchip/sparx5/Makefile b/drivers/net/ethernet/microchip/sparx5/Makefile index 3435ca86dd70..4bf2a885a9da 100644 --- a/drivers/net/ethernet/microchip/sparx5/Makefile +++ b/drivers/net/ethernet/microchip/sparx5/Makefile @@ -16,6 +16,12 @@ sparx5-switch-y := sparx5_main.o sparx5_packet.o \ sparx5-switch-$(CONFIG_SPARX5_DCB) += sparx5_dcb.o sparx5-switch-$(CONFIG_DEBUG_FS) += sparx5_vcap_debugfs.o +sparx5-switch-$(CONFIG_LAN969X_SWITCH) += lan969x/lan969x_regs.o \ + lan969x/lan969x.o \ + lan969x/lan969x_calendar.o \ + lan969x/lan969x_vcap_ag_api.o \ + lan969x/lan969x_vcap_impl.o + # Provide include files ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/vcap ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/fdma diff --git a/drivers/net/ethernet/microchip/lan969x/lan969x.c b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.c similarity index 98% rename from drivers/net/ethernet/microchip/lan969x/lan969x.c rename to drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.c index ac37d0f74ee3..67463d41d10e 100644 --- a/drivers/net/ethernet/microchip/lan969x/lan969x.c +++ b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.c @@ -346,8 +346,3 @@ const struct sparx5_match_data lan969x_desc = { .consts = &lan969x_consts, .ops = &lan969x_ops, }; -EXPORT_SYMBOL_GPL(lan969x_desc); - -MODULE_DESCRIPTION("Microchip lan969x switch driver"); -MODULE_AUTHOR("Daniel Machon "); -MODULE_LICENSE("Dual MIT/GPL"); diff --git a/drivers/net/ethernet/microchip/lan969x/lan969x.h b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.h similarity index 100% rename from drivers/net/ethernet/microchip/lan969x/lan969x.h rename to drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.h diff --git a/drivers/net/ethernet/microchip/lan969x/lan969x_calendar.c b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_calendar.c similarity index 100% rename from drivers/net/ethernet/microchip/lan969x/lan969x_calendar.c rename to drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_calendar.c diff --git a/drivers/net/ethernet/microchip/lan969x/lan969x_regs.c b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_regs.c similarity index 100% rename from drivers/net/ethernet/microchip/lan969x/lan969x_regs.c rename to drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_regs.c diff --git a/drivers/net/ethernet/microchip/lan969x/lan969x_vcap_ag_api.c b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_vcap_ag_api.c similarity index 100% rename from drivers/net/ethernet/microchip/lan969x/lan969x_vcap_ag_api.c rename to drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_vcap_ag_api.c diff --git a/drivers/net/ethernet/microchip/lan969x/lan969x_vcap_impl.c b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_vcap_impl.c similarity index 100% rename from drivers/net/ethernet/microchip/lan969x/lan969x_vcap_impl.c rename to drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_vcap_impl.c diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_calendar.c b/drivers/net/ethernet/microchip/sparx5/sparx5_calendar.c index 5fe941c66c17..5c46d81de530 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_calendar.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_calendar.c @@ -98,7 +98,6 @@ u32 sparx5_cal_speed_to_value(enum sparx5_cal_bw speed) default: return 0; } } -EXPORT_SYMBOL_GPL(sparx5_cal_speed_to_value); static u32 sparx5_bandwidth_to_calendar(u32 bw) { @@ -150,7 +149,6 @@ enum sparx5_cal_bw sparx5_get_port_cal_speed(struct sparx5 *sparx5, u32 portno) return SPX5_CAL_SPEED_NONE; return sparx5_bandwidth_to_calendar(port->conf.bandwidth); } -EXPORT_SYMBOL_GPL(sparx5_get_port_cal_speed); /* Auto configure the QSYS calendar based on port configuration */ int sparx5_config_auto_calendar(struct sparx5 *sparx5) diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c index 2f1013f870fb..2b58fcb9422e 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c @@ -24,7 +24,7 @@ #include #include -#include "../lan969x/lan969x.h" /* for lan969x match data */ +#include "lan969x/lan969x.h" /* for lan969x match data */ #include "sparx5_main_regs.h" #include "sparx5_main.h" @@ -1093,7 +1093,7 @@ static const struct sparx5_match_data sparx5_desc = { static const struct of_device_id mchp_sparx5_match[] = { { .compatible = "microchip,sparx5-switch", .data = &sparx5_desc }, -#if IS_ENABLED(CONFIG_LAN969X_SWITCH) +#ifdef CONFIG_LAN969X_SWITCH { .compatible = "microchip,lan9691-switch", .data = &lan969x_desc }, #endif { } diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_ptp.c b/drivers/net/ethernet/microchip/sparx5/sparx5_ptp.c index 1c2903700a9c..2f168700f63c 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_ptp.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_ptp.c @@ -303,7 +303,6 @@ void sparx5_get_hwtimestamp(struct sparx5 *sparx5, spin_unlock_irqrestore(&sparx5->ptp_clock_lock, flags); } -EXPORT_SYMBOL_GPL(sparx5_get_hwtimestamp); irqreturn_t sparx5_ptp_irq_handler(int irq, void *args) { -- 2.51.0 From aa5fc889844ff9f920356c4b6535bf2c625457cd Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Thu, 5 Dec 2024 14:54:25 +0100 Subject: [PATCH 05/16] net: lan969x: fix the use of spin_lock in PTP handler We are mixing the use of spin_lock() and spin_lock_irqsave() functions in the PTP handler of lan969x. Fix this by correctly using the _irqsave variants. Fixes: 24fe83541755 ("net: lan969x: add PTP handler function") Signed-off-by: Daniel Machon [1]: https://lore.kernel.org/netdev/20241024-sparx5-lan969x-switch-driver-2-v2-10-a0b5fae88a0f@microchip.com/ Signed-off-by: David S. Miller --- drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.c b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.c index 67463d41d10e..c2afa2176b08 100644 --- a/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.c +++ b/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x.c @@ -273,9 +273,9 @@ static irqreturn_t lan969x_ptp_irq_handler(int irq, void *args) if (WARN_ON(!skb_match)) continue; - spin_lock(&sparx5->ptp_ts_id_lock); + spin_lock_irqsave(&sparx5->ptp_ts_id_lock, flags); sparx5->ptp_skbs--; - spin_unlock(&sparx5->ptp_ts_id_lock); + spin_unlock_irqrestore(&sparx5->ptp_ts_id_lock, flags); /* Get the h/w timestamp */ sparx5_get_hwtimestamp(sparx5, &ts, delay); -- 2.51.0 From f004f2e535e2b66ccbf5ac35f8eaadeac70ad7b7 Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Thu, 5 Dec 2024 14:54:26 +0100 Subject: [PATCH 06/16] net: sparx5: fix FDMA performance issue The FDMA handler is responsible for scheduling a NAPI poll, which will eventually fetch RX packets from the FDMA queue. Currently, the FDMA handler is run in a threaded context. For some reason, this kills performance. Admittedly, I did not do a thorough investigation to see exactly what causes the issue, however, I noticed that in the other driver utilizing the same FDMA engine, we run the FDMA handler in hard IRQ context. Fix this performance issue, by running the FDMA handler in hard IRQ context, not deferring any work to a thread. Prior to this change, the RX UDP performance was: Interval Transfer Bitrate Jitter 0.00-10.20 sec 44.6 MBytes 36.7 Mbits/sec 0.027 ms After this change, the rx UDP performance is: Interval Transfer Bitrate Jitter 0.00-9.12 sec 1.01 GBytes 953 Mbits/sec 0.020 ms Fixes: 10615907e9b5 ("net: sparx5: switchdev: adding frame DMA functionality") Signed-off-by: Daniel Machon Signed-off-by: David S. Miller --- drivers/net/ethernet/microchip/sparx5/sparx5_main.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c index 2b58fcb9422e..f61aa15beab7 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c @@ -780,12 +780,11 @@ static int sparx5_start(struct sparx5 *sparx5) err = -ENXIO; if (sparx5->fdma_irq >= 0 && is_sparx5(sparx5)) { if (GCB_CHIP_ID_REV_ID_GET(sparx5->chip_id) > 0) - err = devm_request_threaded_irq(sparx5->dev, - sparx5->fdma_irq, - NULL, - sparx5_fdma_handler, - IRQF_ONESHOT, - "sparx5-fdma", sparx5); + err = devm_request_irq(sparx5->dev, + sparx5->fdma_irq, + sparx5_fdma_handler, + 0, + "sparx5-fdma", sparx5); if (!err) err = sparx5_fdma_start(sparx5); if (err) -- 2.51.0 From e4d505fda6c81baf9b92a7f32f89912984654983 Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Thu, 5 Dec 2024 14:54:27 +0100 Subject: [PATCH 07/16] net: sparx5: fix default value of monitor ports When doing port mirroring, the physical port to send the frame to, is written to the FRMC_PORT_VAL field of the QFWD_FRAME_COPY_CFG register. This field is 7 bits wide on sparx5 and 6 bits wide on lan969x, and has a default value of 65 and 30, respectively (the number of front ports). On mirror deletion, we set the default value of the monitor port to 65 for this field, in case no more ports exists for the mirror. Needless to say, this will not fit the 6 bits on lan969x. Fix this by correctly using the n_ports constant instead. Fixes: 3f9e46347a46 ("net: sparx5: use SPX5_CONST for constants which already have a symbol") Signed-off-by: Daniel Machon Signed-off-by: David S. Miller --- drivers/net/ethernet/microchip/sparx5/sparx5_mirror.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_mirror.c b/drivers/net/ethernet/microchip/sparx5/sparx5_mirror.c index 9806729e9c62..76097761fa97 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_mirror.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_mirror.c @@ -12,7 +12,6 @@ #define SPX5_MIRROR_DISABLED 0 #define SPX5_MIRROR_EGRESS 1 #define SPX5_MIRROR_INGRESS 2 -#define SPX5_MIRROR_MONITOR_PORT_DEFAULT 65 #define SPX5_QFWD_MP_OFFSET 9 /* Mirror port offset in the QFWD register */ /* Convert from bool ingress/egress to mirror direction */ @@ -200,7 +199,7 @@ void sparx5_mirror_del(struct sparx5_mall_entry *entry) sparx5_mirror_monitor_set(sparx5, mirror_idx, - SPX5_MIRROR_MONITOR_PORT_DEFAULT); + sparx5->data->consts->n_ports); } void sparx5_mirror_stats(struct sparx5_mall_entry *entry, -- 2.51.0 From ddd7ba006078a2bef5971b2dc5f8383d47f96207 Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Thu, 5 Dec 2024 14:54:28 +0100 Subject: [PATCH 08/16] net: sparx5: fix the maximum frame length register On port initialization, we configure the maximum frame length accepted by the receive module associated with the port. This value is currently written to the MAX_LEN field of the DEV10G_MAC_ENA_CFG register, when in fact, it should be written to the DEV10G_MAC_MAXLEN_CFG register. Fix this. Fixes: 946e7fd5053a ("net: sparx5: add port module support") Signed-off-by: Daniel Machon Signed-off-by: David S. Miller --- drivers/net/ethernet/microchip/sparx5/sparx5_port.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_port.c b/drivers/net/ethernet/microchip/sparx5/sparx5_port.c index 1401761c6251..f9d1a6bb9bff 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_port.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_port.c @@ -1151,7 +1151,7 @@ int sparx5_port_init(struct sparx5 *sparx5, spx5_inst_rmw(DEV10G_MAC_MAXLEN_CFG_MAX_LEN_SET(ETH_MAXLEN), DEV10G_MAC_MAXLEN_CFG_MAX_LEN, devinst, - DEV10G_MAC_ENA_CFG(0)); + DEV10G_MAC_MAXLEN_CFG(0)); /* Handle Signal Detect in 10G PCS */ spx5_inst_wr(PCS10G_BR_PCS_SD_CFG_SD_POL_SET(sd_pol) | -- 2.51.0 From 356983f569c1f5991661fc0050aa263792f50616 Mon Sep 17 00:00:00 2001 From: Anumula Murali Mohan Reddy Date: Fri, 6 Dec 2024 11:50:14 +0530 Subject: [PATCH 09/16] cxgb4: use port number to set mac addr t4_set_vf_mac_acl() uses pf to set mac addr, but t4vf_get_vf_mac_acl() uses port number to get mac addr, this leads to error when an attempt to set MAC address on VF's of PF2 and PF3. This patch fixes the issue by using port number to set mac address. Fixes: e0cdac65ba26 ("cxgb4vf: configure ports accessible by the VF") Signed-off-by: Anumula Murali Mohan Reddy Signed-off-by: Potnuri Bharat Teja Reviewed-by: Simon Horman Link: https://patch.msgid.link/20241206062014.49414-1-anumula@chelsio.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 +- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +- drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 5 +++-- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index 75bd69ff61a8..c7c2c15a1815 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -2076,7 +2076,7 @@ void t4_idma_monitor(struct adapter *adapter, struct sge_idma_monitor_state *idma, int hz, int ticks); int t4_set_vf_mac_acl(struct adapter *adapter, unsigned int vf, - unsigned int naddr, u8 *addr); + u8 start, unsigned int naddr, u8 *addr); void t4_tp_pio_read(struct adapter *adap, u32 *buff, u32 nregs, u32 start_index, bool sleep_ok); void t4_tp_tm_pio_read(struct adapter *adap, u32 *buff, u32 nregs, diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 97a261d5357e..bc3af0054406 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -3234,7 +3234,7 @@ static int cxgb4_mgmt_set_vf_mac(struct net_device *dev, int vf, u8 *mac) dev_info(pi->adapter->pdev_dev, "Setting MAC %pM on VF %d\n", mac, vf); - ret = t4_set_vf_mac_acl(adap, vf + 1, 1, mac); + ret = t4_set_vf_mac_acl(adap, vf + 1, pi->lport, 1, mac); if (!ret) ether_addr_copy(adap->vfinfo[vf].vf_mac_addr, mac); return ret; diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index 76de55306c4d..175bf9b13058 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -10215,11 +10215,12 @@ out: * t4_set_vf_mac_acl - Set MAC address for the specified VF * @adapter: The adapter * @vf: one of the VFs instantiated by the specified PF + * @start: The start port id associated with specified VF * @naddr: the number of MAC addresses * @addr: the MAC address(es) to be set to the specified VF */ int t4_set_vf_mac_acl(struct adapter *adapter, unsigned int vf, - unsigned int naddr, u8 *addr) + u8 start, unsigned int naddr, u8 *addr) { struct fw_acl_mac_cmd cmd; @@ -10234,7 +10235,7 @@ int t4_set_vf_mac_acl(struct adapter *adapter, unsigned int vf, cmd.en_to_len16 = cpu_to_be32((unsigned int)FW_LEN16(cmd)); cmd.nmac = naddr; - switch (adapter->pf) { + switch (start) { case 3: memcpy(cmd.macaddr3, addr, sizeof(cmd.macaddr3)); break; -- 2.51.0 From 4dba406fac06b009873fe7a28231b9b7e4288b09 Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Fri, 6 Dec 2024 19:46:42 +0100 Subject: [PATCH 10/16] qca_spi: Fix clock speed for multiple QCA7000 Storing the maximum clock speed in module parameter qcaspi_clkspeed has the unintended side effect that the first probed instance defines the value for all other instances. Fix this issue by storing it in max_speed_hz of the relevant SPI device. This fix keeps the priority of the speed parameter (module parameter, device tree property, driver default). Btw this uses the opportunity to get the rid of the unused member clkspeed. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren Link: https://patch.msgid.link/20241206184643.123399-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/qualcomm/qca_spi.c | 24 ++++++++++-------------- drivers/net/ethernet/qualcomm/qca_spi.h | 1 - 2 files changed, 10 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index ef9c02b000e4..d328e770bcbe 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -818,7 +818,6 @@ qcaspi_netdev_init(struct net_device *dev) dev->mtu = QCAFRM_MAX_MTU; dev->type = ARPHRD_ETHER; - qca->clkspeed = qcaspi_clkspeed; qca->burst_len = qcaspi_burst_len; qca->spi_thread = NULL; qca->buffer_size = (QCAFRM_MAX_MTU + VLAN_ETH_HLEN + QCAFRM_HEADER_LEN + @@ -909,17 +908,15 @@ qca_spi_probe(struct spi_device *spi) legacy_mode = of_property_read_bool(spi->dev.of_node, "qca,legacy-mode"); - if (qcaspi_clkspeed == 0) { - if (spi->max_speed_hz) - qcaspi_clkspeed = spi->max_speed_hz; - else - qcaspi_clkspeed = QCASPI_CLK_SPEED; - } + if (qcaspi_clkspeed) + spi->max_speed_hz = qcaspi_clkspeed; + else if (!spi->max_speed_hz) + spi->max_speed_hz = QCASPI_CLK_SPEED; - if ((qcaspi_clkspeed < QCASPI_CLK_SPEED_MIN) || - (qcaspi_clkspeed > QCASPI_CLK_SPEED_MAX)) { - dev_err(&spi->dev, "Invalid clkspeed: %d\n", - qcaspi_clkspeed); + if (spi->max_speed_hz < QCASPI_CLK_SPEED_MIN || + spi->max_speed_hz > QCASPI_CLK_SPEED_MAX) { + dev_err(&spi->dev, "Invalid clkspeed: %u\n", + spi->max_speed_hz); return -EINVAL; } @@ -944,14 +941,13 @@ qca_spi_probe(struct spi_device *spi) return -EINVAL; } - dev_info(&spi->dev, "ver=%s, clkspeed=%d, burst_len=%d, pluggable=%d\n", + dev_info(&spi->dev, "ver=%s, clkspeed=%u, burst_len=%d, pluggable=%d\n", QCASPI_DRV_VERSION, - qcaspi_clkspeed, + spi->max_speed_hz, qcaspi_burst_len, qcaspi_pluggable); spi->mode = SPI_MODE_3; - spi->max_speed_hz = qcaspi_clkspeed; if (spi_setup(spi) < 0) { dev_err(&spi->dev, "Unable to setup SPI device\n"); return -EFAULT; diff --git a/drivers/net/ethernet/qualcomm/qca_spi.h b/drivers/net/ethernet/qualcomm/qca_spi.h index 7ba5c9e2f61c..90b290f94c27 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.h +++ b/drivers/net/ethernet/qualcomm/qca_spi.h @@ -89,7 +89,6 @@ struct qcaspi { #endif /* user configurable options */ - u32 clkspeed; u8 legacy_mode; u16 burst_len; }; -- 2.51.0 From becc6399ce3b724cffe9ccb7ef0bff440bb1b62b Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Fri, 6 Dec 2024 19:46:43 +0100 Subject: [PATCH 11/16] qca_spi: Make driver probing reliable The module parameter qcaspi_pluggable controls if QCA7000 signature should be checked at driver probe (current default) or not. Unfortunately this could fail in case the chip is temporary in reset, which isn't under total control by the Linux host. So disable this check per default in order to avoid unexpected probe failures. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren Link: https://patch.msgid.link/20241206184643.123399-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/qualcomm/qca_spi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index d328e770bcbe..38a779f4b866 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -54,7 +54,7 @@ MODULE_PARM_DESC(qcaspi_burst_len, "Number of data bytes per burst. Use 1-5000." #define QCASPI_PLUGGABLE_MIN 0 #define QCASPI_PLUGGABLE_MAX 1 -static int qcaspi_pluggable = QCASPI_PLUGGABLE_MIN; +static int qcaspi_pluggable = QCASPI_PLUGGABLE_MAX; module_param(qcaspi_pluggable, int, 0); MODULE_PARM_DESC(qcaspi_pluggable, "Pluggable SPI connection (yes/no)."); -- 2.51.0 From af47a328e8130c81984cbcd1f771fc4bb777bd0f Mon Sep 17 00:00:00 2001 From: Geetha sowjanya Date: Thu, 5 Dec 2024 17:04:35 +0530 Subject: [PATCH 12/16] octeontx2-af: Fix installation of PF multicast rule Due to target variable is being reassigned in npc_install_flow() function, PF multicast rules are not getting installed. This patch addresses the issue by fixing the "IF" condition checks when rules are installed by AF. Fixes: 6c40ca957fe5 ("octeontx2-pf: Adds TC offload support"). Signed-off-by: Geetha sowjanya Reviewed-by: Simon Horman Link: https://patch.msgid.link/20241205113435.10601-1-gakula@marvell.com Signed-off-by: Paolo Abeni --- .../ethernet/marvell/octeontx2/af/rvu_npc_fs.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c index da69e454662a..1b765045aa63 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c @@ -1452,23 +1452,21 @@ process_flow: * hence modify pcifunc accordingly. */ - /* AF installing for a PF/VF */ - if (!req->hdr.pcifunc) + if (!req->hdr.pcifunc) { + /* AF installing for a PF/VF */ target = req->vf; - - /* PF installing for its VF */ - if (!from_vf && req->vf && !from_rep_dev) { + } else if (!from_vf && req->vf && !from_rep_dev) { + /* PF installing for its VF */ target = (req->hdr.pcifunc & ~RVU_PFVF_FUNC_MASK) | req->vf; pf_set_vfs_mac = req->default_rule && (req->features & BIT_ULL(NPC_DMAC)); - } - - /* Representor device installing for a representee */ - if (from_rep_dev && req->vf) + } else if (from_rep_dev && req->vf) { + /* Representor device installing for a representee */ target = req->vf; - else + } else { /* msg received from PF/VF */ target = req->hdr.pcifunc; + } /* ignore chan_mask in case pf func is not AF, revisit later */ if (!is_pffunc_af(req->hdr.pcifunc)) -- 2.51.0 From 3ddccbefebdbe0c4c72a248676e4d39ac66a8e26 Mon Sep 17 00:00:00 2001 From: Koichiro Den Date: Fri, 6 Dec 2024 10:10:42 +0900 Subject: [PATCH 13/16] virtio_net: correct netdev_tx_reset_queue() invocation point When virtnet_close is followed by virtnet_open, some TX completions can possibly remain unconsumed, until they are finally processed during the first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash [1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call before RX napi enable") was not sufficient to eliminate all BQL crash cases for virtio-net. This issue can be reproduced with the latest net-next master by running: `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network TX load from inside the machine. netdev_tx_reset_queue() can actually be dropped from virtnet_open path; the device is not stopped in any case. For BQL core part, it's just like traffic nearly ceases to exist for some period. For stall detector added to BQL, even if virtnet_close could somehow lead to some TX completions delayed for long, followed by virtnet_open, we can just take it as stall as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector based on BQL"). Note also that users can still reset stall_max via sysfs. So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This eliminates the BQL crashes. As a result, netdev_tx_reset_queue() is now explicitly required in freeze/restore path. This patch adds it to immediately after free_unused_bufs(), following the rule of thumb: netdev_tx_reset_queue() should follow any SKB freeing not followed by netdev_tx_completed_queue(). This seems the most consistent and streamlined approach, and now netdev_tx_reset_queue() runs whenever free_unused_bufs() is done. [1]: ------------[ cut here ]------------ kernel BUG at lib/dynamic_queue_limits.c:99! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2 Tainted: [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \ BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:dql_completed+0x26b/0x290 Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01 d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff RSP: 0018:ffffc900002b0d08 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff888102398c80 RCX: 0000000080190009 RDX: 0000000000000000 RSI: 000000000000006a RDI: 0000000000000000 RBP: ffff888102398c00 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000000ca R11: 0000000000015681 R12: 0000000000000001 R13: ffffc900002b0d68 R14: ffff88811115e000 R15: ffff8881107aca40 FS: 00007f41ded69500(0000) GS:ffff888667dc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556ccc2dc1a0 CR3: 0000000104fd8003 CR4: 0000000000772ef0 PKRU: 55555554 Call Trace: ? die+0x32/0x80 ? do_trap+0xd9/0x100 ? dql_completed+0x26b/0x290 ? dql_completed+0x26b/0x290 ? do_error_trap+0x6d/0xb0 ? dql_completed+0x26b/0x290 ? exc_invalid_op+0x4c/0x60 ? dql_completed+0x26b/0x290 ? asm_exc_invalid_op+0x16/0x20 ? dql_completed+0x26b/0x290 __free_old_xmit+0xff/0x170 [virtio_net] free_old_xmit+0x54/0xc0 [virtio_net] virtnet_poll+0xf4/0xe30 [virtio_net] ? __update_load_avg_cfs_rq+0x264/0x2d0 ? update_curr+0x35/0x260 ? reweight_entity+0x1be/0x260 __napi_poll.constprop.0+0x28/0x1c0 net_rx_action+0x329/0x420 ? enqueue_hrtimer+0x35/0x90 ? trace_hardirqs_on+0x1d/0x80 ? kvm_sched_clock_read+0xd/0x20 ? sched_clock+0xc/0x30 ? kvm_sched_clock_read+0xd/0x20 ? sched_clock+0xc/0x30 ? sched_clock_cpu+0xd/0x1a0 handle_softirqs+0x138/0x3e0 do_softirq.part.0+0x89/0xc0 __local_bh_enable_ip+0xa7/0xb0 virtnet_open+0xc8/0x310 [virtio_net] __dev_open+0xfa/0x1b0 __dev_change_flags+0x1de/0x250 dev_change_flags+0x22/0x60 do_setlink.isra.0+0x2df/0x10b0 ? rtnetlink_rcv_msg+0x34f/0x3f0 ? netlink_rcv_skb+0x54/0x100 ? netlink_unicast+0x23e/0x390 ? netlink_sendmsg+0x21e/0x490 ? ____sys_sendmsg+0x31b/0x350 ? avc_has_perm_noaudit+0x67/0xf0 ? cred_has_capability.isra.0+0x75/0x110 ? __nla_validate_parse+0x5f/0xee0 ? __pfx___probestub_irq_enable+0x3/0x10 ? __create_object+0x5e/0x90 ? security_capable+0x3b/0x70 rtnl_newlink+0x784/0xaf0 ? avc_has_perm_noaudit+0x67/0xf0 ? cred_has_capability.isra.0+0x75/0x110 ? stack_depot_save_flags+0x24/0x6d0 ? __pfx_rtnl_newlink+0x10/0x10 rtnetlink_rcv_msg+0x34f/0x3f0 ? do_syscall_64+0x6c/0x180 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e ? __pfx_rtnetlink_rcv_msg+0x10/0x10 netlink_rcv_skb+0x54/0x100 netlink_unicast+0x23e/0x390 netlink_sendmsg+0x21e/0x490 ____sys_sendmsg+0x31b/0x350 ? copy_msghdr_from_user+0x6d/0xa0 ___sys_sendmsg+0x86/0xd0 ? __pte_offset_map+0x17/0x160 ? preempt_count_add+0x69/0xa0 ? __call_rcu_common.constprop.0+0x147/0x610 ? preempt_count_add+0x69/0xa0 ? preempt_count_add+0x69/0xa0 ? _raw_spin_trylock+0x13/0x60 ? trace_hardirqs_on+0x1d/0x80 __sys_sendmsg+0x66/0xc0 do_syscall_64+0x6c/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f41defe5b34 Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00 f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55 RSP: 002b:00007ffe5336ecc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f41defe5b34 RDX: 0000000000000000 RSI: 00007ffe5336ed30 RDI: 0000000000000003 RBP: 00007ffe5336eda0 R08: 0000000000000010 R09: 0000000000000001 R10: 00007ffe5336f6f9 R11: 0000000000000202 R12: 0000000000000003 R13: 0000000067452259 R14: 0000556ccc28b040 R15: 0000000000000000 [...] Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits") Cc: # v6.11+ Signed-off-by: Koichiro Den Acked-by: Jason Wang Reviewed-by: Xuan Zhuo [ pabeni: trimmed possibly troublesome separator ] Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 64c87bb48a41..6e0925f7f182 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3054,7 +3054,6 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index) if (err < 0) goto err_xdp_reg_mem_model; - netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qp_index)); virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi); virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi); @@ -6966,11 +6965,20 @@ free: static void remove_vq_common(struct virtnet_info *vi) { + int i; + virtio_reset_device(vi->vdev); /* Free unused buffers in both send and recv, if any. */ free_unused_bufs(vi); + /* + * Rule of thumb is netdev_tx_reset_queue() should follow any + * skb freeing not followed by netdev_tx_completed_queue() + */ + for (i = 0; i < vi->max_queue_pairs; i++) + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i)); + free_receive_bufs(vi); free_receive_page_frags(vi); -- 2.51.0 From 4571dc7272b22cf35c7a5a1b14d3b036a2fefdc5 Mon Sep 17 00:00:00 2001 From: Koichiro Den Date: Fri, 6 Dec 2024 10:10:43 +0900 Subject: [PATCH 14/16] virtio_net: replace vq2rxq with vq2txq where appropriate While not harmful, using vq2rxq where it's always sq appears odd. Replace it with the more appropriate vq2txq for clarity and correctness. Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer") Signed-off-by: Koichiro Den Acked-by: Jason Wang Reviewed-by: Xuan Zhuo Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 6e0925f7f182..fc89c5e1a207 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -6213,7 +6213,7 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf) { struct virtnet_info *vi = vq->vdev->priv; struct send_queue *sq; - int i = vq2rxq(vq); + int i = vq2txq(vq); sq = &vi->sq[i]; -- 2.51.0 From 8d6712c892019b9b9dc5c7039edd3c9d770b510b Mon Sep 17 00:00:00 2001 From: Koichiro Den Date: Fri, 6 Dec 2024 10:10:44 +0900 Subject: [PATCH 15/16] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize() When virtqueue_resize() has actually recycled all unused buffers, additional work may be required in some cases. Relying solely on its return status is fragile, so introduce a new function argument 'recycle_done', which is invoked when the recycle really occurs. Cc: # v6.11+ Signed-off-by: Koichiro Den Acked-by: Jason Wang Reviewed-by: Xuan Zhuo Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 4 ++-- drivers/virtio/virtio_ring.c | 6 +++++- include/linux/virtio.h | 3 ++- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index fc89c5e1a207..e10bc9e6b072 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3331,7 +3331,7 @@ static int virtnet_rx_resize(struct virtnet_info *vi, virtnet_rx_pause(vi, rq); - err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf); + err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf, NULL); if (err) netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err); @@ -3394,7 +3394,7 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq, virtnet_tx_pause(vi, sq); - err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf); + err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL); if (err) netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err); diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 82a7d2cbc704..6af8cf6a619e 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2772,6 +2772,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma); * @_vq: the struct virtqueue we're talking about. * @num: new ring num * @recycle: callback to recycle unused buffers + * @recycle_done: callback to be invoked when recycle for all unused buffers done * * When it is really necessary to create a new vring, it will set the current vq * into the reset state. Then call the passed callback to recycle the buffer @@ -2792,7 +2793,8 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma); * */ int virtqueue_resize(struct virtqueue *_vq, u32 num, - void (*recycle)(struct virtqueue *vq, void *buf)) + void (*recycle)(struct virtqueue *vq, void *buf), + void (*recycle_done)(struct virtqueue *vq)) { struct vring_virtqueue *vq = to_vvq(_vq); int err; @@ -2809,6 +2811,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num, err = virtqueue_disable_and_recycle(_vq, recycle); if (err) return err; + if (recycle_done) + recycle_done(_vq); if (vq->packed_ring) err = virtqueue_resize_packed(_vq, num); diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 57cc4b07fd17..0aa7df4ed5ca 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -109,7 +109,8 @@ dma_addr_t virtqueue_get_avail_addr(const struct virtqueue *vq); dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq); int virtqueue_resize(struct virtqueue *vq, u32 num, - void (*recycle)(struct virtqueue *vq, void *buf)); + void (*recycle)(struct virtqueue *vq, void *buf), + void (*recycle_done)(struct virtqueue *vq)); int virtqueue_reset(struct virtqueue *vq, void (*recycle)(struct virtqueue *vq, void *buf)); -- 2.51.0 From 1480f0f61b675567ca5d0943d6ef2e39172dcafd Mon Sep 17 00:00:00 2001 From: Koichiro Den Date: Fri, 6 Dec 2024 10:10:45 +0900 Subject: [PATCH 16/16] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to be reset when flushing has actually occurred. Add virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset() to handle this. Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits") Cc: # v6.11+ Signed-off-by: Koichiro Den Acked-by: Jason Wang Reviewed-by: Xuan Zhuo Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index e10bc9e6b072..3a0341cc6085 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -503,6 +503,7 @@ struct virtio_net_common_hdr { static struct virtio_net_common_hdr xsk_hdr; static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf); +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq); static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp, struct net_device *dev, unsigned int *xdp_xmit, @@ -3394,7 +3395,8 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq, virtnet_tx_pause(vi, sq); - err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL); + err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, + virtnet_sq_free_unused_buf_done); if (err) netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err); @@ -6233,6 +6235,14 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf) } } +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq) +{ + struct virtnet_info *vi = vq->vdev->priv; + int i = vq2txq(vq); + + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i)); +} + static void free_unused_bufs(struct virtnet_info *vi) { void *buf; -- 2.51.0