Jason Wang [Fri, 24 Aug 2018 08:53:13 +0000 (16:53 +0800)]
vhost: correctly check the iova range when waking virtqueue
We don't wakeup the virtqueue if the first byte of pending iova range
is the last byte of the range we just got updated. This will lead a
virtqueue to wait for IOTLB updating forever. Fixing by correct the
check and wake up the virtqueue in this case.
Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Reported-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Thu, 23 Aug 2018 20:20:52 +0000 (13:20 -0700)]
qlge: Fix netdev features configuration.
qlge_fix_features() is not supposed to modify hardware or
driver state, rather it is supposed to only fix requested
fetures bits. Currently qlge_fix_features() also goes for
interface down and up unnecessarily if there is not even
any change in features set.
This patch changes/fixes following -
1) Move reload of interface or device re-config from
qlge_fix_features() to qlge_set_features().
2) Reload of interface in qlge_set_features() only if
relevant feature bit (NETIF_F_HW_VLAN_CTAG_RX) is changed.
3) Get rid of qlge_fix_features() since driver is not really
required to fix any features bit.
Signed-off-by: Manish <manish.chopra@cavium.com> Reviewed-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Anssi Hannula [Thu, 23 Aug 2018 07:45:22 +0000 (10:45 +0300)]
net: macb: do not disable MDIO bus at open/close time
macb_reset_hw() is called from macb_close() and indirectly from
macb_open(). macb_reset_hw() zeroes the NCR register, including the MPE
(Management Port Enable) bit.
This will prevent accessing any other PHYs for other Ethernet MACs on
the MDIO bus, which remains registered at macb_reset_hw() time, until
macb_init_hw() is called from macb_open() which sets the MPE bit again.
I.e. currently the MDIO bus has a short disruption at open time and is
disabled at close time until the interface is opened again.
Fix that by only touching the RE and TE bits when enabling and disabling
RX/TX.
v2: Make macb_init_hw() NCR write a single statement.
Fixes: 6c36a7074436 ("macb: Use generic PHY layer") Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
All legacy clock implementations now implement clk_set_rate() (Some
implementations may be dummies, though).
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Arnd Bergmann <arnd@arnd.de> Acked-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 739de9a1563a ("net: macb: Reorganize macb_mii bringup") broke
initializing macb on the EVB-KSZ9477 eval board.
There, of_mdiobus_register was called even for the fixed-link representing
the RGMII-link to the switch with the result that the driver attempts to
enumerate PHYs on a non-existent MDIO bus:
libphy: MACB_mii_bus: probed
mdio_bus f0028000.ethernet-ffffffff: fixed-link has invalid PHY address
mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 0
[snip]
mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 31
The "MDIO" bus registration succeeds regardless, having claimed the reset GPIO,
and calling of_phy_register_fixed_link later on fails because it tries
to claim the same GPIO:
Fix this by registering the fixed-link before calling mdiobus_register.
Fixes: 739de9a1563a ("net: macb: Reorganize macb_mii bringup") Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 24 Aug 2018 12:41:35 +0000 (15:41 +0300)]
mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge
When a bridge device is removed, the VLANs are flushed from each
configured port. This causes the ports to decrement the reference count
on the associated FIDs (filtering identifier). If the reference count of
a FID is 1 and it has a RIF (router interface), then this RIF is
destroyed.
However, if no port is member in the VLAN for which a RIF exists, then
the RIF will continue to exist after the removal of the bridge. To
reproduce:
# ip link add name br0 type bridge vlan_filtering 1
# ip link set dev swp1 master br0
# ip link add link br0 name br0.10 type vlan id 10
# ip address add 192.0.2.0/24 dev br0.10
# ip link del dev br0
The RIF associated with br0.10 continues to exist.
Fix this by iterating over all the bridge device uppers when it is
destroyed and take care of destroying their RIFs.
Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 25 Aug 2018 06:40:38 +0000 (23:40 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-08-24
This series contains fixes to e1000, igb, ixgb, ixgbe and i40e.
YueHaibing from Huawei provides a change to use dma_zalloc_coherent()
instead of calls to allocator followed by a memset for ixgb.
Bo Chen provides a couple of fixes for e1000, first by adding a check to
prevent a NULL pointer dereference. The second change is to clean up a
possible resource leak on old transmit and receive rings when the device
is not up.
Jesus fixes an issue in the usage of an advanced transmit context
descriptor for retrieving the timestamp of a packet for AF_PACKET if the
IGB_TX_FLAGS_VLAN is not set in igb.
Jia-Ju Bai provides several patches which replace GFP_ATOMIC with
GFP_KERNEL, when using kzalloc() and kcalloc() which is not necessary.
Also found an instance of mdelay() call which could be replaced with
msleep().
Tony fixes ixgbe to allow MTU changes with XDP, by adding checks to
ensure only supported values and return -EINVAL for when it is not
supported.
Sebastian fixed an issue that was not clearing VF mailbox memory and
ensure queues are re-enabled correctly.
Martyna fixes a transmit timeout when DCB is configured when bringing up
an interface.
Jake fixes a previous commit which accidentally reversed the check of
the data pointer, so we can accurately count the size of the stats.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Mon, 20 Aug 2018 15:12:27 +0000 (08:12 -0700)]
i40e: fix condition of WARN_ONCE for stat strings
Commit 9b10df596bd4 ("i40e: use WARN_ONCE to replace the commented
BUG_ON size check") introduced a warning check to make sure
that the size of the stat strings was always the expected value. This
code accidentally inverted the check of the data pointer. Fix this so
that we accurately count the size of the stats we copied in.
This fixes an erroneous WARN kernel splat that occurs when requesting
ethtool statistics.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Tested-by: Mauro S M Rodrigues <maurosr@linux.vnet.ibm.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Martyna Szapar [Wed, 8 Aug 2018 00:11:23 +0000 (17:11 -0700)]
i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled
If interface is connected to switch port configured for DCB there are
TX timeouts when bringing up interface. Problem started appearing after
adding in i40e driver code mqprio hardware offload mode. In function
i40e_vsi_configure_bw_alloc was added resetting BW rate which should
be executing when mqprio qdisc is removed but was also when there was
no mqprio qdisc added and DCB was enabled. In this patch was added
additional check for DCB flag so now when DCB is enabled the correct
DCB configs from before mqprio patch are restored.
Signed-off-by: Martyna Szapar <martyna.szapar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sebastian Basierski [Tue, 31 Jul 2018 16:16:00 +0000 (18:16 +0200)]
ixgbe: fix driver behaviour after issuing VFLR
Since VFLR doesn't clear VFMBMEM (VF Mailbox Memory)
and is not re-enabling queues correctly we should fix
this behavior.
Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Mon, 30 Jul 2018 22:52:48 +0000 (15:52 -0700)]
ixgbe: Prevent unsupported configurations with XDP
These changes address comments by Jakub Kicinski on
commit 38b7e7f8ae82 ("ixgbe: Do not allow LRO or MTU change with XDP").
Change the MTU check with XDP to allow any supported value and only
reject those outside of the range as opposed to rejecting any change
when XDP is active. In situations where MTU size is not supported,
return -EINVAL instead of -EPERM.
Add checks when enabling SRIOV, DCB, or adding L2FW offloaded device
as they are not supported with XDP.
CC: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jia-Ju Bai [Fri, 27 Jul 2018 08:22:31 +0000 (16:22 +0800)]
ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
ixgbe_fcoe_ddp_setup(), ixgbe_setup_fcoe_ddp_resources() and
ixgbe_sw_init() are never called in atomic context.
They call kmalloc(), dma_pool_alloc() and kzalloc() with GFP_ATOMIC,
which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by: Sebastian Basierski <sebastianx.basierski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jia-Ju Bai [Fri, 27 Jul 2018 08:07:38 +0000 (16:07 +0800)]
igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
igb_integrated_phy_loopback() is never called in atomic context.
It calls mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep().
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jia-Ju Bai [Fri, 27 Jul 2018 08:04:53 +0000 (16:04 +0800)]
igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
igb_sw_init() is never called in atomic context.
It calls kzalloc() and kcalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesus Sanchez-Palencia [Thu, 26 Jul 2018 17:20:38 +0000 (10:20 -0700)]
igb: Use an advanced ctx descriptor for launchtime
On i210, Launchtime (TxTime) requires the usage of an "Advanced
Transmit Context Descriptor" for retrieving the timestamp of a packet.
The igb driver correctly builds such descriptor on the segmentation
flow (i.e. igb_tso()) or on the checksum one (i.e. igb_tx_csum()), but the
feature is broken for AF_PACKET if the IGB_TX_FLAGS_VLAN is not set,
which happens due to an early return on igb_tx_csum().
This flag is only set by the kernel when a VLAN interface is used,
thus we can't just rely on it. Here we are fixing this issue by checking
if launchtime is enabled for the current tx_ring before performing the
early return.
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bo Chen [Mon, 23 Jul 2018 16:01:30 +0000 (09:01 -0700)]
e1000: ensure to free old tx/rx rings in set_ringparam()
In 'e1000_set_ringparam()', the tx_ring and rx_ring are updated with new value
and the old tx/rx rings are freed only when the device is up. There are resource
leaks on old tx/rx rings when the device is not up. This bug is reported by COD,
a tool for testing kernel module binaries I am building.
This patch fixes the bug by always calling 'kfree()' on old tx/rx rings in
'e1000_set_ringparam()'.
Signed-off-by: Bo Chen <chenbo@pdx.edu> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bo Chen [Mon, 23 Jul 2018 16:01:29 +0000 (09:01 -0700)]
e1000: check on netif_running() before calling e1000_up()
When the device is not up, the call to 'e1000_up()' from the error handling path
of 'e1000_set_ringparam()' causes a kernel oops with a null-pointer
dereference. The null-pointer dereference is triggered in function
'e1000_alloc_rx_buffers()' at line 'buffer_info = &rx_ring->buffer_info[i]'.
This bug was reported by COD, a tool for testing kernel module binaries I am
building. This bug was also detected by KFI from Dr. Kai Cong.
This patch fixes the bug by checking on 'netif_running()' before calling
'e1000_up()' in 'e1000_set_ringparam()'.
Signed-off-by: Bo Chen <chenbo@pdx.edu> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix BPF sockmap and tls where we get a hang in do_tcp_sendpages()
when sndbuf is full due to missing calls into underlying socket's
sk_write_space(), from John.
2) Two BPF sockmap fixes to reject invalid parameters on map creation
and to fix a map element miscount on allocation failure. Another fix
for BPF hash tables to use per hash table salt for jhash(), from Daniel.
3) Fix for bpftool's command line parsing in order to terminate on bad
arguments instead of keeping looping in some border cases, from Quentin.
4) Fix error value of xdp_umem_assign_dev() in order to comply with
expected bind ops error codes, from Prashant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Aug 2018 19:37:41 +0000 (12:37 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2018-08-23
This series contains bug fixes to the ice driver.
Anirudh provides several fixes, starting with static analysis fixes by
replacing a bitwise-and with a constant value and replace "magic"
numbers with defines. Fixed the control queue processing by removing
unnecessary read/writes to registers, as well as getting a accurate
value for "pending". Added additional checks to avoid NULL pointer
dereferences. Fixed up code formatting issues, by cleaning up code
comments and coding style.
Bruce cleans up a duplicate check for owner, within the same function.
Also cleans up interrupt causes that are not handled or applicable.
Fix checkpatch warning about the use of bool in structures due to the
wasted space and size of bool, so convert struct members to u8 instead.
Jake fixes a number of potential bugs in the reporting of stats via
ethtool, by simply reporting all the queue statistics, even for the
queues that are not activated. Fixed a compiler warning, as well as
make the code a bit cleaner but just using order_base_2() for
calculating the power-of-2.
Preethi adds a check to avoid a NULL pointer dereference crash during
initialization.
Brett clarifies the code when it comes to port VLANs and regular VLANs,
by renaming defines and field values to match their intended use and
purpose.
Jesse initializes a variable to avoid garbage values being returned to
the caller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Thu, 9 Aug 2018 13:29:02 +0000 (06:29 -0700)]
ice: Change struct members from bool to u8
Recent versions of checkpatch have a new warning based on a documented
preference of Linus to not use bool in structures due to wasted space and
the size of bool is implementation dependent. For more information, see
the email thread at https://lkml.org/lkml/2017/11/21/384.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anirudh Venkataramanan [Thu, 9 Aug 2018 13:29:00 +0000 (06:29 -0700)]
ice: Fix a few null pointer dereference issues
1) When ice_ena_msix_range() fails to reserve vectors, a devm_kfree()
warning was seen in the error flow path. So check pf->irq_tracker
before use in ice_clear_interrupt_scheme().
2) In ice_vsi_cfg(), check vsi->netdev before use.
3) In ice_get_link_status, check link_up before use.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Thu, 9 Aug 2018 13:28:59 +0000 (06:28 -0700)]
ice: Update to interrupts enabled in OICR
Remove the following interrupt causes that are not applicable or not
handled:
- PFINT_OICR_HLP_RDY_M
- PFINT_OICR_CPM_RDY_M
- PFINT_OICR_GPIO_M
- PFINT_OICR_STORM_DETECT_M
Add the following interrupt cause that's actually handled in ice_misc_intr:
- PFINT_OICR_PE_CRITERR_M
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Quentin Monnet [Thu, 23 Aug 2018 16:46:25 +0000 (17:46 +0100)]
tools: bpftool: return from do_event_pipe() on bad arguments
When command line parsing fails in the while loop in do_event_pipe()
because the number of arguments is incorrect or because the keyword is
unknown, an error message is displayed, but bpftool remains stuck in
the loop. Make sure we exit the loop upon failure.
Brett Creeley [Thu, 9 Aug 2018 13:28:58 +0000 (06:28 -0700)]
ice: Set VLAN flags correctly
In the struct ice_aqc_vsi_props the field port_vlan_flags is an
overloaded term because it is used for both port VLANs (PVLANs) and
regular VLANs. This is an issue and is very confusing especially when
dealing with VFs because normal VLANs and port VLANs are not the same.
To fix this the field was renamed to vlan_flags and all of the #define's
labeled *_PVLAN_* were renamed to *_VLAN_* if they are not specific to
port VLANs.
Also in ice_vsi_manage_vlan_stripping, set the ICE_AQ_VSI_VLAN_MODE_ALL
bit to allow the driver to add a VLAN tag to all packets it sends.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 9 Aug 2018 13:28:57 +0000 (06:28 -0700)]
ice: Use order_base_2 to calculate higher power of 2
Currently, we use a combination of ilog2 and is_power_of_2() to
calculate the next power of 2 for the qcount. This appears to be causing
a warning on some combinations of GCC and the Linux kernel:
This appears to because because GCC realizes that qcount could be zero
in some circumstances and thus attempts to link against the
intentionally undefined ___ilog2_NaN function.
The order_base_2 function is intentionally defined to return 0 when
passed 0 as an argument, and thus will be safe to use here.
This not only fixes the warning but makes the resulting code slightly
cleaner, and is really what we should have used originally.
Also update the comment to make it more clear that we are rounding up,
not just incrementing the ilog2 of qcount unconditionally.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anirudh Venkataramanan [Thu, 9 Aug 2018 13:28:56 +0000 (06:28 -0700)]
ice: Fix bugs in control queue processing
This patch is a consolidation of multiple bug fixes for control queue
processing.
1) In ice_clean_adminq_subtask() remove unnecessary reads/writes to
registers. The bits PFINT_FW_CTL, PFINT_MBX_CTL and PFINT_SB_CTL
are not set when an interrupt arrives, which means that clearing them
again can be omitted.
2) Get an accurate value in "pending" by re-reading the control queue
head register from the hardware.
3) Fix a corner case involving lost control queue messages by checking
for new control messages (using ice_ctrlq_pending) before exiting the
cleanup routine.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Preethi Banala [Thu, 9 Aug 2018 13:28:55 +0000 (06:28 -0700)]
ice: Clean control queues only when they are initialized
Clean control queues only when they are initialized. One of the ways to
validate if the basic initialization is done is by checking value of
cq->sq.head and cq->rq.head variables that specify the register address.
This patch adds a check to avoid NULL pointer dereference crash when tried
to shutdown uninitialized control queue.
Signed-off-by: Preethi Banala <preethi.banala@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 9 Aug 2018 13:28:54 +0000 (06:28 -0700)]
ice: Report stats for allocated queues via ethtool stats
It is not safe to have the string table for statistics change order or
size over the lifetime of a given netdevice. This is because of the
nature of the 3-step process for obtaining stats. First, user space
performs a request for the size of the strings table. Second it performs
a separate request for the strings themselves, after allocating space
for the table. Third, it requests the stats themselves, also allocating
space for the table.
If the size decreased, there is potential to see garbage data or stats
values. In the worst case, we could potentially see stats values become
mis-aligned with their strings, so that it looks like a statistic is
being reported differently than it actually is.
Even worse, if the size increased, there is potential that the strings
table or stats table was not allocated large enough and the stats code
could access and write to memory it should not, potentially resulting in
undefined behavior and system crashes.
It isn't even safe if the size always changes under the RTNL lock. This
is because the calls take place over multiple user space commands, so it
is not possible to hold the RTNL lock for the entire duration of
obtaining strings and stats. Further, not all consumers of the ethtool
API are the user space ethtool program, and it is possible that one
assumes the strings will not change (valid under the current contract),
and thus only requests the stats values when requesting stats in a loop.
Finally, it's not possible in the general case to detect when the size
changes, because it is quite possible that one value which could impact
the stat size increased, while another decreased. This would result in
the same total number of stats, but reordering them so that stats no
longer line up with the strings they belong to. Since only size changes
aren't enough, we would need some sort of hash or token to determine
when the strings no longer match. This would require extending the
ethtool stats commands, but there is no more space in the relevant
structures.
The real solution to resolve this would be to add a completely new API
for stats, probably over netlink.
In the ice driver, the only thing impacting the stats that is not
constant is the number of queues. Instead of reporting stats for each
used queue, report stats for each allocated queue. We do not change the
number of queues allocated for a given netdevice, as we pass this into
the alloc_etherdev_mq() function to set the num_tx_queues and
num_rx_queues.
This resolves the potential bugs at the slight cost of displaying many
queue statistics which will not be activated.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Daniel Borkmann [Wed, 22 Aug 2018 21:49:37 +0000 (23:49 +0200)]
bpf: use per htab salt for bucket hash
All BPF hash and LRU maps currently have a known and global seed
we feed into jhash() which is 0. This is suboptimal, thus fix it
by generating a random seed upon hashtab setup time which we can
later on feed into jhash() on lookup, update and deletions.
Fixes: 0f8e4bd8a1fc8 ("bpf: add hashtable type of eBPF maps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Reviewed-by: Eduardo Valentin <eduval@amazon.com>
Bruce Allan [Thu, 9 Aug 2018 13:28:52 +0000 (06:28 -0700)]
ice: Remove unnecessary node owner check
There is already a check for owner == ICE_SCHED_NODE_OWNER_LAN at the
beginning of ice_sched_update_vsi_child_nodes. Remove the additional
check to address the static analysis tool smatch issue "warn: we tested
'owner' before and it was 'false'".
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anirudh Venkataramanan [Thu, 9 Aug 2018 13:28:51 +0000 (06:28 -0700)]
ice: Fix multiple static analyser warnings
This patch fixes the following smatch errors:
1) Fix "odd binop '0x0 & 0xc'" when performing the bitwise-and with a
constant value of zero (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128_FLAG).
Remove a similar bitwise-and with 0 in ice_add_marker_act() and use the
right mask ICE_LG_ACT_GENERIC_OFFSET_M in the expression.
2) Fix a similar issue "odd binop '0x0 & 0x1800' in ice_req_irq_msix_misc.
3) Fix "odd binop '0x380000 & 0x7fff8'" in ice_add_marker_act(). Also, use
a new define ICE_LG_ACT_GENERIC_OFF_RX_DESC_PROF_IDX instead of magic
number '7'.
4) Fix warn: odd binop '0x0 & 0x18' in ice_set_dflt_vsi_ctx() by removing
unnecessary logic to explicitly unset bits 3 and 4 in port_vlan_bits.
These bits are unset already by the memset on ctxt->info.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Huazhong Tan [Thu, 23 Aug 2018 03:37:16 +0000 (11:37 +0800)]
net: hns3: modify variable type in hns3_nic_reuse_page
'truesize' is supposed to be u32, not int, so fix it.
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 23 Aug 2018 03:37:15 +0000 (11:37 +0800)]
net: hns3: fix page_offset overflow when CONFIG_ARM64_64K_PAGES
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of
PAGE_SIZE is 65536(64K). But the type of page_offset is u16, it will
overflow. So change it to u32, when "CONFIG_ARM64_64K_PAGES" enabled.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Thu, 23 Aug 2018 03:31:37 +0000 (11:31 +0800)]
net/ipv6: init ip6 anycast rt->dst.input as ip6_input
Commit 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
forgot to handle anycast route and init anycast rt->dst.input to ip6_forward.
Fix it by setting anycast rt->dst.input back to ip6_input.
Fixes: 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 23 Aug 2018 03:10:13 +0000 (11:10 +0800)]
net: hns: use eth_get_headlen interface instead of hns_nic_get_headlen
Update hns to drop the hns_nic_get_headlen function in favour of
eth_get_headlen, and hence also removes now redundant hns_nic_get_headlen.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 23 Aug 2018 03:10:12 +0000 (11:10 +0800)]
net: hns: fix skb->truesize underestimation
skb->truesize is not meant to be tracking amount of used bytes in a skb,
but amount of reserved/consumed bytes in memory.
For instance, if we use a single byte in last page fragment, we have to
account the full size of the fragment.
So skb_add_rx_frag needs to calculate the length of the entire buffer into
turesize.
Fixes: 9cbe9fd5214e ("net: hns: optimize XGE capability by reducing cpu usage") Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 23 Aug 2018 03:10:11 +0000 (11:10 +0800)]
net: hns: modify variable type in hns_nic_reuse_page
'truesize' is supposed to be u32, not int, so fix it.
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 23 Aug 2018 03:10:10 +0000 (11:10 +0800)]
net: hns: fix length and page_offset overflow when CONFIG_ARM64_64K_PAGES
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE
is 65536(64K). But the type of length and page_offset are u16, they will
overflow. So change them to u32.
Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework support") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Aug 2018 04:45:32 +0000 (21:45 -0700)]
Merge branch 'tcp_bbr-PROBE_RTT-minor-bug-fixes'
Kevin Yang says:
====================
tcp_bbr: PROBE_RTT minor bug fixes
This series includes two minor bug fixes for the TCP BBR PROBE_RTT
mechanism, and one preparatory patch:
(1) A preparatory patch to reorganize the PROBE_RTT logic by refactoring
(into its own function) the code to exit PROBE_RTT, since the next
patch will be using that code in a new context.
(2) Fix: When BBR restarts from idle and if BBR is in PROBE_RTT mode,
BBR should check if it's time to exit PROBE_RTT. If yes, then BBR
should exit PROBE_RTT mode and restore the cwnd to its full value.
(3) Fix: Apply the PROBE_RTT cwnd cap even if the count of fully-ACKed
packets is 0.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Yang [Wed, 22 Aug 2018 21:43:16 +0000 (17:43 -0400)]
tcp_bbr: apply PROBE_RTT cwnd cap even if acked==0
This commit fixes a corner case where TCP BBR would enter PROBE_RTT
mode but not reduce its cwnd. If a TCP receiver ACKed less than one
full segment, the number of delivered/acked packets was 0, so that
bbr_set_cwnd() would short-circuit and exit early, without cutting
cwnd to the value we want for PROBE_RTT.
The fix is to instead make sure that even when 0 full packets are
ACKed, we do apply all the appropriate caps, including the cap that
applies in PROBE_RTT mode.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Yang [Wed, 22 Aug 2018 21:43:15 +0000 (17:43 -0400)]
tcp_bbr: in restart from idle, see if we should exit PROBE_RTT
This patch fix the case where BBR does not exit PROBE_RTT mode when
it restarts from idle. When BBR restarts from idle and if BBR is in
PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If
yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its
full value.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Yang [Wed, 22 Aug 2018 21:43:14 +0000 (17:43 -0400)]
tcp_bbr: add bbr_check_probe_rtt_done() helper
This patch add a helper function bbr_check_probe_rtt_done() to
1. check the condition to see if bbr should exit probe_rtt mode;
2. process the logic of exiting probe_rtt mode.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 22 Aug 2018 20:30:45 +0000 (13:30 -0700)]
ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state
tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally
to send some control packets.
1) RST packets, through tcp_v4_send_reset()
2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack()
These packets assert IP_DF, and also use the hashed IP ident generator
to provide an IPv4 ID number.
Geoff Alexander reported this could be used to build off-path attacks.
These packets should not be fragmented, since their size is smaller than
IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment,
regardless of inner IPID.
We really can use zero IPID, to address the flaw, and as a bonus,
avoid a couple of atomic operations in ip_idents_reserve()
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Geoff Alexander <alexandg@cs.unm.edu> Tested-by: Geoff Alexander <alexandg@cs.unm.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 22 Aug 2018 19:58:34 +0000 (12:58 -0700)]
addrconf: reduce unnecessary atomic allocations
All the 3 callers of addrconf_add_mroute() assert RTNL
lock, they don't take any additional lock either, so
it is safe to convert it to GFP_KERNEL.
Same for sit_add_v4_addrs().
Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 22 Aug 2018 15:25:44 +0000 (17:25 +0200)]
net_sched: fix unused variable warning in stmmac
The new tcf_exts_for_each_action() macro doesn't reference its
arguments when CONFIG_NET_CLS_ACT is disabled, which leads to
a harmless warning in at least one driver:
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c: In function 'tc_fill_actions':
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c:64:6: error: unused variable 'i' [-Werror=unused-variable]
Adding a cast to void lets us avoid this kind of warning.
To be on the safe side, do it for all three arguments, not
just the one that caused the warning.
Fixes: 244cd96adb5f ("net_sched: remove list_head from tc_action") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Wed, 22 Aug 2018 10:29:43 +0000 (12:29 +0200)]
sch_cake: Fix TC filter flow override and expand it to hosts as well
The TC filter flow mapping override completely skipped the call to
cake_hash(); however that meant that the internal state was not being
updated, which ultimately leads to deadlocks in some configurations. Fix
that by passing the overridden flow ID into cake_hash() instead so it can
react appropriately.
In addition, the major number of the class ID can now be set to override
the host mapping in host isolation mode. If both host and flow are
overridden (or if the respective modes are disabled), flow dissection and
hashing will be skipped entirely; otherwise, the hashing will be kept for
the portions that are not set by the filter.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Mendoza-Jonas [Wed, 22 Aug 2018 04:57:44 +0000 (14:57 +1000)]
net/ncsi: Fixup .dumpit message flags and ID check in Netlink handler
The ncsi_pkg_info_all_nl() .dumpit handler is missing the NLM_F_MULTI
flag, causing additional package information after the first to be lost.
Also fixup a sanity check in ncsi_write_package_info() to reject out of
range package IDs.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wolfram Sang [Tue, 21 Aug 2018 22:02:19 +0000 (00:02 +0200)]
net: ethernet: renesas: use SPDX identifier for Renesas drivers
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 22 Aug 2018 15:37:37 +0000 (08:37 -0700)]
bpf: sockmap: write_space events need to be passed to TCP handler
When sockmap code is using the stream parser it also handles the write
space events in order to handle the case where (a) verdict redirects
skb to another socket and (b) the sockmap then sends the skb but due
to memory constraints (or other EAGAIN errors) needs to do a retry.
But the initial code missed a third case where the
skb_send_sock_locked() triggers an sk_wait_event(). A typically case
would be when sndbuf size is exceeded. If this happens because we
do not pass the write_space event to the lower layers we never wake
up the event and it will wait for sndtimeo. Which as noted in ktls
fix may be rather large and look like a hang to the user.
To reproduce the best test is to reduce the sndbuf size and send
1B data chunks to stress the memory handling. To fix this pass the
event from the upper layer to the lower layer.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Wed, 22 Aug 2018 15:37:32 +0000 (08:37 -0700)]
tls: possible hang when do_tcp_sendpages hits sndbuf is full case
Currently, the lower protocols sk_write_space handler is not called if
TLS is sending a scatterlist via tls_push_sg. However, normally
tls_push_sg calls do_tcp_sendpage, which may be under memory pressure,
that in turn may trigger a wait via sk_wait_event. Typically, this
happens when the in-flight bytes exceed the sdnbuf size. In the normal
case when enough ACKs are received sk_write_space() will be called and
the sk_wait_event will be woken up allowing it to send more data
and/or return to the user.
But, in the TLS case because the sk_write_space() handler does not
wake up the events the above send will wait until the sndtimeo is
exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it look like a
hang to the user (especially this impatient user). To fix this pass
the sk_write_space event to the lower layers sk_write_space event
which in the TCP case will wake any pending events.
I observed the above while integrating sockmap and ktls. It
initially appeared as test_sockmap (modified to use ktls) occasionally
hanging. To reliably reproduce this reduce the sndbuf size and stress
the tls layer by sending many 1B sends. This results in every byte
needing a header and each byte individually being sent to the crypto
layer.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Wed, 22 Aug 2018 16:09:17 +0000 (18:09 +0200)]
bpf, sockmap: fix sock hash count in alloc_sock_hash_elem
When we try to allocate a new sock hash entry and the allocation
fails, then sock hash map fails to reduce the map element counter,
meaning we keep accounting this element although it was never used.
Fix it by dropping the element counter on error.
Fixes: 81110384441a ("bpf: sockmap, add hash map support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com>
Daniel Borkmann [Tue, 21 Aug 2018 13:55:00 +0000 (15:55 +0200)]
bpf, sockmap: fix sock_hash_alloc and reject zero-sized keys
Currently, it is possible to create a sock hash map with key size
of 0 and have the kernel return a fd back to user space. This is
invalid for hash maps (and kernel also hasn't been tested for zero
key size support in general at this point). Thus, reject such
configuration.
Fixes: 81110384441a ("bpf: sockmap, add hash map support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com>
Prashant Bhole [Mon, 20 Aug 2018 00:54:25 +0000 (09:54 +0900)]
xsk: fix return value of xdp_umem_assign_dev()
s/ENOTSUPP/EOPNOTSUPP/ in function umem_assign_dev().
This function's return value is directly returned by xsk_bind().
EOPNOTSUPP is bind()'s possible return value.
Fixes: f734607e819b ("xsk: refactor xdp_umem_assign_dev()") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cong Wang [Sun, 19 Aug 2018 19:22:13 +0000 (12:22 -0700)]
act_ife: fix a potential deadlock
use_all_metadata() acquires read_lock(&ife_mod_lock), then calls
add_metainfo() which calls find_ife_oplist() which acquires the same
lock again. Deadlock!
Introduce __add_metainfo() which accepts struct tcf_meta_ops *ops
as an additional parameter and let its callers to decide how
to find it. For use_all_metadata(), it already has ops, no
need to find it again, just call __add_metainfo() directly.
And, as ife_mod_lock is only needed for find_ife_oplist(),
this means we can make non-atomic allocation for populate_metalist()
now.
Fixes: 817e9f2c5c26 ("act_ife: acquire ife_mod_lock before reading ifeoplist") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sun, 19 Aug 2018 19:22:12 +0000 (12:22 -0700)]
act_ife: move tcfa_lock down to where necessary
The only time we need to take tcfa_lock is when adding
a new metainfo to an existing ife->metalist. We don't need
to take tcfa_lock so early and so broadly in tcf_ife_init().
This means we can always take ife_mod_lock first, avoid the
reverse locking ordering warning as reported by Vlad.
Reported-by: Vlad Buslov <vladbu@mellanox.com> Tested-by: Vlad Buslov <vladbu@mellanox.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sun, 19 Aug 2018 19:22:11 +0000 (12:22 -0700)]
Revert "net: sched: act_ife: disable bh when taking ife_mod_lock"
This reverts commit 42c625a486f3 ("net: sched: act_ife: disable bh
when taking ife_mod_lock"), because what ife_mod_lock protects
is absolutely not touched in rate est timer BH context, they have
no race.
A better fix is following up.
Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sun, 19 Aug 2018 19:22:09 +0000 (12:22 -0700)]
net_sched: remove list_head from tc_action
After commit 90b73b77d08e, list_head is no longer needed.
Now we just need to convert the list iteration to array
iteration for drivers.
Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions") Cc: Jiri Pirko <jiri@mellanox.com> Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sun, 19 Aug 2018 19:22:06 +0000 (12:22 -0700)]
net_sched: remove unnecessary ops->delete()
All ops->delete() wants is getting the tn->idrinfo, but we already
have tc_action before calling ops->delete(), and tc_action has
a pointer ->idrinfo.
More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().
So it can be just removed.
Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops") Cc: Jiri Pirko <jiri@mellanox.com> Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sun, 19 Aug 2018 19:22:05 +0000 (12:22 -0700)]
net_sched: improve and refactor tcf_action_put_many()
tcf_action_put_many() is mostly called to clean up actions on
failure path, but tcf_action_put_many(&actions[acts_deleted]) is
used in the ugliest way: it passes a slice of the array and
uses an additional NULL at the end to avoid out-of-bound
access.
acts_deleted is completely unnecessary since we can teach
tcf_action_put_many() scan the whole array and checks against
NULL pointer. Which also means tcf_action_delete() should
set deleted action pointers to NULL to avoid double free.
Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions") Cc: Jiri Pirko <jiri@mellanox.com> Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Tue, 21 Aug 2018 17:40:38 +0000 (10:40 -0700)]
hv_netvsc: ignore devices that are not PCI
Registering another device with same MAC address (such as TAP, VPN or
DPDK KNI) will confuse the VF autobinding logic. Restrict the search
to only run if the device is known to be a PCI attached VF.
Fixes: e8ff40d4bff1 ("hv_netvsc: improve VF device matching") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yue Haibing [Tue, 21 Aug 2018 14:05:42 +0000 (14:05 +0000)]
rds: tcp: remove duplicated include from tcp.c
Remove duplicated include.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Tue, 14 Aug 2018 15:10:31 +0000 (10:10 -0500)]
Bluetooth: mediatek: Fix memory leak
In case memory resources for *fw* were allocated, release them before
return.
Addresses-Coverity-ID: 1472611 ("Resource leak") Fixes: 7237c4c9ec92 ("Bluetooth: mediatek: Add protocol support for MediaTek serial devices") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
David Ahern [Mon, 20 Aug 2018 20:02:41 +0000 (13:02 -0700)]
net/ipv6: Put lwtstate when destroying fib6_info
Prior to the introduction of fib6_info lwtstate was managed by the dst
code. With fib6_info releasing lwtstate needs to be done when the struct
is freed.
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kai-Heng Feng [Mon, 20 Aug 2018 04:43:51 +0000 (12:43 +0800)]
r8152: disable RX aggregation on new Dell TB16 dock
There's a new Dell TB16 dock with a different iSerialNumber.
Apply the same fix from commit 0b1655143df0 ("r8152: disable RX
aggregation on Dell TB16 dock") to this model.
BugLink: https://bugs.launchpad.net/bugs/1785780 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Sun, 19 Aug 2018 21:01:45 +0000 (00:01 +0300)]
qed: Avoid sending mailbox commands when MFW is not responsive
Keep sending mailbox commands to the MFW when it is not responsive ends up
with a redundant amount of timeout expiries.
This patch prints the MCP status on the first command which is not
responded, and blocks the following commands.
Since the (un)load request commands might be not responded due to other
PFs, the patch also adds the option to skip the blocking upon a failure.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Sun, 19 Aug 2018 21:01:44 +0000 (00:01 +0300)]
qed: Prevent a possible deadlock during driver load and unload
The MFW manages an internal lock to prevent concurrent hardware
(de)initialization of different PFs.
This, together with the busy-waiting for the MFW's responses for commands,
might lead to a deadlock during concurrent load or unload of PFs.
This patch adds the option to sleep within the busy-waiting, and uses it
for the (un)load requests (which are not sent from an interrupt context) to
prevent the possible deadlock.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Sun, 19 Aug 2018 21:01:43 +0000 (00:01 +0300)]
qed: Wait for MCP halt and resume commands to take place
Successive iterations of halting and resuming the management chip (MCP)
might fail, since currently the driver doesn't wait for these operations to
actually take place.
This patch prevents the driver from moving forward before the operations
are reflected in the state register.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Sun, 19 Aug 2018 21:01:42 +0000 (00:01 +0300)]
qed: Wait for ready indication before rereading the shmem
The MFW might be reset and re-update its shared memory.
Upon the detection of such a reset the driver rereads this memory, but it
has to wait till the data is valid.
This patch adds the missing wait for a data ready indication.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This because ip6n->tnls_wc[0] point to fallback device in default, but
in non-default namespace, ip6n->tnls_wc[0] will be NULL, so add the NULL
check comparatively.
Fixes: e2948e5af8ee ("ip6_vti: fix creating fallback tunnel device for vti6") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: r8169
Kernel modules: r8169
Falling back to MSI fixes the issue.
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling") Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Tue, 14 Aug 2018 18:46:16 +0000 (21:46 +0300)]
net: sched: always disable bh when taking tcf_lock
Recently, ops->init() and ops->dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.
Change ops->init() and ops->dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:
[ 105.470398] ================================
[ 105.475014] WARNING: inconsistent lock state
[ 105.479628] 4.18.0-rc8+ #664 Not tainted
[ 105.483897] --------------------------------
[ 105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
[ 105.509696] {SOFTIRQ-ON-W} state was registered at:
[ 105.514925] _raw_spin_lock+0x2c/0x40
[ 105.519022] tcf_bpf_init+0x579/0x820 [act_bpf]
[ 105.523990] tcf_action_init_1+0x4e4/0x660
[ 105.528518] tcf_action_init+0x1ce/0x2d0
[ 105.532880] tcf_exts_validate+0x1d8/0x200
[ 105.537416] fl_change+0x55a/0x268b [cls_flower]
[ 105.542469] tc_new_tfilter+0x748/0xa20
[ 105.546738] rtnetlink_rcv_msg+0x56a/0x6d0
[ 105.551268] netlink_rcv_skb+0x18d/0x200
[ 105.555628] netlink_unicast+0x2d0/0x370
[ 105.559990] netlink_sendmsg+0x3b9/0x6a0
[ 105.564349] sock_sendmsg+0x6b/0x80
[ 105.568271] ___sys_sendmsg+0x4a1/0x520
[ 105.572547] __sys_sendmsg+0xd7/0x150
[ 105.576655] do_syscall_64+0x72/0x2c0
[ 105.580757] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 105.586243] irq event stamp: 489296
[ 105.590084] hardirqs last enabled at (489296): [<ffffffffb507e639>] _raw_spin_unlock_irq+0x29/0x40
[ 105.599765] hardirqs last disabled at (489295): [<ffffffffb507e745>] _raw_spin_lock_irq+0x15/0x50
[ 105.609277] softirqs last enabled at (489292): [<ffffffffb413a6a3>] irq_enter+0x83/0xa0
[ 105.618001] softirqs last disabled at (489293): [<ffffffffb413a800>] irq_exit+0x140/0x190
[ 105.626813]
other info that might help us debug this:
[ 105.633976] Possible unsafe locking scenario:
Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding possible irq lock inversion dependency between tcf_lock,
and psample_groups_lock that is taken when holding tcf_lock in sample init:
[ 162.108959] Possible interrupt unsafe locking scenario:
In order to prevent potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract call to psample_group_get() from tcf_lock
protected section in sample action init function.
Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock") Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock") Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock") Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock") Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock") Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock") Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 19 Aug 2018 16:56:38 +0000 (09:56 -0700)]
Merge tag 'riscv-for-linus-4.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains some major improvements to the RISC-V port, including
the necessary interrupt controller and timer support to actually make
it to userspace. Support for three devices has been added:
- the ISA-mandated timers on RISC-V systems.
- the ISA-mandated first-level interrupt controller on RISC-V
systems, which is handled as part of our core arch code because
it's very small and tightly tied to the ISA.
- SiFive's platform-level interrupt controller, which talks to the
actual devices.
In addition to these new devices, there are a handful of cleanups all
over the RISC-V tree:
- build fixes for various configurations:
* A fix to the vDSO build's makefile so it respects CFLAGS.
* The addition of __lshrti3, a libgcc derived function necessary
for some 32-bit configurations.
* !SMP && PERF_EVENTS
- Cleanups to the arch code to remove the remnants of old versions of
the drivers that were just properly submitted.
* Some dead code from the timer driver, most of which wasn't ever
even compiled.
* Cleanups of some interrupt #defines, which are now local to the
interrupt handling code.
- Fixes to ptrace(), which while not being sufficient to fully make
GDB work are at least sufficient to get simple GDB tasks to work.
- Early printk support via RISC-V's architecturally mandated SBI
console device.
- A fix to our early debug trap handler to ensure it's always
aligned.
These patches have all been through a fairly extensive review process,
but as this enables a whole pile of functionality (ie, userspace) I'm
confident we'll need to submit a few more patches. The only concrete
issues I know about are the sys_riscv_flush_icache patches, but as I
managed to screw those up on Friday I figured it'd be best to let them
bake another week.
This tag boots a Fedora root filesystem on QEMU's master branch for
me, and before this morning's rebase (from 4.18-rc8 to 4.18) it booted
on the HiFive Unleashed.
Thanks to Christoph Hellwig and the other guys at WD for getting the
new drivers in shape!"
* tag 'riscv-for-linus-4.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
dt-bindings: interrupt-controller: SiFive Plaform Level Interrupt Controller
dt-bindings: interrupt-controller: RISC-V local interrupt controller
RISC-V: Fix !CONFIG_SMP compilation error
irqchip: add a SiFive PLIC driver
RISC-V: Add the directive for alignment of stvec's value
clocksource: new RISC-V SBI timer driver
RISC-V: implement low-level interrupt handling
RISC-V: add a definition for the SIE SEIE bit
RISC-V: remove INTERRUPT_CAUSE_* defines from asm/irq.h
RISC-V: simplify software interrupt / IPI code
RISC-V: remove timer leftovers
RISC-V: Add early printk support via the SBI console
RISC-V: Don't increment sepc after breakpoint.
RISC-V: implement __lshrti3.
RISC-V: Use KBUILD_CFLAGS instead of KCFLAGS when building the vDSO
Linus Torvalds [Sun, 19 Aug 2018 16:30:44 +0000 (09:30 -0700)]
Merge tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull UIO fix from Greg KH:
"Here is a single UIO fix that I forgot to send before 4.18-final came
out. It reverts a UIO patch that went in the 4.18 development window
that was causing problems.
This patch has been in linux-next for a while with no problems, I just
forgot to send it earlier, or as part of the larger char/misc patch
series from yesterday, my fault"
* tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "uio: use request_threaded_irq instead"
Linus Torvalds [Sat, 18 Aug 2018 23:45:27 +0000 (16:45 -0700)]
Merge tag 'hwlock-v4.19' of git://github.com/andersson/remoteproc
Pull hwspinlock updates from Bjorn Andersson:
"This introduces devres helpers and an API to request a lock by name,
then migrates the sprd SPI driver to use these"
* tag 'hwlock-v4.19' of git://github.com/andersson/remoteproc:
hwspinlock: Fix incorrect return pointers
spi: sprd: Change to use devm_hwspin_lock_request_specific()
spi: sprd: Replace of_hwspin_lock_get_id() with of_hwspin_lock_get_id_byname()
hwspinlock: Fix one comment mistake
hwspinlock: Remove redundant config
hwspinlock: Add devm_xxx() APIs to register/unregister one hwlock controller
hwspinlock: Add devm_xxx() APIs to request/free hwlock
hwspinlock: Add one new API to support getting a specific hwlock by the name
Linus Torvalds [Sat, 18 Aug 2018 23:43:57 +0000 (16:43 -0700)]
Merge tag 'rpmsg-v4.19' of git://github.com/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"This fixes a few compile and kerneldoc warnings, allows rpmsg devices
to handle power domains, allow for labeling GLINK edges and supports
compat for rpmsg_char"
* tag 'rpmsg-v4.19' of git://github.com/andersson/remoteproc:
rpmsg: Add compat ioctl for rpmsg char driver
rpmsg: glink: Store edge name for glink device
dt-bindings: soc: qcom: Add label for GLINK bindings
rpmsg: core: add support to power domains for devices
rpmsg: smd: fix kerneldoc warnings
rpmsg: glink: Fix various kerneldoc warnings.
rpmsg: glink: correctly annotate intent members
rpmsg: smd: Add missing include of sizes.h
Linus Torvalds [Sat, 18 Aug 2018 23:42:04 +0000 (16:42 -0700)]
Merge tag 'rproc-v4.19' of git://github.com/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
"This adds support for pre-start and post-shutdown hooks for remoteproc
subdevices, refactors the Qualcomm Hexagon support to allow reuse
between several drivers, makes authentication in the MDT file loader
optional, migrates a few format strings to use %pK and migrates the
Davinci driver to use the reset framework"
* tag 'rproc-v4.19' of git://github.com/andersson/remoteproc:
remoteproc/davinci: use the reset framework
remoteproc/davinci: Mark error recovery as disabled
remoteproc: st_slim: replace "%p" with "%pK"
remoteproc: replace "%p" with "%pK"
remoteproc: qcom: fix Q6V5_WCSS dependencies
remoteproc: Reset table_ptr in rproc_start() failure paths
remoteproc: qcom: q6v5-pil: fix modem hang on SDM845 after axis2 clk unvote
remoteproc: qcom q6v5: fix modular build
remoteproc: Introduce prepare and unprepare for subdevices
remoteproc: rename subdev probe and remove functions
remoteproc: Make client initialize ops in rproc_subdev
remoteproc: Make start and stop in subdev optional
remoteproc: Rename subdev functions to start/stop
remoteproc: qcom: Introduce Hexagon V5 based WCSS driver
remoteproc: qcom: q6v5-pil: Use common q6v5 helpers
remoteproc: qcom: adsp: Use common q6v5 helpers
remoteproc: q6v5: Extract common resource handling
remoteproc: qcom: mdt_loader: Make the firmware authentication optional
Linus Torvalds [Sat, 18 Aug 2018 23:16:57 +0000 (16:16 -0700)]
Merge tag 'linux-watchdog-4.19-rc1' of git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add MEN 16z069 IP-Core driver
- renesas-wdt: add support for the R8A77990 wdt
- stm32_iwdg: Add stm32mp1 support and pclk feature
- sp805_wdt, orion_wdt, sprd_wdt: several improvements
- imx2_wdt, stmp3xxx: switch to SPDX identifier
* tag 'linux-watchdog-4.19-rc1' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: fix dependencies of menz69_wdt.o
watchdog: sp805: Add clock-frequency property
watchdog: add driver for the MEN 16z069 IP-Core
watchdog: sprd_wdt: Remove redundant dev_err call in sprd_wdt_probe()
watchdog: stmp3xxx: Switch to SPDX identifier
watchdog: imx2_wdt: Switch to SPDX identifier
watchdog: sp805: set WDOG_HW_RUNNING when appropriate
watchdog: sp805: add 'timeout-sec' DT property support
dt-bindings: watchdog: Add optional 'timeout-sec' property for sp805
dt-bindings: watchdog: Consolidate SP805 binding docs
watchdog: orion_wdt: Mark watchdog as active when running at probe
watchdog: stm32: add pclk feature for stm32mp1
dt-bindings: watchdog: add stm32mp1 support
dt-bindings: watchdog: renesas-wdt: Add support for the R8A77990 wdt
Linus Torvalds [Sat, 18 Aug 2018 22:55:59 +0000 (15:55 -0700)]
Merge tag 'dmaengine-4.19-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull DMAengine updates from Vinod Koul:
"This round brings couple of framework changes, a new driver and usual
driver updates:
- new managed helper for dmaengine framework registration
- split dmaengine pause capability to pause and resume and allow
drivers to report that individually
- update dma_request_chan_by_mask() to handle deferred probing
- move imx-sdma to use virt-dma
- new driver for Actions Semi Owl family S900 controller
- minor updates to intel, renesas, mv_xor, pl330 etc"
* tag 'dmaengine-4.19-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (46 commits)
dmaengine: Add Actions Semi Owl family S900 DMA driver
dt-bindings: dmaengine: Add binding for Actions Semi Owl SoCs
dmaengine: sh: rcar-dmac: Should not stop the DMAC by rcar_dmac_sync_tcr()
dmaengine: mic_x100_dma: use the new helper to simplify the code
dmaengine: add a new helper dmaenginem_async_device_register
dmaengine: imx-sdma: add memcpy interface
dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff'
dmaengine: dma_request_chan_by_mask() to handle deferred probing
dmaengine: pl330: fix irq race with terminate_all
dmaengine: Revert "dmaengine: mv_xor_v2: enable COMPILE_TEST"
dmaengine: mv_xor_v2: use {lower,upper}_32_bits to configure HW descriptor address
dmaengine: mv_xor_v2: enable COMPILE_TEST
dmaengine: mv_xor_v2: move unmap to before callback
dmaengine: mv_xor_v2: convert callback to helper function
dmaengine: mv_xor_v2: kill the tasklets upon exit
dmaengine: mv_xor_v2: explicitly freeup irq
dmaengine: sh: rcar-dmac: Add dma_pause operation
dmaengine: sh: rcar-dmac: add a new function to clear CHCR.DE with barrier
dmaengine: idma64: Support dmaengine_terminate_sync()
dmaengine: hsu: Support dmaengine_terminate_sync()
...