]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
6 years agoMerge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net
David S. Miller [Tue, 17 Sep 2019 21:51:10 +0000 (23:51 +0200)]
Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net

Pull in bug fixes from 'net' tree for the merge window.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlxsw-spectrum_buffers-Add-the-ability-to-query-the-CPU-ports-shared...
David S. Miller [Mon, 16 Sep 2019 20:07:59 +0000 (22:07 +0200)]
Merge branch 'mlxsw-spectrum_buffers-Add-the-ability-to-query-the-CPU-ports-shared-buffer'

Ido Schimmel says:

====================
mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer

Shalom says:

While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.

Patch #1 prevents changing the CPU port's threshold and binding.

Patch #2 registers the CPU port with devlink.

Patch #3 adds the ability to query the CPU port's shared buffer quotas and
occupancy.

v3:

Patch #2:
* Remove unnecessary wrapping

v2:

Patch #1:
* s/0/MLXSW_PORT_CPU_PORT/
* Assign "mlxsw_sp->ports[MLXSW_PORT_CPU_PORT]" at the end of
  mlxsw_sp_cpu_port_create() to avoid NULL assignment on error path
* Add common functions for mlxsw_core_port_init/fini()

Patch #2:
* Move "changing CPU port's threshold and binding" check to a separate
  patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
Shalom Toledo [Mon, 16 Sep 2019 15:04:22 +0000 (18:04 +0300)]
mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer

While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.

Since the CPU port has no ingress buffers, all the shared buffers ingress
information will be cleared.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Register CPU port with devlink
Shalom Toledo [Mon, 16 Sep 2019 15:04:21 +0000 (18:04 +0300)]
mlxsw: spectrum: Register CPU port with devlink

Register CPU port with devlink.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_buffers: Prevent changing CPU port's configuration
Shalom Toledo [Mon, 16 Sep 2019 15:04:20 +0000 (18:04 +0300)]
mlxsw: spectrum_buffers: Prevent changing CPU port's configuration

Next patch is going to register the CPU port with devlink, but only so
that the CPU port's shared buffer configuration and occupancy could be
queried.

Prevent changing CPU port's shared buffer threshold and binding
configuration.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-ena-implement-adaptive-interrupt-moderation-using-dim'
David S. Miller [Mon, 16 Sep 2019 20:06:03 +0000 (22:06 +0200)]
Merge branch 'net-ena-implement-adaptive-interrupt-moderation-using-dim'

Arthur Kiyanovski says:

====================
net: ena: implement adaptive interrupt moderation using dim

In this patchset we replace our adaptive interrupt moderation
implementation with the dim library implementation.
The dim library showed great improvement in throughput, latency
and CPU usage in different scenarios on ARM CPUs.
This patchset also includes a few bug fixes to the parts of the
old implementation of adaptive interrupt moderation that were left.

Changes from V1 patchset:
Removed stray empty lines from patches 01/11, 09/11.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: fix incorrect update of intr_delay_resolution
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:36 +0000 (14:31 +0300)]
net: ena: fix incorrect update of intr_delay_resolution

ena_dev->intr_moder_rx/tx_interval save the intervals received from the
user after dividing them by ena_dev->intr_delay_resolution. Therefore
when intr_delay_resolution changes, the code needs to first mutiply
intr_moder_rx/tx_interval by the previous intr_delay_resolution to get
the value originally given by the user, and only then divide it by the
new intr_delay_resolution.

Current code does not first multiply intr_moder_rx/tx_interval by the old
intr_delay_resolution. This commit fixes it.

Also initialize ena_dev->intr_delay_resolution to be 1.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: fix retrieval of nonadaptive interrupt moderation intervals
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:35 +0000 (14:31 +0300)]
net: ena: fix retrieval of nonadaptive interrupt moderation intervals

Nonadaptive interrupt moderation intervals are assigned the value set
by the user in ethtool -C divided by ena_dev->intr_delay_resolution.

Therefore when the user tries to get the nonadaptive interrupt moderation
intervals with ethtool -c the code needs to multiply the saved value
by ena_dev->intr_delay_resolution.

The current code erroneously divides instead of multiplying in ethtool -c.
This patch fixes this.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: fix update of interrupt moderation register
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:34 +0000 (14:31 +0300)]
net: ena: fix update of interrupt moderation register

Current implementation always updates the interrupt register with
the smoothed_interval of the rx_ring. However this should be
done only in case of adaptive interrupt moderation. If non-adaptive
interrupt moderation is used, the non-adaptive interrupt moderation
interval should be used. This commit fixes that.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: remove all old adaptive rx interrupt moderation code from ena_com
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:33 +0000 (14:31 +0300)]
net: ena: remove all old adaptive rx interrupt moderation code from ena_com

Remove previous implementation of adaptive rx interrupt moderation
from ena_com files.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: remove ena_restore_ethtool_params() and relevant fields
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:32 +0000 (14:31 +0300)]
net: ena: remove ena_restore_ethtool_params() and relevant fields

Deleted unused 4 fields from struct ena_adapter and their only user
ena_restore_ethtool_params().

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: remove old adaptive interrupt moderation code from ena_netdev
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:31 +0000 (14:31 +0300)]
net: ena: remove old adaptive interrupt moderation code from ena_netdev

1. Out of the fields {per_napi_bytes, per_napi_packets} in struct ena_ring,
   only rx_ring->per_napi_packets are used to determine if napi did work
   for dim.
   This commit removes all other uses of these fields.
2. Remove ena_ring->moder_tbl_idx, which is not used by dim.
3. Remove all calls to ena_com_destroy_interrupt_moderation(), since all it
   did was to destroy the interrupt moderation table, which is removed as
   part of removing old interrupt moderation code.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval...
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:30 +0000 (14:31 +0300)]
net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*()

Remove code duplication in:
ena_com_update_nonadaptive_moderation_interval_tx()
ena_com_update_nonadaptive_moderation_interval_rx()
functions.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: enable the interrupt_moderation in driver_supported_features
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:29 +0000 (14:31 +0300)]
net: ena: enable the interrupt_moderation in driver_supported_features

Add driver_supported_features to host_host info which is a new API used to
communicate to the device which features are supported by the driver.

Add the interrupt_moderation bit to host_info->driver_supported_features
and enable it to signal the device that this driver supports interrupt
moderation properly.

Reserved bits are for features implemented in the future

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: reimplement set/get_coalesce()
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:28 +0000 (14:31 +0300)]
net: ena: reimplement set/get_coalesce()

1. Remove old adaptive interrupt moderation code from set/get_coalesce()
2. Add ena_update_rx_rings_intr_moderation() function for updating
   nonadaptive interrupt moderation intervals similarly to
   ena_update_tx_rings_intr_moderation().
3. Remove checks of multiple unsupported received interrupt coalescing
   parameters. This makes code cleaner and cancels the need to update
   it every time a new coalescing parameter is invented.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: switch to dim algorithm for rx adaptive interrupt moderation
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:27 +0000 (14:31 +0300)]
net: ena: switch to dim algorithm for rx adaptive interrupt moderation

Use the dim library for the rx adaptive interrupt moderation implementation

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
Arthur Kiyanovski [Mon, 16 Sep 2019 11:31:26 +0000 (14:31 +0300)]
net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it

Add intr_moder_rx_interval to struct ena_com_dev and use it as the
location where the interrupt moderation rx interval is saved, instead
of the interrupt moderation table.

This is done as a first step before removing the old interrupt moderation
code.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ethtool-implement-Energy-Detect-Powerdown-support-via-phy-tunable'
David S. Miller [Mon, 16 Sep 2019 20:02:45 +0000 (22:02 +0200)]
Merge branch 'ethtool-implement-Energy-Detect-Powerdown-support-via-phy-tunable'

Alexandru Ardelean says:

====================
ethtool: implement Energy Detect Powerdown support via phy-tunable

This changeset proposes a new control for PHY tunable to control Energy
Detect Power Down.

The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
this feature is common across other PHYs (like EEE), and defining
`ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.

The way EDPD works, is that the RX block is put to a lower power mode,
except for link-pulse detection circuits. The TX block is also put to low
power mode, but the PHY wakes-up periodically to send link pulses, to avoid
lock-ups in case the other side is also in EDPD mode.

Currently, there are 2 PHY drivers that look like they could use this new
PHY tunable feature: the `adin` && `micrel` PHYs.

This series updates only the `adin` PHY driver to support this new feature,
as this chip has been tested. A change for `micrel` can be proposed after a
discussion of the PHY-tunable API is resolved.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
Alexandru Ardelean [Mon, 16 Sep 2019 07:35:26 +0000 (10:35 +0300)]
net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable

This driver becomes the first user of the kernel's `ETHTOOL_PHY_EDPD`
phy-tunable feature.
EDPD is also enabled by default on PHY config_init, but can be disabled via
the phy-tunable control.

When enabling EDPD, it's also a good idea (for the ADIN PHYs) to enable TX
periodic pulses, so that in case the other PHY is also on EDPD mode, there
is no lock-up situation where both sides are waiting for the other to
transmit.

Via the phy-tunable control, TX pulses can be disabled if specifying 0
`tx-interval` via ethtool.

The ADIN PHY supports only fixed 1 second intervals; they cannot be
configured. That is why the acceptable values are 1,
ETHTOOL_PHY_EDPD_DFLT_TX_MSECS and ETHTOOL_PHY_EDPD_NO_TX (which disables
TX pulses).

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoethtool: implement Energy Detect Powerdown support via phy-tunable
Alexandru Ardelean [Mon, 16 Sep 2019 07:35:25 +0000 (10:35 +0300)]
ethtool: implement Energy Detect Powerdown support via phy-tunable

The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
this feature is common across other PHYs (like EEE), and defining
`ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.

The way EDPD works, is that the RX block is put to a lower power mode,
except for link-pulse detection circuits. The TX block is also put to low
power mode, but the PHY wakes-up periodically to send link pulses, to avoid
lock-ups in case the other side is also in EDPD mode.

Currently, there are 2 PHY drivers that look like they could use this new
PHY tunable feature: the `adin` && `micrel` PHYs.

The ADIN's datasheet mentions that TX pulses are at intervals of 1 second
default each, and they can be disabled. For the Micrel KSZ9031 PHY, the
datasheet does not mention whether they can be disabled, but mentions that
they can modified.

The way this change is structured, is similar to the PHY tunable downshift
control:
* a `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` value is exposed to cover a default
  TX interval; some PHYs could specify a certain value that makes sense
* `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
* `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD

As noted by the `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` the interval unit is 1
millisecond, which should cover a reasonable range of intervals:
 - from 1 millisecond, which does not sound like much of a power-saver
 - to ~65 seconds which is quite a lot to wait for a link to come up when
   plugging a cable

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoxen-netfront: do not assume sk_buff_head list is empty in error handling
Dongli Zhang [Mon, 16 Sep 2019 03:46:59 +0000 (11:46 +0800)]
xen-netfront: do not assume sk_buff_head list is empty in error handling

When skb_shinfo(skb) is not able to cache extra fragment (that is,
skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS), xennet_fill_frags() assumes
the sk_buff_head list is already empty. As a result, cons is increased only
by 1 and returns to error handling path in xennet_poll().

However, if the sk_buff_head list is not empty, queue->rx.rsp_cons may be
set incorrectly. That is, queue->rx.rsp_cons would point to the rx ring
buffer entries whose queue->rx_skbs[i] and queue->grant_rx_ref[i] are
already cleared to NULL. This leads to NULL pointer access in the next
iteration to process rx ring buffer entries.

Below is how xennet_poll() does error handling. All remaining entries in
tmpq are accounted to queue->rx.rsp_cons without assuming how many
outstanding skbs are remained in the list.

 985 static int xennet_poll(struct napi_struct *napi, int budget)
... ...
1032           if (unlikely(xennet_set_skb_gso(skb, gso))) {
1033                   __skb_queue_head(&tmpq, skb);
1034                   queue->rx.rsp_cons += skb_queue_len(&tmpq);
1035                   goto err;
1036           }

It is better to always have the error handling in the same way.

Fixes: ad4f15dc2c70 ("xen/netfront: don't bug in case of too many frags")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agos390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb”
Markus Elfring [Sun, 15 Sep 2019 17:21:05 +0000 (19:21 +0200)]
s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb”

The dev_kfree_skb() function performs also input parameter validation.
Thus the test around the shown calls is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ena: don't wake up tx queue when down
Sameeh Jubran [Sun, 15 Sep 2019 14:29:44 +0000 (17:29 +0300)]
net: ena: don't wake up tx queue when down

There is a race condition that can occur when calling ena_down().
The ena_clean_tx_irq() - which is a part of the napi handler -
function might wake up the tx queue when the queue is supposed
to be down (during recovery or changing the size of the queues
for example) This causes the ena_start_xmit() function to trigger
and possibly try to access the destroyed queues.

The race is illustrated below:

Flow A:                                       Flow B(napi handler)
ena_down()
   netif_carrier_off()
   netif_tx_disable()
                                                      ena_clean_tx_irq()
                                                         netif_tx_wake_queue()
   ena_napi_disable_all()
   ena_destroy_all_io_queues()

After these flows the tx queue is active and ena_start_xmit() accesses
the destroyed queue which leads to a kernel panic.

fixes: 1738cd3ed342 (net: ena: Add a driver for Amazon Elastic Network Adapters (ENA))

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'drop_monitor-Better-sanitize-notified-packets'
David S. Miller [Mon, 16 Sep 2019 19:39:27 +0000 (21:39 +0200)]
Merge branch 'drop_monitor-Better-sanitize-notified-packets'

Ido Schimmel says:

====================
drop_monitor: Better sanitize notified packets

When working in 'packet' mode, drop monitor generates a notification
with a potentially truncated payload of the dropped packet. The payload
is copied from the MAC header, but I forgot to check that the MAC header
was set, so do it now.

Patch #1 sets the offsets to the various protocol layers in netdevsim,
so that it will continue to work after the MAC header check is added to
drop monitor in patch #2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodrop_monitor: Better sanitize notified packets
Ido Schimmel [Sun, 15 Sep 2019 06:46:36 +0000 (09:46 +0300)]
drop_monitor: Better sanitize notified packets

When working in 'packet' mode, drop monitor generates a notification
with a potentially truncated payload of the dropped packet. The payload
is copied from the MAC header, but I forgot to check that the MAC header
was set, so do it now.

Fixes: ca30707dee2b ("drop_monitor: Add packet alert mode")
Fixes: 5e58109b1ea4 ("drop_monitor: Add support for packet alert mode for hardware drops")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetdevsim: Set offsets to various protocol layers
Ido Schimmel [Sun, 15 Sep 2019 06:46:35 +0000 (09:46 +0300)]
netdevsim: Set offsets to various protocol layers

The driver periodically generates "trapped" UDP packets that it then
passes on to devlink. Set the offsets to the various protocol layers.

This is a prerequisite to the next patch, where drop monitor is taught
to check that the offset to the MAC header was set.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tc-taprio-offload-for-SJA1105-DSA'
David S. Miller [Mon, 16 Sep 2019 19:32:58 +0000 (21:32 +0200)]
Merge branch 'tc-taprio-offload-for-SJA1105-DSA'

Vladimir Oltean says:

====================
tc-taprio offload for SJA1105 DSA

This is the third attempt to submit the tc-taprio offload model for
inclusion in the networking tree. The sja1105 switch driver will provide
the first implementation of the offload. Only the bare minimum is added:

- The offload model and a DSA pass-through
- The hardware implementation
- The interaction with the netdev queues in the tagger code
- Documentation

What has been removed from previous attempts is support for
PTP-as-clocksource in sja1105, as well as configuring the traffic class
for management traffic.  These will be added as soon as the offload
model is settled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodocs: net: dsa: sja1105: Add info about the Time-Aware Scheduler
Vladimir Oltean [Sun, 15 Sep 2019 02:00:03 +0000 (05:00 +0300)]
docs: net: dsa: sja1105: Add info about the Time-Aware Scheduler

While not an exhaustive usage tutorial, this describes the details
needed to build more complex scenarios.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload
Vladimir Oltean [Sun, 15 Sep 2019 02:00:02 +0000 (05:00 +0300)]
net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload

This qdisc offload is the closest thing to what the SJA1105 supports in
hardware for time-based egress shaping. The switch core really is built
around SAE AS6802/TTEthernet (a TTTech standard) but can be made to
operate similarly to IEEE 802.1Qbv with some constraints:

- The gate control list is a global list for all ports. There are 8
  execution threads that iterate through this global list in parallel.
  I don't know why 8, there are only 4 front-panel ports.

- Care must be taken by the user to make sure that two execution threads
  never get to execute a GCL entry simultaneously. I created a O(n^4)
  checker for this hardware limitation, prior to accepting a taprio
  offload configuration as valid.

- The spec says that if a GCL entry's interval is shorter than the frame
  length, you shouldn't send it (and end up in head-of-line blocking).
  Well, this switch does anyway.

- The switch has no concept of ADMIN and OPER configurations. Because
  it's so simple, the TAS settings are loaded through the static config
  tables interface, so there isn't even place for any discussion about
  'graceful switchover between ADMIN and OPER'. You just reset the
  switch and upload a new OPER config.

- The switch accepts multiple time sources for the gate events. Right
  now I am using the standalone clock source as opposed to PTP. So the
  base time parameter doesn't really do much. Support for the PTP clock
  source will be added in a future series.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: sja1105: Advertise the 8 TX queues
Vladimir Oltean [Sun, 15 Sep 2019 02:00:01 +0000 (05:00 +0300)]
net: dsa: sja1105: Advertise the 8 TX queues

This is a preparation patch for the tc-taprio offload (and potentially
for other future offloads such as tc-mqprio).

Instead of looking directly at skb->priority during xmit, let's get the
netdev queue and the queue-to-traffic-class mapping, and put the
resulting traffic class into the dsa_8021q PCP field. The switch is
configured with a 1-to-1 PCP-to-ingress-queue-to-egress-queue mapping
(see vlan_pmap in sja1105_main.c), so the effect is that we can inject
into a front-panel's egress traffic class through VLAN tagging from
Linux, completely transparently.

Unfortunately the switch doesn't look at the VLAN PCP in the case of
management traffic to/from the CPU (link-local frames at
01-80-C2-xx-xx-xx or 01-1B-19-xx-xx-xx) so we can't alter the
transmission queue of this type of traffic on a frame-by-frame basis. It
is only selected through the "hostprio" setting which ATM is harcoded in
the driver to 7.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: sja1105: Add static config tables for scheduling
Vladimir Oltean [Sun, 15 Sep 2019 02:00:00 +0000 (05:00 +0300)]
net: dsa: sja1105: Add static config tables for scheduling

In order to support tc-taprio offload, the TTEthernet egress scheduling
core registers must be made visible through the static interface.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: Pass ndo_setup_tc slave callback to drivers
Vladimir Oltean [Sun, 15 Sep 2019 01:59:59 +0000 (04:59 +0300)]
net: dsa: Pass ndo_setup_tc slave callback to drivers

DSA currently handles shared block filters (for the classifier-action
qdisc) in the core due to what I believe are simply pragmatic reasons -
hiding the complexity from drivers and offerring a simple API for port
mirroring.

Extend the dsa_slave_setup_tc function by passing all other qdisc
offloads to the driver layer, where the driver may choose what it
implements and how. DSA is simply a pass-through in this case.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotaprio: Add support for hardware offloading
Vinicius Costa Gomes [Sun, 15 Sep 2019 01:59:58 +0000 (04:59 +0300)]
taprio: Add support for hardware offloading

This allows taprio to offload the schedule enforcement to capable
network cards, resulting in more precise windows and less CPU usage.

The gate mask acts on traffic classes (groups of queues of same
priority), as specified in IEEE 802.1Q-2018, and following the existing
taprio and mqprio semantics.
It is up to the driver to perform conversion between tc and individual
netdev queues if for some reason it needs to make that distinction.

Full offload is requested from the network interface by specifying
"flags 2" in the tc qdisc creation command, which in turn corresponds to
the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit.

The important detail here is the clockid which is implicitly /dev/ptpN
for full offload, and hence not configurable.

A reference counting API is added to support the use case where Ethernet
drivers need to keep the taprio offload structure locally (i.e. they are
a multi-port switch driver, and configuring a port depends on the
settings of other ports as well). The refcount_t variable is kept in a
private structure (__tc_taprio_qopt_offload) and not exposed to drivers.

In the future, the private structure might also be expanded with a
backpointer to taprio_sched *q, to implement the notification system
described in the patch (of when admin became oper, or an error occurred,
etc, so the offload can be monitored with 'tc qdisc show').

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phylink: clarify where phylink should be used
Russell King [Sat, 14 Sep 2019 09:44:04 +0000 (10:44 +0100)]
net: phylink: clarify where phylink should be used

Update the phylink documentation to make it clear that phylink is
designed to be used on the MAC facing side of the link, rather than
between a SFP and PHY.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'bnxt_en-error-recovery-follow-up-patches'
David S. Miller [Mon, 16 Sep 2019 14:44:28 +0000 (16:44 +0200)]
Merge branch 'bnxt_en-error-recovery-follow-up-patches'

Michael Chan says:

====================
bnxt_en: error recovery follow-up patches.

A follow-up patchset for the recently added health and error recovery
feature.  The first fix is to prevent .ndo_set_rx_mode() from proceeding
when reset is in progress.  The 2nd fix is for the firmware coredump
command.  The 3rd and 4th patches update the error recovery process
slightly to add a state that polls and waits for the firmware to be down.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add a new BNXT_FW_RESET_STATE_POLL_FW_DOWN state.
Vasundhara Volam [Sat, 14 Sep 2019 04:01:41 +0000 (00:01 -0400)]
bnxt_en: Add a new BNXT_FW_RESET_STATE_POLL_FW_DOWN state.

This new state is required when firmware indicates that the error
recovery process requires polling for firmware state to be completely
down before initiating reset.  For example, firmware may take some
time to collect the crash dump before it is down and ready to be
reset.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Update firmware interface spec. to 1.10.0.100.
Michael Chan [Sat, 14 Sep 2019 04:01:40 +0000 (00:01 -0400)]
bnxt_en: Update firmware interface spec. to 1.10.0.100.

Some error recovery updates to the spec., among other minor changes.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Increase timeout for HWRM_DBG_COREDUMP_XX commands
Vasundhara Volam [Sat, 14 Sep 2019 04:01:39 +0000 (00:01 -0400)]
bnxt_en: Increase timeout for HWRM_DBG_COREDUMP_XX commands

Firmware coredump messages take much longer than standard messages,
so increase the timeout accordingly.

Fixes: 6c5657d085ae ("bnxt_en: Add support for ethtool get dump.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Don't proceed in .ndo_set_rx_mode() when device is not in open state.
Michael Chan [Sat, 14 Sep 2019 04:01:38 +0000 (00:01 -0400)]
bnxt_en: Don't proceed in .ndo_set_rx_mode() when device is not in open state.

Check the BNXT_STATE_OPEN flag instead of netif_running() in
bnxt_set_rx_mode().  If the driver is going through any reset, such
as firmware reset or even TX timeout, it may not be ready to set the RX
mode and may crash.  The new rx mode settings will be picked up when
the device is opened again later.

Fixes: 230d1f0de754 ("bnxt_en: Handle firmware reset.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: Add snd_wnd to TCP_INFO
Thomas Higdon [Fri, 13 Sep 2019 23:23:35 +0000 (23:23 +0000)]
tcp: Add snd_wnd to TCP_INFO

Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
performance problems --
> (1) Usually when we're diagnosing TCP performance problems, we do so
> from the sender, since the sender makes most of the
> performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> From the sender-side the thing that would be most useful is to see
> tp->snd_wnd, the receive window that the receiver has advertised to
> the sender.

This serves the purpose of adding an additional __u32 to avoid the
would-be hole caused by the addition of the tcpi_rcvi_ooopack field.

Signed-off-by: Thomas Higdon <tph@fb.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: Add TCP_INFO counter for packets received out-of-order
Thomas Higdon [Fri, 13 Sep 2019 23:23:34 +0000 (23:23 +0000)]
tcp: Add TCP_INFO counter for packets received out-of-order

For receive-heavy cases on the server-side, we want to track the
connection quality for individual client IPs. This counter, similar to
the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
tracks out-of-order packet reception. By providing this counter in
TCP_INFO, it will allow understanding to what degree receive-heavy
sockets are experiencing out-of-order delivery and packet drops
indicating congestion.

Please note that this is similar to the counter in NetBSD TCP_INFO, and
has the same name.

Also note that we avoid increasing the size of the tcp_sock struct by
taking advantage of a hole.

Signed-off-by: Thomas Higdon <tph@fb.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mdio: switch to using gpiod_get_optional()
Dmitry Torokhov [Fri, 13 Sep 2019 22:55:47 +0000 (15:55 -0700)]
net: mdio: switch to using gpiod_get_optional()

The MDIO device reset line is optional and now that gpiod_get_optional()
returns proper value when GPIO support is compiled out, there is no
reason to use fwnode_get_named_gpiod() that I plan to hide away.

Let's switch to using more standard gpiod_get_optional() and
gpiod_set_consumer_name() to keep the nice "PHY reset" label.

Also there is no reason to only try to fetch the reset GPIO when we have
OF node, gpiolib can fetch GPIO data from firmwares as well.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Mon, 16 Sep 2019 14:02:03 +0000 (16:02 +0200)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-09-16

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Now that initial BPF backend for gcc has been merged upstream, enable
   BPF kselftest suite for bpf-gcc. Also fix a BE issue with access to
   bpf_sysctl.file_pos, from Ilya.

2) Follow-up fix for link-vmlinux.sh to remove bash-specific extensions
   related to recent work on exposing BTF info through sysfs, from Andrii.

3) AF_XDP zero copy fixes for i40e and ixgbe driver which caused umem
   headroom to be added twice, from Ciara.

4) Refactoring work to convert sock opt tests into test_progs framework
   in BPF kselftests, from Stanislav.

5) Fix a general protection fault in dev_map_hash_update_elem(), from Toke.

6) Cleanup to use BPF_PROG_RUN() macro in KCM, from Sami.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf: fix accessing bpf_sysctl.file_pos on s390
Ilya Leoshkevich [Fri, 16 Aug 2019 10:53:00 +0000 (12:53 +0200)]
bpf: fix accessing bpf_sysctl.file_pos on s390

"ctx:file_pos sysctl:read write ok" fails on s390 with "Read value  !=
nux". This is because verifier rewrites a complete 32-bit
bpf_sysctl.file_pos update to a partial update of the first 32 bits of
64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian
systems.

Fix by using an offset on big-endian systems.

Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect
a problem there, since it expects to see 0, which it gets with high
probability in error cases, so change it to seek to offset 3 and expect
3 in bpf_sysctl.file_pos.

Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
6 years agoxdp: Fix race in dev_map_hash_update_elem() when replacing element
Toke Høiland-Jørgensen [Sun, 8 Sep 2019 08:20:16 +0000 (09:20 +0100)]
xdp: Fix race in dev_map_hash_update_elem() when replacing element

syzbot found a crash in dev_map_hash_update_elem(), when replacing an
element with a new one. Jesper correctly identified the cause of the crash
as a race condition between the initial lookup in the map (which is done
before taking the lock), and the removal of the old element.

Rather than just add a second lookup into the hashmap after taking the
lock, fix this by reworking the function logic to take the lock before the
initial lookup.

Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Reported-and-tested-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge branch 'bpf-af-xdp-unaligned-fixes'
Daniel Borkmann [Mon, 16 Sep 2019 07:35:10 +0000 (09:35 +0200)]
Merge branch 'bpf-af-xdp-unaligned-fixes'

Ciara Loftus says:

====================
This patch set contains some fixes for AF_XDP zero copy in the i40e and
ixgbe drivers as well as a fix for the 'xdpsock' sample application when
running in unaligned mode.

Patches 1 and 2 fix a regression for the i40e and ixgbe drivers which
caused the umem headroom to be added to the xdp handle twice, resulting in
an incorrect value being received by the user for the case where the umem
headroom is non-zero.

Patch 3 fixes an issue with the xdpsock sample application whereby the
start of the tx packet data (offset) was not being set correctly when the
application was being run in unaligned mode.

This patch set has been applied against commit a2c11b034142 ("kcm: use
BPF_PROG_RUN")
====================

Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agosamples/bpf: fix xdpsock l2fwd tx for unaligned mode
Ciara Loftus [Fri, 13 Sep 2019 10:39:48 +0000 (10:39 +0000)]
samples/bpf: fix xdpsock l2fwd tx for unaligned mode

Preserve the offset of the address of the received descriptor, and include
it in the address set for the tx descriptor, so the kernel can correctly
locate the start of the packet data.

Fixes: 03895e63ff97 ("samples/bpf: add buffer recycling for unaligned chunks to xdpsock")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoixgbe: fix xdp handle calculations
Ciara Loftus [Fri, 13 Sep 2019 10:39:47 +0000 (10:39 +0000)]
ixgbe: fix xdp handle calculations

Commit 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") reintroduced
the addition of the umem headroom to the xdp handle in the ixgbe_zca_free,
ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However,
the headroom is already added to the handle in the function
ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the
case where the headroom is non-zero.

Fixes: 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoi40e: fix xdp handle calculations
Ciara Loftus [Fri, 13 Sep 2019 10:39:46 +0000 (10:39 +0000)]
i40e: fix xdp handle calculations

Commit 4c5d9a7fa149 ("i40e: fix xdp handle calculations") reintroduced
the addition of the umem headroom to the xdp handle in the i40e_zca_free,
i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However,
the headroom is already added to the handle in the function i40_run_xdp_zc.
This commit removes the latter addition and fixes the case where the
headroom is non-zero.

Fixes: 4c5d9a7fa149 ("i40e: fix xdp handle calculations")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoselftests/bpf: add bpf-gcc support
Ilya Leoshkevich [Thu, 12 Sep 2019 16:05:43 +0000 (18:05 +0200)]
selftests/bpf: add bpf-gcc support

Now that binutils and gcc support for BPF is upstream, make use of it in
BPF selftests using alu32-like approach. Share as much as possible of
CFLAGS calculation with clang.

Fixes only obvious issues, leaving more complex ones for later:
- Use gcc-provided bpf-helpers.h instead of manually defining the
  helpers, change bpf_helpers.h include guard to avoid conflict.
- Include <linux/stddef.h> for __always_inline.
- Add $(OUTPUT)/../usr/include to include path in order to use local
  kernel headers instead of system kernel headers when building with O=.

In order to activate the bpf-gcc support, one needs to configure
binutils and gcc with --target=bpf and make them available in $PATH. In
particular, gcc must be installed as `bpf-gcc`, which is the default.

Right now with binutils 25a2915e8dba and gcc r275589 only a handful of
tests work:

# ./test_progs_bpf_gcc
# Summary: 7/39 PASSED, 1 SKIPPED, 98 FAILED

The reason for those failures are as follows:

- Build errors:
  - `error: too many function arguments for eBPF` for __always_inline
    functions read_str_var and read_map_var - must be inlining issue,
    and for process_l3_headers_v6, which relies on optimizing away
    function arguments.
  - `error: indirect call in function, which are not supported by eBPF`
    where there are no obvious indirect calls in the source calls, e.g.
    in __encap_ipip_none.
  - `error: field 'lock' has incomplete type` for fields of `struct
    bpf_spin_lock` type - bpf_spin_lock is re#defined by bpf-helpers.h,
    so its usage is sensitive to order of #includes.
  - `error: eBPF stack limit exceeded` in sysctl_tcp_mem.
- Load errors:
  - Missing object files due to above build errors.
  - `libbpf: failed to create map (name: 'test_ver.bss')`.
  - `libbpf: object file doesn't contain bpf program`.
  - `libbpf: Program '.text' contains unrecognized relo data pointing to
    section 0`.
  - `libbpf: BTF is required, but is missing or corrupted` - no BTF
    support in gcc yet.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agonet: stmmac: socfpga: re-use the `interface` parameter from platform data
Alexandru Ardelean [Mon, 16 Sep 2019 07:04:00 +0000 (10:04 +0300)]
net: stmmac: socfpga: re-use the `interface` parameter from platform data

The socfpga sub-driver defines an `interface` field in the `socfpga_dwmac`
struct and parses it on init.

The shared `stmmac_probe_config_dt()` function also parses this from the
device-tree and makes it available on the returned `plat_data` (which is
the same data available via `netdev_priv()`).

All that's needed now is to dig that information out, via some
`dev_get_drvdata()` && `netdev_priv()` calls and re-use it.

Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'More-fixes-for-unlocked-cls-hardware-offload-API-refactoring'
David S. Miller [Mon, 16 Sep 2019 07:18:03 +0000 (09:18 +0200)]
Merge branch 'More-fixes-for-unlocked-cls-hardware-offload-API-refactoring'

Vlad Buslov says:

====================
More fixes for unlocked cls hardware offload API refactoring

Two fixes for my "Refactor cls hardware offload API to support
rtnl-independent drivers" series and refactoring patch that implements
infrastructure necessary for the fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: use get_dev() action API in flow_action infra
Vlad Buslov [Fri, 13 Sep 2019 15:28:41 +0000 (18:28 +0300)]
net: sched: use get_dev() action API in flow_action infra

When filling in hardware intermediate representation tc_setup_flow_action()
directly obtains, checks and takes reference to dev used by mirred action,
instead of using act->ops->get_dev() API created specifically for this
purpose. In order to remove code duplication, refactor flow_action infra to
use action API when obtaining mirred action target dev. Extend get_dev()
with additional argument that is used to provide dev destructor to the
user.

Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: take reference to psample group in flow_action infra
Vlad Buslov [Fri, 13 Sep 2019 15:28:40 +0000 (18:28 +0300)]
net: sched: take reference to psample group in flow_action infra

With recent patch set that removed rtnl lock dependency from cls hardware
offload API rtnl lock is only taken when reading action data and can be
released after action-specific data is parsed into intermediate
representation. However, sample action psample group is passed by pointer
without obtaining reference to it first, which makes it possible to
concurrently overwrite the action and deallocate object pointed by
psample_group pointer after rtnl lock is released but before driver
finished using the pointer.

To prevent such race condition, obtain reference to psample group while it
is used by flow_action infra. Extend psample API with function
psample_group_take() that increments psample group reference counter.
Extend struct tc_action_ops with new get_psample_group() API. Implement the
API for action sample using psample_group_take() and already existing
psample_group_put() as a destructor. Use it in tc_setup_flow_action() to
take reference to psample group pointed to by entry->sample.psample_group
and release it in tc_cleanup_flow_action().

Disable bh when taking psample_groups_lock. The lock is now taken while
holding action tcf_lock that is used by data path and requires bh to be
disabled, so doing the same for psample_groups_lock is necessary to
preserve SOFTIRQ-irq-safety.

Fixes: 918190f50eb6 ("net: sched: flower: don't take rtnl lock for cls hw offloads API")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: extend flow_action_entry with destructor
Vlad Buslov [Fri, 13 Sep 2019 15:28:39 +0000 (18:28 +0300)]
net: sched: extend flow_action_entry with destructor

Generalize flow_action_entry cleanup by extending the structure with
pointer to destructor function. Set the destructor in
tc_setup_flow_action(). Refactor tc_cleanup_flow_action() to call
entry->destructor() instead of using switch that dispatches by entry->id
and manually executes cleanup.

This refactoring is necessary for following patches in this series that
require destructor to use tc_action->ops callbacks that can't be easily
obtained in tc_cleanup_flow_action().

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMAINTAINERS: update FORCEDETH MAINTAINERS info
Rain River [Fri, 13 Sep 2019 13:43:45 +0000 (21:43 +0800)]
MAINTAINERS: update FORCEDETH MAINTAINERS info

Many FORCEDETH NICs are used in our hosts. Several bugs are fixed and
some features are developed for FORCEDETH NICs. And I have been
reviewing patches for FORCEDETH NIC for several months. Mark me as the
FORCEDETH NIC maintainer. I will send out the patches and maintain
FORCEDETH NIC.

Signed-off-by: Rain River <rain.1986.08.12@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/wan: dscc4: remove broken dscc4 driver
Dan Carpenter [Fri, 13 Sep 2019 13:28:17 +0000 (16:28 +0300)]
net/wan: dscc4: remove broken dscc4 driver

Using static analysis, I discovered that the "dpriv->pci_priv->pdev"
pointer is always NULL.  This pointer was supposed to be initialized
during probe and is essential for the driver to work.  It would be easy
to add a "ppriv->pdev = pdev;" to dscc4_found1() but this driver has
been broken since before we started using git and no one has complained
so probably we should just remove it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMAINTAINERS: xen-netback: update my email address
Paul Durrant [Fri, 13 Sep 2019 12:47:27 +0000 (13:47 +0100)]
MAINTAINERS: xen-netback: update my email address

My Citrix email address will expire shortly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: Hold rtnl lock in suspend/resume callbacks
Jose Abreu [Fri, 13 Sep 2019 09:50:32 +0000 (11:50 +0200)]
net: stmmac: Hold rtnl lock in suspend/resume callbacks

We need to hold rnl lock in suspend and resume callbacks because phylink
requires it. Otherwise we will get a WARN() in suspend and resume.

Also, move phylink start and stop callbacks to inside device's internal
lock so that we prevent concurrent HW accesses.

Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic")
Reported-by: Christophe ROULLIER <christophe.roullier@st.com>
Tested-by: Christophe ROULLIER <christophe.roullier@st.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip6_gre: fix a dst leak in ip6erspan_tunnel_xmit
Xin Long [Fri, 13 Sep 2019 09:45:47 +0000 (17:45 +0800)]
ip6_gre: fix a dst leak in ip6erspan_tunnel_xmit

In ip6erspan_tunnel_xmit(), if the skb will not be sent out, it has to
be freed on the tx_err path. Otherwise when deleting a netns, it would
cause dst/dev to leak, and dmesg shows:

  unregister_netdevice: waiting for lo to become free. Usage count = 1

Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: fix spelling mistake "fullill" -> "fulfill"
Colin Ian King [Fri, 13 Sep 2019 09:07:59 +0000 (10:07 +0100)]
qed: fix spelling mistake "fullill" -> "fulfill"

There is a spelling mistake in a DP_VERBOSE debug message. Fix it.
(Using American English spelling as this is the most common way
to spell this in the kernel).

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: b53: Add support for port_egress_floods callback
Florian Fainelli [Fri, 13 Sep 2019 03:28:39 +0000 (20:28 -0700)]
net: dsa: b53: Add support for port_egress_floods callback

Add support for configuring the per-port egress flooding control for
both Unicast and Multicast traffic.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoudp: correct reuseport selection with connected sockets
Willem de Bruijn [Fri, 13 Sep 2019 01:16:39 +0000 (21:16 -0400)]
udp: correct reuseport selection with connected sockets

UDP reuseport groups can hold a mix unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.

Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.

Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.

New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'
Gerd Rausch [Thu, 12 Sep 2019 20:49:41 +0000 (13:49 -0700)]
net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'

All entries in 'rds_ib_stat_names' are stringified versions
of the corresponding "struct rds_ib_statistics" element
without the "s_"-prefix.

Fix entry 'ib_evt_handler_call' to do the same.

Fixes: f4f943c958a2 ("RDS: IB: ack more receive completions to improve performance")
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet_sched: let qdisc_put() accept NULL pointer
Cong Wang [Thu, 12 Sep 2019 17:22:30 +0000 (10:22 -0700)]
net_sched: let qdisc_put() accept NULL pointer

When tcf_block_get() fails in sfb_init(), q->qdisc is still a NULL
pointer which leads to a crash in sfb_destroy(). Similar for
sch_dsmark.

Instead of fixing each separately, Linus suggested to just accept
NULL pointer in qdisc_put(), which would make callers easier.

(For sch_dsmark, the bug probably exists long before commit
6529eaba33f0.)

Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Reported-by: syzbot+d5870a903591faaca4ae@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: Fix load order between DSA drivers and taggers
Andrew Lunn [Thu, 12 Sep 2019 13:16:45 +0000 (15:16 +0200)]
net: dsa: Fix load order between DSA drivers and taggers

The DSA core, DSA taggers and DSA drivers all make use of
module_init(). Hence they get initialised at device_initcall() time.
The ordering is non-deterministic. It can be a DSA driver is bound to
a device before the needed tag driver has been initialised, resulting
in the message:

No tagger for this switch

Rather than have this be fatal, return -EPROBE_DEFER so that it is
tried again later once all the needed drivers have been loaded.

Fixes: d3b8c04988ca ("dsa: Add boilerplate helper to register DSA tag driver modules")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: fix race between deactivation and dequeue for NOLOCK qdisc
Paolo Abeni [Thu, 12 Sep 2019 10:02:42 +0000 (12:02 +0200)]
net/sched: fix race between deactivation and dequeue for NOLOCK qdisc

The test implemented by some_qdisc_is_busy() is somewhat loosy for
NOLOCK qdisc, as we may hit the following scenario:

CPU1 CPU2
// in net_tx_action()
clear_bit(__QDISC_STATE_SCHED...);
// in some_qdisc_is_busy()
val = (qdisc_is_running(q) ||
       test_bit(__QDISC_STATE_SCHED,
&q->state));
// here val is 0 but...
qdisc_run(q)
// ... CPU1 is going to run the qdisc next

As a conseguence qdisc_run() in net_tx_action() can race with qdisc_reset()
in dev_qdisc_reset(). Such race is not possible for !NOLOCK qdisc as
both the above bit operations are under the root qdisc lock().

After commit 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue")
the race can cause use after free and/or null ptr dereference, but the root
cause is likely older.

This patch addresses the issue explicitly checking for deactivation under
the seqlock for NOLOCK qdisc, so that the qdisc_run() in the critical
scenario becomes a no-op.

Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(),
but that is safe due to the skb_array() locking, and we can't avoid that
for NOLOCK qdiscs.

Fixes: 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue")
Reported-by: Li Shuang <shuali@redhat.com>
Reported-and-tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
David S. Miller [Sun, 15 Sep 2019 12:17:27 +0000 (14:17 +0200)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Minor overlapping changes in the btusb and ixgbe drivers.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 14 Sep 2019 23:07:40 +0000 (16:07 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "The main change here is a revert of reverts. We recently simplified
  some code that was thought unnecessary; however, since then KVM has
  grown quite a few cond_resched()s and for that reason the simplified
  code is prone to livelocks---one CPUs tries to empty a list of guest
  page tables while the others keep adding to them. This adds back the
  generation-based zapping of guest page tables, which was not
  unnecessary after all.

  On top of this, there is a fix for a kernel memory leak and a couple
  of s390 fixlets as well"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
  KVM: x86: work around leak of uninitialized stack contents
  KVM: nVMX: handle page fault in vmread
  KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
  KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()

6 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Sat, 14 Sep 2019 23:02:49 +0000 (16:02 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fix from Michael Tsirkin:
 "A last minute revert

  The 32-bit build got broken by the latest defence in depth patch.
  Revert and we'll try again in the next cycle"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  Revert "vhost: block speculation of translated descriptors"

6 years agoMerge tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv...
Linus Torvalds [Sat, 14 Sep 2019 22:58:02 +0000 (15:58 -0700)]
Merge tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fix from Paul Walmsley:
 "Last week, Palmer and I learned that there was an error in the RISC-V
  kernel image header format that could make it less compatible with the
  ARM64 kernel image header format. I had missed this error during my
  original reviews of the patch.

  The kernel image header format is an interface that impacts
  bootloaders, QEMU, and other user tools. Those packages must be
  updated to align with whatever is merged in the kernel. We would like
  to avoid proliferating these image formats by keeping the RISC-V
  header as close as possible to the existing ARM64 header. Since the
  arch/riscv patch that adds support for the image header was merged
  with our v5.3-rc1 pull request as commit 0f327f2aaad6a ("RISC-V: Add
  an Image header that boot loader can parse."), we think it wise to try
  to fix this error before v5.3 is released.

  The fix itself should be backwards-compatible with any project that
  has already merged support for premature versions of this interface.
  It primarily involves ensuring that the RISC-V image header has
  something useful in the same field as the ARM64 image header"

* tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: modify the Image header to improve compatibility with the ARM64 header

6 years agoRevert "vhost: block speculation of translated descriptors"
Michael S. Tsirkin [Sat, 14 Sep 2019 19:21:51 +0000 (15:21 -0400)]
Revert "vhost: block speculation of translated descriptors"

This reverts commit a89db445fbd7f1f8457b03759aa7343fa530ef6b.

I was hasty to include this patch, and it breaks the build on 32 bit.
Defence in depth is good but let's do it properly.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Sat, 14 Sep 2019 19:20:38 +0000 (12:20 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Don't corrupt xfrm_interface parms before validation, from Nicolas
    Dichtel.

 2) Revert use of usb-wakeup in btusb, from Mario Limonciello.

 3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from
    Leonardo Bras.

 4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso.

 5) Missing ULP check in sock_map, from John Fastabend.

 6) Fix receive statistic handling in forcedeth, from Zhu Yanjun.

 7) Fix length of SKB allocated in 6pack driver, from Christophe
    JAILLET.

 8) ip6_route_info_create() returns an error pointer, not NULL. From
    Maciej Żenczykowski.

 9) Only add RDS sock to the hashes after rs_transport is set, from
    Ka-Cheong Poon.

10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets.

11) Presence of transmit IPSEC offload in an SKB is not tested for
    correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff
    Kirsher.

12) Need rcu_barrier() when register_netdevice() takes one of the
    notifier based failure paths, from Subash Abhinov Kasiviswanathan.

13) Fix leak in sctp_do_bind(), from Mao Wenan.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  cdc_ether: fix rndis support for Mediatek based smartphones
  sctp: destroy bucket if failed to bind addr
  sctp: remove redundant assignment when call sctp_get_port_local
  sctp: change return type of sctp_get_port_local
  ixgbevf: Fix secpath usage for IPsec Tx offload
  sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
  ixgbe: Fix secpath usage for IPsec TX offload.
  net: qrtr: fix memort leak in qrtr_tun_write_iter
  net: Fix null de-reference of device refcount
  ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
  tun: fix use-after-free when register netdev failed
  tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
  ixgbe: fix double clean of Tx descriptors with xdp
  ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
  mlx4: fix spelling mistake "veify" -> "verify"
  net: hns3: fix spelling mistake "undeflow" -> "underflow"
  net: lmc: fix spelling mistake "runnin" -> "running"
  NFC: st95hf: fix spelling mistake "receieve" -> "receive"
  net/rds: An rds_sock is added too early to the hash table
  mac80211: Do not send Layer 2 Update frame before authorization
  ...

6 years agoMerge tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Sat, 14 Sep 2019 19:08:19 +0000 (12:08 -0700)]
Merge tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:

 - tmio: Fixup runtime PM management during probe and remove

 - sdhci-pci-o2micro: Fix eMMC initialization for an AMD SoC

 - bcm2835: Prevent lockups when terminating work

* tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: tmio: Fixup runtime PM management during remove
  mmc: tmio: Fixup runtime PM management during probe
  Revert "mmc: tmio: move runtime PM enablement to the driver implementations"
  Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller"
  Revert "mmc: bcm2835: Terminate timeout work synchronously"

6 years agoMerge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Sat, 14 Sep 2019 18:54:57 +0000 (11:54 -0700)]
Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "From the maintainer summit, just some last minute fixes for final:

  lima:
   - fix gem_wait ioctl

  core:
   - constify modes list

  i915:
   - DP MST high color depth regression
   - GPU hangs on vulkan compute workloads"

* tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm:
  drm/lima: fix lima_gem_wait() return value
  drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+
  drm/i915: Limit MST to <= 8bpc once again
  drm/modes: Make the whitelist more const

6 years agoMerge tag 'wireless-drivers-next-for-davem-2019-09-14' of git://git.kernel.org/pub...
David S. Miller [Sat, 14 Sep 2019 13:08:18 +0000 (15:08 +0200)]
Merge tag 'wireless-drivers-next-for-davem-2019-09-14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 5.4

Last set of patches for 5.4. wil6210 and rtw88 being most active this
time, but ath9k also having a new module to load devices without
EEPROM.

Major changes:

wil6210

* add support for Enhanced Directional Multi-Gigabit (EDMG) channels 9-11

* add debugfs file to show PCM ring content

* report boottime_ns in scan results

ath9k

* add a separate loader for AR92XX (and older) pci(e) without eeprom

brcmfmac

* use the same wiphy after PCIe reset to not confuse the user space

rtw88

* enable interrupt migration

* enable AMSDU in AMPDU aggregation

* report RX power for each antenna

* enable to DPK and IQK calibration methods to improve performance
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'kvm-s390-master-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Sat, 14 Sep 2019 07:25:30 +0000 (09:25 +0200)]
Merge tag 'kvm-s390-master-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: Fixes for 5.3

- prevent a user triggerable oops in the migration code
- do not leak kernel stack content

6 years agoKVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
Sean Christopherson [Fri, 13 Sep 2019 02:46:02 +0000 (19:46 -0700)]
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot

James Harvey reported a livelock that was introduced by commit
d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when
removing a memslot"").

The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages.  With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule.  The equivalent of kvm_mmu_zap_all()
that was in use at the time of the reverted commit (4e103134b8623, "KVM:
x86/mmu: Zap only the relevant pages when removing a memslot") employed
a fast invalidate mechanism and was not susceptible to the above livelock.

There are three ways to fix the livelock:

- Reverting the revert (commit d012a06ab1d23) is not a viable option as
  the revert is needed to fix a regression that occurs when the guest has
  one or more assigned devices.  It's unlikely we'll root cause the device
  assignment regression soon enough to fix the regression timely.

- Remove the conditional reschedule from kvm_mmu_zap_all().  However, although
  removing the reschedule would be a smaller code change, it's less safe
  in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
  the wild for flushing memslots since the fast invalidate mechanism was
  introduced by commit 6ca18b6950f8d ("KVM: x86: use the fast way to
  invalidate all pages"), back in 2013.

- Reintroduce the fast invalidate mechanism and use it when zapping shadow
  pages in response to a memslot being deleted/moved, which is what this
  patch does.

For all intents and purposes, this is a revert of commit ea145aacf4ae8
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit 7390de1e99a70 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit 5304b8d37c2a5 ("KVM:
MMU: fast invalidate all pages") and commit 6ca18b6950f8d ("KVM: x86:
use the fast way to invalidate all pages") respectively.

Fixes: d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
Reported-by: James Harvey <jamespharvey20@gmail.com>
Cc: Alex Willamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: x86: work around leak of uninitialized stack contents
Fuqian Huang [Thu, 12 Sep 2019 04:18:17 +0000 (12:18 +0800)]
KVM: x86: work around leak of uninitialized stack contents

Emulation of VMPTRST can incorrectly inject a page fault
when passed an operand that points to an MMIO address.
The page fault will use uninitialized kernel stack memory
as the CR2 and error code.

The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just ensure
that the error code and CR2 are zero.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Cc: stable@vger.kernel.org
[add comment]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: nVMX: handle page fault in vmread
Paolo Bonzini [Fri, 13 Sep 2019 22:26:27 +0000 (00:26 +0200)]
KVM: nVMX: handle page fault in vmread

The implementation of vmread to memory is still incomplete, as it
lacks the ability to do vmread to I/O memory just like vmptrst.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoriscv: modify the Image header to improve compatibility with the ARM64 header
Paul Walmsley [Sat, 14 Sep 2019 01:35:50 +0000 (18:35 -0700)]
riscv: modify the Image header to improve compatibility with the ARM64 header

Part of the intention during the definition of the RISC-V kernel image
header was to lay the groundwork for a future merge with the ARM64
image header.  One error during my original review was not noticing
that the RISC-V header's "magic" field was at a different size and
position than the ARM64's "magic" field.  If the existing ARM64 Image
header parsing code were to attempt to parse an existing RISC-V kernel
image header format, it would see a magic number 0.  This is
undesirable, since it's our intention to align as closely as possible
with the ARM64 header format.  Another problem was that the original
"res3" field was not being initialized correctly to zero.

Address these issues by creating a 32-bit "magic2" field in the RISC-V
header which matches the ARM64 "magic" field.  RISC-V binaries will
store "RSC\x05" in this field.  The intention is that the use of the
existing 64-bit "magic" field in the RISC-V header will be deprecated
over time.  Increment the minor version number of the file format to
indicate this change, and update the documentation accordingly.  Fix
the assembler directives in head.S to ensure that reserved fields are
properly zero-initialized.

Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reported-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Karsten Merker <merker@debian.org>
Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u
Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
6 years agoMerge branch 'devlink-move-reload-fail-indication-to-devlink-core-and-expose-to-user'
David S. Miller [Fri, 13 Sep 2019 20:11:14 +0000 (22:11 +0200)]
Merge branch 'devlink-move-reload-fail-indication-to-devlink-core-and-expose-to-user'

Jiri Pirko says:

====================
net: devlink: move reload fail indication to devlink core and expose to user

First two patches are dependencies of the last one. That moves devlink
reload failure indication to the devlink code, so the drivers do not
have to track it themselves. Currently it is only mlxsw, but I will send
a follow-up patchset that introduces this in netdevsim too.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: devlink: move reload fail indication to devlink core and expose to user
Jiri Pirko [Thu, 12 Sep 2019 08:49:46 +0000 (10:49 +0200)]
net: devlink: move reload fail indication to devlink core and expose to user

Currently the fact that devlink reload failed is stored in drivers.
Move this flag into devlink core. Also, expose it to the user.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: devlink: split reload op into two
Jiri Pirko [Thu, 12 Sep 2019 08:49:45 +0000 (10:49 +0200)]
net: devlink: split reload op into two

In order to properly implement failure indication during reload,
split the reload op into two ops, one for down phase and one for
up phase.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlx4: Split restart_one into two functions
Jiri Pirko [Thu, 12 Sep 2019 08:49:44 +0000 (10:49 +0200)]
mlx4: Split restart_one into two functions

Split the function restart_one into two functions and separate teardown
and buildup.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocdc_ether: fix rndis support for Mediatek based smartphones
Bjørn Mork [Thu, 12 Sep 2019 08:42:00 +0000 (10:42 +0200)]
cdc_ether: fix rndis support for Mediatek based smartphones

A Mediatek based smartphone owner reports problems with USB
tethering in Linux.  The verbose USB listing shows a rndis_host
interface pair (e0/01/03 + 10/00/00), but the driver fails to
bind with

[  355.960428] usb 1-4: bad CDC descriptors

The problem is a failsafe test intended to filter out ACM serial
functions using the same 02/02/ff class/subclass/protocol as RNDIS.
The serial functions are recognized by their non-zero bmCapabilities.

No RNDIS function with non-zero bmCapabilities were known at the time
this failsafe was added. But it turns out that some Wireless class
RNDIS functions are using the bmCapabilities field. These functions
are uniquely identified as RNDIS by their class/subclass/protocol, so
the failing test can safely be disabled.  The same applies to the two
types of Misc class RNDIS functions.

Applying the failsafe to Communication class functions only retains
the original functionality, and fixes the problem for the Mediatek based
smartphone.

Tow examples of CDC functional descriptors with non-zero bmCapabilities
from Wireless class RNDIS functions are:

0e8d:000a  Mediatek Crosscall Spider X5 3G Phone

      CDC Header:
        bcdCDC               1.10
      CDC ACM:
        bmCapabilities       0x0f
          connection notifications
          sends break
          line coding and serial state
          get/set/clear comm features
      CDC Union:
        bMasterInterface        0
        bSlaveInterface         1
      CDC Call Management:
        bmCapabilities       0x03
          call management
          use DataInterface
        bDataInterface          1

and

19d2:1023  ZTE K4201-z

      CDC Header:
        bcdCDC               1.10
      CDC ACM:
        bmCapabilities       0x02
          line coding and serial state
      CDC Call Management:
        bmCapabilities       0x03
          call management
          use DataInterface
        bDataInterface          1
      CDC Union:
        bMasterInterface        0
        bSlaveInterface         1

The Mediatek example is believed to apply to most smartphones with
Mediatek firmware.  The ZTE example is most likely also part of a larger
family of devices/firmwares.

Suggested-by: Lars Melin <larsm17@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'sctp_do_bind-leak'
David S. Miller [Fri, 13 Sep 2019 20:06:20 +0000 (22:06 +0200)]
Merge branch 'sctp_do_bind-leak'

Mao Wenan says:

====================
fix memory leak for sctp_do_bind

First two patches are to do cleanup, remove redundant assignment,
and change return type of sctp_get_port_local.
Third patch is to fix memory leak for sctp_do_bind if failed
to bind address.

v2: add one patch to change return type of sctp_get_port_local.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: destroy bucket if failed to bind addr
Mao Wenan [Thu, 12 Sep 2019 04:02:19 +0000 (12:02 +0800)]
sctp: destroy bucket if failed to bind addr

There is one memory leak bug report:
BUG: memory leak
unreferenced object 0xffff8881dc4c5ec0 (size 40):
  comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s)
  hex dump (first 32 bytes):
    02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00  .c=.............
  backtrace:
    [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp]
    [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp]
    [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp]
    [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6]
    [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647
    [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline]
    [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline]
    [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656
    [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
    [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is because in sctp_do_bind, if sctp_get_port_local is to
create hash bucket successfully, and sctp_add_bind_addr failed
to bind address, e.g return -ENOMEM, so memory leak found, it
needs to destroy allocated bucket.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: remove redundant assignment when call sctp_get_port_local
Mao Wenan [Thu, 12 Sep 2019 04:02:18 +0000 (12:02 +0800)]
sctp: remove redundant assignment when call sctp_get_port_local

There are more parentheses in if clause when call sctp_get_port_local
in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
do cleanup.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: change return type of sctp_get_port_local
Mao Wenan [Thu, 12 Sep 2019 04:02:17 +0000 (12:02 +0800)]
sctp: change return type of sctp_get_port_local

Currently sctp_get_port_local() returns a long
which is either 0,1 or a pointer casted to long.
It's neither of the callers use the return value since
commit 62208f12451f ("net: sctp: simplify sctp_get_port").
Now two callers are sctp_get_port and sctp_do_bind,
they actually assumend a casted to an int was the same as
a pointer casted to a long, and they don't save the return
value just check whether it is zero or non-zero, so
it would better change return type from long to int for
sctp_get_port_local.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip: support SO_MARK cmsg
Willem de Bruijn [Wed, 11 Sep 2019 19:50:51 +0000 (15:50 -0400)]
ip: support SO_MARK cmsg

Enable setting skb->mark for UDP and RAW sockets using cmsg.

This is analogous to existing support for TOS, TTL, txtime, etc.

Packet sockets already support this as of commit c7d39e32632e
("packet: support per-packet fwmark for af_packet sendmsg").

Similar to other fields, implement by
1. initialize the sockcm_cookie.mark from socket option sk_mark
2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl
3. initialize inet_cork.mark from sockcm_cookie.mark
4. initialize each (usually just one) skb->mark from inet_cork.mark

Step 1 is handled in one location for most protocols by ipcm_init_sk
as of commit 351782067b6b ("ipv4: ipcm_cookie initializers").

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
Kalle Valo [Fri, 13 Sep 2019 15:15:12 +0000 (18:15 +0300)]
Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git

ath.git patches for 5.4. Major changes:

wil6210

* add support for Enhanced Directional Multi-Gigabit (EDMG) channels 9-11

* add debugfs file to show PCM ring content

* report boottime_ns in scan results

ath9k

* add a separate loader for AR92XX (and older) pci(e) without eeprom,
  enabled with the new ATH9K_PCI_NO_EEPROM Kconfig option

6 years agortw88: report RX power for each antenna
Yan-Hsuan Chuang [Thu, 12 Sep 2019 06:39:15 +0000 (14:39 +0800)]
rtw88: report RX power for each antenna

Report chains and chain_signal in ieee80211_rx_status.
It is useful for program such as tcpdump to see if the
antennas are well connected/placed.

8822C is able to receive CCK rates with 2 antennas, while
8822B can only use 1 antenna path to receive CCK rates.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Reviewed-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agortw88: fix wrong rx power calculation
Tsang-Shian Lin [Thu, 12 Sep 2019 06:39:14 +0000 (14:39 +0800)]
rtw88: fix wrong rx power calculation

Fix the wrong RF path for CCK rx power calculation.

Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver")
Signed-off-by: Tsang-Shian Lin <thlin@realtek.com>
Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Reviewed-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agortlwifi: rtl8192de: replace _rtl92d_evm_db_to_percentage with generic version
Michael Straube [Tue, 10 Sep 2019 19:04:22 +0000 (21:04 +0200)]
rtlwifi: rtl8192de: replace _rtl92d_evm_db_to_percentage with generic version

Function _rtl92d_evm_db_to_percentage is functionally identical
to the generic version rtl_evm_db_to_percentage, so remove
_rtl92d_evm_db_to_percentage and use the generic version instead.

Signed-off-by: Michael Straube <straube.linux@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agortlwifi: rtl8192cu: replace _rtl92c_evm_db_to_percentage with generic version
Michael Straube [Tue, 10 Sep 2019 19:04:21 +0000 (21:04 +0200)]
rtlwifi: rtl8192cu: replace _rtl92c_evm_db_to_percentage with generic version

Function _rtl92c_evm_db_to_percentage is functionally identical
to the generic version rtl_evm_db_to_percentage, so remove
_rtl92c_evm_db_to_percentage and use the generic version instead.

Signed-off-by: Michael Straube <straube.linux@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agortlwifi: rtl8192ce: replace _rtl92c_evm_db_to_percentage with generic version
Michael Straube [Tue, 10 Sep 2019 19:04:20 +0000 (21:04 +0200)]
rtlwifi: rtl8192ce: replace _rtl92c_evm_db_to_percentage with generic version

Function _rtl92c_evm_db_to_percentage is functionally identical
to the generic version rtl_evm_db_to_percentage, so remove
_rtl92c_evm_db_to_percentage and use the generic version instead.

Signed-off-by: Michael Straube <straube.linux@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agortw88: allows to receive AMSDU in AMPDU
Yan-Hsuan Chuang [Mon, 9 Sep 2019 07:16:11 +0000 (15:16 +0800)]
rtw88: allows to receive AMSDU in AMPDU

The hardware has enough buffer to receive like 8K for an MPDU.
So tell mac80211 that we can receive AMSDU in AMPDU.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agortw88: add dynamic cck pd mechanism
Tzu-En Huang [Mon, 9 Sep 2019 07:16:10 +0000 (15:16 +0800)]
rtw88: add dynamic cck pd mechanism

This mechanism reduces the numbers of false alram in cck rate by
dynamically adjusting the value of power threshold and cs_ratio.
We determine the new value by three factors, which are rssi, false alarm
count and igi. Based on these factors, we define the current condition
into five levels. Compared to the previous level, if the level is changed,
we set the new values for power threshold and cs_ratio.

Signed-off-by: Tzu-En Huang <tehuang@realtek.com>
Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agortw88: move IQK/DPK into phy_calibration
Yan-Hsuan Chuang [Mon, 9 Sep 2019 07:16:09 +0000 (15:16 +0800)]
rtw88: move IQK/DPK into phy_calibration

Since 8822c requires to do not only IQK, but also DPK.
Move these calibrations that need to be done once the channel
is determined, into phy_calibration.

And note that the order of the calibrations matters, 8822c
should do IQK first, then DPK.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>