]> www.infradead.org Git - users/hch/misc.git/log
users/hch/misc.git
7 years agopfifo_fast: drop unneeded additional lock on dequeue
Paolo Abeni [Tue, 15 May 2018 14:24:37 +0000 (16:24 +0200)]
pfifo_fast: drop unneeded additional lock on dequeue

After the previous patch, for NOLOCK qdiscs, q->seqlock is
always held when the dequeue() is invoked, we can drop
any additional locking to protect such operation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosched: replace __QDISC_STATE_RUNNING bit with a spin lock
Paolo Abeni [Tue, 15 May 2018 14:24:36 +0000 (16:24 +0200)]
sched: replace __QDISC_STATE_RUNNING bit with a spin lock

So that we can use lockdep on it.
The newly introduced sequence lock has the same scope of busylock,
so it shares the same lockdep annotation, but it's only used for
NOLOCK qdiscs.

With this changeset we acquire such lock in the control path around
flushing operation (qdisc reset), to allow more NOLOCK qdisc perf
improvement in the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Thu, 17 May 2018 02:47:11 +0000 (22:47 -0400)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-05-17

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Provide a new BPF helper for doing a FIB and neighbor lookup
   in the kernel tables from an XDP or tc BPF program. The helper
   provides a fast-path for forwarding packets. The API supports
   IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are
   implemented in this initial work, from David (Ahern).

2) Just a tiny diff but huge feature enabled for nfp driver by
   extending the BPF offload beyond a pure host processing offload.
   Offloaded XDP programs are allowed to set the RX queue index and
   thus opening the door for defining a fully programmable RSS/n-tuple
   filter replacement. Once BPF decided on a queue already, the device
   data-path will skip the conventional RSS processing completely,
   from Jakub.

3) The original sockmap implementation was array based similar to
   devmap. However unlike devmap where an ifindex has a 1:1 mapping
   into the map there are use cases with sockets that need to be
   referenced using longer keys. Hence, sockhash map is added reusing
   as much of the sockmap code as possible, from John.

4) Introduce BTF ID. The ID is allocatd through an IDR similar as
   with BPF maps and progs. It also makes BTF accessible to user
   space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
   through BPF_OBJ_GET_INFO_BY_FD, from Martin.

5) Enable BPF stackmap with build_id also in NMI context. Due to the
   up_read() of current->mm->mmap_sem build_id cannot be parsed.
   This work defers the up_read() via a per-cpu irq_work so that
   at least limited support can be enabled, from Song.

6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
   JIT conversion as well as implementation of an optimized 32/64 bit
   immediate load in the arm64 JIT that allows to reduce the number of
   emitted instructions; in case of tested real-world programs they
   were shrinking by three percent, from Daniel.

7) Add ifindex parameter to the libbpf loader in order to enable
   BPF offload support. Right now only iproute2 can load offloaded
   BPF and this will also enable libbpf for direct integration into
   other applications, from David (Beckett).

8) Convert the plain text documentation under Documentation/bpf/ into
   RST format since this is the appropriate standard the kernel is
   moving to for all documentation. Also add an overview README.rst,
   from Jesper.

9) Add __printf verification attribute to the bpf_verifier_vlog()
   helper. Though it uses va_list we can still allow gcc to check
   the format string, from Mathieu.

10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
    is a bash 4.0+ feature which is not guaranteed to be available
    when calling out to shell, therefore use a more portable variant,
    from Joe.

11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
    instead of relying on the gcc built-in, from Björn.

12) Fix a sock hashmap kmalloc warning reported by syzbot when an
    overly large key size is used in hashmap then causing overflows
    in htab->elem_size. Reject bogus attr->key_size early in the
    sock_hash_alloc(), from Yonghong.

13) Ensure in BPF selftests when urandom_read is being linked that
    --build-id is always enabled so that test_stacktrace_build_id[_nmi]
    won't be failing, from Alexei.

14) Add bitsperlong.h as well as errno.h uapi headers into the tools
    header infrastructure which point to one of the arch specific
    uapi headers. This was needed in order to fix a build error on
    some systems for the BPF selftests, from Sirio.

15) Allow for short options to be used in the xdp_monitor BPF sample
    code. And also a bpf.h tools uapi header sync in order to fix a
    selftest build failure. Both from Prashant.

16) More formally clarify the meaning of ID in the direct packet access
    section of the BPF documentation, from Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: sockmap, on update propagate errors back to userspace
John Fastabend [Wed, 16 May 2018 23:38:14 +0000 (16:38 -0700)]
bpf: sockmap, on update propagate errors back to userspace

When an error happens in the update sockmap element logic also pass
the err up to the user.

Fixes: e5cd3abcb31a ("bpf: sockmap, refactor sockmap routines to work with hashmap")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agobpf: fix sock hashmap kmalloc warning
Yonghong Song [Wed, 16 May 2018 21:06:26 +0000 (14:06 -0700)]
bpf: fix sock hashmap kmalloc warning

syzbot reported a kernel warning below:
  WARNING: CPU: 0 PID: 4499 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 0 PID: 4499 Comm: syz-executor050 Not tainted 4.17.0-rc3+ #9
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x1b9/0x294 lib/dump_stack.c:113
   panic+0x22f/0x4de kernel/panic.c:184
   __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
   report_bug+0x252/0x2d0 lib/bug.c:186
   fixup_bug arch/x86/kernel/traps.c:178 [inline]
   do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
   do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
  RIP: 0010:kmalloc_slab+0x56/0x70 mm/slab_common.c:996
  RSP: 0018:ffff8801d907fc58 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff8801aeecb280 RCX: ffffffff8185ebd7
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000ffffffe1
  RBP: ffff8801d907fc58 R08: ffff8801adb5e1c0 R09: ffffed0035a84700
  R10: ffffed0035a84700 R11: ffff8801ad423803 R12: ffff8801aeecb280
  R13: 00000000fffffff4 R14: ffff8801ad891a00 R15: 00000000014200c0
   __do_kmalloc mm/slab.c:3713 [inline]
   __kmalloc+0x25/0x760 mm/slab.c:3727
   kmalloc include/linux/slab.h:517 [inline]
   map_get_next_key+0x24a/0x640 kernel/bpf/syscall.c:858
   __do_sys_bpf kernel/bpf/syscall.c:2131 [inline]
   __se_sys_bpf kernel/bpf/syscall.c:2096 [inline]
   __x64_sys_bpf+0x354/0x4f0 kernel/bpf/syscall.c:2096
   do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

The test case is against sock hashmap with a key size 0xffffffe1.
Such a large key size will cause the below code in function
sock_hash_alloc() overflowing and produces a smaller elem_size,
hence map creation will be successful.
    htab->elem_size = sizeof(struct htab_elem) +
                      round_up(htab->map.key_size, 8);

Later, when map_get_next_key is called and kernel tries
to allocate the key unsuccessfully, it will issue
the above warning.

Similar to hashtab, ensure the key size is at most
MAX_BPF_STACK for a successful map creation.

Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Reported-by: syzbot+e4566d29080e7f3460ff@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agolibbpf: add ifindex to enable offload support
David Beckett [Wed, 16 May 2018 21:02:49 +0000 (14:02 -0700)]
libbpf: add ifindex to enable offload support

BPF programs currently can only be offloaded using iproute2. This
patch will allow programs to be offloaded using libbpf calls.

Signed-off-by: David Beckett <david.beckett@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agobpf: add __printf verification to bpf_verifier_vlog
Mathieu Malaterre [Wed, 16 May 2018 20:27:41 +0000 (22:27 +0200)]
bpf: add __printf verification to bpf_verifier_vlog

__printf is useful to verify format and arguments. ‘bpf_verifier_vlog’
function is used twice in verifier.c in both cases the caller function
already uses the __printf gcc attribute.

Remove the following warning, triggered with W=1:

  kernel/bpf/verifier.c:176:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agosamples/bpf: Decrement ttl in fib forwarding example
David Ahern [Tue, 15 May 2018 23:20:52 +0000 (16:20 -0700)]
samples/bpf: Decrement ttl in fib forwarding example

Only consider forwarding packets if ttl in received packet is > 1 and
decrement ttl before handing off to bpf_redirect_map.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agoMerge branch 'bpf-sock-hashmap'
Daniel Borkmann [Wed, 16 May 2018 20:02:14 +0000 (22:02 +0200)]
Merge branch 'bpf-sock-hashmap'

John Fastabend says:

====================
In the original sockmap implementation we got away with using an
array similar to devmap. However, unlike devmap where an ifindex
has a nice 1:1 function into the map we have found some use cases
with sockets that need to be referenced using longer keys.

This series adds support for a sockhash map reusing as much of
the sockmap code as possible. I made the decision to add sockhash
specific helpers vs trying to generalize the existing helpers
because (a) they have sockmap in the name and (b) the keys are
different types. I prefer to be explicit here rather than play
type games or do something else tricky.

To test this we duplicate all the sockmap testing except swap out
the sockmap with a sockhash.

v2: fix file stats and add v2 tag
v3: move tool updates into test patch, move bpftool updates into
    its own patch, and fixup the test patch stats to catch the
    renamed file and provide only diffs ± on that.
v4: Add documentation to UAPI bpf.h
v5: Add documentation to tools UAPI bpf.h
v6: 'git add' test_sockhash_kern.c which was previously missing
    but was not causing issues because of typo in test script,
    noticed by Daniel. After this the git format-patch -M option
    no longer tracks the rename of the test_sockmap_kern files for
    some reason. I guess the diff has exceeded some threshold.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agobpf: bpftool, support for sockhash
John Fastabend [Mon, 14 May 2018 17:00:19 +0000 (10:00 -0700)]
bpf: bpftool, support for sockhash

This adds the SOCKHASH map type to bpftools so that we get correct
pretty printing.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agobpf: selftest additions for SOCKHASH
John Fastabend [Mon, 14 May 2018 17:00:18 +0000 (10:00 -0700)]
bpf: selftest additions for SOCKHASH

This runs existing SOCKMAP tests with SOCKHASH map type. To do this
we push programs into include file and build two BPF programs. One
for SOCKHASH and one for SOCKMAP.

We then run the entire test suite with each type.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agocxgb4: update LE-TCAM collection for T6
Rahul Lakkireddy [Wed, 16 May 2018 14:21:15 +0000 (19:51 +0530)]
cxgb4: update LE-TCAM collection for T6

For T6, clip table is separated from main TCAM. So, update LE-TCAM
collection logic to collect clip table TCAM as well. IPv6 takes
4 entries in clip table TCAM compared to 2 entries in main TCAM.

Also, in case of errors, keep LE-TCAM collected so far and set the
status to partial dump.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'qed-LL2-fixes'
David S. Miller [Wed, 16 May 2018 18:49:01 +0000 (14:49 -0400)]
Merge branch 'qed-LL2-fixes'

Michal Kalderon says:

====================
qed: LL2 fixes

This series fixes some issues in ll2 related to synchronization
and resource freeing
====================

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
7 years agoqed: Fix LL2 race during connection terminate
Michal Kalderon [Wed, 16 May 2018 11:44:40 +0000 (14:44 +0300)]
qed: Fix LL2 race during connection terminate

Stress on qedi/qedr load unload lead to list_del corruption.
This is due to ll2 connection terminate freeing resources without
verifying that no more ll2 processing will occur.

This patch unregisters the ll2 status block before terminating
the connection to assure this race does not occur.

Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Fix possibility of list corruption during rmmod flows
Michal Kalderon [Wed, 16 May 2018 11:44:39 +0000 (14:44 +0300)]
qed: Fix possibility of list corruption during rmmod flows

The ll2 flows of flushing the txq/rxq need to be synchronized with the
regular fp processing. Caused list corruption during load/unload stress
tests.

Fixes: 0a7fb11c23c0f ("qed: Add Light L2 support")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: LL2 flush isles when connection is closed
Michal Kalderon [Wed, 16 May 2018 11:44:38 +0000 (14:44 +0300)]
qed: LL2 flush isles when connection is closed

Driver should free all pending isles once it gets a FLUSH cqe from FW.
Part of iSCSI out of order flow.

Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethoc: Remove useless test before clk_disable_unprepare
YueHaibing [Wed, 16 May 2018 11:18:22 +0000 (19:18 +0800)]
net: ethoc: Remove useless test before clk_disable_unprepare

clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: stmmac: Remove useless test before clk_disable_unprepare
YueHaibing [Wed, 16 May 2018 10:59:19 +0000 (18:59 +0800)]
net: stmmac: Remove useless test before clk_disable_unprepare

clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: qcom/emac: Encapsulate sgmii ops under one structure
Hemanth Puranik [Wed, 16 May 2018 01:10:53 +0000 (06:40 +0530)]
net: qcom/emac: Encapsulate sgmii ops under one structure

This patch introduces ops structure for sgmii, This by ensures that
we do not need dummy functions in case of emulation platforms.

Signed-off-by: Hemanth Puranik <hpuranik@codeaurora.org>
Acked-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'rmnet-next'
David S. Miller [Wed, 16 May 2018 18:23:05 +0000 (14:23 -0400)]
Merge branch 'rmnet-next'

Subash Abhinov Kasiviswanathan says:

====================
net: qualcomm: rmnet: Updates 2018-05-14

Patch 1 adds tx_drops counter to more places.
Patch 2 adds ethtool private stats support to make it easy to debug
the checksum offload path.
Patch 3 is a cleanup in command packet processing path.

v1->v2: Fix the incorrect if / else statement in
rmnet_map_checksum_downlink_packet() and define rmnet_ethtool_ops
as static as mentioned by kbuild test robot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: qualcomm: rmnet: Remove redundant command check
Subash Abhinov Kasiviswanathan [Wed, 16 May 2018 00:52:02 +0000 (18:52 -0600)]
net: qualcomm: rmnet: Remove redundant command check

The command packet size is already checked once in
rmnet_map_deaggregate() for the header, packet and trailer size, so
this additional check is not needed.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: qualcomm: rmnet: Add support for ethtool private stats
Subash Abhinov Kasiviswanathan [Wed, 16 May 2018 00:52:01 +0000 (18:52 -0600)]
net: qualcomm: rmnet: Add support for ethtool private stats

Add ethtool private stats handler to debug the handling of packets
with checksum offload header / trailer. This allows to keep track of
the number of packets for which hardware computes the checksum and
counts and reasons where checksum computation was skipped in hardware
and was done in the network stack.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: qualcomm: rmnet: Capture all drops in transmit path
Subash Abhinov Kasiviswanathan [Wed, 16 May 2018 00:52:00 +0000 (18:52 -0600)]
net: qualcomm: rmnet: Capture all drops in transmit path

Packets in transmit path could potentially be dropped if there were
errors while adding the MAP header or the checksum header.
Increment the tx_drops stats in these cases.

Additionally, refactor the code to free the packet and increment
the tx_drops stat under a single label.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'of-mdio-Fall-back-to-mdiobus_register-with-NULL-device_node'
David S. Miller [Wed, 16 May 2018 18:20:36 +0000 (14:20 -0400)]
Merge branch 'of-mdio-Fall-back-to-mdiobus_register-with-NULL-device_node'

Florian Fainelli says:

====================
of: mdio: Fall back to mdiobus_register() with NULL device_node

This patch series updates of_mdiobus_register() such that when the device_node
argument is NULL, it calls mdiobus_register() directly. This is consistent with
the behavior of of_mdiobus_register() when CONFIG_OF=n.

I only converted the most obvious drivers, there are others that have a much
less obvious behavior and specifically attempt to deal with CONFIG_ACPI.

Changes in v2:

- fixed build error in davincin_mdio.c (Grygorii)
- reworked first patch a bit: commit message, subject and removed useless
  code comment
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: Remove device_node checks with of_mdiobus_register()
Florian Fainelli [Tue, 15 May 2018 23:56:19 +0000 (16:56 -0700)]
drivers: net: Remove device_node checks with of_mdiobus_register()

A number of drivers have the following pattern:

if (np)
of_mdiobus_register()
else
mdiobus_register()

which the implementation of of_mdiobus_register() now takes care of.
Remove that pattern in drivers that strictly adhere to it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Fugang Duan <fugang.duan@nxp.com>
Reviewed-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoof: mdio: Fall back to mdiobus_register() with NULL device_node
Florian Fainelli [Tue, 15 May 2018 23:56:18 +0000 (16:56 -0700)]
of: mdio: Fall back to mdiobus_register() with NULL device_node

When the device_node specified is NULL, fall back to mdiobus_register().
We have a number of drivers having a similar pattern which is:

if (np)
of_mdiobus_register()
else
mdiobus_register()

so incorporate that behavior within the core of_mdiobus_register()
function. This is also consistent with the stub version that we defined
when CONFIG_OF=n.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ti: cpsw-phy-sel: check bus_find_device() ret value
Grygorii Strashko [Tue, 15 May 2018 23:37:25 +0000 (18:37 -0500)]
net: ethernet: ti: cpsw-phy-sel: check bus_find_device() ret value

This fixes klockworks warnings: Pointer 'dev' returned from call to
function 'bus_find_device' at line 179 may be NULL and will be dereferenced
at line 181.

    cpsw-phy-sel.c:179: 'dev' is assigned the return value from function 'bus_find_device'.
    bus.c:342: 'bus_find_device' explicitly returns a NULL value.
    cpsw-phy-sel.c:181: 'dev' is dereferenced by passing argument 1 to function 'dev_get_drvdata'.
    device.h:1024: 'dev' is passed to function 'dev_get_drvdata'.
    device.h:1026: 'dev' is explicitly dereferenced.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
[nsekhar@ti.com: add an error message, fix return path]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoRevert "bonding: allow carrier and link status to determine link state"
Debabrata Banerjee [Wed, 16 May 2018 18:02:13 +0000 (14:02 -0400)]
Revert "bonding: allow carrier and link status to determine link state"

This reverts commit 1386c36b30388f46a95100924bfcae75160db715.

We don't want to encourage drivers to not report carrier status
correctly, therefore remove this commit.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotc-testing: updated mirred and vlan with more tests
Roman Mashak [Tue, 15 May 2018 18:31:14 +0000 (14:31 -0400)]
tc-testing: updated mirred and vlan with more tests

Added extra test cases for different control actions (reclassify, pipe
etc.), cookies, max values & exceeding maximum, and replace existing
actions unit tests.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotc-testing: fixed copy-pasting error in police tests
Roman Mashak [Tue, 15 May 2018 14:14:40 +0000 (10:14 -0400)]
tc-testing: fixed copy-pasting error in police tests

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosched: manipulate __QDISC_STATE_RUNNING in qdisc_run_* helpers
Paolo Abeni [Tue, 15 May 2018 08:50:31 +0000 (10:50 +0200)]
sched: manipulate __QDISC_STATE_RUNNING in qdisc_run_* helpers

Currently NOLOCK qdiscs pay a measurable overhead to atomically
manipulate the __QDISC_STATE_RUNNING. Such bit is flipped twice per
packet in the uncontended scenario with packet rate below the
line rate: on packed dequeue and on the next, failing dequeue attempt.

This changeset moves the bit manipulation into the qdisc_run_{begin,end}
helpers, so that the bit is now flipped only once per packet, with
measurable performance improvement in the uncontended scenario.

This also allows simplifying the qdisc teardown code path - since
qdisc_is_running() is now effective for each qdisc type - and avoid a
possible race between qdisc_run() and dev_deactivate_many(), as now
the some_qdisc_is_busy() can properly detect NOLOCK qdiscs being busy
dequeuing packets.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bonding-performance-and-reliability'
David S. Miller [Wed, 16 May 2018 16:15:12 +0000 (12:15 -0400)]
Merge branch 'bonding-performance-and-reliability'

Debabrata Banerjee says:

====================
bonding: performance and reliability

Series of fixes to how rlb updates are handled, code cleanup, allowing
higher performance tx hashing in balance-alb mode, and reliability of
link up/down monitoring.

v2: refactor bond_is_nondyn_tlb with inline fn, update log comment to
point out that multicast addresses will not get rlb updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: allow carrier and link status to determine link state
Debabrata Banerjee [Mon, 14 May 2018 18:48:10 +0000 (14:48 -0400)]
bonding: allow carrier and link status to determine link state

In a mixed environment it may be difficult to tell if your hardware
support carrier, if it does not it can always report true. With a new
use_carrier option of 2, we can check both carrier and link status
sequentially, instead of one or the other

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: allow use of tx hashing in balance-alb
Debabrata Banerjee [Mon, 14 May 2018 18:48:09 +0000 (14:48 -0400)]
bonding: allow use of tx hashing in balance-alb

The rx load balancing provided by balance-alb is not mutually
exclusive with using hashing for tx selection, and should provide a decent
speed increase because this eliminates spinlocks and cache contention.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: use common mac addr checks
Debabrata Banerjee [Mon, 14 May 2018 18:48:08 +0000 (14:48 -0400)]
bonding: use common mac addr checks

Replace homegrown mac addr checks with faster defs from etherdevice.h

Note that this will also prevent any rlb arp updates for multicast
addresses, however this should have been forbidden anyway.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: don't queue up extraneous rlb updates
Debabrata Banerjee [Mon, 14 May 2018 18:48:07 +0000 (14:48 -0400)]
bonding: don't queue up extraneous rlb updates

arps for incomplete entries can't be sent anyway.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'net-smc-enhancements-2018-05-15'
David S. Miller [Wed, 16 May 2018 15:49:20 +0000 (11:49 -0400)]
Merge branch 'net-smc-enhancements-2018-05-15'

Ursula Braun says:

====================
net/smc: enhancements 2018/05/15

here are smc patches for net-next. The first one is a fix for net-next
commit 01d2f7e2cdd3 "net/smc: sockopts TCP_NODELAY and TCP_CORK".
Patch 7 improves Connection Layer Control error handling, patch 10
improves abnormal termination of link groups. The remaining patches
from Karsten improve Link Layer Control code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: check for pending termination
Karsten Graul [Tue, 15 May 2018 15:05:03 +0000 (17:05 +0200)]
net/smc: check for pending termination

Avoid to run the processing in smc_lgr_terminate() more than once,
remember when the link group termination is triggered.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: drop messages when link state is inactive
Karsten Graul [Tue, 15 May 2018 15:05:02 +0000 (17:05 +0200)]
net/smc: drop messages when link state is inactive

Drop incoming messages when the link is flagged as inactive.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: set link inactive before calling smc_lgr_free()
Karsten Graul [Tue, 15 May 2018 15:05:01 +0000 (17:05 +0200)]
net/smc: set link inactive before calling smc_lgr_free()

Before smc_lgr_free() is called the link must be set inactive by calling
smc_llc_link_inactive().

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: handle all error codes from smc_conn_create()
Karsten Graul [Tue, 15 May 2018 15:05:00 +0000 (17:05 +0200)]
net/smc: handle all error codes from smc_conn_create()

Always set a reason_code when smc_conn_create() returns an error code.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: use a workqueue to defer llc send
Karsten Graul [Tue, 15 May 2018 15:04:59 +0000 (17:04 +0200)]
net/smc: use a workqueue to defer llc send

SMC handles deferred work in tasklets. As tasklets cannot sleep this
can result in rare EBUSY conditions, so defer this work in a work queue.
The high level api functions do not defer work because they can sleep
until the llc send is actually completed.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: move link llc initialization to llc layer
Karsten Graul [Tue, 15 May 2018 15:04:58 +0000 (17:04 +0200)]
net/smc: move link llc initialization to llc layer

Move the llc layer specific initialization and cleanup out of smc_core.c
into smc_llc.c (smc_llc_link_init and smc_llc_link_clear). Move all
initialization of a link into the new init function.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: simplify test_link function usage
Karsten Graul [Tue, 15 May 2018 15:04:57 +0000 (17:04 +0200)]
net/smc: simplify test_link function usage

Make smc_llc_send_test_link() static and remove it from the header file.
And to send a test_link response set the response flag and send the
message back as-is, without using smc_llc_send_test_link(). And because
smc_llc_send_test_link() must no longer send responses, remove the
response flag handling from the function.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: remove unnecessary cast
Karsten Graul [Tue, 15 May 2018 15:04:56 +0000 (17:04 +0200)]
net/smc: remove unnecessary cast

Remove an unneeded (void *) cast from the calls to
smc_llc_send_message(). No functional changes.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: register new rmbs with the peer
Karsten Graul [Tue, 15 May 2018 15:04:55 +0000 (17:04 +0200)]
net/smc: register new rmbs with the peer

Register new rmb buffers with the remote peer by exchanging a
confirm_rkey llc message.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/smc: no tx work trigger for fallback sockets
Ursula Braun [Tue, 15 May 2018 15:04:54 +0000 (17:04 +0200)]
net/smc: no tx work trigger for fallback sockets

If TCP_NODELAY is set or TCP_CORK is reset, setsockopt triggers the
tx worker. This does not make sense, if the SMC socket switched to
the TCP fallback when the connection is created. This patch adds
the additional check for the fallback case.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'mlx5e-updates-2018-05-14' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 16 May 2018 15:36:39 +0000 (11:36 -0400)]
Merge tag 'mlx5e-updates-2018-05-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-05-14

Misc update for mlx5e netdevice driver

From Gal Pressman:
 - Remove MLX5E_TEST_BIT macros and use test_bit instead
 - Use __set_bit when possible

From Eran Ben Elisha:
  - Improve debug print on initial RX posting timeout

From Or Gerlitz:
 - Support offloaded TC flows with no matches on headers
 - mlx5e TC cleanups

Trivial cleanups From Roi, Tariq and Saeed:
  - Use bool as return type for mlx5e_xdp_handle
  - Use u8 instead of int for LRO number of segments
  - Skip redundant checks when providing NUD lastuse feedback
  - Remove redundant vport context vlan update
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'Misc-Bug-Fixes-and-clean-ups-for-HNS3-Driver'
David S. Miller [Wed, 16 May 2018 15:33:09 +0000 (11:33 -0400)]
Merge branch 'Misc-Bug-Fixes-and-clean-ups-for-HNS3-Driver'

Salil Mehta says:

====================
Misc. Bug Fixes and clean-ups for HNS3 Driver

This patch-set mainly introduces various bug fixes, cleanups and one
very small enhancement to existing HN3 driver code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Fixes the missing PCI iounmap for various legs
Fuyun Liang [Tue, 15 May 2018 18:20:14 +0000 (19:20 +0100)]
net: hns3: Fixes the missing PCI iounmap for various legs

We call pcim_iomap in hclge_pci_init, pcim_iounmap should be called
in error handle of hclge_init_ae_dev.

We call pcim_iomap in hclge_pci_init, but do not call pcim_iounmap in
hclge_pci_uninit. When we remove the hclge.ko and insert it again, a
problem that pci can not map will happen. pcim_iounmap need to be called
in hclge_pci_uninit.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Add support of .sriov_configure in HNS3 driver
Peng Li [Tue, 15 May 2018 18:20:13 +0000 (19:20 +0100)]
net: hns3: Add support of .sriov_configure in HNS3 driver

As HNS3 driver will enable SRIOV default and enable all VFs the
HW support, if PF and VF driver compiled to kernel, VF driver
will work on host default, it is not right.

This patch adds support for hns3_driver.sriov_configure to support
user configs the VF_num, and do not enable sriov default.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Suggested-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Fix for fiber link up problem
Yunsheng Lin [Tue, 15 May 2018 18:20:12 +0000 (19:20 +0100)]
net: hns3: Fix for fiber link up problem

When hclge_ae_start is called, hdev->hw.mac.link may be set
to one after up/down multi-times, which does not correspond to
the link state of netdev when the netdev is up.

This fixes it by setting hdev->hw.mac.link to zero when
hclge_ae_start is called.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Fixes the back pressure setting when sriov is enabled
Yunsheng Lin [Tue, 15 May 2018 18:20:11 +0000 (19:20 +0100)]
net: hns3: Fixes the back pressure setting when sriov is enabled

When sriov is enabled, the Qset and tc mapping is not longer one
to one relation.

This patch fixes it by mapping all pf and vf's Qset to tc.

Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Change return value in hnae3_register_client
Fuyun Liang [Tue, 15 May 2018 18:20:10 +0000 (19:20 +0100)]
net: hns3: Change return value in hnae3_register_client

A client includes many client instance. Just like ae_algo, Initializing
client instance failed does not represent registering client failed.
The action of registering client just is adding client to the client
list and the result always is true. This patch changes the return
value of hnae3_register_client form a variable value to a fixed value,
makes the function always return ok.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Change return type of hnae3_register_ae_algo
Fuyun Liang [Tue, 15 May 2018 18:20:09 +0000 (19:20 +0100)]
net: hns3: Change return type of hnae3_register_ae_algo

The ae_algo is used by many ae_devs. It is not only belong to just a
ae_dev. Initializing ae_dev failed does not represent registering ae_algo
failed. Because the action of registering ae_algo just is adding ae_algo
to the ae_algo list and it is always is true, it make no sense to define
return type as int.

This patch changes the return type of hnae3_register_ae_algo from int to
void.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Change return type of hnae3_register_ae_dev
Fuyun Liang [Tue, 15 May 2018 18:20:08 +0000 (19:20 +0100)]
net: hns3: Change return type of hnae3_register_ae_dev

If hclge.ko has not been inserted, the value of ret always is zero
in hnae3_register_ae_dev. If hclge.ko has been inserted, the value
of ret is zero or non zero. Different execution ways have different
results. It is confusing.

The ae_dev which is initialized failed can be reinitialized when we
remove hclge.ko and insert it again. For the case initializing client
instance, it is just like the case initializing ae_dev. The main function
of hnae3_register_ae_dev is adding the ae_dev to ad_dev list. Because
adding ae_dev is always ok, we does not need to return any in this
function.

This patch changes the return type of hnae3_register_ae_dev from int
to void.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Add a check for client instance init state
Fuyun Liang [Tue, 15 May 2018 18:20:07 +0000 (19:20 +0100)]
net: hns3: Add a check for client instance init state

If the client instance is initializd failed, we do not need to uninit it.
This patch adds a state check to check init state of client instance.

Fixes: 38caee9d3ee8 ("net: hns3: Add support of the HNAE3 framework")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Fix for the null pointer problem occurring when initializing ae_dev failed
Fuyun Liang [Tue, 15 May 2018 18:20:06 +0000 (19:20 +0100)]
net: hns3: Fix for the null pointer problem occurring when initializing ae_dev failed

When initializing ae_dev failed during loading hclge.ko, the drvdata will
be set to null. When removing hns3.ko, we get a null ae_dev. It causes the
null pointer problem.

This patch removes pci_set_drvdata from error handle of hclge_init_ae_dev
to fix the bug, since pci_set_drvdata has been called in hns3_remove.
Also, we do not need to uninit the ae_dev which is not initialized. And
it may be the one which is initialized failed.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Fix for deadlock problem occurring when unregistering ae_algo
Fuyun Liang [Tue, 15 May 2018 18:20:05 +0000 (19:20 +0100)]
net: hns3: Fix for deadlock problem occurring when unregistering ae_algo

When hnae3_unregister_ae_algo is called by PF, pci_disable_sriov is
called. And then, hns3_remove is called by VF. We get deadlocked in
this case.

Since VF pci device is dependent on PF pci device, When PF pci device
is removed, VF pci device must be removed. Also, To solve the deadlock
problem, VF pci device should be removed before PF pci device is removed.

This patch moves pci_enable/disable_sriov from hclge to hns3 to solve
the deadlock problem.

Also, we do not need to return EPROBE_DEFER in hnae3_register_ae_dev,
because SRIOV is no longer enabled in the context calling
hnae3_register_ae_dev. Mutex_trylock can be replaced with mutex_lock.

Fixes: 424eb834a9be ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'Microsemi-Ocelot-Ethernet-switch-support'
David S. Miller [Tue, 15 May 2018 20:41:16 +0000 (16:41 -0400)]
Merge branch 'Microsemi-Ocelot-Ethernet-switch-support'

Alexandre Belloni says:

====================
Microsemi Ocelot Ethernet switch support

This series adds initial support for the Microsemi Ethernet switch
present on Ocelot SoCs.

This only has bridging (and STP) support for now and it uses the
switchdev framework.
Coming features are VLAN filtering, link aggregation, IGMP snooping.

The switch can also be connected to an external CPU using PCIe.

Also, support for integration on other SoCs will be submitted.

The ocelot dts changes are here for reference and should probably go
through the MIPS tree once the bindings are accepted.

Changes in v3:
 - Collected Reviewed-by
 * Switchdev driver:
   - Fixed two issues reported by kbuild
   - Modified ethtool statistics to support different layoiut on different chips and take care of counter overflow

Changes in v2:
 - Dropped Microsemi Ocelot PHY support
 * MIIM driver:
   - Documented interrupts bindings
   - Moved the driver to drivers/net/phy/
   - Removed unused mutex
   - Removed MDIO bus scanning
 * Switchdev driver:
   - Changed compatible to mscc,vsc7514-switch
   - Removed unused header inclusion
   - Factorized MAC table selection in ocelot_mact_select()
   - Disable the port in ocelot_port_stop()
   - Fixed the smatch endianness warnings
   - int to unsinged int where necessary
   - Removed VID handling for the FDB it has been reworked anyway and will be
     submitted with VLAN support
   - Fixed up unused cases in ocelot_port_attr_set()
   - Added a loop to register all the IO register spaces
   - the ports are now in an ethernet-ports node

I've tried switching to NAPI but this is not working well, mainly because the
only way to disable interrupts is to actually mask them in the interrupt
controller (it is not possible to tell the switch to stop generating
interrupts).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMAINTAINERS: Add entry for Microsemi Ethernet switches
Alexandre Belloni [Mon, 14 May 2018 20:05:00 +0000 (22:05 +0200)]
MAINTAINERS: Add entry for Microsemi Ethernet switches

Add myself as a maintainer for the Microsemi Ethernet switches.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mscc: Add initial Ocelot switch support
Alexandre Belloni [Mon, 14 May 2018 20:04:57 +0000 (22:04 +0200)]
net: mscc: Add initial Ocelot switch support

Add a driver for Microsemi Ocelot Ethernet switch support.

This makes two modules:
mscc_ocelot_common handles all the common features that doesn't depend on
how the switch is integrated in the SoC. Currently, it handles offloading
bridging to the hardware. ocelot_io.c handles register accesses. This is
unfortunately needed because the register layout is packed and then depends
on the number of ports available on the switch. The register definition
files are automatically generated.

ocelot_board handles the switch integration on the SoC and on the board.

Frame injection and extraction to/from the CPU port is currently done using
register accesses which is quite slow. DMA is possible but the port is not
able to absorb the whole switch bandwidth.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-bindings: net: add DT bindings for Microsemi Ocelot Switch
Alexandre Belloni [Mon, 14 May 2018 20:04:56 +0000 (22:04 +0200)]
dt-bindings: net: add DT bindings for Microsemi Ocelot Switch

DT bindings for the Ethernet switch found on Microsemi Ocelot platforms.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: mscc-miim: Add MDIO driver
Alexandre Belloni [Mon, 14 May 2018 20:04:55 +0000 (22:04 +0200)]
net: phy: mscc-miim: Add MDIO driver

Add a driver for the Microsemi MII Management controller (MIIM) found on
Microsemi SoCs.
On Ocelot, there are two controllers, one is connected to the internal
PHYs, the other one can communicate with external PHYs.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-bindings: net: add DT bindings for Microsemi MIIM
Alexandre Belloni [Mon, 14 May 2018 20:04:54 +0000 (22:04 +0200)]
dt-bindings: net: add DT bindings for Microsemi MIIM

DT bindings for the Microsemi MII Management Controller found on Microsemi
SoCs

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: sockmap, add hash map support
John Fastabend [Mon, 14 May 2018 17:00:17 +0000 (10:00 -0700)]
bpf: sockmap, add hash map support

Sockmap is currently backed by an array and enforces keys to be
four bytes. This works well for many use cases and was originally
modeled after devmap which also uses four bytes keys. However,
this has become limiting in larger use cases where a hash would
be more appropriate. For example users may want to use the 5-tuple
of the socket as the lookup key.

To support this add hash support.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agobpf: sockmap, refactor sockmap routines to work with hashmap
John Fastabend [Mon, 14 May 2018 17:00:16 +0000 (10:00 -0700)]
bpf: sockmap, refactor sockmap routines to work with hashmap

This patch only refactors the existing sockmap code. This will allow
much of the psock initialization code path and bpf helper codes to
work for both sockmap bpf map types that are backed by an array, the
currently supported type, and the new hash backed bpf map type
sockhash.

Most the fallout comes from three changes,

  - Pushing bpf programs into an independent structure so we
    can use it from the htab struct in the next patch.
  - Generalizing helpers to use void *key instead of the hardcoded
    u32.
  - Instead of passing map/key through the metadata we now do
    the lookup inline. This avoids storing the key in the metadata
    which will be useful when keys can be longer than 4 bytes. We
    rename the sk pointers to sk_redir at this point as well to
    avoid any confusion between the current sk pointer and the
    redirect pointer sk_redir.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agoselftests/bpf: make sure build-id is on
Alexei Starovoitov [Tue, 15 May 2018 00:11:29 +0000 (17:11 -0700)]
selftests/bpf: make sure build-id is on

--build-id may not be a default linker config.
Make sure it's used when linking urandom_read test program.
Otherwise test_stacktrace_build_id[_nmi] tests will be failling.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
7 years agoMerge branch 'convert-doc-to-rst'
Alexei Starovoitov [Tue, 15 May 2018 06:02:58 +0000 (23:02 -0700)]
Merge branch 'convert-doc-to-rst'

Jesper Dangaard Brouer says:

====================
The kernel is moving files under Documentation to use the RST
(reStructuredText) format and Sphinx [1].  This patchset converts the
files under Documentation/bpf/ into RST format.  The Sphinx
integration is left as followup work.

[1] https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html

This patchset have been uploaded as branch bpf_doc10 on github[2], so
reviewers can see how GitHub renders this.

[2] https://github.com/netoptimizer/linux/tree/bpf_doc10/Documentation/bpf
====================

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf, doc: howto use/run the BPF selftests
Jesper Dangaard Brouer [Mon, 14 May 2018 13:42:32 +0000 (15:42 +0200)]
bpf, doc: howto use/run the BPF selftests

I always forget howto run the BPF selftests. Thus, lets add that info
to the QA document.

Documentation was based on Cilium's documentation:
 http://cilium.readthedocs.io/en/latest/bpf/#verifying-the-setup

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf, doc: convert bpf_devel_QA.rst to use RST formatting
Jesper Dangaard Brouer [Mon, 14 May 2018 13:42:27 +0000 (15:42 +0200)]
bpf, doc: convert bpf_devel_QA.rst to use RST formatting

Same story as bpf_design_QA.rst RST format conversion.

Again thanks to Quentin Monnet <quentin.monnet@netronome.com> for
fixes and patches that have been squashed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf, doc: convert bpf_design_QA.rst to use RST formatting
Jesper Dangaard Brouer [Mon, 14 May 2018 13:42:22 +0000 (15:42 +0200)]
bpf, doc: convert bpf_design_QA.rst to use RST formatting

The RST formatting is done such that that when rendered or converted
to different formats, an automatic index with links are created to the
subsections.

Thus, the questions are created as sections (or subsections), in-order
to get the wanted auto-generated FAQ/QA index.

Special thanks to Quentin Monnet <quentin.monnet@netronome.com> who
have reviewed and corrected both RST formatting and GitHub rendering
issues in this file.  Those commits have been squashed.

I've manually tested that this also renders nicely if included as part
of the kernel 'make htmldocs'.  As the end-goal is for this to become
more integrated with kernel-doc project/movement.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf, doc: rename txt files to rst files
Jesper Dangaard Brouer [Mon, 14 May 2018 13:42:17 +0000 (15:42 +0200)]
bpf, doc: rename txt files to rst files

This will cause them to get auto rendered, e.g. when viewing them on GitHub.
Followup patches will correct the content to be RST compliant.

Also adjust README.rst to point to the renamed files.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf, doc: add basic README.rst file
Jesper Dangaard Brouer [Mon, 14 May 2018 13:42:12 +0000 (15:42 +0200)]
bpf, doc: add basic README.rst file

A README.rst file in a directory have special meaning for sites like
github, which auto renders the contents.  Plus search engines like
Google also index these README.rst files.

Auto rendering allow us to use links, for (re)directing eBPF users to
other places where docs live.  The end-goal would be to direct users
towards https://www.kernel.org/doc/html/latest but we haven't written
the full docs yet, so we start out small and take this incrementally.

This directory itself contains some useful docs, which can be linked
to from the README.rst file (verified this works for github).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agoMerge branch 'fix-samples'
Alexei Starovoitov [Tue, 15 May 2018 05:54:40 +0000 (22:54 -0700)]
Merge branch 'fix-samples'

Jakub Kicinski says:

====================
Following patches address build issues after recent move to libbpf.
For out-of-tree builds we would see the following error:

gcc: error: samples/bpf/../../tools/lib/bpf/libbpf.a: No such file or directory

libbpf build system is now always invoked explicitly rather than
relying on building single objects most of the time.  We need to
resolve the friction between Kbuild and tools/ build system.

Mini-library called libbpf.h in samples is renamed to bpf_insn.h,
using linux/filter.h seems not completely trivial since some samples
get upset when order on include search path in changed.  We do have
to rename libbpf.h, however, because otherwise it's hard to reliably
get to libbpf's header in out-of-tree builds.

v2:
 - fix the build error harder (patch 3);
 - add patch 5 (make clang less noisy).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agosamples: bpf: make the build less noisy
Jakub Kicinski [Tue, 15 May 2018 05:35:06 +0000 (22:35 -0700)]
samples: bpf: make the build less noisy

Building samples with clang ignores the $(Q) setting, always
printing full command to the output.  Make it less verbose.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agosamples: bpf: move libbpf from object dependencies to libs
Jakub Kicinski [Tue, 15 May 2018 05:35:05 +0000 (22:35 -0700)]
samples: bpf: move libbpf from object dependencies to libs

Make complains that it doesn't know how to make libbpf.a:

scripts/Makefile.host:106: target 'samples/bpf/../../tools/lib/bpf/libbpf.a' doesn't match the target pattern

Now that we have it as a dependency of the sources simply add libbpf.a
to libraries not objects.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agosamples: bpf: fix build after move to compiling full libbpf.a
Jakub Kicinski [Tue, 15 May 2018 05:35:04 +0000 (22:35 -0700)]
samples: bpf: fix build after move to compiling full libbpf.a

There are many ways users may compile samples, some of them got
broken by commit 5f9380572b4b ("samples: bpf: compile and link
against full libbpf").  Improve path resolution and make libbpf
building a dependency of source files to force its build.

Samples should now again build with any of:
 cd samples/bpf; make
 make samples/bpf/
 make -C samples/bpf
 cd samples/bpf; make O=builddir
 make samples/bpf/ O=builddir
 make -C samples/bpf O=builddir
 export KBUILD_OUTPUT=builddir
 make samples/bpf/
 make -C samples/bpf

Fixes: 5f9380572b4b ("samples: bpf: compile and link against full libbpf")
Reported-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agosamples: bpf: rename libbpf.h to bpf_insn.h
Jakub Kicinski [Tue, 15 May 2018 05:35:03 +0000 (22:35 -0700)]
samples: bpf: rename libbpf.h to bpf_insn.h

The libbpf.h file in samples is clashing with libbpf's header.
Since it only includes a subset of filter.h instruction helpers
rename it to bpf_insn.h.  Drop the unnecessary include of bpf/bpf.h.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agosamples: bpf: include bpf/bpf.h instead of local libbpf.h
Jakub Kicinski [Tue, 15 May 2018 05:35:02 +0000 (22:35 -0700)]
samples: bpf: include bpf/bpf.h instead of local libbpf.h

There are two files in the tree called libbpf.h which is becoming
problematic.  Most samples don't actually need the local libbpf.h
they simply include it to get to bpf/bpf.h.  Include bpf/bpf.h
directly instead.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agoMerge branch 'sctp-Introduce-sctp_flush_ctx'
David S. Miller [Tue, 15 May 2018 03:15:27 +0000 (23:15 -0400)]
Merge branch 'sctp-Introduce-sctp_flush_ctx'

Marcelo Ricardo Leitner says:

====================
sctp: Introduce sctp_flush_ctx

This struct will hold all the context used during the outq flush, so we
don't have to pass lots of pointers all around.

Checked on x86_64, the compiler inlines all these functions and there is no
derreference added because of the struct.

This patchset depends on 'sctp: refactor sctp_outq_flush'

Changes since v1:
- updated to build on top of v2 of 'sctp: refactor sctp_outq_flush'

Changes since v2:
- fixed a rebase issue which reverted a change in patch 2.
- rebased on v3 of 'sctp: refactor sctp_outq_flush'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: checkpatch fixups
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:35:20 +0000 (14:35 -0300)]
sctp: checkpatch fixups

A collection of fixups from previous patches, left for later to not
introduce unnecessary changes while moving code around.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: add asoc and packet to sctp_flush_ctx
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:35:19 +0000 (14:35 -0300)]
sctp: add asoc and packet to sctp_flush_ctx

Pre-compute these so the compiler won't reload them (due to
no-strict-aliasing).

Changes since v2:
- Do not replace a return with a break in sctp_outq_flush_data

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: add sctp_flush_ctx, a context struct on outq_flush routines
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:35:18 +0000 (14:35 -0300)]
sctp: add sctp_flush_ctx, a context struct on outq_flush routines

With this struct we avoid passing lots of variables around and taking care
of updating the current transport/packet.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'sctp-refactor-sctp_outq_flush'
David S. Miller [Tue, 15 May 2018 02:57:15 +0000 (22:57 -0400)]
Merge branch 'sctp-refactor-sctp_outq_flush'

Marcelo Ricardo Leitner says:

====================
sctp: refactor sctp_outq_flush

Currently sctp_outq_flush does many different things and arguably
unrelated, such as doing transport selection and outq dequeueing.

This patchset refactors it into smaller and more dedicated functions.
The end behavior should be the same.

The next patchset will rework the function parameters.

Changes since v1:
- fix build issues on patches 3 and 4, and updated 5 and 8 because of
  it.

Changes since v2:
- fixed panic if building with just up to patch 3 applied
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: rework switch cases in sctp_outq_flush_data
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:34:43 +0000 (14:34 -0300)]
sctp: rework switch cases in sctp_outq_flush_data

Remove an inner one, which tended to be error prone due to the cascading
and it can be replaced by a simple if ().

Rework the outer one so that the actual flush code is not inside it. Now
we first validate if we can or cannot send data, return if not, and then
the flush code.

Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: make use of gfp on retransmissions
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:34:42 +0000 (14:34 -0300)]
sctp: make use of gfp on retransmissions

Retransmissions may be triggered when in user context, so lets make use
of gfp.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: move transport flush code out of sctp_outq_flush
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:34:41 +0000 (14:34 -0300)]
sctp: move transport flush code out of sctp_outq_flush

To the new sctp_outq_flush_transports.

Comment on Nagle is outdated and removed. Nagle is performed earlier, while
checking if the chunk fits the packet: if the outq length is not enough to
fill the packet, it returns SCTP_XMIT_DELAY.

So by when it gets to sctp_outq_flush_transports, it has to go through all
enlisted transports.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: move flushing of data chunks out of sctp_outq_flush
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:34:40 +0000 (14:34 -0300)]
sctp: move flushing of data chunks out of sctp_outq_flush

To the new sctp_outq_flush_data. Again, smaller functions and with well
defined objectives.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: move outq data rtx code out of sctp_outq_flush
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:34:39 +0000 (14:34 -0300)]
sctp: move outq data rtx code out of sctp_outq_flush

This patch renames current sctp_outq_flush_rtx to __sctp_outq_flush_rtx
and create a new sctp_outq_flush_rtx, with the code that was on
sctp_outq_flush. Again, the idea is to have functions with small and
defined objectives.

Yes, there is an open-coded path selection in the now sctp_outq_flush_rtx.
That is kept as is for now because it may be very different when we
implement retransmission path selection algorithms for CMT-SCTP.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: move the flush of ctrl chunks into its own function
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:34:38 +0000 (14:34 -0300)]
sctp: move the flush of ctrl chunks into its own function

Named sctp_outq_flush_ctrl and, with that, keep the contexts contained.

One small fix embedded is the reset of one_packet at every iteration.
This allows bundling of some control chunks in case they were preceeded by
another control chunk that cannot be bundled.

Other than this, it has the same behavior.

Changes since v2:
- Fixed panic reported by kbuild test robot if building with
  only up to this patch applied, due to bad parameter to
  sctp_outq_select_transport and by not initializing packet after
  calling sctp_outq_flush_ctrl.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: factor out sctp_outq_select_transport
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:34:37 +0000 (14:34 -0300)]
sctp: factor out sctp_outq_select_transport

We had two spots doing such complex operation and they were very close to
each other, a bit more tailored to here or there.

This patch unifies these under the same function,
sctp_outq_select_transport, which knows how to handle control chunks and
original transmissions (but not retransmissions).

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: add sctp_packet_singleton
Marcelo Ricardo Leitner [Mon, 14 May 2018 17:34:36 +0000 (14:34 -0300)]
sctp: add sctp_packet_singleton

Factor out the code for generating singletons. It's used only once, but
helps to keep the context contained.

The const variables are to ease the reading of subsequent calls in there.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add tc flower match support for tunnel VNI
Kumar Sanghvi [Mon, 14 May 2018 12:21:21 +0000 (17:51 +0530)]
cxgb4: add tc flower match support for tunnel VNI

Adds support for matching flows based on tunnel VNI value.
Introduces fw APIs for allocating/removing MPS entries related
to encapsulation. And uses the same while adding/deleting filters
for offloading flows based on tunnel VNI match.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: stmmac: Add Jose Abreu as co-maintainer
Jose Abreu [Mon, 14 May 2018 09:29:56 +0000 (10:29 +0100)]
net: stmmac: Add Jose Abreu as co-maintainer

I'm offering to be a co-maintainer for stmmac driver.

As per discussion with Alexandre, I will arrange to get STM32 boards to
test patches in GMAC version 3.x and 4.1. I also have HW to test GMAC
version 5.

Looking forward to contribute to net-dev!

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bpf-jit-cleanups'
Alexei Starovoitov [Tue, 15 May 2018 02:11:46 +0000 (19:11 -0700)]
Merge branch 'bpf-jit-cleanups'

Daniel Borkmann says:

====================
This series follows up mostly with with some minor cleanups on top
of 'Move ld_abs/ld_ind to native BPF' as well as implements better
32/64 bit immediate load into register and saves tail call init on
cBPF for the arm64 JIT. Last but not least we add a couple of test
cases. For details please see individual patches. Thanks!

v1 -> v2:
  - Minor fix in i64_i16_blocks() to remove 24 shift.
  - Added last two patches.
  - Added Acks from prior round.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf: add ld64 imm test cases
Daniel Borkmann [Mon, 14 May 2018 21:22:34 +0000 (23:22 +0200)]
bpf: add ld64 imm test cases

Add test cases where we combine semi-random imm values, mainly for testing
JITs when they have different encoding options for 64 bit immediates in
order to reduce resulting image size.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf, arm64: save 4 bytes in prologue when ebpf insns came from cbpf
Daniel Borkmann [Mon, 14 May 2018 21:22:33 +0000 (23:22 +0200)]
bpf, arm64: save 4 bytes in prologue when ebpf insns came from cbpf

We can trivially save 4 bytes in prologue for cBPF since tail calls
can never be used from there. The register push/pop is pairwise,
here, x25 (fp) and x26 (tcc), so no point in changing that, only
reset to zero is not needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf, arm64: optimize 32/64 immediate emission
Daniel Borkmann [Mon, 14 May 2018 21:22:32 +0000 (23:22 +0200)]
bpf, arm64: optimize 32/64 immediate emission

Improve the JIT to emit 64 and 32 bit immediates, the current
algorithm is not optimal and we often emit more instructions
than actually needed. arm64 has movz, movn, movk variants but
for the current 64 bit immediates we only use movz with a
series of movk when needed.

For example loading ffffffffffffabab emits the following 4
instructions in the JIT today:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: ffff, shift: 16, result: 00000000ffffabab
  * movk: ffff, shift: 32, result: 0000ffffffffabab
  * movk: ffff, shift: 48, result: ffffffffffffabab

Whereas after the patch the same load only needs a single
instruction:

  * movn: 5454, shift:  0, result: ffffffffffffabab

Another example where two extra instructions can be saved:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: 1f2f, shift: 16, result: 000000001f2fabab
  * movk: ffff, shift: 32, result: 0000ffff1f2fabab
  * movk: ffff, shift: 48, result: ffffffff1f2fabab

After the patch:

  * movn: e0d0, shift: 16, result: ffffffff1f2fffff
  * movk: abab, shift:  0, result: ffffffff1f2fabab

Another example with movz, before:

  * movz: 0000, shift:  0, result: 0000000000000000
  * movk: fea0, shift: 32, result: 0000fea000000000

After:

  * movz: fea0, shift: 32, result: 0000fea000000000

Moreover, reuse emit_a64_mov_i() for 32 bit immediates that
are loaded via emit_a64_mov_i64() which is a similar optimization
as done in 6fe8b9c1f41d ("bpf, x64: save several bytes by using
mov over movabsq when possible"). On arm64, the latter allows to
use a single instruction with movn due to zero extension where
otherwise two would be needed. And last but not least add a
missing optimization in emit_a64_mov_i() where movn is used but
the subsequent movk not needed. With some of the Cilium programs
in use, this shrinks the needed instructions by about three
percent. Tested on Cavium ThunderX CN8890.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 years agobpf, arm64: save 4 bytes of unneeded stack space
Daniel Borkmann [Mon, 14 May 2018 21:22:31 +0000 (23:22 +0200)]
bpf, arm64: save 4 bytes of unneeded stack space

Follow-up to 816d9ef32a8b ("bpf, arm64: remove ld_abs/ld_ind") in
that the extra 4 byte JIT scratchpad is not needed anymore since it
was in ld_abs/ld_ind as stack buffer for bpf_load_pointer().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>