www.infradead.org Git - users/jedix/linux-maple.git/log

sparc64: sunvdc: skip vdisk response validation upon error

Skip validating the vdisk IO response from vdisk server, if IO
request has failed.

sunvdc checks if the size of the request processed by the
server matches with the size of request sent by vdc. This
is to ensure that partial IO completions are caught, since
it's not expected. In the case where the server reports an
error, it could set the size of IO processed to zero.
Therefore, validating the size of request processed in the
case of an error could mis-classify the problem.

Orabug: 26242270

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: add DAX2 support to dax driver

Orabug: 25904994

Introduce DAX2 support in the driver. This involves negotiating the right
version with hypervisor as well as exposing a new INIT_V2 ioctl. This new
ioctl will return failure if DAX2 is not present on the system, otherwise
it will attempt to initialize the DAX2. A user should first call INIT_V2
and on failure call INIT_V1. See Documentation/sparc/dax.txt for more
detail.

Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
Reviewed-by: Sanath Kumar <sanath.s.kumar@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Exclude perf user callchain during critical sections

This fixes another cause of random segfaults and bus errors that
may occur while running perf with the callgraph (-g) option.

Critical sections beginning with spin_lock_irqsave() raise the
interrupt level to PIL_NORMAL_MAX (14) and intentionally do not block
performance counter interrupts, which arrive at PIL_NMI (15).  So perf
code must be very careful about what it does since it might execute in
the middle of one of these critical sections. In particular, the
perf_callchain_user() path is problematic because it accesses user
space and may cause TLB activity as well as faults as it unwinds the
user stack.

One particular critical section occurs in switch_mm:

spin_lock_irqsave(&mm->context.lock, flags);
...
load_secondary_context(mm);
tsb_context_switch(mm);
...
spin_unlock_irqrestore(&mm->context.lock, flags);

If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active tsb for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.

Some potential solutions are:

1) Make critical sections run at PIL_NMI instead of PIL_NORMAL_MAX.
This would certainly eliminate the problem, but it would also prevent
perf from having any visibility into code running in these critical
sections, and it seems clear that PIL_NORMAL_MAX is used for just
this reason.

2) Protect this particular critical section by masking all interrupts,
either by setting %pil to PIL_NMI or by clearing pstate.ie around the
calls to load_secondary_context() and tsb_context_switch(). This approach
has a few drawbacks:

- It would only address this particular critical section, and would
   have to be repeated in several other known places. There might be
   other such critical sections that are not known.

- It has a performance cost which would be incurred at every context
   switch, since it would require additional accesses to %pil or
   %pstate.

- Turning off pstate.ie would require changing __tsb_context_switch(),
   which expects to be called with pstate.ie on.

3) Avoid user space MMU activity entirely in perf_callchain_user() by
implementing a new copy_from_user() function that accesses the user
stack via physical addresses. This works, but requires quite a bit of
new code to get it to perform reasonably, ie, caching of translations,
etc.

4) Allow the perf interrupt to happen in existing critical sections as
it does now, but have perf code detect that this is happening, and
skip any user callchain processing. This approach was deemed best, as
the change is extremely localized and covers both known and unknown
instances of perf interrupting critical sections. Perf has interrupted
a critical section when %pil == PIL_NORMAL_MAX at the time of the perf
interrupt.

Ordinarily, a function that has a pt_regs passed in can simply examine
(reg->tstate & TSTATE_PIL) to extract the interrupt level that was in
effect at the time of the perf interrupt.  However, when the perf
interrupt occurs while executing in the kernel, the pt_regs pointer is
replaced with task_pt_regs(current) in perf_callchain(), and so
perf_callchain_user() does not have access to the state of the machine
at the time of the perf interrupt. To work around this, we check
(regs->tstate & TSTATE_PIL) in perf_event_nmi_handler() before calling
in to the arch independent part of perf, and temporarily change the
event attributes so that user callchain processing is avoided. Though
a user stack sample is not collected, this loss is not statistically
significant. Kernel call graph collection is not affected.

Orabug: 25577560

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: rtrap must set PSTATE.mcde before handling outstanding user work

The kernel must execute with PSTATE.mcde=1 for ADI version checking to
be enabled when the kernel reads or writes user memory mapped with ADI
enabled using versioned addresses.  If PSTATE.mcde=0 then the MMU
interprets version bits in an address as address bits, and an access
attempt results in a data access exception.  Until now setting
PSTATE.mcde=1 in the kernel has been handled only by patching etrap to
ensure that is set on entry into the kernel.  However, there are code
paths in rtrap that overwrite PSTATE and inadvertently clear PSTATE.mcde
before additional execution happens in the kernel.

rtrap is executed to exit the kernel and return to user mode execution.
Before restoring registers and returning to user mode, rtrap checks for
work to do.  The check is done with interrupts disabled, and if there is
work to do, then interrupts are enabled before calling a function to
complete the work after which interrupts are disabled again and the
check is repeated.  Interrupts are disabled and enabled by overwriting
PSTATE.  Possible work includes (but is not limited to) preemption,
signal delivery, and  writing out buffered user register windows to the
stack.  All of these may lead to accessing user addresses.  In the case
of preemption, a resumed thread will run with PSTATE.mcde=0 until it
completes a return to user mode or is rescheduled on a CPU where
PSTATE.mcde is set.  If the thread accesses ADI-enabled user memory with
a versioned address (e.g. to complete some I/O) in that timeframe then
the access will fail.  To fix the problem, patch rtrap to set
PSTATE.mcde when interrupts are enabled before handling the work.

Orabug: 25853545
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvnet: restrict advertized checksum offloads to just IP

As much as we'd like to play well with others, we really aren't
handling the checksums on non-IP protocol packets very well. This
is easily seen when trying to do TCP over ipv6 - the checksums are
garbage.

Here we restrict the checksum feature flag to just IP traffic so
that we aren't given work we can't yet do.

Orabug: 26175391, 26259755

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from commit 7e9191c54a36c864b901ea8ce56dc42f10c2f5ae)
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc-config: Enable timestamp in dmesg output.

Orabug: 24760107

Currently timestamp is not enabled in dmesg output. Timestamp
helps a lot in debugging timing related kernel issues.

Enable it by default.

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

arch/sparc: Avoid DCTI Couples

Avoid un-intended DCTI Couples. Use of DCTI couples is deprecated per
Oracle SPARC Architecture notes below(Section 6.3.4.7 - DCTI Couples).

"A delayed control transfer instruction (DCTI) in the delay slot of another
DCTI is referred to as a DCTI couple. The use of DCTI couples is deprecated
in the Oracle SPARC Architecture; no new software should place a DCTI in
the delay slot of another DCTI, because on future Oracle SPARC Architecture
implementations DCTI couples may execute either slowly or differently than
the programmer assumes it will."

Orabug: 25456049

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

net/rds: Fix minor linker warnings

Fixes: fcdaab66 {IB/{core,ipoib},net/{mlx4,rds}}: Mark unload_allowed
as __initdata variable

Seeing this warning while building the kernel. Fix it.
MODPOST 1555 modules
WARNING: net/rds/rds_rdma.o(.text+0x1d8): Section mismatch in
reference from the function rds_rdma_init() to the variable
.init.data:unload_allowed
The function rds_rdma_init() references
the variable __initdata unload_allowed.
This is often because rds_rdma_init lacks a __initdata
annotation or the annotation of unload_allowed is wrong.

Orabug: 25393132

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

drivers/usb: Skip auto handoff for TI and RENESAS usb controllers

I have never seen auto handoff working on TI and RENESAS xhci
cards. Eventually, we force handoff. This code forces the handoff
unconditionally. It saves 5 seconds boot time for each card.
Added the vendor/device id checks for the card which I have tested.

Orabug: 22345396

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Karl Volz <karl.volz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

usb/core: Added devspec sysfs entry for devices behind the usb hub

Grub finds incorrect of_node path for devices behind usb hub.
Added devspec sysfs entry for devices behind usb hub so that
right of_node path is returned during grub sysfs walk for these
devices.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 51fa91475e431e75b802dbab7c056f2829dc9410)

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Orabug: 24785721
Signed-off-by: Allen Pais <allen.pais@oracle.com>

USB: core: let USB device know device node

Although most of USB devices are hot-plug's, there are still some devices
are hard wired on the board, eg, for HSIC and SSIC interface USB devices.
If these kinds of USB devices are multiple functions, and they can supply
other interfaces like i2c, gpios for other devices, we may need to
describe these at device tree.

In this commit, it uses "reg" in dts as physical port number to match
the phyiscal port number decided by USB core, if they are the same,
then the device node is for the device we are creating for USB core.

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 69bec725985324e79b1c47ea287815ac4ddb0521)

Conflicts:

include/linux/usb/of.h

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Orabug: 24785721
Signed-off-by: Allen Pais <allen.pais@oracle.com>

Improves clear_huge_page() using work queues

The idea is to exploit the parallelism available on large
multicore systems such as SPARC T7 systems to clear huge pages
in parallel with multiple worker threads.

Orabug: 25130263

Reviewed-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Kishore Pusukuri <kishore.kumar.pusukuri@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

bnxt: add dma mapping attributes

On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING
attribute in our dma mapping in order to get the expected performance
out of the receive path. Setting this boosts a simple iperf receive
session from 2 Gbe to 23.4 Gbe.

Orabug: 25830685

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

dma-mapping: add interfaces for mapping pages with attributes

Fix up the map and unmap page interfaces to make attributes
available.

Orabug: 25830685

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
(backported from commit 0495c3d367944e4af053983ff3cdf256b567b053)
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

IB/IPoIB: ibX: failed to create mcg debug file

When udev renames the netdev devices, ipoib debugfs entries does not
get renamed. As a result, if subsequent probe of ipoib device reuse the
name then creating a debugfs entry for the new device would fail.

Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part
of ipoib event handling in order to avoid any race condition between these.

Fixes: 1732b0ef3b3a ([IPoIB] add path record information in debugfs)
Cc: stable@vger.kernel.org # 2.6.15+
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
commit 771a52584096c45e4565e8aabb596eece9d73d61)

Orabug: 24711873

Reviewed-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

ftrace: remove unnecessary __maybe_unused from waitfd() parameters

__maybe_unused is not required for unused function parameters. In
fact, using __maybe_unused confuses the ftrace parser. Thus, this
change removes these superfluous descriptors.

Orabug: 24327424

Reviewed-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
(cherry picked from commit 2f772dbf28c6bd410ca0855fc3260331030b6d9a)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

mm: fix new crash in unmapped_area_topdown()

Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of
mmap testing.  That's the VM_BUG_ON(gap_end < gap_start) at the
end of unmapped_area_topdown().  Linus points out how MAP_FIXED
(which does not have to respect our stack guard gap intentions)
could result in gap_end below gap_start there.  Fix that, and
the similar case in its alternative, unmapped_area().

Cc: stable@vger.kernel.org
Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f4cb767d76cf7ee72f97dd76f6cfa6c76a5edc89)

Orabug: 26161422
CVE: CVE-2017-1000364
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>

mm: larger stack guard gap, between vmas

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1be7107fbe18eed3e319a6c3e83c78254b693acb)

Orabug: 26161422
CVE: CVE-2017-1000364
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
arch/powerpc/mm/hugetlbpage-radix.c
arch/powerpc/mm/mmap.c
arch/s390/mm/mmap.c
include/linux/mm.h
mm/gup.c
mm/memory.c
mm/mmap.c

net/rds: prioritize the base connection establishment

As of today, all the TOS connections can only be established after their
base connections are up. This is due to the fact that TOS connections rely
on their base connections to perform route resolution. Nevertheless, when
all the connections drop/reconnect(e.g., ADDR_CHANGE event), the TOS
connections establishment consume the CPU resources by constantly retrying
the connection establishment until their base connections are up.

Thus, this patch delays all the TOS connections if their associated base
connections are not up. By doing so, the priority is given to the base
connections establishment. Consequently, the base connections can be
established faster and subsequent their associated TOS connections.

Orabug: 25521901

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
Tested-by: Rosa Isela Lopez Romero <rosa.lopez@oracle.com>

net/rds: determine active/passive connection with IP addresses

This patch changes RDS to use randomize backoff only in the first attempt
to reconnect. This means both ends try to be active by sending out REQ to
its peer in random t seconds. If the connection can't be established due to
a race, the peer IP addresses comparison is used to determine
active/passive connection establishment. (e.g IP_A > IP_B)

The following description illustrates the connection establishment,

t1randA: 192.168.1.A (active)  --------------> 192.168.1.B (passive)
t1randB: 192.168.1.A (passive) <-------------  192.168.1.B (active)
t2     : 192.168.1.A (active) ---------------> REJ
t3     : 192.168.1.B (active) ---------------> REJ
t4     : Connection between A,B is not up.
t5     : 192.168.1.A (active) --------------> 192.168.1.B (passive)

Orabug: 25521901

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Suggested-by : Håkon Bugge <haakon.bugge@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
Tested-by: Rosa Isela Lopez Romero <rosa.lopez@oracle.com>

net/rds: use different workqueue for base_conn

RDS uses rds_wq for various operations. During ADDR_CHANGE event,
each connection queues at least two tasks into this single-threaded
workqueue. Furthermore, the TOS connections have dependency on its
base connection. Thus, a separate workqueue is created specifically
for the base connections to speed up base connection establishment.

Orabug: 25521901

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
Tested-by: Rosa Isela Lopez Romero <rosa.lopez@oracle.com>

net/rds: Revert "RDS: add reconnect retry scheme for stalled connections"

This reverts commit 5acb959ad59966b0b6905802ed720d26c560c3c5.

Commit "RDS: add reconnect retry scheme for stalled connections" introduces
the rds_reconnect_timeout to retry the connection establishment after
sysctl_reconnect_retry_ms (default is 1000ms). Nevertheless, this proactive
mechanism is overkilled and it is causing long brownout time in the
virtualized environment. In short, below are the justifications why commit
5acb959ad59966b0b6905802ed720d26c560c3c5 is reverted.

a) The retry counter starts ticking after RDS received an ADDR_CHANGE
event. After receiving an ADDR_CHANGE event, RDS needs to perform shutdown
via shutdown_worker. Then, initiate a new connection via connect_worker.
Eventually, a CM REQ is only sent out after rds received
RDMA_CM_EVENT_ADDR_RESOLVED and RDMA_CM_EVENT_ROUTE_RESOLVED events. If the
retry is made to cater for stalled connection due to missing CM messages,
the retry should only happen after a CM REQ is sent. With the current
retry scheme (and with the default 1000 ms) that happens after ADDR_CHANGE
event, it introduces congestion in the single threaded workqueue.

b) Assuming that we modify the retry counter to start ticking after a CM
REQ message is sent out. By introducing another retry timeout, it
complicates the system tuning. Why? First, the sysctl_reconnect_retry_ms
relies on the underlying cma_response_timeout. Any modication of
cma_response_timeout requires to tune sysctl_reconnect_retry_ms. Second, it
is hard to find a universal timing that fits all configurations
(bare-metal, virtualized, mixed environment, and homo/heterogenous system).

Orabug: 25521901

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
Tested-by: Rosa Isela Lopez Romero <rosa.lopez@oracle.com>
Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

IB/mlx4: Fix CM REQ retries in paravirt mode

CM REQs cannot be successfully retried, because a new pv_cm_id is
created for each request, without checking if one already exists.

This commit fixes this, by checking if an id exists before creating
one.

This bug can be provoked by running an RDMA CM user-land application,
but inserting a five seconds delay before the rdma_accept() call on
the passive side. This delay is larger than the default CMA timeout,
and triggers a retry from the active side. The retried REQ will use
another pv_cm_id (the cm_id on the wire). This confuses the CM
protocol and two REJs are sent from the passive side.

This commit is required to achieve the reduced HA Brownout time,
needed by Exadata. The Brownout issue is tracked by orabug 25521901.

Orabug: 26287667

Suggested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reported-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

uek-rpm: enable CONFIG_TCM_USER2 for ol6 and ol7

Enables userspace passthrough driver (tcmu) for target core.

Orabug: 25395066

Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case

For the bidirectional case, the Data-Out buffer blocks will always at
the head of the tcmu_cmd's bitmap, and before gathering the Data-In
buffer, first of all it should skip the Data-Out ones, or the device
supporting BIDI commands won't work.

Fixed: 26418649eead ("target/user: Introduce data_bitmap, replace
data_length/data_head/data_tail")
Reported-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Cc: stable@vger.kernel.org # 4.6+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit a5d68ba85801a78c892a0eb8efb711e293ed314b)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

tcmu: Fix wrongly calculating of the base_command_size

The t_data_nents and t_bidi_data_nents are the numbers of the
segments, but it couldn't be sure the block size equals to size
of the segment.

For the worst case, all the blocks are discontiguous and there
will need the same number of iovecs, that's to say: blocks == iovs.
So here just set the number of iovs to block count needed by tcmu
cmd.

Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit abe342a5b4b5aa579f6bf40ba73447c699e6b579)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

tcmu: Fix possible overwrite of t_data_sg's last iov[]

If there has BIDI data, its first iov[] will overwrite the last
iov[] for se_cmd->t_data_sg.

To fix this, we can just increase the iov pointer, but this may
introuduce a new memory leakage bug: If the se_cmd->data_length
and se_cmd->t_bidi_data_sg->length are all not aligned up to the
DATA_BLOCK_SIZE, the actual length needed maybe larger than just
sum of them.

So, this could be avoided by rounding all the data lengthes up
to DATA_BLOCK_SIZE.

Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit ab22d2604c86ceb01bb2725c9860b88a7dd383bb)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

tcmu: make cmd timeout configurable

A single daemon could implement multiple types of devices
using multuple types of real devices that may not support
restarting from crashes and/or handling tcmu timeouts. This
makes the cmd timeout configurable, so handlers that do not
support it can turn if off for now.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit af980e46a26ac8805685bb70c8572dbc47abb126)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

tcmu: add helper to check if dev was configured

This adds a helper to check if the dev was configured. It
will be used in the next patch to prevent updates to some
config settings after the device has been setup.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 972c7f167974fa41ea8a2eed4b857cc59f59c42c)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Conflicts:
drivers/target/target_core_user.c

tcmu: return on first Opt parse failure

We only were returing failure if the last opt to be parsed failed.
This has a return failure when we first detect a failure.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 2579325ca0acc598fdf41ba12b2871d3467f28df)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

tcmu: allow hw_max_sectors greater than 128

tcmu hard codes the hw_max_sectors to 128 which is a litle small.
Userspace uses the max_sectors to report the optimal IO size and
some initiators perform better with larger IOs (open-iscsi seems
to do better with 256 to 512 depending on the test).

(Fix do not display hw max sectors twice - MNC)

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 3abaa2bfdb1e6bb33d38a2e82cf3bb82ec0197bf)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Fix use-after-free of tcmu_cmds if they are expired

Don't free the cmd in tcmu_check_expired_cmd, it's still referenced by
an entry in our cmd_id->cmd idr. If userspace ever resumes processing,
tcmu_handle_completions() will use the now-invalid cmd pointer.

Instead, don't free cmd. It will be freed by tcmu_handle_completion() if
userspace ever recovers, or tcmu_free_device if not.

Cc: stable@vger.kernel.org
Reported-by: Bryant G Ly <bgly@us.ibm.com>
Tested-by: Bryant G Ly <bgly@us.ibm.com>
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Orabug: 25395066
(cherry picked from commit d0905ca757bc40bd1ebc261a448a521b064777d7)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Add an #include directive

Since this driver uses kmap_atomic(), include the highmem header file.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Andy Grover <agrover@redhat.com>
Orabug: 25395066
(cherry picked from commit f5045724578babc7bd3460087f34cc787a8b0e20)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Fix a data type in tcmu_queue_cmd()

This patch avoids that sparse reports the following error messages:

drivers/target/target_core_user.c:547:13: warning: incorrect type in assignment (different base types)
drivers/target/target_core_user.c:547:13:    expected int [signed] ret
drivers/target/target_core_user.c:547:13:    got restricted sense_reason_t
drivers/target/target_core_user.c:548:20: warning: restricted sense_reason_t degrades to integer
drivers/target/target_core_user.c:557:16: warning: incorrect type in return expression (different base types)
drivers/target/target_core_user.c:557:16:    expected restricted sense_reason_t
drivers/target/target_core_user.c:557:16:    got int [signed] ret

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Orabug: 25395066
(cherry picked from commit ecaf597b411e9a7b071bf7a36a4cf750c529cd28)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Fix comments to not refer to data ring

We no longer use a ringbuffer for the data area, so this might cause
confusion. Just call it the data area.

Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 3d9b95558f5874ac5d63a057813dc66b480de7e1)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Return an error if cmd data size is too large

Userspace should be implementing VPD B0 (Block Limits) to inform the
initiator of max data size, but just in case we do get a too-large request,
do what the spec says and return INVALID_CDB_FIELD.

Make sure to unlock udev->cmdr_lock before returning.

Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 554617b2bbe25c3fb3c80c28fe7a465884bb40b1)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Use sense_reason_t in tcmu_queue_cmd_ring

Instead of using -ERROR-style returns, use sense_reason_t. This lets us
remove tcmu_pass_op(), and return more correct sense values.

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 02eb924fabc5b699c0d9d354491e6f0767e3c139)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Report capability of handling out-of-order completions to userspace

TCMU_MAILBOX_FLAG_CAP_OOOC was introduced, and userspace can check the flag
for out-of-order completion capability support.

Also update the document on how to use the feature.

Signed-off-by: Sheng Yang <sheng@yasker.org>
Reviewed-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 32c76de3466ed2a875e36c140ac4e3800fdfab6e)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Fix size_t format-spec build warning

Fix the following printk size_t warning as per 0-day build:

All warnings (new ones prefixed by >>):

   drivers/target/target_core_user.c: In function 'is_ring_space_avail':
>> drivers/target/target_core_user.c:385:12: warning: format '%lu'
>> expects argument of type 'long unsigned int', but argument 3 has type
>> 'size_t {aka unsigned int}' [-Wformat=]
      pr_debug("no data space: only %lu available, but ask for %lu\n",
               ^

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 0241fd39ce7bc9b82b7e57305cb0d6bb1364d45b)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Don't free expired command when time out

Which would result in NPE after when userspace connected again.

Expired command would be freed either when handling command(by userspace),
or when device was tearing down

Reviewed-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Sheng Yang <sheng@yasker.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit b25c786399367b9a8bd955d8496669d019409bec)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Introduce data_bitmap, replace data_length/data_head/data_tail

The data_bitmap was introduced to support asynchornization accessing of
data area.

We divide mailbox data area into blocks, and use data_bitmap to track the
usage of data area. All the new command's data would start with a new block,
and may left unusable space after it end. But it's easy to track using
data_bitmap.

Now we can allocate data area for asynchronization accessing from userspace,
since we can track the allocation using data_bitmap. The userspace part would
be the same as Maxim's previous asynchronized implementation.

Signed-off-by: Sheng Yang <sheng@yasker.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 26418649eead52619d8dd6cbc6760a1b144dbcd2)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Free data ring in unified function

Prepare for data_bitmap in the next patch.

Reviewed-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Sheng Yang <sheng@yasker.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 0c28481ffb4683ef21c6664d15dbd5ae5a6cd027)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Use iovec[] to describe continuous area

We don't need use one iovec per scatter-gather list entry, since data
area are continuous.

Reviewed-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Sheng Yang <sheng@yasker.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit f1dbd087cc7a28c6c174cb28cf98c19f4efb1fba)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Fix cast from pointer to phys_addr_t

The uio_mem structure has a member that is a phys_addr_t, but can
be a number of other types too. The target core driver attempts
to assign a pointer from vmalloc() to it, by casting it to
phys_addr_t, but that causes a warning when phys_addr_t is longer
than a pointer:

drivers/target/target_core_user.c: In function 'tcmu_configure_device':
drivers/target/target_core_user.c:906:22: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]

This adds another cast to uintptr_t to shut up the warning.
A nicer fix might be to have additional fields in uio_mem
for the different purposes, so we can assign a pointer directly.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 0633e123465b61a12a262b742bebf2a9945f7964)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Make sure netlink would reach all network namespaces

The current code only allow netlink to reach the initial network namespace,
which caused trouble for any client running inside container.

This patch would make sure TCMU netlink would work for all network
namespaces.

Signed-off-by: Sheng Yang <sheng@yasker.org>
Acked-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 20c08b362f4b0c41103fe9d75c61ca348d021441)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Allow user to set block size before enabling device

The capability of setting hw_block_size was added along with 9c1cd1b68
"target/user: Only support full command pass-through", though default
setting override the user specified value during the enabling of device,
which called by target_configure_device() to set block_size matching
hw_block_size, result in user not able to set different block size other
than default 512.

This patch would use existing hw_block_size value if already set, otherwise
it would be set to default value(512).

Update: Fix the coding style issue.

(Drop unnecessary re-export of dev->dev_attrib.block_size - nab)

Signed-off-by: Sheng Yang <sheng@yasker.org>
Cc: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 81ee28de860095cc0c063b92eea53cb97771f796)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Do not set unused fields in tcmu_ops

TCMU sets TRANSPORT_FLAG_PASSTHROUGH, so INQUIRY commands will not be
emulated by LIO but passed up to userspace. Therefore TCMU should not
set these, just like pscsi doesn't.

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 6ba4bd297d99ad522a6414001e6837ddaa8753fd)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Fix time calc in expired cmd processing

Reversed arguments meant that we were doing nothing for cmds whose deadline
had passed.

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 611e2267b68fc061aea86345b3a8b87151395187)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target: use stringify.h instead of own definition

Signed-off-by: David Disseldorp <ddiss@suse.de>
Acked-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit ac64a2ce509104a746321a4f9646b6750cf281eb)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Fix UFLAG_UNKNOWN_OP handling

Calling transport_generic_request_failure() from here causes list
corruption. We should be using target_complete_cmd() instead.

Which we do in all other cases, so the UNKNOWN_OP case can become just
another member of the big else/if chain in tcmu_handle_completion().

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit ed97d0cd78a337450e17eb613bdeec15e729af46)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Remove unused variable

We don't use it any more.

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 4824640ec3fc84337cb2baa9fb780e95864feb88)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Fix inconsistent kmap_atomic/kunmap_atomic

Pointers that are mapped by kmap_atomic() + offset must
be unmapped without the offset. That would cause problems
if the SG element length exceeds the PAGE_SIZE limit.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit e2e21bd8f979a24462070cc89fae11e819cae90a)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Add support for bidirectional commands

Enable TCMU to handle bidirectional SCSI commands. In such cases,
entries in iov[] cover both the Data-In and the Data-Out buffers. The
first iov_cnt entries correspond to the Data-Out buffer, while the
remaining iov_bidi_cnt entries correspond to the Data-In buffer.

Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Vangelis Koukis <vkoukis@arrikto.com>
Reviewed-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit e4648b014e03baee45d5f5146c1219b19e4e5f2f)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

target/user: Refactor data area allocation code

Introduce alloc_and_scatter_data_area()/gather_and_free_data_area()
functions that allocate/deallocate space from the data area and copy
data to/from a given scatter-gather list. These functions are needed so
the next patch, introducing support for bidirectional commands in TCMU,
can use the same code path both for t_data_sg and for t_bidi_data_sg.

Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Vangelis Koukis <vkoukis@arrikto.com>
Reviewed-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit f97ec7db1606875666366bfcba8476f8c917db96)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

driver/user: Don't warn for DMA_NONE data direction

Some SCSI commands (for example the TEST UNIT READY command) do not
carry data and so data_direction is DMA_NONE. Patch TCMU to not print a
warning message about unknown data direction, when it is DMA_NONE.

Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Vangelis Koukis <vkoukis@arrikto.com>
Reviewed-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 2bc396a2529ae8a2287f17a49d893ce790e19110)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>

uek-config: disable CONFIG_MOUSE_PS2_VMMOUSE for ol6

Orabug: 26264650

Because ol6 doesn't support secure boot and enabling this item
also causes mouse malfuntion while running ol6 as VMware guest.
So disable it in ol6 configuration

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: missing rtnl_unlock in igb_sriov_reinit()

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f468adc944ef40f23cacdd898e8cfbb5ba4b75a4)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: bump version to igb-5.4.0

Bump igb version to match other igb drivers.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0742337c1984c7cdbac9465cdda004cbdc69d41c)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igbvf: bump version to igbvf-2.4.0

Bump version to match other igbvf drivers.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3d05fd0a99a4a8b49d486116c01c7ac089b64670)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: fix non static symbol warning

Fixes the following sparse warning:

drivers/net/ethernet/intel/igb/igb_ethtool.c:2707:5: warning:
symbol 'igb_rxnfc_write_vlan_prio_filter' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7a823471ad42cba6c3b658494d8437ca5c166292)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: fix error code in igb_add_ethtool_nfc_entry()

Use error "rmgr: Cannot insert RX class rule: Operation not supported" is
more meaningful than "rmgr: Cannot insert RX class rule: Unknown error 524"

Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 54be81328c07324a88cd5dd86b2e7422db3e768e)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: support RX flow classification by VLAN priority

This patch is meant to allow for RX network flow classification to insert
and remove VLAN priority filter by ethtool

Example:
Add an VLAN priority filter:
$ ethtool -N eth0 flow-type ether vlan 0x6000 vlan-mask 0x1FFF action 2 loc 1

Show all filters:
$ ethtool -n eth0
4 RX rings available
Total 1 rules

Filter: 1
Flow Type: Raw Ethernet
Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
Ethertype: 0x0 mask: 0xFFFF
VLAN EtherType: 0x0 mask: 0xffff
VLAN: 0x6000 mask: 0x1fff
User-defined: 0x0 mask: 0xffffffffffffffff
Action: Direct to queue 2

Delete the filter by location:
$ ethtool -N delete 1

Signed-off-by: Ruhao Gao <ruhao.gao@ni.com>
Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7a277a963bf394d80f8998a7143a208e0891f6e7)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: support RX flow classification by ethertype

This patch is meant to allow for RX network flow classification to insert
and remove ethertype filter by ethtool

Example:
Add an ethertype filter:
$ ethtool -N eth0 flow-type ether proto 0x88F8 action 2

Show all filters:
$ ethtool -n eth0
4 RX rings available
Total 1 rules

Filter: 15
Flow Type: Raw Ethernet
Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
Ethertype: 0x88F8 mask: 0x0
Action: Direct to queue 2

Delete the filter by location:
$ ethtool -N delete 15

Signed-off-by: Ruhao Gao <ruhao.gao@ni.com>
Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 64c75d41ace516b7e4f0f187f91282aa43a51b38)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: add support of RX network flow classification

This patch is meant to allow for RX network flow classification to insert
and remove Rx filter by ethtool. Ethtool interface has it's own rules
manager

Show all filters:
$ ethtool -n eth0
4 RX rings available
Total 2 rules

Signed-off-by: Ruhao Gao <ruhao.gao@ni.com>
Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0e71def252815d732f86d11d87d63f7186d9d3be)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: fix adjusting PTP timestamps for Tx/Rx latency

Fix PHY delay compensation math in igb_ptp_tx_hwtstamp() and
igb_ptp_rx_rgtstamp. Add PHY delay compensation in
igb_ptp_rx_pktstamp().

In the IGB driver, there are two functions that retrieve timestamps
received by the PHY - igb_ptp_rx_rgtstamp() and igb_ptp_rx_pktstamp().
The previous commit only changed igb_ptp_rx_rgtstamp(), and the change
was incorrect.

There are two instances in which PHY delay compensations should be
made:

- Before the packet transmission over the PHY, the latency between
  when the packet is timestamped and transmission of the packets,
  should be an add operation, but it is currently a subtract.

- After the packets are received from the PHY, the latency between
  the receiving and timestamping of the packets should be a subtract
  operation, but it is currently an add.

Signed-off-by: Kshitiz Gupta <kshitiz.gupta@ni.com>
Fixes: 3f544d2 (igb: adjust ptp timestamps for tx/rx latency)
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0066c8b6f4050d7c57f6379d6fd4535e2f267f17)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Only DMA sync frame length

On some platforms, syncing a buffer for DMA is expensive. Rather than
sync the whole 2K receive buffer, only synchronise the length of the
frame, which will typically be the MTU, or a much smaller TCP ACK.

For an IMX6Q, this gives around 6% increased TCP receive performance,
which is cache operations bound and reduces CPU load for TCP transmit.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 64f2525ca4e76b1704b867458808ed6ffc58b803)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: call igb_ptp_suspend during suspend/resume cycle

Properly stop the extra workqueue items and ensure that we resume
cleanly. This is better than using igb_ptp_init and igb_ptp_stop since
these functions destroy the PHC device, which will cause other problems
if we do so. Since igb_ptp_reset now re-schedules the work-queue item we
don't need an equivalent igb_ptp_resume in the resume workflow.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8646f7b4cdf2d0557e718c4524a3e31455b92ad7)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: implement igb_ptp_suspend

Make igb_ptp_stop take advantage of this new function to reduce code
duplication.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit e3f2350de829eb0c3349f416feed81c0a3ac0732)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: re-use igb_ptp_reset in igb_ptp_init

Modify igb_ptp_init to take advantage of igb_ptp_reset, and remove
duplicated work that was occurring in both igb_ptp_reset and
igb_ptp_init.

In total, resetting the TSAUXC register, and resetting the system time
both happen in igb_ptp_reset already. igb_ptp_reset now also takes care
of starting the delayed work item for overflow checks, as well.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 4f3ce71bb8d9c7591f25d3d8315249a250d208f0)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: introduce IGB_PTP_OVERFLOW_CHECK flag

Don't continue to use complex MAC type checks for handling various cases
where we have overflow check code. Make this code more obvious by
introducing a flag which is enabled for hardware that needs these
checks.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 63737166a056fdcc83be46175ad4f63538279d7c)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: introduce ptp_flags variable and use it to replace IGB_FLAG_PTP

Upcoming patches will introduce new PTP specific flags. To avoid
cluttering the normal flags variable, introduce PTP specific "ptp_flags"
variable for this purpose, and move IGB_FLAG_PTP to become
IGB_PTP_ENABLED.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 462f11888207934bbecd0f154f6cf0bedac5ecd0)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igbvf: use BIT() macro instead of shifts

To prevent signed bitshift issues, and improve code readability, use the
BIT() macro. Also make use of GENMASK or the unsigned postfix where this
is more appropriate than BIT()

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0ed2dbf4f469e2e286d903ebc091edfd9be4d063)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igbvf: remove unused variable and dead code

The variable rdlen is set but never used, and thus setting it is dead
code. Remove it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 12b28b41084aa61970fecb417c66c88dcce6afed)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: adjust PTP timestamps for Tx/Rx latency

Table 7-62 on page 338 of the i210 datasheet lists TX and RX latencies
for the various speeds the chip supports. To give better PTP timestamp
accuracy, adjust the timestamps by the amounts Intel gives based on
current link speed.

Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3f544d2a4d5c2d817cfee9e6302fc2909aaef155)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: make igb_update_pf_vlvf static

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8008f68cb805c0099723ba1b58d26257a52ce890)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: use BIT() macro or unsigned prefix

For bitshifts, we should make use of the BIT macro when possible, and
ensure that other bitshifts are marked as unsigned. This helps prevent
signed bitshift errors, and ensures similar style.

Make use of GENMASK and the unsigned postfix where BIT() isn't
appropriate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a51d8c217b15b97fede844dd6860f7b3c6ffcfef)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

Revert "igb: Fix a deadlock in igb_sriov_reinit"

This reverts commit 3eb14ea8d958 ("igb: Fix a deadlock in
igb_sriov_reinit")
It is the same as commit f468adc944ef ("igb: missing rtnl_unlock in
igb_sriov_reinit()")
There is no rtnl_lock() in igb_resume before, rtnl_unlock will cause a
deadlock.

Signed-off-by: Arika Chen <arika.chen@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d99e366fc90c9b6e6197584ecd3a185441452b0c)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Garbled output for "ethtool -m"

Garbled output for "ethtool -m ethX", in igb-driven NICs with module /
plugin EEPROM (i.e. SFP information). Each output data byte appears
duplicated.

In igb_ethtool.c, igb_get_module_eeprom() is reading the EEPROM via i2c;
the eeprom offset for each word that's read via igb_read_phy_reg_i2c()
was passed in #words, whereas it needs to be a byte offset.
This patches fixes the bug.

Signed-off-by: Doron Shikmoni <doron.shikmoni@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit efea95d45e6ab4a30df9801f8e9bf68007ee9b43)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: allow setting MAC address on i211 using a device tree blob

The Intel i211 LOM PCIe Ethernet controllers' iNVM operates as an OTP
and has no external EEPROM interface [1]. The following allows the
driver to pickup the MAC address from a device tree blob when CONFIG_OF
has been enabled.

[1]
http://www.intel.com/content/www/us/en/embedded/products/networking/i211-ethernet-controller-datasheet.html

Signed-off-by: John Holland <jotihojr@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 806ffb1d504927d1449397377eac63bb63489266)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Fix sparse warning about passing __beXX into leXX_to_cpup

We were casting the addr as __beXX and then passing it into le32_to_cpu
because the device expects the MAC address to be in network order even
though the register set is little endian. Instead of casting it as __beXX
we can just cast it as __leXX in order to maintain consistency since the
region of memory is already in little endian order as far as we are
concerned.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 415cd2a645b2573f173cc52419049f9caacf9a47)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: call ndo_stop() instead of dev_close() when running offline selftest

Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 46eafa59e18d034ba616fdcca688c388d0bbfd91)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Fix VLAN tag stripping on Intel i350

Problem: When switching off VLAN offloading on an i350, the VLAN
interface gets unusable.  For testing, set up a VLAN on an i350
and some remote machine, e.g.:

  $ ip link add link eth0 name eth0.42 type vlan id 42
  $ ip addr add 192.168.42.1/24 dev eth0.42
  $ ip link set dev eth0.42 up

Offloading is switched on by default:

  $ ethtool -k eth0 | grep vlan-offload
  rx-vlan-offload: on
  tx-vlan-offload: on

  $ ping -c 3 -I eth0.42 192.168.42.2
  [...works as usual...]

Now switch off VLAN offloading and try again:

  $ ethtool -K eth0 rxvlan off
  Actual changes:
  rx-vlan-offload: off
  tx-vlan-offload: off [requested on]
  $ ping -c 3 -I eth0.42 192.168.42.2
  PING 192.168.42.2 (192.168.42.2) from 192.168.42.1 eth0.42: 56(84) bytes of da
ta.

  --- 192.168.42.2 ping statistics ---
  3 packets transmitted, 0 received, 100% packet loss, time 1999ms

I can only reproduce it on an i350, the above works fine on a 82580.

While inspecting the igb source, I came across the code in igb_set_vmolr
which sets the E1000_VMOLR_STRVLAN/E1000_DVMOLR_STRVLAN flags once and
for all, and in all of the igb code there's no other place where the
STRVLAN is set or cleared.  Thus, VLAN stripping is enabled in igb
unconditionally, independently of the offloading setting.

I compared that to the latest Intel igb-5.3.3.5 driver from
http://sourceforge.net/projects/e1000/ which in fact sets and clears the
STRVLAN flag independently from igb_set_vmolr in its own function
igb_set_vf_vlan_strip, depending on the vlan settings.

So I included the STRVLAN handling from the igb-5.3.3.5 driver into our
current igb driver and tested the above scenario again.  This time ping
still works after switching off VLAN offloading.

Tested on i350, with and without addtional VFs, as well as on 82580
successfully.

Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 030f9f52642a20cbd8c1334a237e92e3ef55e2b1)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igbvf: remove "link is Up" message when registering mcast address

A similar issue was addressed a few years ago in the following thread:

http://www.spinics.net/lists/netdev/msg245877.html

At that time there were concerns that removing this statement may cause other
side effects. However the submitter addressed those concerns. But the dialogue
went cold. We have a new case where a customers application is registering and
un-registering multicast addresses every few seconds. This is leading to many
"Link is Up" messages in the logs as a result of the
"netif_carrier_off(netdev)" statement called by igbvf_msix_other(). Also on
some kernels it is interfering with the bonding driver causing it to failover
and subsequently affecting connectivity.

The Sourgeforge driver does not make this call and is therefore not affected.
If there were any side effects I would expect that driver to also be affected.
I have tested re-loading the igbvf driver and downing the adapter with the PF
entity on the host where the VM has this patch. When I bring it back up again
connectivity is restored as expected. Therefore I request that this patch gets
submitted.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit cc54a59ae6e528c70666033ed085d059f555a57d)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igbvf: Add support for generic Tx checksums

This patch adds support for generic Tx checksums to the igbvf driver.  It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.

In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from:
skb_network_offset()
  IPLEN  (maximum of 511), retrieved from:
skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on:
skb->csum_offset

The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.

I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.  In
the case of the VF drivers this meant adding support for SCTP CRCs, and
inner checksum offloads for MPLS and various tunnel types.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ea6ce6024f9397ff2667fe16447447e622bc4c31)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/intel/igbvf/netdev.c

igb: Add support for generic Tx checksums

This patch adds support for generic Tx checksums to the igb driver.  It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.

In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from:
skb_network_offset()
  IPLEN  (maximum of 511), retrieved from:
skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on:
skb->csum_offset

The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.

I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6e033700887bf29d4e59f6978a02d989787be620)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/net/ethernet/intel/igb/igb_main.c

igb: rename igb define to be more generic

E1000_MRQC_ENABLE_RSS_4Q enables 4 and 8 queues depending on the part
so rename to be generic.

Similarly, E1000_MRQC_ENABLE_VMDQ_RSS_2Q has no numeric meaning so
rename to be more generic.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c883de9fd787b6f49bf825f3de3601aeb78a7114)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: add conditions for I210 to generate periodic clock output

In general case the maximum supported half cycle time of the synchronized
output clock is 70msec. Slower half cycle time than 70msec can be
programmed also as long as the output clock is synchronized to whole
seconds, useful specifically for generating a 1Hz clock.

Permitted values for the clock half cycle time are: 125,000,000 decimal,
250,000,000 decimal and 500,000,000 decimal (equals to 125msec, 250msec
and 500msec respectively).

Before this patch, only the half cycle time of less than or equal to 70msec
uses the I210 clock output function. This patch adds additional conditions
when half cycle time is equal to 125msec or 250msec or 500msec to use
clock output function.

Under other conditions, interrupt driven target time output events method
is still used to generate the desired clock output.

Signed-off-by: Roland Hii <roland.king.guan.hii@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 569f3b3d4e9898ae30788fde128e3277d996710e)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: enable WoL for OEM devices regardless of EEPROM setting

Override EEPROM settings for specific OEM devices.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5e350b9260a2e94a9dd1b20fb720d855d5bf1034)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: constify e1000_phy_operations structure

This e1000_phy_operations structure is never modified, so declare it as
const. Other structures of this type are already const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5b70e4a12a525b5f3d4a3e3f0567ed877195b187)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: When GbE link up, wait for Remote receiver status condition

I210 device IPv6 autoconf test sometimes fails,
because DAD NS for link-local is not transmitted.
This packet is silently dropped.
This problem is seen only GbE environment.

igb_watchdog_task link up detection continues to the following process.
The following cases are observed:
1.PHY 1000BASE-T Status Register Remote receiver status bit is NG.
(NG status becomes OK after about 200 - 700ms)
2.In this case, the transfer packet is silently dropped.

1000BASE-T Status register
[Expected]: 0x3800 or 0x7800
[problem occurred]: 0x2800 or 0x6800
Frequency of occurrence: approx 1/10 - 1/40 observed

In order to avoid this problem,
wait until 1000BASE-T Status register "Remote receiver status OK"

After applying this patch, at least 400 runs succeed with no problems.

Signed-off-by: Takuma Ueba <t.ueba11@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b72f3f72005dfd649d787535bd04ada3b3f1b3ba)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Add workaround for VLAN tag stripping on 82576

There was a workaround partially implemented for the 82576 that is needed
in order for VLAN tag stripping to function correctly.  The original code
had side effects that would make it so the workaround was active on all
MACs.  I have updated the code so that the workaround is enabled, but
limited to the 82576, or activated if we exceed the available unicast
addresses.

The workaround has a side effect of mirroring all of the traffic outgoing
from the VFs back to the PF.  As such it is not recommended to use the
82576 in promiscuous mode as it will take a performance hit, though this is
now consistent with the performance as seen on the out-of-tree igb driver.

I also limited the scope of the UTA bits all being set to only when the
VMOLR register is enabled.  This should limit the effects of the UTA
register so that we don't pick up any excess traffic unless promiscuous
mode has been enabled on the PF, whereas before the PF would have ended up
in something equivalent to unicast promiscuous mode with VLAN filtering
otherwise.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bf456abb9b82d5376e7189cca00b528dd86d1559)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Enable use of "bridge fdb add" to set unicast table entries

This change makes it so that we can use the bridge utility to add a FDB
entry for the PF to an igb port. By doing this we can enable the VFs to
talk to virtual ports residing on top of the PF.

In addition this should also address issues with MACVLANs trying to reside
on top of the PF as well as they would have had similar issues when added
to the PF with SR-IOV enabled.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 268f9d33a9319bb2d4d999e264aef9c00081bba0)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Drop unnecessary checks in transmit path

This patch drops several checks that we dropped from ixgbe some ago. It
should not be possible for us to be called with either of the conditional
statements returning true so we can just drop them from the hot-path.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9c2f186e45faa34d5f6ff52aa84c361d4be72288)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Add support for VLAN promiscuous with SR-IOV and NTUPLE

This change fixes things so that we can fully support SR-IOV or the
recently added NTUPLE filtering while allowing support for VLAN promiscuous
mode. By making this change we are able to support possible scenarios such
as SR-IOV with the PF connected to a Linux bridge hosting other VMs.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 16903caa339961b9f8a68b64f4f313789de48599)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Clean-up configuration of VF port VLANs

This patch is meant to clean-up the configuration of the VF port based VLAN
configuration. The original logic was a bit muddled and had some
undesirable side effects such as VLANs being either completely stripped
from the port or VLANs being left when they shouldn't be. The idea behind
this code is to avoid any events such as spurious spoof notifications when
we are removing one VLAN tag and replacing it with another.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a15d92598a5a741037b873fd4a43595a63048bbd)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Merge VLVF configuration into igb_vfta_set

This change makes it so that we can merge the configuration of the VLVF
registers into the setting of the VFTA register. By doing this we simplify
the logic and make use of similar functionality that we have already added
for ixgbe making it easier to maintain both drivers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8b77c6b20f32511175dfd00322ae82fb31949d55)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Always enable VLAN 0 even if 8021q is not loaded

This patch makes it so that we always add VLAN 0. This is important as we
need to guarantee the PF can receive untagged frames in the case of SR-IOV
being enabled but VLAN filtering not being enabled in the kernel.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5982a5565a08abf3b9ff18941b3e3cc94f7c8286)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Do not factor VLANs into RLPML calculation

The RLPML registers already take the size of VLAN headers into account when
determining the maximum packet length. This is called out in EAS documents
for several parts including the 82576 and the i350. As such we can drop
the addition of size to the value programmed into the RLPML registers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d3836f8e2517fb04328c673989fd780030926694)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Allow asymmetric configuration of MTU versus Rx frame size

Since the igb driver is using page based receive there is no point in
limiting the Rx capabilities of the device. The driver can receive 9K
jumbo frames at all times. The only changes needed due to MTU changes are
updates for the FIFO sizes and flow-control watermarks.

Update the maximum frame size to reflect the 9.5K limitation of the
hardware, and replace all instances of max_frame_size with
MAX_JUMBO_FRAME_SIZE when referring to an Rx FIFO or frame.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 45693bcb00cbd379c373ab22ccd9a9d4755cc7ed)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Refactor VFTA configuration

This patch starts the clean-up process on the VFTA configuration.
Specifically in this patch I attempt to address and simplify several items
while also updating the code to bring it more inline with what is already
in ixgbe.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 832e821c51e381966464c8a0f30f12eb1514eba0)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>