Jag Raman [Wed, 21 Jun 2017 15:23:50 +0000 (11:23 -0400)]
sparc64: sunvdc: skip vdisk response validation upon error
Skip validating the vdisk IO response from vdisk server, if IO
request has failed.
sunvdc checks if the size of the request processed by the
server matches with the size of request sent by vdc. This
is to ensure that partial IO completions are caught, since
it's not expected. In the case where the server reports an
error, it could set the size of IO processed to zero.
Therefore, validating the size of request processed in the
case of an error could mis-classify the problem.
Introduce DAX2 support in the driver. This involves negotiating the right
version with hypervisor as well as exposing a new INIT_V2 ioctl. This new
ioctl will return failure if DAX2 is not present on the system, otherwise
it will attempt to initialize the DAX2. A user should first call INIT_V2
and on failure call INIT_V1. See Documentation/sparc/dax.txt for more
detail.
Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com> Reviewed-by: Sanath Kumar <sanath.s.kumar@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Aldridge [Tue, 30 May 2017 14:59:02 +0000 (08:59 -0600)]
sparc64: Exclude perf user callchain during critical sections
This fixes another cause of random segfaults and bus errors that
may occur while running perf with the callgraph (-g) option.
Critical sections beginning with spin_lock_irqsave() raise the
interrupt level to PIL_NORMAL_MAX (14) and intentionally do not block
performance counter interrupts, which arrive at PIL_NMI (15). So perf
code must be very careful about what it does since it might execute in
the middle of one of these critical sections. In particular, the
perf_callchain_user() path is problematic because it accesses user
space and may cause TLB activity as well as faults as it unwinds the
user stack.
One particular critical section occurs in switch_mm:
If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active tsb for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.
Some potential solutions are:
1) Make critical sections run at PIL_NMI instead of PIL_NORMAL_MAX.
This would certainly eliminate the problem, but it would also prevent
perf from having any visibility into code running in these critical
sections, and it seems clear that PIL_NORMAL_MAX is used for just
this reason.
2) Protect this particular critical section by masking all interrupts,
either by setting %pil to PIL_NMI or by clearing pstate.ie around the
calls to load_secondary_context() and tsb_context_switch(). This approach
has a few drawbacks:
- It would only address this particular critical section, and would
have to be repeated in several other known places. There might be
other such critical sections that are not known.
- It has a performance cost which would be incurred at every context
switch, since it would require additional accesses to %pil or
%pstate.
- Turning off pstate.ie would require changing __tsb_context_switch(),
which expects to be called with pstate.ie on.
3) Avoid user space MMU activity entirely in perf_callchain_user() by
implementing a new copy_from_user() function that accesses the user
stack via physical addresses. This works, but requires quite a bit of
new code to get it to perform reasonably, ie, caching of translations,
etc.
4) Allow the perf interrupt to happen in existing critical sections as
it does now, but have perf code detect that this is happening, and
skip any user callchain processing. This approach was deemed best, as
the change is extremely localized and covers both known and unknown
instances of perf interrupting critical sections. Perf has interrupted
a critical section when %pil == PIL_NORMAL_MAX at the time of the perf
interrupt.
Ordinarily, a function that has a pt_regs passed in can simply examine
(reg->tstate & TSTATE_PIL) to extract the interrupt level that was in
effect at the time of the perf interrupt. However, when the perf
interrupt occurs while executing in the kernel, the pt_regs pointer is
replaced with task_pt_regs(current) in perf_callchain(), and so
perf_callchain_user() does not have access to the state of the machine
at the time of the perf interrupt. To work around this, we check
(regs->tstate & TSTATE_PIL) in perf_event_nmi_handler() before calling
in to the arch independent part of perf, and temporarily change the
event attributes so that user callchain processing is avoided. Though
a user stack sample is not collected, this loss is not statistically
significant. Kernel call graph collection is not affected.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Anthony Yznaga [Tue, 13 Jun 2017 20:47:06 +0000 (13:47 -0700)]
sparc64: rtrap must set PSTATE.mcde before handling outstanding user work
The kernel must execute with PSTATE.mcde=1 for ADI version checking to
be enabled when the kernel reads or writes user memory mapped with ADI
enabled using versioned addresses. If PSTATE.mcde=0 then the MMU
interprets version bits in an address as address bits, and an access
attempt results in a data access exception. Until now setting
PSTATE.mcde=1 in the kernel has been handled only by patching etrap to
ensure that is set on entry into the kernel. However, there are code
paths in rtrap that overwrite PSTATE and inadvertently clear PSTATE.mcde
before additional execution happens in the kernel.
rtrap is executed to exit the kernel and return to user mode execution.
Before restoring registers and returning to user mode, rtrap checks for
work to do. The check is done with interrupts disabled, and if there is
work to do, then interrupts are enabled before calling a function to
complete the work after which interrupts are disabled again and the
check is repeated. Interrupts are disabled and enabled by overwriting
PSTATE. Possible work includes (but is not limited to) preemption,
signal delivery, and writing out buffered user register windows to the
stack. All of these may lead to accessing user addresses. In the case
of preemption, a resumed thread will run with PSTATE.mcde=0 until it
completes a return to user mode or is rescheduled on a CPU where
PSTATE.mcde is set. If the thread accesses ADI-enabled user memory with
a versioned address (e.g. to complete some I/O) in that timeframe then
the access will fail. To fix the problem, patch rtrap to set
PSTATE.mcde when interrupts are enabled before handling the work.
Orabug: 25853545 Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Wed, 14 Jun 2017 22:43:37 +0000 (15:43 -0700)]
sunvnet: restrict advertized checksum offloads to just IP
As much as we'd like to play well with others, we really aren't
handling the checksums on non-IP protocol packets very well. This
is easily seen when trying to do TCP over ipv6 - the checksums are
garbage.
Here we restrict the checksum feature flag to just IP traffic so
that we aren't given work we can't yet do.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from commit 7e9191c54a36c864b901ea8ce56dc42f10c2f5ae) Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Allen Pais [Wed, 21 Jun 2017 09:48:25 +0000 (15:18 +0530)]
arch/sparc: Avoid DCTI Couples
Avoid un-intended DCTI Couples. Use of DCTI couples is deprecated per
Oracle SPARC Architecture notes below(Section 6.3.4.7 - DCTI Couples).
"A delayed control transfer instruction (DCTI) in the delay slot of another
DCTI is referred to as a DCTI couple. The use of DCTI couples is deprecated
in the Oracle SPARC Architecture; no new software should place a DCTI in
the delay slot of another DCTI, because on future Oracle SPARC Architecture
implementations DCTI couples may execute either slowly or differently than
the programmer assumes it will."
Babu Moger [Wed, 11 Jan 2017 00:13:02 +0000 (16:13 -0800)]
net/rds: Fix minor linker warnings
Fixes: fcdaab66 {IB/{core,ipoib},net/{mlx4,rds}}: Mark unload_allowed
as __initdata variable
Seeing this warning while building the kernel. Fix it.
MODPOST 1555 modules
WARNING: net/rds/rds_rdma.o(.text+0x1d8): Section mismatch in
reference from the function rds_rdma_init() to the variable
.init.data:unload_allowed
The function rds_rdma_init() references
the variable __initdata unload_allowed.
This is often because rds_rdma_init lacks a __initdata
annotation or the annotation of unload_allowed is wrong.
Babu Moger [Wed, 30 Mar 2016 20:28:41 +0000 (13:28 -0700)]
drivers/usb: Skip auto handoff for TI and RENESAS usb controllers
I have never seen auto handoff working on TI and RENESAS xhci
cards. Eventually, we force handoff. This code forces the handoff
unconditionally. It saves 5 seconds boot time for each card.
Added the vendor/device id checks for the card which I have tested.
Vijay Kumar [Thu, 6 Oct 2016 19:06:11 +0000 (12:06 -0700)]
usb/core: Added devspec sysfs entry for devices behind the usb hub
Grub finds incorrect of_node path for devices behind usb hub.
Added devspec sysfs entry for devices behind usb hub so that
right of_node path is returned during grub sysfs walk for these
devices.
Vijay Kumar [Fri, 28 Oct 2016 20:59:57 +0000 (13:59 -0700)]
USB: core: let USB device know device node
Although most of USB devices are hot-plug's, there are still some devices
are hard wired on the board, eg, for HSIC and SSIC interface USB devices.
If these kinds of USB devices are multiple functions, and they can supply
other interfaces like i2c, gpios for other devices, we may need to
describe these at device tree.
In this commit, it uses "reg" in dts as physical port number to match
the phyiscal port number decided by USB core, if they are the same,
then the device node is for the device we are creating for USB core.
Signed-off-by: Peter Chen <peter.chen@freescale.com> Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 69bec725985324e79b1c47ea287815ac4ddb0521)
Conflicts:
include/linux/usb/of.h
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com>
Orabug: 24785721 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Allen Pais [Wed, 21 Jun 2017 09:16:50 +0000 (14:46 +0530)]
Improves clear_huge_page() using work queues
The idea is to exploit the parallelism available on large
multicore systems such as SPARC T7 systems to clear huge pages
in parallel with multiple worker threads.
Reviewed-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Kishore Pusukuri <kishore.kumar.pusukuri@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Tue, 4 Apr 2017 18:41:35 +0000 (11:41 -0700)]
bnxt: add dma mapping attributes
On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING
attribute in our dma mapping in order to get the expected performance
out of the receive path. Setting this boosts a simple iperf receive
session from 2 Gbe to 23.4 Gbe.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
(backported from commit 0495c3d367944e4af053983ff3cdf256b567b053) Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shamir Rabinovitch [Wed, 29 Mar 2017 10:21:59 +0000 (06:21 -0400)]
IB/IPoIB: ibX: failed to create mcg debug file
When udev renames the netdev devices, ipoib debugfs entries does not
get renamed. As a result, if subsequent probe of ipoib device reuse the
name then creating a debugfs entry for the new device would fail.
Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part
of ipoib event handling in order to avoid any race condition between these.
Fixes: 1732b0ef3b3a ([IPoIB] add path record information in debugfs) Cc: stable@vger.kernel.org # 2.6.15+ Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
commit 771a52584096c45e4565e8aabb596eece9d73d61)
Reviewed-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Tom Hromatka [Tue, 16 Aug 2016 16:46:56 +0000 (09:46 -0700)]
ftrace: remove unnecessary __maybe_unused from waitfd() parameters
__maybe_unused is not required for unused function parameters. In
fact, using __maybe_unused confuses the ftrace parser. Thus, this
change removes these superfluous descriptors.
Reviewed-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
(cherry picked from commit 2f772dbf28c6bd410ca0855fc3260331030b6d9a) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Hugh Dickins [Tue, 20 Jun 2017 09:10:44 +0000 (02:10 -0700)]
mm: fix new crash in unmapped_area_topdown()
Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of
mmap testing. That's the VM_BUG_ON(gap_end < gap_start) at the
end of unmapped_area_topdown(). Linus points out how MAP_FIXED
(which does not have to respect our stack guard gap intentions)
could result in gap_end below gap_start there. Fix that, and
the similar case in its alternative, unmapped_area().
Cc: stable@vger.kernel.org Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas") Reported-by: Dave Jones <davej@codemonkey.org.uk> Debugged-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f4cb767d76cf7ee72f97dd76f6cfa6c76a5edc89)
Orabug: 26161422
CVE: CVE-2017-1000364 Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: John Haxby <john.haxby@oracle.com>
Hugh Dickins [Mon, 19 Jun 2017 11:03:24 +0000 (04:03 -0700)]
mm: larger stack guard gap, between vmas
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Wei Lin Guay [Mon, 15 May 2017 11:52:47 +0000 (13:52 +0200)]
net/rds: prioritize the base connection establishment
As of today, all the TOS connections can only be established after their
base connections are up. This is due to the fact that TOS connections rely
on their base connections to perform route resolution. Nevertheless, when
all the connections drop/reconnect(e.g., ADDR_CHANGE event), the TOS
connections establishment consume the CPU resources by constantly retrying
the connection establishment until their base connections are up.
Thus, this patch delays all the TOS connections if their associated base
connections are not up. By doing so, the priority is given to the base
connections establishment. Consequently, the base connections can be
established faster and subsequent their associated TOS connections.
Wei Lin Guay [Mon, 15 May 2017 11:42:56 +0000 (13:42 +0200)]
net/rds: determine active/passive connection with IP addresses
This patch changes RDS to use randomize backoff only in the first attempt
to reconnect. This means both ends try to be active by sending out REQ to
its peer in random t seconds. If the connection can't be established due to
a race, the peer IP addresses comparison is used to determine
active/passive connection establishment. (e.g IP_A > IP_B)
The following description illustrates the connection establishment,
t1randA: 192.168.1.A (active) --------------> 192.168.1.B (passive)
t1randB: 192.168.1.A (passive) <------------- 192.168.1.B (active)
t2 : 192.168.1.A (active) ---------------> REJ
t3 : 192.168.1.B (active) ---------------> REJ
t4 : Connection between A,B is not up.
t5 : 192.168.1.A (active) --------------> 192.168.1.B (passive)
RDS uses rds_wq for various operations. During ADDR_CHANGE event,
each connection queues at least two tasks into this single-threaded
workqueue. Furthermore, the TOS connections have dependency on its
base connection. Thus, a separate workqueue is created specifically
for the base connections to speed up base connection establishment.
Commit "RDS: add reconnect retry scheme for stalled connections" introduces
the rds_reconnect_timeout to retry the connection establishment after
sysctl_reconnect_retry_ms (default is 1000ms). Nevertheless, this proactive
mechanism is overkilled and it is causing long brownout time in the
virtualized environment. In short, below are the justifications why commit 5acb959ad59966b0b6905802ed720d26c560c3c5 is reverted.
a) The retry counter starts ticking after RDS received an ADDR_CHANGE
event. After receiving an ADDR_CHANGE event, RDS needs to perform shutdown
via shutdown_worker. Then, initiate a new connection via connect_worker.
Eventually, a CM REQ is only sent out after rds received
RDMA_CM_EVENT_ADDR_RESOLVED and RDMA_CM_EVENT_ROUTE_RESOLVED events. If the
retry is made to cater for stalled connection due to missing CM messages,
the retry should only happen after a CM REQ is sent. With the current
retry scheme (and with the default 1000 ms) that happens after ADDR_CHANGE
event, it introduces congestion in the single threaded workqueue.
b) Assuming that we modify the retry counter to start ticking after a CM
REQ message is sent out. By introducing another retry timeout, it
complicates the system tuning. Why? First, the sysctl_reconnect_retry_ms
relies on the underlying cma_response_timeout. Any modication of
cma_response_timeout requires to tune sysctl_reconnect_retry_ms. Second, it
is hard to find a universal timing that fits all configurations
(bare-metal, virtualized, mixed environment, and homo/heterogenous system).
Håkon Bugge [Mon, 19 Jun 2017 10:23:03 +0000 (12:23 +0200)]
IB/mlx4: Fix CM REQ retries in paravirt mode
CM REQs cannot be successfully retried, because a new pv_cm_id is
created for each request, without checking if one already exists.
This commit fixes this, by checking if an id exists before creating
one.
This bug can be provoked by running an RDMA CM user-land application,
but inserting a five seconds delay before the rdma_accept() call on
the passive side. This delay is larger than the default CMA timeout,
and triggers a retry from the active side. The retried REQ will use
another pv_cm_id (the cm_id on the wire). This confuses the CM
protocol and two REJs are sent from the passive side.
This commit is required to achieve the reduced HA Brownout time,
needed by Exadata. The Brownout issue is tracked by orabug 25521901.
Xiubo Li [Fri, 31 Mar 2017 02:35:25 +0000 (10:35 +0800)]
tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case
For the bidirectional case, the Data-Out buffer blocks will always at
the head of the tcmu_cmd's bitmap, and before gathering the Data-In
buffer, first of all it should skip the Data-Out ones, or the device
supporting BIDI commands won't work.
Xiubo Li [Mon, 27 Mar 2017 09:07:41 +0000 (17:07 +0800)]
tcmu: Fix wrongly calculating of the base_command_size
The t_data_nents and t_bidi_data_nents are the numbers of the
segments, but it couldn't be sure the block size equals to size
of the segment.
For the worst case, all the blocks are discontiguous and there
will need the same number of iovecs, that's to say: blocks == iovs.
So here just set the number of iovs to block count needed by tcmu
cmd.
Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit abe342a5b4b5aa579f6bf40ba73447c699e6b579) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Xiubo Li [Mon, 27 Mar 2017 09:07:40 +0000 (17:07 +0800)]
tcmu: Fix possible overwrite of t_data_sg's last iov[]
If there has BIDI data, its first iov[] will overwrite the last
iov[] for se_cmd->t_data_sg.
To fix this, we can just increase the iov pointer, but this may
introuduce a new memory leakage bug: If the se_cmd->data_length
and se_cmd->t_bidi_data_sg->length are all not aligned up to the
DATA_BLOCK_SIZE, the actual length needed maybe larger than just
sum of them.
So, this could be avoided by rounding all the data lengthes up
to DATA_BLOCK_SIZE.
Reviewed-by: Mike Christie <mchristi@redhat.com> Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit ab22d2604c86ceb01bb2725c9860b88a7dd383bb) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Mike Christie [Thu, 9 Mar 2017 08:42:09 +0000 (02:42 -0600)]
tcmu: make cmd timeout configurable
A single daemon could implement multiple types of devices
using multuple types of real devices that may not support
restarting from crashes and/or handling tcmu timeouts. This
makes the cmd timeout configurable, so handlers that do not
support it can turn if off for now.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit af980e46a26ac8805685bb70c8572dbc47abb126) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Mike Christie [Thu, 9 Mar 2017 08:42:08 +0000 (02:42 -0600)]
tcmu: add helper to check if dev was configured
This adds a helper to check if the dev was configured. It
will be used in the next patch to prevent updates to some
config settings after the device has been setup.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 972c7f167974fa41ea8a2eed4b857cc59f59c42c) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Conflicts:
drivers/target/target_core_user.c
Mike Christie [Thu, 2 Mar 2017 05:14:39 +0000 (23:14 -0600)]
tcmu: allow hw_max_sectors greater than 128
tcmu hard codes the hw_max_sectors to 128 which is a litle small.
Userspace uses the max_sectors to report the optimal IO size and
some initiators perform better with larger IOs (open-iscsi seems
to do better with 256 to 512 depending on the test).
(Fix do not display hw max sectors twice - MNC)
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 3abaa2bfdb1e6bb33d38a2e82cf3bb82ec0197bf) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Andy Grover [Tue, 22 Nov 2016 00:35:30 +0000 (16:35 -0800)]
target/user: Fix use-after-free of tcmu_cmds if they are expired
Don't free the cmd in tcmu_check_expired_cmd, it's still referenced by
an entry in our cmd_id->cmd idr. If userspace ever resumes processing,
tcmu_handle_completions() will use the now-invalid cmd pointer.
Instead, don't free cmd. It will be freed by tcmu_handle_completion() if
userspace ever recovers, or tcmu_free_device if not.
Cc: stable@vger.kernel.org Reported-by: Bryant G Ly <bgly@us.ibm.com> Tested-by: Bryant G Ly <bgly@us.ibm.com> Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Orabug: 25395066
(cherry picked from commit d0905ca757bc40bd1ebc261a448a521b064777d7) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Bart Van Assche [Fri, 18 Nov 2016 23:32:59 +0000 (15:32 -0800)]
target/user: Fix a data type in tcmu_queue_cmd()
This patch avoids that sparse reports the following error messages:
drivers/target/target_core_user.c:547:13: warning: incorrect type in assignment (different base types)
drivers/target/target_core_user.c:547:13: expected int [signed] ret
drivers/target/target_core_user.c:547:13: got restricted sense_reason_t
drivers/target/target_core_user.c:548:20: warning: restricted sense_reason_t degrades to integer
drivers/target/target_core_user.c:557:16: warning: incorrect type in return expression (different base types)
drivers/target/target_core_user.c:557:16: expected restricted sense_reason_t
drivers/target/target_core_user.c:557:16: got int [signed] ret
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Orabug: 25395066
(cherry picked from commit ecaf597b411e9a7b071bf7a36a4cf750c529cd28) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Andy Grover [Thu, 25 Aug 2016 15:55:53 +0000 (08:55 -0700)]
target/user: Return an error if cmd data size is too large
Userspace should be implementing VPD B0 (Block Limits) to inform the
initiator of max data size, but just in case we do get a too-large request,
do what the spec says and return INVALID_CDB_FIELD.
Make sure to unlock udev->cmdr_lock before returning.
Signed-off-by: Andy Grover <agrover@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 554617b2bbe25c3fb3c80c28fe7a465884bb40b1) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Nicholas Bellinger [Sun, 28 Feb 2016 02:25:22 +0000 (18:25 -0800)]
target/user: Fix size_t format-spec build warning
Fix the following printk size_t warning as per 0-day build:
All warnings (new ones prefixed by >>):
drivers/target/target_core_user.c: In function 'is_ring_space_avail':
>> drivers/target/target_core_user.c:385:12: warning: format '%lu'
>> expects argument of type 'long unsigned int', but argument 3 has type
>> 'size_t {aka unsigned int}' [-Wformat=]
pr_debug("no data space: only %lu available, but ask for %lu\n",
^
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 0241fd39ce7bc9b82b7e57305cb0d6bb1364d45b) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
The data_bitmap was introduced to support asynchornization accessing of
data area.
We divide mailbox data area into blocks, and use data_bitmap to track the
usage of data area. All the new command's data would start with a new block,
and may left unusable space after it end. But it's easy to track using
data_bitmap.
Now we can allocate data area for asynchronization accessing from userspace,
since we can track the allocation using data_bitmap. The userspace part would
be the same as Maxim's previous asynchronized implementation.
Signed-off-by: Sheng Yang <sheng@yasker.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 26418649eead52619d8dd6cbc6760a1b144dbcd2) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Arnd Bergmann [Mon, 1 Feb 2016 16:29:45 +0000 (17:29 +0100)]
target/user: Fix cast from pointer to phys_addr_t
The uio_mem structure has a member that is a phys_addr_t, but can
be a number of other types too. The target core driver attempts
to assign a pointer from vmalloc() to it, by casting it to
phys_addr_t, but that causes a warning when phys_addr_t is longer
than a pointer:
drivers/target/target_core_user.c: In function 'tcmu_configure_device':
drivers/target/target_core_user.c:906:22: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
This adds another cast to uintptr_t to shut up the warning.
A nicer fix might be to have additional fields in uio_mem
for the different purposes, so we can assign a pointer directly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 0633e123465b61a12a262b742bebf2a9945f7964) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Sheng Yang [Mon, 28 Dec 2015 19:57:39 +0000 (11:57 -0800)]
target/user: Allow user to set block size before enabling device
The capability of setting hw_block_size was added along with 9c1cd1b68
"target/user: Only support full command pass-through", though default
setting override the user specified value during the enabling of device,
which called by target_configure_device() to set block_size matching
hw_block_size, result in user not able to set different block size other
than default 512.
This patch would use existing hw_block_size value if already set, otherwise
it would be set to default value(512).
Update: Fix the coding style issue.
(Drop unnecessary re-export of dev->dev_attrib.block_size - nab)
Signed-off-by: Sheng Yang <sheng@yasker.org> Cc: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 81ee28de860095cc0c063b92eea53cb97771f796) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Andy Grover [Fri, 13 Nov 2015 18:42:20 +0000 (10:42 -0800)]
target/user: Do not set unused fields in tcmu_ops
TCMU sets TRANSPORT_FLAG_PASSTHROUGH, so INQUIRY commands will not be
emulated by LIO but passed up to userspace. Therefore TCMU should not
set these, just like pscsi doesn't.
Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 6ba4bd297d99ad522a6414001e6837ddaa8753fd) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Pointers that are mapped by kmap_atomic() + offset must
be unmapped without the offset. That would cause problems
if the SG element length exceeds the PAGE_SIZE limit.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit e2e21bd8f979a24462070cc89fae11e819cae90a) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
target/user: Add support for bidirectional commands
Enable TCMU to handle bidirectional SCSI commands. In such cases,
entries in iov[] cover both the Data-In and the Data-Out buffers. The
first iov_cnt entries correspond to the Data-Out buffer, while the
remaining iov_bidi_cnt entries correspond to the Data-In buffer.
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Vangelis Koukis <vkoukis@arrikto.com> Reviewed-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit e4648b014e03baee45d5f5146c1219b19e4e5f2f) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Introduce alloc_and_scatter_data_area()/gather_and_free_data_area()
functions that allocate/deallocate space from the data area and copy
data to/from a given scatter-gather list. These functions are needed so
the next patch, introducing support for bidirectional commands in TCMU,
can use the same code path both for t_data_sg and for t_bidi_data_sg.
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Vangelis Koukis <vkoukis@arrikto.com> Reviewed-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit f97ec7db1606875666366bfcba8476f8c917db96) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
driver/user: Don't warn for DMA_NONE data direction
Some SCSI commands (for example the TEST UNIT READY command) do not
carry data and so data_direction is DMA_NONE. Patch TCMU to not print a
warning message about unknown data direction, when it is DMA_NONE.
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Vangelis Koukis <vkoukis@arrikto.com> Reviewed-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 25395066
(cherry picked from commit 2bc396a2529ae8a2287f17a49d893ce790e19110) Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Reviewed-by: Shan Hai <shan.hai@oracle.com>
Because ol6 doesn't support secure boot and enabling this item
also causes mouse malfuntion while running ol6 as VMware guest.
So disable it in ol6 configuration
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
This patch is meant to allow for RX network flow classification to insert
and remove ethertype filter by ethtool
Example:
Add an ethertype filter:
$ ethtool -N eth0 flow-type ether proto 0x88F8 action 2
Show all filters:
$ ethtool -n eth0
4 RX rings available
Total 1 rules
Filter: 15
Flow Type: Raw Ethernet
Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
Ethertype: 0x88F8 mask: 0x0
Action: Direct to queue 2
Delete the filter by location:
$ ethtool -N delete 15
Signed-off-by: Ruhao Gao <ruhao.gao@ni.com> Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 64c75d41ace516b7e4f0f187f91282aa43a51b38)
igb: add support of RX network flow classification
This patch is meant to allow for RX network flow classification to insert
and remove Rx filter by ethtool. Ethtool interface has it's own rules
manager
Show all filters:
$ ethtool -n eth0
4 RX rings available
Total 2 rules
Signed-off-by: Ruhao Gao <ruhao.gao@ni.com> Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0e71def252815d732f86d11d87d63f7186d9d3be)
igb: fix adjusting PTP timestamps for Tx/Rx latency
Fix PHY delay compensation math in igb_ptp_tx_hwtstamp() and
igb_ptp_rx_rgtstamp. Add PHY delay compensation in
igb_ptp_rx_pktstamp().
In the IGB driver, there are two functions that retrieve timestamps
received by the PHY - igb_ptp_rx_rgtstamp() and igb_ptp_rx_pktstamp().
The previous commit only changed igb_ptp_rx_rgtstamp(), and the change
was incorrect.
There are two instances in which PHY delay compensations should be
made:
- Before the packet transmission over the PHY, the latency between
when the packet is timestamped and transmission of the packets,
should be an add operation, but it is currently a subtract.
- After the packets are received from the PHY, the latency between
the receiving and timestamping of the packets should be a subtract
operation, but it is currently an add.
Signed-off-by: Kshitiz Gupta <kshitiz.gupta@ni.com> Fixes: 3f544d2 (igb: adjust ptp timestamps for tx/rx latency) Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0066c8b6f4050d7c57f6379d6fd4535e2f267f17)
Andrew Lunn [Fri, 3 Jun 2016 21:03:25 +0000 (23:03 +0200)]
igb: Only DMA sync frame length
On some platforms, syncing a buffer for DMA is expensive. Rather than
sync the whole 2K receive buffer, only synchronise the length of the
frame, which will typically be the MTU, or a much smaller TCP ACK.
For an IMX6Q, this gives around 6% increased TCP receive performance,
which is cache operations bound and reduces CPU load for TCP transmit.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 64f2525ca4e76b1704b867458808ed6ffc58b803)
Jacob Keller [Tue, 24 May 2016 20:56:31 +0000 (13:56 -0700)]
igb: call igb_ptp_suspend during suspend/resume cycle
Properly stop the extra workqueue items and ensure that we resume
cleanly. This is better than using igb_ptp_init and igb_ptp_stop since
these functions destroy the PHC device, which will cause other problems
if we do so. Since igb_ptp_reset now re-schedules the work-queue item we
don't need an equivalent igb_ptp_resume in the resume workflow.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8646f7b4cdf2d0557e718c4524a3e31455b92ad7)
Jacob Keller [Tue, 24 May 2016 20:56:30 +0000 (13:56 -0700)]
igb: implement igb_ptp_suspend
Make igb_ptp_stop take advantage of this new function to reduce code
duplication.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit e3f2350de829eb0c3349f416feed81c0a3ac0732)
Jacob Keller [Tue, 24 May 2016 20:56:29 +0000 (13:56 -0700)]
igb: re-use igb_ptp_reset in igb_ptp_init
Modify igb_ptp_init to take advantage of igb_ptp_reset, and remove
duplicated work that was occurring in both igb_ptp_reset and
igb_ptp_init.
In total, resetting the TSAUXC register, and resetting the system time
both happen in igb_ptp_reset already. igb_ptp_reset now also takes care
of starting the delayed work item for overflow checks, as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 4f3ce71bb8d9c7591f25d3d8315249a250d208f0)
Jacob Keller [Tue, 24 May 2016 20:56:28 +0000 (13:56 -0700)]
igb: introduce IGB_PTP_OVERFLOW_CHECK flag
Don't continue to use complex MAC type checks for handling various cases
where we have overflow check code. Make this code more obvious by
introducing a flag which is enabled for hardware that needs these
checks.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 63737166a056fdcc83be46175ad4f63538279d7c)
Jacob Keller [Tue, 24 May 2016 20:56:27 +0000 (13:56 -0700)]
igb: introduce ptp_flags variable and use it to replace IGB_FLAG_PTP
Upcoming patches will introduce new PTP specific flags. To avoid
cluttering the normal flags variable, introduce PTP specific "ptp_flags"
variable for this purpose, and move IGB_FLAG_PTP to become
IGB_PTP_ENABLED.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 462f11888207934bbecd0f154f6cf0bedac5ecd0)
Jacob Keller [Wed, 13 Apr 2016 23:08:31 +0000 (16:08 -0700)]
igbvf: use BIT() macro instead of shifts
To prevent signed bitshift issues, and improve code readability, use the
BIT() macro. Also make use of GENMASK or the unsigned postfix where this
is more appropriate than BIT()
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0ed2dbf4f469e2e286d903ebc091edfd9be4d063)
Jacob Keller [Wed, 13 Apr 2016 23:08:30 +0000 (16:08 -0700)]
igbvf: remove unused variable and dead code
The variable rdlen is set but never used, and thus setting it is dead
code. Remove it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 12b28b41084aa61970fecb417c66c88dcce6afed)
Nathan Sullivan [Tue, 3 May 2016 23:10:56 +0000 (18:10 -0500)]
igb: adjust PTP timestamps for Tx/Rx latency
Table 7-62 on page 338 of the i210 datasheet lists TX and RX latencies
for the various speeds the chip supports. To give better PTP timestamp
accuracy, adjust the timestamps by the amounts Intel gives based on
current link speed.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3f544d2a4d5c2d817cfee9e6302fc2909aaef155)
Jacob Keller [Wed, 13 Apr 2016 23:08:29 +0000 (16:08 -0700)]
igb: make igb_update_pf_vlvf static
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8008f68cb805c0099723ba1b58d26257a52ce890)
Jacob Keller [Wed, 13 Apr 2016 23:08:28 +0000 (16:08 -0700)]
igb: use BIT() macro or unsigned prefix
For bitshifts, we should make use of the BIT macro when possible, and
ensure that other bitshifts are marked as unsigned. This helps prevent
signed bitshift errors, and ensures similar style.
Make use of GENMASK and the unsigned postfix where BIT() isn't
appropriate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a51d8c217b15b97fede844dd6860f7b3c6ffcfef)
This reverts commit 3eb14ea8d958 ("igb: Fix a deadlock in
igb_sriov_reinit")
It is the same as commit f468adc944ef ("igb: missing rtnl_unlock in
igb_sriov_reinit()")
There is no rtnl_lock() in igb_resume before, rtnl_unlock will cause a
deadlock.
Signed-off-by: Arika Chen <arika.chen@huawei.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d99e366fc90c9b6e6197584ecd3a185441452b0c)
Doron Shikmoni [Wed, 17 Feb 2016 07:34:25 +0000 (09:34 +0200)]
igb: Garbled output for "ethtool -m"
Garbled output for "ethtool -m ethX", in igb-driven NICs with module /
plugin EEPROM (i.e. SFP information). Each output data byte appears
duplicated.
In igb_ethtool.c, igb_get_module_eeprom() is reading the EEPROM via i2c;
the eeprom offset for each word that's read via igb_read_phy_reg_i2c()
was passed in #words, whereas it needs to be a byte offset.
This patches fixes the bug.
Signed-off-by: Doron Shikmoni <doron.shikmoni@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit efea95d45e6ab4a30df9801f8e9bf68007ee9b43)
John Holland [Thu, 18 Feb 2016 11:10:52 +0000 (12:10 +0100)]
igb: allow setting MAC address on i211 using a device tree blob
The Intel i211 LOM PCIe Ethernet controllers' iNVM operates as an OTP
and has no external EEPROM interface [1]. The following allows the
driver to pickup the MAC address from a device tree blob when CONFIG_OF
has been enabled.
Signed-off-by: John Holland <jotihojr@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 806ffb1d504927d1449397377eac63bb63489266)
Alexander Duyck [Fri, 18 Mar 2016 23:06:53 +0000 (16:06 -0700)]
igb: Fix sparse warning about passing __beXX into leXX_to_cpup
We were casting the addr as __beXX and then passing it into le32_to_cpu
because the device expects the MAC address to be in network order even
though the register set is little endian. Instead of casting it as __beXX
we can just cast it as __leXX in order to maintain consistency since the
region of memory is already in little endian order as far as we are
concerned.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 415cd2a645b2573f173cc52419049f9caacf9a47)
Stefan Assmann [Wed, 3 Feb 2016 08:20:50 +0000 (09:20 +0100)]
igb: call ndo_stop() instead of dev_close() when running offline selftest
Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 46eafa59e18d034ba616fdcca688c388d0bbfd91)
Corinna Vinschen [Thu, 28 Jan 2016 12:53:23 +0000 (13:53 +0100)]
igb: Fix VLAN tag stripping on Intel i350
Problem: When switching off VLAN offloading on an i350, the VLAN
interface gets unusable. For testing, set up a VLAN on an i350
and some remote machine, e.g.:
$ ip link add link eth0 name eth0.42 type vlan id 42
$ ip addr add 192.168.42.1/24 dev eth0.42
$ ip link set dev eth0.42 up
Offloading is switched on by default:
$ ethtool -k eth0 | grep vlan-offload
rx-vlan-offload: on
tx-vlan-offload: on
$ ping -c 3 -I eth0.42 192.168.42.2
[...works as usual...]
Now switch off VLAN offloading and try again:
$ ethtool -K eth0 rxvlan off
Actual changes:
rx-vlan-offload: off
tx-vlan-offload: off [requested on]
$ ping -c 3 -I eth0.42 192.168.42.2
PING 192.168.42.2 (192.168.42.2) from 192.168.42.1 eth0.42: 56(84) bytes of da
ta.
I can only reproduce it on an i350, the above works fine on a 82580.
While inspecting the igb source, I came across the code in igb_set_vmolr
which sets the E1000_VMOLR_STRVLAN/E1000_DVMOLR_STRVLAN flags once and
for all, and in all of the igb code there's no other place where the
STRVLAN is set or cleared. Thus, VLAN stripping is enabled in igb
unconditionally, independently of the offloading setting.
I compared that to the latest Intel igb-5.3.3.5 driver from
http://sourceforge.net/projects/e1000/ which in fact sets and clears the
STRVLAN flag independently from igb_set_vmolr in its own function
igb_set_vf_vlan_strip, depending on the vlan settings.
So I included the STRVLAN handling from the igb-5.3.3.5 driver into our
current igb driver and tested the above scenario again. This time ping
still works after switching off VLAN offloading.
Tested on i350, with and without addtional VFs, as well as on 82580
successfully.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 030f9f52642a20cbd8c1334a237e92e3ef55e2b1)
At that time there were concerns that removing this statement may cause other
side effects. However the submitter addressed those concerns. But the dialogue
went cold. We have a new case where a customers application is registering and
un-registering multicast addresses every few seconds. This is leading to many
"Link is Up" messages in the logs as a result of the
"netif_carrier_off(netdev)" statement called by igbvf_msix_other(). Also on
some kernels it is interfering with the bonding driver causing it to failover
and subsequently affecting connectivity.
The Sourgeforge driver does not make this call and is therefore not affected.
If there were any side effects I would expect that driver to also be affected.
I have tested re-loading the igbvf driver and downing the adapter with the PF
entity on the host where the VM has this patch. When I bring it back up again
connectivity is restored as expected. Therefore I request that this patch gets
submitted.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit cc54a59ae6e528c70666033ed085d059f555a57d)
Alexander Duyck [Wed, 13 Jan 2016 15:31:30 +0000 (07:31 -0800)]
igbvf: Add support for generic Tx checksums
This patch adds support for generic Tx checksums to the igbvf driver. It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.
In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
MACLEN (maximum of 127), retrieved from:
skb_network_offset()
IPLEN (maximum of 511), retrieved from:
skb_checksum_start_offset() - skb_network_offset()
TUCMD.L4T indicates offset and if checksum or crc32c, based on:
skb->csum_offset
The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.
I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers. In
the case of the VF drivers this meant adding support for SCTP CRCs, and
inner checksum offloads for MPLS and various tunnel types.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ea6ce6024f9397ff2667fe16447447e622bc4c31)
Alexander Duyck [Wed, 13 Jan 2016 15:31:23 +0000 (07:31 -0800)]
igb: Add support for generic Tx checksums
This patch adds support for generic Tx checksums to the igb driver. It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.
In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
MACLEN (maximum of 127), retrieved from:
skb_network_offset()
IPLEN (maximum of 511), retrieved from:
skb_checksum_start_offset() - skb_network_offset()
TUCMD.L4T indicates offset and if checksum or crc32c, based on:
skb->csum_offset
The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.
I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6e033700887bf29d4e59f6978a02d989787be620)
Roland Hii [Mon, 11 Jan 2016 07:34:18 +0000 (15:34 +0800)]
igb: add conditions for I210 to generate periodic clock output
In general case the maximum supported half cycle time of the synchronized
output clock is 70msec. Slower half cycle time than 70msec can be
programmed also as long as the output clock is synchronized to whole
seconds, useful specifically for generating a 1Hz clock.
Permitted values for the clock half cycle time are: 125,000,000 decimal,
250,000,000 decimal and 500,000,000 decimal (equals to 125msec, 250msec
and 500msec respectively).
Before this patch, only the half cycle time of less than or equal to 70msec
uses the I210 clock output function. This patch adds additional conditions
when half cycle time is equal to 125msec or 250msec or 500msec to use
clock output function.
Under other conditions, interrupt driven target time output events method
is still used to generate the desired clock output.
Signed-off-by: Roland Hii <roland.king.guan.hii@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 569f3b3d4e9898ae30788fde128e3277d996710e)
Julia Lawall [Sun, 3 Jan 2016 06:44:56 +0000 (07:44 +0100)]
igb: constify e1000_phy_operations structure
This e1000_phy_operations structure is never modified, so declare it as
const. Other structures of this type are already const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5b70e4a12a525b5f3d4a3e3f0567ed877195b187)
Takuma Ueba [Thu, 31 Dec 2015 05:58:14 +0000 (14:58 +0900)]
igb: When GbE link up, wait for Remote receiver status condition
I210 device IPv6 autoconf test sometimes fails,
because DAD NS for link-local is not transmitted.
This packet is silently dropped.
This problem is seen only GbE environment.
igb_watchdog_task link up detection continues to the following process.
The following cases are observed:
1.PHY 1000BASE-T Status Register Remote receiver status bit is NG.
(NG status becomes OK after about 200 - 700ms)
2.In this case, the transfer packet is silently dropped.
1000BASE-T Status register
[Expected]: 0x3800 or 0x7800
[problem occurred]: 0x2800 or 0x6800
Frequency of occurrence: approx 1/10 - 1/40 observed
In order to avoid this problem,
wait until 1000BASE-T Status register "Remote receiver status OK"
After applying this patch, at least 400 runs succeed with no problems.
Signed-off-by: Takuma Ueba <t.ueba11@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b72f3f72005dfd649d787535bd04ada3b3f1b3ba)
Alexander Duyck [Thu, 7 Jan 2016 07:11:43 +0000 (23:11 -0800)]
igb: Add workaround for VLAN tag stripping on 82576
There was a workaround partially implemented for the 82576 that is needed
in order for VLAN tag stripping to function correctly. The original code
had side effects that would make it so the workaround was active on all
MACs. I have updated the code so that the workaround is enabled, but
limited to the 82576, or activated if we exceed the available unicast
addresses.
The workaround has a side effect of mirroring all of the traffic outgoing
from the VFs back to the PF. As such it is not recommended to use the
82576 in promiscuous mode as it will take a performance hit, though this is
now consistent with the performance as seen on the out-of-tree igb driver.
I also limited the scope of the UTA bits all being set to only when the
VMOLR register is enabled. This should limit the effects of the UTA
register so that we don't pick up any excess traffic unless promiscuous
mode has been enabled on the PF, whereas before the PF would have ended up
in something equivalent to unicast promiscuous mode with VLAN filtering
otherwise.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bf456abb9b82d5376e7189cca00b528dd86d1559)
Alexander Duyck [Thu, 7 Jan 2016 07:11:34 +0000 (23:11 -0800)]
igb: Enable use of "bridge fdb add" to set unicast table entries
This change makes it so that we can use the bridge utility to add a FDB
entry for the PF to an igb port. By doing this we can enable the VFs to
talk to virtual ports residing on top of the PF.
In addition this should also address issues with MACVLANs trying to reside
on top of the PF as well as they would have had similar issues when added
to the PF with SR-IOV enabled.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 268f9d33a9319bb2d4d999e264aef9c00081bba0)
Alexander Duyck [Thu, 7 Jan 2016 07:11:26 +0000 (23:11 -0800)]
igb: Drop unnecessary checks in transmit path
This patch drops several checks that we dropped from ixgbe some ago. It
should not be possible for us to be called with either of the conditional
statements returning true so we can just drop them from the hot-path.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9c2f186e45faa34d5f6ff52aa84c361d4be72288)
Alexander Duyck [Thu, 7 Jan 2016 07:11:18 +0000 (23:11 -0800)]
igb: Add support for VLAN promiscuous with SR-IOV and NTUPLE
This change fixes things so that we can fully support SR-IOV or the
recently added NTUPLE filtering while allowing support for VLAN promiscuous
mode. By making this change we are able to support possible scenarios such
as SR-IOV with the PF connected to a Linux bridge hosting other VMs.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 16903caa339961b9f8a68b64f4f313789de48599)
Alexander Duyck [Thu, 7 Jan 2016 07:11:11 +0000 (23:11 -0800)]
igb: Clean-up configuration of VF port VLANs
This patch is meant to clean-up the configuration of the VF port based VLAN
configuration. The original logic was a bit muddled and had some
undesirable side effects such as VLANs being either completely stripped
from the port or VLANs being left when they shouldn't be. The idea behind
this code is to avoid any events such as spurious spoof notifications when
we are removing one VLAN tag and replacing it with another.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a15d92598a5a741037b873fd4a43595a63048bbd)
Alexander Duyck [Thu, 7 Jan 2016 07:11:04 +0000 (23:11 -0800)]
igb: Merge VLVF configuration into igb_vfta_set
This change makes it so that we can merge the configuration of the VLVF
registers into the setting of the VFTA register. By doing this we simplify
the logic and make use of similar functionality that we have already added
for ixgbe making it easier to maintain both drivers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8b77c6b20f32511175dfd00322ae82fb31949d55)
Alexander Duyck [Thu, 7 Jan 2016 07:10:54 +0000 (23:10 -0800)]
igb: Always enable VLAN 0 even if 8021q is not loaded
This patch makes it so that we always add VLAN 0. This is important as we
need to guarantee the PF can receive untagged frames in the case of SR-IOV
being enabled but VLAN filtering not being enabled in the kernel.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5982a5565a08abf3b9ff18941b3e3cc94f7c8286)
Alexander Duyck [Thu, 7 Jan 2016 07:10:47 +0000 (23:10 -0800)]
igb: Do not factor VLANs into RLPML calculation
The RLPML registers already take the size of VLAN headers into account when
determining the maximum packet length. This is called out in EAS documents
for several parts including the 82576 and the i350. As such we can drop
the addition of size to the value programmed into the RLPML registers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d3836f8e2517fb04328c673989fd780030926694)
Alexander Duyck [Thu, 7 Jan 2016 07:10:39 +0000 (23:10 -0800)]
igb: Allow asymmetric configuration of MTU versus Rx frame size
Since the igb driver is using page based receive there is no point in
limiting the Rx capabilities of the device. The driver can receive 9K
jumbo frames at all times. The only changes needed due to MTU changes are
updates for the FIFO sizes and flow-control watermarks.
Update the maximum frame size to reflect the 9.5K limitation of the
hardware, and replace all instances of max_frame_size with
MAX_JUMBO_FRAME_SIZE when referring to an Rx FIFO or frame.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 45693bcb00cbd379c373ab22ccd9a9d4755cc7ed)
Alexander Duyck [Thu, 7 Jan 2016 07:10:30 +0000 (23:10 -0800)]
igb: Refactor VFTA configuration
This patch starts the clean-up process on the VFTA configuration.
Specifically in this patch I attempt to address and simplify several items
while also updating the code to bring it more inline with what is already
in ixgbe.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 832e821c51e381966464c8a0f30f12eb1514eba0)