Allen Pais [Wed, 14 Jun 2017 07:37:36 +0000 (13:07 +0530)]
arch/sparc: Enable queued spinlock support for SPARC
This patch makes the necessary changes in the SPARC architecture to enable
queued spinlock support. Here are some of the earlier discussions about
this feature.
https://lwn.net/Articles/561775/
https://lwn.net/Articles/590243/
Cleaned up spinlock_64.h. The arch_spin_xxx definitions are
replaced by the functions in <asm-generic/qspinlock.h>.
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 145d978585977438ebb55079487827006c604e39)
Conflicts:
arch/sparc/include/asm/spinlock_64.h
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Tue, 30 May 2017 20:59:02 +0000 (13:59 -0700)]
arch/sparc: Introduce xchg16 for SPARC
SPARC currently supports 32 bit and 64 bit xchg. Add support
for 16 bit (2 byte) xchg, which is required by the queued spinlock
feature. This is achieved using the 4 byte cas instruction with
byte manipulation.
Also rearranged the code to call __cmpxchg_u32 inside xchg16.
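A minimal sketch of the masking approach (illustrative, not the exact kernel code; it assumes the existing __cmpxchg_u32() 32-bit primitive and SPARC's big-endian byte order):

static inline u16 xchg16(volatile u16 *p, u16 val)
{
        /* word-align the address and pick the halfword's bit lane;
         * on big endian the lower-addressed halfword is the high bits */
        volatile u32 *m = (volatile u32 *)((unsigned long)p & ~3UL);
        int shift = ((unsigned long)p & 2) ? 0 : 16;
        u32 mask = 0xffffU << shift;
        u32 old32, new32;

        do {
                old32 = *m;
                new32 = (old32 & ~mask) | ((u32)val << shift);
        } while (__cmpxchg_u32(m, old32, new32) != old32);

        return (old32 & mask) >> shift;
}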
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 79d39e2bab60d18a68a5abc00be4506864397efc)
Conflicts:
arch/sparc/include/asm/cmpxchg_64.h
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Allen Pais [Wed, 14 Jun 2017 07:30:39 +0000 (13:00 +0530)]
arch/sparc: Enable queued rwlocks for SPARC
Enable queued rwlocks for SPARC. Here are the discussions on this feature
from when it was introduced.
https://lwn.net/Articles/572765/
https://lwn.net/Articles/582200/
Cleaned up the arch_read_xxx and arch_write_xxx definitions in spinlock_64.h.
These routines are replaced by the functions in include/asm-generic/qrwlock.h.
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a37594f198363fd9321ece54440336fd4b2a9c8e)
Babu Moger [Wed, 24 May 2017 23:55:12 +0000 (17:55 -0600)]
arch/sparc: Introduce cmpxchg_u8 SPARC
SPARC currently supports 32 bit and 64 bit cmpxchg. Add support
for 8 bit (1 byte) cmpxchg, which is required by the queued
rwlocks feature.
The new __cmpxchg_u8 uses the 4 byte cas instruction with byte
manipulation to achieve a 1 byte cmpxchg.
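A similar sketch for the byte case (illustrative; again built on a 32-bit cas primitive):

static inline u8 cmpxchg_u8(volatile u8 *p, u8 old, u8 new)
{
        volatile u32 *m = (volatile u32 *)((unsigned long)p & ~3UL);
        int shift = (3 - ((unsigned long)p & 3)) * 8;  /* big-endian lane */
        u32 mask = 0xffU << shift;
        u32 old32, new32;

        do {
                old32 = *m;
                if (((old32 & mask) >> shift) != old)
                        return (old32 & mask) >> shift;  /* compare failed */
                new32 = (old32 & ~mask) | ((u32)new << shift);
        } while (__cmpxchg_u32(m, old32, new32) != old32);

        return old;
}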
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a12ee2349312d7112b9b7c6ac2e70c5ec2ca334e)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Found this problem while enabling queued rwlocks on SPARC. The
CONFIG_CPU_BIG_ENDIAN parameter is used to select which byte in the
qrwlock structure to clear; without it, we clear the wrong byte.
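The fix itself is a one-line Kconfig addition (sketched here; exact placement in arch/sparc/Kconfig assumed):

config CPU_BIG_ENDIAN
        def_bool y if SPARC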
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 97d9f969161d79e6a4bba247e67ce731ff861f79)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Wed, 24 May 2017 23:55:10 +0000 (17:55 -0600)]
kernel/locking: Fix compile error with qrwlock.c
Saw these compile errors on SPARC when the queued rwlock feature is enabled.
CC kernel/locking/qrwlock.o
kernel/locking/qrwlock.c: In function 'queued_read_lock_slowpath':
kernel/locking/qrwlock.c:89: error: implicit declaration of function 'arch_spin_lock'
kernel/locking/qrwlock.c:102: error: implicit declaration of function 'arch_spin_unlock'
make[4]: *** [kernel/locking/qrwlock.o] Error 1
Include spinlock.h in qrwlock.c to fix it.
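The fix is the one-line include in kernel/locking/qrwlock.c:

#include <linux/spinlock.h>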
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9ab6055f959032258c0f83a070cd0d26ed7a8fc5)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Wed, 24 May 2017 23:55:09 +0000 (17:55 -0600)]
arch/sparc: Remove the check #ifndef __LINUX_SPINLOCK_TYPES_H
Saw these compile errors on SPARC when the queued rwlock feature is enabled.
CC kernel/locking/qrwlock.o
In file included from ./include/asm-generic/qrwlock_types.h:5,
from ./arch/sparc/include/asm/qrwlock.h:4,
from kernel/locking/qrwlock.c:24:
./arch/sparc/include/asm/spinlock_types.h:5:3: error:
#error "please don't include this file directly"
SPARC has a guard that causes a compile error when spinlock_types.h
is included directly:
#ifndef __LINUX_SPINLOCK_TYPES_H
# error "please don't include this file directly"
#endif
Remove this unnecessary "#ifndef __LINUX_SPINLOCK_TYPES_H" stanza from SPARC.
Signed-off-by: Babu Moger <babu.moger@oracle.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8b93b4a9e1be78930eb9d640f75818993f70e065)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
pan xinhui [Mon, 18 Jul 2016 09:47:39 +0000 (17:47 +0800)]
locking/qrwlock: Fix write unlock bug on big endian systems
This patch aims to get rid of endianness assumptions in
queued_write_unlock(). We want to set __qrwlock->wmode to NULL, but on a
big endian machine that address is not &lock->cnts, which causes
queued_write_unlock() to write NULL to the wrong field of __qrwlock.
So implement __qrwlock_write_byte(), which returns the correct
__qrwlock->wmode address.
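A sketch of such a helper (close to the upstream version, shown here for illustration):

static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
{
        /* wmode is byte 0 of the lock word on little endian
         * and byte 3 on big endian */
        return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}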
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman.Long@hpe.com Cc: arnd@arndb.de Cc: boqun.feng@gmail.com Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1468835259-4486-1-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2db34e8bf9a22f4e38b29deccee57457bc0e7d74)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Tue, 10 Nov 2015 00:09:23 +0000 (19:09 -0500)]
locking/qspinlock: Avoid redundant read of next pointer
With optimistic prefetch of the next node cacheline, the next pointer
may have been properly initialized. As a result, the reading
of node->next in the contended path may be redundant. This patch
eliminates the redundant read if the next pointer value is not NULL.
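The shape of the change in the unlock path, as a sketch (next holds the value loaded earlier by the prefetch code):

/* reuse the next pointer fetched for the earlier prefetch; only
 * spin re-reading node->next if it was still NULL at that point */
if (!next) {
        while (!(next = READ_ONCE(node->next)))
                cpu_relax();
}
arch_mcs_spin_unlock_contended(&next->locked);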
Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447114167-47185-4-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit aa68744f80bfb6f26fbe7f10e42876066f7dac1b)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Tue, 10 Nov 2015 00:09:22 +0000 (19:09 -0500)]
locking/qspinlock: Prefetch the next node cacheline
A queue head CPU, after acquiring the lock, will have to notify
the next CPU in the wait queue that it has become the new queue
head. This involves loading a new cacheline from the MCS node of the
next CPU. That operation can be expensive and adds to the latency of
the locking operation.
This patch adds code to optimistically prefetch the next MCS node
cacheline if the next pointer is set and the CPU has been spinning
on the MCS lock for a while. This reduces the locking latency and
improves the system throughput.
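A sketch of the added step, placed after the queue head has acquired the MCS lock:

/*
 * While waiting for the MCS lock, the next pointer may have been
 * set by a later waiter. Optimistically load it and prefetch the
 * cacheline for writing to cut the latency of the MCS unlock that
 * follows lock acquisition.
 */
next = READ_ONCE(node->next);
if (next)
        prefetchw(next);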
The performance change will depend on whether the prefetch overhead
can be hidden within the latency of the lock spin loop. For a really
short critical section, there may be no performance gain at all. With
a longer critical section, however, it was found to give a performance
boost of 5-10% over a range of different queue depths with a spinlock
loop microbenchmark.
Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447114167-47185-3-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 81b5598665a24083dd889fbd8cb08b0d8de4b8ad)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Thu, 9 Jul 2015 16:32:22 +0000 (12:32 -0400)]
locking/qrwlock: Reduce reader/writer to reader lock transfer latency
Currently, a reader will first check that the writer mode byte is
cleared before incrementing the reader count. That waiting is not
really necessary. It increases the latency of the reader/writer to
reader transition and reduces reader performance.
This patch eliminates that waiting. It also has the side effect
of reducing the chance of writer lock stealing and improving the
fairness of the lock. Using a locking microbenchmark, a 10-thread 5M
locking loop of mostly readers (RW ratio = 10,000:1) has the following
performance numbers on a Haswell-EX box:
Waiman Long [Fri, 19 Jun 2015 15:50:01 +0000 (11:50 -0400)]
locking/qrwlock: Better optimization for interrupt context readers
The qrwlock is fair in process context, but becomes unfair in
interrupt context to support use cases like the tasklist_lock.
The current code isn't well documented about what happens in
interrupt context. rspin_until_writer_unlock() will only spin if the
writer has taken the lock. If the writer is still waiting, the
increment of the reader count will cause the writer to remain
waiting, and the new interrupt-context reader will get the lock and
return immediately. The current code, however, does an additional
read of the lock value that is not necessary, as the information is
already available in the fast path. This may sometimes cause an extra
cacheline transfer when the lock is highly contended.
This patch passes the lock value obtained in the fast path to the
slow path to eliminate the additional read. It also documents the
behavior for interrupt-context readers more clearly.
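A sketch of the reworked entry point (the second parameter carries the counter value already read in the fast path):

void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
{
        /*
         * Readers in interrupt context take the lock as soon as the
         * writer, if any, is done, reusing the fast-path value of the
         * counters instead of re-reading lock->cnts.
         */
        if (unlikely(in_interrupt())) {
                rspin_until_writer_unlock(lock, cnts);
                return;
        }
        /* ... normal queued slowpath for process context ... */
}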
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434729002-57724-3-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 0e06e5be70d392aa842c1455ec2d0baf62aeed48)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Tue, 9 Jun 2015 15:19:13 +0000 (11:19 -0400)]
locking/qrwlock: Don't contend with readers when setting _QW_WAITING
The current cmpxchg() loop that sets the _QW_WAITING flag for writers
in queue_write_lock_slowpath() will contend with incoming readers,
possibly causing extra, wasteful cmpxchg() operations. This patch
changes the code to do a byte cmpxchg() to eliminate contention with
new readers.
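A sketch of the byte-wide retry loop (field layout per the __qrwlock view of the lock word):

struct __qrwlock *l = (struct __qrwlock *)lock;

for (;;) {
        /* readers only touch the reader count, so this cmpxchg can
         * no longer fail because of an arriving reader */
        if (!READ_ONCE(l->wmode) &&
            (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
                break;
        cpu_relax();
}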
A multithreaded microbenchmark running a 5M read_lock/write_lock loop
on an 8-socket, 80-core Westmere-EX machine running a 4.0-based kernel
with the qspinlock patch has the following execution times (in ms)
with and without the patch:
With a small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433863153-30722-3-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 405963b6a57c60040bc1dad2597f7f4b897954d1)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Wed, 31 May 2017 19:56:22 +0000 (12:56 -0700)]
locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
To be consistent with the queued spinlocks, which use the
CONFIG_QUEUED_SPINLOCKS config parameter, the parameter for the queued
rwlocks is now renamed to CONFIG_QUEUED_RWLOCKS.
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431367031-36697-1-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit c7114b4e6c53111d415485875725b60213ffc675)
Conflicts:
arch/x86/Kconfig
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Fri, 24 Apr 2015 18:56:35 +0000 (14:56 -0400)]
locking/qspinlock: Use a simple write to grab the lock
Currently, atomic_cmpxchg() is used to get the lock. However, this
is not really necessary if there is more than one task in the queue
and the queue head doesn't need to reset the tail code. In that case,
a simple write to set the lock bit is enough, as the queue head will
be the only one eligible to get the lock as long as it checks that
both the lock and pending bits are not set. The current pending bit
waiting code ensures that the pending bit will not be set as soon as
the tail code in the lock is set.
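The resulting helper, roughly (a sketch of the upstream set_locked()):

static __always_inline void set_locked(struct qspinlock *lock)
{
        struct __qspinlock *l = (void *)lock;

        /* queue head with waiters behind it: a plain byte store is
         * enough, no atomic operation needed */
        WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
}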
With that change, there is a slight improvement in the performance
of the queued spinlock in the 5M loop micro-benchmark run on a 4-socket
Westmere-EX machine, as shown in the tables below.
[Standalone/Embedded - same node]
# of tasks Before patch After patch %Change
---------- ----------- ---------- -------
3 2324/2321 2248/2265 -3%/-2%
4 2890/2896 2819/2831 -2%/-2%
5 3611/3595 3522/3512 -2%/-2%
6 4281/4276 4173/4160 -3%/-3%
7 5018/5001 4875/4861 -3%/-3%
8 5759/5750 5563/5568 -3%/-3%
[Standalone/Embedded - different nodes]
# of tasks Before patch After patch %Change
---------- ----------- ---------- -------
3 12242/12237 12087/12093 -1%/-1%
4 10688/10696 10507/10521 -2%/-2%
It was also found that this change produced a much bigger performance
improvement in the newer IvyBridge-EX chip, essentially closing the
performance gap between the ticket spinlock and the queued spinlock.
The disk workload of the AIM7 benchmark was run on a 4-socket
Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
on a 3.14 based kernel. The results of the test runs were:
AIM7 XFS Disk Test
kernel JPM Real Time Sys Time Usr Time
----- --- --------- -------- --------
ticketlock 5678233 3.17 96.61 5.81
qspinlock 5750799 3.13 94.83 5.97
AIM7 EXT4 Disk Test
kernel JPM Real Time Sys Time Usr Time
----- --- --------- -------- --------
ticketlock 1114551 16.15 509.72 7.11
qspinlock 2184466 8.24 232.99 6.01
The ext4 filesystem run had a much higher spinlock contention than
the xfs filesystem run.
The "ebizzy -m" test was also run with the following results:
kernel records/s Real Time Sys Time Usr Time
----- --------- --------- -------- --------
ticketlock 2075 10.00 216.35 3.49
qspinlock 3023 10.00 198.20 4.80
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2c83e8e9492dc823be1d96d4c5ef75d16d3866a0)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.
By growing the pending bit to a byte, we reduce the tail to 16 bits.
This means we can use xchg16 for the tail part and do away with all
the repeated cmpxchg() operations.
This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.
This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.
All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).
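A sketch of the 16-bit tail exchange this enables (illustrative; offsets per the qspinlock layout):

static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
{
        struct __qspinlock *l = (void *)lock;

        /* one xchg16 replaces the old cmpxchg retry loop */
        return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
}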
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 69f9cae90907e09af95fb991ed384670cef8dd32)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Fri, 24 Apr 2015 18:56:33 +0000 (14:56 -0400)]
locking/qspinlock: Extract out code snippets for the next patch
This is a preparatory patch that extracts out the following 2 code
snippets to prepare for the next performance optimization patch.
1) the logic for the exchange of new and previous tail code words
into a new xchg_tail() function.
2) the logic for clearing the pending bit and setting the locked bit
into a new clear_pending_set_locked() function.
This patch also simplifies the trylock operation before queuing by
calling queued_spin_trylock() directly.
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 6403bd7d0ea1878a487296114eccf78658d7dd7a)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]), add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.
It is possible to observe the pending bit without the locked bit when
the last owner has just released but the pending owner has not yet
taken ownership.
In this case we would normally queue -- because the pending bit is
already taken. However, in this case the pending bit is guaranteed
to be released 'soon', therefore wait for it and avoid queueing.
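In code, the wait looks roughly like this (a sketch):

/*
 * pending-but-unlocked is transient: the pending owner is about to
 * take the lock, so spin briefly instead of queueing
 */
if (val == _Q_PENDING_VAL) {
        while ((val = atomic_read(&lock->val)) == _Q_PENDING_VAL)
                cpu_relax();
}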
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit c1fb159db9f2e50e0f4025bed92a67a6a7bfa7b7)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Fri, 24 Apr 2015 18:56:30 +0000 (14:56 -0400)]
locking/qspinlock: Introduce a simple generic 4-byte queued spinlock
This patch introduces a new generic queued spinlock implementation that
can serve as an alternative to the default ticket spinlock. The queued
spinlock should be almost as fair as the ticket spinlock. It has about
the same speed in the single-thread case and can be much faster in high
contention situations, especially when the spinlock is embedded within
the data structure to be protected.
Only in light to moderate contention where the average queue depth
is around 1-3 will this queued spinlock be potentially a bit slower
due to the higher slowpath overhead.
This queued spinlock is especially suited to NUMA machines with a large
number of cores, as the chance of spinlock contention is much higher
on those machines. The cost of contention is also higher because of
slower inter-node memory traffic.
Because spinlocks are acquired with preemption disabled, a process
will not be migrated to another CPU while it is trying to get a
spinlock. Ignoring interrupt handling, a CPU can only be contending
on one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI,
a CPU can have at most 4 concurrent lock waiting activities. By
allocating a set of per-cpu queue nodes and using them to form a
waiting queue, we can encode the queue node address into a much
smaller 24-bit value (including CPU number and queue node index),
leaving one byte for the lock.
Please note that the queue node is only needed when waiting for the
lock. Once the lock is acquired, the queue node can be released to
be used later.
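A sketch of the tail encoding described above (cpu is offset by 1 so that a tail of 0 means no queue):

static inline u32 encode_tail(int cpu, int idx)
{
        u32 tail;

        tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;
        tail |= idx << _Q_TAIL_IDX_OFFSET;  /* per-cpu node index, idx < 4 */
        return tail;
}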
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a33fda35e3a7655fb7df756ed67822afb5ed5e8d)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Thu, 30 Apr 2015 21:12:16 +0000 (17:12 -0400)]
locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()
In up_write()/up_read(), rwsem_wake() will be called whenever it
detects that some writers/readers are waiting. The rwsem_wake()
function will take the wait_lock and call __rwsem_do_wake() to do the
real wakeup. For a heavily contended rwsem, doing a spin_lock() on
wait_lock will cause further contention on the heavily contended rwsem
cacheline, delaying the completion of the up_read()/up_write()
operations.
This patch makes taking the wait_lock and calling __rwsem_do_wake()
optional if at least one spinning writer is present. The spinning
writer will be able to take the rwsem and call rwsem_wake() later
when it calls up_write(). With the presence of a spinning writer,
rwsem_wake() will now try to acquire the lock using trylock. If that
fails, it will just quit.
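The shape of the change in rwsem_wake(), as a sketch:

if (rwsem_has_spinner(sem)) {
        /* a spinning writer will take the rwsem and issue the
         * wakeup itself from its own up_write() */
        if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
                return sem;
} else {
        raw_spin_lock_irqsave(&sem->wait_lock, flags);
}
/* ... __rwsem_do_wake() as before ... */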
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 59aabfc7e959f5f213e4e5cc7567ab4934da2adf)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Jane Chu [Tue, 6 Jun 2017 22:25:01 +0000 (16:25 -0600)]
arch/sparc: revised support for 4096cpus
In the process of upstreaming patch bbd4b32b05cc529e74b1dd5ee3edc396fa7dd129
that went into uek4 for the NR_CPUS=4096 support, I received and incorporated
a comment to split up the allocation for the mondo block and mondo cpulist.
This patch updates uek4 for consistency.
Emil Tantilov [Wed, 17 May 2017 22:17:51 +0000 (15:17 -0700)]
ixgbe: always call setup_mac_link for multispeed fiber
Remove the logic that previously skipped the link configuration in
ixgbe_setup_mac_link_multispeed_fiber() when we are already at the
requested speed.
By exiting early we skip the link configuration, so the driver may
not always configure the PHY correctly for SFP+.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 08ed48e182ef870517a84d2331c4c5da8f1c3b3a) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Emil Tantilov [Wed, 17 May 2017 22:17:46 +0000 (15:17 -0700)]
ixgbe: add write flush when configuring CS4223/7
Make sure the writes are processed immediately. Without the flush it
is possible for operations on one port to spill over to the other, as
the resource is shared.
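The pattern, sketched (register name illustrative):

IXGBE_WRITE_REG(hw, reg, val);
IXGBE_WRITE_FLUSH(hw);  /* read-back forces the posted write out */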
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 410a494902777c11f95031d9ed757d7f8f09c5c6) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 12 May 2017 18:38:10 +0000 (11:38 -0700)]
ixgbevf: Resolve warnings for -Wimplicit-fallthrough
Additions to gcc 7 now warn whenever a switch statement falls through
implicitly. This patch adds explicit fall through comments to address the
following warnings:
drivers/net/ethernet/intel/ixgbevf/vf.c: In function ‘ixgbevf_get_reta_locked’:
drivers/net/ethernet/intel/ixgbevf/vf.c:336:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (hw->mac.type < ixgbe_mac_X550_vf)
^
drivers/net/ethernet/intel/ixgbevf/vf.c:338:2: note: here
default:
^~~~~~~
drivers/net/ethernet/intel/ixgbevf/vf.c: In function ‘ixgbevf_get_rss_key_locked’:
drivers/net/ethernet/intel/ixgbevf/vf.c:402:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (hw->mac.type < ixgbe_mac_X550_vf)
^
drivers/net/ethernet/intel/ixgbevf/vf.c:404:2: note: here
default:
^~~~~~~
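The fix is a comment telling gcc the fallthrough is intentional, roughly like this (case label and return value illustrative):

case ixgbe_mbox_api_12:
        if (hw->mac.type < ixgbe_mac_X550_vf)
                break;
        /* fall through */
default:
        return -EOPNOTSUPP;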
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 80666035c70bc8def691b4cb98fa39da3d6fdee1) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 12 May 2017 18:38:09 +0000 (11:38 -0700)]
ixgbevf: Resolve truncation warning for q_vector->name
The following warning is now shown as a result of new checks added for
gcc 7:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function ‘ixgbevf_open’:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1363:13: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 3 and 18 [-Wformat-truncation=]
"%s-%s-%d", netdev->name, "TxRx", ri++);
^~
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1363:6: note: directive argument in the range [0, 2147483647]
"%s-%s-%d", netdev->name, "TxRx", ri++);
^~~~~~~~~~
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1362:4: note: ‘snprintf’ output between 8 and 32 bytes into a destination of size 24
snprintf(q_vector->name, sizeof(q_vector->name) - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s-%s-%d", netdev->name, "TxRx", ri++);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Resolve this warning by making a couple of changes.
- Don't reserve space for the null terminator. Since snprintf adds the
null terminator automatically, there is no need for us to reserve a byte
for it.
- Change a couple of variables that can never be negative from int to
unsigned int.
While we're making changes to the format string, move the constant strings
into the format string instead of providing them as specifiers.
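The reworked call then looks roughly like this:

snprintf(q_vector->name, sizeof(q_vector->name),
         "%s-TxRx-%u", netdev->name, ri++);  /* ri now unsigned int */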
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 31f5d9b1e890d52c807093fac7ee7f00eb369897) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 12 May 2017 18:38:07 +0000 (11:38 -0700)]
ixgbe: Resolve truncation warning for q_vector->name
The following warning is now shown as a result of new checks added for
gcc 7:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: In function ‘ixgbe_open’:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3118:13: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 3 and 18 [-Wformat-truncation=]
"%s-%s-%d", netdev->name, "TxRx", ri++);
^~
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3118:6: note: directive argument in the range [0, 2147483647]
"%s-%s-%d", netdev->name, "TxRx", ri++);
^~~~~~~~~~
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3117:4: note: ‘snprintf’ output between 8 and 32 bytes into a destination of size 24
snprintf(q_vector->name, sizeof(q_vector->name) - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s-%s-%d", netdev->name, "TxRx", ri++);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Resolve this warning by making a couple of changes.
- Don't reserve space for the null terminator. Since snprintf adds the
null terminator automatically, there is no need for us to reserve a byte
for it.
- Change a couple of variables that can never be negative from int to
unsigned int.
While we're making changes to the format string, move the constant strings
into the format string instead of providing them as specifiers.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit e61e4c8b905b995a5334acf5fb9c7bcaec7417da) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 28 Apr 2017 19:42:03 +0000 (12:42 -0700)]
ixgbe: Add error checking to setting VF MAC
Currently, when setting a VF MAC address there are no error checks to
ensure that the MAC filter was successfully added. This patch adds
additional error checks, reporting, and propagation of errors. It also
will not set the MAC address unless adding the MAC filter was successful.
With these changes, setting the MAC address to zeros can no longer call
ixgbe_set_vf_mac(), as adding a zero MAC address filter is not valid.
Instead, directly delete the filter and, if successful, clear the MAC
address.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 6af3d0faede8b8c2ccd93f31d9f146ffd0b463d6) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Mark Rustad [Tue, 25 Apr 2017 20:55:25 +0000 (13:55 -0700)]
ixgbe: Correct thermal sensor event check
The thermal sensor event logic is messed up, because it can execute
the code when there is no thermal event. The current logic is that
it will exit when !capable && !event whereas it really should exit
when !capable || !event. For one thing, it means that the service
task is doing too much work. It probably has some other symptoms as
well. So, correct the logic, simplifying it to execute only when there
is a thermal event; the capable check is redundant.
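In pseudo-form (capable and event stand in for the actual flag and interrupt-cause tests):

/* before: exits only when both are false, so a capable adapter
 * with no pending event still ran the handler */
if (!capable && !event)
        return;
/* after (sketch): the event test alone suffices */
if (!event)
        return;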
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 22cb4fff3d9756229f1e67987f4fabb57a8c68ca) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Paul Greenwalt [Fri, 21 Apr 2017 09:37:13 +0000 (05:37 -0400)]
ixgbe: Remove MAC X550EM_X 1Gbase-t led_[on|off] support
Since FW configures the PHY and MAC X550EM_X has no
PHY access, led_[on|off] is not supported with the 1Gbase-t design.
Removed MAC X550EM_X 1Gbase-t led_[on|off] support by setting the
function pointers to NULL and adding NULL pointer checks. Also set
init_led_link_act to NULL and added a NULL pointer check.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 5e999fb43ebb5a64554890cda57edc1edd68a2ab) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Thu, 13 Apr 2017 14:26:07 +0000 (07:26 -0700)]
ixgbevf: Check for RSS key before setting value
The RSS key is being repopulated every time the interface is brought up
regardless of whether there is an existing value. If the user sets the RSS
key and the interface is brought up (e.g. reset), the user-specified RSS
key will be overwritten.
This patch changes the rss_key to a pointer so we can check to see if the
key has been populated and preserve it accordingly.
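A sketch of the one-time initialization this enables (names follow the ixgbevf convention but are illustrative):

if (!adapter->rss_key) {
        u32 *key = kzalloc(IXGBEVF_RSS_HASH_KEY_SIZE, GFP_KERNEL);

        if (!key)
                return -ENOMEM;
        /* generate a key only when the user hasn't supplied one */
        netdev_rss_key_fill(key, IXGBEVF_RSS_HASH_KEY_SIZE);
        adapter->rss_key = key;
}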
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit e60ae00361bf4e5ef08cde5a30f131cf287ffe30) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Thu, 13 Apr 2017 14:26:06 +0000 (07:26 -0700)]
ixgbevf: Fix errors in retrieving RETA and RSS from PF
Mailbox support for getting the RETA and RSS key is available only for
82599 and x540; a previous patch reversed the logic, so these adapters
were returning not supported.
Also, the NACK check in ixgbevf_get_rss_key_locked() was checking for the
command IXGBE_VF_GET_RETA instead of IXGBE_VF_GET_RSS_KEY.
This patch corrects both issues by correcting the logic and checking for
the right command.
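The corrected NACK test, sketched:

/* check the reply against the command that was actually sent */
if (msgbuf[0] == (IXGBE_VF_GET_RSS_KEY | IXGBE_VT_MSGTYPE_NACK))
        return -EPERM;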
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 82fb670c5fdd5662c406871a6c21ebd55ba68e45) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Thu, 13 Apr 2017 14:26:05 +0000 (07:26 -0700)]
ixgbe: Check for RSS key before setting value
The RSS key is being repopulated every time the interface is brought up
regardless of whether there is an existing value. If the user sets the RSS
key and the interface is brought up (e.g. reset), the user-specified RSS
key will be overwritten.
This patch changes the rss_key to a pointer so we can check to see if the
key has been populated and preserve it accordingly.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 3dfbfc7ebb959d68b35d5ca3b7499cc73dc57261) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Wed, 12 Apr 2017 20:35:22 +0000 (13:35 -0700)]
ixgbe: Allow setting zero MAC address for VF
Currently, there is no logic that allows a VF's MAC address to be removed
from the RAR table.
Allow the user to specify a zero MAC address in order to clear the VF's
MAC address from the RAR table. This functionality is also utilized by
libvirt when removing VFs.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 27bdc44cdb2a8d96322d5978895eaae881fb8c2d) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Paul Greenwalt [Mon, 13 Mar 2017 09:47:56 +0000 (05:47 -0400)]
ixgbe: Acquire PHY semaphore before device reset
A recent firmware change fixed an issue to acquire the PHY semaphore before
accessing PHY registers. This led to a case where SW can issue a device
reset that clears the MDIO registers. This patch makes SW acquire the
PHY semaphore before issuing a device reset.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 6133406be1aabfb041f024109efc41756970800e) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Thu, 2 Mar 2017 23:01:36 +0000 (15:01 -0800)]
ixgbe: Fix output from ixgbe_dump
I just found that when we had changed the Rx path to check for length
instead of the DD bit we introduced an issue in ixgbe_dump since we were no
longer clearing the status bits.
To correct this I am updating ixgbe_dump to look for the length bits in the
descriptor since that is what we are using in the Rx path.
Fixes: c3630cc40b4f ("ixgbe: Use length to determine if descriptor is done") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 18a8cc9815746b8f0ae6f78733877d3846058d1c) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Wed, 1 Mar 2017 19:52:09 +0000 (11:52 -0800)]
ixgbe: add check for VETO bit when configuring link for KR
We did not have a check in place for MMNGC.MNG_VETO when setting up link
on X550EM_X KR devices which resulted in link loss for the BMC when
loading the driver.
This patch adds a check for ixgbe_check_reset_blocked() in setup_link()
since in that case there is no PHY reset function.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit f4a6374ba46132896154397ce3c559ccb0d15e60) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Don Skidmore [Thu, 2 Feb 2017 19:38:46 +0000 (14:38 -0500)]
ixgbe: Remove unused define
Remove the Marvell 1145 PHY define as we have never had a device that
supports it and have no plan to in the future. The existence of this
define has caused confusion about whether or not this PHY was supported
by ixgbe.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 7ee814d7a6c449e2e96f76f1acb2b7d47dab108c) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Emil Tantilov [Fri, 20 Jan 2017 22:11:56 +0000 (14:11 -0800)]
ixgbe: do not use adapter->num_vfs when setting VFs via module parameter
Avoid setting adapter->num_vfs early in the init code path when
using the max_vfs module parameter by passing it to ixgbe_enable_sriov()
as a function parameter.
This fixes an issue where, if we failed to allocate vfinfo in
__ixgbe_enable_sriov(), the driver would crash with a NULL pointer
dereference in ixgbe_disable_sriov() when attempting to free the vfinfo
struct based on adapter->num_vfs. It also cleans up the assignment of
adapter->num_vfs, which will now only be set in __ixgbe_enable_sriov()
and cleared in ixgbe_disable_sriov().
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 5c11f00ddac2c030827cdecf9c2d3678cbd3137b) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Emil Tantilov [Fri, 20 Jan 2017 22:11:50 +0000 (14:11 -0800)]
ixgbe: return early instead of wrap block in if statement
Since we exit at the end of the block, we can save a level of
indentation by performing an early return, and make the next several
sections of code more legible, with fewer 80 character line breaks.
Also moved the vfinfo allocation to the beginning and the notification
for enabling SRIOV to the end of the function, where we know that it
will succeed.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit da614d042ac236e5db52c56c7d7d8accd325dd40) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Don Skidmore [Sat, 31 Dec 2016 02:07:58 +0000 (21:07 -0500)]
ixgbe: Add X552 XFI backplane support
This patch adds support for the X552 XFI backplane interface. The XFI
backplane requires a custom tuned link. HW/FW owns the link config
for the XFI backplane and SW must not interfere with it.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 18e01ee75f4533cddd774b8618e20d26d7d0d958) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Joe Perches [Tue, 3 Jan 2017 15:28:11 +0000 (07:28 -0800)]
ixgbe: Remove pr_cont uses
As pr_cont output can be interleaved by other processes,
using pr_cont should be avoided where possible.
Miscellanea:
- Use a temporary pointer to hold the next descriptions and
consolidate the pr_cont uses
- Use the temporary buffer to hold the 8 u32 register values and
emit those in a single go
- Coalesce formats and logging neatening around those changes
- Fix a defective output for the rx ring entry description when
also emitting rx_buffer_info data
This reduces overall object size a tiny bit too.
$ size drivers/net/ethernet/intel/ixgbe/*.o*
text data bss dec hex filename
62167 728 12 62907 f5bb drivers/net/ethernet/intel/ixgbe/ixgbe_main.o.new
62273 728 12 63013 f625 drivers/net/ethernet/intel/ixgbe/ixgbe_main.o.old
Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 332f235836082fe7d3d890409ed6a20e0ea0d923) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Fri, 3 Feb 2017 17:19:40 +0000 (09:19 -0800)]
ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines
On architectures that have a cache line size larger than 64 Bytes we start
running into issues where the amount of headroom for the frame starts
shrinking.
The size of skb_shared_info on a system with a 64B L1 cache line size is
320. This increases to 384 with a 128B cache line, and 512 with a 256B
cache line.
In addition the NET_SKB_PAD value increases as well consistent with the
cache line size. As a result, when we get to a 256B cache line, as seen
on the s390, we end up with 768 bytes used by padding and shared info,
leaving only 1280 bytes for data storage. On architectures such as
this we should default to using 3K Rx buffers out of an 8K page instead
of trying to do 1.5K buffers out of a 4K page.
To take all of this into account I have added one small check so that we
compare the max_frame to the amount of actual data we can store. This was
already occurring for igb, but I had overlooked it for ixgbe as it doesn't
have strict limits for 82599 once we enable jumbo frames. By adding this
check we will automatically enable 3K Rx buffers as soon as the maximum
frame size we can handle drops below the standard Ethernet MTU.
I also went through and fixed one small typo that I found where I had left
an IGB in a variable name due to a copy/paste error.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit c74042f3b3ca982652af99cad85252a2655c6064) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Paolo Abeni [Thu, 15 Dec 2016 14:20:34 +0000 (15:20 +0100)]
ixgbe: update the rss key on h/w, when ethtool ask for it
Currently ixgbe_set_rxfh() updates the rss_key copy in the driver
memory, but does not push the new value into the h/w. This commit
adds a new helper for the latter operation and calls it in
ixgbe_set_rxfh(), so that the h/w RSS key value can actually be
updated via ethtool.
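A sketch of such a helper (the RSS key occupies ten 32-bit RSSRK registers):

static void ixgbe_store_key(struct ixgbe_adapter *adapter)
{
        struct ixgbe_hw *hw = &adapter->hw;
        int i;

        for (i = 0; i < 10; i++)
                IXGBE_WRITE_REG(hw, IXGBE_RSSRK(i), adapter->rss_key[i]);
}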
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit d3aa9c9f212a729e46653d4c1eb6a9ab190efe3a) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:37:29 +0000 (08:37 -0800)]
ixgbe: Don't bother clearing buffer memory for descriptor rings
This patch makes it so that we don't need to bother with clearing the
memory out for the descriptor rings. The general idea is to only free
the buffers that are in use, which are located between the
next_to_clean and next_to_use or next_to_alloc values. Everything
outside of those regions can be safely ignored since it should have no
buffers associated with it.
The advantage of doing things this way is that it should speed up bring-up
and tear-down of the rings. Specifically we can avoid the 512 or more
cycles required to memset the rings in tear-down. In the bring-up phase we
then clear the memory as a part of initialization. The general idea is
that the clearing in initialization can act as a prefetch of sorts for the
buffer info structures so they are in the local CPU when we go to populate
them. This should help to improve overall time needed to perform a
suspend/resume.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit ffed21bcee7a544f99a9c9b18c23b361a0b1e476) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:37:03 +0000 (08:37 -0800)]
ixgbe: Add private flag to control buffer mode
Since there are potential drawbacks to the new Rx allocation approach,
I thought it best to add a "chicken bit" so that we can turn the
feature off in the event that a problem is found.
It also provides a means of validating the legacy Rx path in the event that
we are forced to fall back. At some point in the future when we are
convinced we don't need it anymore we might be able to drop the legacy-rx
flag.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 2ccdf26ff614dd49b14e76c0c076f5f4e9562e79) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:36:54 +0000 (08:36 -0800)]
ixgbe: Add support for padding packet
This patch adds support for providing a buffer with headroom and tailroom
to allow for shared info, NET_SKB_PAD, and NET_IP_ALIGN. With this
combined with the DMA changes we can start using build_skb to build frames
around an incoming Rx buffer instead of having to memcpy the headers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 2de6aa3a666e63699978f81d0d5523e7e0778f7b) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:36:28 +0000 (08:36 -0800)]
ixgbe: Use length to determine if descriptor is done
This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads, as we don't really
even need the DD bit if the size going from 0 to a non-zero value is
enough to inform us that the packet has been completed.
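The descriptor-done test then becomes, roughly:

unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);

if (!size)
        break;          /* DMA has not written this descriptor back */

/* order descriptor field reads after the size check */
dma_rmb();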
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit c3630cc40b4f0fe004e21f19bfb5cd2231c105f8) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:36:14 +0000 (08:36 -0800)]
ixgbe: Make use of order 1 pages and 3K buffers independent of FCoE
In order to support build_skb with jumbo frames it will be necessary to use
3K buffers for the Rx path with 8K pages backing them. This is needed on
architectures that implement 4K pages because we can't support 2K buffers
plus padding in a 4K page.
In the case of systems that support page sizes larger than 4K the 3K
attribute will only be applied to FCoE as we can fall back to using just 2K
buffers and adding the padding.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 4f4542bfb3b539bef118578ffafcc98e4ce91979) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:35:44 +0000 (08:35 -0800)]
ixgbe: Only DMA sync frame length
On some platforms, syncing a buffer for DMA is expensive. Rather than
sync the whole 2K receive buffer, only synchronise the length of the
frame, which will typically be the MTU, or a much smaller TCP ACK.
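The sync then covers just the received bytes, as a sketch:

dma_sync_single_range_for_cpu(rx_ring->dev, rx_buffer->dma,
                              rx_buffer->page_offset,
                              size, DMA_FROM_DEVICE);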
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit f215af8cae4c283d8a522ea166d94f763dc4aebf) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 11 Nov 2016 00:01:33 +0000 (16:01 -0800)]
ixgbe: Support 2.5Gb and 5Gb speed
Though these speeds are not advertised through ethtool, if the link partner
advertises a 2.5Gb or 5Gb connection and the adapter supports it, allow the
speed to be used.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 1dc0eb75a8f88e37c2aee75fca0313cd6e30a1e1) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
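A hedged sketch of accepting the extra speeds in the link-up path (the
IXGBE_LINK_SPEED_* masks follow the driver's naming convention; the
reporting logic is illustrative):

const char *speed_str;

switch (link_speed) {
case IXGBE_LINK_SPEED_10GB_FULL:
	speed_str = "10 Gbps";
	break;
case IXGBE_LINK_SPEED_5GB_FULL:
	speed_str = "5 Gbps";		/* newly allowed */
	break;
case IXGBE_LINK_SPEED_2_5GB_FULL:
	speed_str = "2.5 Gbps";		/* newly allowed */
	break;
case IXGBE_LINK_SPEED_1GB_FULL:
	speed_str = "1 Gbps";
	break;
default:
	speed_str = "unknown";
	break;
}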
Eric Dumazet [Fri, 3 Feb 2017 00:59:18 +0000 (16:59 -0800)]
ixgbevf: get rid of custom busy polling code
In linux-4.5, busy polling was implemented in the core NAPI stack, meaning
that all custom implementations can be removed from drivers.
Not only do we remove a lot of code, we also remove one lock operation in
the fast path and allow GRO to do its job.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26242766
(cherry picked from commit 508aac6dee025f93eab1e806d20762ea6327b43d) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Eric Dumazet [Fri, 3 Feb 2017 00:26:39 +0000 (16:26 -0800)]
ixgbe: get rid of custom busy polling code
In linux-4.5, busy polling was implemented in the core NAPI stack, meaning
that all custom implementations can be removed from drivers.
Not only do we remove a lot of code, we also remove one lock operation in
the fast path and allow GRO to do its job.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26242766
(cherry picked from commit 3ffc1af576550ec61d35668485954e49da29d168) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Don Skidmore [Fri, 16 Dec 2016 02:18:32 +0000 (21:18 -0500)]
ixgbe: Add PF support for VF promiscuous mode
This patch extends the xcast mailbox message to include support for
unicast promiscuous mode. To allow a VF to enter this mode, the PF must be
in promiscuous mode.
A later patch will add the support needed in the VF driver (ixgbevf).
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 07eea570acccbc0f9402357d652868571fdbb2b9) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
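A hedged sketch of the extended mailbox mode set and the PF-side gate
described above (the first three mode names match the existing mailbox
definitions; the gate is a paraphrase, not the exact upstream code):

enum ixgbevf_xcast_modes {
	IXGBEVF_XCAST_MODE_NONE = 0,
	IXGBEVF_XCAST_MODE_MULTI,
	IXGBEVF_XCAST_MODE_ALLMULTI,
	IXGBEVF_XCAST_MODE_PROMISC,	/* new: unicast promiscuous */
};

/* PF side: only let a VF enter the new mode if the PF itself is
 * in promiscuous mode.
 */
if (xcast_mode == IXGBEVF_XCAST_MODE_PROMISC &&
    !(adapter->netdev->flags & IFF_PROMISC))
	return -EPERM;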
Don Skidmore [Fri, 16 Dec 2016 02:18:31 +0000 (21:18 -0500)]
ixgbevf: Add support for VF promiscuous mode
This patch extends the mailbox message to allow for VF promiscuous
mode support.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 41e544cdad0bd669600825d8de73c8f420640bf9)
Chunk in ixgbevf_main.c deleted due to dependencies Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Mark Rustad [Wed, 14 Dec 2016 19:02:00 +0000 (11:02 -0800)]
ixgbe: Fix issues with EEPROM access
There are two problems with EEPROM access. One is that the driver needs to
hold the semaphore until the entire response is read, or else the response
can be corrupted by other firmware accesses. The second is that acquiring
and releasing the semaphore is slow, so when multiple EEPROM accesses are
needed the semaphore should be taken once at the start and released once at
the end.
Both of these issues can be solved by adding a new function,
ixgbe_hic_unlocked, to issue firmware commands that will assume
that the caller has acquired the needed semaphore.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 3efa9ed260ce838976eb9177bae7249caf7a2aa1) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
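A hedged sketch of the take-once/release-once pattern this enables
(ixgbe_hic_unlocked() and the semaphore are named in the log; the loop body
and error handling are illustrative):

/* Caller takes the firmware semaphore once for the whole transaction. */
if (hw->mac.ops.acquire_swfw_sync(hw, IXGBE_GSSR_SW_MNG_SM))
	return IXGBE_ERR_SWFW_SYNC;

for (i = 0; i < words; i++)
	/* issue commands without re-acquiring the semaphore each time */
	status = ixgbe_hic_unlocked(hw, (u32 *)&cmd[i], sizeof(cmd[i]),
				    IXGBE_HI_COMMAND_TIMEOUT);

hw->mac.ops.release_swfw_sync(hw, IXGBE_GSSR_SW_MNG_SM);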
Don Skidmore [Wed, 14 Dec 2016 01:34:51 +0000 (20:34 -0500)]
ixgbe: Configure advertised speeds correctly for KR/KX backplane
This patch ensures that the advertised link speeds are configured
for X553 KR/KX backplane. Without this patch the link remains at
1G when resuming from low power after being downshifted by LPLU.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 54f6d4c42451dbd2cc7e0f0bd8fc3eddcab511fe) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Yusuke Suzuki [Mon, 21 Nov 2016 06:48:45 +0000 (06:48 +0000)]
ixgbe: Fix incorrect bitwise operations of PTP Rx timestamp flags
Rx timestamping does not work on 82599 and X540 because the bitwise
operation on the RX_HWTSTAMP flags is incorrect, so ixgbe_ptp_rx_hwtstamp()
is never called. This patch fixes the operation to enable Rx timestamping
on 82599 and X540.
Without this fix:
ptp4l[278.730]: selected /dev/ptp8 as PTP clock
ptp4l[278.733]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[278.733]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[278.834]: port 1: received SYNC without timestamp
ptp4l[278.835]: port 1: new foreign master 1c3947.fffe.60f9cc-1
ptp4l[279.834]: port 1: received SYNC without timestamp
ptp4l[280.834]: port 1: received SYNC without timestamp
ptp4l[281.834]: port 1: received SYNC without timestamp
ptp4l[282.834]: port 1: received SYNC without timestamp
ptp4l[282.835]: selected best master clock 1c3947.fffe.60f9cc
ptp4l[282.835]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[283.834]: port 1: received SYNC without timestamp
With this fix:
ptp4l[239.154]: selected /dev/ptp8 as PTP clock
ptp4l[239.157]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[239.157]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[240.989]: port 1: new foreign master 1c3947.fffe.60f9cc-1
ptp4l[244.989]: selected best master clock 1c3947.fffe.60f9cc
ptp4l[244.989]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[246.977]: master offset -899583339542096 s0 freq +0 path delay 16222
ptp4l[247.977]: master offset -899583339617265 s1 freq -75169 path delay 16177
ptp4l[248.977]: master offset -130 s2 freq -75299 path delay 16177
ptp4l[248.977]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[249.977]: master offset -9 s2 freq -75217 path delay 16177
ptp4l[250.977]: master offset 88 s2 freq -75123 path delay 16132
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices") Signed-off-by: Yusuke Suzuki <yus-suzuki@uf.jp.nec.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit aeb4c73100be8aade8a1189b50bd226b709ca8bb) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
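The log does not quote the exact broken expression, so as a generic
illustration only: the classic mistake with combined flag masks is testing
the whole mask when every bit must be present (flag names here are
hypothetical):

#define RX_HWTSTAMP_A	BIT(0)	/* hypothetical flag bits */
#define RX_HWTSTAMP_B	BIT(1)

/* True whenever either bit is set; wrong if both are required. */
if (flags & (RX_HWTSTAMP_A | RX_HWTSTAMP_B))
	timestamp_packet();

/* Correct test when both bits must be set. */
if ((flags & (RX_HWTSTAMP_A | RX_HWTSTAMP_B)) ==
    (RX_HWTSTAMP_A | RX_HWTSTAMP_B))
	timestamp_packet();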
Emil Tantilov [Wed, 16 Nov 2016 19:25:34 +0000 (11:25 -0800)]
ixgbevf: fix AER error handling
Make sure that we free the IRQs in ixgbevf_io_error_detected() when
responding to a PCIe AER error, and also restore them when the interface
recovers from it.
Previously it was possible to trigger the BUG_ON() check in free_msix_irqs()
when ixgbevf_remove() is called after a failed recovery from an AER error,
because the interrupts were not freed.
Also moved the down and free functions into ixgbevf_close_suspend(), the
same as with ixgbe.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit b19cf6eea9e2a497e6475fd02c0703f0b3a6d083) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
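A hedged sketch of the error_detected shape (the pci_ers_result_t callback
types are the real PCI AER API; ixgbevf_close_suspend() is named in the
log, the rest is illustrative):

static pci_ers_result_t ixgbevf_io_error_detected(struct pci_dev *pdev,
						  pci_channel_state_t state)
{
	struct net_device *netdev = pci_get_drvdata(pdev);

	rtnl_lock();
	netif_device_detach(netdev);
	if (netif_running(netdev))
		/* down the interface and free the IRQs here, so a later
		 * remove after failed recovery cannot double-free them */
		ixgbevf_close_suspend(netdev_priv(netdev));
	rtnl_unlock();

	if (state == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;

	pci_disable_device(pdev);
	return PCI_ERS_RESULT_NEED_RESET;
}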
Emil Tantilov [Wed, 16 Nov 2016 17:48:02 +0000 (09:48 -0800)]
ixgbe: fix AER error handling
Make sure that we free the IRQs in ixgbe_io_error_detected() when
responding to a PCIe AER error, and also restore them when the interface
recovers from it.
Previously it was possible to trigger the BUG_ON() check in free_msix_irqs()
when ixgbe_remove() is called after a failed recovery from an AER error,
because the interrupts were not freed.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 126db13fa0e6d05c9f94e0125f61e773bd5ab079) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Ken Cox [Tue, 15 Nov 2016 19:00:37 +0000 (13:00 -0600)]
ixgbe: test for trust in macvlan adjustments for VF
There are two methods for setting MAC addresses on a macvlan, which diverge
in the function macvlan_set_mac_address. If the macvlan mode is passthru,
we use the dev_set_mac_address method; otherwise we use the dev_uc API via
macvlan_sync_addresses.
The latter method (used by any non-passthru mode, such as bridge or vepa)
calls down into the driver in a path that terminates in
ixgbevf_set_uc_addr_vf, which sends an IXGBE_VF_SET_MACVLAN message; this
causes the PF to emit the noted error message, because it appears that the
guest is trying to delete the MAC address of the macvlan before adding
another.
The other path in macvlan_set_mac_address uses dev_set_mac_address, which
calls into ixgbevf_set_mac; this sends IXGBE_VF_SET_MAC_ADDR to the PF to
set the macvlan MAC address.
The discrepancy is in the handlers. The handler for IXGBE_VF_SET_MAC_ADDR
(ixgbe_set_vf_mac_addr) checks the vfinfo[].trusted bit and allows the
operation if the VF is trusted. The IXGBE_VF_SET_MACVLAN handler
(ixgbe_set_vf_macvlan_msg) has no such check of the trusted bit.
Signed-off-by: Ken Cox <jkc@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit a9d2d53a788a9c5bc8a7d1b4ea7857b68e221357) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
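A hedged sketch of the missing gate (vfinfo[].trusted is named in the log;
the surrounding handler shape and the warning message are illustrative):

/* In ixgbe_set_vf_macvlan_msg(): mirror the trusted-VF check that
 * ixgbe_set_vf_mac_addr() already performs.
 */
if (index > 0 && !adapter->vfinfo[vf].trusted) {
	e_warn(drv, "VF %d attempted untrusted MACVLAN filter change\n", vf);
	return -1;
}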
Emil Tantilov [Fri, 11 Nov 2016 18:12:51 +0000 (10:12 -0800)]
ixgbevf: handle race between close and suspend on shutdown
When an interface is part of a namespace it is possible that
ixgbevf_close() may be called while ixgbevf_suspend() is running, which
ends up in a double-free WARN and/or a BUG in free_msi_irqs().
To handle this situation we extend the rtnl_lock() to protect the call to
netif_device_detach(), and check for !netif_device_present() to avoid
entering close while in suspend.
Also added rtnl locks to ixgbevf_queue_reset_subtask().
CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 2dad7b2775ea030c898fe4946971edd25af237d1) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
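A hedged sketch of the extended rtnl coverage (rtnl_lock(),
netif_device_detach(), and netif_device_present() are the real kernel
APIs; the suspend body is condensed):

static int ixgbevf_suspend(struct pci_dev *pdev, pm_message_t state)
{
	struct net_device *netdev = pci_get_drvdata(pdev);

	rtnl_lock();			/* now also covers the detach */
	netif_device_detach(netdev);
	if (netif_running(netdev))
		ixgbevf_close_suspend(netdev_priv(netdev));
	rtnl_unlock();

	return 0;
}

/* ixgbevf_close() then bails out early when the device is already
 * detached: if (!netif_device_present(netdev)) return 0;
 */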
Emil Tantilov [Fri, 11 Nov 2016 18:07:47 +0000 (10:07 -0800)]
ixgbe: handle close/suspend race with netif_device_detach/present
When an interface is part of a namespace it is possible that ixgbe_close()
may be called while __ixgbe_shutdown() is running, which ends up in a
double-free WARN and/or a BUG in free_msi_irqs().
To handle this situation we extend the rtnl_lock() to protect the calls to
netif_device_detach() and ixgbe_clear_interrupt_scheme() in
__ixgbe_shutdown(), and check for netif_device_present() to avoid clearing
the interrupts a second time in ixgbe_close().
Also extend the rtnl lock in ixgbe_resume() to cover netif_device_attach().
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit f7f37e7ff2b9b7eff7fbd035569cab35896869a3) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 11 Nov 2016 00:00:33 +0000 (16:00 -0800)]
ixgbe: Fix reporting of 100Mb capability
BaseT adapters that are capable of supporting 100Mb are not reporting this
capability. This patch corrects the reporting so that 100Mb is shown as
supported on those adapters.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit f215266470dfe86196a31fe0725a86cea77f9a18) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Thu, 10 Nov 2016 17:57:29 +0000 (09:57 -0800)]
ixgbe: Reduce I2C retry count on X550 devices
A retry count of 10 is likely to run into problems on X550 devices that
have to detect and reset unresponsive CS4227 devices, so reduce the I2C
retry count to 3 for X550 and above. Restricting the change to X550 and
newer should avoid any possible regressions in existing devices.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 3f0d646b720d541309b11e190db58086f446f41e) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Wed, 9 Nov 2016 18:48:48 +0000 (10:48 -0800)]
ixgbe: Add bounds check for x540 LED functions
This is an extension of commit 003287e0f087 ("ixgbevf: Correct parameter
sent to LED function"); add bounds checking to x540 functions to ensure the
index is valid.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 910c9c0f59567ec204924d88ca04337bb04f17d9) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
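A hedged sketch of the bounds check (IXGBE_LEDCTL and the LED mode macros
are the driver's real register helpers; the four-LED limit is assumed):

s32 ixgbe_led_on_X540(struct ixgbe_hw *hw, u32 index)
{
	u32 ledctl;

	if (index > 3)			/* assumed: four LED indices */
		return IXGBE_ERR_PARAM;

	ledctl = IXGBE_READ_REG(hw, IXGBE_LEDCTL);
	ledctl &= ~IXGBE_LED_MODE_MASK(index);
	ledctl |= IXGBE_LED_ON << IXGBE_LED_MODE_SHIFT(index);
	IXGBE_WRITE_REG(hw, IXGBE_LEDCTL, ledctl);
	return 0;
}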
Tony Nguyen [Mon, 31 Oct 2016 19:11:58 +0000 (12:11 -0700)]
ixgbe: Fix check for ixgbe_phy_x550em_ext_t reset
The generic PHY reset check we had previously is not sufficient for the
ixgbe_phy_x550em_ext_t PHY type. Check register 1.CC02.0 instead, the same
check used in ixgbe_init_ext_t_x550().
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 5c092749e304d8b49567be633d5be31393538e3b) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Wed, 26 Oct 2016 23:25:18 +0000 (16:25 -0700)]
ixgbe: Report driver version to firmware for x550 devices
Some x550 devices require the driver version to be reported to their
firmware; this patch sends the driver version string to the firmware
through the host interface command on x550 devices.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit cb8e051446ae554aae38163d3421edc793221784) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Vijay Kumar [Wed, 1 Feb 2017 19:34:40 +0000 (11:34 -0800)]
Documentation/sparc: Steps for sending break on sunhv console
Documented the steps for sending a break on the sunhv console.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4cfe140618b99e653134598de9f18b48743549ec)
Vijay Kumar [Wed, 1 Feb 2017 19:34:39 +0000 (11:34 -0800)]
sparc64: Send break twice from console to return to boot prom
We can now also jump to the boot prom from the sunhv console by sending a
break twice on the console, for both running and panicked kernels.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7db60d05e5ccc0a473fa2275f90f2fca0002ab21)
Vijay Kumar [Wed, 1 Feb 2017 19:34:38 +0000 (11:34 -0800)]
sparc64: Migrate hvcons irq to panicked cpu
On panic, all other CPUs are stopped except the one that hit the panic. To
keep the console alive, we need to migrate the hvcons irq to the panicked
CPU.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7dd4fcf5b70694dc961eb6b954673e4fc9730dbd)
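A hedged sketch of the migration (irq_force_affinity() and cpumask_of()
are the real kernel APIs; the irq variable name is assumed):

/* Called on the panicked CPU before the others are stopped, so the
 * hypervisor console interrupt keeps landing on a live CPU.
 */
irq_force_affinity(hvcons_irq, cpumask_of(smp_processor_id()));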
Vijay Kumar [Wed, 1 Feb 2017 19:34:37 +0000 (11:34 -0800)]
sparc64: Set cpu state to offline when stopped
A CPU needs to be marked offline before it is stopped. When it is not
marked offline, the xcall receives HV_EWOULDBLOCK, assumes that not all
CPUs received the message, and retries. After 10000 retries, it finally
fails with a fatal mondo timeout.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cffb3e76818fee4763a2ce5f2b1eca2d7885e2cf)
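A hedged sketch of the ordering (set_cpu_online() is the real kernel API;
the stop call is a placeholder for the SPARC-specific mechanism):

/* Mark the CPU offline first so cross-call senders skip it ... */
set_cpu_online(cpu, false);
/* ... then actually stop it; no more mondo retries against it. */
stop_this_cpu(cpu);	/* placeholder name for the arch stop path */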
Since tcmu feature support for the iSCSI target is added by the bug fixes
below, build the tcmu kernel module by modifying the default kernel
configuration files.
The fixes which add the tcmu feature to the iSCSI target:
Orabug: 25983319/25983379/25791789 Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Shan Hai <shan.hai@oracle.com>
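A hedged sketch of the kind of defconfig change this implies
(CONFIG_TCM_USER2 is the upstream Kconfig symbol for the tcmu backstore;
it depends on the target core and UIO):

CONFIG_TARGET_CORE=m
CONFIG_UIO=m
CONFIG_TCM_USER2=m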
Over-eager editing for conflict resolution in merge commit c311ca8a3d9349dfc606cada0d8ca14e58728c9c ("Merge tag 'v4.1.12-70#22913653a12'
into pmem-4.1-merge") resulted in the accidental deletion of two calls to
mutex_lock(). Since the matching calls to mutex_unlock() were correctly left
in the source, this produced a lock imbalance, detectable via lockdep.
This commit restores the missing mutex_lock() calls.
md/raid5: don't index beyond end of array in need_this_block().
While need_this_block() probably shouldn't be called when there are more
than 2 failed devices, we really don't want it to try indexing beyond the
end of the failed_num[] or fdev[] arrays.
So limit the loops to at most 2 iterations.
Reported-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.de>
(cherry picked from commit 36707bb2e7c6730d79d6cdc6d1475d3d7e94c518)
Orabug: 26047272 Signed-off-by: Fred Herard <fred.herard@oracle.com> Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Nicolas Droux <nicolas.droux@oracle.com> Acked-by: Saar Maoz <Saar.Maoz@oracle.com> Acked-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
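A hedged sketch of the raid5 loop cap described above (s->failed,
failed_num[], and sh->dev are the raid5 names cited in the log; the loop
body is elided):

/* Never walk more than two failed devices: failed_num[] has two slots. */
for (i = 0; i < s->failed && i < 2; i++) {
	struct r5dev *fdev = &sh->dev[s->failed_num[i]];
	/* ... examine fdev as before ... */
}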