Allen Pais [Wed, 14 Jun 2017 07:37:36 +0000 (13:07 +0530)]
arch/sparc: Enable queued spinlock support for SPARC
This patch makes the necessary changes in the SPARC architecture to enable
queued spinlock support. Here are some of the earlier discussions about
this feature.
https://lwn.net/Articles/561775/
https://lwn.net/Articles/590243/
Cleaned up spinlock_64.h. The arch_spin_xxx definitions are
replaced by the functions in <asm-generic/qspinlock.h>.
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 145d978585977438ebb55079487827006c604e39)
Conflicts:
arch/sparc/include/asm/spinlock_64.h
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Tue, 30 May 2017 20:59:02 +0000 (13:59 -0700)]
arch/sparc: Introduce xchg16 for SPARC
SPARC currently supports 32 bit and 64 bit xchg. Add support
for 16 bit (2 byte) xchg, which is required by the queued spinlock
feature. This is achieved using the 4 byte cas instruction with
byte manipulation.
Also rearranged the code to call __cmpxchg_u32 inside xchg16.
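A minimal sketch of the masking approach (illustrative, not the exact kernel code; it assumes the existing __cmpxchg_u32() 32-bit primitive and SPARC's big-endian byte order):

static inline u16 xchg16(volatile u16 *p, u16 val)
{
        /* word-align the address and pick the halfword's bit lane;
         * on big endian the lower-addressed halfword is the high bits */
        volatile u32 *m = (volatile u32 *)((unsigned long)p & ~3UL);
        int shift = ((unsigned long)p & 2) ? 0 : 16;
        u32 mask = 0xffffU << shift;
        u32 old32, new32;

        do {
                old32 = *m;
                new32 = (old32 & ~mask) | ((u32)val << shift);
        } while (__cmpxchg_u32(m, old32, new32) != old32);

        return (old32 & mask) >> shift;
}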
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 79d39e2bab60d18a68a5abc00be4506864397efc)
Conflicts:
arch/sparc/include/asm/cmpxchg_64.h
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Allen Pais [Wed, 14 Jun 2017 07:30:39 +0000 (13:00 +0530)]
arch/sparc: Enable queued rwlocks for SPARC
Enable queued rwlocks for SPARC. Here are the discussions on this feature
from when it was introduced.
https://lwn.net/Articles/572765/
https://lwn.net/Articles/582200/
Cleaned up the arch_read_xxx and arch_write_xxx definitions in spinlock_64.h.
These routines are replaced by the functions in include/asm-generic/qrwlock.h.
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a37594f198363fd9321ece54440336fd4b2a9c8e)
Babu Moger [Wed, 24 May 2017 23:55:12 +0000 (17:55 -0600)]
arch/sparc: Introduce cmpxchg_u8 SPARC
SPARC currently supports 32 bit and 64 bit cmpxchg. Add support
for 8 bit (1 byte) cmpxchg, which is required by the queued
rwlocks feature.
The new __cmpxchg_u8 uses the 4 byte cas instruction with byte
manipulation to achieve a 1 byte cmpxchg.
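A similar sketch for the byte case (illustrative; again built on a 32-bit cas primitive):

static inline u8 cmpxchg_u8(volatile u8 *p, u8 old, u8 new)
{
        volatile u32 *m = (volatile u32 *)((unsigned long)p & ~3UL);
        int shift = (3 - ((unsigned long)p & 3)) * 8;  /* big-endian lane */
        u32 mask = 0xffU << shift;
        u32 old32, new32;

        do {
                old32 = *m;
                if (((old32 & mask) >> shift) != old)
                        return (old32 & mask) >> shift;  /* compare failed */
                new32 = (old32 & ~mask) | ((u32)new << shift);
        } while (__cmpxchg_u32(m, old32, new32) != old32);

        return old;
}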
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a12ee2349312d7112b9b7c6ac2e70c5ec2ca334e)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Found this problem while enabling queued rwlocks on SPARC. The
CONFIG_CPU_BIG_ENDIAN parameter is used to select which byte in the
qrwlock structure to clear; without it, we clear the wrong byte.
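The fix itself is a one-line Kconfig addition (sketched here; exact placement in arch/sparc/Kconfig assumed):

config CPU_BIG_ENDIAN
        def_bool y if SPARC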
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 97d9f969161d79e6a4bba247e67ce731ff861f79)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Wed, 24 May 2017 23:55:10 +0000 (17:55 -0600)]
kernel/locking: Fix compile error with qrwlock.c
Saw these compile errors on SPARC when the queued rwlock feature is enabled.
CC kernel/locking/qrwlock.o
kernel/locking/qrwlock.c: In function 'queued_read_lock_slowpath':
kernel/locking/qrwlock.c:89: error: implicit declaration of function 'arch_spin_lock'
kernel/locking/qrwlock.c:102: error: implicit declaration of function 'arch_spin_unlock'
make[4]: *** [kernel/locking/qrwlock.o] Error 1
Include spinlock.h in qrwlock.c to fix it.
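The fix is the one-line include in kernel/locking/qrwlock.c:

#include <linux/spinlock.h>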
Signed-off-by: Babu Moger <babu.moger@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9ab6055f959032258c0f83a070cd0d26ed7a8fc5)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Wed, 24 May 2017 23:55:09 +0000 (17:55 -0600)]
arch/sparc: Remove the check #ifndef __LINUX_SPINLOCK_TYPES_H
Saw these compile errors on SPARC when the queued rwlock feature is enabled.
CC kernel/locking/qrwlock.o
In file included from ./include/asm-generic/qrwlock_types.h:5,
from ./arch/sparc/include/asm/qrwlock.h:4,
from kernel/locking/qrwlock.c:24:
./arch/sparc/include/asm/spinlock_types.h:5:3: error:
#error "please don't include this file directly"
SPARC has a guard that causes a compile error when spinlock_types.h
is included directly:
#ifndef __LINUX_SPINLOCK_TYPES_H
# error "please don't include this file directly"
#endif
Remove this unnecessary "#ifndef __LINUX_SPINLOCK_TYPES_H" stanza from SPARC.
Signed-off-by: Babu Moger <babu.moger@oracle.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8b93b4a9e1be78930eb9d640f75818993f70e065)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
pan xinhui [Mon, 18 Jul 2016 09:47:39 +0000 (17:47 +0800)]
locking/qrwlock: Fix write unlock bug on big endian systems
This patch aims to get rid of endianness assumptions in
queued_write_unlock(). We want to set __qrwlock->wmode to NULL, but on a
big endian machine that address is not &lock->cnts, which causes
queued_write_unlock() to write NULL to the wrong field of __qrwlock.
So implement __qrwlock_write_byte(), which returns the correct
__qrwlock->wmode address.
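A sketch of such a helper (close to the upstream version, shown here for illustration):

static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
{
        /* wmode is byte 0 of the lock word on little endian
         * and byte 3 on big endian */
        return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}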
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman.Long@hpe.com Cc: arnd@arndb.de Cc: boqun.feng@gmail.com Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1468835259-4486-1-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2db34e8bf9a22f4e38b29deccee57457bc0e7d74)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Tue, 10 Nov 2015 00:09:23 +0000 (19:09 -0500)]
locking/qspinlock: Avoid redundant read of next pointer
With optimistic prefetch of the next node cacheline, the next pointer
may have been properly initialized. As a result, the reading
of node->next in the contended path may be redundant. This patch
eliminates the redundant read if the next pointer value is not NULL.
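The shape of the change in the unlock path, as a sketch (next holds the value loaded earlier by the prefetch code):

/* reuse the next pointer fetched for the earlier prefetch; only
 * spin re-reading node->next if it was still NULL at that point */
if (!next) {
        while (!(next = READ_ONCE(node->next)))
                cpu_relax();
}
arch_mcs_spin_unlock_contended(&next->locked);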
Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447114167-47185-4-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit aa68744f80bfb6f26fbe7f10e42876066f7dac1b)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Tue, 10 Nov 2015 00:09:22 +0000 (19:09 -0500)]
locking/qspinlock: Prefetch the next node cacheline
A queue head CPU, after acquiring the lock, will have to notify
the next CPU in the wait queue that it has become the new queue
head. This involves loading a new cacheline from the MCS node of the
next CPU. That operation can be expensive and adds to the latency of
the locking operation.
This patch adds code to optimistically prefetch the next MCS node
cacheline if the next pointer is set and the CPU has been spinning
on the MCS lock for a while. This reduces the locking latency and
improves the system throughput.
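A sketch of the added step, placed after the queue head has acquired the MCS lock:

/*
 * While waiting for the MCS lock, the next pointer may have been
 * set by a later waiter. Optimistically load it and prefetch the
 * cacheline for writing to cut the latency of the MCS unlock that
 * follows lock acquisition.
 */
next = READ_ONCE(node->next);
if (next)
        prefetchw(next);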
The performance change will depend on whether the prefetch overhead
can be hidden within the latency of the lock spin loop. For a really
short critical section, there may be no performance gain at all. With
a longer critical section, however, it was found to give a performance
boost of 5-10% over a range of different queue depths with a spinlock
loop microbenchmark.
Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447114167-47185-3-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 81b5598665a24083dd889fbd8cb08b0d8de4b8ad)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Thu, 9 Jul 2015 16:32:22 +0000 (12:32 -0400)]
locking/qrwlock: Reduce reader/writer to reader lock transfer latency
Currently, a reader will first check that the writer mode byte is
cleared before incrementing the reader count. That waiting is not
really necessary. It increases the latency of the reader/writer to
reader transition and reduces reader performance.
This patch eliminates that waiting. It also has the side effect
of reducing the chance of writer lock stealing and improving the
fairness of the lock. Using a locking microbenchmark, a 10-thread 5M
locking loop of mostly readers (RW ratio = 10,000:1) has the following
performance numbers on a Haswell-EX box:
Waiman Long [Fri, 19 Jun 2015 15:50:01 +0000 (11:50 -0400)]
locking/qrwlock: Better optimization for interrupt context readers
The qrwlock is fair in process context, but becomes unfair in
interrupt context to support use cases like the tasklist_lock.
The current code isn't well documented about what happens in
interrupt context. rspin_until_writer_unlock() will only spin if the
writer has taken the lock. If the writer is still waiting, the
increment of the reader count will cause the writer to remain
waiting, and the new interrupt-context reader will get the lock and
return immediately. The current code, however, does an additional
read of the lock value that is not necessary, as the information is
already available in the fast path. This may sometimes cause an extra
cacheline transfer when the lock is highly contended.
This patch passes the lock value obtained in the fast path to the
slow path to eliminate the additional read. It also documents the
behavior for interrupt-context readers more clearly.
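A sketch of the reworked entry point (the second parameter carries the counter value already read in the fast path):

void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
{
        /*
         * Readers in interrupt context take the lock as soon as the
         * writer, if any, is done, reusing the fast-path value of the
         * counters instead of re-reading lock->cnts.
         */
        if (unlikely(in_interrupt())) {
                rspin_until_writer_unlock(lock, cnts);
                return;
        }
        /* ... normal queued slowpath for process context ... */
}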
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434729002-57724-3-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 0e06e5be70d392aa842c1455ec2d0baf62aeed48)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Tue, 9 Jun 2015 15:19:13 +0000 (11:19 -0400)]
locking/qrwlock: Don't contend with readers when setting _QW_WAITING
The current cmpxchg() loop that sets the _QW_WAITING flag for writers
in queue_write_lock_slowpath() will contend with incoming readers,
possibly causing extra, wasteful cmpxchg() operations. This patch
changes the code to do a byte cmpxchg() to eliminate contention with
new readers.
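A sketch of the byte-wide retry loop (field layout per the __qrwlock view of the lock word):

struct __qrwlock *l = (struct __qrwlock *)lock;

for (;;) {
        /* readers only touch the reader count, so this cmpxchg can
         * no longer fail because of an arriving reader */
        if (!READ_ONCE(l->wmode) &&
            (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
                break;
        cpu_relax();
}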
A multithreaded microbenchmark running a 5M read_lock/write_lock loop
on an 8-socket, 80-core Westmere-EX machine running a 4.0-based kernel
with the qspinlock patch has the following execution times (in ms)
with and without the patch:
With a small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433863153-30722-3-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 405963b6a57c60040bc1dad2597f7f4b897954d1)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Wed, 31 May 2017 19:56:22 +0000 (12:56 -0700)]
locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
To be consistent with the queued spinlocks, which use the
CONFIG_QUEUED_SPINLOCKS config parameter, the parameter for the queued
rwlocks is now renamed to CONFIG_QUEUED_RWLOCKS.
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431367031-36697-1-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit c7114b4e6c53111d415485875725b60213ffc675)
Conflicts:
arch/x86/Kconfig
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Fri, 24 Apr 2015 18:56:35 +0000 (14:56 -0400)]
locking/qspinlock: Use a simple write to grab the lock
Currently, atomic_cmpxchg() is used to get the lock. However, this
is not really necessary if there is more than one task in the queue
and the queue head doesn't need to reset the tail code. In that case,
a simple write to set the lock bit is enough, as the queue head will
be the only one eligible to get the lock as long as it checks that
both the lock and pending bits are not set. The current pending bit
waiting code ensures that the pending bit will not be set as soon as
the tail code in the lock is set.
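The resulting helper, roughly (a sketch of the upstream set_locked()):

static __always_inline void set_locked(struct qspinlock *lock)
{
        struct __qspinlock *l = (void *)lock;

        /* queue head with waiters behind it: a plain byte store is
         * enough, no atomic operation needed */
        WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
}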
With that change, there is a slight improvement in the performance
of the queued spinlock in the 5M loop micro-benchmark run on a 4-socket
Westmere-EX machine, as shown in the tables below.
[Standalone/Embedded - same node]
# of tasks Before patch After patch %Change
---------- ----------- ---------- -------
3 2324/2321 2248/2265 -3%/-2%
4 2890/2896 2819/2831 -2%/-2%
5 3611/3595 3522/3512 -2%/-2%
6 4281/4276 4173/4160 -3%/-3%
7 5018/5001 4875/4861 -3%/-3%
8 5759/5750 5563/5568 -3%/-3%
[Standalone/Embedded - different nodes]
# of tasks Before patch After patch %Change
---------- ----------- ---------- -------
3 12242/12237 12087/12093 -1%/-1%
4 10688/10696 10507/10521 -2%/-2%
It was also found that this change produced a much bigger performance
improvement in the newer IvyBridge-EX chip, essentially closing the
performance gap between the ticket spinlock and the queued spinlock.
The disk workload of the AIM7 benchmark was run on a 4-socket
Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
on a 3.14 based kernel. The results of the test runs were:
AIM7 XFS Disk Test
kernel JPM Real Time Sys Time Usr Time
----- --- --------- -------- --------
ticketlock 5678233 3.17 96.61 5.81
qspinlock 5750799 3.13 94.83 5.97
AIM7 EXT4 Disk Test
kernel JPM Real Time Sys Time Usr Time
----- --- --------- -------- --------
ticketlock 1114551 16.15 509.72 7.11
qspinlock 2184466 8.24 232.99 6.01
The ext4 filesystem run had a much higher spinlock contention than
the xfs filesystem run.
The "ebizzy -m" test was also run with the following results:
kernel records/s Real Time Sys Time Usr Time
----- --------- --------- -------- --------
ticketlock 2075 10.00 216.35 3.49
qspinlock 3023 10.00 198.20 4.80
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2c83e8e9492dc823be1d96d4c5ef75d16d3866a0)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.
By growing the pending bit to a byte, we reduce the tail to 16 bits.
This means we can use xchg16 for the tail part and do away with all
the repeated cmpxchg() operations.
This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.
This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.
All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).
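A sketch of the 16-bit tail exchange this enables (illustrative; offsets per the qspinlock layout):

static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
{
        struct __qspinlock *l = (void *)lock;

        /* one xchg16 replaces the old cmpxchg retry loop */
        return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
}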
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 69f9cae90907e09af95fb991ed384670cef8dd32)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Fri, 24 Apr 2015 18:56:33 +0000 (14:56 -0400)]
locking/qspinlock: Extract out code snippets for the next patch
This is a preparatory patch that extracts out the following 2 code
snippets to prepare for the next performance optimization patch.
1) the logic for the exchange of new and previous tail code words
into a new xchg_tail() function.
2) the logic for clearing the pending bit and setting the locked bit
into a new clear_pending_set_locked() function.
This patch also simplifies the trylock operation before queuing by
calling queued_spin_trylock() directly.
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 6403bd7d0ea1878a487296114eccf78658d7dd7a)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]), add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.
It is possible to observe the pending bit without the locked bit when
the last owner has just released but the pending owner has not yet
taken ownership.
In this case we would normally queue -- because the pending bit is
already taken. However, in this case the pending bit is guaranteed
to be released 'soon', therefore wait for it and avoid queueing.
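In code, the wait looks roughly like this (a sketch):

/*
 * pending-but-unlocked is transient: the pending owner is about to
 * take the lock, so spin briefly instead of queueing
 */
if (val == _Q_PENDING_VAL) {
        while ((val = atomic_read(&lock->val)) == _Q_PENDING_VAL)
                cpu_relax();
}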
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit c1fb159db9f2e50e0f4025bed92a67a6a7bfa7b7)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Fri, 24 Apr 2015 18:56:30 +0000 (14:56 -0400)]
locking/qspinlock: Introduce a simple generic 4-byte queued spinlock
This patch introduces a new generic queued spinlock implementation that
can serve as an alternative to the default ticket spinlock. The queued
spinlock should be almost as fair as the ticket spinlock. It has about
the same speed in the single-thread case and can be much faster in high
contention situations, especially when the spinlock is embedded within
the data structure to be protected.
Only in light to moderate contention where the average queue depth
is around 1-3 will this queued spinlock be potentially a bit slower
due to the higher slowpath overhead.
This queued spinlock is especially suited to NUMA machines with a large
number of cores, as the chance of spinlock contention is much higher
on those machines. The cost of contention is also higher because of
slower inter-node memory traffic.
Because spinlocks are acquired with preemption disabled, a process
will not be migrated to another CPU while it is trying to get a
spinlock. Ignoring interrupt handling, a CPU can only be contending
on one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI,
a CPU can have at most 4 concurrent lock waiting activities. By
allocating a set of per-cpu queue nodes and using them to form a
waiting queue, we can encode the queue node address into a much
smaller 24-bit value (including CPU number and queue node index),
leaving one byte for the lock.
Please note that the queue node is only needed when waiting for the
lock. Once the lock is acquired, the queue node can be released to
be used later.
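A sketch of the tail encoding described above (cpu is offset by 1 so that a tail of 0 means no queue):

static inline u32 encode_tail(int cpu, int idx)
{
        u32 tail;

        tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;
        tail |= idx << _Q_TAIL_IDX_OFFSET;  /* per-cpu node index, idx < 4 */
        return tail;
}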
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a33fda35e3a7655fb7df756ed67822afb5ed5e8d)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Waiman Long [Thu, 30 Apr 2015 21:12:16 +0000 (17:12 -0400)]
locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()
In up_write()/up_read(), rwsem_wake() will be called whenever it
detects that some writers/readers are waiting. The rwsem_wake()
function will take the wait_lock and call __rwsem_do_wake() to do the
real wakeup. For a heavily contended rwsem, doing a spin_lock() on
wait_lock will cause further contention on the heavily contended rwsem
cacheline, delaying the completion of the up_read()/up_write()
operations.
This patch makes taking the wait_lock and calling __rwsem_do_wake()
optional if at least one spinning writer is present. The spinning
writer will be able to take the rwsem and call rwsem_wake() later
when it calls up_write(). With the presence of a spinning writer,
rwsem_wake() will now try to acquire the lock using trylock. If that
fails, it will just quit.
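The shape of the change in rwsem_wake(), as a sketch:

if (rwsem_has_spinner(sem)) {
        /* a spinning writer will take the rwsem and issue the
         * wakeup itself from its own up_write() */
        if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
                return sem;
} else {
        raw_spin_lock_irqsave(&sem->wait_lock, flags);
}
/* ... __rwsem_do_wake() as before ... */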
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 59aabfc7e959f5f213e4e5cc7567ab4934da2adf)
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Orabug: 26183741 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Jane Chu [Tue, 6 Jun 2017 22:25:01 +0000 (16:25 -0600)]
arch/sparc: revised support for 4096cpus
In the process of upstreaming patch bbd4b32b05cc529e74b1dd5ee3edc396fa7dd129
that went into uek4 for the NR_CPUS=4096 support, I received and incorporated
a comment to split up the allocation for the mondo block and mondo cpulist.
This patch updates uek4 for consistency.
Emil Tantilov [Wed, 17 May 2017 22:17:51 +0000 (15:17 -0700)]
ixgbe: always call setup_mac_link for multispeed fiber
Remove the logic that previously skipped the link configuration in
ixgbe_setup_mac_link_multispeed_fiber() when we are already at the
requested speed.
By exiting early we skip the link configuration, so the driver may
not always configure the PHY correctly for SFP+.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 08ed48e182ef870517a84d2331c4c5da8f1c3b3a) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Emil Tantilov [Wed, 17 May 2017 22:17:46 +0000 (15:17 -0700)]
ixgbe: add write flush when configuring CS4223/7
Make sure the writes are processed immediately. Without the flush it
is possible for operations on one port to spill over to the other, as
the resource is shared.
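The pattern, sketched (register name illustrative):

IXGBE_WRITE_REG(hw, reg, val);
IXGBE_WRITE_FLUSH(hw);  /* read-back forces the posted write out */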
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 410a494902777c11f95031d9ed757d7f8f09c5c6) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 12 May 2017 18:38:10 +0000 (11:38 -0700)]
ixgbevf: Resolve warnings for -Wimplicit-fallthrough
Additions to gcc 7 now warn whenever a switch statement falls through
implicitly. This patch adds explicit fall through comments to address the
following warnings:
drivers/net/ethernet/intel/ixgbevf/vf.c: In function ‘ixgbevf_get_reta_locked’:
drivers/net/ethernet/intel/ixgbevf/vf.c:336:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (hw->mac.type < ixgbe_mac_X550_vf)
^
drivers/net/ethernet/intel/ixgbevf/vf.c:338:2: note: here
default:
^~~~~~~
drivers/net/ethernet/intel/ixgbevf/vf.c: In function ‘ixgbevf_get_rss_key_locked’:
drivers/net/ethernet/intel/ixgbevf/vf.c:402:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (hw->mac.type < ixgbe_mac_X550_vf)
^
drivers/net/ethernet/intel/ixgbevf/vf.c:404:2: note: here
default:
^~~~~~~
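The fix is a comment telling gcc the fallthrough is intentional, roughly like this (case label and return value illustrative):

case ixgbe_mbox_api_12:
        if (hw->mac.type < ixgbe_mac_X550_vf)
                break;
        /* fall through */
default:
        return -EOPNOTSUPP;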
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 80666035c70bc8def691b4cb98fa39da3d6fdee1) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 12 May 2017 18:38:09 +0000 (11:38 -0700)]
ixgbevf: Resolve truncation warning for q_vector->name
The following warning is now shown as a result of new checks added for
gcc 7:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function ‘ixgbevf_open’:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1363:13: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 3 and 18 [-Wformat-truncation=]
"%s-%s-%d", netdev->name, "TxRx", ri++);
^~
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1363:6: note: directive argument in the range [0, 2147483647]
"%s-%s-%d", netdev->name, "TxRx", ri++);
^~~~~~~~~~
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1362:4: note: ‘snprintf’ output between 8 and 32 bytes into a destination of size 24
snprintf(q_vector->name, sizeof(q_vector->name) - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s-%s-%d", netdev->name, "TxRx", ri++);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Resolve this warning by making a couple of changes.
- Don't reserve space for the null terminator. Since snprintf adds the
null terminator automatically, there is no need for us to reserve a byte
for it.
- Change a couple of variables that can never be negative from int to
unsigned int.
While we're making changes to the format string, move the constant strings
into the format string instead of providing them as specifiers.
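The reworked call then looks roughly like this:

snprintf(q_vector->name, sizeof(q_vector->name),
         "%s-TxRx-%u", netdev->name, ri++);  /* ri now unsigned int */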
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 31f5d9b1e890d52c807093fac7ee7f00eb369897) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 12 May 2017 18:38:07 +0000 (11:38 -0700)]
ixgbe: Resolve truncation warning for q_vector->name
The following warning is now shown as a result of new checks added for
gcc 7:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: In function ‘ixgbe_open’:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3118:13: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 3 and 18 [-Wformat-truncation=]
"%s-%s-%d", netdev->name, "TxRx", ri++);
^~
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3118:6: note: directive argument in the range [0, 2147483647]
"%s-%s-%d", netdev->name, "TxRx", ri++);
^~~~~~~~~~
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3117:4: note: ‘snprintf’ output between 8 and 32 bytes into a destination of size 24
snprintf(q_vector->name, sizeof(q_vector->name) - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s-%s-%d", netdev->name, "TxRx", ri++);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Resolve this warning by making a couple of changes.
- Don't reserve space for the null terminator. Since snprintf adds the
null terminator automatically, there is no need for us to reserve a byte
for it.
- Change a couple of variables that can never be negative from int to
unsigned int.
While we're making changes to the format string, move the constant strings
into the format string instead of providing them as specifiers.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit e61e4c8b905b995a5334acf5fb9c7bcaec7417da) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 28 Apr 2017 19:42:03 +0000 (12:42 -0700)]
ixgbe: Add error checking to setting VF MAC
Currently, when setting a VF MAC address there are no error checks to
ensure that the MAC filter was successfully added. This patch adds
additional error checks, reporting, and propagation of errors. It also
will not set the MAC address unless adding the MAC filter was successful.
With these changes, setting the MAC address to zeros can no longer call
ixgbe_set_vf_mac(), as adding a zero MAC address filter is not valid.
Instead, directly delete the filter and, if successful, clear the MAC
address.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 6af3d0faede8b8c2ccd93f31d9f146ffd0b463d6) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Mark Rustad [Tue, 25 Apr 2017 20:55:25 +0000 (13:55 -0700)]
ixgbe: Correct thermal sensor event check
The thermal sensor event logic is messed up, because it can execute
the code when there is no thermal event. The current logic is that
it will exit when !capable && !event whereas it really should exit
when !capable || !event. For one thing, it means that the service
task is doing too much work. It probably has some other symptoms as
well. So, correct the logic, simplifying it to execute only when there
is a thermal event; the capable check is redundant.
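In pseudo-form (capable and event stand in for the actual flag and interrupt-cause tests):

/* before: exits only when both are false, so a capable adapter
 * with no pending event still ran the handler */
if (!capable && !event)
        return;
/* after (sketch): the event test alone suffices */
if (!event)
        return;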
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 22cb4fff3d9756229f1e67987f4fabb57a8c68ca) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Paul Greenwalt [Fri, 21 Apr 2017 09:37:13 +0000 (05:37 -0400)]
ixgbe: Remove MAC X550EM_X 1Gbase-t led_[on|off] support
Since FW configures the PHY and MAC X550EM_X has no
PHY access, led_[on|off] is not supported with the 1Gbase-t design.
Removed MAC X550EM_X 1Gbase-t led_[on|off] support by setting the
function pointers to NULL and adding NULL pointer checks. Also set
init_led_link_act to NULL and added a NULL pointer check.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 5e999fb43ebb5a64554890cda57edc1edd68a2ab) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Thu, 13 Apr 2017 14:26:07 +0000 (07:26 -0700)]
ixgbevf: Check for RSS key before setting value
The RSS key is being repopulated every time the interface is brought up
regardless of whether there is an existing value. If the user sets the RSS
key and the interface is brought up (e.g. reset), the user-specified RSS
key will be overwritten.
This patch changes the rss_key to a pointer so we can check to see if the
key has been populated and preserve it accordingly.
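A sketch of the one-time initialization this enables (names follow the ixgbevf convention but are illustrative):

if (!adapter->rss_key) {
        u32 *key = kzalloc(IXGBEVF_RSS_HASH_KEY_SIZE, GFP_KERNEL);

        if (!key)
                return -ENOMEM;
        /* generate a key only when the user hasn't supplied one */
        netdev_rss_key_fill(key, IXGBEVF_RSS_HASH_KEY_SIZE);
        adapter->rss_key = key;
}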
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit e60ae00361bf4e5ef08cde5a30f131cf287ffe30) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Thu, 13 Apr 2017 14:26:06 +0000 (07:26 -0700)]
ixgbevf: Fix errors in retrieving RETA and RSS from PF
Mailbox support for getting the RETA and RSS key is available only for
82599 and x540; a previous patch reversed the logic, so these adapters
were returning not supported.
Also, the NACK check in ixgbevf_get_rss_key_locked() was checking for the
command IXGBE_VF_GET_RETA instead of IXGBE_VF_GET_RSS_KEY.
This patch corrects both issues by correcting the logic and checking for
the right command.
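The corrected NACK test, sketched:

/* check the reply against the command that was actually sent */
if (msgbuf[0] == (IXGBE_VF_GET_RSS_KEY | IXGBE_VT_MSGTYPE_NACK))
        return -EPERM;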
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 82fb670c5fdd5662c406871a6c21ebd55ba68e45) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Thu, 13 Apr 2017 14:26:05 +0000 (07:26 -0700)]
ixgbe: Check for RSS key before setting value
The RSS key is being repopulated every time the interface is brought up
regardless of whether there is an existing value. If the user sets the RSS
key and the interface is brought up (e.g. reset), the user-specified RSS
key will be overwritten.
This patch changes the rss_key to a pointer so we can check to see if the
key has been populated and preserve it accordingly.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 3dfbfc7ebb959d68b35d5ca3b7499cc73dc57261) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Wed, 12 Apr 2017 20:35:22 +0000 (13:35 -0700)]
ixgbe: Allow setting zero MAC address for VF
Currently, there is no logic that allows a VF's MAC address to be removed
from the RAR table.
Allow the user to specify a zero MAC address in order to clear the VF's
MAC address from the RAR table. This functionality is also utilized by
libvirt when removing VFs.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 27bdc44cdb2a8d96322d5978895eaae881fb8c2d) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Paul Greenwalt [Mon, 13 Mar 2017 09:47:56 +0000 (05:47 -0400)]
ixgbe: Acquire PHY semaphore before device reset
A recent firmware change fixed an issue to acquire the PHY semaphore before
accessing PHY registers. This led to a case where SW can issue a device
reset that clears the MDIO registers. This patch makes SW acquire the
PHY semaphore before issuing a device reset.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 6133406be1aabfb041f024109efc41756970800e) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Thu, 2 Mar 2017 23:01:36 +0000 (15:01 -0800)]
ixgbe: Fix output from ixgbe_dump
I just found that when we had changed the Rx path to check for length
instead of the DD bit we introduced an issue in ixgbe_dump since we were no
longer clearing the status bits.
To correct this I am updating ixgbe_dump to look for the length bits in the
descriptor since that is what we are using in the Rx path.
Fixes: c3630cc40b4f ("ixgbe: Use length to determine if descriptor is done") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 18a8cc9815746b8f0ae6f78733877d3846058d1c) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Wed, 1 Mar 2017 19:52:09 +0000 (11:52 -0800)]
ixgbe: add check for VETO bit when configuring link for KR
We did not have a check in place for MMNGC.MNG_VETO when setting up link
on X550EM_X KR devices which resulted in link loss for the BMC when
loading the driver.
This patch adds a check for ixgbe_check_reset_blocked() in setup_link()
since in that case there is no PHY reset function.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit f4a6374ba46132896154397ce3c559ccb0d15e60) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Don Skidmore [Thu, 2 Feb 2017 19:38:46 +0000 (14:38 -0500)]
ixgbe: Remove unused define
Remove the Marvell 1145 PHY define as we have never had a device that
supports it and have no plan to in the future. The existence of this
define has caused confusion about whether or not this PHY was supported
by ixgbe.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 7ee814d7a6c449e2e96f76f1acb2b7d47dab108c) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Emil Tantilov [Fri, 20 Jan 2017 22:11:56 +0000 (14:11 -0800)]
ixgbe: do not use adapter->num_vfs when setting VFs via module parameter
Avoid setting adapter->num_vfs early in the init code path when
using the max_vfs module parameter by passing it to ixgbe_enable_sriov()
as a function parameter.
This fixes an issue where, if we failed to allocate vfinfo in
__ixgbe_enable_sriov(), the driver would crash with a NULL pointer
dereference in ixgbe_disable_sriov() when attempting to free the vfinfo
struct based on adapter->num_vfs. It also cleans up the assignment of
adapter->num_vfs, which will now only be set in __ixgbe_enable_sriov()
and cleared in ixgbe_disable_sriov().
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 5c11f00ddac2c030827cdecf9c2d3678cbd3137b) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Emil Tantilov [Fri, 20 Jan 2017 22:11:50 +0000 (14:11 -0800)]
ixgbe: return early instead of wrap block in if statement
Since we exit at the end of the block, we can save a level of
indentation by performing an early return, and make the next several
sections of code more legible, with fewer 80 character line breaks.
Also moved the vfinfo allocation to the beginning and the notification
for enabling SRIOV to the end of the function, where we know that it
will succeed.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit da614d042ac236e5db52c56c7d7d8accd325dd40) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Don Skidmore [Sat, 31 Dec 2016 02:07:58 +0000 (21:07 -0500)]
ixgbe: Add X552 XFI backplane support
This patch adds support for the X552 XFI backplane interface. The XFI
backplane requires a custom tuned link. HW/FW owns the link config
for the XFI backplane and SW must not interfere with it.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 18e01ee75f4533cddd774b8618e20d26d7d0d958) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Joe Perches [Tue, 3 Jan 2017 15:28:11 +0000 (07:28 -0800)]
ixgbe: Remove pr_cont uses
As pr_cont output can be interleaved by other processes,
using pr_cont should be avoided where possible.
Miscellanea:
- Use a temporary pointer to hold the next descriptions and
consolidate the pr_cont uses
- Use the temporary buffer to hold the 8 u32 register values and
emit those in a single go
- Coalesce formats and logging neatening around those changes
- Fix a defective output for the rx ring entry description when
also emitting rx_buffer_info data
This reduces overall object size a tiny bit too.
$ size drivers/net/ethernet/intel/ixgbe/*.o*
text data bss dec hex filename
62167 728 12 62907 f5bb drivers/net/ethernet/intel/ixgbe/ixgbe_main.o.new
62273 728 12 63013 f625 drivers/net/ethernet/intel/ixgbe/ixgbe_main.o.old
Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 332f235836082fe7d3d890409ed6a20e0ea0d923) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Fri, 3 Feb 2017 17:19:40 +0000 (09:19 -0800)]
ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines
On architectures that have a cache line size larger than 64 Bytes we start
running into issues where the amount of headroom for the frame starts
shrinking.
The size of skb_shared_info on a system with a 64B L1 cache line size is
320. This increases to 384 with a 128B cache line, and 512 with a 256B
cache line.
In addition the NET_SKB_PAD value increases as well consistent with the
cache line size. As a result, when we get to a 256B cache line, as seen
on the s390, we end up with 768 bytes used by padding and shared info,
leaving only 1280 bytes for data storage. On architectures such as
this we should default to using 3K Rx buffers out of an 8K page instead
of trying to do 1.5K buffers out of a 4K page.
To take all of this into account I have added one small check so that we
compare the max_frame to the amount of actual data we can store. This was
already occurring for igb, but I had overlooked it for ixgbe as it doesn't
have strict limits for 82599 once we enable jumbo frames. By adding this
check we will automatically enable 3K Rx buffers as soon as the maximum
frame size we can handle drops below the standard Ethernet MTU.
I also went through and fixed one small typo that I found where I had left
an IGB in a variable name due to a copy/paste error.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit c74042f3b3ca982652af99cad85252a2655c6064) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Paolo Abeni [Thu, 15 Dec 2016 14:20:34 +0000 (15:20 +0100)]
ixgbe: update the rss key on h/w, when ethtool ask for it
Currently ixgbe_set_rxfh() updates the rss_key copy in the driver
memory, but does not push the new value into the h/w. This commit
adds a new helper for the latter operation and calls it in
ixgbe_set_rxfh(), so that the h/w RSS key value can actually be
updated via ethtool.
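A sketch of such a helper (the RSS key occupies ten 32-bit RSSRK registers):

static void ixgbe_store_key(struct ixgbe_adapter *adapter)
{
        struct ixgbe_hw *hw = &adapter->hw;
        int i;

        for (i = 0; i < 10; i++)
                IXGBE_WRITE_REG(hw, IXGBE_RSSRK(i), adapter->rss_key[i]);
}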
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit d3aa9c9f212a729e46653d4c1eb6a9ab190efe3a) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:37:29 +0000 (08:37 -0800)]
ixgbe: Don't bother clearing buffer memory for descriptor rings
This patch makes it so that we don't need to bother with clearing the
memory out for the descriptor rings. The general idea is to only free
the buffers that are in use, which are located between the
next_to_clean and next_to_use or next_to_alloc values. Everything
outside of those regions can be safely ignored since it should have no
buffers associated with it.
The advantage of doing things this way is that it should speed up bring-up
and tear-down of the rings. Specifically we can avoid the 512 or more
cycles required to memset the rings in tear-down. In the bring-up phase we
then clear the memory as a part of initialization. The general idea is
that the clearing in initialization can act as a prefetch of sorts for the
buffer info structures so they are in the local CPU when we go to populate
them. This should help to improve overall time needed to perform a
suspend/resume.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit ffed21bcee7a544f99a9c9b18c23b361a0b1e476) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:37:03 +0000 (08:37 -0800)]
ixgbe: Add private flag to control buffer mode
Since there are potential drawbacks to the new Rx allocation approach,
I thought it best to add a "chicken bit" so that we can turn the
feature off in the event that a problem is found.
It also provides a means of validating the legacy Rx path in the event that
we are forced to fall back. At some point in the future when we are
convinced we don't need it anymore we might be able to drop the legacy-rx
flag.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 2ccdf26ff614dd49b14e76c0c076f5f4e9562e79) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:36:54 +0000 (08:36 -0800)]
ixgbe: Add support for padding packet
This patch adds support for providing a buffer with headroom and tailroom
to allow for shared info, NET_SKB_PAD, and NET_IP_ALIGN. With this
combined with the DMA changes we can start using build_skb to build frames
around an incoming Rx buffer instead of having to memcpy the headers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 2de6aa3a666e63699978f81d0d5523e7e0778f7b) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:36:28 +0000 (08:36 -0800)]
ixgbe: Use length to determine if descriptor is done
This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads, as we don't really
even need the DD bit if the size going from 0 to a non-zero value is
enough to inform us that the packet has been completed.
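The descriptor-done test then becomes, roughly:

unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);

if (!size)
        break;          /* DMA has not written this descriptor back */

/* order descriptor field reads after the size check */
dma_rmb();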
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit c3630cc40b4f0fe004e21f19bfb5cd2231c105f8) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:36:14 +0000 (08:36 -0800)]
ixgbe: Make use of order 1 pages and 3K buffers independent of FCoE
In order to support build_skb with jumbo frames it will be necessary to use
3K buffers for the Rx path with 8K pages backing them. This is needed on
architectures that implement 4K pages because we can't support 2K buffers
plus padding in a 4K page.
In the case of systems that support page sizes larger than 4K the 3K
attribute will only be applied to FCoE as we can fall back to using just 2K
buffers and adding the padding.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 4f4542bfb3b539bef118578ffafcc98e4ce91979) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Alexander Duyck [Tue, 17 Jan 2017 16:35:44 +0000 (08:35 -0800)]
ixgbe: Only DMA sync frame length
On some platforms, syncing a buffer for DMA is expensive. Rather than
sync the whole 2K receive buffer, only synchronise the length of the
frame, which will typically be the MTU, or a much smaller TCP ACK.
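The sync then covers just the received bytes, as a sketch:

dma_sync_single_range_for_cpu(rx_ring->dev, rx_buffer->dma,
                              rx_buffer->page_offset,
                              size, DMA_FROM_DEVICE);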
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit f215af8cae4c283d8a522ea166d94f763dc4aebf) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 11 Nov 2016 00:01:33 +0000 (16:01 -0800)]
ixgbe: Support 2.5Gb and 5Gb speed
Though these speeds are not advertised through ethtool, if the link partner
advertises a 2.5Gb or 5Gb connection and the adapter supports it, allow the
speed to be used.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 1dc0eb75a8f88e37c2aee75fca0313cd6e30a1e1) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
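A hedged sketch of accepting the extra speeds in the link-up path (the
IXGBE_LINK_SPEED_* masks follow the driver's naming convention; the
reporting logic is illustrative):

const char *speed_str;

switch (link_speed) {
case IXGBE_LINK_SPEED_10GB_FULL:
	speed_str = "10 Gbps";
	break;
case IXGBE_LINK_SPEED_5GB_FULL:
	speed_str = "5 Gbps";		/* newly allowed */
	break;
case IXGBE_LINK_SPEED_2_5GB_FULL:
	speed_str = "2.5 Gbps";		/* newly allowed */
	break;
case IXGBE_LINK_SPEED_1GB_FULL:
	speed_str = "1 Gbps";
	break;
default:
	speed_str = "unknown";
	break;
}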
Eric Dumazet [Fri, 3 Feb 2017 00:59:18 +0000 (16:59 -0800)]
ixgbevf: get rid of custom busy polling code
In linux-4.5, busy polling was implemented in the core NAPI stack, meaning
that all custom implementations can be removed from drivers.
Not only do we remove a lot of code, we also remove one lock operation in
the fast path and allow GRO to do its job.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26242766
(cherry picked from commit 508aac6dee025f93eab1e806d20762ea6327b43d) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Eric Dumazet [Fri, 3 Feb 2017 00:26:39 +0000 (16:26 -0800)]
ixgbe: get rid of custom busy polling code
In linux-4.5, busy polling was implemented in the core NAPI stack, meaning
that all custom implementations can be removed from drivers.
Not only do we remove a lot of code, we also remove one lock operation in
the fast path and allow GRO to do its job.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26242766
(cherry picked from commit 3ffc1af576550ec61d35668485954e49da29d168) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Don Skidmore [Fri, 16 Dec 2016 02:18:32 +0000 (21:18 -0500)]
ixgbe: Add PF support for VF promiscuous mode
This patch extends the xcast mailbox message to include support for
unicast promiscuous mode. To allow a VF to enter this mode, the PF must be
in promiscuous mode.
A later patch will add the support needed in the VF driver (ixgbevf).
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 07eea570acccbc0f9402357d652868571fdbb2b9) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
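A hedged sketch of the extended mailbox mode set and the PF-side gate
described above (the first three mode names match the existing mailbox
definitions; the gate is a paraphrase, not the exact upstream code):

enum ixgbevf_xcast_modes {
	IXGBEVF_XCAST_MODE_NONE = 0,
	IXGBEVF_XCAST_MODE_MULTI,
	IXGBEVF_XCAST_MODE_ALLMULTI,
	IXGBEVF_XCAST_MODE_PROMISC,	/* new: unicast promiscuous */
};

/* PF side: only let a VF enter the new mode if the PF itself is
 * in promiscuous mode.
 */
if (xcast_mode == IXGBEVF_XCAST_MODE_PROMISC &&
    !(adapter->netdev->flags & IFF_PROMISC))
	return -EPERM;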
Don Skidmore [Fri, 16 Dec 2016 02:18:31 +0000 (21:18 -0500)]
ixgbevf: Add support for VF promiscuous mode
This patch extends the mailbox message to allow for VF promiscuous
mode support.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 41e544cdad0bd669600825d8de73c8f420640bf9)
Chunk in ixgbevf_main.c deleted due to dependencies Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Mark Rustad [Wed, 14 Dec 2016 19:02:00 +0000 (11:02 -0800)]
ixgbe: Fix issues with EEPROM access
There are two problems with EEPROM access. One is that the driver needs to
hold the semaphore until the entire response is read, or else the response
can be corrupted by other firmware accesses. The second is that acquiring
and releasing the semaphore is slow, so when multiple EEPROM accesses are
needed the semaphore should be taken once at the start and released once at
the end.
Both of these issues can be solved by adding a new function,
ixgbe_hic_unlocked, to issue firmware commands that will assume
that the caller has acquired the needed semaphore.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 3efa9ed260ce838976eb9177bae7249caf7a2aa1) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
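A hedged sketch of the take-once/release-once pattern this enables
(ixgbe_hic_unlocked() and the semaphore are named in the log; the loop body
and error handling are illustrative):

/* Caller takes the firmware semaphore once for the whole transaction. */
if (hw->mac.ops.acquire_swfw_sync(hw, IXGBE_GSSR_SW_MNG_SM))
	return IXGBE_ERR_SWFW_SYNC;

for (i = 0; i < words; i++)
	/* issue commands without re-acquiring the semaphore each time */
	status = ixgbe_hic_unlocked(hw, (u32 *)&cmd[i], sizeof(cmd[i]),
				    IXGBE_HI_COMMAND_TIMEOUT);

hw->mac.ops.release_swfw_sync(hw, IXGBE_GSSR_SW_MNG_SM);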
Don Skidmore [Wed, 14 Dec 2016 01:34:51 +0000 (20:34 -0500)]
ixgbe: Configure advertised speeds correctly for KR/KX backplane
This patch ensures that the advertised link speeds are configured
for X553 KR/KX backplane. Without this patch the link remains at
1G when resuming from low power after being downshifted by LPLU.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 54f6d4c42451dbd2cc7e0f0bd8fc3eddcab511fe) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Yusuke Suzuki [Mon, 21 Nov 2016 06:48:45 +0000 (06:48 +0000)]
ixgbe: Fix incorrect bitwise operations of PTP Rx timestamp flags
Rx timestamping does not work on 82599 and X540 because the bitwise
operation on the RX_HWTSTAMP flags is incorrect, so ixgbe_ptp_rx_hwtstamp()
is never called. This patch fixes the operation to enable Rx timestamping
on 82599 and X540.
Without this fix:
ptp4l[278.730]: selected /dev/ptp8 as PTP clock
ptp4l[278.733]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[278.733]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[278.834]: port 1: received SYNC without timestamp
ptp4l[278.835]: port 1: new foreign master 1c3947.fffe.60f9cc-1
ptp4l[279.834]: port 1: received SYNC without timestamp
ptp4l[280.834]: port 1: received SYNC without timestamp
ptp4l[281.834]: port 1: received SYNC without timestamp
ptp4l[282.834]: port 1: received SYNC without timestamp
ptp4l[282.835]: selected best master clock 1c3947.fffe.60f9cc
ptp4l[282.835]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[283.834]: port 1: received SYNC without timestamp
With this fix:
ptp4l[239.154]: selected /dev/ptp8 as PTP clock
ptp4l[239.157]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[239.157]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[240.989]: port 1: new foreign master 1c3947.fffe.60f9cc-1
ptp4l[244.989]: selected best master clock 1c3947.fffe.60f9cc
ptp4l[244.989]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[246.977]: master offset -899583339542096 s0 freq +0 path delay 16222
ptp4l[247.977]: master offset -899583339617265 s1 freq -75169 path delay 16177
ptp4l[248.977]: master offset -130 s2 freq -75299 path delay 16177
ptp4l[248.977]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[249.977]: master offset -9 s2 freq -75217 path delay 16177
ptp4l[250.977]: master offset 88 s2 freq -75123 path delay 16132
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices") Signed-off-by: Yusuke Suzuki <yus-suzuki@uf.jp.nec.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit aeb4c73100be8aade8a1189b50bd226b709ca8bb) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
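The log does not quote the exact broken expression, so as a generic
illustration only: the classic mistake with combined flag masks is testing
the whole mask when every bit must be present (flag names here are
hypothetical):

#define RX_HWTSTAMP_A	BIT(0)	/* hypothetical flag bits */
#define RX_HWTSTAMP_B	BIT(1)

/* True whenever either bit is set; wrong if both are required. */
if (flags & (RX_HWTSTAMP_A | RX_HWTSTAMP_B))
	timestamp_packet();

/* Correct test when both bits must be set. */
if ((flags & (RX_HWTSTAMP_A | RX_HWTSTAMP_B)) ==
    (RX_HWTSTAMP_A | RX_HWTSTAMP_B))
	timestamp_packet();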
Emil Tantilov [Wed, 16 Nov 2016 19:25:34 +0000 (11:25 -0800)]
ixgbevf: fix AER error handling
Make sure that we free the IRQs in ixgbevf_io_error_detected() when
responding to a PCIe AER error, and also restore them when the interface
recovers from it.
Previously it was possible to trigger the BUG_ON() check in free_msix_irqs()
when ixgbevf_remove() is called after a failed recovery from an AER error,
because the interrupts were not freed.
Also moved the down and free functions into ixgbevf_close_suspend(), the
same as with ixgbe.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit b19cf6eea9e2a497e6475fd02c0703f0b3a6d083) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
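A hedged sketch of the error_detected shape (the pci_ers_result_t callback
types are the real PCI AER API; ixgbevf_close_suspend() is named in the
log, the rest is illustrative):

static pci_ers_result_t ixgbevf_io_error_detected(struct pci_dev *pdev,
						  pci_channel_state_t state)
{
	struct net_device *netdev = pci_get_drvdata(pdev);

	rtnl_lock();
	netif_device_detach(netdev);
	if (netif_running(netdev))
		/* down the interface and free the IRQs here, so a later
		 * remove after failed recovery cannot double-free them */
		ixgbevf_close_suspend(netdev_priv(netdev));
	rtnl_unlock();

	if (state == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;

	pci_disable_device(pdev);
	return PCI_ERS_RESULT_NEED_RESET;
}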
Emil Tantilov [Wed, 16 Nov 2016 17:48:02 +0000 (09:48 -0800)]
ixgbe: fix AER error handling
Make sure that we free the IRQs in ixgbe_io_error_detected() when
responding to a PCIe AER error, and also restore them when the interface
recovers from it.
Previously it was possible to trigger the BUG_ON() check in free_msix_irqs()
when ixgbe_remove() is called after a failed recovery from an AER error,
because the interrupts were not freed.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 126db13fa0e6d05c9f94e0125f61e773bd5ab079) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Ken Cox [Tue, 15 Nov 2016 19:00:37 +0000 (13:00 -0600)]
ixgbe: test for trust in macvlan adjustments for VF
There are two methods for setting MAC addresses on a macvlan, which diverge
in the function macvlan_set_mac_address. If the macvlan mode is passthru,
we use the dev_set_mac_address method; otherwise we use the dev_uc API via
macvlan_sync_addresses.
The latter method (used by any non-passthru mode, such as bridge or vepa)
calls down into the driver in a path that terminates in
ixgbevf_set_uc_addr_vf, which sends an IXGBE_VF_SET_MACVLAN message; this
causes the PF to emit the noted error message, because it appears that the
guest is trying to delete the MAC address of the macvlan before adding
another.
The other path in macvlan_set_mac_address uses dev_set_mac_address, which
calls into ixgbevf_set_mac; this sends IXGBE_VF_SET_MAC_ADDR to the PF to
set the macvlan MAC address.
The discrepancy is in the handlers. The handler for IXGBE_VF_SET_MAC_ADDR
(ixgbe_set_vf_mac_addr) checks the vfinfo[].trusted bit and allows the
operation if the VF is trusted. The IXGBE_VF_SET_MACVLAN handler
(ixgbe_set_vf_macvlan_msg) has no such check of the trusted bit.
Signed-off-by: Ken Cox <jkc@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit a9d2d53a788a9c5bc8a7d1b4ea7857b68e221357) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
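A hedged sketch of the missing gate (vfinfo[].trusted is named in the log;
the surrounding handler shape and the warning message are illustrative):

/* In ixgbe_set_vf_macvlan_msg(): mirror the trusted-VF check that
 * ixgbe_set_vf_mac_addr() already performs.
 */
if (index > 0 && !adapter->vfinfo[vf].trusted) {
	e_warn(drv, "VF %d attempted untrusted MACVLAN filter change\n", vf);
	return -1;
}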
Emil Tantilov [Fri, 11 Nov 2016 18:12:51 +0000 (10:12 -0800)]
ixgbevf: handle race between close and suspend on shutdown
When an interface is part of a namespace it is possible that
ixgbevf_close() may be called while ixgbevf_suspend() is running, which
ends up in a double-free WARN and/or a BUG in free_msi_irqs().
To handle this situation we extend the rtnl_lock() to protect the call to
netif_device_detach(), and check for !netif_device_present() to avoid
entering close while in suspend.
Also added rtnl locks to ixgbevf_queue_reset_subtask().
CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 2dad7b2775ea030c898fe4946971edd25af237d1) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
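A hedged sketch of the extended rtnl coverage (rtnl_lock(),
netif_device_detach(), and netif_device_present() are the real kernel
APIs; the suspend body is condensed):

static int ixgbevf_suspend(struct pci_dev *pdev, pm_message_t state)
{
	struct net_device *netdev = pci_get_drvdata(pdev);

	rtnl_lock();			/* now also covers the detach */
	netif_device_detach(netdev);
	if (netif_running(netdev))
		ixgbevf_close_suspend(netdev_priv(netdev));
	rtnl_unlock();

	return 0;
}

/* ixgbevf_close() then bails out early when the device is already
 * detached: if (!netif_device_present(netdev)) return 0;
 */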
Emil Tantilov [Fri, 11 Nov 2016 18:07:47 +0000 (10:07 -0800)]
ixgbe: handle close/suspend race with netif_device_detach/present
When an interface is part of a namespace it is possible that ixgbe_close()
may be called while __ixgbe_shutdown() is running, which ends up in a
double-free WARN and/or a BUG in free_msi_irqs().
To handle this situation we extend the rtnl_lock() to protect the calls to
netif_device_detach() and ixgbe_clear_interrupt_scheme() in
__ixgbe_shutdown(), and check for netif_device_present() to avoid clearing
the interrupts a second time in ixgbe_close().
Also extend the rtnl lock in ixgbe_resume() to cover netif_device_attach().
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit f7f37e7ff2b9b7eff7fbd035569cab35896869a3) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Fri, 11 Nov 2016 00:00:33 +0000 (16:00 -0800)]
ixgbe: Fix reporting of 100Mb capability
BaseT adapters that are capable of supporting 100Mb are not reporting this
capability. This patch corrects the reporting so that 100Mb is shown as
supported on those adapters.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit f215266470dfe86196a31fe0725a86cea77f9a18) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Thu, 10 Nov 2016 17:57:29 +0000 (09:57 -0800)]
ixgbe: Reduce I2C retry count on X550 devices
A retry count of 10 is likely to run into problems on X550 devices that
have to detect and reset unresponsive CS4227 devices, so reduce the I2C
retry count to 3 for X550 and above. Restricting the change to X550 and
newer should avoid any possible regressions in existing devices.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 3f0d646b720d541309b11e190db58086f446f41e) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Wed, 9 Nov 2016 18:48:48 +0000 (10:48 -0800)]
ixgbe: Add bounds check for x540 LED functions
This is an extension of commit 003287e0f087 ("ixgbevf: Correct parameter
sent to LED function"); add bounds checking to x540 functions to ensure the
index is valid.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 910c9c0f59567ec204924d88ca04337bb04f17d9) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
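A hedged sketch of the bounds check (IXGBE_LEDCTL and the LED mode macros
are the driver's real register helpers; the four-LED limit is assumed):

s32 ixgbe_led_on_X540(struct ixgbe_hw *hw, u32 index)
{
	u32 ledctl;

	if (index > 3)			/* assumed: four LED indices */
		return IXGBE_ERR_PARAM;

	ledctl = IXGBE_READ_REG(hw, IXGBE_LEDCTL);
	ledctl &= ~IXGBE_LED_MODE_MASK(index);
	ledctl |= IXGBE_LED_ON << IXGBE_LED_MODE_SHIFT(index);
	IXGBE_WRITE_REG(hw, IXGBE_LEDCTL, ledctl);
	return 0;
}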
Tony Nguyen [Mon, 31 Oct 2016 19:11:58 +0000 (12:11 -0700)]
ixgbe: Fix check for ixgbe_phy_x550em_ext_t reset
The generic PHY reset check we had previously is not sufficient for the
ixgbe_phy_x550em_ext_t PHY type. Check register 1.CC02.0 instead, the same
check used in ixgbe_init_ext_t_x550().
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit 5c092749e304d8b49567be633d5be31393538e3b) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tony Nguyen [Wed, 26 Oct 2016 23:25:18 +0000 (16:25 -0700)]
ixgbe: Report driver version to firmware for x550 devices
Some x550 devices require the driver version to be reported to their
firmware; this patch sends the driver version string to the firmware
through the host interface command on x550 devices.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26242766
(cherry picked from commit cb8e051446ae554aae38163d3421edc793221784) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Vijay Kumar [Wed, 1 Feb 2017 19:34:40 +0000 (11:34 -0800)]
Documentation/sparc: Steps for sending break on sunhv console
Documented the steps for sending a break on the sunhv console.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4cfe140618b99e653134598de9f18b48743549ec)
Vijay Kumar [Wed, 1 Feb 2017 19:34:39 +0000 (11:34 -0800)]
sparc64: Send break twice from console to return to boot prom
We can now also jump to the boot prom from the sunhv console by sending a
break twice on the console, for both running and panicked kernels.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7db60d05e5ccc0a473fa2275f90f2fca0002ab21)
Vijay Kumar [Wed, 1 Feb 2017 19:34:38 +0000 (11:34 -0800)]
sparc64: Migrate hvcons irq to panicked cpu
On panic, all other CPUs are stopped except the one that hit the panic. To
keep the console alive, we need to migrate the hvcons irq to the panicked
CPU.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7dd4fcf5b70694dc961eb6b954673e4fc9730dbd)
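A hedged sketch of the migration (irq_force_affinity() and cpumask_of()
are the real kernel APIs; the irq variable name is assumed):

/* Called on the panicked CPU before the others are stopped, so the
 * hypervisor console interrupt keeps landing on a live CPU.
 */
irq_force_affinity(hvcons_irq, cpumask_of(smp_processor_id()));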
Vijay Kumar [Wed, 1 Feb 2017 19:34:37 +0000 (11:34 -0800)]
sparc64: Set cpu state to offline when stopped
A CPU needs to be marked offline before it is stopped. When it is not
marked offline, the xcall receives HV_EWOULDBLOCK, assumes that not all
CPUs received the message, and retries. After 10000 retries, it finally
fails with a fatal mondo timeout.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cffb3e76818fee4763a2ce5f2b1eca2d7885e2cf)
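A hedged sketch of the ordering (set_cpu_online() is the real kernel API;
the stop call is a placeholder for the SPARC-specific mechanism):

/* Mark the CPU offline first so cross-call senders skip it ... */
set_cpu_online(cpu, false);
/* ... then actually stop it; no more mondo retries against it. */
stop_this_cpu(cpu);	/* placeholder name for the arch stop path */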
Since tcmu feature support for the iSCSI target is added by the bug fixes
below, build the tcmu kernel module by modifying the default kernel
configuration files.
The fixes which add the tcmu feature to the iSCSI target:
Orabug: 25983319/25983379/25791789 Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Shan Hai <shan.hai@oracle.com>
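A hedged sketch of the kind of defconfig change this implies
(CONFIG_TCM_USER2 is the upstream Kconfig symbol for the tcmu backstore;
it depends on the target core and UIO):

CONFIG_TARGET_CORE=m
CONFIG_UIO=m
CONFIG_TCM_USER2=m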
Over-eager editing for conflict resolution in merge commit c311ca8a3d9349dfc606cada0d8ca14e58728c9c ("Merge tag 'v4.1.12-70#22913653a12'
into pmem-4.1-merge") resulted in the accidental deletion of two calls to
mutex_lock(). Since the matching calls to mutex_unlock() were correctly left
in the source, this produced a lock imbalance, detectable via lockdep.
This commit restores the missing mutex_lock() calls.
md/raid5: don't index beyond end of array in need_this_block().
While need_this_block() probably shouldn't be called when there are more
than 2 failed devices, we really don't want it to try indexing beyond the
end of the failed_num[] or fdev[] arrays.
So limit the loops to at most 2 iterations.
Reported-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.de>
(cherry picked from commit 36707bb2e7c6730d79d6cdc6d1475d3d7e94c518)
Orabug: 26047272 Signed-off-by: Fred Herard <fred.herard@oracle.com> Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Nicolas Droux <nicolas.droux@oracle.com> Acked-by: Saar Maoz <Saar.Maoz@oracle.com> Acked-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
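A hedged sketch of the raid5 loop cap described above (s->failed,
failed_num[], and sh->dev are the raid5 names cited in the log; the loop
body is elided):

/* Never walk more than two failed devices: failed_num[] has two slots. */
for (i = 0; i < s->failed && i < 2; i++) {
	struct r5dev *fdev = &sh->dev[s->failed_num[i]];
	/* ... examine fdev as before ... */
}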