]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
5 years agoselftests/powerpc: Add wrapper for gettid
Sandipan Das [Mon, 27 Jul 2020 04:00:39 +0000 (09:30 +0530)]
selftests/powerpc: Add wrapper for gettid

The gettid() syscall wrapper was first introduced in
glibc 2.30. This adds a wrapper for use in distros
running older versions.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8ca3b0eeda989707815d1cf337cc33f090408965.1595821792.git.sandipan@linux.ibm.com
5 years agoselftests/powerpc: Add helper to exit on failure
Sandipan Das [Mon, 27 Jul 2020 04:00:38 +0000 (09:30 +0530)]
selftests/powerpc: Add helper to exit on failure

This adds a helper similar to FAIL_IF() which lets a
program exit with code 1 (to indicate failure) when
the given condition is true.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dac282d5c2e96e7816dc522e4e20d56d7c79c898.1595821792.git.sandipan@linux.ibm.com
5 years agoselftests/powerpc: Harden test for execute-disabled pkeys
Sandipan Das [Mon, 27 Jul 2020 04:00:37 +0000 (09:30 +0530)]
selftests/powerpc: Harden test for execute-disabled pkeys

Commit 192b6a7805989 ("powerpc/book3s64/pkeys: Fix
pkey_access_permitted() for execute disable pkey") fixed a
bug that caused repetitive faults for pkeys with no execute
rights alongside some combination of read and write rights.

This removes the last two cases of the test, which check
the behaviour of pkeys with read, write but no execute
rights and all the rights, in favour of checking all the
possible combinations of read, write and execute rights
to be able to detect bugs like the one mentioned above.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/db467500f8af47727bba6b35796e8974a78b71e5.1595821792.git.sandipan@linux.ibm.com
5 years agoselftests/powerpc: Add pkey helpers for rights
Sandipan Das [Mon, 27 Jul 2020 04:00:36 +0000 (09:30 +0530)]
selftests/powerpc: Add pkey helpers for rights

This adds some new pkey-related helper to print
access rights of a pkey in the "rwx" format and
to generate different valid combinations of pkey
rights starting from a given combination.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6cc1c7d1f686618668a3e090f1d0c2a4cd9dea3f.1595821792.git.sandipan@linux.ibm.com
5 years agoselftests/powerpc: Move pkey helpers to headers
Sandipan Das [Mon, 27 Jul 2020 04:00:35 +0000 (09:30 +0530)]
selftests/powerpc: Move pkey helpers to headers

This moves all the pkey-related helpers to a new header
file and also a helper to print error messages in signal
handlers to the existing utils header file.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/28e633fa9ec1a6500c12188e09ea1887b10a10c1.1595821792.git.sandipan@linux.ibm.com
5 years agopowerpc/pseries: Add KVM guest doorbell restrictions
Nicholas Piggin [Sun, 26 Jul 2020 03:51:55 +0000 (13:51 +1000)]
powerpc/pseries: Add KVM guest doorbell restrictions

KVM guests have certain restrictions and performance quirks when using
doorbells. This patch moves the EPAPR KVM guest test so it can be shared
with PSERIES, and uses that in doorbell setup code to apply the KVM
guest quirks and  improves IPI performance for two cases:

 - PowerVM guests may now use doorbells even if they are secure.

 - KVM guests no longer use doorbells if XIVE is available.

There is a valid complaint that "KVM guest" is not a very reasonable
thing to test for, it's preferable for the hypervisor to advertise
particular behaviours to the guest so they could change if the
hypervisor implementation or configuration changes. However in this case
we were already assuming a KVM guest worst case, so this patch is about
containing those quirks. If KVM later advertises fast doorbells, we
should test for that and override the quirks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-4-npiggin@gmail.com
5 years agopowerpc/pseries: Use doorbells even if XIVE is available
Nicholas Piggin [Sun, 26 Jul 2020 03:51:54 +0000 (13:51 +1000)]
powerpc/pseries: Use doorbells even if XIVE is available

KVM supports msgsndp in guests by trapping and emulating the
instruction, so it was decided to always use XIVE for IPIs if it is
available. However on PowerVM systems, msgsndp can be used and gives
better performance. On large systems, high XIVE interrupt rates can
have sub-linear scaling, and using msgsndp can reduce the load on
the interrupt controller.

So switch to using core local doorbells even if XIVE is available.
This reduces performance for KVM guests with an SMT topology by
about 50% for ping-pong context switching between SMT vCPUs. An
option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
to get KVM performance back.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-3-npiggin@gmail.com
5 years agopowerpc: Inline doorbell sending functions
Nicholas Piggin [Sun, 26 Jul 2020 03:51:53 +0000 (13:51 +1000)]
powerpc: Inline doorbell sending functions

These are only called in one place for a given platform, so inline
them for performance.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
[mpe: Fix build errors related to KVM]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-2-npiggin@gmail.com
5 years agopowerpc/perf: Fix MMCRA_BHRB_DISABLE define for binutils < 2.28
Athira Rajeev [Wed, 29 Jul 2020 04:16:54 +0000 (00:16 -0400)]
powerpc/perf: Fix MMCRA_BHRB_DISABLE define for binutils < 2.28

Commit 9908c826d5ed ("powerpc/perf: Add Power10 PMU feature to DT CPU
features") defines MMCRA_BHRB_DISABLE as `0x2000000000UL`. Binutils
version less than 2.28 doesn't support UL suffix.

  arch/powerpc/kernel/cpu_setup_power.S: Assembler messages:
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: junk at end of line, first unrecognized character is `L'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: junk at end of line, first unrecognized character is `L'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: operand out of range (0x0000002000000000 is not between 0xffffffffffff8000 and 0x000000000000ffff)

Fix this by wrapping it with the `_UL` macro.

Fixes: 9908c826d5ed ("Add Power10 PMU feature to DT CPU features")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1595996214-5833-1-git-send-email-atrajeev@linux.vnet.ibm.com
5 years agopowerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=y
Michael Ellerman [Mon, 27 Jul 2020 03:44:36 +0000 (13:44 +1000)]
powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=y

skiroot_defconfig fails:

arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used
   48 | static atomic_t cpus_in_fadump;

Fix it by moving the definition into the #ifdef where it's used.

Fixes: ba608c4fa12c ("powerpc/fadump: fix race between pstore write and fadump crash trigger")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727070341.595634-1-mpe@ellerman.id.au
5 years agopowerpc/powernv/pci.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:38:09 +0000 (17:38 -0700)]
powerpc/powernv/pci.h: delete duplicated word

Drop the repeated word "for".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-10-rdunlap@infradead.org
5 years agopowerpc/smu.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:38:08 +0000 (17:38 -0700)]
powerpc/smu.h: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-9-rdunlap@infradead.org
5 years agopowerpc/reg.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:38:07 +0000 (17:38 -0700)]
powerpc/reg.h: delete duplicated word

Drop the repeated word "a".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-8-rdunlap@infradead.org
5 years agopowerpc/ppc_asm.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:38:06 +0000 (17:38 -0700)]
powerpc/ppc_asm.h: delete duplicated word

Drop the repeated word "in".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-7-rdunlap@infradead.org
5 years agopowerpc/hw_breakpoint.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:38:05 +0000 (17:38 -0700)]
powerpc/hw_breakpoint.h: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-6-rdunlap@infradead.org
5 years agopowerpc/epapr_hcalls.h: delete duplicated words
Randy Dunlap [Sun, 26 Jul 2020 00:38:04 +0000 (17:38 -0700)]
powerpc/epapr_hcalls.h: delete duplicated words

Drop the repeated words "file" and "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-5-rdunlap@infradead.org
5 years agopowerpc/cputime.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:38:03 +0000 (17:38 -0700)]
powerpc/cputime.h: delete duplicated word

Drop the repeated word "use".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-4-rdunlap@infradead.org
5 years agopowerpc/book3s/radix-4k.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:38:02 +0000 (17:38 -0700)]
powerpc/book3s/radix-4k.h: delete duplicated word

Drop the repeated word "per".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-3-rdunlap@infradead.org
5 years agopowerpc/book3s/mmu-hash.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:38:01 +0000 (17:38 -0700)]
powerpc/book3s/mmu-hash.h: delete duplicated word

Drop the repeated word "below".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-2-rdunlap@infradead.org
5 years agopowerpc/lib: remove memcpy_flushcache redundant return
Li RongQing [Fri, 26 Apr 2019 11:36:30 +0000 (19:36 +0800)]
powerpc/lib: remove memcpy_flushcache redundant return

Align it with other architectures and none of the callers has
been interested its return

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1556278590-14727-1-git-send-email-lirongqing@baidu.com
5 years agopowerpc/ptdump: Refactor update of pg_state
Christophe Leroy [Mon, 29 Jun 2020 11:17:19 +0000 (11:17 +0000)]
powerpc/ptdump: Refactor update of pg_state

In note_page(), the pg_state is updated the same way in two places.

Add note_page_update_state() to do it.

Also include the display of boundary markers there as it is missing
"no level" leg, leading to a mismatch when the first two markers
are at the same address and the first displayed area uses that
address.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a284a809f01c705bbaab303b06fda216f147a99a.1593429426.git.christophe.leroy@csgroup.eu
5 years agopowerpc/ptdump: Refactor update of st->last_pa
Christophe Leroy [Mon, 29 Jun 2020 11:17:18 +0000 (11:17 +0000)]
powerpc/ptdump: Refactor update of st->last_pa

st->last_pa is always updated in note_page() so it can
be done outside the if/elseif/else block.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/610d6b1a60ad0bedef865a90153c1110cfaa507e.1593429426.git.christophe.leroy@csgroup.eu
5 years agopowerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX
Christophe Leroy [Mon, 29 Jun 2020 11:15:26 +0000 (11:15 +0000)]
powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX

When STRICT_KERNEL_RWX is set, we want to set NX bit on vmalloc
segments. But modules require exec.

Use a dedicated segment for modules. There is not much space
above kernel, and we don't waste vmalloc space to do alignment.
Therefore, we take the segment before PAGE_OFFSET for modules.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eb8faba9148b6cf17c696ba776b4e8ee2f6313bf.1593428200.git.christophe.leroy@csgroup.eu
5 years agopowerpc/32s: Kernel space starts at TASK_SIZE
Christophe Leroy [Mon, 29 Jun 2020 11:15:24 +0000 (11:15 +0000)]
powerpc/32s: Kernel space starts at TASK_SIZE

Kernel space starts at TASK_SIZE. Select kernel page table
when address is over TASK_SIZE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/893425e32cd0a003539573b2d115e0ffa98bc26c.1593428200.git.christophe.leroy@csgroup.eu
5 years agopowerpc/32: Set user/kernel boundary at TASK_SIZE instead of PAGE_OFFSET
Christophe Leroy [Mon, 29 Jun 2020 11:15:23 +0000 (11:15 +0000)]
powerpc/32: Set user/kernel boundary at TASK_SIZE instead of PAGE_OFFSET

User space stops at TASK_SIZE. At the moment, kernel space starts
at PAGE_OFFSET.

In order to use space between TASK_SIZE and PAGE_OFFSET for modules,
make TASK_SIZE the limit between user and kernel space.

Note that fault.c already considers TASK_SIZE as the boundary between
user and kernel space.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b38b52cd8dabbb56fbd6f9219d6f3cdccbb43b44.1593428200.git.christophe.leroy@csgroup.eu
5 years agopowerpc/32s: Only leave NX unset on segments used for modules
Christophe Leroy [Mon, 29 Jun 2020 11:15:22 +0000 (11:15 +0000)]
powerpc/32s: Only leave NX unset on segments used for modules

Instead of leaving NX unset on all segments above the start
of vmalloc space, only leave NX unset on segments used for
modules.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7172c0f5253419315e434a1816ee3d6ed6505bc0.1593428200.git.christophe.leroy@csgroup.eu
5 years agopowerpc: Use MODULES_VADDR if defined
Christophe Leroy [Mon, 29 Jun 2020 11:15:21 +0000 (11:15 +0000)]
powerpc: Use MODULES_VADDR if defined

In order to allow allocation of modules outside of vmalloc space,
use MODULES_VADDR and MODULES_END when MODULES_VADDR is defined.

Redefine module_alloc() when MODULES_VADDR defined.
Unmap corresponding KASAN shadow memory.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7ecf5fff1eef67d450e73fc412b6ec3818483d75.1593428200.git.christophe.leroy@csgroup.eu
5 years agopowerpc/lib: Prepare code-patching for modules allocated outside vmalloc space
Christophe Leroy [Mon, 29 Jun 2020 11:15:20 +0000 (11:15 +0000)]
powerpc/lib: Prepare code-patching for modules allocated outside vmalloc space

Use is_vmalloc_or_module_addr() instead of is_vmalloc_addr()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d884db0e5a6f521331639d8c0f13e520d5a4fef.1593428200.git.christophe.leroy@csgroup.eu
5 years agopowerpc/papr_scm: Make some symbols static
Wei Yongjun [Sat, 25 Jul 2020 09:19:49 +0000 (17:19 +0800)]
powerpc/papr_scm: Make some symbols static

The sparse tool complains as follows:

arch/powerpc/platforms/pseries/papr_scm.c:97:1: warning:
 symbol 'papr_nd_regions' was not declared. Should it be static?
arch/powerpc/platforms/pseries/papr_scm.c:98:1: warning:
 symbol 'papr_ndr_lock' was not declared. Should it be static?

Those variables are not used outside of papr_scm.c, so this
commit marks them static.

Fixes: 85343a8da2d9 ("powerpc/papr/scm: Add bad memory ranges to nvdimm bad ranges")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725091949.75234-1-weiyongjun1@huawei.com
5 years agopowerpc/64s: allow for clang's objdump differences
Bill Wendling [Fri, 24 Jul 2020 22:49:01 +0000 (15:49 -0700)]
powerpc/64s: allow for clang's objdump differences

Clang's objdump emits slightly different output from GNU's objdump,
causing a list of warnings to be emitted during relocatable builds.
E.g., clang's objdump emits this:

   c000000000000004: 2c 00 00 48  b  0xc000000000000030
   ...
   c000000000005c6c: 10 00 82 40  bf 2, 0xc000000000005c7c

while GNU objdump emits:

   c000000000000004: 2c 00 00 48  b    c000000000000030 <__start+0x30>
   ...
   c000000000005c6c: 10 00 82 40  bne  c000000000005c7c <masked_interrupt+0x3c>

Adjust llvm-objdump's output to remove the extraneous '0x' and convert
'bf' and 'bt' to 'bne' and 'beq' resp. to more closely match GNU
objdump's output.

Note that clang's objdump doesn't yet output the relocation symbols on
PPC.

Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/191c67db31264b69cf6b566fd69851beb3dd0abb.1595630874.git.morbo@google.com
5 years agopowerpc: Implement smp_cond_load_relaxed()
Nicholas Piggin [Fri, 24 Jul 2020 13:14:23 +0000 (23:14 +1000)]
powerpc: Implement smp_cond_load_relaxed()

This implements smp_cond_load_relaxed() with the slowpath busy loop
using the preferred SMT priority pattern.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
[mpe: Make it 64-bit only to fix build errors on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-7-npiggin@gmail.com
5 years agopowerpc/qspinlock: Optimised atomic_try_cmpxchg_lock() that adds the lock hint
Nicholas Piggin [Fri, 24 Jul 2020 13:14:22 +0000 (23:14 +1000)]
powerpc/qspinlock: Optimised atomic_try_cmpxchg_lock() that adds the lock hint

This brings the behaviour of the uncontended fast path back to roughly
equivalent to simple spinlocks -- a single atomic op with lock hint.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-6-npiggin@gmail.com
5 years agopowerpc/pseries: Implement paravirt qspinlocks for SPLPAR
Nicholas Piggin [Fri, 24 Jul 2020 13:14:21 +0000 (23:14 +1000)]
powerpc/pseries: Implement paravirt qspinlocks for SPLPAR

This implements the generic paravirt qspinlocks using H_PROD and
H_CONFER to kick and wait.

This uses an un-directed yield to any CPU rather than the directed
yield to a pre-empted lock holder that paravirtualised simple
spinlocks use, that requires no kick hcall. This is something that
could be investigated and improved in future.

Performance results can be found in the commit which added queued
spinlocks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-5-npiggin@gmail.com
5 years agopowerpc/64s: Implement queued spinlocks and rwlocks
Nicholas Piggin [Fri, 24 Jul 2020 13:14:20 +0000 (23:14 +1000)]
powerpc/64s: Implement queued spinlocks and rwlocks

These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.

With this series including subsequent patches, on a 16 socket 1536
thread POWER9, a stress test such as same-file open/close from all
CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs
384158op/s (33x faster), where the difference in throughput between
the fastest and slowest thread goes from 7x to 1.4x.

Thanks to the fast path being identical in terms of atomics and
barriers (after a subsequent optimisation patch), single threaded
performance is not changed (no measurable difference).

On smaller systems, performance and fairness seems to be generally
improved. Using dbench on tmpfs as a test (that starts to run into
kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was
tested with bare metal and KVM guest configurations. Results can be
found here:

https://github.com/linuxppc/issues/issues/305#issuecomment-663487453

Observations are:

- Queued spinlocks are equal when contention is insignificant, as
  expected and as measured with microbenchmarks.

- When there is contention, on bare metal queued spinlocks have better
  throughput and max latency at all points.

- When virtualised, queued spinlocks are slightly worse approaching
  peak throughput, but significantly better throughput and max latency
  at all points beyond peak, until queued spinlock maximum latency
  rises when clients are 2x vCPUs.

The regressions haven't been analysed very well yet, there are a lot
of things that can be tuned, particularly the paravirtualised locking,
but the numbers already look like a good net win even on relatively
small systems.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com
5 years agopowerpc: Move spinlock implementation to simple_spinlock
Nicholas Piggin [Fri, 24 Jul 2020 13:14:19 +0000 (23:14 +1000)]
powerpc: Move spinlock implementation to simple_spinlock

To prepare for queued spinlocks. This is a simple rename except to
update preprocessor guard name and a file reference.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-3-npiggin@gmail.com
5 years agopowerpc/pseries: Move some PAPR paravirt functions to their own file
Nicholas Piggin [Fri, 24 Jul 2020 13:14:18 +0000 (23:14 +1000)]
powerpc/pseries: Move some PAPR paravirt functions to their own file

These functions will be used by the queued spinlock implementation,
and may be useful elsewhere too, so move them out of spinlock.h.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-2-npiggin@gmail.com
5 years agopowerpc/numa: Limit possible nodes to within num_possible_nodes
Srikar Dronamraju [Fri, 24 Jul 2020 10:58:09 +0000 (16:28 +0530)]
powerpc/numa: Limit possible nodes to within num_possible_nodes

MAX_NUMNODES is a theoretical maximum number of nodes thats is
supported by the kernel. Device tree properties exposes the number of
possible nodes on the current platform. The kernel would detected this
and would use it for most of its resource allocations. If the platform
now increases the nodes to over what was already exposed, then it may
lead to inconsistencies. Hence limit it to the already exposed nodes.

Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724105809.24733-1-srikar@linux.vnet.ibm.com
5 years agomacintosh/via-macii: Clarify definition of macii_init()
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Clarify definition of macii_init()

The function prototype correctly specifies the 'static' storage class.
Let the function definition match the declaration for better readability.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c025aed0b1506399b73ff1d1bfa40ed641fcb3e3.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/via-macii: Use the stack for reset request storage
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Use the stack for reset request storage

The adb_request struct can be stored on the stack because the request
is synchronous and is completed before the function returns.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a40f80dde90991757007b6962c386a208c970586.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/via-macii: Use unsigned type for autopoll_devs variable
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Use unsigned type for autopoll_devs variable

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ca5be30ba745c08c2b7a1f0618f99c61b303e983.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/via-macii: Use bool type for reading_reply variable
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Use bool type for reading_reply variable

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/779551219a11b19e574dfcd87e4ef60af08c4fc3.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/via-macii: Handle poll replies correctly
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Handle poll replies correctly

Userspace applications may use /dev/adb to send Talk requests. Such
requests always have req->reply_expected == 1. The same is true of Talk
requests sent by the kernel, except for poll requests queued internally
by the via-macii driver. Those requests have req->reply_expected == 0.

Consequently, poll reply packets get treated like autopoll reply packets.
(It doesn't make sense to try to distinguish them.) Always enter 'reading'
state after a poll request, so that the reply gets collected and passed
to adb_input(), and none go missing.

All Talk replies passed to adb_input() come from polling or autopolling,
so call adb_input() with the autopoll parameter set to 1.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/754cddfa045e5bfa53e5da199831de02e7d2f27f.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/via-macii: Remove read_done state
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Remove read_done state

The driver state machine may enter the 'read_done' state when leaving the
'idle' or 'reading' state. This transition is pointless, as is the extra
interrupt it requires. The interrupt is produced by the transceiver
(even when it has no data to send) because an extra EVEN/ODD toggle
was signalled by the driver. Drop the extra state to simplify the code.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0253194363af4426f9788796811a6a29fb87c713.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/via-macii: Handle /CTLR_IRQ signal correctly
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Handle /CTLR_IRQ signal correctly

I'm told that the /CTLR_IRQ signal from the ADB transceiver gets
interpreted by MacOS to mean SRQ, bus timeout or end-of-packet depending
on the circumstances, and that Linux's via-macii driver does not
correctly interpret this signal.

Instead, the via-macii driver interprets certain received byte values
(0x00 and 0xFF) as signalling end of packet and bus timeout
(respectively). Problem is, those values can also appear under other
circumstances.

This patch changes the bus timeout, end of packet and SRQ detection logic
to bring it closer to the logic that MacOS reportedly uses.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6541fda1d8db3ae87c3abe17d189a10dc96e2382.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/via-macii: Poll the device most likely to respond
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Poll the device most likely to respond

Poll the most recently polled device by default, rather than the lowest
device address that happens to be enabled in autopoll_devs. This improves
input latency. Re-use macii_queue_poll() rather than duplicate that logic.
This eliminates a static struct and function.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5836f80886ebcfbe5be5fb7e0dc49feed6469712.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/via-macii: Access autopoll_devs when inside lock
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Access autopoll_devs when inside lock

The interrupt handler should be excluded when accessing the autopoll_devs
variable.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5952dd8a9bc9de90f1acc4790c51dd42b4c98065.1593318192.git.fthain@telegraphics.com.au
5 years agomacintosh/adb-iop: Implement SRQ autopolling
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Implement SRQ autopolling

The adb_driver.autopoll method is needed during ADB bus scan and device
address assignment. Implement this method so that the IOP's list of
device addresses can be updated. When the list is empty, disable SRQ
autopolling.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0fb7fdcd99d7820bb27faf1f27f7f6f1923914ef.1590880623.git.fthain@telegraphics.com.au
5 years agomacintosh/adb-iop: Implement sending -> idle state transition
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Implement sending -> idle state transition

On leaving the 'sending' state, proceed to the 'idle' state if no reply is
expected. Drop redundant test for adb_iop_state == sending && current_req.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6991996dd4aaf0b52cfd650172bf0f6fbe37a452.1590880623.git.fthain@telegraphics.com.au
5 years agomacintosh/adb-iop: Implement idle -> sending state transition
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Implement idle -> sending state transition

In the present algorithm, the 'idle' state transition does not take
place until there's a bus timeout. Once idle, the driver does not
automatically proceed with the next request.

Change the algorithm so that queued ADB requests will be sent as soon as
the driver becomes idle. This is to take place after the current IOP
message is completed.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dedcdfc62f43e85cc4c2a8d211a7e2fec7bc7c1a.1590880623.git.fthain@telegraphics.com.au
5 years agomacintosh/adb-iop: Resolve static checker warnings
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Resolve static checker warnings

drivers/macintosh/adb-iop.c:215:28: warning: Using plain integer as NULL pointer
drivers/macintosh/adb-iop.c:170:5: warning: symbol 'adb_iop_probe' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:177:5: warning: symbol 'adb_iop_init' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:184:5: warning: symbol 'adb_iop_send_request' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:230:5: warning: symbol 'adb_iop_autopoll' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:236:6: warning: symbol 'adb_iop_poll' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:241:5: warning: symbol 'adb_iop_reset_bus' was not declared. Should it be static?

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/25edf4450abd20e002b166ba3a11189dc1efa906.1590880623.git.fthain@telegraphics.com.au
5 years agomacintosh/adb-iop: Access current_req and adb_iop_state when inside lock
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Access current_req and adb_iop_state when inside lock

Drop the redundant local_irq_save/restore() from adb_iop_start() because
the caller has to do it anyway. This is the pattern used in via-macii.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bbe32b087c7e04d68e2425f6a2df4a414d167c32.1590880623.git.fthain@telegraphics.com.au
5 years agomacintosh/adb-iop: Adopt bus reset algorithm from via-macii driver
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Adopt bus reset algorithm from via-macii driver

This algorithm is slightly shorter and avoids the surprising
adb_iop_start() call in adb_iop_poll().

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b63d56ecb6e75f11a0bf02231f3b2db656a528a3.1590880623.git.fthain@telegraphics.com.au
5 years agomacintosh/adb-iop: Correct comment text
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Correct comment text

This patch improves comment style and corrects some misunderstandings
in the text.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/996f835d2f3d90baaaf9ee954e252d06e8886c6f.1590880623.git.fthain@telegraphics.com.au
5 years agomacintosh/adb-iop: Remove dead and redundant code
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Remove dead and redundant code

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7720ffb559c334504e16b24d9c2f3b8973d2d674.1590880623.git.fthain@telegraphics.com.au
5 years agopowerpc/perf: Initialize power10 PMU registers in cpu setup routine
Athira Rajeev [Thu, 23 Jul 2020 07:32:37 +0000 (03:32 -0400)]
powerpc/perf: Initialize power10 PMU registers in cpu setup routine

Initialize Monitor Mode Control Register 3 (MMCR3)
SPR which is new in power10. For PowerISA v3.1, BHRB disable
is controlled via Monitor Mode Control Register A (MMCRA) bit,
namely "BHRB Recording Disable (BHRBRD)". This patch also initializes
MMCRA BHRBRD to disable BHRB feature at boot for power10.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1595489557-2047-1-git-send-email-atrajeev@linux.vnet.ibm.com
5 years agopowerpc/powernv/sriov: Remove vfs_expanded
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:15 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Remove vfs_expanded

Previously iov->vfs_expanded was used for two purposes.

1) To work out how much we need to multiple the per-VF BAR size to figure
   out the total space required for the IOV BAR.

2) To indicate that IOV is not usable with this device (vfs_expanded == 0).

We don't really need the field for either since the multiple in 1) is
always the number PEs supported by the PHB. Similarly, we don't really need
it in 2) either since the IOV data field will be NULL if we can't use IOV
with the device.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-16-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Make single PE mode a per-BAR setting
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:14 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Make single PE mode a per-BAR setting

Using single PE BARs to map an SR-IOV BAR is really a choice about what
strategy to use when mapping a BAR. It doesn't make much sense for this to
be a global setting since a device might have one large BAR which needs to
be mapped with single PE windows and another smaller BAR that can be mapped
with a regular segmented window. Make the segmented vs single decision a
per-BAR setting and clean up the logic that decides which mode to use.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-15-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Refactor M64 BAR setup
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:13 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Refactor M64 BAR setup

Split up the logic so that we have one branch that handles setting up a
segmented window and another that handles setting up single PE windows for
each VF.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-14-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Move M64 BAR allocation into a helper
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:12 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Move M64 BAR allocation into a helper

I want to refactor the loop this code is currently inside of. Hoist it on
out.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-13-oohall@gmail.com
5 years agopowerpc/powernv/sriov: De-indent setup and teardown
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:11 +0000 (16:57 +1000)]
powerpc/powernv/sriov: De-indent setup and teardown

Remove the IODA2 PHB checks. We already assume IODA2 in several places so
there's not much point in wrapping most of the setup and teardown process
in an if block.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-12-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Drop iov->pe_num_map[]
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:10 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Drop iov->pe_num_map[]

Currently the iov->pe_num_map[] does one of two things depending on
whether single PE mode is being used or not. When it is, this contains an
array which maps a vf_index to the corresponding PE number. When single PE
mode is not being used this contains a scalar which is the base PE for the
set of enabled VFs (for for VFn is base + n).

The array was necessary because when calling pnv_ioda_alloc_pe() there is
no guarantee that the allocated PEs would be contigious. We can now
allocate contigious blocks of PEs so this is no longer an issue. This
allows us to drop the if (single_mode) {} .. else {} block scattered
through the SR-IOV code which is a nice clean up.

This also fixes a bug in pnv_pci_sriov_disable() which is the non-atomic
bitmap_clear() to manipulate the PE allocation map. Other users of the map
assume it will be accessed with atomic ops.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-11-oohall@gmail.com
5 years agopowerpc/powernv/pci: Refactor pnv_ioda_alloc_pe()
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:09 +0000 (16:57 +1000)]
powerpc/powernv/pci: Refactor pnv_ioda_alloc_pe()

Rework the PE allocation logic to allow allocating blocks of PEs rather
than individually. We'll use this to allocate contigious blocks of PEs for
the SR-IOVs.

This patch also adds code to pnv_ioda_alloc_pe() and pnv_ioda_reserve_pe() to
use the existing, but unused, phb->pe_alloc_mutex. Currently these functions
use atomic bit ops to release a currently allocated PE number. However,
the pnv_ioda_alloc_pe() wants to have exclusive access to the bit map while
scanning for hole large enough to accomodate the allocation size.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-10-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Factor out M64 BAR setup
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:08 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Factor out M64 BAR setup

The sequence required to use the single PE BAR mode is kinda janky and
requires a little explanation. The API was designed with P7-IOC style
windows where the setup process is something like:

1. Configure the window start / end address
2. Enable the window
3. Map the segments of each window to the PE

For Single PE BARs the process is:

1. Set the PE for segment zero on a disabled window
2. Set the range
3. Enable the window

Move the OPAL calls into their own helper functions where the quirks can be
contained.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-9-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Simplify used window tracking
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:07 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Simplify used window tracking

No need for the multi-dimensional arrays, just use a bitmap.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-8-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Rename truncate_iov
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:06 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Rename truncate_iov

This prevents SR-IOV being used by making the SR-IOV BAR resources
unallocatable. Rename it to reflect what it actually does.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-7-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Explain how SR-IOV works on PowerNV
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:05 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Explain how SR-IOV works on PowerNV

SR-IOV support on PowerNV is a byzantine maze of hooks. I have no idea
how anyone is supposed to know how it works except through a lot of
stuffering. Write up some docs about the overall story to help out
the next sucker^Wperson who needs to tinker with it.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-6-oohall@gmail.com
5 years agopowerpc/powernv/sriov: Move SR-IOV into a separate file
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:04 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Move SR-IOV into a separate file

pci-ioda.c is getting a bit unwieldly due to the amount of stuff jammed in
there. The SR-IOV support can be extracted easily enough and is mostly
standalone, so move it into a separate file.

This patch also moves the PowerNV SR-IOV specific fields from pci_dn and
moves them into a platform specific structure. I'm not sure how they ended
up in there in the first place, but leaking platform specifics into common
code has proven to be a terrible idea so far so lets stop doing that.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-5-oohall@gmail.com
5 years agopowerpc/powernv/pci: Initialise M64 for IODA1 as a 1-1 window
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:03 +0000 (16:57 +1000)]
powerpc/powernv/pci: Initialise M64 for IODA1 as a 1-1 window

We pre-configure the m64 window for IODA1 as a 1-1 segment-PE mapping,
similar to PHB3. Currently the actual mapping of segments occurs in
pnv_ioda_pick_m64_pe(), but we can move it into pnv_ioda1_init_m64() and
drop the IODA1 specific code paths in the PE setup / teardown.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-4-oohall@gmail.com
5 years agopowerpc/powernv/pci: Add explicit tracking of the DMA setup state
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:02 +0000 (16:57 +1000)]
powerpc/powernv/pci: Add explicit tracking of the DMA setup state

There's an optimisation in the PE setup which skips performing DMA
setup for a PE if we only have bridges in a PE. The assumption being
that only "real" devices will DMA to system memory, which is probably
fair. However, if we start off with only bridge devices in a PE then
add a non-bridge device the new device won't be able to use DMA because
we never configured it.

Fix this (admittedly pretty weird) edge case by tracking whether we've done
the DMA setup for the PE or not. If a non-bridge device is added to the PE
(via rescan or hotplug, or whatever) we can set up DMA on demand.

This also means the only remaining user of the old "DMA Weight" code is
the IODA1 DMA setup code that it was originally added for, which is good.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-3-oohall@gmail.com
5 years agopowerpc/powernv/pci: Always tear down DMA windows on PE release
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:01 +0000 (16:57 +1000)]
powerpc/powernv/pci: Always tear down DMA windows on PE release

Currently we have these two functions:

pnv_pci_ioda2_release_dma_pe(), and
pnv_pci_ioda2_release_pe_dma()

The first is used when tearing down VF PEs and the other is used for normal
devices. There's very little difference between the two though. The latter
(non-VF) will skip a call to pnv_pci_ioda2_unset_window() unless
CONFIG_IOMMU_API=y is set. There's no real point in doing this so fold the
two together.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-2-oohall@gmail.com
5 years agopowerpc/powernv/pci: Add pci_bus_to_pnvhb() helper
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:00 +0000 (16:57 +1000)]
powerpc/powernv/pci: Add pci_bus_to_pnvhb() helper

Add a helper to go from a pci_bus structure to the pnv_phb that hosts
that bus. There's a lot of instances of the following pattern:

struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct pnv_phb *phb = hose->private_data;

Without any other uses of the pci_controller inside the function. This
is hard to read since it requires you to memorise the contents of the
private data fields and kind of error prone since it involves blindly
assigning a void pointer. Add a helper to make it more concise and
explicit.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-1-oohall@gmail.com
5 years agopowerpc/eeh: Move PE tree setup into the platform
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:31 +0000 (18:12 +1000)]
powerpc/eeh: Move PE tree setup into the platform

The EEH core has a concept of a "PE tree" to support PowerNV. The PE tree
follows the PCI bus structures because a reset asserted on an upstream
bridge will be propagated to the downstream bridges. On pseries there's a
1-1 correspondence between what the guest sees are a PHB and a PE so the
"tree" is really just a single node.

Current the EEH core is reponsible for setting up this PE tree which it
does by traversing the pci_dn tree. The structure of the pci_dn tree
matches the bus tree on PowerNV which leads to the PE tree being "correct"
this setup method doesn't make a whole lot of sense and it's actively
confusing for the pseries case where it doesn't really do anything.

We want to remove the dependence on pci_dn anyway so this patch move
choosing where to insert a new PE into the platform code rather than
being part of the generic EEH code. For PowerNV this simplifies the
tree building logic and removes the use of pci_dn. For pseries we
keep the existing logic. I'm not really convinced it does anything
due to the 1-1 PE-to-PHB correspondence so every device under that
PHB should be in the same PE, but I'd rather not remove it entirely
until we've had a chance to look at it more deeply.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-14-oohall@gmail.com
5 years agopowerpc/eeh: Drop pdn use in eeh_pe_tree_insert()
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:30 +0000 (18:12 +1000)]
powerpc/eeh: Drop pdn use in eeh_pe_tree_insert()

This is mostly just to make the subsequent diffs less noisy. No functional
changes.

One thing that needs calling out is the removal of the "config_addr"
variable and replacing it with edev->bdfn. The contents of edev->bdfn are
the same, however it's worth pointing out that what RTAS calls a
"config_addr" isn't the same as the bdfn. The config_addr is supposed to
be: <bus><devfn><reg> with each field being an 8 bit number. Various parts
of the EEH code use BDFN and "config_addr" as interchangeable quantities
even though they aren't really.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-13-oohall@gmail.com
5 years agopowerpc/eeh: Rename eeh_{add_to|remove_from}_parent_pe()
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:29 +0000 (18:12 +1000)]
powerpc/eeh: Rename eeh_{add_to|remove_from}_parent_pe()

The naming of eeh_{add_to|remove_from}_parent_pe() doesn't really reflect
what they actually do. If the PE referred to be edev->pe_config_addr
already exists under that PHB then the edev is added to that PE. However,
if the PE doesn't exist the a new one is created for the edev.

The bulk of the implementation of eeh_add_to_parent_pe() covers that
second case. Similarly, most of eeh_remove_from_parent_pe() is
determining when it's safe to delete a PE.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-12-oohall@gmail.com
5 years agopowerpc/eeh: Remove class code field from edev
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:28 +0000 (18:12 +1000)]
powerpc/eeh: Remove class code field from edev

The edev->class_code field is never referenced anywhere except for the
platform specific probe functions. The same information is available in
the pci_dev for PowerNV and in the pci_dn on pseries so we can remove
the field.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-11-oohall@gmail.com
5 years agopowerpc/eeh: Remove spurious use of pci_dn in eeh_dump_dev_log
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:27 +0000 (18:12 +1000)]
powerpc/eeh: Remove spurious use of pci_dn in eeh_dump_dev_log

Retrieve the domain, bus, device, and function numbers from the edev.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-10-oohall@gmail.com
5 years agopowerpc/eeh: Pass eeh_dev to eeh_ops->{read|write}_config()
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:26 +0000 (18:12 +1000)]
powerpc/eeh: Pass eeh_dev to eeh_ops->{read|write}_config()

Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference
a specific device rather than pci_dn. No functional changes.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-9-oohall@gmail.com
5 years agopowerpc/eeh: Pass eeh_dev to eeh_ops->resume_notify()
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:25 +0000 (18:12 +1000)]
powerpc/eeh: Pass eeh_dev to eeh_ops->resume_notify()

Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference
a specific device rather than pci_dn. No functional changes.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-8-oohall@gmail.com
5 years agopowerpc/eeh: Pass eeh_dev to eeh_ops->restore_config()
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:24 +0000 (18:12 +1000)]
powerpc/eeh: Pass eeh_dev to eeh_ops->restore_config()

Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference
a specific device rather than pci_dn. No functional changes.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-7-oohall@gmail.com
5 years agopowerpc/eeh: Remove VF config space restoration
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:23 +0000 (18:12 +1000)]
powerpc/eeh: Remove VF config space restoration

There's a bunch of strange things about this code. First up is that none of
the fields being written to are functional for a VF. The SR-IOV
specification lists then as "Reserved, but OS should preserve" so writing
new values to them doesn't do anything and is clearly wrong from a
correctness perspective.

However, since VFs are designed to be managed by the OS there is an
argument to be made that we should be saving and restoring some parts of
config space. We already sort of do that by saving the first 64 bytes of
config space in the eeh_dev (see eeh_dev->config_space[]). This is
inadequate since it doesn't even consider saving and restoring the PCI
capability structures. However, this is a problem with EEH in general and
that needs to be fixed for non-VF devices too.

There's no real reason to keep around this around so delete it.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-6-oohall@gmail.com
5 years agopowerpc/eeh: Kill off eeh_ops->get_pe_addr()
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:22 +0000 (18:12 +1000)]
powerpc/eeh: Kill off eeh_ops->get_pe_addr()

This is used in precisely one place which is in pseries specific platform
code.  There's no need to have the callback in eeh_ops since the platform
chooses the EEH PE addresses anyway. The PowerNV implementation has always
been a stub too so remove it.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-5-oohall@gmail.com
5 years agopowerpc/pseries: Stop using pdn->pe_number
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:21 +0000 (18:12 +1000)]
powerpc/pseries: Stop using pdn->pe_number

The pci_dn->pe_number field is mainly used to track the IODA PE number of a
device on PowerNV. At some point it grew a user in the pseries SR-IOV
support which muddies the waters a bit, so remove it.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-4-oohall@gmail.com
5 years agopowerpc/eeh: Move vf_index out of pci_dn and into eeh_dev
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:20 +0000 (18:12 +1000)]
powerpc/eeh: Move vf_index out of pci_dn and into eeh_dev

Drivers that do not support the PCI error handling callbacks are handled by
tearing down the device and re-probing them. If the device being removed is
a virtual function then we need to know the VF index so it can be removed
using the pci_iov_{add|remove}_virtfn() API.

Currently this is handled by looking up the pci_dn, and using the vf_index
that was stashed there when the pci_dn for the VF was created in
pcibios_sriov_enable(). We would like to eliminate the use of pci_dn
outside of pseries though so we need to provide the generic EEH code with
some other way to find the vf_index.

The easiest thing to do here is move the vf_index field out of pci_dn and
into eeh_dev.  Currently pci_dn and eeh_dev are allocated and initialized
together so this is a fairly minimal change in preparation for splitting
pci_dn and eeh_dev in the future.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-3-oohall@gmail.com
5 years agopowerpc/eeh: Remove eeh_dev.c
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:19 +0000 (18:12 +1000)]
powerpc/eeh: Remove eeh_dev.c

The only thing in this file is eeh_dev_init() which is allocates and
initialises an eeh_dev based on a pci_dn. This is only ever called from
pci_dn.c so move it into there and remove the file.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-2-oohall@gmail.com
5 years agopowerpc/eeh: Remove eeh_dev_phb_init_dynamic()
Oliver O'Halloran [Sat, 25 Jul 2020 08:12:18 +0000 (18:12 +1000)]
powerpc/eeh: Remove eeh_dev_phb_init_dynamic()

This function is a one line wrapper around eeh_phb_pe_create() and despite
the name it doesn't create any eeh_dev structures. Replace it with direct
calls to eeh_phb_pe_create() since that does what it says on the tin
and removes a layer of indirection.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-1-oohall@gmail.com
5 years agopowerpc/watchpoint: Remove 512 byte boundary
Ravi Bangoria [Thu, 23 Jul 2020 09:08:13 +0000 (14:38 +0530)]
powerpc/watchpoint: Remove 512 byte boundary

Power10 has removed 512 bytes boundary from match criteria i.e. the watch
range can cross 512 bytes boundary.

Note: ISA 3.1 Book III 9.4 match criteria includes 512 byte limit but that
is a documentation mistake and hopefully will be fixed in the next version
of ISA. Though, ISA 3.1 change log mentions about removal of 512B boundary:

  Multiple DEAW:
  Added a second Data Address Watchpoint. [H]DAR is
  set to the first byte of overlap. 512B boundary is
  removed.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-11-ravi.bangoria@linux.ibm.com
5 years agopowerpc/watchpoint: Return available watchpoints dynamically
Ravi Bangoria [Thu, 23 Jul 2020 09:08:12 +0000 (14:38 +0530)]
powerpc/watchpoint: Return available watchpoints dynamically

So far Book3S Powerpc supported only one watchpoint. Power10 is
introducing 2nd DAWR. Enable 2nd DAWR support for Power10.
Availability of 2nd DAWR will depend on CPU_FTR_DAWR1.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-10-ravi.bangoria@linux.ibm.com
5 years agopowerpc/watchpoint: Guest support for 2nd DAWR hcall
Ravi Bangoria [Thu, 23 Jul 2020 09:08:11 +0000 (14:38 +0530)]
powerpc/watchpoint: Guest support for 2nd DAWR hcall

2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5.
Enable powervm guest support with that. This has no effect on kvm guest
because kvm will return error if guest does hcall with resource value 5.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-9-ravi.bangoria@linux.ibm.com
5 years agopowerpc/watchpoint: Rename current H_SET_MODE DAWR macro
Ravi Bangoria [Thu, 23 Jul 2020 09:08:10 +0000 (14:38 +0530)]
powerpc/watchpoint: Rename current H_SET_MODE DAWR macro

Current H_SET_MODE hcall macro name for setting/resetting DAWR0 is
H_SET_MODE_RESOURCE_SET_DAWR. Add suffix 0 to macro name as well.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-8-ravi.bangoria@linux.ibm.com
5 years agopowerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit
Ravi Bangoria [Thu, 23 Jul 2020 09:08:09 +0000 (14:38 +0530)]
powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit

As per the PAPR, bit 0 of byte 64 in pa-features property indicates
availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
DAWR is present, otherwise not. Host generally uses "cpu-features",
which masks "pa-features". But "cpu-features" are still not used for
guests and thus this change is mostly applicable for guests only.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-7-ravi.bangoria@linux.ibm.com
5 years agopowerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
Ravi Bangoria [Thu, 23 Jul 2020 09:08:08 +0000 (14:38 +0530)]
powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR

Add new device-tree feature for 2nd DAWR. If this feature is present,
2nd DAWR is supported, otherwise not.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-6-ravi.bangoria@linux.ibm.com
5 years agopowerpc/watchpoint: Enable watchpoint functionality on power10 guest
Ravi Bangoria [Thu, 23 Jul 2020 09:08:07 +0000 (14:38 +0530)]
powerpc/watchpoint: Enable watchpoint functionality on power10 guest

CPU_FTR_DAWR is by default enabled for host via CPU_FTRS_DT_CPU_BASE
(controlled by CONFIG_PPC_DT_CPU_FTRS). But cpu-features device-tree
node is not PAPR compatible and thus not yet used by kvm or pHyp
guests. Enable watchpoint functionality on power10 guest (both kvm
and powervm) by adding CPU_FTR_DAWR to CPU_FTRS_POWER10. Note that
this change does not enable 2nd DAWR support.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-5-ravi.bangoria@linux.ibm.com
5 years agopowerpc/watchpoint: Fix DAWR exception for CACHEOP
Ravi Bangoria [Thu, 23 Jul 2020 09:08:06 +0000 (14:38 +0530)]
powerpc/watchpoint: Fix DAWR exception for CACHEOP

'ea' returned by analyse_instr() needs to be aligned down to cache
block size for CACHEOP instructions. analyse_instr() does not set
size for CACHEOP, thus size also needs to be calculated manually.

Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly")
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-4-ravi.bangoria@linux.ibm.com
5 years agopowerpc/watchpoint: Fix DAWR exception constraint
Ravi Bangoria [Thu, 23 Jul 2020 09:08:05 +0000 (14:38 +0530)]
powerpc/watchpoint: Fix DAWR exception constraint

Pedro Miraglia Franco de Carvalho noticed that on p8/p9, DAR value is
inconsistent with different type of load/store. Like for byte,word
etc. load/stores, DAR is set to the address of the first byte of
overlap between watch range and real access. But for quadword load/
store it's sometime set to the address of the first byte of real
access whereas sometime set to the address of the first byte of
overlap. This issue has been fixed in p10. In p10(ISA 3.1), DAR is
always set to the address of the first byte of overlap. Commit 27985b2a640e
("powerpc/watchpoint: Don't ignore extraneous exceptions blindly")
wrongly assumes that DAR is set to the address of the first byte of
overlap for all load/stores on p8/p9 as well. Fix that. With the fix,
we now rely on 'ea' provided by analyse_instr(). If analyse_instr()
fails, generate event unconditionally on p8/p9, and on p10 generate
event only if DAR is within a DAWR range.

Note: 8xx is not affected.

Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly")
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@br.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-3-ravi.bangoria@linux.ibm.com
5 years agopowerpc/watchpoint: Fix 512 byte boundary limit
Ravi Bangoria [Thu, 23 Jul 2020 09:08:04 +0000 (14:38 +0530)]
powerpc/watchpoint: Fix 512 byte boundary limit

Milton Miller reported that we are aligning start and end address to
wrong size SZ_512M. It should be SZ_512. Fix that.

While doing this change I also found a case where ALIGN() comparison
fails. Within a given aligned range, ALIGN() of two addresses does not
match when start address is pointing to the first byte and end address
is pointing to any other byte except the first one. But that's not true
for ALIGN_DOWN(). ALIGN_DOWN() of any two addresses within that range
will always point to the first byte. So use ALIGN_DOWN() instead of
ALIGN().

Fixes: e68ef121c1f4 ("powerpc/watchpoint: Use builtin ALIGN*() macros")
Reported-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-2-ravi.bangoria@linux.ibm.com
5 years agopowerpc/book3s64/pkey: Disable pkey on POWER6 and before
Aneesh Kumar K.V [Sun, 26 Jul 2020 13:25:17 +0000 (18:55 +0530)]
powerpc/book3s64/pkey: Disable pkey on POWER6 and before

POWER6 only supports AMR update via privileged mode (MSR[PR] = 0,
SPRN_AMR=29) The PR=1 (userspace) alias for that SPR (SPRN_AMR=13) was
only supported from POWER7. Since we don't allow userspace modifying
of AMR value we should disable pkey support on P6 and before.

The hypervisor will still report pkey support via
"ibm,processor-storage-keys". Hence also check for P7 CPU_FTR bit to
decide on pkey support.

Fixes: f491fe3fb41e ("powerpc/book3s64/pkeys: Simplify the key initialization")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726132517.399076-1-aneesh.kumar@linux.ibm.com
5 years agopowerpc/sstep: Fix incorrect CONFIG symbol in scv handling
Michael Ellerman [Fri, 24 Jul 2020 10:42:54 +0000 (20:42 +1000)]
powerpc/sstep: Fix incorrect CONFIG symbol in scv handling

When I "fixed" the ppc64e build in Nick's recent patch, I typoed the
CONFIG symbol, resulting in one that doesn't exist. Fix it to use the
correct symbol.

Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131609.1640533-1-mpe@ellerman.id.au
5 years agopowerpc/test_emulate_sstep: Fix build error
Michael Ellerman [Thu, 23 Jul 2020 23:02:26 +0000 (09:02 +1000)]
powerpc/test_emulate_sstep: Fix build error

ppc64_book3e_allmodconfig fails with:

  arch/powerpc/lib/test_emulate_step.c: In function 'test_pld':
  arch/powerpc/lib/test_emulate_step.c:113:7: error: implicit declaration of function 'cpu_has_feature'
    113 |  if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
        |       ^~~~~~~~~~~~~~~

Add an include of cpu_has_feature.h to fix it.

Fixes: b6b54b42722a ("powerpc/sstep: Add tests for prefixed integer load/stores")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724004109.1461709-1-mpe@ellerman.id.au
5 years agoMerge branch 'scv' support into next
Michael Ellerman [Thu, 23 Jul 2020 07:43:44 +0000 (17:43 +1000)]
Merge branch 'scv' support into next

From Nick's cover letter:

Linux powerpc new system call instruction and ABI

System Call Vectored (scv) ABI
==============================

The scv instruction is introduced with POWER9 / ISA3, it comes with an
rfscv counter-part. The benefit of these instructions is
performance (trading slower SRR0/1 with faster LR/CTR registers, and
entering the kernel with MSR[EE] and MSR[RI] left enabled, which can
reduce MSR updates. The scv instruction has 128 levels (not enough to
cover the Linux system call space).

Assignment and advertisement
----------------------------
The proposal is to assign scv levels conservatively, and advertise
them with HWCAP feature bits as we add support for more.

Linux has not enabled FSCR[SCV] yet, so executing the scv instruction
will cause the kernel to log a "SCV facility unavilable" message, and
deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a
HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it.

This change allocates the zero level ('scv 0'), advertised with
PPC_FEATURE2_SCV, which will be used to provide normal Linux system
calls (equivalent to 'sc').

Attempting to execute scv with other levels will cause a SIGILL to be
delivered the same as before, but will not log a "SCV facility
unavailable" message (because the processor facility is enabled).

Calling convention
------------------
The proposal is for scv 0 to provide the standard Linux system call
ABI with the following differences from sc convention[1]:

- LR is to be volatile across scv calls. This is necessary because the
  scv instruction clobbers LR. From previous discussion, this should
  be possible to deal with in GCC clobbers and CFI.

- cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow
  the kernel system call exit to avoid restoring the volatile cr
  registers (although we probably still would anyway to avoid
  information leaks).

- Error handling: The consensus among kernel, glibc, and musl is to
  move to using negative return values in r3 rather than CR0[SO]=1 to
  indicate error, which matches most other architectures, and is
  closer to a function call.

Notes
-----
- r0,r4-r8 are documented as volatile in the ABI, but the kernel patch
  as submitted currently preserves them. This is to leave room for
  deciding which way to go with these. Some small benefit was found by
  preserving them[1] but I'm not convinced it's worth deviating from
  the C function call ABI just for this. Release code should follow
  the ABI.

Previous discussions:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html

[1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html

5 years agoselftests/powerpc: Add test of memcmp at end of page
Michael Ellerman [Wed, 22 Jul 2020 05:53:15 +0000 (15:53 +1000)]
selftests/powerpc: Add test of memcmp at end of page

Update our memcmp selftest, to test the case where we're comparing up
to the end of a page and the subsequent page is not mapped. We have to
make sure we don't read off the end of the page and cause a fault.

We had a bug there in the past, fixed in commit
d9470757398a ("powerpc/64: Fix memcmp reading past the end of src/dest").

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722055315.962391-1-mpe@ellerman.id.au