www.infradead.org Git - users/hch/misc.git/log
4 years ago powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP
Christophe Leroy [Thu, 8 Jul 2021 16:49:43 +0000 (16:49 +0000)]
powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP

This patch converts powerpc to the generic PTDUMP implementation.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/03166d569526be70214fe9370a7bad219d2f41c8.1625762907.git.christophe.leroy@csgroup.eu
4 years ago powerpc/ptdump: Reduce level numbers by 1 in note_page() and add p4d level
Christophe Leroy [Thu, 8 Jul 2021 16:49:42 +0000 (16:49 +0000)]
powerpc/ptdump: Reduce level numbers by 1 in note_page() and add p4d level

Do the same as commit f8f0d0b6fa20 ("mm: ptdump: reduce level numbers
by 1 in note_page()") and add missing p4d level.

This will align powerpc to the users of generic ptdump.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d76495c574132b197b445a1f133755cca4b912a4.1625762906.git.christophe.leroy@csgroup.eu
4 years ago powerpc/ptdump: Remove unused 'page_size' parameter
Christophe Leroy [Thu, 8 Jul 2021 16:49:41 +0000 (16:49 +0000)]
powerpc/ptdump: Remove unused 'page_size' parameter

note_page_update_state() doesn't use page_size. Remove it.

It could also be removed from note_page(), but as a following patch
will remove all current users of note_page(), just leave it as
is for now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e2f80d052001155251bfe009c360d0c5d9242c6b.1625762906.git.christophe.leroy@csgroup.eu
4 years ago powerpc/ptdump: Use DEFINE_SHOW_ATTRIBUTE()
Christophe Leroy [Thu, 8 Jul 2021 16:49:40 +0000 (16:49 +0000)]
powerpc/ptdump: Use DEFINE_SHOW_ATTRIBUTE()

Use DEFINE_SHOW_ATTRIBUTE() instead of open coding
open() and fops.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b864a92693ca8413ef0b19f0c12065c212899b6e.1625762905.git.christophe.leroy@csgroup.eu
4 years ago powerpc: Avoid link stack corruption in misc asm functions
Christophe Leroy [Tue, 24 Aug 2021 07:56:35 +0000 (07:56 +0000)]
powerpc: Avoid link stack corruption in misc asm functions

The bl;mflr sequence is used in several places to get the current code position.

Use bcl 20,31,+4 instead of bl in order to preserve link stack.

See commit c974809a26a1 ("powerpc/vdso: Avoid link stack corruption
in __get_datapage()") for details.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c6eabb4fb6c156f75d56dcbcc6f243e5ac0fba42.1629791763.git.christophe.leroy@csgroup.eu
4 years ago powerpc/booke: Avoid link stack corruption in several places
Christophe Leroy [Tue, 24 Aug 2021 07:56:26 +0000 (07:56 +0000)]
powerpc/booke: Avoid link stack corruption in several places

Use bcl 20,31,+4 instead of bl in order to preserve link stack.

See commit c974809a26a1 ("powerpc/vdso: Avoid link stack corruption
in __get_datapage()") for details.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e9fbc285eceb720e6c0e032ef47fe8b05f669b48.1629791751.git.christophe.leroy@csgroup.eu
4 years ago powerpc/32: indirect function call use bctrl rather than blrl in ret_from_kernel_thread
Christophe Leroy [Fri, 20 Aug 2021 05:16:05 +0000 (05:16 +0000)]
powerpc/32: indirect function call use bctrl rather than blrl in ret_from_kernel_thread

Copied from commit 89bbe4c798bc ("powerpc/64: indirect function call
use bctrl rather than blrl in ret_from_kernel_thread")

Using blrl for an indirect function call is not recommended, as it may
corrupt the link stack predictor.

This is not a performance critical path but this should be fixed for
consistency.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/91b1d242525307ceceec7ef6e832bfbacdd4501b.1629436472.git.christophe.leroy@csgroup.eu
4 years ago powerpc/compat_sys: Declare syscalls
Cédric Le Goater [Mon, 23 Aug 2021 09:00:39 +0000 (11:00 +0200)]
powerpc/compat_sys: Declare syscalls

This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210823090039.166120-3-clg@kaod.org
4 years ago powerpc/prom: Fix unused variable ‘reserve_map’ when CONFIG_PPC32 is not set
Cédric Le Goater [Mon, 23 Aug 2021 09:00:38 +0000 (11:00 +0200)]
powerpc/prom: Fix unused variable ‘reserve_map’ when CONFIG_PPC32 is not set

This fixes a compile error with W=1.

arch/powerpc/kernel/prom.c: In function ‘early_reserve_mem’:
arch/powerpc/kernel/prom.c:625:10: error: variable ‘reserve_map’ set but not used [-Werror=unused-but-set-variable]
  __be64 *reserve_map;
          ^~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210823090039.166120-2-clg@kaod.org
4 years ago powerpc/syscalls: Remove __NR__exit
Christophe Leroy [Mon, 23 Aug 2021 06:45:20 +0000 (06:45 +0000)]
powerpc/syscalls: Remove __NR__exit

__NR__exit is not used anywhere. On most architectures it was removed by
commit 135ab6ec8fda ("[PATCH] remove remaining errno and
__KERNEL_SYSCALLS__ references") but not on powerpc.

powerpc removed __KERNEL_SYSCALLS__ in commit 3db03b4afb3e ("[PATCH]
rename the provided execve functions to kernel_execve"), but __NR__exit
was left over.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6457eb4f327313323ed1f70e540bbb4ddc9178fa.1629701106.git.christophe.leroy@csgroup.eu
4 years ago powerpc/audit: Simplify syscall_get_arch()
Christophe Leroy [Fri, 20 Aug 2021 09:39:14 +0000 (09:39 +0000)]
powerpc/audit: Simplify syscall_get_arch()

Make use of is_32bit_task() and CONFIG_CPU_LITTLE_ENDIAN
to simplify syscall_get_arch().
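
As a rough illustration of the simplified logic, here is a minimal
userspace sketch. It is not the kernel code: the two boolean parameters
stand in for is_32bit_task() and CONFIG_CPU_LITTLE_ENDIAN, and the SKETCH_*
constants are local stand-ins assumed to mirror the AUDIT_ARCH_* values in
the kernel's UAPI headers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-ins (assumptions) for the ELF machine numbers and audit
 * flags, defined here so the sketch builds outside the kernel tree.
 */
#define SKETCH_EM_PPC			20u
#define SKETCH_EM_PPC64			21u
#define SKETCH_AUDIT_ARCH_64BIT		0x80000000u
#define SKETCH_AUDIT_ARCH_LE		0x40000000u

#define SKETCH_AUDIT_ARCH_PPC		(SKETCH_EM_PPC)
#define SKETCH_AUDIT_ARCH_PPC64		(SKETCH_EM_PPC64 | SKETCH_AUDIT_ARCH_64BIT)
#define SKETCH_AUDIT_ARCH_PPC64LE	(SKETCH_EM_PPC64 | SKETCH_AUDIT_ARCH_64BIT | \
					 SKETCH_AUDIT_ARCH_LE)

/*
 * Hypothetical model of syscall_get_arch(): a 32-bit task reports the
 * 32-bit arch, otherwise the build-time endianness picks the 64-bit one.
 */
static uint32_t sketch_syscall_get_arch(bool task_is_32bit, bool cpu_little_endian)
{
	if (task_is_32bit)
		return SKETCH_AUDIT_ARCH_PPC;
	if (cpu_little_endian)
		return SKETCH_AUDIT_ARCH_PPC64LE;
	return SKETCH_AUDIT_ARCH_PPC64;
}
```

The point of the shape is that no #ifdef is needed: both inputs are
plain conditions the compiler can fold away at build time.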

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4be53b9187a4d8c163968f4d224267e41a7fcc33.1629451479.git.christophe.leroy@csgroup.eu
4 years ago powerpc/audit: Avoid unneccessary #ifdef in syscall_get_arguments()
Christophe Leroy [Fri, 20 Aug 2021 09:28:19 +0000 (09:28 +0000)]
powerpc/audit: Avoid unneccessary #ifdef in syscall_get_arguments()

Use is_32bit_task() which already handles CONFIG_COMPAT.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ba49cdd574558a0363300c3f6b5b062b397cb071.1629451483.git.christophe.leroy@csgroup.eu
4 years ago KVM: PPC: Book3S PR: Remove unused variable
Cédric Le Goater [Thu, 19 Aug 2021 12:56:54 +0000 (14:56 +0200)]
KVM: PPC: Book3S PR: Remove unused variable

This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-5-clg@kaod.org
4 years ago KVM: PPC: Book3S PR: Declare kvmppc_handle_exit_pr()
Cédric Le Goater [Thu, 19 Aug 2021 12:56:53 +0000 (14:56 +0200)]
KVM: PPC: Book3S PR: Declare kvmppc_handle_exit_pr()

This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-4-clg@kaod.org
4 years ago powerpc/pseries/vas: Declare pseries_vas_fault_thread_fn() as static
Cédric Le Goater [Thu, 19 Aug 2021 12:56:52 +0000 (14:56 +0200)]
powerpc/pseries/vas: Declare pseries_vas_fault_thread_fn() as static

This fixes a compile error with W=1.

Fixes: 6d0aaf5e0de0 ("powerpc/pseries/vas: Setup IRQ and fault handling")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-3-clg@kaod.org
4 years ago powerpc/perf/hv-gpci: Fix counter value parsing
Kajol Jain [Fri, 13 Aug 2021 08:21:58 +0000 (13:51 +0530)]
powerpc/perf/hv-gpci: Fix counter value parsing

The H_GetPerformanceCounterInfo (0xF080) hcall returns the counter data in
the result buffer. The result buffer has a specific format defined in the PAPR
specification. Among its fields are the offset and width of the
counter data returned.

Counter data are returned in an unsigned char array in big endian byte
order. To get the final counter data, the values must be left shifted
a byte at a time. But commit 220a0c609ad17 ("powerpc/perf: Add support for
the hv gpci (get performance counter info) interface") made the shifting
bitwise and also assumed little endian order. Because of that, hcall
counter values are reported incorrectly.

In particular this can lead to counters going backwards, which messes up the
counter prev vs now calculation and leads to huge counter values being
reported:

  #: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
           -C 0 -I 1000
        time             counts unit events
     1.000078854 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     2.000213293                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     3.000320107                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     4.000428392                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     5.000537864                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     6.000649087                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     7.000760312                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     8.000865218             16,448      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     9.000978985 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    10.001088891             16,384      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    11.001201435                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    12.001307937 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/

Fix the shifting logic to correctly match the format, i.e. read bytes in
big endian order.
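
The byte-at-a-time shifting can be sketched as follows. This is a hedged
userspace illustration, not the hv-gpci code itself: the helper name and
its parameters are hypothetical, and the width is assumed to be at most
8 bytes so the result fits in 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold `width` bytes starting at `offset` into one
 * value, most significant byte first (big endian). Each step shifts the
 * accumulated value left by a whole byte, not by a single bit.
 */
static uint64_t sketch_read_be_counter(const unsigned char *buf,
				       size_t offset, size_t width)
{
	uint64_t value = 0;
	size_t i;

	for (i = 0; i < width; i++)
		value = (value << 8) | buf[offset + i];

	return value;
}
```

With this shape, the four bytes 00 00 40 40 decode to 0x4040 regardless of
the endianness of the CPU running the code.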

Fixes: e4f226b1580b ("powerpc/perf/hv-gpci: Increase request buffer size")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210813082158.429023-1-kjain@linux.ibm.com
4 years ago powerpc/tau: Add 'static' storage qualifier to 'tau_work' definition
Finn Thain [Thu, 19 Aug 2021 00:46:54 +0000 (10:46 +1000)]
powerpc/tau: Add 'static' storage qualifier to 'tau_work' definition

This patch prevents the following sparse warning.

arch/powerpc/kernel/tau_6xx.c:199:1: sparse: sparse: symbol 'tau_work'
was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/44ab381741916a51e783c4a50d0b186abdd8f280.1629334014.git.fthain@linux-m68k.org
4 years ago powerpc/kvm: Remove obsolete and unneeded select
Lukas Bulwahn [Thu, 19 Aug 2021 11:39:53 +0000 (13:39 +0200)]
powerpc/kvm: Remove obsolete and unneeded select

Commit a278e7ea608b ("powerpc: Fix compile issue with force DAWR")
selects the non-existent config PPC_DAWR_FORCE_ENABLE for config
KVM_BOOK3S_64_HANDLER. As this commit also introduces a config PPC_DAWR
and this config PPC_DAWR is selected by PPC if PPC64, there is no
need for any further select in KVM_BOOK3S_64_HANDLER.

Remove an obsolete and unneeded select in config KVM_BOOK3S_64_HANDLER.

The issue was identified with ./scripts/checkkconfigsymbols.py.

Fixes: a278e7ea608b ("powerpc: Fix compile issue with force DAWR")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819113954.17515-2-lukas.bulwahn@gmail.com
4 years ago powerpc/32: Remove unneccessary calculations in load_up_{fpu/altivec}
Christophe Leroy [Wed, 18 Aug 2021 08:47:28 +0000 (08:47 +0000)]
powerpc/32: Remove unneccessary calculations in load_up_{fpu/altivec}

No need to re-read SPRN_THREAD, we can calculate thread address
from current (r2).

And remove a reload of value 1 into r4 as r4 is already 1.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c04cce578b97a76a9e69a096698b1d89f721768a.1629276437.git.christophe.leroy@csgroup.eu
4 years ago selftests/powerpc: Remove duplicated include from tm-poison.c
Zheng Yongjun [Fri, 26 Mar 2021 06:48:08 +0000 (14:48 +0800)]
selftests/powerpc: Remove duplicated include from tm-poison.c

Remove duplicated include.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210326064808.3262568-1-zhengyongjun3@huawei.com
4 years ago powerpc: Remove duplicate includes
Wan Jiabing [Tue, 23 Mar 2021 06:29:05 +0000 (14:29 +0800)]
powerpc: Remove duplicate includes

interrupt.c: asm/interrupt.h has been included at line 12, so remove the
duplicate one at line 10.

time.c: linux/sched/clock.h has been included at line 33, so remove the
duplicate one at line 56 and move sched/cputime.h under the sched include
segment.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210323062916.295346-1-wanjiabing@vivo.com
4 years ago powerpc/configs: Regenerate mpc885_ads_defconfig
Joel Stanley [Tue, 17 Aug 2021 04:54:07 +0000 (14:24 +0930)]
powerpc/configs: Regenerate mpc885_ads_defconfig

Regenerate atop v5.14-rc6 by doing a make savedefconfig.

The changes are re-ordering except for the following (which are still set
indirectly):

 - CONFIG_DEBUG_KERNEL=y selected by EXPERT

 - CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008 which is the default
   setting

Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817045407.2445664-4-joel@jms.id.au
4 years ago powerpc/config: Renable MTD_PHYSMAP_OF
Joel Stanley [Tue, 17 Aug 2021 04:54:06 +0000 (14:24 +0930)]
powerpc/config: Renable MTD_PHYSMAP_OF

CONFIG_MTD_PHYSMAP_OF is no longer enabled as it depends on
MTD_PHYSMAP which is not enabled.

This is a regression from commit 642b1e8dbed7 ("mtd: maps: Merge
physmap_of.c into physmap-core.c"), which added the extra dependency.
Add CONFIG_MTD_PHYSMAP=y so this stays in the config, as Christophe said
it is useful for build coverage.

Fixes: 642b1e8dbed7 ("mtd: maps: Merge physmap_of.c into physmap-core.c")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817045407.2445664-3-joel@jms.id.au
4 years ago powerpc/config: Fix IPV6 warning in mpc855_ads
Joel Stanley [Tue, 17 Aug 2021 04:54:05 +0000 (14:24 +0930)]
powerpc/config: Fix IPV6 warning in mpc855_ads

When building this config there's a warning:

  79:warning: override: reassigning to symbol IPV6

Commit 9a1762a4a4ff ("powerpc/8xx: Update mpc885_ads_defconfig to
improve CI") added CONFIG_IPV6=y, but left '# CONFIG_IPV6 is not set'
in.

IPV6 is default y, so remove both to clean up the build.

Fixes: 9a1762a4a4ff ("powerpc/8xx: Update mpc885_ads_defconfig to improve CI")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817045407.2445664-2-joel@jms.id.au
4 years ago powerpc/head_check: Fix shellcheck errors
Michael Ellerman [Mon, 16 Aug 2021 06:36:02 +0000 (16:36 +1000)]
powerpc/head_check: Fix shellcheck errors

Replace "cat file | grep pattern" with "grep pattern file", and quote a
few variables. Together that fixes all shellcheck errors.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817125154.3369884-1-mpe@ellerman.id.au
4 years ago powerpc/head_check: use stdout for error messages
Randy Dunlap [Sun, 15 Aug 2021 22:23:34 +0000 (15:23 -0700)]
powerpc/head_check: use stdout for error messages

Prefer stderr to stdout for error messages.
This is good practice and can help CI error detection and
reporting (0day in this case).

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210815222334.9575-1-rdunlap@infradead.org
4 years ago powerpc/pseries: Fix build error when NUMA=n
Michael Ellerman [Mon, 16 Aug 2021 02:30:11 +0000 (12:30 +1000)]
powerpc/pseries: Fix build error when NUMA=n

As reported by lkp, if NUMA=n we see a build error:

   arch/powerpc/platforms/pseries/hotplug-cpu.c: In function 'pseries_cpu_hotplug_init':
   arch/powerpc/platforms/pseries/hotplug-cpu.c:1022:8: error: 'node_to_cpumask_map' undeclared
    1022 |        node_to_cpumask_map[node]);

Use cpumask_of_node() which has an empty stub for NUMA=n, and when
NUMA=y does a lookup from node_to_cpumask_map[].
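
The stub pattern relied on here can be sketched in userspace as below.
This is an illustration under assumptions, not the kernel headers: every
sketch_* and SKETCH_* name is hypothetical, a 64-bit integer replaces
struct cpumask, and the fallback of returning the online mask when the
option is off is an assumption of the sketch.

```c
#include <stdint.h>

/* Toy replacement for struct cpumask: one bit per CPU. */
typedef uint64_t sketch_cpumask_t;

#define SKETCH_MAX_NODES 4

/* Pretend CPUs 0-7 are online. */
static const sketch_cpumask_t sketch_online_mask = 0xffu;

#ifdef SKETCH_CONFIG_NUMA
/* With the option on, the per-node table exists and is looked up. */
static const sketch_cpumask_t sketch_node_to_cpumask_map[SKETCH_MAX_NODES] = {
	0x0fu, 0xf0u, 0, 0
};

static inline sketch_cpumask_t sketch_cpumask_of_node(int node)
{
	return sketch_node_to_cpumask_map[node];
}
#else
/*
 * With the option off, the table is never declared. Callers still
 * compile because they only ever name the accessor, never the table.
 */
static inline sketch_cpumask_t sketch_cpumask_of_node(int node)
{
	(void)node;
	return sketch_online_mask;
}
#endif
```

Code that calls the accessor builds in both configurations, while code
that indexes the table directly fails to build as soon as the option is
disabled, which is the class of error reported above.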

Fixes: bd1dd4c5f528 ("powerpc/pseries: Prevent free CPU ids being reused on another node")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210816041032.2839343-1-mpe@ellerman.id.au
4 years ago powerpc: Add "-z notext" flag to disable diagnostic
Fangrui Song [Fri, 13 Aug 2021 20:05:11 +0000 (13:05 -0700)]
powerpc: Add "-z notext" flag to disable diagnostic

Object files used to link .tmp_vmlinux.kallsyms1 have many
R_PPC64_ADDR64 relocations in non-SHF_WRITE sections. These are
text relocations (e.g. in .rela___ksymtab_gpl+* and .rela__mcount_loc
sections) in a -pie link, and LLD disallows them:

  ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against local symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output
  >>> defined in arch/powerpc/kernel/head_64.o
  >>> referenced by arch/powerpc/kernel/head_64.o:(__restart_table+0x10)

Newer GNU ld configured with "--enable-textrel-check=error" will report
an error as well:

  $ ld-new -EL -m elf64lppc -pie ... -o .tmp_vmlinux.kallsyms1 ...
  ld-new: read-only segment has dynamic relocations

Add "-z notext" to suppress the errors. Non-CONFIG_RELOCATABLE builds
use the default -no-pie mode and thus R_PPC64_ADDR64 relocations can be
resolved at link-time.

Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Co-developed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Bill Wendling <morbo@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210813200511.1905703-1-morbo@google.com
4 years ago powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
Christophe Leroy [Tue, 13 Apr 2021 16:38:10 +0000 (16:38 +0000)]
powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
flexibility to GCC.

For that, add an entry to the exception table so that
program_check_exception() knows where to resume execution
after a WARNING.

Here are two examples. The first one is done on PPC32 (which
benefits from the previous patch), the second is on PPC64.

unsigned long test(struct pt_regs *regs)
{
	int ret;

	WARN_ON(regs->msr & MSR_PR);

	return regs->gpr[3];
}

unsigned long test9w(unsigned long a, unsigned long b)
{
	if (WARN_ON(!b))
		return 0;
	return a / b;
}

Before the patch:

000003a8 <test>:
 3a8: 81 23 00 84  lwz     r9,132(r3)
 3ac: 71 29 40 00  andi.   r9,r9,16384
 3b0: 40 82 00 0c  bne     3bc <test+0x14>
 3b4: 80 63 00 0c  lwz     r3,12(r3)
 3b8: 4e 80 00 20  blr

 3bc: 0f e0 00 00  twui    r0,0
 3c0: 80 63 00 0c  lwz     r3,12(r3)
 3c4: 4e 80 00 20  blr

0000000000000bf0 <.test9w>:
 bf0: 7c 89 00 74  cntlzd  r9,r4
 bf4: 79 29 d1 82  rldicl  r9,r9,58,6
 bf8: 0b 09 00 00  tdnei   r9,0
 bfc: 2c 24 00 00  cmpdi   r4,0
 c00: 41 82 00 0c  beq     c0c <.test9w+0x1c>
 c04: 7c 63 23 92  divdu   r3,r3,r4
 c08: 4e 80 00 20  blr

 c0c: 38 60 00 00  li      r3,0
 c10: 4e 80 00 20  blr

After the patch:

000003a8 <test>:
 3a8: 81 23 00 84  lwz     r9,132(r3)
 3ac: 71 29 40 00  andi.   r9,r9,16384
 3b0: 40 82 00 0c  bne     3bc <test+0x14>
 3b4: 80 63 00 0c  lwz     r3,12(r3)
 3b8: 4e 80 00 20  blr

 3bc: 0f e0 00 00  twui    r0,0

0000000000000c50 <.test9w>:
 c50: 7c 89 00 74  cntlzd  r9,r4
 c54: 79 29 d1 82  rldicl  r9,r9,58,6
 c58: 0b 09 00 00  tdnei   r9,0
 c5c: 7c 63 23 92  divdu   r3,r3,r4
 c60: 4e 80 00 20  blr

 c70: 38 60 00 00  li      r3,0
 c74: 4e 80 00 20  blr

In the first example, we see GCC doesn't need to duplicate what
happens after the trap.

In the second example, we see that GCC doesn't need to emit a test
and a branch in the likely path in addition to the trap.

We've got some WARN_ON() in the .softirqentry.text section, so it needs
to be added to OTHER_TEXT_SECTIONS in modpost.c.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/389962b1b702e3c78d169e59bcfac56282889173.1618331882.git.christophe.leroy@csgroup.eu
4 years ago powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
Christophe Leroy [Tue, 13 Apr 2021 16:38:09 +0000 (16:38 +0000)]
powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

powerpc BUG_ON() and WARN_ON() are based on using the twnei instruction.

For catching simple conditions like a variable having value 0, this
is efficient because it does the test and the trap at the same time.
But most conditions used with BUG_ON or WARN_ON are more complex and
force GCC to format the condition into a 0 or 1 value in a register.
This will usually require 2 to 3 instructions.

The most efficient solution would be to use __builtin_trap() because
GCC is able to optimise the use of the different trap instructions
based on the requested condition, but this is complex if not
impossible for the following reasons:
- __builtin_trap() is a non-recoverable instruction, so it can't be
used for WARN_ON
- Knowing which line of code generated the trap would require the
analysis of DWARF information. This is not a feature we have today.

As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
the way WARN_ON() is implemented is suboptimal. That commit also
mentions an issue with 'long long' condition. It fixed it for
WARN_ON() but the same problem still exists today with BUG_ON() on
PPC32. It will be fixed by using the generic implementation.

By using the generic implementation, gcc will naturally generate a
branch to the unconditional trap generated by BUG().

As modern powerpc implements zero-cycle branches,
that's even more efficient.

And for the functions using WARN_ON() and its return, the test
on return from WARN_ON() is now also used for the WARN_ON() itself.
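
The shape of a generic, branch-based WARN_ON() can be sketched in userspace
as below. This is an illustration under assumptions, not the kernel macro:
SKETCH_WARN_ON and sketch_warn_count are hypothetical names, a counter
replaces the real trap and report, and the statement expression requires
GCC or Clang.

```c
/* Counts warnings; stands in for the trap and report done by the kernel. */
static int sketch_warn_count;

/*
 * Evaluate the condition once, take the reporting action on the path
 * marked unlikely, and yield the truth value so the caller can branch
 * on the very same test.
 */
#define SKETCH_WARN_ON(cond) ({					\
	int ret_warn_on_ = !!(cond);				\
	if (__builtin_expect(ret_warn_on_, 0))			\
		sketch_warn_count++;				\
	__builtin_expect(ret_warn_on_, 0);			\
})

/* Same test function as in the commit message, using the sketch macro. */
static unsigned long sketch_test9w(unsigned long a, unsigned long b)
{
	if (SKETCH_WARN_ON(!b))
		return 0;
	return a / b;
}
```

Because the macro's result is the condition itself, the compiler can merge
the warning check and the caller's early return into a single compare and
branch, keeping the reporting code off the likely path.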

On PPC64 we don't want it because we want to be able to use the CFAR
register to track how we entered the code that trapped. The CFAR
register would be clobbered by the branch.

A simple test function:

unsigned long test9w(unsigned long a, unsigned long b)
{
	if (WARN_ON(!b))
		return 0;
	return a / b;
}

Before the patch:

0000046c <test9w>:
 46c: 7c 89 00 34  cntlzw  r9,r4
 470: 55 29 d9 7e  rlwinm  r9,r9,27,5,31
 474: 0f 09 00 00  twnei   r9,0
 478: 2c 04 00 00  cmpwi   r4,0
 47c: 41 82 00 0c  beq     488 <test9w+0x1c>
 480: 7c 63 23 96  divwu   r3,r3,r4
 484: 4e 80 00 20  blr

 488: 38 60 00 00  li      r3,0
 48c: 4e 80 00 20  blr

After the patch:

00000468 <test9w>:
 468: 2c 04 00 00  cmpwi   r4,0
 46c: 41 82 00 0c  beq     478 <test9w+0x10>
 470: 7c 63 23 96  divwu   r3,r3,r4
 474: 4e 80 00 20  blr

 478: 0f e0 00 00  twui    r0,0
 47c: 38 60 00 00  li      r3,0
 480: 4e 80 00 20  blr

So we see before the patch we need 3 instructions on the likely path
to handle the WARN_ON(). With the patch the trap goes on the unlikely
path.

See below the difference at the entry of system_call_exception where
we have several BUG_ON(), although it is less impressive.

With the patch:

00000000 <system_call_exception>:
   0: 81 6a 00 84  lwz     r11,132(r10)
   4: 90 6a 00 88  stw     r3,136(r10)
   8: 71 60 00 02  andi.   r0,r11,2
   c: 41 82 00 70  beq     7c <system_call_exception+0x7c>
  10: 71 60 40 00  andi.   r0,r11,16384
  14: 41 82 00 6c  beq     80 <system_call_exception+0x80>
  18: 71 6b 80 00  andi.   r11,r11,32768
  1c: 41 82 00 68  beq     84 <system_call_exception+0x84>
  20: 94 21 ff e0  stwu    r1,-32(r1)
  24: 93 e1 00 1c  stw     r31,28(r1)
  28: 7d 8c 42 e6  mftb    r12
...
  7c: 0f e0 00 00  twui    r0,0
  80: 0f e0 00 00  twui    r0,0
  84: 0f e0 00 00  twui    r0,0

Without the patch:

00000000 <system_call_exception>:
   0: 94 21 ff e0  stwu    r1,-32(r1)
   4: 93 e1 00 1c  stw     r31,28(r1)
   8: 90 6a 00 88  stw     r3,136(r10)
   c: 81 6a 00 84  lwz     r11,132(r10)
  10: 69 60 00 02  xori    r0,r11,2
  14: 54 00 ff fe  rlwinm  r0,r0,31,31,31
  18: 0f 00 00 00  twnei   r0,0
  1c: 69 60 40 00  xori    r0,r11,16384
  20: 54 00 97 fe  rlwinm  r0,r0,18,31,31
  24: 0f 00 00 00  twnei   r0,0
  28: 69 6b 80 00  xori    r11,r11,32768
  2c: 55 6b 8f fe  rlwinm  r11,r11,17,31,31
  30: 0f 0b 00 00  twnei   r11,0
  34: 7d 8c 42 e6  mftb    r12

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b286e07fb771a664b631cd07a40b09c06f26e64b.1618331881.git.christophe.leroy@csgroup.eu
4 years ago powerpc/pseries: Add support for FORM2 associativity
Aneesh Kumar K.V [Thu, 12 Aug 2021 13:22:23 +0000 (18:52 +0530)]
powerpc/pseries: Add support for FORM2 associativity

PAPR interface currently supports two different ways of communicating resource
grouping details to the OS. These are referred to as Form 0 and Form 1
associativity grouping. Form 0 is the older format and is now considered
deprecated. This patch adds another resource grouping named FORM2.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-6-aneesh.kumar@linux.ibm.com
4 years ago powerpc/pseries: Add a helper for form1 cpu distance
Aneesh Kumar K.V [Thu, 12 Aug 2021 13:22:22 +0000 (18:52 +0530)]
powerpc/pseries: Add a helper for form1 cpu distance

This helper is only used with the dispatch trace log collection.
A later patch will add Form2 affinity support and this change helps
in keeping that simpler. Also add a comment explaining we don't expect
the code to be called with FORM0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-5-aneesh.kumar@linux.ibm.com
4 years ago powerpc/pseries: Consolidate different NUMA distance update code paths
Aneesh Kumar K.V [Thu, 12 Aug 2021 13:22:21 +0000 (18:52 +0530)]
powerpc/pseries: Consolidate different NUMA distance update code paths

The associativity details of the newly added resources are collected from
the hypervisor via the "ibm,configure-connector" RTAS call. Update the NUMA
distance details of the newly added NUMA node after the above call.

Instead of updating NUMA distance every time we look up a node id
from the associativity property, add helpers that can be used
during boot and do this only once. Also remove the distance
update from node id lookup helpers.

Currently, we duplicate parsing code for ibm,associativity and
ibm,associativity-lookup-arrays in the kernel. The associativity arrays provided
by these device tree properties are very similar and hence can use
a helper to parse the node id and NUMA distance details.
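
A shared parsing helper of this kind might look roughly like the userspace
sketch below. It is an illustration only: the function names, the
SKETCH_NO_NODE value and the error handling are hypothetical, and it
assumes the count-prefixed layout in which cell 0 of the array holds the
number of domain entries that follow, each cell being a big endian 32-bit
value.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_NODE (-1)

/* Decode one big endian 32-bit device tree cell. */
static uint32_t sketch_read_cell(const unsigned char *cell)
{
	return ((uint32_t)cell[0] << 24) | ((uint32_t)cell[1] << 16) |
	       ((uint32_t)cell[2] << 8) | (uint32_t)cell[3];
}

/*
 * Hypothetical helper: return the node id stored at position
 * `primary_domain_index` of a count-prefixed associativity array, or
 * SKETCH_NO_NODE if the array is too short to contain that entry.
 */
static int sketch_associativity_to_nid(const unsigned char *prop,
				       size_t prop_len,
				       uint32_t primary_domain_index)
{
	uint32_t count;

	if (prop_len < 4)
		return SKETCH_NO_NODE;

	count = sketch_read_cell(prop);
	if (primary_domain_index == 0 || count < primary_domain_index)
		return SKETCH_NO_NODE;

	/* Guard against a count that overstates the real property size. */
	if (prop_len / 4 < (size_t)primary_domain_index + 1)
		return SKETCH_NO_NODE;

	return (int)sketch_read_cell(prop + 4 * (size_t)primary_domain_index);
}
```

Keeping the decoding in one function means callers that obtain the array
from different properties only differ in how they locate the bytes, not in
how they interpret them.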

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-4-aneesh.kumar@linux.ibm.com
4 years ago powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY
Aneesh Kumar K.V [Thu, 12 Aug 2021 13:22:20 +0000 (18:52 +0530)]
powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY

Also make related code cleanups that will allow adding FORM2_AFFINITY in
later patches. No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-3-aneesh.kumar@linux.ibm.com
4 years ago powerpc/pseries: rename min_common_depth to primary_domain_index
Aneesh Kumar K.V [Thu, 12 Aug 2021 13:22:19 +0000 (18:52 +0530)]
powerpc/pseries: rename min_common_depth to primary_domain_index

No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-2-aneesh.kumar@linux.ibm.com
4 years ago powerpc: rename powerpc_debugfs_root to arch_debugfs_dir
Aneesh Kumar K.V [Thu, 12 Aug 2021 13:28:31 +0000 (18:58 +0530)]
powerpc: rename powerpc_debugfs_root to arch_debugfs_dir

No functional change in this patch. arch_debugfs_dir is the generic kernel
name declared in linux/debugfs.h for arch-specific debugfs directory.
Architectures like x86/s390 already use the name. Rename powerpc
specific powerpc_debugfs_root to arch_debugfs_dir.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
4 years ago powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry
Aneesh Kumar K.V [Thu, 12 Aug 2021 13:28:30 +0000 (18:58 +0530)]
powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry

Similar to x86/s390 add a debugfs file to tune tlb_single_page_flush_ceiling.
Also add a debugfs entry for tlb_local_single_page_flush_ceiling.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132831.233794-1-aneesh.kumar@linux.ibm.com
4 years ago cpufreq: powernv: Fix init_chip_info initialization in numa=off
Pratik R. Sampat [Wed, 28 Jul 2021 12:05:00 +0000 (17:35 +0530)]
cpufreq: powernv: Fix init_chip_info initialization in numa=off

In the numa=off kernel command-line configuration, init_chip_info() loops
over the chips and attempts to copy the cpumask of each chip's node,
which is NULL for all iterations after the first chip.

Hence, store the cpu mask for each chip instead of deriving the cpumask
from the node while populating the "chips" struct array, and copy that to
chips[i].mask.
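
As a rough userspace illustration of the approach described above (not the
driver code), the per-chip mask can be built directly while walking the CPUs,
so no node lookup is needed. All names prefixed with toy_ are hypothetical,
and a 64-bit integer stands in for a cpumask:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_CHIPS 4

/*
 * Walk the CPUs once, recording each new chip id and setting the CPU's bit
 * in that chip's own mask. Returns the number of chips found.
 */
unsigned int toy_init_chip_info(const unsigned int *cpu_chip_id,
                                unsigned int nr_cpus,
                                unsigned int ids[TOY_MAX_CHIPS],
                                uint64_t masks[TOY_MAX_CHIPS])
{
    unsigned int nr_chips = 0, cpu, i;

    assert(nr_cpus <= 64);
    for (cpu = 0; cpu < nr_cpus; cpu++) {
        /* Look for a chip entry already created for this chip id. */
        for (i = 0; i < nr_chips; i++)
            if (ids[i] == cpu_chip_id[cpu])
                break;
        if (i == nr_chips) {
            /* First CPU seen on this chip: create its entry. */
            assert(nr_chips < TOY_MAX_CHIPS);
            ids[nr_chips] = cpu_chip_id[cpu];
            masks[nr_chips] = 0;
            nr_chips++;
        }
        masks[i] |= UINT64_C(1) << cpu;
    }
    return nr_chips;
}
```

Because the mask is keyed by chip rather than by node, it does not matter
whether the NUMA topology is available.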

Fixes: 053819e0bf84 ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level")
Cc: stable@vger.kernel.org # v4.3+
Reported-by: Shirisha Ganta <shirisha.ganta1@ibm.com>
Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
[mpe: Rename goto label to out_free_chip_cpu_mask]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210728120500.87549-2-psampat@linux.ibm.com
4 years agopowerpc: wii_defconfig: Enable OTP by default
Emmanuel Gil Peyrot [Sun, 1 Aug 2021 07:38:22 +0000 (09:38 +0200)]
powerpc: wii_defconfig: Enable OTP by default

This selects the nintendo-otp module when building for this platform.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210801073822.12452-6-linkmauve@linkmauve.fr
4 years agopowerpc: wii.dts: Expose the OTP on this platform
Emmanuel Gil Peyrot [Sun, 1 Aug 2021 07:38:21 +0000 (09:38 +0200)]
powerpc: wii.dts: Expose the OTP on this platform

This can be used by the newly-added nintendo-otp nvmem module.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210801073822.12452-5-linkmauve@linkmauve.fr
4 years agopowerpc: wii.dts: Reduce the size of the control area
Emmanuel Gil Peyrot [Sun, 1 Aug 2021 07:38:20 +0000 (09:38 +0200)]
powerpc: wii.dts: Reduce the size of the control area

This is wrong, but needed in order to avoid overlapping ranges with the
OTP area added in the next commit.  A refactor of this part of the
device tree is needed: according to Wiibrew[1], this area starts at
0x0d800000 and spans 0x400 bytes (that is, 0x100 32-bit registers),
encompassing PIC and GPIO registers, amongst the ones already exposed in
this device tree, which should become children of the control@d800000
node.

[1] https://wiibrew.org/wiki/Hardware/Hollywood_Registers

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210801073822.12452-4-linkmauve@linkmauve.fr
4 years agopowerpc: Bulk conversion to generic_handle_domain_irq()
Marc Zyngier [Mon, 2 Aug 2021 16:26:28 +0000 (17:26 +0100)]
powerpc: Bulk conversion to generic_handle_domain_irq()

Wherever possible, replace constructs that match either
generic_handle_irq(irq_find_mapping()) or
generic_handle_irq(irq_linear_revmap()) with a single call to
generic_handle_domain_irq().
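
The shape of the conversion can be pictured with a small userspace model.
The toy_ functions below are simplified stand-ins named after the kernel
helpers they imitate; their signatures and return values are assumptions
made for the sketch, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an IRQ domain: a linear hwirq -> virq table. */
struct toy_domain {
    const unsigned int *revmap; /* revmap[hwirq] is the virq, 0 if unmapped */
    size_t size;
};

unsigned int toy_last_handled; /* last virq passed to the flow handler */

/* Models irq_find_mapping(): returns 0 when hwirq has no mapping. */
unsigned int toy_find_mapping(const struct toy_domain *d, size_t hwirq)
{
    assert(d != NULL);
    return hwirq < d->size ? d->revmap[hwirq] : 0;
}

/* Models generic_handle_irq(): rejects an invalid virq. */
int toy_handle_irq(unsigned int virq)
{
    if (virq == 0)
        return -1;
    toy_last_handled = virq;
    return 0;
}

/* Models generic_handle_domain_irq(): lookup and dispatch in one call. */
int toy_handle_domain_irq(const struct toy_domain *d, size_t hwirq)
{
    return toy_handle_irq(toy_find_mapping(d, hwirq));
}

/* A cascade handler before the conversion: two explicit steps. */
int toy_cascade_old(const struct toy_domain *d, size_t hwirq)
{
    unsigned int virq = toy_find_mapping(d, hwirq);

    return toy_handle_irq(virq);
}

/* The same handler after the conversion: a single call. */
int toy_cascade_new(const struct toy_domain *d, size_t hwirq)
{
    return toy_handle_domain_irq(d, hwirq);
}
```

Both handlers behave the same in this model; the converted one simply no
longer carries the intermediate virq in the caller.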

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210802162630.2219813-13-maz@kernel.org
4 years agoKVM: PPC: Book3S HV: XIVE: Add support for automatic save-restore
Cédric Le Goater [Tue, 20 Jul 2021 13:42:09 +0000 (15:42 +0200)]
KVM: PPC: Book3S HV: XIVE: Add support for automatic save-restore

On P10, the feature doing an automatic "save & restore" of a VCPU
interrupt context is set by default in OPAL. When a VP context is
pulled out, the state of the interrupt registers is saved by the XIVE
interrupt controller under the internal NVP structure representing the
VP. This saves a costly store/load in guest entries and exits.

If OPAL advertises the "save & restore" feature in the device tree,
it should also have set the 'H' bit in the CAM line. Check that when
vCPUs are connected to their ICP in KVM before going any further.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210720134209.256133-3-clg@kaod.org
4 years agoKVM: PPC: Book3S HV: XIVE: Add a 'flags' field
Cédric Le Goater [Tue, 20 Jul 2021 13:42:08 +0000 (15:42 +0200)]
KVM: PPC: Book3S HV: XIVE: Add a 'flags' field

Use it to hold platform specific features. P9 DD2 introduced
single-escalation support. P10 will add others.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210720134209.256133-2-clg@kaod.org
4 years agopowerpc: use IRQF_NO_DEBUG for IPIs
Cédric Le Goater [Mon, 19 Jul 2021 13:06:14 +0000 (15:06 +0200)]
powerpc: use IRQF_NO_DEBUG for IPIs

There is no need to use the lockup detector ("noirqdebug") for IPIs.
The ipistorm benchmark measures a ~10% improvement on high systems
when this flag is set.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210719130614.195886-1-clg@kaod.org
4 years agopowerpc/xive: Use XIVE domain under xmon and debugfs
Cédric Le Goater [Thu, 1 Jul 2021 13:27:49 +0000 (15:27 +0200)]
powerpc/xive: Use XIVE domain under xmon and debugfs

The default domain of the PCI/MSIs is not the XIVE domain anymore. To
list the IRQ mappings under XMON and debugfs, query the IRQ data from
the low level XIVE domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-32-clg@kaod.org
4 years agoKVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts
Cédric Le Goater [Thu, 1 Jul 2021 13:27:48 +0000 (15:27 +0200)]
KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts

PCI MSIs now live in an MSI domain but the underlying calls, which
will EOI the interrupt in real mode, need an HW IRQ number mapped in
the XICS IRQ domain. Grab it there.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-31-clg@kaod.org
4 years agopowerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi()
Cédric Le Goater [Thu, 1 Jul 2021 13:27:47 +0000 (15:27 +0200)]
powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi()

pnv_opal_pci_msi_eoi() is called from KVM to EOI passthrough interrupts
when in real mode. Adding MSI domain broke the hack using the
'ioda.irq_chip' field to deduce the owning PHB. Fix that by using the
IRQ chip data in the MSI domain.

The 'ioda.irq_chip' field is now unused and could be removed from the
pnv_phb struct.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-30-clg@kaod.org
4 years agopowerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices
Cédric Le Goater [Thu, 1 Jul 2021 13:27:46 +0000 (15:27 +0200)]
powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices

Before MSI domains, the default IRQ chip of PHB3 MSIs was patched by
pnv_set_msi_irq_chip() with the custom EOI handler pnv_ioda2_msi_eoi()
and the owning PHB was deduced from the 'ioda.irq_chip' field. This
path has been deprecated by the MSI domains but it is still in use by
the P8 CAPI 'cxl' driver.

Rewriting this driver to support MSI would be a waste of time.
Nevertheless, we can still remove the IRQ chip patch and set the IRQ
chip data instead. This is cleaner.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-29-clg@kaod.org
4 years agopowerpc/xics: Fix IRQ migration
Cédric Le Goater [Thu, 1 Jul 2021 13:27:45 +0000 (15:27 +0200)]
powerpc/xics: Fix IRQ migration

desc->irq_data points to the top level IRQ data descriptor which is
not necessarily in the XICS IRQ domain. MSIs are in another domain for
instance. Fix that by looking for a mapping on the low level XICS IRQ
domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-28-clg@kaod.org
4 years agopowerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough interrupt
Cédric Le Goater [Thu, 1 Jul 2021 13:27:44 +0000 (15:27 +0200)]
powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough interrupt

The pnv_ioda2_msi_eoi() chip handler is not used anymore for MSIs.
Simply use the check on the PSI-MSI chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-27-clg@kaod.org
4 years agopowerpc/powernv/pci: Drop unused MSI code
Cédric Le Goater [Thu, 1 Jul 2021 13:27:43 +0000 (15:27 +0200)]
powerpc/powernv/pci: Drop unused MSI code

MSIs should be fully managed by the PCI and IRQ subsystems now.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-26-clg@kaod.org
4 years agopowerpc/pseries/pci: Drop unused MSI code
Cédric Le Goater [Thu, 1 Jul 2021 13:27:42 +0000 (15:27 +0200)]
powerpc/pseries/pci: Drop unused MSI code

MSIs should be fully managed by the PCI and IRQ subsystems now.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-25-clg@kaod.org
4 years agopowerpc/xics: Drop unmask of MSIs at startup
Cédric Le Goater [Thu, 1 Jul 2021 13:27:41 +0000 (15:27 +0200)]
powerpc/xics: Drop unmask of MSIs at startup

That was a workaround in the XICS domain because of the lack of MSI
domain. This is now handled.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-24-clg@kaod.org
4 years agopowerpc/pci: Drop XIVE restriction on MSI domains
Cédric Le Goater [Thu, 1 Jul 2021 13:27:40 +0000 (15:27 +0200)]
powerpc/pci: Drop XIVE restriction on MSI domains

The PowerNV and pSeries platforms now have support for both the XICS
and XIVE IRQ domains.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-23-clg@kaod.org
4 years agopowerpc/powernv/pci: Customize the MSI EOI handler to support PHB3
Cédric Le Goater [Thu, 1 Jul 2021 13:27:39 +0000 (15:27 +0200)]
powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3

PHB3s need an extra OPAL call to EOI the interrupt. The call takes an
OPAL HW IRQ number but it is translated into a vector number in OPAL.
Here, we directly use the vector number of the in-the-middle "PNV-MSI"
domain instead of grabbing the OPAL HW IRQ number in the XICS parent
domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-22-clg@kaod.org
4 years agopowerpc/xics: Add support for IRQ domain hierarchy
Cédric Le Goater [Thu, 1 Jul 2021 13:27:38 +0000 (15:27 +0200)]
powerpc/xics: Add support for IRQ domain hierarchy

XICS doesn't have any state associated with the IRQ. The support is
straightforward and simpler than for XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-21-clg@kaod.org
4 years agopowerpc/xics: Add debug logging to the set_irq_affinity handlers
Cédric Le Goater [Thu, 1 Jul 2021 13:27:37 +0000 (15:27 +0200)]
powerpc/xics: Add debug logging to the set_irq_affinity handlers

It really helps to know how the HW is configured when tweaking the IRQ
subsystem.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-20-clg@kaod.org
4 years agopowerpc/xics: Give a name to the default XICS IRQ domain
Cédric Le Goater [Thu, 1 Jul 2021 13:27:36 +0000 (15:27 +0200)]
powerpc/xics: Give a name to the default XICS IRQ domain

and clean up the error path.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-19-clg@kaod.org
4 years agopowerpc/xics: Rename the map handler in a check handler
Cédric Le Goater [Thu, 1 Jul 2021 13:27:35 +0000 (15:27 +0200)]
powerpc/xics: Rename the map handler in a check handler

This moves the IRQ initialization done under the different ICS backends
in the common part of XICS. The 'map' handler becomes a simple 'check'
on the HW IRQ at the FW level.

As we don't need an ICS anymore in xics_migrate_irqs_away(), the XICS
domain does not set a chip data for the IRQ.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-18-clg@kaod.org
4 years agopowerpc/xics: Remove ICS list
Cédric Le Goater [Thu, 1 Jul 2021 13:27:34 +0000 (15:27 +0200)]
powerpc/xics: Remove ICS list

We always had only one ICS per machine. Simplify the XICS driver by
removing the ICS list.

The ICS stored in the chip data of the XICS domain becomes useless and
we don't need it anymore to migrate away IRQs from a CPU. This will be
removed in a subsequent patch.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-17-clg@kaod.org
4 years agoKVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts
Cédric Le Goater [Thu, 1 Jul 2021 13:27:33 +0000 (15:27 +0200)]
KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts

PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the
underlying calls handling the passthrough of the interrupt in the
guest need a number in the XIVE IRQ domain.

Use the IRQ data mapped in the XIVE IRQ domain and not the one in the
PCI-MSI domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-16-clg@kaod.org
4 years agoKVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines
Cédric Le Goater [Thu, 1 Jul 2021 13:27:32 +0000 (15:27 +0200)]
KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines

The routine kvmppc_set_passthru_irq() calls kvmppc_xive_set_mapped()
and kvmppc_xive_clr_mapped() with an IRQ descriptor. Use the host IRQ
number directly to remove a useless conversion.

Add some debug.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-15-clg@kaod.org
4 years agoKVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interrupts
Cédric Le Goater [Thu, 1 Jul 2021 13:27:31 +0000 (15:27 +0200)]
KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interrupts

Passthrough PCI MSI interrupts are detected in KVM with a check on a
specific EOI handler (P8) or on XIVE (P9). We can now check the
PCI-MSI IRQ chip which is cleaner.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-14-clg@kaod.org
4 years agopowerpc/powernv/pci: Add MSI domains
Cédric Le Goater [Thu, 1 Jul 2021 13:27:30 +0000 (15:27 +0200)]
powerpc/powernv/pci: Add MSI domains

This is very similar to the MSI domains of the pSeries platform. The
MSI allocator is directly handled under the Linux PHB in the
in-the-middle "PNV-MSI" domain.

Only the XIVE (P9/P10) parent domain is supported for now. Support for
XICS will come later.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-13-clg@kaod.org
4 years agopowerpc/powernv/pci: Introduce __pnv_pci_ioda_msi_setup()
Cédric Le Goater [Thu, 1 Jul 2021 13:27:29 +0000 (15:27 +0200)]
powerpc/powernv/pci: Introduce __pnv_pci_ioda_msi_setup()

It will be used as a 'compose_msg' handler of the MSI domain introduced
later.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-12-clg@kaod.org
4 years agopowerpc/pseries/pci: Add support of MSI domains to PHB hotplug
Cédric Le Goater [Thu, 1 Jul 2021 13:27:28 +0000 (15:27 +0200)]
powerpc/pseries/pci: Add support of MSI domains to PHB hotplug

Simply allocate or release the MSI domains when a PHB is inserted in
or removed from the machine.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-11-clg@kaod.org
4 years agopowerpc/pseries/pci: Add a msi_free() handler to clear XIVE data
Cédric Le Goater [Thu, 1 Jul 2021 13:27:27 +0000 (15:27 +0200)]
powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data

The MSI domain clears the IRQ with msi_domain_free(), which calls
irq_domain_free_irqs_top(), which clears the handler data. This is a
problem for the XIVE controller since we need to unmap MMIO pages and
free a specific XIVE structure.

The 'msi_free()' handler is called before irq_domain_free_irqs_top()
when the handler data is still available. Use that to clear the XIVE
controller data.
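
The ordering problem can be modelled in a few lines of userspace C. This is
only a sketch: the toy_ names are hypothetical, and a boolean stands in for
the MMIO mappings that the real XIVE code has to release:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_xive_data { bool mmio_mapped; };
struct toy_irq { struct toy_xive_data *handler_data; };

/* Models the generic top-level free: it only clears the pointer. */
void toy_free_irqs_top(struct toy_irq *irq)
{
    irq->handler_data = NULL;
}

/* Models the msi_free() hook: runs while handler_data is still valid. */
void toy_msi_free(struct toy_irq *irq)
{
    if (irq->handler_data) {
        irq->handler_data->mmio_mapped = false; /* "unmap" the MMIO pages */
        irq->handler_data = NULL;
    }
}

/* Models msi_domain_free(): the hook first (if any), then the generic free. */
void toy_msi_domain_free(struct toy_irq *irq, bool have_msi_free_hook)
{
    assert(irq != NULL);
    if (have_msi_free_hook)
        toy_msi_free(irq);
    toy_free_irqs_top(irq);
}
```

Without the hook, the pointer is gone before anything had a chance to release
the resources behind it; with the hook, the cleanup runs first.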

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-10-clg@kaod.org
4 years agopowerpc/pseries/pci: Add a domain_free_irqs() handler
Cédric Le Goater [Thu, 1 Jul 2021 13:27:26 +0000 (15:27 +0200)]
powerpc/pseries/pci: Add a domain_free_irqs() handler

The RTAS firmware can not disable one MSI at a time. It's all or
nothing. We need a custom free IRQ handler for that.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-9-clg@kaod.org
4 years agopowerpc/xive: Remove irqd_is_started() check when setting the affinity
Cédric Le Goater [Thu, 1 Jul 2021 13:27:25 +0000 (15:27 +0200)]
powerpc/xive: Remove irqd_is_started() check when setting the affinity

In the early days of XIVE support, commit cffb717ceb8e ("powerpc/xive:
Ensure active irqd when setting affinity") tried to fix an issue
related to interrupt migration. If the root cause was related to CPU
unplug, it should have been fixed and there is no reason to keep the
irqd_is_started() check. This test is also breaking affinity setting
of MSIs which can be set before starting the associated IRQ.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-8-clg@kaod.org
4 years agopowerpc/xive: Drop unmask of MSIs at startup
Cédric Le Goater [Thu, 1 Jul 2021 13:27:24 +0000 (15:27 +0200)]
powerpc/xive: Drop unmask of MSIs at startup

That was a workaround in the XIVE domain because of the lack of MSI
domain. This is now handled.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-7-clg@kaod.org
4 years agopowerpc/pseries/pci: Add MSI domains
Cédric Le Goater [Thu, 1 Jul 2021 13:27:23 +0000 (15:27 +0200)]
powerpc/pseries/pci: Add MSI domains

Two IRQ domains are added on top of the default machine IRQ domain.

First, the top level "pSeries-PCI-MSI" domain deals with the MSI
specificities. In this domain, the HW IRQ numbers are generated by the
PCI MSI layer, they compose a unique ID for an MSI source with the PCI
device identifier and the MSI vector number.

These numbers can be quite large on a pSeries machine running under
the IBM Hypervisor and /sys/kernel/irq/ and /proc/interrupts will
require small fixes to show them correctly.

The second domain is the in-the-middle "pSeries-MSI" domain which acts as
a proxy between the PCI MSI subsystem and the machine IRQ subsystem.
It usually allocates the MSI vector numbers but, on pSeries machines,
this is done by the RTAS FW and RTAS returns IRQ numbers in the IRQ
number space of the machine. This is why the in-the-middle "pSeries-MSI"
domain has the same HW IRQ numbers as its parent domain.
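
To see why the HW IRQ numbers of the top level domain can become large,
here is a sketch of how such an ID might be composed. The bit layout (vector
in the low 11 bits, PCI device identifier above it, PCI domain number on top)
is an assumption chosen for illustration, and toy_msi_hwirq() is a
hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compose a unique ID for an MSI source from the PCI device identifier
 * and the MSI vector number. Assumed layout, for illustration only:
 *   bits  0-10  MSI vector number
 *   bits 11-26  PCI device id (bus << 8 | devfn)
 *   bits 27+    PCI domain number
 */
uint64_t toy_msi_hwirq(uint32_t pci_domain, uint8_t bus, uint8_t devfn,
                       uint16_t vector)
{
    uint64_t dev_id = ((uint64_t)bus << 8) | devfn;

    assert(vector < (1u << 11));
    return (uint64_t)vector | (dev_id << 11) | ((uint64_t)pci_domain << 27);
}
```

With a large PCI domain number, the result no longer fits in 32 bits, which
is the kind of value that interfaces printing IRQ numbers need to cope with.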

Only the XIVE (P9/P10) parent domain is supported for now. We still
need to add support for IRQ domain hierarchy under XICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-6-clg@kaod.org
4 years agopowerpc/xive: Ease debugging of xive_irq_set_affinity()
Cédric Le Goater [Thu, 1 Jul 2021 13:27:22 +0000 (15:27 +0200)]
powerpc/xive: Ease debugging of xive_irq_set_affinity()

pr_debug() is easier to activate and it helps to know how the kernel
configures the HW when tweaking the IRQ subsystem.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-5-clg@kaod.org
4 years agopowerpc/xive: Add support for IRQ domain hierarchy
Cédric Le Goater [Thu, 1 Jul 2021 13:27:21 +0000 (15:27 +0200)]
powerpc/xive: Add support for IRQ domain hierarchy

This adds handlers to allocate/free IRQs in a domain hierarchy. We
could try to use xive_irq_domain_map() in xive_irq_domain_alloc() but
we rely on xive_irq_alloc_data() to set the IRQ handler data and
duplicating the code is simpler.

xive_irq_free_data() needs to be called when IRQs are freed to clear
the MMIO mappings and free the XIVE handler data, the xive_irq_data
structure. This is going to be a problem with MSI domains which we
will address later.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-4-clg@kaod.org
4 years agopowerpc/pseries/pci: Introduce rtas_prepare_msi_irqs()
Cédric Le Goater [Thu, 1 Jul 2021 13:27:20 +0000 (15:27 +0200)]
powerpc/pseries/pci: Introduce rtas_prepare_msi_irqs()

This splits the routine setting the MSIs in two parts: allocation of
MSIs for the PCI device at the FW level (RTAS) and the actual mapping
and activation of the IRQs.

rtas_prepare_msi_irqs() will serve as a handler for the PCI MSI domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-3-clg@kaod.org
4 years agopowerpc/pseries/pci: Introduce __find_pe_total_msi()
Cédric Le Goater [Thu, 1 Jul 2021 13:27:19 +0000 (15:27 +0200)]
powerpc/pseries/pci: Introduce __find_pe_total_msi()

It will help to size the PCI MSI domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-2-clg@kaod.org
4 years agoKVM: PPC: Use arch_get_random_seed_long instead of powernv variant
Alexey Kardashevskiy [Thu, 5 Aug 2021 07:56:49 +0000 (17:56 +1000)]
KVM: PPC: Use arch_get_random_seed_long instead of powernv variant

The powernv_get_random_long() does not work in nested KVM (which is
pseries) and produces a crash when accessing in_be64(rng->regs) in
powernv_get_random_long().

This replaces powernv_get_random_long with the ppc_md machine hook
wrapper.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210805075649.2086567-1-aik@ozlabs.ru
4 years agopowerpc/configs: Disable legacy ptys on microwatt defconfig
Anton Blanchard [Thu, 5 Aug 2021 01:20:05 +0000 (11:20 +1000)]
powerpc/configs: Disable legacy ptys on microwatt defconfig

We shouldn't need legacy ptys, and disabling the option improves boot
time by about 0.5 seconds.

Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210805112005.3cb1f412@kryten.localdomain
4 years agopowerpc: Always inline radix_enabled() to fix build failure
Jordan Niethe [Wed, 4 Aug 2021 01:37:24 +0000 (11:37 +1000)]
powerpc: Always inline radix_enabled() to fix build failure

This is the same as commit acdad8fb4a15 ("powerpc: Force inlining of
mmu_has_feature to fix build failure") but for radix_enabled().  The
config in the linked bugzilla causes the following build failure:

  LD      .tmp_vmlinux.kallsyms1
  powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function `.__ptep_set_access_flags':
  pgtable.c:(.text+0x17c): undefined reference to `.radix__ptep_set_access_flags'
  powerpc64-linux-ld: arch/powerpc/mm/pageattr.o: in function `.change_page_attr':
  pageattr.c:(.text+0xc0): undefined reference to `.radix__flush_tlb_kernel_range'
  etc.

This is due to radix_enabled() not being inlined. See extract from
building with -Winline:

  In file included from arch/powerpc/include/asm/lppaca.h:46,
                   from arch/powerpc/include/asm/paca.h:17,
                   from arch/powerpc/include/asm/current.h:13,
                   from include/linux/thread_info.h:23,
                   from include/asm-generic/preempt.h:5,
                   from ./arch/powerpc/include/generated/asm/preempt.h:1,
                   from include/linux/preempt.h:78,
                   from include/linux/spinlock.h:51,
                   from include/linux/mmzone.h:8,
                   from include/linux/gfp.h:6,
                   from arch/powerpc/mm/pgtable.c:21:
  arch/powerpc/include/asm/book3s/64/pgtable.h: In function '__ptep_set_access_flags':
  arch/powerpc/include/asm/mmu.h:327:20: error: inlining failed in call to 'radix_enabled': call is unlikely and code size would grow [-Werror=inline]

The code relies on constant folding of MMU_FTRS_POSSIBLE at build time
and on elimination of the parts of code that are not possible at compile
time. For this to work radix_enabled() must be inlined, so make it
__always_inline.
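
The pattern can be sketched in userspace as follows. The feature bits, the
TOY_FTRS_POSSIBLE mask and the toy_ helpers are hypothetical stand-ins for
the kernel's definitions; only the inlining attribute is the real GCC/Clang
one:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FTR_RADIX 0x1u /* hypothetical feature bits */
#define TOY_FTR_HASH  0x2u

/* Build-time mask: this configuration can never run with radix. */
#define TOY_FTRS_POSSIBLE (TOY_FTR_HASH)

static unsigned int toy_cur_features = TOY_FTR_HASH;

static inline __attribute__((always_inline))
bool toy_has_feature(unsigned int feature)
{
    /*
     * Once inlined with a constant argument, the first test is a
     * compile-time constant, so the whole expression can fold to false.
     */
    return (TOY_FTRS_POSSIBLE & feature) && (toy_cur_features & feature);
}

static inline __attribute__((always_inline))
bool toy_radix_enabled(void)
{
    return toy_has_feature(TOY_FTR_RADIX);
}

/* Returns 1 for the radix path, 0 for the hash path. */
int toy_flush_path(void)
{
    if (toy_radix_enabled())
        return 1; /* dead code here: an optimizing build can drop it */
    return 0;
}
```

If the helper were left out of line, the compiler could no longer prove the
radix branch dead, and references to radix-only functions would survive to
link time, which is the failure shown in the log above.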

Reported-by: Erhard F. <erhard_f@mailbox.org>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Trimmed error messages in change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213803
Link: https://lore.kernel.org/r/20210804013724.514468-1-jniethe5@gmail.com
4 years agopowerpc: Replace deprecated CPU-hotplug functions.
Sebastian Andrzej Siewior [Tue, 3 Aug 2021 14:15:46 +0000 (16:15 +0200)]
powerpc: Replace deprecated CPU-hotplug functions.

The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210803141621.780504-4-bigeasy@linutronix.de
4 years agopowerpc/kexec: fix for_each_child.cocci warning
kernel test robot [Tue, 3 Aug 2021 14:59:55 +0000 (16:59 +0200)]
powerpc/kexec: fix for_each_child.cocci warning

for_each_node_by_type should have of_node_put() before return.
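
The rule can be illustrated with a userspace model of a reference-counting
iterator. Everything prefixed with toy_ is hypothetical; the model assumes,
as the warning implies, that the iterator holds a reference on the current
node and only drops it when advancing:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_node {
    const char *type;
    int usable;
    int refcount;
};

/* Iterator step: drop the reference on prev, take one on the next match. */
struct toy_node *toy_next_by_type(struct toy_node *nodes, size_t n,
                                  struct toy_node *prev, const char *type)
{
    size_t i = prev ? (size_t)(prev - nodes) + 1 : 0;

    if (prev)
        prev->refcount--;
    for (; i < n; i++) {
        if (strcmp(nodes[i].type, type) == 0) {
            nodes[i].refcount++;
            return &nodes[i];
        }
    }
    return NULL;
}

/* Models of_node_put(): release one reference. */
void toy_node_put(struct toy_node *np)
{
    assert(np != NULL && np->refcount > 0);
    np->refcount--;
}

/* Returns 1 if a usable node of the given type exists, 0 otherwise. */
int toy_has_usable(struct toy_node *nodes, size_t n, const char *type)
{
    struct toy_node *np;

    for (np = toy_next_by_type(nodes, n, NULL, type); np != NULL;
         np = toy_next_by_type(nodes, n, np, type)) {
        if (np->usable) {
            /* Leaving the loop early: the iterator will not drop this. */
            toy_node_put(np);
            return 1;
        }
    }
    return 0;
}
```

When the loop runs to completion the iterator has already released every
reference, so the explicit put is only needed on the early return.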

Generated by: scripts/coccinelle/iterators/for_each_child.cocci

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2108031654080.17639@hadrien
4 years agopowerpc/pseries: Prevent free CPU ids being reused on another node
Laurent Dufour [Thu, 29 Apr 2021 17:49:08 +0000 (19:49 +0200)]
powerpc/pseries: Prevent free CPU ids being reused on another node

When a CPU is hot added, the CPU ids are taken from the available mask
from the lower possible set. If that set of values was previously used
for a CPU attached to a different node, it appears to an application as
if these CPUs have migrated from one node to another node which is not
expected.

To prevent this, the CPU ids used for each node need to be recorded and
not reused on another node. However, to prevent CPU hot plug from
failing when a node is starved of CPU ids, the capability to reuse
other nodes’ free CPU ids is kept. A warning is displayed in such
a case to warn the user.

A new CPU bit mask (node_recorded_ids_map) is introduced for each
possible node. It is populated with the CPUs onlined at boot time, and
then when a CPU is hot plugged to a node. The bits in that mask remain
set when the CPU is hot unplugged, to record that these CPU ids have
been used for this node.

If no id set was found, a retry is made without removing the ids used on
the other nodes to try reusing them. This is the way ids have been
allocated prior to this patch.

The effect of this patch can be seen by removing and adding CPUs using
the Qemu monitor. In the following case, the first CPU from node 2 is
removed, then the first one from node 1 is removed too. Later, the
first CPU of node 2 is added back. Without this patch, the kernel will
number these CPUs using the first CPU ids available, which are the ones
freed when removing the first CPU of node 1. This leads to the CPU ids
16-23 moving from node 1 to node 2. With the patch applied, the CPU ids
32-39 are used since they are the lowest free ones which have not been
used on another node.

At boot time:
  [root@vm40 ~]# numactl -H | grep cpus
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Vanilla kernel, after the CPU hot unplug/plug operations:
  [root@vm40 ~]# numactl -H | grep cpus
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  node 1 cpus: 24 25 26 27 28 29 30 31
  node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47

Patched kernel, after the CPU hot unplug/plug operations:
  [root@vm40 ~]# numactl -H | grep cpus
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  node 1 cpus: 24 25 26 27 28 29 30 31
  node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
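
The allocation policy and the scenario above can be reproduced with a small
userspace model. This is a simplified sketch, not the pseries code: 64-bit
integers stand in for cpumasks, and every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_NODES 3
#define TOY_NO_NODE  (-1)

struct toy_cpus {
    uint64_t possible;               /* ids that may ever exist */
    uint64_t present;                /* ids currently in use */
    uint64_t recorded[TOY_NR_NODES]; /* ids ever used by each node */
};

/* Mask of n consecutive ids starting at first. */
uint64_t toy_range(unsigned int first, unsigned int n)
{
    assert(n > 0 && n < 64 && first + n <= 64);
    return ((UINT64_C(1) << n) - 1) << first;
}

/*
 * Find the lowest free run of n ids. If node is a valid node, ids recorded
 * for any other node are excluded. Returns the first id, or -1.
 */
int toy_find_ids(const struct toy_cpus *c, unsigned int n, int node)
{
    uint64_t candidates = c->possible & ~c->present;
    unsigned int first;
    int other;

    if (node != TOY_NO_NODE)
        for (other = 0; other < TOY_NR_NODES; other++)
            if (other != node)
                candidates &= ~c->recorded[other];

    for (first = 0; first + n <= 64; first++)
        if ((candidates & toy_range(first, n)) == toy_range(first, n))
            return (int)first;
    return -1;
}

/* Hot add: prefer ids never used elsewhere, else fall back to any free ids. */
int toy_hot_add(struct toy_cpus *c, unsigned int n, int node)
{
    int first;

    assert(node >= 0 && node < TOY_NR_NODES);
    first = toy_find_ids(c, n, node);
    if (first < 0)
        first = toy_find_ids(c, n, TOY_NO_NODE); /* pre-patch behaviour */
    if (first < 0)
        return -1;
    c->present |= toy_range((unsigned int)first, n);
    c->recorded[node] |= toy_range((unsigned int)first, n);
    return first;
}

/* Hot remove: the recorded bits stay set, tying the ids to their node. */
void toy_hot_remove(struct toy_cpus *c, unsigned int first, unsigned int n)
{
    c->present &= ~toy_range(first, n);
}
```

Starting from 48 CPUs spread over three nodes, removing ids 32-39 and 16-23
and then adding 8 ids to node 2 yields 32 in this model, whereas the
unrestricted search (the old behaviour) would return 16.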

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210429174908.16613-1-ldufour@linux.ibm.com
4 years agopseries/drmem: update LMBs after LPM
Laurent Dufour [Mon, 17 May 2021 09:06:06 +0000 (11:06 +0200)]
pseries/drmem: update LMBs after LPM

After an LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
updated by the hypervisor in the case the NUMA topology of the LPAR's
memory is updated.

This is handled by the kernel, but the memory's node is not updated because
there is no way to move a memory block between nodes from the Linux kernel
point of view.

If later a memory block is added or removed, drmem_update_dt() is called
and overwrites the DT node ibm,dynamic-reconfiguration-memory to
match the added or removed LMB. But the LMB's associativity node has not
been updated after the DT node update, and thus the node is overwritten
with Linux's topology instead of the hypervisor's.

Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
updated to force an update of the LMB's associativity. However, ignore the
call to that hook when the update has been triggered by drmem_update_dt(),
because in that case the LMB tree has been used to set the DT property
and thus it doesn't need to be updated back. Since drmem_update_dt() is
called under the protection of the device_hotplug_lock and the hook is
called in the same context, use a simple boolean variable to detect that
call.
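
A minimal userspace sketch of that guard, assuming (as stated above) that
both paths run under the same lock so a plain boolean is sufficient. The
toy_ names and the refresh counter are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_in_drmem_update; /* true only while we rewrite the DT node */
static int toy_lmb_refreshes;    /* counts associativity refreshes */

/* Hook run whenever the DT property is updated. */
void toy_property_updated(void)
{
    if (toy_in_drmem_update)
        return;          /* our own write: the LMB data was the source */
    toy_lmb_refreshes++; /* external update: re-read the associativity */
}

/* Models drmem_update_dt(): writing the property triggers the hook. */
void toy_drmem_update_dt(void)
{
    assert(!toy_in_drmem_update); /* not expected to nest */
    toy_in_drmem_update = true;
    toy_property_updated();
    toy_in_drmem_update = false;
}
```

Only updates that originate outside toy_drmem_update_dt() cause a refresh
in this model.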

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210517090606.56930-1-ldufour@linux.ibm.com
4 years agopowerpc/numa: Consider the max NUMA node for migratable LPAR
Laurent Dufour [Tue, 11 May 2021 07:31:36 +0000 (09:31 +0200)]
powerpc/numa: Consider the max NUMA node for migratable LPAR

When an LPAR is migratable, we should consider the maximum possible NUMA
node instead of the number of NUMA nodes from the actual system.

The DT property 'ibm,current-associativity-domains' defines the maximum
number of nodes the LPAR can see when running on that box. But if the
LPAR is being migrated to another box, it may see up to the nodes
defined by 'ibm,max-associativity-domains'. So if an LPAR is migratable,
that value should be used.

Unfortunately, there is no easy way to know if an LPAR is migratable or
not. The hypervisor exports the property 'ibm,migratable-partition' when
it is set up to migrate partitions, but that does not mean that the
current partition is migratable.

Without this patch, when an LPAR is started on a 2 node box and then
migrated to a 3 node box, the hypervisor may spread the LPAR's CPUs on
the 3rd node. In that case, if a CPU from that 3rd node is added to the
LPAR, it will be assigned to the wrong node because the kernel has
been set to use up to 2 nodes (the configuration of the departure node).
With this patch applied, the CPU is correctly added to the 3rd node.
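
The selection between the two properties can be sketched as follows. This is
a hedged illustration in Python, not the kernel code: it assumes that the
presence of 'ibm,migratable-partition' is used as the (imperfect) hint, it
models each property as a single integer rather than the real DT encoding,
and the function name and dict arguments are hypothetical.

```python
def possible_node_count(rtas_props, root_props):
    """Pick the number of NUMA nodes the kernel should plan for.

    rtas_props / root_props are plain dicts standing in for DT nodes.
    """
    domains = None
    # Only trust the "current" value when the partition cannot migrate.
    if "ibm,migratable-partition" not in root_props:
        domains = rtas_props.get("ibm,current-associativity-domains")
    # Migratable LPAR, or no "current" property: fall back to the maximum.
    if domains is None:
        domains = rtas_props.get("ibm,max-associativity-domains")
    return domains


rtas = {
    "ibm,current-associativity-domains": 2,  # the 2 node departure box
    "ibm,max-associativity-domains": 3,      # what another box may expose
}
fixed = possible_node_count(rtas, {})
movable = possible_node_count(rtas, {"ibm,migratable-partition": 1})
```

With these inputs the non-migratable case plans for 2 nodes and the
migratable case for 3, which matches the 2 node to 3 node scenario above.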

Fixes: f9f130ff2ec9 ("powerpc/numa: Detect support for coregroup")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210511073136.17795-1-ldufour@linux.ibm.com
4 years agopowerpc/non-smp: Unconditionaly call smp_mb() on switch_mm
Christophe Leroy [Mon, 5 Jul 2021 12:00:50 +0000 (12:00 +0000)]
powerpc/non-smp: Unconditionaly call smp_mb() on switch_mm

Commit 3ccfebedd8cf ("powerpc, membarrier: Skip memory barrier in
switch_mm()") added some logic to skip the smp_mb() in
switch_mm_irqs_off() before the call to switch_mmu_context().

However, on non-SMP, smp_mb() is just a compiler barrier, and doing
it unconditionally is simpler than the logic used to check whether the
barrier is needed or not.

After the patch:

00000000 <switch_mm_irqs_off>:
...
   c: 7c 04 18 40  cmplw   r4,r3
  10: 81 24 00 24  lwz     r9,36(r4)
  14: 91 25 04 c8  stw     r9,1224(r5)
  18: 4d 82 00 20  beqlr
  1c: 48 00 00 00  b       1c <switch_mm_irqs_off+0x1c>
                   1c: R_PPC_REL24 switch_mmu_context

Before the patch:

00000000 <switch_mm_irqs_off>:
...
   c: 7c 04 18 40  cmplw   r4,r3
  10: 81 24 00 24  lwz     r9,36(r4)
  14: 91 25 04 c8  stw     r9,1224(r5)
  18: 4d 82 00 20  beqlr
  1c: 81 24 00 28  lwz     r9,40(r4)
  20: 71 29 00 0a  andi.   r9,r9,10
  24: 40 82 00 34  bne     58 <switch_mm_irqs_off+0x58>
  28: 48 00 00 00  b       28 <switch_mm_irqs_off+0x28>
                   28: R_PPC_REL24 switch_mmu_context
...
  58: 2c 03 00 00  cmpwi   r3,0
  5c: 41 82 ff cc  beq     28 <switch_mm_irqs_off+0x28>
  60: 48 00 00 00  b       60 <switch_mm_irqs_off+0x60>
                   60: R_PPC_REL24 switch_mmu_context

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e9d501da0c59f60ca767b1b3ea4603fce6d02b9e.1625486440.git.christophe.leroy@csgroup.eu
4 years agopowerpc: Remove in_kernel_text()
Christophe Leroy [Sun, 27 Jun 2021 17:09:18 +0000 (17:09 +0000)]
powerpc: Remove in_kernel_text()

The last user of in_kernel_text() stopped using it with
commit 549e8152de80 ("powerpc: Make the 64-bit kernel as a
position-independent executable").

Generic function is_kernel_text() does the same.

So remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2a3a5b6f8cc0ef4e854d7b764f66aa8d2ee270d2.1624813698.git.christophe.leroy@csgroup.eu
4 years agopowerpc/64s/perf: Always use SIAR for kernel interrupts
Nicholas Piggin [Tue, 20 Jul 2021 14:15:04 +0000 (00:15 +1000)]
powerpc/64s/perf: Always use SIAR for kernel interrupts

If an interrupt is taken in kernel mode, always use SIAR for it rather than
looking at regs_sipr. This prevents samples piling up around interrupt
enable (hard enable or interrupt replay via soft enable) in PMUs / modes
where the PR sample indication is not in synch with SIAR.

This results in better sampling of interrupt entry and exit in particular.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210720141504.420110-1-npiggin@gmail.com
4 years agopowerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
Parth Shah [Wed, 28 Jul 2021 17:56:07 +0000 (23:26 +0530)]
powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings

On POWER10 systems, the "ibm,thread-groups" property "2" indicates that the
CPUs in a thread-group share both the L2 and L3 caches. Hence, use
cache_property = 2 to find both the L2 and L3 cache siblings: create a new
thread_group_l3_cache_map to keep the list of L3 siblings, but fill the
mask using the same property "2" array.

Signed-off-by: Parth Shah <parth@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210728175607.591679-4-parth@linux.ibm.com
4 years agopowerpc/cacheinfo: Remove the redundant get_shared_cpu_map()
Gautham R. Shenoy [Wed, 28 Jul 2021 17:56:06 +0000 (23:26 +0530)]
powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()

The helper function get_shared_cpu_map() was added in

'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct
shared_cpu_map on big-cores")'

and subsequently expanded upon in

'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling
map/list for L2 cache")'

in order to help report the correct groups of threads sharing these caches
on big-core systems where groups of threads within a core can share
different sets of caches.

Now that powerpc/cacheinfo is aware of the "ibm,thread-groups" property,
cache->shared_cpu_map contains the correct set of thread-siblings
sharing the cache. Hence we no longer need the function
get_shared_cpu_map(). This patch removes this function. We also remove
the helper function index_dir_to_cpu() which was only called by
get_shared_cpu_map().

With these functions removed, we can still see the correct
cache-sibling map/list for L1 and L2 caches on systems with L1 and L2
caches distributed among groups of threads in a core.

With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
are split between the two groups of threads in a core, for CPUs 8,9,
the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
follows:

$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15

$ ppc64_cpu --smt=4
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11

$ ppc64_cpu --smt=2
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9

$ ppc64_cpu --smt=1
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
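
To compare the listings above programmatically, the shared_cpu_list strings
can be expanded into CPU sets. The helper below is only a sketch in Python
with a hypothetical name; it assumes the comma and range syntax seen in the
output above ("8,10,12,14", "8-15") and nothing more.

```python
def parse_cpu_list(text):
    """Expand a string such as "8,10,12,14" or "8-15" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the SMT8 listing for cpu8.
l2_siblings = parse_cpu_list("8,10,12,14")
l3_siblings = parse_cpu_list("8-15")
l2_is_within_l3 = l2_siblings <= l3_siblings
```

A check like l2_is_within_l3 expresses the expectation that the threads
sharing an L2 are a subset of those sharing the L3.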

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210728175607.591679-3-parth@linux.ibm.com
4 years agopowerpc/cacheinfo: Lookup cache by dt node and thread-group id
Gautham R. Shenoy [Wed, 28 Jul 2021 17:56:05 +0000 (23:26 +0530)]
powerpc/cacheinfo: Lookup cache by dt node and thread-group id

Currently the cacheinfo code on powerpc indexes the "cache" objects
(modelling the L1/L2/L3 caches) where the key is the device-tree node
corresponding to that cache. On some of the POWER server platforms,
thread-groups within the core share different sets of caches (e.g. on
SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and
threads 1,3,5,7 of the same core share another L1 cache). On such
platforms, there is a single device-tree node corresponding to that
cache and the cache-configuration within the threads of the core is
indicated via the "ibm,thread-groups" device-tree property.

Since the current code is not aware of the "ibm,thread-groups"
property, on the aforementioned systems, cacheinfo code still treats
all the threads in the core as sharing the cache because of the
single device-tree node (in the earlier example, the cacheinfo code
would say CPUs 0-7 share L1 cache).

In this patch, we make the powerpc cacheinfo code aware of the
"ibm,thread-groups" property. We index the "cache" objects by the
key-pair (device-tree node, thread-group id). For any CPUX, for a
given level of cache, the thread-group id is defined to be the first
CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels
of cache which are not represented in "ibm,thread-groups" property,
the thread-group id is -1.
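
The key-pair lookup can be illustrated with a small sketch. It is written in
Python for brevity and is not the kernel implementation: thread_group_id()
and CacheIndex are hypothetical names, groups are modelled as ordered lists
of CPU ids, and returning -1 for a CPU that appears in no group is an
assumption of this sketch.

```python
def thread_group_id(cpu, groups):
    """Return the first CPU of the group containing cpu, or -1.

    groups is a list of CPU lists for one cache level, or None when that
    level is not represented in "ibm,thread-groups".
    """
    if groups is None:
        return -1
    for group in groups:
        if cpu in group:
            return group[0]
    return -1  # assumption: CPU not listed in any group


class CacheIndex:
    """Cache objects keyed by (device-tree node, thread-group id)."""

    def __init__(self):
        self._caches = {}

    def lookup(self, dt_node, group_id):
        key = (dt_node, group_id)
        if key not in self._caches:
            self._caches[key] = {"node": dt_node, "group_id": group_id}
        return self._caches[key]


# SMT8 example from the text: two L1 groups behind one device-tree node.
l1_groups = [[0, 2, 4, 6], [1, 3, 5, 7]]
index = CacheIndex()
l1_even = index.lookup("l1-node", thread_group_id(0, l1_groups))
l1_odd = index.lookup("l1-node", thread_group_id(1, l1_groups))
l3_any = index.lookup("l3-node", thread_group_id(0, None))
```

Here CPUs 0 and 1 resolve to different L1 cache objects even though they
share one device-tree node, while the L3 level falls back to group id -1.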

[parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
and "thread_group_l2_cache_map" to get rid of the compile error.]

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Parth Shah <parth@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210728175607.591679-2-parth@linux.ibm.com
4 years agopowerpc: move the install rule to arch/powerpc/Makefile
Masahiro Yamada [Thu, 29 Jul 2021 14:19:37 +0000 (23:19 +0900)]
powerpc: move the install rule to arch/powerpc/Makefile

Currently, the install target in arch/powerpc/Makefile descends into
arch/powerpc/boot/Makefile to invoke the shell script, but there is no
good reason to do so.

arch/powerpc/Makefile can run the shell script directly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210729141937.445051-3-masahiroy@kernel.org
4 years agopowerpc: make the install target not depend on any build artifact
Masahiro Yamada [Thu, 29 Jul 2021 14:19:36 +0000 (23:19 +0900)]
powerpc: make the install target not depend on any build artifact

The install target should not depend on any build artifact.

The reason is explained in commit 19514fc665ff ("arm, kbuild: make
"make install" not depend on vmlinux").

Change the PowerPC installation code in a similar way.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210729141937.445051-2-masahiroy@kernel.org
4 years agopowerpc: remove unused zInstall target from arch/powerpc/boot/Makefile
Masahiro Yamada [Thu, 29 Jul 2021 14:19:35 +0000 (23:19 +0900)]
powerpc: remove unused zInstall target from arch/powerpc/boot/Makefile

Commit c913e5f95e54 ("powerpc/boot: Don't install zImage.* from make
install") added the zInstall target to arch/powerpc/boot/Makefile,
but you cannot use it since the corresponding hook is missing in
arch/powerpc/Makefile.

It has never worked since its addition. Nobody has complained about
it for 7 years, which means this code was unneeded.

With this removal, the install.sh will be passed in with 4 parameters.
Simplify the shell script.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210729141937.445051-1-masahiroy@kernel.org
4 years agocpuidle: pseries: Mark pseries_idle_proble() as __init
Nathan Chancellor [Tue, 3 Aug 2021 21:15:47 +0000 (14:15 -0700)]
cpuidle: pseries: Mark pseries_idle_proble() as __init

After commit 7cbd631d4dec ("cpuidle: pseries: Fixup CEDE0 latency only
for POWER10 onwards"), pseries_idle_probe() is no longer inlined when
compiling with clang, which causes a modpost warning:

WARNING: modpost: vmlinux.o(.text+0xc86a54): Section mismatch in
reference from the function pseries_idle_probe() to the function
.init.text:fixup_cede0_latency()
The function pseries_idle_probe() references
the function __init fixup_cede0_latency().
This is often because pseries_idle_probe lacks a __init
annotation or the annotation of fixup_cede0_latency is wrong.

pseries_idle_probe() is a non-init function, which calls
fixup_cede0_latency(), which is an init function, explaining the
mismatch. pseries_idle_probe() is only called from
pseries_processor_idle_init(), which is an init function, so mark
pseries_idle_probe() as __init so there is no more warning.

Fixes: 054e44ba99ae ("cpuidle: pseries: Add function to parse extended CEDE records")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210803211547.1093820-1-nathan@kernel.org
4 years agopowerpc/stacktrace: Include linux/delay.h
Michal Suchanek [Thu, 29 Jul 2021 18:01:03 +0000 (20:01 +0200)]
powerpc/stacktrace: Include linux/delay.h

Commit 7c6986ade69e ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
introduced a udelay() call without including the linux/delay.h header.
This may happen to work on master, but the header that declares the
function should be included nonetheless.

Fixes: 7c6986ade69e ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210729180103.15578-1-msuchanek@suse.de
4 years agocpuidle: pseries: Do not cap the CEDE0 latency in fixup_cede0_latency()
Gautham R. Shenoy [Mon, 19 Jul 2021 06:33:19 +0000 (12:03 +0530)]
cpuidle: pseries: Do not cap the CEDE0 latency in fixup_cede0_latency()

Currently, in fixup_cede0_latency(), we fix up the CEDE(0) exit latency
value only if the minimum advertised extended CEDE latency value is less
than 10us. This was done so as not to break the expected behaviour on
POWER8 platforms where the advertised latency was higher than the default
10us, which would delay the SMT folding on the core.

However, after the earlier patch "cpuidle/pseries: Fixup CEDE0 latency
only for POWER10 onwards", we can be sure that the fixup of CEDE0
latency is going to happen only from POWER10 onwards. Hence
unconditionally use the minimum exit latency provided by the platform.
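
The difference between the old and new behaviour can be sketched as below.
This is an illustration in Python with hypothetical function names; the 10us
default comes from the text above, and the sample input is made up.

```python
DEFAULT_CEDE0_LATENCY_US = 10  # default exit latency quoted in the text


def cede0_latency_capped(min_advertised_us):
    """Old behaviour: only lower the latency, never raise it above 10us."""
    if min_advertised_us < DEFAULT_CEDE0_LATENCY_US:
        return min_advertised_us
    return DEFAULT_CEDE0_LATENCY_US


def cede0_latency_uncapped(min_advertised_us):
    """New behaviour: always use the minimum latency the platform provides."""
    return min_advertised_us


# Hypothetical advertised minimum above the default.
old_value = cede0_latency_capped(25)
new_value = cede0_latency_uncapped(25)
```

The two functions only differ when the platform advertises a minimum at or
above the default, which is the case the cap used to filter out.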

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1626676399-15975-3-git-send-email-ego@linux.vnet.ibm.com
4 years agocpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards
Gautham R. Shenoy [Mon, 19 Jul 2021 06:33:18 +0000 (12:03 +0530)]
cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards

Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
of the Extended CEDE states advertised by the platform.

On POWER9 LPARs, the firmware advertises a very low value of 2us for
CEDE1 exit latency on a Dedicated LPAR. The latency advertised by the
PHYP hypervisor corresponds to the latency required to wake up from the
underlying hardware idle state. However, the wakeup latency from the
LPAR perspective should include:

1. The time taken to transition the CPU from the Hypervisor into the
   LPAR post wakeup from platform idle state

2. Time taken to send the IPI from the source CPU (waker) to the idle
   target CPU (wakee).

1. can be measured via a timer idle test, where we queue a timer, say
for 1ms, and enter the CEDE state. When the timer fires, in the timer
handler we compute how much extra time over the expected 1ms we have
consumed. On a POWER9 LPAR the numbers are:

CEDE latency measured using a timer (numbers in ns)
N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
400     2601     5677     5668.74    5917    6413     9299   455.01

1. and 2. combined can be determined by an IPI latency test where we
send an IPI to an idle CPU and in the handler compute the time
difference between when the IPI was sent and when the handler ran. We
see the following numbers on POWER9 LPAR.

CEDE latency measured using an IPI (numbers in ns)
N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
400     711      7564     7369.43   8559    9514      9698   1200.01

Suppose we consider the 99th percentile latency value measured using
the IPI to be the wakeup latency; the value would then be 9.5us. This is
in the ballpark of the default value of 10us.

Hence, use the exit latency of CEDE(0) based on the latency values
advertised by the platform only from POWER10 onwards. The values
advertised on POWER10 platforms are more realistic and informed by the
latency measurements. For earlier platforms stick to the default value
of 10us. The fix was suggested by Michael Ellerman.
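
The arithmetic and the resulting policy can be sketched as follows. This is
an illustration in Python, not the driver code: the function names are
hypothetical, the 99th percentile values are copied from the two tables
above, and the POWER10 input passed at the end is a made-up number.

```python
DEFAULT_CEDE0_LATENCY_US = 10   # default quoted in the text
ADVERTISED_POWER9_US = 2        # CEDE1 value advertised on POWER9

TIMER_P99_NS = 6413             # 99%ile, timer test table
IPI_P99_NS = 9514               # 99%ile, IPI test table


def ns_to_us(ns):
    return ns / 1000.0


def cede0_exit_latency_us(power10_or_later, advertised_min_us):
    """Trust the advertised value only from POWER10 onwards."""
    if power10_or_later:
        return advertised_min_us
    return DEFAULT_CEDE0_LATENCY_US


ipi_p99_us = ns_to_us(IPI_P99_NS)
# How far the advertised POWER9 value is from the measured wakeup latency.
underestimate_factor = ipi_p99_us / ADVERTISED_POWER9_US

power9_latency = cede0_exit_latency_us(False, ADVERTISED_POWER9_US)
power10_latency = cede0_exit_latency_us(True, 7)  # 7us is hypothetical
```

The measured IPI figure is several times the advertised 2us and close to
the 10us default, which is why the default is kept for pre-POWER10
platforms in this sketch.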

Fixes: d947fb4c965c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
Reported-by: Enrico Joedecke <joedecke@de.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1626676399-15975-2-git-send-email-ego@linux.vnet.ibm.com
4 years agopowerpc/kexec: blacklist functions called in real mode for kprobe
Hari Bathini [Wed, 14 Jul 2021 12:47:58 +0000 (18:17 +0530)]
powerpc/kexec: blacklist functions called in real mode for kprobe

As kprobe does not handle events happening in real mode, blacklist the
functions that only get called in real mode or in kexec sequence with
MMU turned off.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/162626687834.155313.4692863392927831843.stgit@hbathini-workstation.ibm.com
4 years agoMerge branch 'fixes' into next
Michael Ellerman [Mon, 26 Jul 2021 10:37:53 +0000 (20:37 +1000)]
Merge branch 'fixes' into next

Merge our fixes branch, which contains some fixes that didn't make it
into rc2 but which we'd like in next.

4 years agoKVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
Nicholas Piggin [Thu, 8 Jul 2021 11:26:22 +0000 (21:26 +1000)]
KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state

The H_ENTER_NESTED hypercall is handled by the L0, and it is a request
by the L1 to switch the context of the vCPU over to that of its L2
guest, and return with an interrupt indication. The L1 is responsible
for switching some registers to guest context, and the L0 switches
others (including all the hypervisor privileged state).

If the L2 MSR has TM active, then the L1 is responsible for
recheckpointing the L2 TM state. Then the L1 exits to L0 via the
H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit,
and then it recheckpoints the TM state as part of the nested entry and
finally HRFIDs into the L2 with TM active MSR. Not efficient, but about
the simplest approach for something that's horrendously complicated.

Problems arise if the L1 exits to the L0 with a TM state which does not
match the L2 TM state being requested. For example if the L1 is
transactional but the L2 MSR is non-transactional, or vice versa. The
L0's HRFID can take a TM Bad Thing interrupt and crash.

Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then
ensuring that if the L1 is suspended then the L2 must have TM active,
and if the L1 is not suspended then the L2 must not have TM active.

Fixes: 360cae313702 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>