]> www.infradead.org Git - linux.git/log
linux.git
3 years agoKVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault
Chenyi Qiang [Tue, 24 May 2022 13:56:21 +0000 (21:56 +0800)]
KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault

For the triple fault sythesized by KVM, e.g. the RSM path or
nested_vmx_abort(), if KVM exits to userspace before the request is
serviced, userspace could migrate the VM and lose the triple fault.

Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a
new event KVM_VCPUEVENT_VALID_FAULT_FAULT so that userspace can save and
restore the triple fault event. This extension is guarded by a new KVM
capability KVM_CAP_TRIPLE_FAULT_EVENT.

Note that in the set_vcpu_events path, userspace is able to set/clear
the triple fault request through triple_fault.pending field.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Drop amd_event_mapping[] in the KVM context
Like Xu [Wed, 18 May 2022 13:25:12 +0000 (21:25 +0800)]
KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context

All gp or fixed counters have been reprogrammed using PERF_TYPE_RAW,
which means that the table that maps perf_hw_id to event select values is
no longer useful, at least for AMD.

For Intel, the logic to check if the pmu event reported by Intel cpuid is
not available is still required, in which case pmc_perf_hw_id() could be
renamed to hw_event_is_unavail() and a bool value is returned to replace
the semantics of "PERF_COUNT_HW_MAX+1".

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-12-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config()
Like Xu [Wed, 18 May 2022 13:25:11 +0000 (21:25 +0800)]
KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config()

With the help of perf_get_hw_event_config(), KVM could query the correct
EVENTSEL_{EVENT, UMASK} pair of a kernel-generic hw event directly from
the different *_perfmon_event_map[] by the kernel's pre-defined perf_hw_id.

Also extend the bit range of the comparison field to
AMD64_RAW_EVENT_MASK_NB to prevent AMD from
defining EventSelect[11:8] into perfmon_event_map[] one day.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-11-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoperf: x86/core: Add interface to query perfmon_event_map[] directly
Like Xu [Wed, 18 May 2022 13:25:10 +0000 (21:25 +0800)]
perf: x86/core: Add interface to query perfmon_event_map[] directly

Currently, we have [intel|knc|p4|p6]_perfmon_event_map on the Intel
platforms and amd_[f17h]_perfmon_event_map on the AMD platforms.

Early clumsy KVM code or other potential perf_event users may have
hard-coded these perfmon_maps (e.g., arch/x86/kvm/svm/pmu.c), so
it would not make sense to program a common hardware event based
on the generic "enum perf_hw_id" once the two tables do not match.

Let's provide an interface for callers outside the perf subsystem to get
the counter config based on the perfmon_event_map currently in use,
and it also helps to save bytes.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220518132512.37864-10-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,fixed}counter()
Like Xu [Wed, 18 May 2022 13:25:09 +0000 (21:25 +0800)]
KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,fixed}counter()

The code sketch for reprogram_{gp, fixed}_counter() is similar, while the
fixed counter using the PERF_TYPE_HARDWAR type and the gp being
able to use either PERF_TYPE_HARDWAR or PERF_TYPE_RAW type
depending on the pmc->eventsel value.

After 'commit 761875634a5e ("KVM: x86/pmu: Setup pmc->eventsel
for fixed PMCs")', the pmc->eventsel of the fixed counter will also have
been setup with the same semantic value and will not be changed during
the guest runtime.

The original story of using the PERF_TYPE_HARDWARE type is to emulate
guest architecture PMU on a host without architecture PMU (the Pentium 4),
for which the guest vPMC needs to be reprogrammed using the kernel
generic perf_hw_id. But essentially, "the HARDWARE is just a convenience
wrapper over RAW IIRC", quoated from Peterz. So it could be pretty safe
to use the PERF_TYPE_RAW type only in practice to program both gp and
fixed counters naturally in the reprogram_counter().

To make the gp and fixed counters more semantically symmetrical,
the selection of EVENTSEL_{USER, OS, INT} bits is temporarily translated
via fixed_ctr_ctrl before the pmc_reprogram_counter() call.

Cc: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-9-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Use only the uniform interface reprogram_counter()
Paolo Bonzini [Wed, 25 May 2022 09:28:56 +0000 (05:28 -0400)]
KVM: x86/pmu: Use only the uniform interface reprogram_counter()

Since reprogram_counter(), reprogram_{gp, fixed}_counter() currently have
the same incoming parameter "struct kvm_pmc *pmc", the callers can simplify
the conetxt by using uniformly exported interface, which makes reprogram_
{gp, fixed}_counter() static and eliminates EXPORT_SYMBOL_GPL.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-8-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()
Like Xu [Wed, 18 May 2022 13:25:07 +0000 (21:25 +0800)]
KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()

Since afrer reprogram_fixed_counter() is called, it's bound to assign
the requested fixed_ctr_ctrl to pmu->fixed_ctr_ctrl, this assignment step
can be moved forward (the stale value for diff is saved extra early),
thus simplifying the passing of parameters.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-7-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()
Like Xu [Wed, 18 May 2022 13:25:06 +0000 (21:25 +0800)]
KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()

Because inside reprogram_gp_counter() it is bound to assign the requested
eventel to pmc->eventsel, this assignment step can be moved forward, thus
simplifying the passing of parameters to "struct kvm_pmc *pmc" only.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-6-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()
Like Xu [Wed, 18 May 2022 13:25:05 +0000 (21:25 +0800)]
KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()

Passing the reference "struct kvm_pmc *pmc" when creating
pmc->perf_event is sufficient. This change helps to simplify the
calling convention by replacing reprogram_{gp, fixed}_counter()
with reprogram_counter() seamlessly.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Extract check_pmu_event_filter() handling both GP and fixed counters
Like Xu [Wed, 18 May 2022 13:25:03 +0000 (21:25 +0800)]
KVM: x86/pmu: Extract check_pmu_event_filter() handling both GP and fixed counters

Checking the kvm->arch.pmu_event_filter policy in both gp and fixed
code paths was somewhat redundant, so common parts can be extracted,
which reduces code footprint and improves readability.

Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <20220518132512.37864-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Update comments for AMD gp counters
Like Xu [Wed, 18 May 2022 13:25:02 +0000 (21:25 +0800)]
KVM: x86/pmu: Update comments for AMD gp counters

The obsolete comment could more accurately state that AMD platforms
have two base MSR addresses and two different maximum numbers
for gp counters, depending on the X86_FEATURE_PERFCTR_CORE feature.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: always allow host-initiated writes to PMU MSRs
Paolo Bonzini [Wed, 25 May 2022 08:39:22 +0000 (04:39 -0400)]
KVM: x86: always allow host-initiated writes to PMU MSRs

Whenever an MSR is part of KVM_GET_MSR_INDEX_LIST, it has to be always
retrievable and settable with KVM_GET_MSR and KVM_SET_MSR.  Accept
the PMU MSRs unconditionally in intel_is_valid_msr, if the access was
host-initiated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: vmx, pmu: accept 0 for host-initiated write to MSR_IA32_DS_AREA
Paolo Bonzini [Wed, 1 Jun 2022 07:21:19 +0000 (03:21 -0400)]
KVM: vmx, pmu: accept 0 for host-initiated write to MSR_IA32_DS_AREA

Whenever an MSR is part of KVM_GET_MSR_INDEX_LIST, as is the case
for MSR_IA32_DS_AREA, it has to be always settable with KVM_SET_MSR.
Accept a zero value for these MSRs to obey the contract.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Ignore pmu->global_ctrl check if vPMU doesn't support global_ctrl
Like Xu [Mon, 9 May 2022 10:22:02 +0000 (18:22 +0800)]
KVM: x86/pmu: Ignore pmu->global_ctrl check if vPMU doesn't support global_ctrl

MSR_CORE_PERF_GLOBAL_CTRL is introduced as part of Architecture PMU V2,
as indicated by Intel SDM 19.2.2 and the intel_is_valid_msr() function.

So in the absence of global_ctrl support, all PMCs are enabled as AMD does.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220509102204.62389-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing
Like Xu [Tue, 10 May 2022 04:44:07 +0000 (12:44 +0800)]
KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing

Assigning a value to pmu->global_ctrl just to set the value of
pmu->global_ctrl_mask is more readable but does not conform to the
specification. The value is reset to zero on Power up and Reset but
stays unchanged on INIT, like most other MSRs.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220510044407.26445-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: remove useless prototype
Paolo Bonzini [Fri, 20 May 2022 12:00:05 +0000 (08:00 -0400)]
KVM: x86/pmu: remove useless prototype

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Move the vmx_icl_pebs_cpu[] definition out of the header file
Like Xu [Wed, 18 May 2022 17:01:16 +0000 (01:01 +0800)]
KVM: x86/pmu: Move the vmx_icl_pebs_cpu[] definition out of the header file

Defining a static const array in a header file would introduce redundant
definitions to the point of confusing semantics, and such a use case would
only bring complaints from the compiler:

arch/x86/kvm/pmu.h:20:32: warning: ‘vmx_icl_pebs_cpu’ defined but not used [-Wunused-const-variable=]
   20 | static const struct x86_cpu_id vmx_icl_pebs_cpu[] = {
      |                                ^~~~~~~~~~~~~~~~

Fixes: a095df2c5f48 ("KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518170118.66263-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoselftests: kvm: replace ternary operator with min()
Guo Zhengkui [Wed, 11 May 2022 12:05:55 +0000 (20:05 +0800)]
selftests: kvm: replace ternary operator with min()

Fix the following coccicheck warnings:

tools/testing/selftests/kvm/lib/s390x/ucall.c:25:15-17: WARNING
opportunity for min()
tools/testing/selftests/kvm/lib/x86_64/ucall.c:27:15-17: WARNING
opportunity for min()
tools/testing/selftests/kvm/lib/riscv/ucall.c:56:15-17: WARNING
opportunity for min()
tools/testing/selftests/kvm/lib/aarch64/ucall.c:82:15-17: WARNING
opportunity for min()
tools/testing/selftests/kvm/lib/aarch64/ucall.c:55:20-21: WARNING
opportunity for min()

min() is defined in tools/include/linux/kernel.h.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20220511120621.36956-1-guozhengkui@vivo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
Like Xu [Mon, 11 Apr 2022 10:19:46 +0000 (18:19 +0800)]
KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64

The CPUID features PDCM, DS and DTES64 are required for PEBS feature.
KVM would expose CPUID feature PDCM, DS and DTES64 to guest when PEBS
is supported in the KVM on the Ice Lake server platforms.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-18-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/cpuid: Refactor host/guest CPU model consistency check
Like Xu [Mon, 11 Apr 2022 10:19:45 +0000 (18:19 +0800)]
KVM: x86/cpuid: Refactor host/guest CPU model consistency check

For the same purpose, the leagcy intel_pmu_lbr_is_compatible() can be
renamed for reuse by more callers, and remove the comment about LBR
use case can be deleted by the way.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-17-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
Like Xu [Mon, 11 Apr 2022 10:19:44 +0000 (18:19 +0800)]
KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability

The information obtained from the interface perf_get_x86_pmu_capability()
doesn't change, so an exported "struct x86_pmu_capability" is introduced
for all guests in the KVM, and it's initialized before hardware_setup().

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-16-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
Like Xu [Mon, 11 Apr 2022 10:19:43 +0000 (18:19 +0800)]
KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations

The guest PEBS will be disabled when some users try to perf KVM and
its user-space through the same PEBS facility OR when the host perf
doesn't schedule the guest PEBS counter in a one-to-one mapping manner
(neither of these are typical scenarios).

The PEBS records in the guest DS buffer are still accurate and the
above two restrictions will be checked before each vm-entry only if
guest PEBS is deemed to be enabled.

Suggested-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-15-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
Like Xu [Mon, 11 Apr 2022 10:19:42 +0000 (18:19 +0800)]
KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h

It allows this inline function to be reused by more callers in
more files, such as pmu_intel.c.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-14-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
Like Xu [Mon, 11 Apr 2022 10:19:41 +0000 (18:19 +0800)]
KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled

The bit 12 represents "Processor Event Based Sampling Unavailable (RO)" :
1 = PEBS is not supported.
0 = PEBS is supported.

A write to this PEBS_UNAVL available bit will bring #GP(0) when guest PEBS
is enabled. Some PEBS drivers in guest may care about this bit.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20220411101946.20262-13-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
Like Xu [Mon, 11 Apr 2022 10:19:40 +0000 (18:19 +0800)]
KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS

If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the adaptive
PEBS is supported. The PEBS_DATA_CFG MSR and adaptive record enable
bits (IA32_PERFEVTSELx.Adaptive_Record and IA32_FIXED_CTR_CTRL.
FCx_Adaptive_Record) are also supported.

Adaptive PEBS provides software the capability to configure the PEBS
records to capture only the data of interest, keeping the record size
compact. An overflow of PMCx results in generation of an adaptive PEBS
record with state information based on the selections specified in
MSR_PEBS_DATA_CFG.By default, the record only contain the Basic group.

When guest adaptive PEBS is enabled, the IA32_PEBS_ENABLE MSR will
be added to the perf_guest_switch_msr() and switched during the VMX
transitions just like CORE_PERF_GLOBAL_CTRL MSR.

According to Intel SDM, software is recommended to  PEBS Baseline
when the following is true. IA32_PERF_CAPABILITIES.PEBS_BASELINE[14]
&& IA32_PERF_CAPABILITIES.PEBS_FMT[11:8] ≥ 4.

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-12-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
Like Xu [Mon, 11 Apr 2022 10:19:39 +0000 (18:19 +0800)]
KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS

When CPUID.01H:EDX.DS[21] is set, the IA32_DS_AREA MSR exists and points
to the linear address of the first byte of the DS buffer management area,
which is used to manage the PEBS records.

When guest PEBS is enabled, the MSR_IA32_DS_AREA MSR will be added to the
perf_guest_switch_msr() and switched during the VMX transitions just like
CORE_PERF_GLOBAL_CTRL MSR. The WRMSR to IA32_DS_AREA MSR brings a #GP(0)
if the source register contains a non-canonical address.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-11-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
Like Xu [Mon, 11 Apr 2022 10:19:38 +0000 (18:19 +0800)]
KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter

The PEBS-PDIR facility on Ice Lake server is supported on IA31_FIXED0 only.
If the guest configures counter 32 and PEBS is enabled, the PEBS-PDIR
facility is supposed to be used, in which case KVM adjusts attr.precise_ip
to 3 and request host perf to assign the exactly requested counter or fail.

The CPU model check is also required since some platforms may place the
PEBS-PDIR facility in another counter index.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-10-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
Like Xu [Mon, 11 Apr 2022 10:19:37 +0000 (18:19 +0800)]
KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter

When a guest counter is configured as a PEBS counter through
IA32_PEBS_ENABLE, a guest PEBS event will be reprogrammed by
configuring a non-zero precision level in the perf_event_attr.

The guest PEBS overflow PMI bit would be set in the guest
GLOBAL_STATUS MSR when PEBS facility generates a PEBS
overflow PMI based on guest IA32_DS_AREA MSR.

Even with the same counter index and the same event code and
mask, guest PEBS events will not be reused for non-PEBS events.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-9-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
Like Xu [Mon, 11 Apr 2022 10:19:36 +0000 (18:19 +0800)]
KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS

If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the
IA32_PEBS_ENABLE MSR exists and all architecturally enumerated fixed
and general-purpose counters have corresponding bits in IA32_PEBS_ENABLE
that enable generation of PEBS records. The general-purpose counter bits
start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at
bit IA32_PEBS_ENABLE[32].

When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR will be
added to the perf_guest_switch_msr() and atomically switched during
the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR.

Based on whether the platform supports x86_pmu.pebs_ept, it has also
refactored the way to add more msrs to arr[] in intel_guest_get_msrs()
for extensibility.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-8-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agox86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value
Peter Zijlstra (Intel) [Mon, 11 Apr 2022 10:19:35 +0000 (18:19 +0800)]
x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value

The value of pebs_counter_mask will be accessed frequently
for repeated use in the intel_guest_get_msrs(). So it can be
optimized instead of endlessly mucking about with branches.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-7-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
Like Xu [Mon, 11 Apr 2022 10:19:34 +0000 (18:19 +0800)]
KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter

The mask value of fixed counter control register should be dynamic
adjusted with the number of fixed counters. This patch introduces a
variable that includes the reserved bits of fixed counter control
registers. This is a generic code refactoring.

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-6-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
Like Xu [Mon, 11 Apr 2022 10:19:33 +0000 (18:19 +0800)]
KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled

On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
detect whether the processor supports performance monitoring facility.

It depends on the PMU is enabled for the guest, and a software write
operation to this available bit will be ignored. The proposal to ignore
the toggle in KVM is the way to go and that behavior matches bare metal.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoperf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
Like Xu [Mon, 11 Apr 2022 10:19:32 +0000 (18:19 +0800)]
perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values

Splitting the logic for determining the guest values is unnecessarily
confusing, and potentially fragile. Perf should have full knowledge and
control of what values are loaded for the guest.

If we change .guest_get_msrs() to take a struct kvm_pmu pointer, then it
can generate the full set of guest values by grabbing guest ds_area and
pebs_data_cfg. Alternatively, .guest_get_msrs() could take the desired
guest MSR values directly (ds_area and pebs_data_cfg), but kvm_pmu is
vendor agnostic, so we don't see any reason to not just pass the pointer.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-4-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoperf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
Like Xu [Mon, 11 Apr 2022 10:19:31 +0000 (18:19 +0800)]
perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest

With PEBS virtualization, the guest PEBS records get delivered to the
guest DS, and the host pmi handler uses perf_guest_cbs->is_in_guest()
to distinguish whether the PMI comes from the guest code like Intel PT.

No matter how many guest PEBS counters are overflowed, only triggering
one fake event is enough. The fake event causes the KVM PMI callback to
be called, thereby injecting the PEBS overflow PMI into the guest.

KVM may inject the PMI with BUFFER_OVF set, even if the guest DS is
empty. That should really be harmless. Thus guest PEBS handler would
retrieve the correct information from its own PEBS records buffer.

Cc: linux-perf-users@vger.kernel.org
Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoperf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
Like Xu [Mon, 11 Apr 2022 10:19:30 +0000 (18:19 +0800)]
perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server

Add support for EPT-Friendly PEBS, a new CPU feature that enlightens PEBS
to translate guest linear address through EPT, and facilitates handling
VM-Exits that occur when accessing PEBS records.  More information can
be found in the December 2021 release of Intel's SDM, Volume 3,
18.9.5 "EPT-Friendly PEBS". This new hardware facility makes sure the
guest PEBS records will not be lost, which is available on Intel Ice Lake
Server platforms (and later).

KVM will check this field through perf_get_x86_pmu_capability() instead
of hard coding the CPU models in the KVM code. If it is supported, the
guest PEBS capability will be exposed to the guest. Guest PEBS can be
enabled when and only when "EPT-Friendly PEBS" is supported and
EPT is enabled.

Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: VMX: enable IPI virtualization
Chao Gao [Tue, 19 Apr 2022 15:45:10 +0000 (23:45 +0800)]
KVM: VMX: enable IPI virtualization

With IPI virtualization enabled, the processor emulates writes to
APIC registers that would send IPIs. The processor sets the bit
corresponding to the vector in target vCPU's PIR and may send a
notification (IPI) specified by NDST and NV fields in target vCPU's
Posted-Interrupt Descriptor (PID). It is similar to what IOMMU
engine does when dealing with posted interrupt from devices.

A PID-pointer table is used by the processor to locate the PID of a
vCPU with the vCPU's APIC ID. The table size depends on maximum APIC
ID assigned for current VM session from userspace. Allocating memory
for PID-pointer table is deferred to vCPU creation, because irqchip
mode and VM-scope maximum APIC ID is settled at that point. KVM can
skip PID-pointer table allocation if !irqchip_in_kernel().

Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its
notification vector to wakeup vector. This can ensure that when an IPI
for blocked vCPUs arrives, VMM can get control and wake up blocked
vCPUs. And if a VCPU is preempted, its posted interrupt notification
is suppressed.

Note that IPI virtualization can only virualize physical-addressing,
flat mode, unicast IPIs. Sending other IPIs would still cause a
trap-like APIC-write VM-exit and need to be handled by VMM.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154510.11938-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agokvm: selftests: Add KVM_CAP_MAX_VCPU_ID cap test
Zeng Guang [Fri, 22 Apr 2022 13:44:56 +0000 (21:44 +0800)]
kvm: selftests: Add KVM_CAP_MAX_VCPU_ID cap test

Basic test coverage of KVM_CAP_MAX_VCPU_ID cap.

This capability can be enabled before vCPU creation and only allowed
to set once. if assigned vcpu id is beyond KVM_CAP_MAX_VCPU_ID
capability, vCPU creation will fail.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220422134456.26655-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: Allow userspace to set maximum VCPU id for VM
Zeng Guang [Tue, 19 Apr 2022 15:44:44 +0000 (23:44 +0800)]
KVM: x86: Allow userspace to set maximum VCPU id for VM

Introduce new max_vcpu_ids in KVM for x86 architecture. Userspace
can assign maximum possible vcpu id for current VM session using
KVM_CAP_MAX_VCPU_ID of KVM_ENABLE_CAP ioctl().

This is done for x86 only because the sole use case is to guide
memory allocation for PID-pointer table, a structure needed to
enable VMX IPI.

By default, max_vcpu_ids set as KVM_MAX_VCPU_IDS.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154444.11888-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: Move kvm_arch_vcpu_precreate() under kvm->lock
Zeng Guang [Tue, 19 Apr 2022 15:44:09 +0000 (23:44 +0800)]
KVM: Move kvm_arch_vcpu_precreate() under kvm->lock

kvm_arch_vcpu_precreate() targets to handle arch specific VM resource
to be prepared prior to the actual creation of vCPU. For example, x86
platform may need do per-VM allocation based on max_vcpu_ids at the
first vCPU creation. It probably leads to concurrency control on this
allocation as multiple vCPU creation could happen simultaneously. From
the architectual point of view, it's necessary to execute
kvm_arch_vcpu_precreate() under protect of kvm->lock.

Currently only arm64, x86 and s390 have non-nop implementations at the
stage of vCPU pre-creation. Remove the lock acquiring in s390's design
and make sure all architecture can run kvm_arch_vcpu_precreate() safely
under kvm->lock without recrusive lock issue.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154409.11842-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: VMX: Clean up vmx_refresh_apicv_exec_ctrl()
Zeng Guang [Tue, 19 Apr 2022 15:36:04 +0000 (23:36 +0800)]
KVM: VMX: Clean up vmx_refresh_apicv_exec_ctrl()

Remove the condition check cpu_has_secondary_exec_ctrls(). Calling
vmx_refresh_apicv_exec_ctrl() premises secondary controls activated
and VMCS fields related to APICv valid as well. If it's invoked in
wrong circumstance at the worst case, VMX operation will report
VMfailValid error without further harmful impact and just functions
as if all the secondary controls were 0.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153604.11786-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode
Zeng Guang [Tue, 19 Apr 2022 15:35:16 +0000 (23:35 +0800)]
KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode

Upcoming Intel CPUs will support virtual x2APIC MSR writes to the vICR,
i.e. will trap and generate an APIC-write VM-Exit instead of intercepting
the WRMSR.  Add support for handling "nodecode" x2APIC writes, which
were previously impossible.

Note, x2APIC MSR writes are 64 bits wide.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153516.11739-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: VMX: Report tertiary_exec_control field in dump_vmcs()
Robert Hoo [Tue, 19 Apr 2022 15:34:41 +0000 (23:34 +0800)]
KVM: VMX: Report tertiary_exec_control field in dump_vmcs()

Add tertiary_exec_control field report in dump_vmcs(). Meanwhile,
reorganize the dump output of VMCS category as follows.

Before change:
*** Control State ***
 PinBased=0x000000ff CPUBased=0xb5a26dfa SecondaryExec=0x061037eb
 EntryControls=0000d1ff ExitControls=002befff

After change:
*** Control State ***
 CPUBased=0xb5a26dfa SecondaryExec=0x061037eb TertiaryExec=0x0000000000000010
 PinBased=0x000000ff EntryControls=0000d1ff ExitControls=002befff

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153441.11687-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config
Robert Hoo [Tue, 19 Apr 2022 15:34:00 +0000 (23:34 +0800)]
KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config

Check VMX features on tertiary execution control in VMCS config setup.
Sub-features in tertiary execution control to be enabled are adjusted
according to hardware capabilities although no sub-feature is enabled
in this patch.

EVMCSv1 doesn't support tertiary VM-execution control, so disable it
when EVMCSv1 is in use. And define the auxiliary functions for Tertiary
control field here, using the new BUILD_CONTROLS_SHADOW().

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153400.11642-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation
Robert Hoo [Tue, 19 Apr 2022 15:33:18 +0000 (23:33 +0800)]
KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation

The Tertiary VM-Exec Control, different from previous control fields, is 64
bit. So extend BUILD_CONTROLS_SHADOW() by adding a 'bit' parameter, to
support both 32 bit and 64 bit fields' auxiliary functions building.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153318.11595-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agox86/cpu: Add new VMX feature, Tertiary VM-Execution control
Robert Hoo [Tue, 19 Apr 2022 15:32:40 +0000 (23:32 +0800)]
x86/cpu: Add new VMX feature, Tertiary VM-Execution control

A new 64-bit control field "tertiary processor-based VM-execution
controls", is defined [1]. It's controlled by bit 17 of the primary
processor-based VM-execution controls.

Different from its brother VM-execution fields, this tertiary VM-
execution controls field is 64 bit. So it occupies 2 vmx_feature_leafs,
TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.

Its companion VMX capability reporting MSR,MSR_IA32_VMX_PROCBASED_CTLS3
(0x492), is also semantically different from its brothers, whose 64 bits
consist of all allow-1, rather than 32-bit allow-0 and 32-bit allow-1 [1][2].
Therefore, its init_vmx_capabilities() is a little different from others.

[1] ISE 6.2 "VMCS Changes"
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

[2] SDM Vol3. Appendix A.3

Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153240.11549-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: Comment FNAME(sync_page) to document TLB flushing logic
Sean Christopherson [Fri, 13 May 2022 19:50:00 +0000 (19:50 +0000)]
KVM: x86/mmu: Comment FNAME(sync_page) to document TLB flushing logic

Add a comment to FNAME(sync_page) to explain why the TLB flushing logic
conspiculously doesn't handle the scenario of guest protections being
reduced.  Specifically, if synchronizing a SPTE drops execute protections,
KVM will not emit a TLB flush, whereas dropping writable or clearing A/D
bits does trigger a flush via mmu_spte_update().  Architecturally, until
the GPTE is implicitly or explicitly flushed from the guest's perspective,
KVM is not required to flush any old, stale translations.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220513195000.99371-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page()
Sean Christopherson [Fri, 13 May 2022 19:49:59 +0000 (19:49 +0000)]
KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page()

All of sync_page()'s existing checks filter out only !PRESENT gPTE,
because without execute-only, all upper levels are guaranteed to be at
least READABLE.  However, if EPT with execute-only support is in use by
L1, KVM can create an SPTE that is shadow-present but guest-inaccessible
(RWX=0) if the upper level combined permissions are R (or RW) and
the leaf EPTE is changed from R (or RW) to X.  Because the EPTE is
considered present when viewed in isolation, and no reserved bits are set,
FNAME(prefetch_invalid_gpte) will consider the GPTE valid, and cause a
not-present SPTE to be created.

The SPTE is "correct": the guest translation is inaccessible because
the combined protections of all levels yield RWX=0, and KVM will just
redirect any vmexits to the guest.  If EPT A/D bits are disabled, KVM
can mistake the SPTE for an access-tracked SPTE, but again such confusion
isn't fatal, as the "saved" protections are also RWX=0.  However,
creating a useless SPTE in general means that KVM messed up something,
even if this particular goof didn't manifest as a functional bug.
So, drop SPTEs whose new protections will yield a RWX=0 SPTE, and
add a WARN in make_spte() to detect creation of SPTEs that will
result in RWX=0 protections.

Fixes: d95c55687e11 ("kvm: mmu: track read permission explicitly for shadow EPT page tables")
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220513195000.99371-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: nSVM: Add svm_nested_soft_inject_test
Maciej S. Szmigiero [Sun, 1 May 2022 22:07:35 +0000 (00:07 +0200)]
KVM: selftests: nSVM: Add svm_nested_soft_inject_test

Add a KVM self-test that checks whether a nSVM L1 is able to successfully
inject a software interrupt, a soft exception and a NMI into its L2 guest.

In practice, this tests both the next_rip field consistency and
L1-injected event with intervening L0 VMEXIT during its delivery:
the first nested VMRUN (that's also trying to inject a software interrupt)
will immediately trigger a L0 NPF.
This L0 NPF will have zero in its CPU-returned next_rip field, which if
incorrectly reused by KVM will trigger a #PF when trying to return to
such address 0 from the interrupt handler.

For NMI injection this tests whether the L1 NMI state isn't getting
incorrectly mixed with the L2 NMI state if a L1 -> L2 NMI needs to be
re-injected.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
[sean: check exact L2 RIP on first soft interrupt]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <d5f3d56528558ad8e28a9f1e1e4187f5a1e6770a.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nSVM: Transparently handle L1 -> L2 NMI re-injection
Maciej S. Szmigiero [Sun, 1 May 2022 22:07:34 +0000 (00:07 +0200)]
KVM: nSVM: Transparently handle L1 -> L2 NMI re-injection

A NMI that L1 wants to inject into its L2 should be directly re-injected,
without causing L0 side effects like engaging NMI blocking for L1.

It's also worth noting that in this case it is L1 responsibility
to track the NMI window status for its L2 guest.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <f894d13501cd48157b3069a4b4c7369575ddb60e.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint
Sean Christopherson [Sun, 1 May 2022 22:07:33 +0000 (00:07 +0200)]
KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint

In the IRQ injection tracepoint, differentiate between Hard IRQs and Soft
"IRQs", i.e. interrupts that are reinjected after incomplete delivery of
a software interrupt from an INTn instruction.  Tag reinjected interrupts
as such, even though the information is usually redundant since soft
interrupts are only ever reinjected by KVM.  Though rare in practice, a
hard IRQ can be reinjected.

Signed-off-by: Sean Christopherson <seanjc@google.com>
[MSS: change "kvm_inj_virq" event "reinjected" field type to bool]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <9664d49b3bd21e227caa501cff77b0569bebffe2.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: Print error code in exception injection tracepoint iff valid
Sean Christopherson [Sun, 1 May 2022 22:07:32 +0000 (00:07 +0200)]
KVM: x86: Print error code in exception injection tracepoint iff valid

Print the error code in the exception injection tracepoint if and only if
the exception has an error code.  Define the entire error code sequence
as a set of formatted strings, print empty strings if there's no error
code, and abuse __print_symbolic() by passing it an empty array to coerce
it into printing the error code as a hex string.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <e8f0511733ed2a0410cbee8a0a7388eac2ee5bac.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: Trace re-injected exceptions
Sean Christopherson [Sun, 1 May 2022 22:07:31 +0000 (00:07 +0200)]
KVM: x86: Trace re-injected exceptions

Trace exceptions that are re-injected, not just those that KVM is
injecting for the first time.  Debugging re-injection bugs is painful
enough as is, not having visibility into what KVM is doing only makes
things worse.

Delay propagating pending=>injected in the non-reinjection path so that
the tracing can properly identify reinjected exceptions.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <25470690a38b4d2b32b6204875dd35676c65c9f2.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SVM: Re-inject INTn instead of retrying the insn on "failure"
Sean Christopherson [Sun, 1 May 2022 22:07:30 +0000 (00:07 +0200)]
KVM: SVM: Re-inject INTn instead of retrying the insn on "failure"

Re-inject INTn software interrupts instead of retrying the instruction if
the CPU encountered an intercepted exception while vectoring the INTn,
e.g. if KVM intercepted a #PF when utilizing shadow paging.  Retrying the
instruction is architecturally wrong e.g. will result in a spurious #DB
if there's a code breakpoint on the INT3/O, and lack of re-injection also
breaks nested virtualization, e.g. if L1 injects a software interrupt and
vectoring the injected interrupt encounters an exception that is
intercepted by L0 but not L1.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <1654ad502f860948e4f2d57b8bd881d67301f785.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SVM: Re-inject INT3/INTO instead of retrying the instruction
Sean Christopherson [Sun, 1 May 2022 22:07:29 +0000 (00:07 +0200)]
KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction

Re-inject INT3/INTO instead of retrying the instruction if the CPU
encountered an intercepted exception while vectoring the software
exception, e.g. if vectoring INT3 encounters a #PF and KVM is using
shadow paging.  Retrying the instruction is architecturally wrong, e.g.
will result in a spurious #DB if there's a code breakpoint on the INT3/O,
and lack of re-injection also breaks nested virtualization, e.g. if L1
injects a software exception and vectoring the injected exception
encounters an exception that is intercepted by L0 but not L1.

Due to, ahem, deficiencies in the SVM architecture, acquiring the next
RIP may require flowing through the emulator even if NRIPS is supported,
as the CPU clears next_rip if the VM-Exit is due to an exception other
than "exceptions caused by the INT3, INTO, and BOUND instructions".  To
deal with this, "skip" the instruction to calculate next_rip (if it's
not already known), and then unwind the RIP write and any side effects
(RFLAGS updates).

Save the computed next_rip and use it to re-stuff next_rip if injection
doesn't complete.  This allows KVM to do the right thing if next_rip was
known prior to injection, e.g. if L1 injects a soft event into L2, and
there is no backing INTn instruction, e.g. if L1 is injecting an
arbitrary event.

Note, it's impossible to guarantee architectural correctness given SVM's
architectural flaws.  E.g. if the guest executes INTn (no KVM injection),
an exit occurs while vectoring the INTn, and the guest modifies the code
stream while the exit is being handled, KVM will compute the incorrect
next_rip due to "skipping" the wrong instruction.  A future enhancement
to make this less awful would be for KVM to detect that the decoded
instruction is not the correct INTn and drop the to-be-injected soft
event (retrying is a lesser evil compared to shoving the wrong RIP on the
exception stack).

Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <65cb88deab40bc1649d509194864312a89bbe02e.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported
Sean Christopherson [Sun, 1 May 2022 22:07:28 +0000 (00:07 +0200)]
KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported

If NRIPS is supported in hardware but disabled in KVM, set next_rip to
the next RIP when advancing RIP as part of emulating INT3 injection.
There is no flag to tell the CPU that KVM isn't using next_rip, and so
leaving next_rip is left as is will result in the CPU pushing garbage
onto the stack when vectoring the injected event.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <cd328309a3b88604daa2359ad56f36cb565ce2d4.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"
Sean Christopherson [Sun, 1 May 2022 22:07:27 +0000 (00:07 +0200)]
KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"

Unwind the RIP advancement done by svm_queue_exception() when injecting
an INT3 ultimately "fails" due to the CPU encountering a VM-Exit while
vectoring the injected event, even if the exception reported by the CPU
isn't the same event that was injected.  If vectoring INT3 encounters an
exception, e.g. #NP, and vectoring the #NP encounters an intercepted
exception, e.g. #PF when KVM is using shadow paging, then the #NP will
be reported as the event that was in-progress.

Note, this is still imperfect, as it will get a false positive if the
INT3 is cleanly injected, no VM-Exit occurs before the IRET from the INT3
handler in the guest, the instruction following the INT3 generates an
exception (directly or indirectly), _and_ vectoring that exception
encounters an exception that is intercepted by KVM.  The false positives
could theoretically be solved by further analyzing the vectoring event,
e.g. by comparing the error code against the expected error code were an
exception to occur when vectoring the original injected exception, but
SVM without NRIPS is a complete disaster, trying to make it 100% correct
is a waste of time.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <450133cf0a026cb9825a2ff55d02cb136a1cb111.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0
Maciej S. Szmigiero [Sun, 1 May 2022 22:07:26 +0000 (00:07 +0200)]
KVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0

Don't BUG/WARN on interrupt injection due to GIF being cleared,
since it's trivial for userspace to force the situation via
KVM_SET_VCPU_EVENTS (even if having at least a WARN there would be correct
for KVM internally generated injections).

  kernel BUG at arch/x86/kvm/svm/svm.c:3386!
  invalid opcode: 0000 [#1] SMP
  CPU: 15 PID: 926 Comm: smm_test Not tainted 5.17.0-rc3+ #264
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:svm_inject_irq+0xab/0xb0 [kvm_amd]
  Code: <0f> 0b 0f 1f 00 0f 1f 44 00 00 80 3d ac b3 01 00 00 55 48 89 f5 53
  RSP: 0018:ffffc90000b37d88 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88810a234ac0 RCX: 0000000000000006
  RDX: 0000000000000000 RSI: ffffc90000b37df7 RDI: ffff88810a234ac0
  RBP: ffffc90000b37df7 R08: ffff88810a1fa410 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff888109571000 R14: ffff88810a234ac0 R15: 0000000000000000
  FS:  0000000001821380(0000) GS:ffff88846fdc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f74fc550008 CR3: 000000010a6fe000 CR4: 0000000000350ea0
  Call Trace:
   <TASK>
   inject_pending_event+0x2f7/0x4c0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x791/0x17a0 [kvm]
   kvm_vcpu_ioctl+0x26d/0x650 [kvm]
   __x64_sys_ioctl+0x82/0xb0
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Fixes: 219b65dcf6c0 ("KVM: SVM: Improve nested interrupt injection")
Cc: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <35426af6e123cbe91ec7ce5132ce72521f02b1b5.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nSVM: Sync next_rip field from vmcb12 to vmcb02
Maciej S. Szmigiero [Sun, 1 May 2022 22:07:25 +0000 (00:07 +0200)]
KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02

The next_rip field of a VMCB is *not* an output-only field for a VMRUN.
This field value (instead of the saved guest RIP) in used by the CPU for
the return address pushed on stack when injecting a software interrupt or
INT3 or INTO exception.

Make sure this field gets synced from vmcb12 to vmcb02 when entering L2 or
loading a nested state and NRIPS is exposed to L1.  If NRIPS is supported
in hardware but not exposed to L1 (nrips=0 or hidden by userspace), stuff
vmcb02's next_rip from the new L2 RIP to emulate a !NRIPS CPU (which
saves RIP on the stack as-is).

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <c2e0a3d78db3ae30530f11d4e9254b452a89f42b.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoMerge tag 'kvm-s390-next-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Tue, 7 Jun 2022 16:28:53 +0000 (12:28 -0400)]
Merge tag 'kvm-s390-next-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: pvdump and selftest improvements

- add an interface to provide a hypervisor dump for secure guests
- improve selftests to show tests

3 years agoMerge branch 'kvm-5.20-early-patches' into HEAD
Paolo Bonzini [Tue, 7 Jun 2022 16:06:39 +0000 (12:06 -0400)]
Merge branch 'kvm-5.20-early-patches' into HEAD

3 years agoMerge branch 'kvm-5.19-early-fixes' into HEAD
Paolo Bonzini [Tue, 7 Jun 2022 16:06:02 +0000 (12:06 -0400)]
Merge branch 'kvm-5.19-early-fixes' into HEAD

3 years agoKVM: SVM: fix tsc scaling cache logic
Maxim Levitsky [Mon, 6 Jun 2022 18:11:49 +0000 (21:11 +0300)]
KVM: SVM: fix tsc scaling cache logic

SVM uses a per-cpu variable to cache the current value of the
tsc scaling multiplier msr on each cpu.

Commit 1ab9287add5e2
("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
broke this caching logic.

Refactor the code so that all TSC scaling multiplier writes go through
a single function which checks and updates the cache.

This fixes the following scenario:

1. A CPU runs a guest with some tsc scaling ratio.

2. New guest with different tsc scaling ratio starts on this CPU
   and terminates almost immediately.

   This ensures that the short running guest had set the tsc scaling ratio just
   once when it was set via KVM_SET_TSC_KHZ. Due to the bug,
   the per-cpu cache is not updated.

3. The original guest continues to run, it doesn't restore the msr
   value back to its own value, because the cache matches,
   and thus continues to run with a wrong tsc scaling ratio.

Fixes: 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Make hyperv_clock selftest more stable
Vitaly Kuznetsov [Wed, 1 Jun 2022 14:43:22 +0000 (16:43 +0200)]
KVM: selftests: Make hyperv_clock selftest more stable

hyperv_clock doesn't always give a stable test result, especially with
AMD CPUs. The test compares Hyper-V MSR clocksource (acquired either
with rdmsr() from within the guest or KVM_GET_MSRS from the host)
against rdtsc(). To increase the accuracy, increase the measured delay
(done with nop loop) by two orders of magnitude and take the mean rdtsc()
value before and after rdmsr()/KVM_GET_MSRS.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220601144322.1968742-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
Ben Gardon [Wed, 25 May 2022 23:09:04 +0000 (23:09 +0000)]
KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging

Currently disabling dirty logging with the TDP MMU is extremely slow.
On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds
to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with
the shadow MMU.

When disabling dirty logging, zap non-leaf parent entries to allow
replacement with huge pages instead of recursing and zapping all of the
child, leaf entries. This reduces the number of TLB flushes required.
and reduces the disable dirty log time with the TDP MMU to ~3 seconds.

Opportunistically add a WARN() to catch GFNs that are mapped at a
higher level than their max level.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220525230904.1584480-1-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agox86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
Jan Beulich [Tue, 7 Jun 2022 15:00:53 +0000 (17:00 +0200)]
x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()

As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs
and clobbering of "cc" don't work well together. The compiler appears to
mean to reject such, but doesn't - in its upstream form - quite manage
to yet for "cc". Furthermore two similar macros don't clobber "cc", and
clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler
always assumes status flags to be clobbered there.

Fixes: 989b5db215a2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()
Shaoqin Huang [Tue, 7 Jun 2022 00:59:05 +0000 (18:59 -0600)]
KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()

When freeing obsolete previous roots, check prev_roots as intended, not
the current root.

Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com>
Fixes: 527d5cd7eece ("KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped")
Message-Id: <20220607005905.2933378-1-shaoqin.huang@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoentry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set
Seth Forshee [Wed, 4 May 2022 18:08:40 +0000 (13:08 -0500)]
entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set

A livepatch transition may stall indefinitely when a kvm vCPU is heavily
loaded. To the host, the vCPU task is a user thread which is spending a
very long time in the ioctl(KVM_RUN) syscall. During livepatch
transition, set_notify_signal() will be called on such tasks to
interrupt the syscall so that the task can be transitioned. This
interrupts guest execution, but when xfer_to_guest_mode_work() sees that
TIF_NOTIFY_SIGNAL is set but not TIF_SIGPENDING it concludes that an
exit to user mode is unnecessary, and guest execution is resumed without
transitioning the task for the livepatch.

This handling of TIF_NOTIFY_SIGNAL is incorrect, as set_notify_signal()
is expected to break tasks out of interruptible kernel loops and cause
them to return to userspace. Change xfer_to_guest_mode_work() to handle
TIF_NOTIFY_SIGNAL the same as TIF_SIGPENDING, signaling to the vCPU run
loop that an exit to userpsace is needed. Any pending task_work will be
run when get_signal() is called from exit_to_user_mode_loop(), so there
is no longer any need to run task work from xfer_to_guest_mode_work().

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Message-Id: <20220504180840.2907296-1-sforshee@digitalocean.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: Don't null dereference ops->destroy
Alexey Kardashevskiy [Wed, 1 Jun 2022 01:43:28 +0000 (03:43 +0200)]
KVM: Don't null dereference ops->destroy

A KVM device cleanup happens in either of two callbacks:
1) destroy() which is called when the VM is being destroyed;
2) release() which is called when a device fd is closed.

Most KVM devices use 1) but Book3s's interrupt controller KVM devices
(XICS, XIVE, XIVE-native) use 2) as they need to close and reopen during
the machine execution. The error handling in kvm_ioctl_create_device()
assumes destroy() is always defined which leads to NULL dereference as
discovered by Syzkaller.

This adds a checks for destroy!=NULL and adds a missing release().

This is not changing kvm_destroy_devices() as devices with defined
release() should have been removed from the KVM devices list by then.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoLinux 5.19-rc1 v5.19-rc1
Linus Torvalds [Mon, 6 Jun 2022 00:18:54 +0000 (17:18 -0700)]
Linux 5.19-rc1

3 years agoMerge tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Mon, 6 Jun 2022 00:14:03 +0000 (17:14 -0700)]
Merge tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull file descriptor fix from Al Viro:
 "Fix for breakage in #work.fd this window"

* tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix the breakage in close_fd_get_file() calling conventions change

3 years agoMerge tag 'mm-hotfixes-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Mon, 6 Jun 2022 00:05:38 +0000 (17:05 -0700)]
Merge tag 'mm-hotfixes-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm hotfixes from Andrew Morton:
 "Fixups for various recently-added and longer-term issues and a few
  minor tweaks:

   - fixes for material merged during this merge window

   - cc:stable fixes for more longstanding issues

   - minor mailmap and MAINTAINERS updates"

* tag 'mm-hotfixes-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery
  x86/kexec: fix memory leak of elf header buffer
  mm/memremap: fix missing call to untrack_pfn() in pagemap_range()
  mm: page_isolation: use compound_nr() correctly in isolate_single_pageblock()
  mm: hugetlb_vmemmap: fix CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
  MAINTAINERS: add maintainer information for z3fold
  mailmap: update Josh Poimboeuf's email

3 years agoMerge tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Jun 2022 23:58:27 +0000 (16:58 -0700)]
Merge tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull delay-accounting update from Andrew Morton:
 "A single featurette for delay accounting.

  Delayed a bit because, unusually, it had dependencies on both the
  mm-stable and mm-nonmm-stable queues"

* tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  delayacct: track delays from write-protect copy

3 years agobluetooth: don't use bitmaps for random flag accesses
Linus Torvalds [Sun, 5 Jun 2022 18:51:48 +0000 (11:51 -0700)]
bluetooth: don't use bitmaps for random flag accesses

The bluetooth code uses our bitmap infrastructure for the two bits (!)
of connection setup flags, and in the process causes odd problems when
it converts between a bitmap and just the regular values of said bits.

It's completely pointless to do things like bitmap_to_arr32() to convert
a bitmap into a u32.  It shoudln't have been a bitmap in the first
place.  The reason to use bitmaps is if you have arbitrary number of
bits you want to manage (not two!), or if you rely on the atomicity
guarantees of the bitmap setting and clearing.

The code could use an "atomic_t" and use "atomic_or/andnot()" to set and
clear the bit values, but considering that it then copies the bitmaps
around with "bitmap_to_arr32()" and friends, there clearly cannot be a
lot of atomicity requirements.

So just use a regular integer.

In the process, this avoids the warnings about erroneous use of
bitmap_from_u64() which were triggered on 32-bit architectures when
conversion from a u64 would access two words (and, surprise, surprise,
only one word is needed - and indeed overkill - for a 2-bit bitmap).

That was always problematic, but the compiler seems to notice it and
warn about the invalid pattern only after commit 0a97953fd221 ("lib: add
bitmap_{from,to}_arr64") changed the exact implementation details of
'bitmap_from_u64()', as reported by Sudip Mukherjee and Stephen Rothwell.

Fixes: fe92ee6425a2 ("Bluetooth: hci_core: Rework hci_conn_params flags")
Link: https://lore.kernel.org/all/YpyJ9qTNHJzz0FHY@debian/
Link: https://lore.kernel.org/all/20220606080631.0c3014f2@canb.auug.org.au/
Link: https://lore.kernel.org/all/20220605162537.1604762-1-yury.norov@gmail.com/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofix the breakage in close_fd_get_file() calling conventions change
Al Viro [Sun, 5 Jun 2022 18:01:42 +0000 (14:01 -0400)]
fix the breakage in close_fd_get_file() calling conventions change

It used to grab an extra reference to struct file rather than
just transferring to caller the one it had removed from descriptor
table.  New variant doesn't, and callers need to be adjusted.

Reported-and-tested-by: syzbot+47dd250f527cb7bebf24@syzkaller.appspotmail.com
Fixes: 6319194ec57b ("Unify the primitives for file descriptor closing")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 years agoMerge tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jun 2022 18:00:43 +0000 (11:00 -0700)]
Merge tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX fix from Thomas Gleixner:
 "A single fix for x86/SGX to prevent that memory which is allocated for
  an SGX enclave is accounted to the wrong memory control group"

* tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Set active memcg prior to shmem allocation

3 years agoMerge tag 'x86-mm-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds [Sun, 5 Jun 2022 17:57:35 +0000 (10:57 -0700)]
Merge tag 'x86-mm-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm cleanup from Thomas Gleixner:
 "Use PAGE_ALIGNED() instead of open coding it in the x86/mm code"

* tag 'x86-mm-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Use PAGE_ALIGNED(x) instead of IS_ALIGNED(x, PAGE_SIZE)

3 years agoMerge tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Jun 2022 17:55:23 +0000 (10:55 -0700)]
Merge tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode updates from Thomas Gleixner:

 - Disable late microcode loading by default. Unless the HW people get
   their act together and provide a required minimum version in the
   microcode header for making a halfways informed decision its just
   lottery and broken.

 - Warn and taint the kernel when microcode is loaded late

 - Remove the old unused microcode loader interface

 - Remove a redundant perf callback from the microcode loader

* tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Remove unnecessary perf callback
  x86/microcode: Taint and warn on late loading
  x86/microcode: Default-disable late loading
  x86/microcode: Rip out the OLD_INTERFACE

3 years agoMerge tag 'x86-cleanups-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Jun 2022 17:53:41 +0000 (10:53 -0700)]
Merge tag 'x86-cleanups-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Thomas Gleixner:
 "A set of small x86 cleanups:

   - Remove unused headers in the IDT code

   - Kconfig indendation and comment fixes

   - Fix all 'the the' typos in one go instead of waiting for bots to
     fix one at a time"

* tag 'x86-cleanups-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix all occurences of the "the the" typo
  x86/idt: Remove unused headers
  x86/Kconfig: Fix indentation of arch/x86/Kconfig.debug
  x86/Kconfig: Fix indentation and add endif comments to arch/x86/Kconfig

3 years agoMerge tag 'x86-boot-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jun 2022 17:49:42 +0000 (10:49 -0700)]
Merge tag 'x86-boot-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot update from Thomas Gleixner:
 "Use strlcpy() instead of strscpy() in arch_setup()"

* tag 'x86-boot-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/setup: Use strscpy() to replace deprecated strlcpy()

3 years agoMerge tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jun 2022 17:47:06 +0000 (10:47 -0700)]
Merge tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull clockevent/clocksource updates from Thomas Gleixner:

 - Device tree bindings for MT8186

 - Tell the kernel that the RISC-V SBI timer stops in deeper power
   states

 - Make device tree parsing in sp804 more robust

 - Dead code removal and tiny fixes here and there

 - Add the missing SPDX identifiers

* tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
  clocksource/drivers/timer-ti-dm: Remove unnecessary NULL check
  clocksource/drivers/timer-sun5i: Convert to SPDX identifier
  clocksource/drivers/timer-sun4i: Convert to SPDX identifier
  clocksource/drivers/pistachio: Convert to SPDX identifier
  clocksource/drivers/orion: Convert to SPDX identifier
  clocksource/drivers/lpc32xx: Convert to SPDX identifier
  clocksource/drivers/digicolor: Convert to SPDX identifier
  clocksource/drivers/armada-370-xp: Convert to SPDX identifier
  clocksource/drivers/mips-gic-timer: Convert to SPDX identifier
  clocksource/drivers/jcore: Convert to SPDX identifier
  clocksource/drivers/bcm_kona: Convert to SPDX identifier
  clocksource/drivers/sp804: Avoid error on multiple instances
  clocksource/drivers/riscv: Events are stopped during CPU suspend
  clocksource/drivers/ixp4xx: Drop boardfile probe path
  dt-bindings: timer: Add compatible for Mediatek MT8186

3 years agoMerge tag 'sched-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Jun 2022 17:42:40 +0000 (10:42 -0700)]
Merge tag 'sched-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
 "Fix the fallout of sysctl code move which placed the init function
  wrong"

* tag 'sched-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/autogroup: Fix sysctl move

3 years agoMerge tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jun 2022 17:40:31 +0000 (10:40 -0700)]
Merge tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:

  - Make the ICL event constraints match reality

  - Remove a unused local variable

* tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Remove unused local variable
  perf/x86/intel: Fix event constraints for ICL

3 years agoMerge tag 'perf-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jun 2022 17:39:20 +0000 (10:39 -0700)]
Merge tag 'perf-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixlet from Thomas Gleixner:
 "Trivial indentation fix in Kconfig"

* tag 'perf-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/Kconfig: Fix indentation in the Kconfig file

3 years agoMerge tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Jun 2022 16:45:27 +0000 (09:45 -0700)]
Merge tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Thomas Gleixner:

 - Handle __ubsan_handle_builtin_unreachable() correctly and treat it as
   noreturn

 - Allow architectures to select uaccess validation

 - Use the non-instrumented bit test for test_cpu_has() to prevent
   escape from non-instrumentable regions

 - Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape
   from non-instrumentable regions

 - Mark a few tiny inline as __always_inline to prevent GCC from
   bringing them out of line and instrumenting them

 - Mark the empty stub context_tracking_enabled() as always inline as
   GCC brings them out of line and instruments the empty shell

 - Annotate ex_handler_msr_mce() as dead end

* tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/extable: Annotate ex_handler_msr_mce() as a dead end
  context_tracking: Always inline empty stubs
  x86: Always inline on_thread_stack() and current_top_of_stack()
  jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds
  x86/cpu: Elide KCSAN for cpu_has() and friends
  objtool: Mark __ubsan_handle_builtin_unreachable() as noreturn
  objtool: Add CONFIG_HAVE_UACCESS_VALIDATION

3 years agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 5 Jun 2022 16:25:12 +0000 (09:25 -0700)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "Mostly small bug fixes plus other trivial updates.

  The major change of note is moving ufs out of scsi and a minor update
  to lpfc vmid handling"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
  scsi: qla2xxx: Remove unused 'ql_dm_tgt_ex_pct' parameter
  scsi: qla2xxx: Remove setting of 'req' and 'rsp' parameters
  scsi: mpi3mr: Fix kernel-doc
  scsi: lpfc: Add support for ATTO Fibre Channel devices
  scsi: core: Return BLK_STS_TRANSPORT for ALUA transitioning
  scsi: sd_zbc: Prevent zone information memory leak
  scsi: sd: Fix potential NULL pointer dereference
  scsi: mpi3mr: Rework mrioc->bsg_device model to fix warnings
  scsi: myrb: Fix up null pointer access on myrb_cleanup()
  scsi: core: Unexport scsi_bus_type
  scsi: sd: Don't call blk_cleanup_disk() in sd_probe()
  scsi: ufs: ufshcd: Delete unnecessary NULL check
  scsi: isci: Fix typo in comment
  scsi: pmcraid: Fix typo in comment
  scsi: smartpqi: Fix typo in comment
  scsi: qedf: Fix typo in comment
  scsi: esas2r: Fix typo in comment
  scsi: storvsc: Fix typo in comment
  scsi: ufs: Split the drivers/scsi/ufs directory
  scsi: qla1280: Remove redundant variable
  ...

3 years agoMerge tag 'hte/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra...
Linus Torvalds [Sun, 5 Jun 2022 16:12:28 +0000 (09:12 -0700)]
Merge tag 'hte/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux

Pull hardware timestamping subsystem from Thierry Reding:
 "This contains the new HTE (hardware timestamping engine) subsystem
  that has been in the works for a couple of months now.

  The infrastructure provided allows for drivers to register as hardware
  timestamp providers, while consumers will be able to request events
  that they are interested in (such as GPIOs and IRQs) to be timestamped
  by the hardware providers.

  Note that this currently supports only one provider, but there seems
  to be enough interest in this functionality and we expect to see more
  drivers added once this is merged"

[ Linus Walleij mentions the Intel PMC in the Elkhart and Tiger Lake
  platforms as another future timestamp provider ]

* tag 'hte/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  dt-bindings: timestamp: Correct id path
  dt-bindings: Renamed hte directory to timestamp
  hte: Uninitialized variable in hte_ts_get()
  hte: Fix off by one in hte_push_ts_ns()
  hte: Fix possible use-after-free in tegra_hte_test_remove()
  hte: Remove unused including <linux/version.h>
  MAINTAINERS: Add HTE Subsystem
  hte: Add Tegra HTE test driver
  tools: gpio: Add new hardware clock type
  gpiolib: cdev: Add hardware timestamp clock type
  gpio: tegra186: Add HTE support
  gpiolib: Add HTE support
  dt-bindings: Add HTE bindings
  hte: Add Tegra194 HTE kernel provider
  drivers: Add hardware timestamp engine (HTE) subsystem
  Documentation: Add HTE subsystem guide

3 years agoMerge tag 'kbuild-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Sun, 5 Jun 2022 16:06:03 +0000 (09:06 -0700)]
Merge tag 'kbuild-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - Fix build regressions for parisc, csky, nios2, openrisc

 - Simplify module builds for CONFIG_LTO_CLANG and CONFIG_X86_KERNEL_IBT

 - Remove arch/parisc/nm, which was presumably a workaround for old
   tools

 - Check the odd combination of EXPORT_SYMBOL and 'static' precisely

 - Make external module builds robust against "too long argument error"

 - Support j, k keys for moving the cursor in nconfig

* tag 'kbuild-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
  kbuild: Allow to select bash in a modified environment
  scripts: kconfig: nconf: make nconfig accept jk keybindings
  modpost: use fnmatch() to simplify match()
  modpost: simplify mod->name allocation
  kbuild: factor out the common objtool arguments
  kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o
  kbuild: clean .tmp_* pattern by make clean
  kbuild: remove redundant cleanups in scripts/link-vmlinux.sh
  kbuild: rebuild multi-object modules when objtool is updated
  kbuild: add cmd_and_savecmd macro
  kbuild: make *.mod rule robust against too long argument error
  kbuild: make built-in.a rule robust against too long argument error
  kbuild: check static EXPORT_SYMBOL* by script instead of modpost
  parisc: remove arch/parisc/nm
  kbuild: do not create *.prelink.o for Clang LTO or IBT
  kbuild: replace $(linked-object) with CONFIG options
  kbuild: do not try to parse *.cmd files for objects provided by compiler
  kbuild: replace $(if A,A,B) with $(or A,B) in scripts/Makefile.modpost
  modpost: squash if...else-if in find_elf_symbol2()
  modpost: reuse ARRAY_SIZE() macro for section_mismatch()
  ...

3 years agoMerge tag 'pull-18-rc1-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jun 2022 02:07:15 +0000 (19:07 -0700)]
Merge tag 'pull-18-rc1-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs pathname updates from Al Viro:
 "Several cleanups in fs/namei.c"

* tag 'pull-18-rc1-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  namei: cleanup double word in comment
  get rid of dead code in legitimize_root()
  fs/namei.c:reserve_stack(): tidy up the call of try_to_unlazy()

3 years agoMerge tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jun 2022 02:00:05 +0000 (19:00 -0700)]
Merge tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull mount handling updates from Al Viro:
 "Cleanups (and one fix) around struct mount handling.

  The fix is usermode_driver.c one - once you've done kern_mount(), you
  must kern_unmount(); simple mntput() will end up with a leak. Several
  failure exits in there messed up that way... In practice you won't hit
  those particular failure exits without fault injection, though"

* tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  move mount-related externs from fs.h to mount.h
  blob_to_mnt(): kern_unmount() is needed to undo kern_mount()
  m->mnt_root->d_inode->i_sb is a weird way to spell m->mnt_sb...
  linux/mount.h: trim includes
  uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)

3 years agoMerge tag 'pull-18-rc1-work.fd' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jun 2022 01:52:00 +0000 (18:52 -0700)]
Merge tag 'pull-18-rc1-work.fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull file descriptor updates from Al Viro.

 - Descriptor handling cleanups

* tag 'pull-18-rc1-work.fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Unify the primitives for file descriptor closing
  fs: remove fget_many and fput_many interface
  io_uring_enter(): don't leave f.flags uninitialized

3 years agoMerge tag '5.19-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 5 Jun 2022 00:42:33 +0000 (17:42 -0700)]
Merge tag '5.19-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:
 "Nine cifs/smb3 client fixes.

  Includes DFS fixes, some cleanup of leagcy SMB1 code, duplicated
  message cleanup and a double free and deadlock fix"

* tag '5.19-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix uninitialized pointer in error case in dfs_cache_get_tgt_share
  cifs: skip trailing separators of prefix paths
  cifs: update internal module number
  cifs: version operations for smb20 unneeded when legacy support disabled
  cifs: do not build smb1ops if legacy support is disabled
  cifs: fix potential deadlock in direct reclaim
  cifs: when extending a file with falloc we should make files not-sparse
  cifs: remove repeated debug message on cifs_put_smb_ses()
  cifs: fix potential double free during failed mount

3 years agokbuild: Allow to select bash in a modified environment
Schspa Shi [Fri, 3 Jun 2022 09:38:52 +0000 (17:38 +0800)]
kbuild: Allow to select bash in a modified environment

This fixes the build error when the system has a default bash version
which is too old to support associative array variables.

The build error log as fellowing:
linux/scripts/check-local-export: line 11: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]

Signed-off-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
3 years agoscripts: kconfig: nconf: make nconfig accept jk keybindings
Isak Ellmer [Wed, 1 Jun 2022 13:08:19 +0000 (15:08 +0200)]
scripts: kconfig: nconf: make nconfig accept jk keybindings

Make nconfig accept jk keybindings for movement in addition to arrow
keys.

Signed-off-by: Isak Ellmer <isak01@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
3 years agomodpost: use fnmatch() to simplify match()
Masahiro Yamada [Mon, 30 May 2022 09:01:39 +0000 (18:01 +0900)]
modpost: use fnmatch() to simplify match()

Replace the own implementation for wildcard (glob) matching with
a function call to fnmatch().

Also, change the return type to 'bool'.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
3 years agomodpost: simplify mod->name allocation
Masahiro Yamada [Mon, 30 May 2022 09:01:38 +0000 (18:01 +0900)]
modpost: simplify mod->name allocation

mod->name is set to the ELF filename with the suffix ".o" stripped.

The current code calls strdup() and free() to manipulate the string,
but a simpler approach is to pass new_module() with the name length
subtracted by 2.

Also, check if the passed filename ends with ".o" before stripping it.

The current code blindly chops the suffix:

    tmp[strlen(tmp) - 2] = '\0'

It will cause buffer under-run if strlen(tmp) < 2;

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
3 years agokbuild: factor out the common objtool arguments
Masahiro Yamada [Sat, 28 May 2022 15:47:04 +0000 (00:47 +0900)]
kbuild: factor out the common objtool arguments

scripts/Makefile.build and scripts/link-vmlinux.sh have similar setups
for the objtool arguments.

It was difficult to factor out them because all the vmlinux build rules
were written in a shell script. It is somewhat tedious to touch the two
files every time a new objtool option is supported.

To reduce the code duplication, move the objtool for vmlinux.o into
scripts/Makefile.vmlinux_o. Then, move the common macros to Makefile.lib
so they are shared between Makefile.build and Makefile.vmlinux_o.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
3 years agokbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o
Masahiro Yamada [Sat, 28 May 2022 15:47:03 +0000 (00:47 +0900)]
kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o

This is a preparation for moving the objtool rule in the next commit.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
3 years agokbuild: clean .tmp_* pattern by make clean
Masahiro Yamada [Sat, 28 May 2022 15:47:02 +0000 (00:47 +0900)]
kbuild: clean .tmp_* pattern by make clean

Change the "make clean" rule to remove all the .tmp_* files.

.tmp_objdiff is the only exception, which should be removed by
"make mrproper".

Rename the record directory of objdiff, .tmp_objdiff to .objdiff to
avoid the removal by "make clean".

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
3 years agoMerge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux
Linus Torvalds [Sat, 4 Jun 2022 21:04:27 +0000 (14:04 -0700)]
Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - bitmap: optimize bitmap_weight() usage, from me

 - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
   Carvalho Chehab

 - include/linux/find: Fix documentation, from Anna-Maria Behnsen

 - bitmap: fix conversion from/to fix-sized arrays, from me

 - bitmap: Fix return values to be unsigned, from Kees Cook

It has been in linux-next for at least a week with no problems.

* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
  nodemask: Fix return values to be unsigned
  bitmap: Fix return values to be unsigned
  KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
  KVM: x86: hyper-v: fix type of valid_bank_mask
  ia64: cleanup remove_siblinginfo()
  drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
  KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
  lib/bitmap: add test for bitmap_{from,to}_arr64
  lib: add bitmap_{from,to}_arr64
  lib/bitmap: extend comment for bitmap_(from,to)_arr32()
  include/linux/find: Fix documentation
  lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
  MAINTAINERS: add cpumask and nodemask files to BITMAP_API
  arch/x86: replace nodes_weight with nodes_empty where appropriate
  mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
  clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
  genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
  irq: mips: replace cpumask_weight with cpumask_empty where appropriate
  drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace cpumask_weight with cpumask_empty where appropriate
  ...

3 years agoMerge tag 'for-5.19/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sat, 4 Jun 2022 20:50:23 +0000 (13:50 -0700)]
Merge tag 'for-5.19/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull more parisc architecture updates from Helge Deller:
 "A fix to prevent crash at bootup if CONFIG_SCHED_MC is enabled, and
  add auto-detection of primary graphics card for framebuffer driver"

* tag 'for-5.19/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc/stifb: Keep track of hardware path of graphics card
  parisc/stifb: Implement fb_is_primary_device()
  parisc: fix a crash with multicore scheduler