Arnd Bergmann [Thu, 9 Nov 2017 21:38:18 +0000 (13:38 -0800)]
sysctl: add register_sysctl() dummy helper
register_sysctl() has been around for five years with commit fea478d4101a ("sysctl: Add register_sysctl for normal sysctl users") but
now that arm64 started using it, I ran into a compile error:
arch/arm64/kernel/armv8_deprecated.c: In function 'register_insn_emulation_sysctl':
arch/arm64/kernel/armv8_deprecated.c:257:2: error: implicit declaration of function 'register_sysctl'
This adds a inline function like we already have for
register_sysctl_paths() and register_sysctl_table().
Link: http://lkml.kernel.org/r/20171106133700.558647-1-arnd@arndb.de Fixes: 38b9aeb32fa7 ("arm64: Port deprecated instruction emulation to new sysctl interface") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Alex Benne <alex.bennee@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 10 Nov 2017 01:43:27 +0000 (17:43 -0800)]
Merge tag 'pci-v4.14-fixes-7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI maintainership updates from Bjorn Helgaas:
"Update MAINTAINERS for HiSilicon, Microsemi Switchtec, and native host
bridge drivers (Gabriele Paoloni, Sebastian Andrzej Siewior).
Note that starting with changes intended for v4.16, Lorenzo Pieralisi
will maintain the drivers/pci/{dwc,endpoint,host} directories. My
intent is to continue to merge those changes via my tree, so this
should be transparent to you"
* tag 'pci-v4.14-fixes-7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Add Lorenzo Pieralisi for PCI host bridge drivers
MAINTAINERS: Remove Gabriele Paoloni as HiSilicon PCI maintainer
MAINTAINERS: Remove Stephen Bates as Microsemi Switchtec maintainer
Linus Torvalds [Thu, 9 Nov 2017 19:16:28 +0000 (11:16 -0800)]
Merge tag 'pm-final-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull final power management fixes from Rafael Wysocki:
"These fix a regression in the schedutil cpufreq governor introduced by
a recent change and blacklist Dell XPS13 9360 from using the Low Power
S0 Idle _DSM interface which triggers serious problems on one of these
machines.
Specifics:
- Prevent the schedutil cpufreq governor from using the utilization
of a wrong CPU in some cases which started to happen after one of
the recent changes in it (Chris Redpath).
- Blacklist Dell XPS13 9360 from using the Low Power S0 Idle _DSM
interface as that causes serious issue (related to NVMe) to appear
on one of these machines, even though the other Dells XPS13 9360 in
somewhat different HW configurations behave correctly (Rafael
Wysocki)"
* tag 'pm-final-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PM: Blacklist Low Power S0 Idle _DSM for Dell XPS13 9360
cpufreq: schedutil: Examine the correct CPU when we update util
Linus Torvalds [Thu, 9 Nov 2017 17:58:11 +0000 (09:58 -0800)]
Merge tag 'sound-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The amount of the changes isn't as quite small as wished, nevertheless
they are straight fixes that deserve merging to 4.14 final.
Most of fixes are about ALSA core bugs spotted by fuzzer: a follow-up
fix for the previous nested rwsem patch, a fix to avoid the resource
hogs due to too many concurrent ALSA timer invocations, and a fix for
a crash with SYSEX MIDI transfer over OSS sequencer emulation that is
used by none but fuzzer.
The rest are usual HD-audio and USB-audio device-specific quirks,
which are safe to apply"
* tag 'sound-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - fix headset mic problem for Dell machines with alc274
ALSA: seq: Fix OSS sysex delivery in OSS emulation
ALSA: seq: Avoid invalid lockdep class warning
ALSA: timer: Limit max instances per timer
ALSA: usb-audio: support new Amanero Combo384 firmware version
1) Fix use-after-free in IPSEC input parsing, desintation address
pointer was loaded before pskb_may_pull() which can change the SKB
data pointers. From Florian Westphal.
2) Stack out-of-bounds read in xfrm_state_find(), from Steffen
Klassert.
3) IPVS state of SKB is not properly reset when moving between
namespaces, from Ye Yin.
4) Fix crash in asix driver suspend and resume, from Andrey Konovalov.
5) Don't deliver ipv6 l2tp tunnel packets to ipv4 l2tp tunnels, and
vice versa, from Guillaume Nault.
6) Fix DSACK undo on non-dup ACKs, from Priyaranjan Jha.
7) Fix regression in bond_xmit_hash()'s behavior after the TCP port
selection changes back in 4.2, from Hangbin Liu.
8) Two divide by zero bugs in USB networking drivers when parsing
descriptors, from Bjorn Mork.
9) Fix bonding slaves being stuck in BOND_LINK_FAIL state, from Jay
Vosburgh.
10) Missing skb_reset_mac_header() in qmi_wwan, from Kristian Evensen.
11) Fix the destruction of tc action object races properly, from Cong
Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
cls_u32: use tcf_exts_get_net() before call_rcu()
cls_tcindex: use tcf_exts_get_net() before call_rcu()
cls_rsvp: use tcf_exts_get_net() before call_rcu()
cls_route: use tcf_exts_get_net() before call_rcu()
cls_matchall: use tcf_exts_get_net() before call_rcu()
cls_fw: use tcf_exts_get_net() before call_rcu()
cls_flower: use tcf_exts_get_net() before call_rcu()
cls_flow: use tcf_exts_get_net() before call_rcu()
cls_cgroup: use tcf_exts_get_net() before call_rcu()
cls_bpf: use tcf_exts_get_net() before call_rcu()
cls_basic: use tcf_exts_get_net() before call_rcu()
net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net()
Revert "net_sched: hold netns refcnt for each action"
net: usb: asix: fill null-ptr-deref in asix_suspend
Revert "net: usb: asix: fill null-ptr-deref in asix_suspend"
qmi_wwan: Add missing skb_reset_mac_header-call
bonding: fix slave stuck in BOND_LINK_FAIL state
qrtr: Move to postcore_initcall
net: qmi_wwan: fix divide by 0 on bad descriptors
net: cdc_ether: fix divide by 0 on bad descriptors
...
Hui Wang [Thu, 9 Nov 2017 00:48:08 +0000 (08:48 +0800)]
ALSA: hda - fix headset mic problem for Dell machines with alc274
Confirmed with Kailang of Realtek, the pin 0x19 is for Headset Mic, and
the pin 0x1a is for Headphone Mic, he suggested to apply
ALC269_FIXUP_DELL1_MIC_NO_PRESENCE to fix this problem. And we
verified applying this FIXUP can fix this problem.
Cc: <stable@vger.kernel.org> Cc: Kailang Yang <kailang@realtek.com> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Changbin Du [Thu, 9 Nov 2017 06:09:11 +0000 (14:09 +0800)]
x86/build: Make the boot image generation less verbose
This change suppresses the 'dd' output and adds the '-quiet' parameter
to mkisofs tool. It also removes the 'Using ...' messages, as none of the
messages matter to the user normally.
"make V=1" can still be used for a more verbose build.
The new build messages are now a streamlined set of:
$ make isoimage
...
Kernel: arch/x86/boot/bzImage is ready (#75)
GENIMAGE arch/x86/boot/image.iso
Kernel: arch/x86/boot/image.iso is ready
David S. Miller [Thu, 9 Nov 2017 01:03:10 +0000 (10:03 +0900)]
Merge branch 'net-sched-race-fix'
Cong Wang says:
====================
net_sched: close the race between call_rcu() and cleanup_net()
This patchset tries to fix the race between call_rcu() and
cleanup_net() again. Without holding the netns refcnt the
tc_action_net_exit() in netns workqueue could be called before
filter destroy works in tc filter workqueue. This patchset
moves the netns refcnt from tc actions to tcf_exts, without
breaking per-netns tc actions.
Patch 1 reverts the previous fix, patch 2 introduces two new
API's to help to address the bug and the rest patches switch
to the new API's. Please see each patch for details.
I was not able to reproduce this bug, but now after adding
some delay in filter destroy work I manage to trigger the
crash. After this patchset, the crash is not reproducible
any more and the debugging printk's show the order is expected
too.
====================
Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions") Reported-by: Lucas Bates <lucasb@mojatatu.com> Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:30 +0000 (13:47 -0800)]
cls_u32: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:29 +0000 (13:47 -0800)]
cls_tcindex: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:28 +0000 (13:47 -0800)]
cls_rsvp: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:27 +0000 (13:47 -0800)]
cls_route: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:26 +0000 (13:47 -0800)]
cls_matchall: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:25 +0000 (13:47 -0800)]
cls_fw: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:24 +0000 (13:47 -0800)]
cls_flower: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:23 +0000 (13:47 -0800)]
cls_flow: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:22 +0000 (13:47 -0800)]
cls_cgroup: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:21 +0000 (13:47 -0800)]
cls_bpf: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:20 +0000 (13:47 -0800)]
cls_basic: use tcf_exts_get_net() before call_rcu()
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.
Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:19 +0000 (13:47 -0800)]
net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net()
Instead of holding netns refcnt in tc actions, we can minimize
the holding time by saving it in struct tcf_exts instead. This
means we can just hold netns refcnt right before call_rcu() and
release it after tcf_exts_destroy() is done.
However, because on netns cleanup path we call tcf_proto_destroy()
too, obviously we can not hold netns for a zero refcnt, in this
case we have to do cleanup synchronously. It is fine for RCU too,
the caller cleanup_net() already waits for a grace period.
For other cases, refcnt is non-zero and we can safely grab it as
normal and release it after we are done.
This patch provides two new API for each filter to use:
tcf_exts_get_net() and tcf_exts_put_net(). And all filters now can
use the following pattern:
if (tcf_exts_get_net()) // <== hold netns refcnt
call_rcu(some_rcu_callback);
else
__destroy_filter();
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 6 Nov 2017 21:47:18 +0000 (13:47 -0800)]
Revert "net_sched: hold netns refcnt for each action"
This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
If we hold that refcnt, the netns can never be destroyed until
all actions are destroyed by user, this breaks our netns design
which we expect all actions are destroyed when we destroy the
whole netns.
Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Kosina [Wed, 8 Nov 2017 20:18:18 +0000 (21:18 +0100)]
x86/mm: Unbreak modules that rely on external PAGE_KERNEL availability
Commit 7744ccdbc16f0 ("x86/mm: Add Secure Memory Encryption (SME)
support") as a side-effect made PAGE_KERNEL all of a sudden unavailable
to modules which can't make use of EXPORT_SYMBOL_GPL() symbols.
This is because once SME is enabled, sme_me_mask (which is introduced as
EXPORT_SYMBOL_GPL) makes its way to PAGE_KERNEL through _PAGE_ENC,
causing imminent build failure for all the modules which make use of all
the EXPORT-SYMBOL()-exported API (such as vmap(), __vmalloc(),
remap_pfn_range(), ...).
Exporting (as EXPORT_SYMBOL()) interfaces (and having done so for ages)
that take pgprot_t argument, while making it impossible to -- all of a
sudden -- pass PAGE_KERNEL to it, feels rather incosistent.
Restore the original behavior and make it possible to pass PAGE_KERNEL
to all its EXPORT_SYMBOL() consumers.
[ This is all so not wonderful. We shouldn't need that "sme_me_mask"
access at all in all those places that really don't care about that
level of detail, and just want _PAGE_KERNEL or whatever.
We have some similar issues with _PAGE_CACHE_WP and _PAGE_NOCACHE,
both of which hide a "cachemode2protval()" call, and which also ends
up using another EXPORT_SYMBOL(), but at least that only triggers for
the much more rare cases.
Maybe we could move these dynamic page table bits to be generated much
deeper down in the VM layer, instead of hiding them in the macros that
everybody uses.
So this all would merit some cleanup. But not today. - Linus ]
Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Despised-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Johansen [Wed, 8 Nov 2017 16:09:52 +0000 (08:09 -0800)]
apparmor: fix off-by-one comparison on MAXMAPPED_SIG
This came in yesterday, and I have verified our regression tests
were missing this and it can cause an oops. Please apply.
There is a an off-by-one comparision on sig against MAXMAPPED_SIG
that can lead to a read outside the sig_map array if sig
is MAXMAPPED_SIG. Fix this.
Verified that the check is an out of bounds case that can cause an oops.
Revised: add comparison fix to second case Fixes: cd1dbf76b23d ("apparmor: add the ability to mediate signals") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Biggers [Tue, 7 Nov 2017 22:29:02 +0000 (22:29 +0000)]
KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2]
syzkaller reported a NULL pointer dereference in asn1_ber_decoder(). It
can be reproduced by the following command, assuming
CONFIG_PKCS7_TEST_KEY=y:
keyctl add pkcs7_test desc '' @s
The bug is that if the data buffer is empty, an integer underflow occurs
in the following check:
if (unlikely(dp >= datalen - 1))
goto data_overrun_error;
This results in the NULL data pointer being dereferenced.
Fix it by checking for 'datalen - dp < 2' instead.
Also fix the similar check for 'dp >= datalen - n' later in the same
function. That one possibly could result in a buffer overread.
The NULL pointer dereference was reproducible using the "pkcs7_test" key
type but not the "asymmetric" key type because the "asymmetric" key type
checks for a 0-length payload before calling into the ASN.1 decoder but
the "pkcs7_test" key type does not.
Ricardo Neri [Mon, 6 Nov 2017 02:27:57 +0000 (18:27 -0800)]
selftests/x86: Add tests for the STR and SLDT instructions
The STR and SLDT instructions are not valid when running on virtual-8086
mode and generate an invalid operand exception. These two instructions are
protected by the Intel User-Mode Instruction Prevention (UMIP) security
feature. In protected mode, if UMIP is enabled, these instructions generate
a general protection fault if called from CPL > 0. Linux traps the general
protection fault and emulates the instructions sgdt, sidt and smsw; but not
str and sldt.
These tests are added to verify that the emulation code does not emulate
these two instructions but the expected invalid operand exception is
seen.
Tests fallback to exit with INT3 in case emulation does happen.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-13-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:56 +0000 (18:27 -0800)]
selftests/x86: Add tests for User-Mode Instruction Prevention
Certain user space programs that run on virtual-8086 mode may utilize
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel traps it and emulates the results of
these instructions with dummy values. The purpose of this new
test is to verify whether the impacted instructions can be executed
without causing such #GP. If no #GP exceptions occur, we expect to exit
virtual-8086 mode from INT3.
The instructions protected by UMIP are executed in representative use
cases:
a) displacement-only memory addressing
b) register-indirect memory addressing
c) results stored directly in operands
Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not
have the UMIP feature. Instead, results are printed for verification. A
simple verification is done to ensure that results of all tests are
identical.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-12-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:55 +0000 (18:27 -0800)]
x86/traps: Fix up general protection faults caused by UMIP
If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exception() will emulate dummy results for these
instructions as follows: in virtual-8086 and protected modes, sgdt, sidt
and smsw are emulated; str and sldt are not emulated. No emulation is done
for user-space long mode processes.
If emulation is successful, the emulated result is passed to the user space
program and no SIGSEGV signal is emitted.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-11-git-send-email-ricardo.neri-calderon@linux.intel.com
[ Added curly braces. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:54 +0000 (18:27 -0800)]
x86/umip: Enable User-Mode Instruction Prevention at runtime
User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.
It makes sense to enable UMIP at some point while booting, before user
spaces come up. Like SMAP and SMEP, is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a user space to be protected from. Given these similarities, UMIP can be
enabled along with SMAP and SMEP.
At the moment, UMIP is disabled by default at build time. It can be enabled
at build time by selecting CONFIG_X86_INTEL_UMIP. If enabled at build time,
it can be disabled at run time by adding clearcpuid=514 to the kernel
parameters.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-10-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:53 +0000 (18:27 -0800)]
x86/umip: Force a page fault when unable to copy emulated result to user
fixup_umip_exception() will be called from do_general_protection(). If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR with the offending address. A new function, inspired in
force_sig_info_fault(), is introduced to model the page fault.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-9-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:52 +0000 (18:27 -0800)]
x86/umip: Add emulation code for UMIP instructions
The feature User-Mode Instruction Prevention present in recent Intel
processor prevents a group of instructions (sgdt, sidt, sldt, smsw, and
str) from being executed with CPL > 0. Otherwise, a general protection
fault is issued.
Rather than relaying to the user space the general protection fault caused
by the UMIP-protected instructions (in the form of a SIGSEGV signal), it
can be trapped and the instruction emulated to provide a dummy result.
This allows to both conserve the current kernel behavior and not reveal the
system resources that UMIP intends to protect (i.e., the locations of the
global descriptor and interrupt descriptor tables, the segment selectors of
the local descriptor table, the value of the task state register and the
contents of the CR0 register).
This emulation is needed because certain applications (e.g., WineHQ and
DOSEMU2) rely on this subset of instructions to function. Given that sldt
and str are not commonly used in programs that run on WineHQ or DOSEMU2,
they are not emulated. Also, emulation is provided only for 32-bit
processes; 64-bit processes that attempt to use the instructions that UMIP
protects will receive the SIGSEGV signal issued as a consequence of the
general protection fault.
The instructions protected by UMIP can be split in two groups. Those which
return a kernel memory address (sgdt and sidt) and those which return a
value (smsw, sldt and str; the last two not emulated).
For the instructions that return a kernel memory address, applications such
as WineHQ rely on the result being located in the kernel memory space, not
the actual location of the table. The result is emulated as a hard-coded
value that lies close to the top of the kernel memory. The limit for the
GDT and the IDT are set to zero.
The instruction smsw is emulated to return the value that the register CR0
has at boot time as set in the head_32.
Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref() inspects the segment descriptor pointed by the
registers in pt_regs. This ensures that we correctly obtain the segment
base address and the address and operand sizes even if the user space
application uses a local descriptor table.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-8-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.
The subset of instructions comprises:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-7-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:50 +0000 (18:27 -0800)]
x86/insn-eval: Add support to resolve 16-bit address encodings
Tasks running in virtual-8086 mode, in protected mode with code segment
descriptors that specify 16-bit default address sizes via the D bit, or via
an address override prefix will use 16-bit addressing form encodings as
described in the Intel 64 and IA-32 Architecture Software Developer's
Manual Volume 2A Section 2.1.5, Table 2-1.
16-bit addressing encodings differ in several ways from the 32-bit/64-bit
addressing form encodings: ModRM.rm points to different registers and, in
some cases, effective addresses are indicated by the addition of the value
of two registers. Also, there is no support for SIB bytes. Thus, a
separate function is needed to parse this form of addressing.
Three functions are introduced. get_reg_offset_16() obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. get_eff_addr_modrm_16() computes the
effective address from the value of the register operands.
get_addr_ref_16() computes the linear address using the obtained effective
address and the base address of the segment.
Segment limits are enforced when running in protected mode.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-6-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:49 +0000 (18:27 -0800)]
x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
It is possible to utilize 32-bit address encodings in virtual-8086 mode via
an address override instruction prefix. However, the range of the
effective address is still limited to [0x-0xffff]. In such a case, return
error.
Also, linear addresses in virtual-8086 mode are limited to 20 bits. Enforce
such limit by truncating the most significant bytes of the computed linear
address.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-5-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:48 +0000 (18:27 -0800)]
x86/insn-eval: Add wrapper function for 32 and 64-bit addresses
The function insn_get_addr_ref() is capable of handling only 64-bit
addresses. A previous commit introduced a function to handle 32-bit
addresses. Invoke these two functions from a third wrapper function that
calls the appropriate routine based on the address size specified in the
instruction structure (obtained by looking at the code segment default
address size and the address override prefix, if present).
While doing this, rename the original function insn_get_addr_ref() with
the more appropriate name get_addr_ref_64(), ensure it is only used
for 64-bit addresses.
Also, since 64-bit addresses are not possible in 32-bit builds, provide
a dummy function such case.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-4-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:47 +0000 (18:27 -0800)]
x86/insn-eval: Add support to resolve 32-bit address encodings
32-bit and 64-bit address encodings are identical. Thus, the same logic
could be used to resolve the effective address. However, there are two key
differences: address size and enforcement of segment limits.
If running a 32-bit process on a 64-bit kernel, it is best to perform
the address calculation using 32-bit data types. In this manner hardware
is used for the arithmetic, including handling of signs and overflows.
32-bit addresses are generally used in protected mode; segment limits are
enforced in this mode. This implementation obtains the limit of the
segment associated with the instruction operands and prefixes. If the
computed address is outside the segment limits, an error is returned. It
is also possible to use 32-bit address in long mode and virtual-8086 mode
by using an address override prefix. In such cases, segment limits are not
enforced.
Support to use 32-bit arithmetic is added to the utility functions that
compute effective addresses. However, the end result is stored in a
variable of type long (which has a width of 8 bytes in 64-bit builds).
Hence, once a 32-bit effective address is computed, the 4 most significant
bytes are masked out to avoid sign extension.
The newly added function get_addr_ref_32() is almost identical to the
existing function insn_get_addr_ref() (used for 64-bit addresses). The only
difference is that it verifies that the effective address is within the
limits of the segment.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-3-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ricardo Neri [Mon, 6 Nov 2017 02:27:46 +0000 (18:27 -0800)]
x86/insn-eval: Compute linear address in several utility functions
Computing a linear address involves several steps. The first step is to
compute the effective address. This requires determining the addressing
mode in use and perform arithmetic operations on the operands. Plus, each
addressing mode has special cases that must be handled.
Once the effective address is known, the base address of the applicable
segment is added to obtain the linear address.
Clearly, this is too much work for a single function. Instead, handle each
addressing mode in a separate utility function. This improves readability
and gives us the opportunity to handler errors better.
At the moment, arithmetic to compute the effective address uses 64-byte
variables. Thus, limit support to 64-bit addresses.
While reworking the function insn_get_addr_ref(), the variable addr_offset
is renamed as regoff to reflect its actual use (i.e., offset, from the
base of pt_regs, of the register used as operand).
Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-2-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kristian Evensen [Tue, 7 Nov 2017 12:47:56 +0000 (13:47 +0100)]
qmi_wwan: Add missing skb_reset_mac_header-call
When we receive a packet on a QMI device in raw IP mode, we should call
skb_reset_mac_header() to ensure that skb->mac_header contains a valid
offset in the packet. While it shouldn't really matter, the packets have
no MAC header and the interface is configured as-such, it seems certain
parts of the network stack expects a "good" value in skb->mac_header.
Without the skb_reset_mac_header() call added in this patch, for example
shaping traffic (using tc) triggers the following oops on the first
received packet:
While the oops is for a 4.9-kernel, I was able to trigger the same oops with
net-next as of yesterday.
Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode") Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Jay Vosburgh [Tue, 7 Nov 2017 10:50:07 +0000 (19:50 +0900)]
bonding: fix slave stuck in BOND_LINK_FAIL state
The bonding miimon logic has a flaw, in that a failure of the
rtnl_trylock can cause a slave to become permanently stuck in
BOND_LINK_FAIL state.
The sequence of events to cause this is as follows:
1) bond_miimon_inspect finds that a slave's link is down, and so
calls bond_propose_link_state, setting slave->new_link_state to
BOND_LINK_FAIL, then sets slave->new_link to BOND_LINK_DOWN and returns
non-zero.
2) In bond_mii_monitor, the rtnl_trylock fails, and the timer is
rescheduled. No change is committed.
3) bond_miimon_inspect is called again, but this time the slave
from step 1 has recovered. slave->new_link is reset to NOCHANGE, and, as
slave->link was never changed, the switch enters the BOND_LINK_UP case,
and does nothing. The pending BOND_LINK_FAIL state from step 1 remains
pending, as new_link_state is not reset.
4) The state from step 3 persists until another slave changes link
state and causes bond_miimon_inspect to return non-zero. At this point,
the BOND_LINK_FAIL state change on the slave from steps 1-3 is committed,
and the slave will remain stuck in BOND_LINK_FAIL state even though it
is actually link up.
The remedy for this is to initialize new_link_state on each entry
to bond_miimon_inspect, as is already done with new_link.
Fixes: fb9eb899a6dc ("bonding: handle link transition from FAIL to UP correctly") Reported-by: Alex Sidorenko <alexandre.sidorenko@hpe.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Tue, 7 Nov 2017 04:50:35 +0000 (20:50 -0800)]
qrtr: Move to postcore_initcall
Registering qrtr with module_init makes the ability of typical platform
code to create AF_QIPCRTR socket during probe a matter of link order
luck. Moving qrtr to postcore_initcall() avoids this.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix by simply ignoring the bogus descriptor, as it is optional
for QMI devices anyway.
Fixes: 423ce8caab7e ("net: usb: qmi_wwan: New driver for Huawei QMI based WWAN devices") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Mon, 6 Nov 2017 14:37:22 +0000 (15:37 +0100)]
net: cdc_ether: fix divide by 0 on bad descriptors
Setting dev->hard_mtu to 0 will cause a divide error in
usbnet_probe. Protect against devices with bogus CDC Ethernet
functional descriptors by ignoring a zero wMaxSegmentSize.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Mon, 6 Nov 2017 01:01:57 +0000 (09:01 +0800)]
bonding: discard lowest hash bit for 802.3ad layer3+4
After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range
in connect()"), we will try to use even ports for connect(). Then if an
application (seen clearly with iperf) opens multiple streams to the same
destination IP and port, each stream will be given an even source port.
So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing
will always hash all these streams to the same interface. And the total
throughput will limited to a single slave.
Change the tcp code will impact the whole tcp behavior, only for bonding
usage. Paolo Abeni suggested fix this by changing the bonding code only,
which should be more reasonable, and less impact.
Fix this by discarding the lowest hash bit because it contains little entropy.
After the fix we can re-balance between slaves.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Sun, 5 Nov 2017 03:54:53 +0000 (22:54 -0500)]
net/mlx5e/core/en_fs: fix pointer dereference after free in mlx5e_execute_l2_action
hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced
by accessing hn->ai.addr
Fix this by copying the MAC address into a local variable for its safe use
in all possible execution paths within function mlx5e_execute_l2_action.
Addresses-Coverity-ID: 1417789 Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Zyngier [Sat, 4 Nov 2017 12:33:47 +0000 (12:33 +0000)]
net: mvpp2: Prevent userspace from changing TX affinities
The mvpp2 driver can't cope at all with the TX affinities being
changed from userspace, and spit an endless stream of
[ 91.779920] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing
[ 91.779930] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing
[ 91.780402] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing
[ 91.780406] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing
[ 91.780415] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing
[ 91.780418] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing
rendering the box completely useless (I've measured around 600k
interrupts/s on a 8040 box) once irqbalance kicks in and start
doing its job.
Obviously, the driver was never designed with this in mind. So let's
work around the problem by preventing userspace from interacting
with these interrupts altogether.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gabriele Paoloni [Wed, 8 Nov 2017 00:37:54 +0000 (18:37 -0600)]
MAINTAINERS: Remove Gabriele Paoloni as HiSilicon PCI maintainer
Gabriele is now moving to a different role, so remove him as HiSilicon PCI
maintainer.
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
[bhelgaas: Thanks for all your help, Gabriele, and best wishes!] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Zhou Wang <wangzhou1@hisilicon.com>
Borislav Petkov [Tue, 7 Nov 2017 16:37:24 +0000 (17:37 +0100)]
drivers/ide-cd: Handle missing driver data during status check gracefully
The 0day bot reports the below failure which happens occasionally, with
their randconfig testing (once every ~100 boots). The Code points at
the private pointer ->driver_data being NULL, which hints at a race of
sorts where the private driver_data descriptor has disappeared by the
time we get to run the workqueue.
So let's check that pointer before we continue with issuing the command
to the drive.
This fix is of the brown paper bag nature but considering that IDE is
long deprecated, let's do that so that random testing which happens to
enable CONFIG_IDE during randconfig builds, doesn't fail because of
this.
Besides, failing the TEST_UNIT_READY command because the drive private
data is gone is something which we could simply do anyway, to denote
that there was a problem communicating with the device.
This commit added a call to sysfs_notify() from within
scsi_device_set_state(), which in turn turns out to make libata very
unhappy, because ata_eh_detach_dev() does
Takashi Iwai [Tue, 7 Nov 2017 15:05:24 +0000 (16:05 +0100)]
ALSA: seq: Fix OSS sysex delivery in OSS emulation
The SYSEX event delivery in OSS sequencer emulation assumed that the
event is encoded in the variable-length data with the straight
buffering. This was the normal behavior in the past, but during the
development, the chained buffers were introduced for carrying more
data, while the OSS code was left intact. As a result, when a SYSEX
event with the chained buffer data is passed to OSS sequencer port,
it may end up with the wrong memory access, as if it were having a too
large buffer.
This patch addresses the bug, by applying the buffer data expansion by
the generic snd_seq_dump_var_event() helper function.
Brijesh Singh [Fri, 20 Oct 2017 14:30:59 +0000 (09:30 -0500)]
X86/KVM: Clear encryption attribute when SEV is active
The guest physical memory area holding the struct pvclock_wall_clock and
struct pvclock_vcpu_time_info are shared with the hypervisor. It
periodically updates the contents of the memory.
When SEV is active, the encryption attributes from the shared memory pages
must be cleared so that both hypervisor and guest can access the data.
Brijesh Singh [Fri, 20 Oct 2017 14:30:58 +0000 (09:30 -0500)]
X86/KVM: Decrypt shared per-cpu variables when SEV is active
When SEV is active, guest memory is encrypted with a guest-specific key, a
guest memory region shared with the hypervisor must be mapped as decrypted
before it can be shared.
Brijesh Singh [Fri, 20 Oct 2017 14:30:57 +0000 (09:30 -0500)]
percpu: Introduce DEFINE_PER_CPU_DECRYPTED
KVM guest defines three per-CPU variables (steal-time, apf_reason, and
kvm_pic_eoi) which are shared between a guest and a hypervisor.
When SEV is active, memory is encrypted with a guest-specific key, and if
the guest OS wants to share the memory region with the hypervisor then it
must clear the C-bit (i.e set decrypted) before sharing it.
DEFINE_PER_CPU_DECRYPTED can be used to define the per-CPU variables
which will be shared between a guest and a hypervisor.
Brijesh Singh [Fri, 20 Oct 2017 14:30:56 +0000 (09:30 -0500)]
x86: Add support for changing memory encryption attribute in early boot
Some KVM-specific custom MSRs share the guest physical address with the
hypervisor in early boot. When SEV is active, the shared physical address
must be mapped with memory encryption attribute cleared so that both
hypervisor and guest can access the data.
Add APIs to change the memory encryption attribute in early boot code.
Tom Lendacky [Fri, 20 Oct 2017 14:30:54 +0000 (09:30 -0500)]
x86/boot: Add early boot support when running with SEV active
Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.
Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
(CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
(0xc0010131, bit 0).
This check is required so that during early compressed kernel booting the
pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
updated to include the encryption mask so that when the kernel is
decompressed into encrypted memory, it can boot properly.
After the kernel is decompressed and continues booting the same logic is
used to check if SEV is active and set a flag indicating so. This allows
to distinguish between SME and SEV, each of which have unique differences
in how certain things are handled: e.g. DMA (always bounce buffered with
SEV) or EFI tables (always access decrypted with SME).
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: kvm@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171020143059.3291-13-brijesh.singh@amd.com
Tom Lendacky [Fri, 20 Oct 2017 14:30:53 +0000 (09:30 -0500)]
x86/mm: Add DMA support for SEV memory encryption
DMA access to encrypted memory cannot be performed when SEV is active.
In order for DMA to properly work when SEV is active, the SWIOTLB bounce
buffers must be used.
Tom Lendacky [Fri, 20 Oct 2017 14:30:52 +0000 (09:30 -0500)]
x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages
In order for memory pages to be properly mapped when SEV is active, it's
necessary to use the PAGE_KERNEL protection attribute as the base
protection. This ensures that memory mapping of, e.g. ACPI tables,
receives the proper mapping attributes.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: kvm@vger.kernel.org Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171020143059.3291-11-brijesh.singh@amd.com
Tom Lendacky [Fri, 20 Oct 2017 14:30:51 +0000 (09:30 -0500)]
resource: Provide resource struct in resource walk callback
In preperation for a new function that will need additional resource
information during the resource walk, update the resource walk callback to
pass the resource structure. Since the current callback start and end
arguments are pulled from the resource structure, the callback functions
can obtain them from the resource structure directly.
Tom Lendacky [Fri, 20 Oct 2017 14:30:48 +0000 (09:30 -0500)]
x86/mm: Include SEV for encryption memory attribute changes
The current code checks only for sme_active() when determining whether
to perform the encryption attribute change. Include sev_active() in this
check so that memory attribute changes can occur under SME and SEV.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Laura Abbott <labbott@redhat.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: kvm@vger.kernel.org Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171020143059.3291-7-brijesh.singh@amd.com
Tom Lendacky [Fri, 20 Oct 2017 14:30:47 +0000 (09:30 -0500)]
x86/mm: Use encrypted access of boot related data with SEV
When Secure Encrypted Virtualization (SEV) is active, boot data (such as
EFI related data, setup data) is encrypted and needs to be accessed as
such when mapped. Update the architecture override in early_memremap to
keep the encryption attribute when mapping this data.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Laura Abbott <labbott@redhat.com> Cc: kvm@vger.kernel.org Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171020143059.3291-6-brijesh.singh@amd.com
Tom Lendacky [Fri, 20 Oct 2017 14:30:44 +0000 (09:30 -0500)]
x86/mm: Add Secure Encrypted Virtualization (SEV) support
Provide support for Secure Encrypted Virtualization (SEV). This initial
support defines a flag that is used by the kernel to determine if it is
running with SEV active.
Zhou Chengming [Thu, 2 Nov 2017 01:18:21 +0000 (09:18 +0800)]
kprobes, x86/alternatives: Use text_mutex to protect smp_alt_modules
We use alternatives_text_reserved() to check if the address is in
the fixed pieces of alternative reserved, but the problem is that
we don't hold the smp_alt mutex when call this function. So the list
traversal may encounter a deleted list_head if another path is doing
alternatives_smp_module_del().
One solution is that we can hold smp_alt mutex before call this
function, but the difficult point is that the callers of this
functions, arch_prepare_kprobe() and arch_prepare_optimized_kprobe(),
are called inside the text_mutex. So we must hold smp_alt mutex
before we go into these arch dependent code. But we can't now,
the smp_alt mutex is the arch dependent part, only x86 has it.
Maybe we can export another arch dependent callback to solve this.
But there is a simpler way to handle this problem. We can reuse the
text_mutex to protect smp_alt_modules instead of using another mutex.
And all the arch dependent checks of kprobes are inside the text_mutex,
so it's safe now.
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Fixes: 2cfa197 "ftrace/alternatives: Introducing *_text_reserved functions" Link: http://lkml.kernel.org/r/1509585501-79466-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tom Lendacky [Wed, 1 Nov 2017 16:54:26 +0000 (11:54 -0500)]
x86/mm: Remove unnecessary TLB flush for SME in-place encryption
A TLB flush is not required when doing in-place encryption or decryption
since the area's pagetable attributes are not being altered. To avoid
confusion between what the routine is doing and what is documented in
the AMD APM, delete the local_flush_tlb() call.
Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171101165426.1388.24866.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Changbin Du [Mon, 6 Nov 2017 03:32:57 +0000 (11:32 +0800)]
x86/build: Add new paths for isolinux.bin and ldlinux.c32
Recently I failed to build isoimage target, because the path of isolinux.bin
changed to /usr/xxx/ISOLINUX/isolinux.bin, as well as ldlinux.c32 which
changed to /usr/xxx/syslinux/modules/bios/ldlinux.c32.
This patch improves the file search logic:
- Show a error message instead of silent fail.
- Add above new paths.
Signed-off-by: Changbin Du <changbin.du@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/1509939179-7556-3-git-send-email-changbin.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Changbin Du [Mon, 6 Nov 2017 03:32:56 +0000 (11:32 +0800)]
x86/build: Factor out fdimage/isoimage generation commands to standalone script
The build messages for fdimage/isoimage generation are pretty unstructured,
just the raw shell command blocks are printed.
Emit shortened messages similar to existing kbuild messages, and move
the Makefile commands into a separate shell script - which is much
easier to handle.
This patch factors out the commands used for fdimage/isoimage generation
from arch/x86/boot/Makefile to a new script arch/x86/boot/genimage.sh.
Then it adds the new kbuild command 'genimage' which invokes the new script.
All fdimages/isoimage files are now generated by a call to 'genimage' with
different parameters.
Now 'make isoimage' becomes:
...
Kernel: arch/x86/boot/bzImage is ready (#30)
GENIMAGE arch/x86/boot/image.iso
Size of boot image is 4 sectors -> No emulation
15.37% done, estimate finish Sun Nov 5 23:36:57 2017
30.68% done, estimate finish Sun Nov 5 23:36:57 2017
46.04% done, estimate finish Sun Nov 5 23:36:57 2017
61.35% done, estimate finish Sun Nov 5 23:36:57 2017
76.69% done, estimate finish Sun Nov 5 23:36:57 2017
92.00% done, estimate finish Sun Nov 5 23:36:57 2017
Total translation table size: 2048
Total rockridge attributes bytes: 659
Total directory bytes: 0
Path table size(bytes): 10
Max brk space used 0
32608 extents written (63 MB)
Kernel: arch/x86/boot/image.iso is ready
Before:
Kernel: arch/x86/boot/bzImage is ready (#63)
rm -rf arch/x86/boot/isoimage
mkdir arch/x86/boot/isoimage
for i in lib lib64 share end ; do \
if [ -f /usr/$i/syslinux/isolinux.bin ] ; then \
cp /usr/$i/syslinux/isolinux.bin arch/x86/boot/isoimage ; \
if [ -f /usr/$i/syslinux/ldlinux.c32 ]; then \
cp /usr/$i/syslinux/ldlinux.c32 arch/x86/boot/isoimage ; \
fi ; \
break ; \
fi ; \
if [ $i = end ] ; then exit 1 ; fi ; \
done
...
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1509939179-7556-2-git-send-email-changbin.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kirill A. Shutemov [Tue, 7 Nov 2017 08:33:37 +0000 (11:33 +0300)]
mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
Since commit:
83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
we allocate the mem_section array dynamically in sparse_memory_present_with_active_regions(),
but some architectures, like arm64, don't call the routine to initialize sparsemem.
Let's move the initialization into memory_present() it should cover all
architectures.
Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Link: http://lkml.kernel.org/r/20171107083337.89952-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Sat, 4 Nov 2017 11:19:52 +0000 (04:19 -0700)]
selftests/x86/ldt_get: Add a few additional tests for limits
We weren't testing the .limit and .limit_in_pages fields very well.
Add more tests.
This addition seems to trigger the "bits 16:19 are undefined" issue
that was fixed in an earlier patch. I think that, at least on my
CPU, the high nibble of the limit ends in LAR bits 16:19.
Andy Lutomirski [Sat, 4 Nov 2017 11:19:49 +0000 (04:19 -0700)]
selftests/x86/ldt_gdt: Robustify against set_thread_area() and LAR oddities
Bits 19:16 of LAR's result are undefined, and some upcoming
improvements to the test case seem to trigger this. Mask off those
bits to avoid spurious failures.
commit 5b781c7e317f ("x86/tls: Forcibly set the accessed bit in TLS
segments") adds a valid case in which LAR's output doesn't quite
agree with set_thread_area()'s input. This isn't triggered in the
test as is, but it will be if we start calling set_thread_area()
with the accessed bit clear. Work around this discrepency.
I've added a Fixes tag so that -stable can pick this up if neccesary.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 5b781c7e317f ("x86/tls: Forcibly set the accessed bit in TLS segments") Link: http://lkml.kernel.org/r/b82f3f89c034b53580970ac865139fd8863f44e2.1509794321.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Tue, 31 Oct 2017 12:17:23 +0000 (13:17 +0100)]
x86/cpufeatures: Fix various details in the feature definitions
Kept this commit separate from the re-tabulation changes, to make
the changes easier to review:
- add better explanation for entries with no explanation
- fix/enhance the text of some of the entries
- fix the vertical alignment of some of the feature number definitions
- fix inconsistent capitalization
- ... and lots of other small details
i.e. make it all more of a coherent unit, instead of a patchwork of years of additions.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171031121723.28524-4-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Tue, 31 Oct 2017 12:17:22 +0000 (13:17 +0100)]
x86/cpufeatures: Re-tabulate the X86_FEATURE definitions
Over the years asm/cpufeatures.h has become somewhat of a mess: the original
tabulation style was too narrow, while x86 feature names also kept growing
in length, creating frequent field width overflows.
Re-tabulate it to make it wider and easier to read/modify. Also harmonize
the tabulation of the other defines in this file to match it.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171031121723.28524-3-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rafael J. Wysocki [Mon, 6 Nov 2017 22:56:57 +0000 (23:56 +0100)]
ACPI / PM: Blacklist Low Power S0 Idle _DSM for Dell XPS13 9360
At least one Dell XPS13 9360 is reported to have serious issues with
the Low Power S0 Idle _DSM interface and since this machine model
generally can do ACPI S3 just fine, add a blacklist entry to disable
that interface for Dell XPS13 9360.
Fixes: 8110dd281e15 (ACPI / sleep: EC-based wakeup from suspend-to-idle on recent systems) Link: https://bugzilla.kernel.org/show_bug.cgi?id=196907 Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
Linus Torvalds [Mon, 6 Nov 2017 20:26:49 +0000 (12:26 -0800)]
Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"Another fix for a really old bug.
It only affects drain_workqueue() which isn't used often and even then
triggers only during a pretty small race window, so it isn't too
surprising that it stayed hidden for so long.
The fix is straight-forward and low-risk. Kudos to Li Bin for
reporting and fixing the bug"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix NULL pointer dereference
Tobin C. Harding [Mon, 6 Nov 2017 05:19:27 +0000 (16:19 +1100)]
scripts: add leaking_addresses.pl
Currently we are leaking addresses from the kernel to user space. This
script is an attempt to find some of those leakages. Script parses
`dmesg` output and /proc and /sys files for hex strings that look like
kernel addresses.
Only works for 64 bit kernels, the reason being that kernel addresses on
64 bit kernels have 'ffff' as the leading bit pattern making greping
possible. On 32 kernels we don't have this luxury.
Scripts is _slightly_ smarter than a straight grep, we check for false
positives (all 0's or all 1's, and vsyscall start/finish addresses).
[ I think there is a lot of room for improvement here, but it's already
useful, so I'm merging it as-is. The whole "hash %p format" series is
expected to go into 4.15, but will not fix %x users, and will not
incentivize people to look at what they are leaking. - Linus ]
Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takashi Iwai [Mon, 6 Nov 2017 19:16:50 +0000 (20:16 +0100)]
ALSA: seq: Avoid invalid lockdep class warning
The recent fix for adding rwsem nesting annotation was using the given
"hop" argument as the lock subclass key. Although the idea itself
works, it may trigger a kernel warning like:
BUG: looking up invalid subclass: 8
....
since the lockdep has a smaller number of subclasses (8) than we
currently allow for the hops there (10).
The current definition is merely a sanity check for avoiding the too
deep delivery paths, and the 8 hops are already enough. So, as a
quick fix, just follow the max hops as same as the max lockdep
subclasses.
Linus Torvalds [Mon, 6 Nov 2017 17:05:03 +0000 (09:05 -0800)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes an unaligned panic in x86/sha-mb and a bug in ccm that
triggers with certain underlying implementations"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccm - preserve the IV buffer
crypto: x86/sha1-mb - fix panic due to unaligned access
crypto: x86/sha256-mb - fix panic due to unaligned access