www.infradead.org Git - users/jedix/linux-maple.git/log

scsi: libfc: Do not login if the port is already started

When the port is already started we don't need to login; that
will only confuse the state machine.

Orabug: 25933179

Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: libfc: Do not drop down to FLOGI for fc_rport_login()

When fc_rport_login() is called while the rport is not
in RPORT_ST_INIT, RPORT_ST_READY, or RPORT_ST_DELETE
login is already in progress and there's no need to
drop down to FLOGI; doing so will only confuse the
other side.

Orabug: 25933179

Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: libfc: Do not take rdata->rp_mutex when processing a -FC_EX_CLOSED ELS response.

When an ELS response handler receives a -FC_EX_CLOSED, the rdata->rp_mutex is
already held which can lead to a deadlock condition like the following stack trace:

[<ffffffffa04d8f18>] fc_rport_plogi_resp+0x28/0x200 [libfc]
[<ffffffffa04cfa1a>] fc_invoke_resp+0x6a/0xe0 [libfc]
[<ffffffffa04d0c08>] fc_exch_mgr_reset+0x1b8/0x280 [libfc]
[<ffffffffa04d87b3>] fc_rport_logoff+0x43/0xd0 [libfc]
[<ffffffffa04ce73d>] fc_disc_stop+0x6d/0xf0 [libfc]
[<ffffffffa04ce7ce>] fc_disc_stop_final+0xe/0x20 [libfc]
[<ffffffffa04d55f7>] fc_fabric_logoff+0x17/0x70 [libfc]

The other ELS handlers need to follow the FLOGI response handler and simply do
a kref_put against the fc_rport_priv struct and exit when receving a
-FC_EX_CLOSED response.

Orabug: 25933179

Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: libfc: Fixup disc_mutex handling

The list of attached 'rdata' remote port structures is RCU
protected, so there is no need to take the 'disc_mutex' when
traversing it.
Rather we should be using rcu_read_lock() and kref_get_unless_zero()
to validate the entries.
We need, however, take the disc_mutex when deleting an entry;
otherwise we risk clashes with list_add.

Orabug: 25933179

Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xve: arm ud tx cq to generate completion interrupts

IPoIB polls for UD send cq for every 16th post_send() request to reduce
interrupt count; and it does not arm UD send cq (16 is controlled by
MAX_SEND_CQE variable)
XVE has followed IPoIB methodology in terms of handling UD send cq;
however, it missed to poll send cq after certain number of iterations.
This makes freeing of resources related to work request unreliable since
completion arrival is not controlled. This caused problem for live
migration; since initial UDP and ICMP skbs which are using UD work
requests are not getting freed. And, xenwatch process is getting stuck
on waiting for these skbs to be freed.

This patch does following:
- arm send cq at initialization. This will generate interrupt for
initial ud send requests.
- Once polling of send cq is completed, arm send cq again to generate
interrupt whenever next cqe arrives.

I'm going back to interrupt mechanism, since UD workload for xve is
extremely limited. And, I don't expect to generate interrupt flood here.
And, I don't want to miss out on freeing of skb (for example, if
scenario ends up as, only 10 post_send() are attempted for UD QP; and
after that, we try to live migrate that VM, we may miss completion if
our logic is, poll CQ at every 16th post_send() iteration)

Orabug: 28267050

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net: sched: run ingress qdisc without locks

TC classifiers/actions were converted to RCU by John in the series:
http://thread.gmane.org/gmane.linux.network/329739/focus=329739
and many follow on patches.
This is the last patch from that series that finally drops
ingress spin_lock.

Single cpu ingress+u32 performance goes from 22.9 Mpps to 24.5 Mpps.

In two cpu case when both cores are receiving traffic on the same
device and go into the same ingress+u32 the performance jumps
from 4.5 + 4.5 Mpps to 23.5 + 23.5 Mpps

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 087c1a601ad7f851a2d31f5fa0e5e9dfc766df55)
Orabug: 29395374
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Laurence Rochfort <laurence.rochfort@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Fix typo in firmware message timeout logic.

Orabug: 29412112

The logic that polls for the firmware message response uses a shorter
sleep interval for the first few passes.  But there was a typo so it
was using the wrong counter (larger counter) for these short sleep
passes.  The result is a slightly shorter timeout period for these
firmware messages than intended.  Fix it by using the proper counter.

Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Allen Pais <allen.pais@oracle.com>

bnxt_en: Wait longer for the firmware message response to complete.

Orabug: 29412112

The code waits up to 20 usec for the firmware response to complete
once we've seen the valid response header in the buffer. It turns
out that in some scenarios, this wait time is not long enough.
Extend it to 150 usec and use usleep_range() instead of udelay().

Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Allen Pais <allen.pais@oracle.com>

mm,vmscan: Make unregister_shrinker() no-op if register_shrinker() failed.

commit bb422a738f6566f7439cd347d54e321e4fe92a9f upstream.

Syzbot caught an oops at unregister_shrinker() because combination of
commit 1d3d4437eae1bb29 ("vmscan: per-node deferred work") and fault
injection made register_shrinker() fail and the caller of
register_shrinker() did not check for failure.

----------
[  554.881422] FAULT_INJECTION: forcing a failure.
[  554.881422] name failslab, interval 1, probability 0, space 0, times 0
[  554.881438] CPU: 1 PID: 13231 Comm: syz-executor1 Not tainted 4.14.0-rc8+ #82
[  554.881443] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  554.881445] Call Trace:
[  554.881459]  dump_stack+0x194/0x257
[  554.881474]  ? arch_local_irq_restore+0x53/0x53
[  554.881486]  ? find_held_lock+0x35/0x1d0
[  554.881507]  should_fail+0x8c0/0xa40
[  554.881522]  ? fault_create_debugfs_attr+0x1f0/0x1f0
[  554.881537]  ? check_noncircular+0x20/0x20
[  554.881546]  ? find_next_zero_bit+0x2c/0x40
[  554.881560]  ? ida_get_new_above+0x421/0x9d0
[  554.881577]  ? find_held_lock+0x35/0x1d0
[  554.881594]  ? __lock_is_held+0xb6/0x140
[  554.881628]  ? check_same_owner+0x320/0x320
[  554.881634]  ? lock_downgrade+0x990/0x990
[  554.881649]  ? find_held_lock+0x35/0x1d0
[  554.881672]  should_failslab+0xec/0x120
[  554.881684]  __kmalloc+0x63/0x760
[  554.881692]  ? lock_downgrade+0x990/0x990
[  554.881712]  ? register_shrinker+0x10e/0x2d0
[  554.881721]  ? trace_event_raw_event_module_request+0x320/0x320
[  554.881737]  register_shrinker+0x10e/0x2d0
[  554.881747]  ? prepare_kswapd_sleep+0x1f0/0x1f0
[  554.881755]  ? _down_write_nest_lock+0x120/0x120
[  554.881765]  ? memcpy+0x45/0x50
[  554.881785]  sget_userns+0xbcd/0xe20
(...snipped...)
[  554.898693] kasan: CONFIG_KASAN_INLINE enabled
[  554.898724] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  554.898732] general protection fault: 0000 [#1] SMP KASAN
[  554.898737] Dumping ftrace buffer:
[  554.898741]    (ftrace buffer empty)
[  554.898743] Modules linked in:
[  554.898752] CPU: 1 PID: 13231 Comm: syz-executor1 Not tainted 4.14.0-rc8+ #82
[  554.898755] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  554.898760] task: ffff8801d1dbe5c0 task.stack: ffff8801c9e38000
[  554.898772] RIP: 0010:__list_del_entry_valid+0x7e/0x150
[  554.898775] RSP: 0018:ffff8801c9e3f108 EFLAGS: 00010246
[  554.898780] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  554.898784] RDX: 0000000000000000 RSI: ffff8801c53c6f98 RDI: ffff8801c53c6fa0
[  554.898788] RBP: ffff8801c9e3f120 R08: 1ffff100393c7d55 R09: 0000000000000004
[  554.898791] R10: ffff8801c9e3ef70 R11: 0000000000000000 R12: 0000000000000000
[  554.898795] R13: dffffc0000000000 R14: 1ffff100393c7e45 R15: ffff8801c53c6f98
[  554.898800] FS:  0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
[  554.898804] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[  554.898807] CR2: 00000000dbc23000 CR3: 00000001c7269000 CR4: 00000000001406e0
[  554.898813] DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
[  554.898816] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  554.898818] Call Trace:
[  554.898828]  unregister_shrinker+0x79/0x300
[  554.898837]  ? perf_trace_mm_vmscan_writepage+0x750/0x750
[  554.898844]  ? down_write+0x87/0x120
[  554.898851]  ? deactivate_super+0x139/0x1b0
[  554.898857]  ? down_read+0x150/0x150
[  554.898864]  ? check_same_owner+0x320/0x320
[  554.898875]  deactivate_locked_super+0x64/0xd0
[  554.898883]  deactivate_super+0x141/0x1b0
----------

Since allowing register_shrinker() callers to call unregister_shrinker()
when register_shrinker() failed can simplify error recovery path, this
patch makes unregister_shrinker() no-op when register_shrinker() failed.
Also, reset shrinker->nr_deferred in case unregister_shrinker() was
by error called twice.

Orabug: 29456281

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Glauber Costa <glauber@scylladb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7880fc541566166d140954825fc83c826534e622)

Signed-off-by: John Sobecki <john.sobecki@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

X.509: Handle midnight alternative notation in GeneralizedTime

The ASN.1 GeneralizedTime object carries an ISO 8601 format date and time.
The time is permitted to show midnight as 00:00 or 24:00 (the latter being
equivalent of 00:00 of the following day).

The permitted value is checked in x509_decode_time() but the actual
handling is left to mktime64().

Without this patch, certain X.509 certificates will be rejected and could
lead to an unbootable kernel.

Note that with this patch we also permit any 24:mm:ss time and extend this
to UTCTime, which whilst not strictly correct don't permit much leeway in
fiddling date strings.

Reported-by: Rudolf Polzer <rpolzer@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: David Woodhouse <David.Woodhouse@intel.com>
cc: John Stultz <john.stultz@linaro.org>

Orabug: 29460344
CVE: CVE-2015-5327
(cherry picked from commit 7650cb80e4e90b0fae7854b6008a46d24360515f)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

X.509: Support leap seconds

The format of ASN.1 GeneralizedTime seems to be specified by ISO 8601
[X.680 46.3] and this apparently supports leap seconds (ie. the seconds
field is 60). It's not entirely clear that ASN.1 expects it, but we can
relax the seconds check slightly for GeneralizedTime.

This results in us passing a time with sec as 60 to mktime64(), which
handles it as being a duplicate of the 0th second of the next minute.

We can't really do otherwise without giving the kernel much greater
knowledge of where all the leap seconds are. Unfortunately, this would
require change the mapping of the kernel's current-time-in-seconds.

UTCTime, however, only supports a seconds value in the range 00-59, but for
the sake of simplicity allow this with UTCTime also.

Without this patch, certain X.509 certificates will be rejected,
potentially making a kernel unbootable.

Reported-by: Rudolf Polzer <rpolzer@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: David Woodhouse <David.Woodhouse@intel.com>
cc: John Stultz <john.stultz@linaro.org>

Orabug: 29460344
CVE: CVE-2015-5327
(cherry picked from commit da02559c9f864c8d62f524c1e0b64173711a16ab)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

X.509: Fix the time validation [ver #2]

This fixes CVE-2015-5327. It affects kernels from 4.3-rc1 onwards.

Fix the X.509 time validation to use month number-1 when looking up the
number of days in that month. Also put the month number validation before
doing the lookup so as not to risk overrunning the array.

This can be tested by doing the following:

cat <<EOF | openssl x509 -outform DER | keyctl padd asymmetric "" @s
-----BEGIN CERTIFICATE-----
MIIDbjCCAlagAwIBAgIJAN/lUld+VR4hMA0GCSqGSIb3DQEBCwUAMCkxETAPBgNV
BAoMCGxvY2FsLWNhMRQwEgYDVQQDDAtzaWduaW5nIGtleTAeFw0xNTA5MDEyMTMw
MThaFw0xNjA4MzEyMTMwMThaMCkxETAPBgNVBAoMCGxvY2FsLWNhMRQwEgYDVQQD
DAtzaWduaW5nIGtleTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANrn
crcMfMeG67nagX4+m02Xk9rkmsMKI5XTUxbikROe7GSUVJ27sPVPZp4mgzoWlvhh
jfK8CC/qhEhwep8Pgg4EJZyWOjhZb7R97ckGvLIoUC6IO3FC2ZnR7WtmWDgo2Jcj
VlXwJdHhKU1VZwulh81O61N8IBKqz2r/kDhIWiicUCUkI/Do/RMRfKAoDBcSh86m
gOeIAGfq62vbiZhVsX5dOE8Oo2TK5weAvwUIOR7OuGBl5AqwFlPnXQolewiHzKry
THg9e44HfzG4Mi6wUvcJxVaQT1h5SrKD779Z5+8+wf1JLaooetcEUArvWyuxCU59
qxA4lsTjBwl4cmEki+cCAwEAAaOBmDCBlTAMBgNVHRMEBTADAQH/MAsGA1UdDwQE
AwIHgDAdBgNVHQ4EFgQUyND/eKUis7ep/hXMJ8iZMdUhI+IwWQYDVR0jBFIwUIAU
yND/eKUis7ep/hXMJ8iZMdUhI+KhLaQrMCkxETAPBgNVBAoMCGxvY2FsLWNhMRQw
EgYDVQQDDAtzaWduaW5nIGtleYIJAN/lUld+VR4hMA0GCSqGSIb3DQEBCwUAA4IB
AQAMqm1N1yD5pimUELLhT5eO2lRdGUfTozljRxc7e2QT3RLk2TtGhg65JFFN6eml
XS58AEPVcAsSLDlR6WpOpOLB2giM0+fV/eYFHHmh22yqTJl4YgkdUwyzPdCHNOZL
hmSKeY9xliHb6PNrNWWtZwhYYvRaO2DX4GXOMR0Oa2O4vaYu6/qGlZOZv3U6qZLY
wwHEJSrqeBDyMuwN+eANHpoSpiBzD77S4e+7hUDJnql4j6xzJ65+nWJ89fCrQypR
4sN5R3aGeIh3QAQUIKpHilwek0CtEaYERgc5m+jGyKSc1rezJW62hWRTaitOc+d5
G5hh+9YpnYcxQHEKnZ7rFNKJ
-----END CERTIFICATE-----
EOF

If it works, it emit a key ID; if it fails, it should give a bad message
error.

Reported-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Orabug: 29460344
CVE: CVE-2015-5327
(cherry picked from commit cc25b994acfbc901429da682d0f73c190e960206)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Conflict:

crypto/asymmetric_keys/x509_cert_parser.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: enable new Kconfig items in kernel configs

Orabug: 29475071

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

benet: remove broken and unused macro

Orabug: 29475071

is_broadcast_packet() expands to compare_ether_addr() which doesn't
exist since commit 7367d0b573d1 ("drivers/net: Convert uses of
compare_ether_addr to ether_addr_equal"). It turns out it's actually not
used.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: don't flip hw_features when VXLANs are added/deleted

Orabug: 29475071

the be2net implementation of .ndo_tunnel_{add,del}() changes the value of
NETIF_F_GSO_UDP_TUNNEL bit in 'features' and 'hw_features', but it forgets
to call netdev_features_change(). Moreover, ethtool setting for that bit
can potentially be reverted after a tunnel is added or removed.

GSO already does software segmentation when 'hw_enc_features' is 0, even
if VXLAN offload is turned on. In addition, commit 096de2f83ebc ("benet:
stricter vxlan offloading check in be_features_check") avoids hardware
segmentation of non-VXLAN tunneled packets, or VXLAN packets having wrong
destination port. So, it's safe to avoid flipping the above feature on
addition/deletion of VXLAN tunnels.

Fixes: 630f4b70567f ("be2net: Export tunnel offloads only when a VxLAN tunnel is created")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: Fix memory leak in be_cmd_get_profile_config()

Orabug: 29475071

DMA allocated memory is lost in be_cmd_get_profile_config() when we
call it with non-NULL port_res parameter.

Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: Use Kconfig flag to support for enabling/disabling adapters

Orabug: 29475071

Add flags to enable/disable supported chips in be2net.

With disable support are removed coresponding PCI IDs and
also codepaths with [BE2|BE3|BEx|lancer|skyhawk]_chip checks.

Disable chip will reduce module size by:
BE2 ~2kb
BE3 ~3kb
Lancer ~10kb
Skyhawk ~9kb

When enable skyhawk only it will reduce module size by ~20kb

New help style in Kconfig

Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: Mark expected switch fall-through

Orabug: 29475071

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 114787 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: fix spelling mistake "seqence" -> "sequence"

Orabug: 29475071

Trivial fix to spelling mistake in dev_info message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: Update the driver version to 12.0.0.0

Orabug: 29475071

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: gather debug info and reset adapter (only for Lancer) on a tx-timeout

Orabug: 29475071

This patch handles a TX-timeout as follows:

1) This patch gathers and prints the following info that can
   help in diagnosing the cause of a TX-timeout.
   a) TX queue and completion queue entries.
   b) SKB and TCP/UDP header details.

2) For Lancer NICs (TX-timeout recovery is not supported for
   BE3/Skyhawk-R NICs), it recovers from the TX timeout as follows:

   a) On a TX-timeout, driver sets the PHYSDEV_CONTROL_FW_RESET_MASK
      bit in the PHYSDEV_CONTROL register. Lancer firmware goes into
      an error state and indicates this back to the driver via a bit
      in a doorbell register.
   b) Driver detects this and calls be_err_recover(). DMA is disabled,
      all pending TX skbs are unmapped and freed (be_close()). All rings
      are destroyed (be_clear()).
   c) The driver waits for the FW to re-initialize and re-creates all
      rings along with other data structs (be_resume())

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: move rss_flags field in rss_info to ensure proper alignment

Orabug: 29475071

The current position of .rss_flags field in struct rss_info causes
that fields .rsstable and .rssqueue (both 128 bytes long) crosses
cache-line boundaries. Moving it at the end properly align all fields.

Before patch:
struct rss_info {
        u64                        rss_flags;            /*     0     8 */
        u8                         rsstable[128];        /*     8   128 */
        /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
        u8                         rss_queue[128];       /*   136   128 */
        /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
        u8                         rss_hkey[40];         /*   264    40 */
};

After patch:
struct rss_info {
        u8                         rsstable[128];        /*     0   128 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        u8                         rss_queue[128];       /*   128   128 */
        /* --- cacheline 4 boundary (256 bytes) --- */
        u8                         rss_hkey[40];         /*   256    40 */
        u64                        rss_flags;            /*   296     8 */
};

Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: re-order fields in be_error_recovert to avoid hole

Orabug: 29475071

- Unionize two u8 fields where only one of them is used depending on NIC
chipset.
- Move recovery_supported field after that union

These changes eliminate 7-bytes hole in the struct and makes it smaller
by 8 bytes.

Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: remove unused tx_jiffies field from be_tx_stats

Orabug: 29475071

Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: move txcp field in be_tx_obj to eliminate holes in the struct

Orabug: 29475071

Before patch:
struct be_tx_obj {
        u32                        db_offset;            /*     0     4 */

        /* XXX 4 bytes hole, try to pack */

        struct be_queue_info       q;                    /*     8    56 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct be_queue_info       cq;                   /*    64    56 */
        struct be_tx_compl_info    txcp;                 /*   120     4 */

        /* XXX 4 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        struct sk_buff *           sent_skb_list[2048];  /*   128 16384 */
        ...
}:

After patch:
struct be_tx_obj {
        u32                        db_offset;            /*     0     4 */
        struct be_tx_compl_info    txcp;                 /*     4     4 */
        struct be_queue_info       q;                    /*     8    56 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct be_queue_info       cq;                   /*    64    56 */
        struct sk_buff *           sent_skb_list[2048];  /*   120 16384 */
        ...
};

Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: reorder fields in be_eq_obj structure

Orabug: 29475071

Re-order fields in struct be_eq_obj to ensure that .napi field begins
at start of cache-line. Also the .adapter field is moved to the first
cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr)
and the 4-bytes hole to 3rd cache-line.

Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: remove unused old custom busy-poll fields

Orabug: 29475071

The commit fb6113e688e0 ("be2net: get rid of custom busy poll code")
replaced custom busy-poll code by the generic one but left several
macros and fields in struct be_eq_obj that are currently unused.
Remove this stuff.

Fixes: fb6113e688e0 ("be2net: get rid of custom busy poll code")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: remove unused old AIC info

Orabug: 29475071

The commit 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
introduced a separate struct be_aic_obj to hold AIC information but
unfortunately left the old stuff in be_eq_obj. So remove it.

Fixes: 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

be2net: Fix error detection logic for BE3

Orabug: 29475071

Check for 0xE00 (RECOVERABLE_ERR) along with ARMFW UE (0x0)
in be_detect_error() to know whether the error is valid error or not

Fixes: 673c96e5a ("be2net: Fix UE detection logic for BE3")
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: sd: Do not override max_sectors_kb sysfs setting

A user may lower the max_sectors_kb setting in sysfs to accommodate
certain workloads. Previously we would always set the max I/O size to
either the block layer default or the optional preferred I/O size
reported by the device.

Keep the current heuristics for the initial setting of max_sectors_kb.
For subsequent invocations, only update the current queue limit if it
exceeds the capabilities of the hardware.

Orabug: 29596510

Cc: <stable@vger.kernel.org>
Reported-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 77082ca503bed061f7fbda7cfd7c93beda967a41)

Signed-off-by: John Sobecki <john.sobecki@oracle.com>
Tested-by: Dustin Samko <Dustin.Samko@gm.com>
Reviewed-by: Ritika Srivastava <ritika.srivastava@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

USB: serial: io_ti: fix div-by-zero in set_termios

[ Upstream commit 6aeb75e6adfaed16e58780309613a578fe1ee90b ]

Fix a division-by-zero in set_termios when debugging is enabled and a
high-enough speed has been requested so that the divisor value becomes
zero.

Instead of just fixing the offending debug statement, cap the baud rate
at the base as a zero divisor value also appears to crash the firmware.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org> # 2.6.12
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Orabug: 29487834
CVE: CVE-2017-18360
(cherry picked from commit 2cd394cd10465fc0878958ba99e6080ac8ead559)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Drop oversize TX packets to prevent errors.

Orabug: 29516462

There have been reports of oversize UDP packets being sent to the
driver to be transmitted, causing error conditions.  The issue is
likely caused by the dst of the SKB switching between 'lo' with
64K MTU and the hardware device with a smaller MTU.  Patches are
being proposed by Mahesh Bandewar <maheshb@google.com> to fix the
issue.

In the meantime, add a quick length check in the driver to prevent
the error.  The driver uses the TX packet size as index to look up an
array to setup the TX BD.  The array is large enough to support all MTU
sizes supported by the driver.  The oversize TX packet causes the
driver to index beyond the array and put garbage values into the
TX BD.  Add a simple check to prevent this.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2b3c6885386020b1b9d92d45e8349637e27d1f66)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Read per-cpu value of x86_spec_ctrl_priv in x86_virt_spec_ctrl()

In x86_virt_spec_ctrl(), when IBRS is in use on the host, the baseline to
restore the host SPEC_CTRL must be taken from the privileged value which
has the IBRS bit set. In addition, it must be read from the per cpu variable
(x86_spec_ctrl_priv_cpu) that holds the SPEC_CTRL MSR for the current cpu.

Currently, this line:

hostval = this_cpu_read(x86_spec_ctrl_priv);

incorrectly uses the global x86_spec_ctrl_priv instead of the correct
per-cpu variable x86_spec_ctrl_priv_cpu, which assigns spurious values
to hostval.
Fix this issue by reading the correct per-cpu value instead of the
global.

Orabug: 29526401

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Keep enhanced IBRS on when prctl is used for SSBD control

When using the prctl system call to enable/disable SSBD mitigation for a
specific thread, it is necessary to update the SPEC_CTRL MSR on the CPU
running it. The value used as the base for the msr that will be written
is x86_spec_ctrl_base, which does not have the IBRS bit set. The relevant
SSBD bits are OR'd to this value before it is written to the MSR, but
the IBRS bit will remain unset.
As a result, the thread that requested the SSBD protection will run without
IBRS enabled, and when it is context switched out, IBRS will not be turned
back on again. This is not a problem in processors that use basic IBRS since
the bit is constantly toggled on kernel entry, but with enhanced IBRS this
is not necessary and therefore the bit remains unset.

Fix it by adding a check to detect when enhanced IBRS is in use, and add
the bit to the msr value that will be used as the baseline.

Orabug: 29526401

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data

The function hso_probe reads if_num from the USB device (as an u8) and uses
it without a length check to index an array, resulting in an OOB memory read
in hso_probe or hso_get_config_data.

Add a length check for both locations and updated hso_probe to bail on
error.

This issue has been assigned CVE-2018-19985.

Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Hui Peng <benquike@gmail.com>
Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5146f95df782b0ac61abde36567e718692725c89)

Orabug: 29605982
CVE: CVE-2018-19985

Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

swiotlb: save io_tlb_used to local variable before leaving critical section

When swiotlb is full, the kernel would print io_tlb_used. However, the
result might be inaccurate at that time because we have left the critical
section protected by spinlock.

Therefore, we backup the io_tlb_used into local variable before leaving
critical section.

Fixes: 83ca25948940 ("swiotlb: dump used and total slots when swiotlb buffer is full")
Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 29637525

(cherry picked from commit 53b29c336830db48ad3dc737f88b8c065b1f0851)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
kernel/dma/swiotlb.c does not exist.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-By: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

swiotlb: dump used and total slots when swiotlb buffer is full

So far the kernel only prints the requested size if swiotlb buffer if full.
It is not possible to know whether it is simply an out of buffer, or it is
because swiotlb cannot allocate buffer with the requested size due to
fragmentation.

As 'io_tlb_used' is available since commit 71602fe6d4e9 ("swiotlb: add
debugfs to track swiotlb buffer usage"), both 'io_tlb_used' and
'io_tlb_nslabs' are printed when swiotlb buffer is full.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 29637525

(cherry picked from commit 83ca259489409a1fe8a83dad83a82f32174d4f31)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
kernel/dma/swiotlb.c does not exist.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-By: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs, kvm: don't miss SSBD when IBRS is in use.

When IBRS is in use, we unconditionnaly need to write to MSR_IA32_SPEC_CTRL
(it acts as a barrier) but we were failing to take into account the SSBD
state from the thread info flags, potentially disabling SSBD on the host on
tasks that needs it after a vmexit.

Orabug: 29642113

Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

cifs: Fix use after free of a mid_q_entry

With protocol version 2.0 mounts we have seen crashes with corrupt mid
entries. Either the server->pending_mid_q list becomes corrupt with a
cyclic reference in one element or a mid object fetched by the
demultiplexer thread becomes overwritten during use.

Code review identified a race between the demultiplexer thread and the
request issuing thread. The demultiplexer thread seems to be written
with the assumption that it is the sole user of the mid object until
it calls the mid callback which either wakes the issuer task or
deletes the mid.

This assumption is not true because the issuer task can be woken up
earlier by a signal. If the demultiplexer thread has proceeded as far
as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer
thread will happily end up calling cifs_delete_mid while the
demultiplexer thread still is using the mid object.

Inserting a delay in the cifs demultiplexer thread widens the race
window and makes reproduction of the race very easy:

if (server->large_buf)
buf = server->bigbuf;

+ usleep_range(500, 4000);

server->lstrp = jiffies;

To resolve this I think the proper solution involves putting a
reference count on the mid object. This patch makes sure that the
demultiplexer thread holds a reference until it has finished
processing the transaction.

Cc: stable@vger.kernel.org
Signed-off-by: Lars Persson <larper@axis.com>
Acked-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
(cherry picked from commit 696e420bb2a6624478105651d5368d45b502b324)

Orabug: 29654888

Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:

fs/cifs/connect.c
fs/cifs/smb2ops.c
fs/cifs/smb2transport.c
fs/cifs/transport.c
[
connect.c: contextual has changed.
smb2ops.c: hdr->Command changed to shdr->Command in the third line.
^
smb2transport.c: contextual has changed, the codes are the a statement
block of else, but this statement block has been moved outside.
transport.c: contextual has changed, the codes are the a statement
block of else, but this statement block has been moved outside.
]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

binfmt_elf: switch to new creds when switching to new mm

We used to delay switching to the new credentials until after we had
mapped the executable (and possible elf interpreter). That was kind of
odd to begin with, since the new executable will actually then _run_
with the new creds, but whatever.

The bigger problem was that we also want to make sure that we turn off
prof events and tracing before we start mapping the new executable
state. So while this is a cleanup, it's also a fix for a possible
information leak.

Reported-by: Robert Święcki <robert@swiecki.net>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 9f834ec18defc369d73ccf9e87a2790bfa05bf46)

Orabug: 29677233
CVE: CVE-2019-11190

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/microcode: Don't return error if microcode update is not needed

Commit 347b54683 ("x86/microcode: Synchronize late microcode loading")
incorrectly returns -EINVAL error on all request_microcode_fw() failures
in reload_store(). In fact, when update is not needed or if there is no
microcode to load we don't need to treat this as an error.

Orabug: 29759756

Fixes: 347b54683 ("x86/microcode: Synchronize late microcode loading")
Reported-by: Jamie Iles <jamie.iles@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Jamie Iles <jamie.iles@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/mds: Add empty commit for CVE-2019-11091

The fixes for MDS also cover this CVE which states to be:
"Microarchitectural Data SamplingUncacheable Memory(MDSUM): Uncacheable
memory on some microprocessors utilizing speculative execution may allow
an authenticated user to potentially enable information disclosure
via a side channel with local access"

Orabug: 29721935
CVE: CVE-2019-11091
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>

x86/microcode: Add loader version file in debugfs

We want to be able to find out whether late microcode loader is using a
"safe" method where the system is in stop_machines() --- i.e. all cores
are pinned in kernel with interrupts disabled. This is especially
important for core siblings --- if one thread is loading microcode while
the other is executing instructions that are being patched then bad
things may happen, including MCEs.

Presense of this file indicates that we are all good. We will also
provide version value of "1".

[root@ca-virt1-0 ~]# cat /sys/kernel/debug/x86/microcode_loader_version
1
[root@ca-virt1-0 ~]#

Orabug: 29754165

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

x86/microcode: Fix CPU synchronization routine

Emanuel reported an issue with a hang during microcode update because my
dumb idea to use one atomic synchronization variable for both rendezvous
- before and after update - was simply bollocks:

  microcode: microcode_reload_late: late_cpus: 4
  microcode: __reload_late: cpu 2 entered
  microcode: __reload_late: cpu 1 entered
  microcode: __reload_late: cpu 3 entered
  microcode: __reload_late: cpu 0 entered
  microcode: __reload_late: cpu 1 left
  microcode: Timeout while waiting for CPUs rendezvous, remaining: 1

CPU1 above would finish, leave and the others will still spin waiting for
it to join.

So do two synchronization atomics instead, which makes the code a lot more
straightforward.

Also, since the update is serialized and it also takes quite some time per
microcode engine, increase the exit timeout by the number of CPUs on the
system.

That's ok because the moment all CPUs are done, that timeout will be cut
short.

Furthermore, panic when some of the CPUs timeout when returning from a
microcode update: we can't allow a system with not all cores updated.

Also, as an optimization, do not do the exit sync if microcode wasn't
updated.

Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20180314183615.17629-2-bp@alien8.de
(cherry picked from commit bb8c13d61a629276a162c1d2b1a20a815cbcfbb7)

Orabug: 29754165

Conflicts:
Context

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

x86/microcode: Synchronize late microcode loading

Original idea by Ashok, completely rewritten by Borislav.

Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.

Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.

[ Borislav: Rewrite completely. ]

Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: https://lkml.kernel.org/r/20180228102846.13447-8-bp@alien8.de
(cherry picked from commit a5321aec6412b20b5ad15db2d6b916c05349dbff)

Orabug: 29754165

Conflicts --- quite a few. Notable ones:
* We don't have microcode cache and so call request_microcode_fw() for
  each CPU
* No need to get/put_online_cpus() --- they are part of stop_machine()
* No stop_machine_cpuslocked() in uek4 but uek4's version of
  stop_machine() prevents CPU hotplug.
* uek4's has fewer result codes for microcode operations and thus error
  handling is slightly different.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

x86/speculation: Support 'mitigations=' cmdline option

commit d68be4c4d31295ff6ae34a8ddfaa4c1a8ff42812 upstream

Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.

The default behavior is unchanged.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.com
(cherry picked from commit aaa95f2f1112dd4ec31ae13c4cf877dc7c7fcbc8)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
arch/x86/kernel/cpu/bugs.c
arch/x86/mm/pti.c
Documentation/admin-guide/kernel-parameters.txt: different location
arch/x86/kernel/cpu/bugs.c: different name (bugs_64.c). Also we have different logic in nospectre_v2.
arch/x86/mm/pti.c: different location for mitigation arch/x86/mm/kaiser.c.

cpu/speculation: Add 'mitigations=' cmdline option

commit 98af8452945c55652de68536afdde3b520fec429 upstream

Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users.  It's getting more and more
complicated to decide which mitigations are needed for a given
architecture.  Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.

Most users fall into a few basic categories:

a) they want all mitigations off;

b) they want all reasonable mitigations on, with SMT enabled even if
   it's vulnerable; or

c) they want all reasonable mitigations on, with SMT disabled if
   vulnerable.

Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:

- mitigations=off: Disable all mitigations.

- mitigations=auto: [default] Enable all the default mitigations, but
  leave SMT enabled, even if it's vulnerable.

- mitigations=auto,nosmt: Enable all the default mitigations, disabling
  SMT if needed by a mitigation.

Currently, these options are placeholders which don't actually do
anything.  They will be fleshed out in upcoming patches.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
(cherry picked from commit 6cbbaa933b325234d6ffc93836d6b7c06dea7a56)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
Different location: Documentation/kernel-parameters.txt

x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off

commit e2c3c94788b08891dcf3dbe608f9880523ecd71b upstream

This code is only for CPUs which are affected by MSBDS, but are *not*
affected by the other two MDS issues.

For such CPUs, enabling the mds_idle_clear mitigation is enough to
mitigate SMT.

However if user boots with 'mds=off' and still has SMT enabled, we should
not report that SMT is mitigated:

$cat /sys//devices/system/cpu/vulnerabilities/mds
Vulnerable; SMT mitigated

But rather:
Vulnerable; SMT vulnerable

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20190412215118.294906495@localhost.localdomain
(cherry picked from commit 31203de125c7e160bcb42927a5db0bf01de98f6a)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
bugs.c vs bug_64.c in UEK4

x86/speculation/mds: Fix comment

commit cae5ec342645746d617dd420d206e1588d47768a upstream

s/L1TF/MDS/

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
(cherry picked from commit 9a5f1b636984bcf0a8495208e0fc9f30e2ec8359)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
bugs.c vs bugs_64.c on UEK4

x86/speculation/mds: update mds_mitigation to reflect debugfs configuration

If we enable mds_user_clear, we set mds_mitigation to MDS_MITIGATION_FULL or
MDS_MITIGATION_VMWERV. When we disable mds_user_clear, we set mds_mitigation to
MDS_MITIGATION_IDLE if mds_idle_clear, otherwise MDS_MITIGATION_OFF. If we
enable mds_idle_clear, we set mds_mitigation to MDS_MITIGATION_IDLE only if
mds_user_clear is disabled. When we disable mds_idle_clear, we set
mds_mitigation to MDS_MITIGATION_OFF if mds_user_clear is disabled.

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/speculation/mds: fix microcode late loading

In the microcode late loading case we have to:
- clear the CPU bugs related to MDS to be re-evaluated
- add proper evaluation of the MDS state and enable mitigation if necessary.

If the user has enforced off or idle mitigation, we keep it. Also if the
microcode fixes the MDS bug, mitigation will be turned off.

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/speculation/mds: Add boot option to enable MDS protection only while in idle

Such protection may be useful when we only care about virtualization and
no core sharing between guests is allowed.

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>

x86/speculation/mds: Improve coverage for MDS vulnerability

We seem to be missing a bunch of cases when we don't clear fill/store
buffers for MDS vulnerability during return to userspace.

Since we always call DISABLE_IBRS in those cases let's define a new
macro SPEC_RETURN_TO_USER than will both disable IBRS and flush the
buffers.

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>

x86/speculation/mds: Add SMT warning message

commit 39226ef02bfb43248b7db12a4fdccb39d95318e3 upstream

MDS is vulnerable with SMT. Make that clear with a one-time printk
whenever SMT first gets enabled.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit 2dad2dcf7f7eab0e6d10d560e22fddd0a08b37b3)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
bugs.c vs bugs_64.c: we do not have arch_smt_update. Squash the check in mds_select_mitigation.

x86/speculation/mds: Add mds=full,nosmt cmdline option

commit d71eb0ce109a124b0fa714832823b9452f2762cf upstream

Add the mds=full,nosmt cmdline option. This is like mds=full, but with
SMT disabled if the CPU is vulnerable.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit 623b724d5e50c15d160799446956ba0d23d4f978)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
arch/x86/kernel/cpu/bugs.c
bugs.64 vs bugs_64.c: different boot command line parsing code
Documentation/admin-guide/kernel-parameters.txt vs Documentation/kernel-parameters.txt

Documentation: Add MDS vulnerability documentation

commit 5999bbe7a6ea3c62029532ec84dc06003a1fa258 upstream

Add the initial MDS vulnerability documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 3b5d5994ef174daf5f77ba3f101cd879cf0c5fd6)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Documentation: Move L1TF to separate directory

commit 65fd4cb65b2dad97feb8330b6690445910b56d6a upstream

Move L1TF to a separate directory so the MDS stuff can be added at the
side. Otherwise the all hardware vulnerabilites have their own top level
entry. Should have done that right away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 19049ee5b2543696dd3cb164a4c4e566984f9615)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/speculation/mds: Add mitigation mode VMWERV

commit 22dd8365088b6403630b82423cf906491859b65e upstream

In virtualized environments it can happen that the host has the microcode
update which utilizes the VERW instruction to clear CPU buffers, but the
hypervisor is not yet updated to expose the X86_FEATURE_MD_CLEAR CPUID bit
to guests.

Introduce an internal mitigation mode VWWERV which enables the invocation
of the CPU buffer clearing even if X86_FEATURE_MD_CLEAR is not set. If the
system has no updated microcode this results in a pointless execution of
the VERW instruction wasting a few CPU cycles. If the microcode is updated,
but not exposed to a guest then the CPU buffers will be cleared.

That said: Virtual Machines Will Eventually Receive Vaccine

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit a2227b3f734fad5ace7c99103d6a7bc020c193fd)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
The changes to arch/x86/kernel/cpu/bugs.c instead need to be made to
arch/x86/kernel/cpu/bugs_64.c.

x86/speculation/mds: Add debugfs for controlling MDS

Add debugfs entries for controlling mds_user_clear and mds_idle_clear static
keys which would enable the mitigation in the respective paths.

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/speculation/mds: Add sysfs reporting for MDS

commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream

Add the sysfs reporting file for MDS. It exposes the vulnerability and
mitigation state similar to the existing files for the other speculative
hardware vulnerabilities.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit db366061fff1f76407cb5d1b0975fcc381400cc3)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
The changes to arch/x86/kernel/cpu/bugs.c instead need to be made to
arch/x86/kernel/cpu/bugs_64.c.
X86_HYPER_NATIVE doesn't exist so just leave that change out.
sched_smt_active() does not exist, instead use cpu_smp_control.
hypervisor_is_type replaced with cpu_has_hypervisor

x86/speculation/mds: Add mitigation control for MDS

commit bc1241700acd82ec69fde98c5763ce51086269f8 upstream

Now that the mitigations are in place, add a command line parameter to
control the mitigation, a mitigation selector function and a SMT update
mechanism.

This is the minimal straight forward initial implementation which just
provides an always on/off mode. The command line parameter is:

mds=[full|off]

This is consistent with the existing mitigations for other speculative
hardware vulnerabilities.

The idle invocation is dynamically updated according to the SMT state of
the system similar to the dynamic update of the STIBP mitigation. The idle
mitigation is limited to CPUs which are only affected by MSBDS and not any
other variant, because the other variants cannot be mitigated on SMT
enabled systems.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 4cad86e4abd472f637038e0ad70a70d0d7333f83)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
The changes to arch/x86/kernel/cpu/bugs.c instead need to be made to
arch/x86/kernel/cpu/bugs_64.c.

x86/speculation/mds: Conditionally clear CPU buffers on idle entry

commit 07f07f55a29cb705e221eda7894dd67ab81ef343 upstream

Add a static key which controls the invocation of the CPU buffer clear
mechanism on idle entry. This is independent of other MDS mitigations
because the idle entry invocation to mitigate the potential leakage due to
store buffer repartitioning is only necessary on SMT systems.

Add the actual invocations to the different halt/mwait variants which
covers all usage sites. mwaitx is not patched as it's not available on
Intel CPUs.

The buffer clear is only invoked before entering the C-State to prevent
that stale data from the idling CPU is spilled to the Hyper-Thread sibling
after the Store buffer got repartitioned and all entries are available to
the non idle sibling.

When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. Now CPU which returned from idle could be
speculatively exposed to contents of the sibling, but the buffers are
flushed either on exit to user space or on VMENTER.

When later on conditional buffer clearing is implemented on top of this,
then there is no action required either because before returning to user
space the context switch will set the condition flag which causes a flush
on the return to user path.

Note, that the buffer clearing on idle is only sensible on CPUs which are
solely affected by MSBDS and not any other variant of MDS because the other
MDS variants cannot be mitigated when SMT is enabled, so the buffer
clearing on idle would be a window dressing exercise.

This intentionally does not handle the case in the acpi/processor_idle
driver which uses the legacy IO port interface for C-State transitions for
two reasons:

- The acpi/processor_idle driver was replaced by the intel_idle driver
   almost a decade ago. Anything Nehalem upwards supports it and defaults
   to that new driver.

- The legacy IO port interface is likely to be used on older and therefore
   unaffected CPUs or on systems which do not receive microcode updates
   anymore, so there is no point in adding that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 7af0d7a8d40fa9dbf7700ed8e78f93c90b9a9e8b)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
Ignore added comments in the nonexistent function in mwait.h
Port the changes in bugs.c to bugs_64.c.
Move #include <cpufeature.h> in alternative.h to resolve implicit definition error.

x86/kvm/vmx: Add MDS protection when L1D Flush is not active

commit 650b68a0622f933444a6d66936abb3103029413b upstream

CPUs which are affected by L1TF and MDS mitigate MDS with the L1D Flush on
VMENTER when updated microcode is installed.

If a CPU is not affected by L1TF or if the L1D Flush is not in use, then
MDS mitigation needs to be invoked explicitly.

For these cases, follow the host mitigation state and invoke the MDS
mitigation before VMENTER.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit f0da457fb20b9f321adb8acce0c287eff6f850d1)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
Changes from bugs.c imported to bugs_64.c

x86/speculation/mds: Clear CPU buffers on exit to user

commit 04dcbdb8057827b043b3c71aa397c4c63e67d086 upstream

Add a static key which controls the invocation of the CPU buffer clear
mechanism on exit to user space and add the call into
prepare_exit_to_usermode() and do_nmi() right before actually returning.

Add documentation which kernel to user space transition this covers and
explain why some corner cases are not mitigated.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 62ba379c5925f285e3ab9362761c18823e5a049e)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
UEK4 uses bugs_64.c instead of bugs.c arch/x86/entry/common.c doesn't exist.
Make the corresponding changes to arch/x86/kernel/entry_64.S.

x86/speculation/mds: Add mds_clear_cpu_buffers()

commit 6a9e529272517755904b7afa639f6db59ddb793e upstream

The Microarchitectural Data Sampling (MDS) vulernabilities are mitigated by
clearing the affected CPU buffers. The mechanism for clearing the buffers
uses the unused and obsolete VERW instruction in combination with a
microcode update which triggers a CPU buffer clear when VERW is executed.

Provide a inline function with the assembly magic. The argument of the VERW
instruction must be a memory operand as documented:

  "MD_CLEAR enumerates that the memory-operand variant of VERW (for
   example, VERW m16) has been extended to also overwrite buffers affected
   by MDS. This buffer overwriting functionality is not guaranteed for the
   register operand variant of VERW."

Documentation also recommends to use a writable data segment selector:

  "The buffer overwriting occurs regardless of the result of the VERW
   permission check, as well as when the selector is null or causes a
   descriptor load segment violation. However, for lowest latency we
   recommend using a selector that indicates a valid writable data
   segment."

Add x86 specific documentation about MDS and the internal workings of the
mitigation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 2f7a0abf72199b1c9cbaae99270343993b104921)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
Added an assembly version of mds_clear_cpu_buffers

x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests

commit 6c4dbbd14730c43f4ed808a9c42ca41625925c22 upstream

X86_FEATURE_MD_CLEAR is a new CPUID bit which is set when microcode
provides the mechanism to invoke a flush of various exploitable CPU buffers
by invoking the VERW instruction.

Hand it through to guests so they can adjust their mitigations.

This also requires corresponding qemu changes, which are available
separately.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 0908473b20312b30f2600e4b16027d6c7facef4a)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
arch/x86/kvm/cpuid.c
Different initial content of cpuid bits.

x86/speculation/mds: Add BUG_MSBDS_ONLY

commit e261f209c3666e842fd645a1e31f001c3a26def9 upstream

This bug bit is set on CPUs which are only affected by Microarchitectural
Store Buffer Data Sampling (MSBDS) and not by any other MDS variant.

This is important because the Store Buffers are partitioned between
Hyper-Threads so cross thread forwarding is not possible. But if a thread
enters or exits a sleep state the store buffer is repartitioned which can
expose data from one thread to the other. This transition can be mitigated.

That means that for CPUs which are only affected by MSBDS SMT can be
enabled, if the CPU is not affected by other SMT sensitive vulnerabilities,
e.g. L1TF. The XEON PHI variants fall into that category. Also the
Silvermont/Airmont ATOMs, but for them it's not really relevant as they do
not support SMT, but mark them for completeness sake.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 24a8b2c167461f94542ecce1b0e48b30a9ef1a3b)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/speculation/mds: Add basic bug infrastructure for MDS

commit ed5194c2732c8084af9fd159c146ea92bf137128 upstream

Microarchitectural Data Sampling (MDS), is a class of side channel attacks
on internal buffers in Intel CPUs. The variants are:

- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLDPS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

All variants have the same mitigation for single CPU thread case (SMT off),
so the kernel can treat them as one MDS issue.

Add the basic infrastructure to detect if the current CPU is affected by
MDS.

[ tglx: Rewrote changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 928a58d368271748e2d8a4bdb7ba1412c5f1348f)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/speculation: Consolidate CPU whitelists

commit 36ad35131adacc29b328b9c8b6277a8bf0d6fd5d upstream

The CPU vulnerability whitelists have some overlap and there are more
whitelists coming along.

Use the driver_data field in the x86_cpu_id struct to denote the
whitelisted vulnerabilities and combine all whitelists into one.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit dbcafb8714cdee996b72fc47e9df8eeca394ba70)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/msr-index: Cleanup bit defines

commit d8eabc37310a92df40d07c5a8afc53cebf996716 upstream

Greg pointed out that speculation related bit defines are using (1 << N)
format instead of BIT(N). Aside of that (1 << N) is wrong as it should use
1UL at least.

Clean it up.

[ Josh Poimboeuf: Fix tools build ]

Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 4b2fc235844db92090c944b92174afb341395c97)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Documentation/l1tf: Fix small spelling typo

commit 60ca05c3b44566b70d64fbb8e87a6e0c67725468 upstream

Fix small typo (wiil -> will) in the "3.4. Nested virtual machines"
section.

Fixes: 5b76a3cff011 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Cc: linux-kernel@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-doc@vger.kernel.org
Cc: trivial@kernel.org
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 118c7a0fb6157b93bc6998b09f6bd15ccded5550)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>

x86/speculation: Simplify the CPU bug detection logic

commit 8ecc4979b1bd9c94168e6fc92960033b7a951336 upstream

Only CPUs which speculate can speculate. Therefore, it seems prudent
to test for cpu_no_speculation first and only then determine whether
a specific speculating CPU is susceptible to store bypass speculation.
This is underlined by all CPUs currently listed in cpu_no_speculation
were present in cpu_no_spec_store_bypass as well.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: konrad.wilk@oracle.com
Link: https://lkml.kernel.org/r/20180522090539.GA24668@light.dominikbrodowski.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit d24249216c7ddfc290843d9708e33adec921700b)

Orabug: 29526900
CVE: CVE-2018-12126
CVE: CVE-2018-12130
CVE: CVE-2018-12127

Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/apic: Make arch_setup_hwirq NUMA node aware

This is a second attempt to fix bug 29292411. The first attempt ran
afoul of special behavior in cluster_vector_allocation_domain() explained
below. This was discovered by QA immediately before QU7 was to be release,
so the original was reverted.

In a xen VM with vNUMA enabled, irq affinity for a device on node 1
may become stuck on CPU 0. /proc/irq/nnn/smp_affinity_list may show
affinity for all the CPUs on node 1, but this is wrong. All interrupts
are on the first CPU of node 0 which is usually CPU 0.

The problem is caused when __assign_irq_vector() is called by
arch_setup_hwirq() with a mask of all online CPUs, and then called later
with a mask including only the node 1 CPUs. The first call assigns affinity
to CPU 0, and the second tries to move affinity to the first online node 1
CPU. In the reported case this is always CPU 2. For some reason, the
CPU 0 affinity is never cleaned up, and all interrupts remain with CPU 0.
Since an incomplete move appears to be in progress, all attempts to
reassign affinity for the irq fail. Because of a quirk in how affinity is
displayed in /proc/irq/nnn/smp_affinity_list, changes may appear to work
temporarily.

It was not reproducible on baremetal on the machine I had available for
testing, but it is possible that it was observed on other machines. It
does not appear in UEK5. The APIC and IRQ code is very different in UEK5,
and the code changed here doesn't exist in UEK5. Upstream has completely
abandoned the UEK4 approach to IRQ management. It is unknown whether KVM
guests might see the same problem with UEK4.

Making arch_setup_hwirq() NUMA sensitive eliminates the problem by
using the correct cpumask for the node for the initial assignment. The
second assignment becomes a noop. After initialization is complete,
affinity can be moved to any CPU on any node and back without a problem.

However, cluster_vector_allocation_domain() contains a hack designed to
reduce vector pressure in cluster x2apic. Specifically, it relies on
the address of the cpumask passed to it to determine if this allocation
is for the default device bringup case or explicit migration. If the
address of the cpumask does not match what is returned by
apic->target_cpus(), it assumes it is the explicit migration case and goes
into cluster mode which uses up vectors on multiple CPUs. Since the
original patch modifies arch_setup_hwirq() to pass a cpumask with only
local CPUs in it, cluster_vector_allocation_domain() allocates for the
entire cluster rather than a single CPU. This can cause vector allocation
failures when there are a very large number of devices such as can be the
case when there are a large number of VFs (see bug 29534769).

Orabug: 29534769
Signed-off-by: Henry Willard <henry.willard@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

KEYS: encrypted: fix buffer overread in valid_master_desc()

With the 'encrypted' key type it was possible for userspace to provide a
data blob ending with a master key description shorter than expected,
e.g. 'keyctl add encrypted desc "new x" @s'. When validating such a
master key description, validate_master_desc() could read beyond the end
of the buffer. Fix this by using strncmp() instead of memcmp(). [Also
clean up the code to deduplicate some logic.]

Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 794b4bc292f5d31739d89c0202c54e7dc9bc3add)

Orabug: 29591025
CVE: CVE-2017-13305

Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: target: remove hardcoded T10 Vendor ID in INQUIRY response

Use the value stored in t10_wwn.vendor, which defaults to "LIO-ORG",
but can be reconfigured via the vendor_id ConfigFS attribute.

This commit was backported by hand.

Orabug: 29344862

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: target: add device vendor id, product id and revision configfs attributes

The vendor_id, product_id and revision attributes will allow for the
modification of the T10 Model and Revision strings returned in
inquiry responses. Its value can be viewed and modified via the
ConfigFS path at:

target/core/$backstore/$name/wwn/vendor_id
target/core/$backstore/$name/wwn/product_id
target/core/$backstore/$name/wwn/revision

Orabug: 29344862

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: target: consistently null-terminate t10_wwn strings

In preparation for supporting user provided vendor strings, add an extra
byte to the vendor, model and revision arrays in struct t10_wwn. This
ensures that the full INQUIRY data can be carried in the arrays along with
a null-terminator.

Change a number of array readers and writers so that they account for
explicit null-termination:

- The pscsi_set_inquiry_info() and emulate_model_alias_store() codepaths
  don't currently explicitly null-terminate; fix this.

- Existing t10_wwn field dumps use for-loops which step over
  null-terminators for right-padding.
  + Use printf with width specifiers instead.

This commit was backported by hand.

Orabug: 29344862

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: target: use consistent left-aligned ASCII INQUIRY data

spc5r17.pdf specifies:

  4.3.1 ASCII data field requirements
  ASCII data fields shall contain only ASCII printable characters (i.e.,
  code values 20h to 7Eh) and may be terminated with one or more ASCII null
  (00h) characters.  ASCII data fields described as being left-aligned
  shall have any unused bytes at the end of the field (i.e., highest
  offset) and the unused bytes shall be filled with ASCII space characters
  (20h).

LIO currently space-pads the T10 VENDOR IDENTIFICATION and PRODUCT
IDENTIFICATION fields in the standard INQUIRY data. However, the PRODUCT
REVISION LEVEL field in the standard INQUIRY data as well as the T10 VENDOR
IDENTIFICATION field in the INQUIRY Device Identification VPD Page are
zero-terminated/zero-padded.

Fix this inconsistency by using space-padding for all of the above fields.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bryant G. Ly <bly@catalogicsoftware.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0de263577de5d5e052be5f4f93334e63cc8a7f0b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/target/target_core_spc.c

Orabug: 29344862

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: fix data corruption caused by unaligned direct AIO

commit 372a03e01853f860560eade508794dd274e9b390 upstream.

Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.

However it decides not to serialize if the potentially unaligned aio is
past i_size with the rationale that no pending writes are possible past
i_size. Unfortunately if the i_size is not block aligned and the second
unaligned write lands past i_size, but still into the same block, it has
the potential of corrupting the previous unaligned write to the same
block.

This is (very simplified) reproducer from Frank

    // 41472 = (10 * 4096) + 512
    // 37376 = 41472 - 4096

    ftruncate(fd, 41472);
    io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
    io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);

    io_submit(io_ctx, 1, &iocbs[1]);
    io_submit(io_ctx, 1, &iocbs[2]);

    io_getevents(io_ctx, 2, 2, events, NULL);

Without this patch the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed overwriting the data
written by the first write. This is a data corruption.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31

With this patch the data corruption is avoided because we will recognize
the unaligned_aio and wait for the unwritten extent conversion.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
*
0000b200

Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2ebfb9ae0047e900a932e6219a1a8ee830cde940)
Orabug: 29553371
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

swiotlb: checking whether swiotlb buffer is full with io_tlb_used

This patch uses io_tlb_used to help check whether swiotlb buffer is full.
io_tlb_used is no longer used for only debugfs. It is also used to help
optimize swiotlb_tbl_map_single().

Suggested-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 29582587
(cherry picked from commit 60513ed06a41049768a6875229b025b6e726e148)

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

swiotlb: add debugfs to track swiotlb buffer usage

The device driver will not be able to do dma operations once swiotlb buffer
is full, either because the driver is using so many IO TLB blocks inflight,
or because there is memory leak issue in device driver. To export the
swiotlb buffer usage via debugfs would help the user estimate the size of
swiotlb buffer to pre-allocate or analyze device driver memory leak issue.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 29582587
(cherry picked from commit 71602fe6d4e9291af105adfef8e893b57c735906)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
debugfs_create_ulong() is not available. debugfs_create_file() is used instead.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

swiotlb: fix comment on swiotlb_bounce()

Fix the comment as swiotlb_bounce() is used to copy from original dma
location to swiotlb buffer during swiotlb_tbl_map_single(), while to
copy from swiotlb buffer to original dma location during
swiotlb_tbl_unmap_single().

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 29582587
(cherry picked from commit 6442ca2abf882d9d838fb844d852ba6acd1db7f4)

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.

This is not necessasrily true in the case when the process received a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.

So add a test for ds_clp == NULL and return NULL in that case.

Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Olga Kornievskaia <aglo@umich.edu>
Acked-by: Adamson, Andy <William.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(cherry picked from commit cfd278c280f997cf2fe4662e0acab0fe465f637b)
Orabug: 29617508
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ib_core: initialize shpd field when allocating 'struct ib_pd'

The shared pd feature in Oracle Linux was added by
commit a1911c2c180d ("IB/Shared PD support from Oracle").
It adds a field named shpd to 'struct ib_pd' but fails to
initialize it in all the places it is allocated.

This results its uninitialized content being referenced
in mlx4_ib_dealloc_pd() and actions taken based on it
which eventually leads to resource leaks even when shared
pd feature is not being used.

This fix here initializes it to NULL in ib_alloc_pd() where
the ib_core module allocates the data structure.

Orabug: 29384815

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "x86/apic: Make arch_setup_hwirq NUMA node aware"

Orabug: 29542185

This reverts commit ff3efcf39b34994ab1ba9390bd3a79c1a985c895.

qlcnic: fix Tx descriptor corruption on 82xx devices

In regular NIC transmission flow, driver always configures MAC using
Tx queue zero descriptor as a part of MAC learning flow.
But with multi Tx queue supported NIC, regular transmission can occur on
any non-zero Tx queue and from that context it uses
Tx queue zero descriptor to configure MAC, at the same time TX queue
zero could be used by another CPU for regular transmission
which could lead to Tx queue zero descriptor corruption and cause FW
abort.

This patch fixes this in such a way that driver always configures
learned MAC address from the same Tx queue which is used for
regular transmission.

Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 27708787

(cherry picked from commit c333fa0c4f220f8f7ea5acd6b0ebf3bf13fd684d)
Signed-off-by: aru kolappan <aru.kolappan@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

block: Fix a race between blk_cleanup_queue() and timeout handling

Make sure that if the timeout timer fires after a queue has been
marked "dying" that the affected requests are finished.

Reported-by: chenxiang (M) <chenxiang66@hisilicon.com>
Fixes: commit 287922eb0b18 ("block: defer timeouts to a workqueue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: chenxiang (M) <chenxiang66@hisilicon.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(backport from upstream commit 4e9b6f20828ac880dbc1fa2fdbafae779473d1af)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
block/blk-core.c
block/blk-timeout.c

Orabug: 29158186

Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

can: gw: ensure DLC boundaries after CAN frame modification

Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN
frame modification rule that makes the data length code a higher value than
the available CAN frame data size. In combination with a configured checksum
calculation where the result is stored relatively to the end of the data
(e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in
skb_shared_info) can be rewritten which finally can cause a system crash.

Michael Kubecek suggested to drop frames that have a DLC exceeding the
available space after the modification process and provided a patch that can
handle CAN FD frames too. Within this patch we also limit the length for the
checksum calculations to the maximum of Classic CAN data length (8).

CAN frames that are dropped by these additional checks are counted with the
CGW_DELETED counter which indicates misconfigurations in can-gw rules.

This fixes CVE-2019-3701.

Reported-by: Muyu Yu <ieatmuttonchuan@gmail.com>
Reported-by: Marcus Meissner <meissner@suse.de>
Suggested-by: Michal Kubecek <mkubecek@suse.cz>
Tested-by: Muyu Yu <ieatmuttonchuan@gmail.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.2
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 29215299
CVE: CVE-2019-3701
(cherry picked from commit 0aaa81377c5a01f686bcdb8c7a6929a7bf330c68)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

CIFS: Enable encryption during session setup phase

In order to allow encryption on SMB connection we need to exchange
a session key and generate encryption and decryption keys.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit cabfb3680f78981d26c078a26e5c748531257ebb)

Orabug: 29338239
CVE: CVE-2018-1066

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/cifs/smb2pdu.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: clear i_data in ext4_inode_info when removing inline data

commit 6e8ab72a812396996035a37e5ca4b3b99b5d214b upstream.

When converting from an inode from storing the data in-line to a data
block, ext4_destroy_inline_data_nolock() was only clearing the on-disk
copy of the i_blocks[] array. It was not clearing copy of the
i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually
used by ext4_map_blocks().

This didn't matter much if we are using extents, since the extents
header would be invalid and thus the extents could would re-initialize
the extents tree. But if we are using indirect blocks, the previous
contents of the i_blocks array will be treated as block numbers, with
potentially catastrophic results to the file system integrity and/or
user data.

This gets worse if the file system is using a 1k block size and
s_first_data is zero, but even without this, the file system can get
quite badly corrupted.

This addresses CVE-2018-10881.

https://bugzilla.kernel.org/show_bug.cgi?id=200015

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit deb465ec750b80776cc4ac5b92b72c0a71fd4f0b)

Orabug: 29540709
CVE: CVE-2018-10881

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: add more inode number paranoia checks

commit c37e9e013469521d9adb932d17a1795c139b36db upstream.

If there is a directory entry pointing to a system inode (such as a
journal inode), complain and declare the file system to be corrupted.

Also, if the superblock's first inode number field is too small,
refuse to mount the file system.

This addresses CVE-2018-10882.

https://bugzilla.kernel.org/show_bug.cgi?id=200069

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ff6c96461be35381399466ad58f02b8d78ab480a)

Orabug: 29545566
CVE: CVE-2018-10882

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/inode.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: nVMX: Eliminate vmcs02 pool"

This reverts commit 22d8dc569898dec08b3c1fdc8a5b8b0e48ab8986.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: VMX: introduce alloc_loaded_vmcs"

This reverts commit 8dd66ca98dfc03f7921ce5bf5926ce2d95507d84.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: VMX: make MSR bitmaps per-VCPU"

This reverts commit b3bd557498bff191933d0481b609fdff0caa9ead.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: x86: pass host_initiated to functions that read MSRs"

This reverts commit fe09396e31ddc452c8ef2e0a5474c28d5a501e4e.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM/x86: Add IBPB support"

This reverts commit e5455ef7dbbab7ee5bc901ffdc7666e61fc41e11.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL - reloaded"

This reverts commit eaaa119c0f931596b7da2bb70e33d8cd1fcea576.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL"

This reverts commit 53da198cf7a918baa7ac267af1d89b71bc117ce9.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: SVM: Add MSR-based feature support for serializing LFENCE"

This reverts commit 351ff7a273f566e1efed1be14f01fc9510ad9002.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "x86/cpufeatures: rename X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD"

This reverts commit a1a02e26cd913a7cfb5086b3f3cb1561c04447b6.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>