]> www.infradead.org Git - users/dwmw2/linux.git/log
users/dwmw2/linux.git
7 years agofork: don't copy inconsistent signal handler state to child
Jann Horn [Wed, 22 Aug 2018 05:00:58 +0000 (22:00 -0700)]
fork: don't copy inconsistent signal handler state to child

[ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ]

Before this change, if a multithreaded process forks while one of its
threads is changing a signal handler using sigaction(), the memcpy() in
copy_sighand() can race with the struct assignment in do_sigaction().  It
isn't clear whether this can cause corruption of the userspace signal
handler pointer, but it definitely can cause inconsistency between
different fields of struct sigaction.

Take the appropriate spinlock to avoid this.

I have tested that this patch prevents inconsistency between sa_sigaction
and sa_flags, which is possible before this patch.

Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosunrpc: Don't use stack buffer with scatterlist
Laura Abbott [Fri, 17 Aug 2018 21:43:54 +0000 (14:43 -0700)]
sunrpc: Don't use stack buffer with scatterlist

[ Upstream commit 44090cc876926277329e1608bafc01b9f6da627f ]

Fedora got a bug report from NFS:

kernel BUG at include/linux/scatterlist.h:143!
...
RIP: 0010:sg_init_one+0x7d/0x90
..
  make_checksum+0x4e7/0x760 [rpcsec_gss_krb5]
  gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5]
  gss_marshal+0x126/0x1a0 [auth_rpcgss]
  ? __local_bh_enable_ip+0x80/0xe0
  ? call_transmit_status+0x1d0/0x1d0 [sunrpc]
  call_transmit+0x137/0x230 [sunrpc]
  __rpc_execute+0x9b/0x490 [sunrpc]
  rpc_run_task+0x119/0x150 [sunrpc]
  nfs4_run_exchange_id+0x1bd/0x250 [nfsv4]
  _nfs4_proc_exchange_id+0x2d/0x490 [nfsv4]
  nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4]
  nfs4_discover_server_trunking+0x80/0x270 [nfsv4]
  nfs4_init_client+0x16e/0x240 [nfsv4]
  ? nfs_get_client+0x4c9/0x5d0 [nfs]
  ? _raw_spin_unlock+0x24/0x30
  ? nfs_get_client+0x4c9/0x5d0 [nfs]
  nfs4_set_client+0xb2/0x100 [nfsv4]
  nfs4_create_server+0xff/0x290 [nfsv4]
  nfs4_remote_mount+0x28/0x50 [nfsv4]
  mount_fs+0x3b/0x16a
  vfs_kern_mount.part.35+0x54/0x160
  nfs_do_root_mount+0x7f/0xc0 [nfsv4]
  nfs4_try_mount+0x43/0x70 [nfsv4]
  ? get_nfs_version+0x21/0x80 [nfs]
  nfs_fs_mount+0x789/0xbf0 [nfs]
  ? pcpu_alloc+0x6ca/0x7e0
  ? nfs_clone_super+0x70/0x70 [nfs]
  ? nfs_parse_mount_options+0xb40/0xb40 [nfs]
  mount_fs+0x3b/0x16a
  vfs_kern_mount.part.35+0x54/0x160
  do_mount+0x1fd/0xd50
  ksys_mount+0xba/0xd0
  __x64_sys_mount+0x21/0x30
  do_syscall_64+0x60/0x1f0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is BUG_ON(!virt_addr_valid(buf)) triggered by using a stack
allocated buffer with a scatterlist. Convert the buffer for
rc4salt to be dynamically allocated instead.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agohfs: prevent crash on exit from failed search
Ernesto A. Fernández [Fri, 24 Aug 2018 00:00:31 +0000 (17:00 -0700)]
hfs: prevent crash on exit from failed search

[ Upstream commit dc2572791d3a41bab94400af2b6bca9d71ccd303 ]

hfs_find_exit() expects fd->bnode to be NULL after a search has failed.
hfs_brec_insert() may instead set it to an error-valued pointer.  Fix
this to prevent a crash.

Link: http://lkml.kernel.org/r/53d9749a029c41b4016c495fc5838c9dba3afc52.1530294815.git.ernesto.mnd.fernandez@gmail.com
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agohfsplus: don't return 0 when fill_super() failed
Tetsuo Handa [Wed, 22 Aug 2018 04:59:12 +0000 (21:59 -0700)]
hfsplus: don't return 0 when fill_super() failed

[ Upstream commit 7464726cb5998846306ed0a7d6714afb2e37b25d ]

syzbot is reporting NULL pointer dereference at mount_fs() [1].  This is
because hfsplus_fill_super() is by error returning 0 when
hfsplus_fill_super() detected invalid filesystem image, and mount_bdev()
is returning NULL because dget(s->s_root) == NULL if s->s_root == NULL,
and mount_fs() is accessing root->d_sb because IS_ERR(root) == false if
root == NULL.  Fix this by returning -EINVAL when hfsplus_fill_super()
detected invalid filesystem image.

[1] https://syzkaller.appspot.com/bug?id=21acb6850cecbc960c927229e597158cf35f33d0

Link: http://lkml.kernel.org/r/d83ce31a-874c-dd5b-f790-41405983a5be@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+01ffaf5d9568dd1609f7@syzkaller.appspotmail.com>
Reviewed-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocifs: check if SMB2 PDU size has been padded and suppress the warning
Ronnie Sahlberg [Wed, 22 Aug 2018 02:19:24 +0000 (12:19 +1000)]
cifs: check if SMB2 PDU size has been padded and suppress the warning

[ Upstream commit e6c47dd0da1e3a484e778046fc10da0b20606a86 ]

Some SMB2/3 servers, Win2016 but possibly others too, adds padding
not only between PDUs in a compound but also to the final PDU.
This padding extends the PDU to a multiple of 8 bytes.

Check if the unexpected length looks like this might be the case
and avoid triggering the log messages for :

  "SMB2 server sent bad RFC1001 len %d not %d\n"

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: sched: action_ife: take reference to meta module
Vlad Buslov [Mon, 3 Sep 2018 21:44:42 +0000 (00:44 +0300)]
net: sched: action_ife: take reference to meta module

[ Upstream commit 84cb8eb26cb9ce3c79928094962a475a9d850a53 ]

Recent refactoring of add_metainfo() caused use_all_metadata() to add
metainfo to ife action metalist without taking reference to module. This
causes warning in module_put called from ife action cleanup function.

Implement add_metainfo_and_get_ops() function that returns with reference
to module taken if metainfo was added successfully, and call it from
use_all_metadata(), instead of calling __add_metainfo() directly.

Example warning:

[  646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230
[  646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
[  646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
[  646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  646.440595] RIP: 0010:module_put+0x1cb/0x230
[  646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
[  646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286
[  646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db
[  646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518
[  646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3
[  646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d
[  646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100
[  646.508213] FS:  00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000
[  646.516961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0
[  646.530595] Call Trace:
[  646.533408]  ? find_symbol_in_section+0x260/0x260
[  646.538509]  tcf_ife_cleanup+0x11b/0x200 [act_ife]
[  646.543695]  tcf_action_cleanup+0x29/0xa0
[  646.548078]  __tcf_action_put+0x5a/0xb0
[  646.552289]  ? nla_put+0x65/0xe0
[  646.555889]  __tcf_idr_release+0x48/0x60
[  646.560187]  tcf_generic_walker+0x448/0x6b0
[  646.564764]  ? tcf_action_dump_1+0x450/0x450
[  646.569411]  ? __lock_is_held+0x84/0x110
[  646.573720]  ? tcf_ife_walker+0x10c/0x20f [act_ife]
[  646.578982]  tca_action_gd+0x972/0xc40
[  646.583129]  ? tca_get_fill.constprop.17+0x250/0x250
[  646.588471]  ? mark_lock+0xcf/0x980
[  646.592324]  ? check_chain_key+0x140/0x1f0
[  646.596832]  ? debug_show_all_locks+0x240/0x240
[  646.601839]  ? memset+0x1f/0x40
[  646.605350]  ? nla_parse+0xca/0x1a0
[  646.609217]  tc_ctl_action+0x215/0x230
[  646.613339]  ? tcf_action_add+0x220/0x220
[  646.617748]  rtnetlink_rcv_msg+0x56a/0x6d0
[  646.622227]  ? rtnl_fdb_del+0x3f0/0x3f0
[  646.626466]  netlink_rcv_skb+0x18d/0x200
[  646.630752]  ? rtnl_fdb_del+0x3f0/0x3f0
[  646.634959]  ? netlink_ack+0x500/0x500
[  646.639106]  netlink_unicast+0x2d0/0x370
[  646.643409]  ? netlink_attachskb+0x340/0x340
[  646.648050]  ? _copy_from_iter_full+0xe9/0x3e0
[  646.652870]  ? import_iovec+0x11e/0x1c0
[  646.657083]  netlink_sendmsg+0x3b9/0x6a0
[  646.661388]  ? netlink_unicast+0x370/0x370
[  646.665877]  ? netlink_unicast+0x370/0x370
[  646.670351]  sock_sendmsg+0x6b/0x80
[  646.674212]  ___sys_sendmsg+0x4a1/0x520
[  646.678443]  ? copy_msghdr_from_user+0x210/0x210
[  646.683463]  ? lock_downgrade+0x320/0x320
[  646.687849]  ? debug_show_all_locks+0x240/0x240
[  646.692760]  ? do_raw_spin_unlock+0xa2/0x130
[  646.697418]  ? _raw_spin_unlock+0x24/0x30
[  646.701798]  ? __handle_mm_fault+0x1819/0x1c10
[  646.706619]  ? __pmd_alloc+0x320/0x320
[  646.710738]  ? debug_show_all_locks+0x240/0x240
[  646.715649]  ? restore_nameidata+0x7b/0xa0
[  646.720117]  ? check_chain_key+0x140/0x1f0
[  646.724590]  ? check_chain_key+0x140/0x1f0
[  646.729070]  ? __fget_light+0xbc/0xd0
[  646.733121]  ? __sys_sendmsg+0xd7/0x150
[  646.737329]  __sys_sendmsg+0xd7/0x150
[  646.741359]  ? __ia32_sys_shutdown+0x30/0x30
[  646.746003]  ? up_read+0x53/0x90
[  646.749601]  ? __do_page_fault+0x484/0x780
[  646.754105]  ? do_syscall_64+0x1e/0x2c0
[  646.758320]  do_syscall_64+0x72/0x2c0
[  646.762353]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  646.767776] RIP: 0033:0x7f4163872150
[  646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
[  646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150
[  646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003
[  646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000
[  646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20
[  646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0
[  646.837360] irq event stamp: 6083
[  646.841043] hardirqs last  enabled at (6081): [<ffffffff8c220a7d>] __call_rcu+0x17d/0x500
[  646.849882] hardirqs last disabled at (6083): [<ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c
[  646.859775] softirqs last  enabled at (5968): [<ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee
[  646.868784] softirqs last disabled at (6082): [<ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife]
[  646.878845] ---[ end trace b1b8c12ffe51e657 ]---

Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoact_ife: fix a potential deadlock
Cong Wang [Sun, 19 Aug 2018 19:22:13 +0000 (12:22 -0700)]
act_ife: fix a potential deadlock

[ Upstream commit 5ffe57da29b3802baeddaa40909682bbb4cb4d48 ]

use_all_metadata() acquires read_lock(&ife_mod_lock), then calls
add_metainfo() which calls find_ife_oplist() which acquires the same
lock again. Deadlock!

Introduce __add_metainfo() which accepts struct tcf_meta_ops *ops
as an additional parameter and let its callers to decide how
to find it. For use_all_metadata(), it already has ops, no
need to find it again, just call __add_metainfo() directly.

And, as ife_mod_lock is only needed for find_ife_oplist(),
this means we can make non-atomic allocation for populate_metalist()
now.

Fixes: 817e9f2c5c26 ("act_ife: acquire ife_mod_lock before reading ifeoplist")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoact_ife: move tcfa_lock down to where necessary
Cong Wang [Sun, 19 Aug 2018 19:22:12 +0000 (12:22 -0700)]
act_ife: move tcfa_lock down to where necessary

[ Upstream commit 4e407ff5cd67ec76eeeea1deec227b7982dc7f66 ]

The only time we need to take tcfa_lock is when adding
a new metainfo to an existing ife->metalist. We don't need
to take tcfa_lock so early and so broadly in tcf_ife_init().

This means we can always take ife_mod_lock first, avoid the
reverse locking ordering warning as reported by Vlad.

Reported-by: Vlad Buslov <vladbu@mellanox.com>
Tested-by: Vlad Buslov <vladbu@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agohv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()
Dexuan Cui [Thu, 30 Aug 2018 05:42:13 +0000 (05:42 +0000)]
hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()

[ Upstream commit e04e7a7bbd4bbabef4e1a58367e5fc9b2edc3b10 ]

This patch fixes the race between netvsc_probe() and
rndis_set_subchannel(), which can cause a deadlock.

These are the related 3 paths which show the deadlock:

path #1:
    Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
    Call Trace:
     schedule
     schedule_preempt_disabled
     __mutex_lock
     __device_attach
     bus_probe_device
     device_add
     vmbus_device_register
     vmbus_onoffer
     vmbus_onmessage_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #2:
    schedule
     schedule_preempt_disabled
     __mutex_lock
     netvsc_probe
     vmbus_probe
     really_probe
     __driver_attach
     bus_for_each_dev
     driver_attach_async
     async_run_entry_fn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #3:
    Workqueue: events netvsc_subchan_work [hv_netvsc]
    Call Trace:
     schedule
     rndis_set_subchannel
     netvsc_subchan_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

Before path #1 finishes, path #2 can start to run, because just before
the "bus_probe_device(dev);" in device_add() in path #1, there is a line
"object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can
immediately try to load hv_netvsc and hence path #2 can start to run.

Next, path #2 offloads the subchannal's initialization to a workqueue,
i.e. path #3, so we can end up in a deadlock situation like this:

Path #2 gets the device lock, and is trying to get the rtnl lock;
Path #3 gets the rtnl lock and is waiting for all the subchannel messages
to be processed;
Path #1 is trying to get the device lock, but since #2 is not releasing
the device lock, path #1 has to sleep; since the VMBus messages are
processed one by one, this means the sub-channel messages can't be
procedded, so #3 has to sleep with the rtnl lock held, and finally #2
has to sleep... Now all the 3 paths are sleeping and we hit the deadlock.

With the patch, we can make sure #2 gets both the device lock and the
rtnl lock together, gets its job done, and releases the locks, so #1
and #3 will not be blocked for ever.

Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agohv_netvsc: ignore devices that are not PCI
Stephen Hemminger [Tue, 21 Aug 2018 17:40:38 +0000 (10:40 -0700)]
hv_netvsc: ignore devices that are not PCI

[ Upstream commit b93c1b5ac8643cc08bb74fa8ae21d6c63dfcb23d ]

Registering another device with same MAC address (such as TAP, VPN or
DPDK KNI) will confuse the VF autobinding logic.  Restrict the search
to only run if the device is known to be a PCI attached VF.

Fixes: e8ff40d4bff1 ("hv_netvsc: improve VF device matching")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovhost: correctly check the iova range when waking virtqueue
Jason Wang [Fri, 24 Aug 2018 08:53:13 +0000 (16:53 +0800)]
vhost: correctly check the iova range when waking virtqueue

[ Upstream commit 2d66f997f0545c8f7fc5cf0b49af1decb35170e7 ]

We don't wakeup the virtqueue if the first byte of pending iova range
is the last byte of the range we just got updated. This will lead a
virtqueue to wait for IOTLB updating forever. Fixing by correct the
check and wake up the virtqueue in this case.

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge
Ido Schimmel [Fri, 24 Aug 2018 12:41:35 +0000 (15:41 +0300)]
mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge

[ Upstream commit 602b74eda81311dbdb5dbab08c30f789f648ebdc ]

When a bridge device is removed, the VLANs are flushed from each
configured port. This causes the ports to decrement the reference count
on the associated FIDs (filtering identifier). If the reference count of
a FID is 1 and it has a RIF (router interface), then this RIF is
destroyed.

However, if no port is member in the VLAN for which a RIF exists, then
the RIF will continue to exist after the removal of the bridge. To
reproduce:

# ip link add name br0 type bridge vlan_filtering 1
# ip link set dev swp1 master br0
# ip link add link br0 name br0.10 type vlan id 10
# ip address add 192.0.2.0/24 dev br0.10
# ip link del dev br0

The RIF associated with br0.10 continues to exist.

Fix this by iterating over all the bridge device uppers when it is
destroyed and take care of destroying their RIFs.

Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosctp: hold transport before accessing its asoc in sctp_transport_get_next
Xin Long [Mon, 27 Aug 2018 10:38:31 +0000 (18:38 +0800)]
sctp: hold transport before accessing its asoc in sctp_transport_get_next

[ Upstream commit bab1be79a5169ac748d8292b20c86d874022d7ba ]

As Marcelo noticed, in sctp_transport_get_next, it is iterating over
transports but then also accessing the association directly, without
checking any refcnts before that, which can cause an use-after-free
Read.

So fix it by holding transport before accessing the association. With
that, sctp_transport_hold calls can be removed in the later places.

Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
Reported-by: syzbot+fe62a0c9aa6a85c6de16@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonfp: wait for posted reconfigs when disabling the device
Jakub Kicinski [Wed, 29 Aug 2018 19:46:08 +0000 (12:46 -0700)]
nfp: wait for posted reconfigs when disabling the device

[ Upstream commit 9ad716b95fd6c6be46a4f2d5936e514b5bcd744d ]

To avoid leaking a running timer we need to wait for the
posted reconfigs after netdev is unregistered.  In common
case the process of deinitializing the device will perform
synchronous reconfigs which wait for posted requests, but
especially with VXLAN ports being actively added and removed
there can be a race condition leaving a timer running after
adapter structure is freed leading to a crash.

Add an explicit flush after deregistering and for a good
measure a warning to check if timer is running just before
structures are freed.

Fixes: 3d780b926a12 ("nfp: add async reconfiguration mechanism")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotipc: fix a missing rhashtable_walk_exit()
Cong Wang [Thu, 23 Aug 2018 23:19:44 +0000 (16:19 -0700)]
tipc: fix a missing rhashtable_walk_exit()

[ Upstream commit bd583fe30427500a2d0abe25724025b1cb5e2636 ]

rhashtable_walk_exit() must be paired with rhashtable_walk_enter().

Fixes: 40f9f4397060 ("tipc: Fix tipc_sk_reinit race conditions")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/sched: act_pedit: fix dump of extended layered op
Davide Caratti [Mon, 27 Aug 2018 20:56:22 +0000 (22:56 +0200)]
net/sched: act_pedit: fix dump of extended layered op

[ Upstream commit 85eb9af182243ce9a8b72410d5321c440ac5f8d7 ]

in the (rare) case of failure in nla_nest_start(), missing NULL checks in
tcf_pedit_key_ex_dump() can make the following command

 # tc action add action pedit ex munge ip ttl set 64

dereference a NULL pointer:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 PGD 800000007d1cd067 P4D 800000007d1cd067 PUD 7acd3067 PMD 0
 Oops: 0002 [#1] SMP PTI
 CPU: 0 PID: 3336 Comm: tc Tainted: G            E     4.18.0.pedit+ #425
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_pedit_dump+0x19d/0x358 [act_pedit]
 Code: be 02 00 00 00 48 89 df 66 89 44 24 20 e8 9b b1 fd e0 85 c0 75 46 8b 83 c8 00 00 00 49 83 c5 08 48 03 83 d0 00 00 00 4d 39 f5 <66> 89 04 25 00 00 00 00 0f 84 81 01 00 00 41 8b 45 00 48 8d 4c 24
 RSP: 0018:ffffb5d4004478a8 EFLAGS: 00010246
 RAX: ffff8880fcda2070 RBX: ffff8880fadd2900 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: ffffb5d4004478ca RDI: ffff8880fcda206e
 RBP: ffff8880fb9cb900 R08: 0000000000000008 R09: ffff8880fcda206e
 R10: ffff8880fadd2900 R11: 0000000000000000 R12: ffff8880fd26cf40
 R13: ffff8880fc957430 R14: ffff8880fc957430 R15: ffff8880fb9cb988
 FS:  00007f75a537a740(0000) GS:ffff8880fda00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007a2fa005 CR4: 00000000001606f0
 Call Trace:
  ? __nla_reserve+0x38/0x50
  tcf_action_dump_1+0xd2/0x130
  tcf_action_dump+0x6a/0xf0
  tca_get_fill.constprop.31+0xa3/0x120
  tcf_action_add+0xd1/0x170
  tc_ctl_action+0x137/0x150
  rtnetlink_rcv_msg+0x263/0x2d0
  ? _cond_resched+0x15/0x40
  ? rtnl_calcit.isra.30+0x110/0x110
  netlink_rcv_skb+0x4d/0x130
  netlink_unicast+0x1a3/0x250
  netlink_sendmsg+0x2ae/0x3a0
  sock_sendmsg+0x36/0x40
  ___sys_sendmsg+0x26f/0x2d0
  ? do_wp_page+0x8e/0x5f0
  ? handle_pte_fault+0x6c3/0xf50
  ? __handle_mm_fault+0x38e/0x520
  ? __sys_sendmsg+0x5e/0xa0
  __sys_sendmsg+0x5e/0xa0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f75a4583ba0
 Code: c3 48 8b 05 f2 62 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d fd c3 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
 RSP: 002b:00007fff60ee7418 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fff60ee7540 RCX: 00007f75a4583ba0
 RDX: 0000000000000000 RSI: 00007fff60ee7490 RDI: 0000000000000003
 RBP: 000000005b842d3e R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fff60ee6ea0 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fff60ee7554 R14: 0000000000000001 R15: 000000000066c100
 Modules linked in: act_pedit(E) ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul ext4 crc32_pclmul mbcache ghash_clmulni_intel jbd2 pcbc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer cryptd glue_helper snd joydev pcspkr soundcore virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk virtio_console failover qxl crc32c_intel drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix virtio_pci libata virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_pedit]
 CR2: 0000000000000000

Like it's done for other TC actions, give up dumping pedit rules and return
an error if nla_nest_start() returns NULL.

Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovti6: remove !skb->ignore_df check from vti6_xmit()
Alexey Kodanev [Thu, 23 Aug 2018 16:49:54 +0000 (19:49 +0300)]
vti6: remove !skb->ignore_df check from vti6_xmit()

[ Upstream commit 9f2895461439fda2801a7906fb4c5fb3dbb37a0a ]

Before the commit d6990976af7c ("vti6: fix PMTU caching and reporting
on xmit") '!skb->ignore_df' check was always true because the function
skb_scrub_packet() was called before it, resetting ignore_df to zero.

In the commit, skb_scrub_packet() was moved below, and now this check
can be false for the packet, e.g. when sending it in the two fragments,
this prevents successful PMTU updates in such case. The next attempts
to send the packet lead to the same tx error. Moreover, vti6 initial
MTU value relies on PMTU adjustments.

This issue can be reproduced with the following LTP test script:
    udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000

Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotcp: do not restart timewait timer on rst reception
Florian Westphal [Thu, 30 Aug 2018 12:24:29 +0000 (14:24 +0200)]
tcp: do not restart timewait timer on rst reception

[ Upstream commit 63cc357f7bba6729869565a12df08441a5995d9a ]

RFC 1337 says:
 ''Ignore RST segments in TIME-WAIT state.
   If the 2 minute MSL is enforced, this fix avoids all three hazards.''

So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk
expire rather than removing it instantly when a reset is received.

However, Linux will also re-start the TIME-WAIT timer.

This causes connect to fail when tying to re-use ports or very long
delays (until syn retry interval exceeds MSL).

packetdrill test case:
// Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode.
`sysctl net.ipv4.tcp_rfc1337=1`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive first segment
0.310 < P. 1:1001(1000) ack 1 win 46

// Send one ACK
0.310 > . 1:1(0) ack 1001

// read 1000 byte
0.310 read(4, ..., 1000) = 1000

// Application writes 100 bytes
0.350 write(4, ..., 100) = 100
0.350 > P. 1:101(100) ack 1001

// ACK
0.500 < . 1001:1001(0) ack 101 win 257

// close the connection
0.600 close(4) = 0
0.600 > F. 101:101(0) ack 1001 win 244

// Our side is in FIN_WAIT_1 & waits for ack to fin
0.7 < . 1001:1001(0) ack 102 win 244

// Our side is in FIN_WAIT_2 with no outstanding data.
0.8 < F. 1001:1001(0) ack 102 win 244
0.8 > . 102:102(0) ack 1002 win 244

// Our side is now in TIME_WAIT state, send ack for fin.
0.9 < F. 1002:1002(0) ack 102 win 244
0.9 > . 102:102(0) ack 1002 win 244

// Peer reopens with in-window SYN:
1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>

// Therefore, reply with ACK.
1.000 > . 102:102(0) ack 1002 win 244

// Peer sends RST for this ACK.  Normally this RST results
// in tw socket removal, but rfc1337=1 setting prevents this.
1.100 < R 1002:1002(0) win 244

// second syn. Due to rfc1337=1 expect another pure ACK.
31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
31.0 > . 102:102(0) ack 1002 win 244

// .. and another RST from peer.
31.1 < R 1002:1002(0) win 244
31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT`

// third syn after one minute.  Time-Wait socket should have expired by now.
63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>

// so we expect a syn-ack & 3whs to proceed from here on.
63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>

Without this patch, 'ss' shows restarts of tw timer and last packet is
thus just another pure ack, more than one minute later.

This restores the original code from commit 283fd6cf0be690a83
("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git .

For some reason the else branch was removed/lost in 1f28b683339f7
("Merge in TCP/UDP optimizations and [..]") and timer restart became
unconditional.

Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agor8169: add support for NCube 8168 network card
Anthony Wong [Fri, 31 Aug 2018 12:06:42 +0000 (20:06 +0800)]
r8169: add support for NCube 8168 network card

[ Upstream commit 9fd0e09a4e86499639653243edfcb417a05c5c46 ]

This card identifies itself as:
  Ethernet controller [0200]: NCube Device [10ff:8168] (rev 06)
  Subsystem: TP-LINK Technologies Co., Ltd. Device [7470:3468]

Adding a new entry to rtl8169_pci_tbl makes the card work.

Link: http://launchpad.net/bugs/1788730
Signed-off-by: Anthony Wong <anthony.wong@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoqlge: Fix netdev features configuration.
Manish Chopra [Thu, 23 Aug 2018 20:20:52 +0000 (13:20 -0700)]
qlge: Fix netdev features configuration.

[ Upstream commit 6750c87074c5b534d82fdaabb1deb45b8f1f57de ]

qlge_fix_features() is not supposed to modify hardware or
driver state, rather it is supposed to only fix requested
fetures bits. Currently qlge_fix_features() also goes for
interface down and up unnecessarily if there is not even
any change in features set.

This patch changes/fixes following -

1) Move reload of interface or device re-config from
   qlge_fix_features() to qlge_set_features().
2) Reload of interface in qlge_set_features() only if
   relevant feature bit (NETIF_F_HW_VLAN_CTAG_RX) is changed.
3) Get rid of qlge_fix_features() since driver is not really
   required to fix any features bit.

Signed-off-by: Manish <manish.chopra@cavium.com>
Reviewed-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: sched: Fix memory exposure from short TCA_U32_SEL
Kees Cook [Sun, 26 Aug 2018 05:58:01 +0000 (22:58 -0700)]
net: sched: Fix memory exposure from short TCA_U32_SEL

[ Upstream commit 98c8f125fd8a6240ea343c1aa50a1be9047791b8 ]

Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
policy, so max length isn't enforced, only minimum. This means nkeys
(from userspace) was being trusted without checking the actual size of
nla_len(), which could lead to a memory over-read, and ultimately an
exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within
a namespace.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: macb: do not disable MDIO bus at open/close time
Anssi Hannula [Thu, 23 Aug 2018 07:45:22 +0000 (10:45 +0300)]
net: macb: do not disable MDIO bus at open/close time

[ Upstream commit 0da70f808029476001109b6cb076737bc04cea2e ]

macb_reset_hw() is called from macb_close() and indirectly from
macb_open(). macb_reset_hw() zeroes the NCR register, including the MPE
(Management Port Enable) bit.

This will prevent accessing any other PHYs for other Ethernet MACs on
the MDIO bus, which remains registered at macb_reset_hw() time, until
macb_init_hw() is called from macb_open() which sets the MPE bit again.

I.e. currently the MDIO bus has a short disruption at open time and is
disabled at close time until the interface is opened again.

Fix that by only touching the RE and TE bits when enabling and disabling
RX/TX.

v2: Make macb_init_hw() NCR write a single statement.

Fixes: 6c36a7074436 ("macb: Use generic PHY layer")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: bcmgenet: use MAC link status for fixed phy
Doug Berger [Tue, 28 Aug 2018 19:33:15 +0000 (12:33 -0700)]
net: bcmgenet: use MAC link status for fixed phy

[ Upstream commit c3c397c1f16c51601a3fac4fe0c63ad8aa85a904 ]

When using the fixed PHY with GENET (e.g. MOCA) the PHY link
status can be determined from the internal link status captured
by the MAC. This allows the PHY state machine to use the correct
link state with the fixed PHY even if MAC link event interrupts
are missed when the net device is opened.

Fixes: 8d88c6ebb34c ("net: bcmgenet: enable MoCA link state change detection")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state
Eric Dumazet [Wed, 22 Aug 2018 20:30:45 +0000 (13:30 -0700)]
ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state

[ Upstream commit 431280eebed9f5079553daf003011097763e71fd ]

tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally
to send some control packets.

1) RST packets, through tcp_v4_send_reset()
2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack()

These packets assert IP_DF, and also use the hashed IP ident generator
to provide an IPv4 ID number.

Geoff Alexander reported this could be used to build off-path attacks.

These packets should not be fragmented, since their size is smaller than
IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment,
regardless of inner IPID.

We really can use zero IPID, to address the flaw, and as a bonus,
avoid a couple of atomic operations in ip_idents_reserve()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Geoff Alexander <alexandg@cs.unm.edu>
Tested-by: Geoff Alexander <alexandg@cs.unm.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoact_ife: fix a potential use-after-free
Cong Wang [Mon, 3 Sep 2018 18:08:15 +0000 (11:08 -0700)]
act_ife: fix a potential use-after-free

[ Upstream commit 6d784f1625ea68783cc1fb17de8f6cd3e1660c3f ]

Immediately after module_put(), user could delete this
module, so e->ops could be already freed before we call
e->ops->release().

Fix this by moving module_put() after ops->release().

Fixes: ef6980b6becb ("introduce IFE action")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.14.69 v4.14.69
Greg Kroah-Hartman [Sun, 9 Sep 2018 17:56:02 +0000 (19:56 +0200)]
Linux 4.14.69

7 years agoarm64: mm: always enable CONFIG_HOLES_IN_ZONE
James Morse [Thu, 30 Aug 2018 15:05:32 +0000 (16:05 +0100)]
arm64: mm: always enable CONFIG_HOLES_IN_ZONE

commit f52bb98f5aded4c43e52f5ce19fb83f7261e9e73 upstream.

Commit 6d526ee26ccd ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA")
only enabled HOLES_IN_ZONE for NUMA systems because the NUMA code was
choking on the missing zone for nomap pages. This problem doesn't just
apply to NUMA systems.

If the architecture doesn't set HAVE_ARCH_PFN_VALID, pfn_valid() will
return true if the pfn is part of a valid sparsemem section.

When working with multiple pages, the mm code uses pfn_valid_within()
to test each page it uses within the sparsemem section is valid. On
most systems memory comes in MAX_ORDER_NR_PAGES chunks which all
have valid/initialised struct pages. In this case pfn_valid_within()
is optimised out.

Systems where this isn't true (e.g. due to nomap) should set
HOLES_IN_ZONE and provide HAVE_ARCH_PFN_VALID so that mm tests each
page as it works with it.

Currently non-NUMA arm64 systems can't enable HOLES_IN_ZONE, leading to
a VM_BUG_ON():

| page:fffffdff802e1780 is uninitialized and poisoned
| raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
| raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
| page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
| ------------[ cut here ]------------
| kernel BUG at include/linux/mm.h:978!
| Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[...]
| CPU: 1 PID: 25236 Comm: dd Not tainted 4.18.0 #7
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 40000085 (nZcv daIf -PAN -UAO)
| pc : move_freepages_block+0x144/0x248
| lr : move_freepages_block+0x144/0x248
| sp : fffffe0071177680
[...]
| Process dd (pid: 25236, stack limit = 0x0000000094cc07fb)
| Call trace:
|  move_freepages_block+0x144/0x248
|  steal_suitable_fallback+0x100/0x16c
|  get_page_from_freelist+0x440/0xb20
|  __alloc_pages_nodemask+0xe8/0x838
|  new_slab+0xd4/0x418
|  ___slab_alloc.constprop.27+0x380/0x4a8
|  __slab_alloc.isra.21.constprop.26+0x24/0x34
|  kmem_cache_alloc+0xa8/0x180
|  alloc_buffer_head+0x1c/0x90
|  alloc_page_buffers+0x68/0xb0
|  create_empty_buffers+0x20/0x1ec
|  create_page_buffers+0xb0/0xf0
|  __block_write_begin_int+0xc4/0x564
|  __block_write_begin+0x10/0x18
|  block_write_begin+0x48/0xd0
|  blkdev_write_begin+0x28/0x30
|  generic_perform_write+0x98/0x16c
|  __generic_file_write_iter+0x138/0x168
|  blkdev_write_iter+0x80/0xf0
|  __vfs_write+0xe4/0x10c
|  vfs_write+0xb4/0x168
|  ksys_write+0x44/0x88
|  sys_write+0xc/0x14
|  el0_svc_naked+0x30/0x34
| Code: aa1303e0 90001a01 91296421 94008902 (d4210000)
| ---[ end trace 1601ba47f6e883fe ]---

Remove the NUMA dependency.

Link: https://www.spinics.net/lists/arm-kernel/msg671851.html
Cc: <stable@vger.kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofs/quota: Fix spectre gadget in do_quotactl
Jeremy Cline [Tue, 31 Jul 2018 01:37:31 +0000 (01:37 +0000)]
fs/quota: Fix spectre gadget in do_quotactl

commit 7b6924d94a60c6b8c1279ca003e8744e6cd9e8b1 upstream.

'type' is user-controlled, so sanitize it after the bounds check to
avoid using it in speculative execution. This covers the following
potential gadgets detected with the help of smatch:

* fs/ext4/super.c:5741 ext4_quota_read() warn: potential spectre issue
  'sb_dqopt(sb)->files' [r]
* fs/ext4/super.c:5778 ext4_quota_write() warn: potential spectre issue
  'sb_dqopt(sb)->files' [r]
* fs/f2fs/super.c:1552 f2fs_quota_read() warn: potential spectre issue
  'sb_dqopt(sb)->files' [r]
* fs/f2fs/super.c:1608 f2fs_quota_write() warn: potential spectre issue
  'sb_dqopt(sb)->files' [r]
* fs/quota/dquot.c:412 mark_info_dirty() warn: potential spectre issue
  'sb_dqopt(sb)->info' [w]
* fs/quota/dquot.c:933 dqinit_needed() warn: potential spectre issue
  'dquots' [r]
* fs/quota/dquot.c:2112 dquot_commit_info() warn: potential spectre
  issue 'dqopt->ops' [r]
* fs/quota/dquot.c:2362 vfs_load_quota_inode() warn: potential spectre
  issue 'dqopt->files' [w] (local cap)
* fs/quota/dquot.c:2369 vfs_load_quota_inode() warn: potential spectre
  issue 'dqopt->ops' [w] (local cap)
* fs/quota/dquot.c:2370 vfs_load_quota_inode() warn: potential spectre
  issue 'dqopt->info' [w] (local cap)
* fs/quota/quota.c:110 quota_getfmt() warn: potential spectre issue
  'sb_dqopt(sb)->info' [r]
* fs/quota/quota_v2.c:84 v2_check_quota_file() warn: potential spectre
  issue 'quota_magics' [w]
* fs/quota/quota_v2.c:85 v2_check_quota_file() warn: potential spectre
  issue 'quota_versions' [w]
* fs/quota/quota_v2.c:96 v2_read_file_info() warn: potential spectre
  issue 'dqopt->info' [r]
* fs/quota/quota_v2.c:172 v2_write_file_info() warn: potential spectre
  issue 'dqopt->info' [r]

Additionally, a quick inspection indicates there are array accesses with
'type' in quota_on() and quota_off() functions which are also addressed
by this.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: caam/qi - fix error path in xts setkey
Horia Geantă [Mon, 6 Aug 2018 12:29:39 +0000 (15:29 +0300)]
crypto: caam/qi - fix error path in xts setkey

commit ad876a18048f43b1f66f5d474b7598538668c5de upstream.

xts setkey callback returns 0 on some error paths.
Fix this by returning -EINVAL.

Cc: <stable@vger.kernel.org> # 4.12+
Fixes: b189817cf789 ("crypto: caam/qi - add ablkcipher and authenc algorithms")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: caam/jr - fix descriptor DMA unmapping
Horia Geantă [Mon, 6 Aug 2018 12:29:09 +0000 (15:29 +0300)]
crypto: caam/jr - fix descriptor DMA unmapping

commit cc98963dbaaea93d17608641b8d6942a5327fc31 upstream.

Descriptor address needs to be swapped to CPU endianness before being
DMA unmapped.

Cc: <stable@vger.kernel.org> # 4.8+
Fixes: 261ea058f016 ("crypto: caam - handle core endianness != caam endianness")
Reported-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: caam - fix DMA mapping direction for RSA forms 2 & 3
Horia Geantă [Mon, 6 Aug 2018 12:29:55 +0000 (15:29 +0300)]
crypto: caam - fix DMA mapping direction for RSA forms 2 & 3

commit f1bf9e60a0779ec97de9ecdc353e1d01cdd73f43 upstream.

Crypto engine needs some temporary locations in external memory for
running RSA decrypt forms 2 and 3 (CRT).
These are named "tmp1" and "tmp2" in the PDB.

Update DMA mapping direction of tmp1 and tmp2 from TO_DEVICE to
BIDIRECTIONAL, since engine needs r/w access.

Cc: <stable@vger.kernel.org> # 4.13+
Fixes: 52e26d77b8b3 ("crypto: caam - add support for RSA key form 2")
Fixes: 4a651b122adb ("crypto: caam - add support for RSA key form 3")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: vmx - Fix sleep-in-atomic bugs
Ondrej Mosnacek [Wed, 22 Aug 2018 06:26:31 +0000 (08:26 +0200)]
crypto: vmx - Fix sleep-in-atomic bugs

commit 0522236d4f9c5ab2e79889cb020d1acbe5da416e upstream.

This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX
implementations. The problem is that the blkcipher_* functions should
not be called in atomic context.

The bugs can be reproduced via the AF_ALG interface by trying to
encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the
VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then
trigger BUG in crypto_yield():

[  891.863680] BUG: sleeping function called from invalid context at include/crypto/algapi.h:424
[  891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc
[  891.864739] 1 lock held by kcapi-enc/12347:
[  891.864811]  #0: 00000000f5d42c46 (sk_lock-AF_ALG){+.+.}, at: skcipher_recvmsg+0x50/0x530
[  891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted 4.19.0-0.rc0.git3.1.fc30.ppc64le #1
[  891.865251] Call Trace:
[  891.865340] [c0000003387578c0] [c000000000d67ea4] dump_stack+0xe8/0x164 (unreliable)
[  891.865511] [c000000338757910] [c000000000172a58] ___might_sleep+0x2f8/0x310
[  891.865679] [c000000338757990] [c0000000006bff74] blkcipher_walk_done+0x374/0x4a0
[  891.865825] [c0000003387579e0] [d000000007e73e70] p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto]
[  891.865993] [c000000338757ad0] [c0000000006c0ee0] skcipher_encrypt_blkcipher+0x60/0x80
[  891.866128] [c000000338757b10] [c0000000006ec504] skcipher_recvmsg+0x424/0x530
[  891.866283] [c000000338757bd0] [c000000000b00654] sock_recvmsg+0x74/0xa0
[  891.866403] [c000000338757c10] [c000000000b00f64] ___sys_recvmsg+0xf4/0x2f0
[  891.866515] [c000000338757d90] [c000000000b02bb8] __sys_recvmsg+0x68/0xe0
[  891.866631] [c000000338757e30] [c00000000000bbe4] system_call+0x5c/0x70

Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module")
Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS")
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoperf auxtrace: Fix queue resize
Adrian Hunter [Tue, 14 Aug 2018 08:46:08 +0000 (11:46 +0300)]
perf auxtrace: Fix queue resize

commit 99cbbe56eb8bede625f410ab62ba34673ffa7d21 upstream.

When the number of queues grows beyond 32, the array of queues is
resized but not all members were being copied. Fix by also copying
'tid', 'cpu' and 'set'.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Fixes: e502789302a6e ("perf auxtrace: Add helpers for queuing AUX area tracing data")
Link: http://lkml.kernel.org/r/20180814084608.6563-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()
Eddie.Horng [Fri, 20 Jul 2018 07:30:00 +0000 (15:30 +0800)]
cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()

commit 355139a8dba446cc11a424cddbf7afebc3041ba1 upstream.

The code in cap_inode_getsecurity(), introduced by commit 8db6c34f1dbc
("Introduce v3 namespaced file capabilities"), should use
d_find_any_alias() instead of d_find_alias() do handle unhashed dentry
correctly. This is needed, for example, if execveat() is called with an
open but unlinked overlayfs file, because overlayfs unhashes dentry on
unlink.
This is a regression of real life application, first reported at
https://www.spinics.net/lists/linux-unionfs/msg05363.html

Below reproducer and setup can reproduce the case.
  const char* exec="echo";
  const char *newargv[] = { "echo", "hello", NULL};
  const char *newenviron[] = { NULL };
  int fd, err;

  fd = open(exec, O_PATH);
  unlink(exec);
  err = syscall(322/*SYS_execveat*/, fd, "", newargv, newenviron,
AT_EMPTY_PATH);
  if(err<0)
    fprintf(stderr, "execveat: %s\n", strerror(errno));

gcc compile into ~/test/a.out
mount -t overlay -orw,lowerdir=/mnt/l,upperdir=/mnt/u,workdir=/mnt/w
none /mnt/m
cd /mnt/m
cp /bin/echo .
~/test/a.out

Expected result:
hello
Actually result:
execveat: Invalid argument
dmesg:
Invalid argument reading file caps for /dev/fd/3

The 2nd reproducer and setup emulates similar case but for
regular filesystem:
  const char* exec="echo";
  int fd, err;
  char buf[256];

  fd = open(exec, O_RDONLY);
  unlink(exec);
  err = fgetxattr(fd, "security.capability", buf, 256);
  if(err<0)
    fprintf(stderr, "fgetxattr: %s\n", strerror(errno));

gcc compile into ~/test_fgetxattr

cd /tmp
cp /bin/echo .
~/test_fgetxattr

Result:
fgetxattr: Invalid argument

On regular filesystem, for example, ext4 read xattr from
disk and return to execveat(), will not trigger this issue, however,
the overlay attr handler pass real dentry to vfs_getxattr() will.
This reproducer calls fgetxattr() with an unlinked fd, involkes
vfs_getxattr() then reproduced the case that d_find_alias() in
cap_inode_getsecurity() can't find the unlinked dentry.

Suggested-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities")
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Eddie Horng <eddie.horng@mediatek.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobcache: release dc->writeback_lock properly in bch_writeback_thread()
Shan Hai [Wed, 22 Aug 2018 18:02:56 +0000 (02:02 +0800)]
bcache: release dc->writeback_lock properly in bch_writeback_thread()

commit 3943b040f11ed0cc6d4585fd286a623ca8634547 upstream.

The writeback thread would exit with a lock held when the cache device
is detached via sysfs interface, fix it by releasing the held lock
before exiting the while-loop.

Fixes: fadd94e05c02 (bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Coly Li <colyli@suse.de>
Tested-by: Shenghui Wang <shhuiw@foxmail.com>
Cc: stable@vger.kernel.org #4.17+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agolibnvdimm: fix ars_status output length calculation
Vishal Verma [Fri, 10 Aug 2018 19:23:15 +0000 (13:23 -0600)]
libnvdimm: fix ars_status output length calculation

commit 286e87718103acdf85f4ed323a37e4839a8a7c05 upstream.

Commit efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
Introduced additional hardening for ambiguity in the ACPI spec for
ars_status output sizing. However, it had a couple of cases mixed up.
Where it should have been checking for (and returning) "out_field[1] -
4" it was using "out_field[1] - 8" and vice versa.

This caused a four byte discrepancy in the buffer size passed on to
the command handler, and in some cases, this caused memory corruption
like:

  ./daxdev-errors.sh: line 76: 24104 Aborted   (core dumped) ./daxdev-errors $busdev $region
  malloc(): memory corruption
  Program received signal SIGABRT, Aborted.
  [...]
  #5  0x00007ffff7865a2e in calloc () from /lib64/libc.so.6
  #6  0x00007ffff7bc2970 in ndctl_bus_cmd_new_ars_status (ars_cap=ars_cap@entry=0x6153b0) at ars.c:136
  #7  0x0000000000401644 in check_ars_status (check=0x7fffffffdeb0, bus=0x604c20) at daxdev-errors.c:144
  #8  test_daxdev_clear_error (region_name=<optimized out>, bus_name=<optimized out>)
      at daxdev-errors.c:332

Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Lukasz Dorau <lukasz.dorau@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Fixes: efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-of-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogetxattr: use correct xattr length
Christian Brauner [Thu, 7 Jun 2018 11:43:48 +0000 (13:43 +0200)]
getxattr: use correct xattr length

commit 82c9a927bc5df6e06b72d206d24a9d10cced4eb5 upstream.

When running in a container with a user namespace, if you call getxattr
with name = "system.posix_acl_access" and size % 8 != 4, then getxattr
silently skips the user namespace fixup that it normally does resulting in
un-fixed-up data being returned.
This is caused by posix_acl_fix_xattr_to_user() being passed the total
buffer size and not the actual size of the xattr as returned by
vfs_getxattr().
This commit passes the actual length of the xattr as returned by
vfs_getxattr() down.

A reproducer for the issue is:

  touch acl_posix

  setfacl -m user:0:rwx acl_posix

and the compile:

  #define _GNU_SOURCE
  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/types.h>
  #include <unistd.h>
  #include <attr/xattr.h>

  /* Run in user namespace with nsuid 0 mapped to uid != 0 on the host. */
  int main(int argc, void **argv)
  {
          ssize_t ret1, ret2;
          char buf1[128], buf2[132];
          int fret = EXIT_SUCCESS;
          char *file;

          if (argc < 2) {
                  fprintf(stderr,
                          "Please specify a file with "
                          "\"system.posix_acl_access\" permissions set\n");
                  _exit(EXIT_FAILURE);
          }
          file = argv[1];

          ret1 = getxattr(file, "system.posix_acl_access",
                          buf1, sizeof(buf1));
          if (ret1 < 0) {
                  fprintf(stderr, "%s - Failed to retrieve "
                                  "\"system.posix_acl_access\" "
                                  "from \"%s\"\n", strerror(errno), file);
                  _exit(EXIT_FAILURE);
          }

          ret2 = getxattr(file, "system.posix_acl_access",
                          buf2, sizeof(buf2));
          if (ret2 < 0) {
                  fprintf(stderr, "%s - Failed to retrieve "
                                  "\"system.posix_acl_access\" "
                                  "from \"%s\"\n", strerror(errno), file);
                  _exit(EXIT_FAILURE);
          }

          if (ret1 != ret2) {
                  fprintf(stderr, "The value of \"system.posix_acl_"
                                  "access\" for file \"%s\" changed "
                                  "between two successive calls\n", file);
                  _exit(EXIT_FAILURE);
          }

          for (ssize_t i = 0; i < ret2; i++) {
                  if (buf1[i] == buf2[i])
                          continue;

                  fprintf(stderr,
                          "Unexpected different in byte %zd: "
                          "%02x != %02x\n", i, buf1[i], buf2[i]);
                  fret = EXIT_FAILURE;
          }

          if (fret == EXIT_SUCCESS)
                  fprintf(stderr, "Test passed\n");
          else
                  fprintf(stderr, "Test failed\n");

          _exit(fret);
  }
and run:

  ./tester acl_posix

On a non-fixed up kernel this should return something like:

  root@c1:/# ./t
  Unexpected different in byte 16: ffffffa0 != 00
  Unexpected different in byte 17: ffffff86 != 00
  Unexpected different in byte 18: 01 != 00

and on a fixed kernel:

  root@c1:~# ./t
  Test passed

Cc: stable@vger.kernel.org
Fixes: 2f6f0654ab61 ("userns: Convert vfs posix_acl support to use kuids and kgids")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199945
Reported-by: Colin Watson <cjwatson@ubuntu.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoudlfb: set optimal write delay
Mikulas Patocka [Wed, 25 Jul 2018 13:41:55 +0000 (15:41 +0200)]
udlfb: set optimal write delay

commit bb24153a3f13dd0dbc1f8055ad97fe346d598f66 upstream.

The default delay 5 jiffies is too much when the kernel is compiled with
HZ=100 - it results in jumpy cursor in Xwindow.

In order to find out the optimal delay, I benchmarked the driver on
1280x720x30fps video. I found out that with HZ=1000, 10ms is acceptable,
but with HZ=250 or HZ=300, we need 4ms, so that the video is played
without any frame skips.

This patch changes the delay to this value.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofb: fix lost console when the user unplugs a USB adapter
Mikulas Patocka [Wed, 25 Jul 2018 13:41:54 +0000 (15:41 +0200)]
fb: fix lost console when the user unplugs a USB adapter

commit 8c5b044299951acd91e830a688dd920477ea1eda upstream.

I have a USB display adapter using the udlfb driver and I use it on an ARM
board that doesn't have any graphics card. When I plug the adapter in, the
console is properly displayed, however when I unplug and re-plug the
adapter, the console is not displayed and I can't access it until I reboot
the board.

The reason is this:
When the adapter is unplugged, dlfb_usb_disconnect calls
unlink_framebuffer, then it waits until the reference count drops to zero
and then it deallocates the framebuffer. However, the console that is
attached to the framebuffer device keeps the reference count non-zero, so
the framebuffer device is never destroyed. When the USB adapter is plugged
again, it creates a new device /dev/fb1 and the console is not attached to
it.

This patch fixes the bug by unbinding the console from unlink_framebuffer.
The code to unbind the console is moved from do_unregister_framebuffer to
a function unbind_console. When the console is unbound, the reference
count drops to zero and the udlfb driver frees the framebuffer. When the
adapter is plugged back, a new framebuffer is created and the console is
attached to it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bernie Thompson <bernie@plugable.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Cc: stable@vger.kernel.org
[b.zolnierkie: preserve old behavior for do_unregister_framebuffer()]
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopwm: tiehrpwm: Fix disabling of output of PWMs
Vignesh R [Mon, 11 Jun 2018 06:09:56 +0000 (11:39 +0530)]
pwm: tiehrpwm: Fix disabling of output of PWMs

commit 38dabd91ff0bde33352ca3cc65ef515599b77a05 upstream.

pwm-tiehrpwm driver disables PWM output by putting it in low output
state via active AQCSFRC register in ehrpwm_pwm_disable(). But, the
AQCSFRC shadow register is not updated. Therefore, when shadow AQCSFRC
register is re-enabled in ehrpwm_pwm_enable() (say to enable second PWM
output), previous settings are lost as shadow register value is loaded
into active register. This results in things like PWMA getting enabled
automatically, when PWMB is enabled and vice versa. Fix this by
updating AQCSFRC shadow register as well during ehrpwm_pwm_disable().

Fixes: 19891b20e7c2 ("pwm: pwm-tiehrpwm: PWM driver support for EHRPWM")
Cc: stable@vger.kernel.org
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopwm: tiehrpwm: Don't use emulation mode bits to control PWM output
Vignesh R [Mon, 11 Jun 2018 06:09:55 +0000 (11:39 +0530)]
pwm: tiehrpwm: Don't use emulation mode bits to control PWM output

commit aa49d628f6e016bcec8c6f8e704b9b18ee697329 upstream.

As per AM335x TRM SPRUH73P "15.2.2.11 ePWM Behavior During Emulation",
TBCTL[15:14] only have effect during emulation suspend events (IOW,
to stop PWM when debugging using a debugger). These bits have no effect
on PWM output during normal running of system. Hence, remove code
accessing these bits as they have no role in enabling/disabling PWMs.

Fixes: 19891b20e7c2 ("pwm: pwm-tiehrpwm: PWM driver support for EHRPWM")
Cc: stable@vger.kernel.org
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoubifs: Fix synced_i_size calculation for xattr inodes
Richard Weinberger [Mon, 11 Jun 2018 22:52:28 +0000 (00:52 +0200)]
ubifs: Fix synced_i_size calculation for xattr inodes

commit 59965593205fa4044850d35ee3557cf0b7edcd14 upstream.

In ubifs_jnl_update() we sync parent and child inodes to the flash,
in case of xattrs, the parent inode (AKA host inode) has a non-zero
data_len. Therefore we need to adjust synced_i_size too.

This issue was reported by ubifs self tests unter a xattr related work
load.
UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: ui_size is 4, synced_i_size is 0, but inode is clean
UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: i_ino 65, i_mode 0x81a4, i_size 4

Cc: <stable@vger.kernel.org>
Fixes: 1e51764a3c2a ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoubifs: xattr: Don't operate on deleted inodes
Richard Weinberger [Sun, 8 Jul 2018 21:33:25 +0000 (23:33 +0200)]
ubifs: xattr: Don't operate on deleted inodes

commit 11a6fc3dc743e22fb50f2196ec55bee5140d3c52 upstream.

xattr operations can race with unlink and the following assert triggers:
UBIFS assert failed in ubifs_jnl_change_xattr at 1606 (pid 6256)

Fix this by checking i_nlink before working on the host inode.

Cc: <stable@vger.kernel.org>
Fixes: 1e51764a3c2a ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoubifs: Check data node size before truncate
Richard Weinberger [Sun, 1 Jul 2018 21:20:51 +0000 (23:20 +0200)]
ubifs: Check data node size before truncate

commit 95a22d2084d72ea067d8323cc85677dba5d97cae upstream.

Check whether the size is within bounds before using it.
If the size is not correct, abort and dump the bad data node.

Cc: Kees Cook <keescook@chromium.org>
Cc: Silvio Cesare <silvio.cesare@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system")
Reported-by: Silvio Cesare <silvio.cesare@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert "UBIFS: Fix potential integer overflow in allocation"
Richard Weinberger [Sun, 1 Jul 2018 21:20:50 +0000 (23:20 +0200)]
Revert "UBIFS: Fix potential integer overflow in allocation"

commit 08acbdd6fd736b90f8d725da5a0de4de2dd6de62 upstream.

This reverts commit 353748a359f1821ee934afc579cf04572406b420.
It bypassed the linux-mtd review process and fixes the issue not as it
should.

Cc: Kees Cook <keescook@chromium.org>
Cc: Silvio Cesare <silvio.cesare@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoubifs: Fix memory leak in lprobs self-check
Richard Weinberger [Tue, 12 Jun 2018 18:49:45 +0000 (20:49 +0200)]
ubifs: Fix memory leak in lprobs self-check

commit eef19816ada3abd56d9f20c88794cc2fea83ebb2 upstream.

Allocate the buffer after we return early.
Otherwise memory is being leaked.

Cc: <stable@vger.kernel.org>
Fixes: 1e51764a3c2a ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agouserns: move user access out of the mutex
Jann Horn [Mon, 25 Jun 2018 16:34:19 +0000 (18:34 +0200)]
userns: move user access out of the mutex

commit 5820f140edef111a9ea2ef414ab2428b8cb805b1 upstream.

The old code would hold the userns_state_mutex indefinitely if
memdup_user_nul stalled due to e.g. a userfault region. Prevent that by
moving the memdup_user_nul in front of the mutex_lock().

Note: This changes the error precedence of invalid buf/count/*ppos vs
map already written / capabilities missing.

Fixes: 22d917d80e84 ("userns: Rework the user_namespace adding uid/gid...")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Christian Brauner <christian@brauner.io>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosys: don't hold uts_sem while accessing userspace memory
Jann Horn [Mon, 25 Jun 2018 16:34:10 +0000 (18:34 +0200)]
sys: don't hold uts_sem while accessing userspace memory

commit 42a0cc3478584d4d63f68f2f5af021ddbea771fa upstream.

Holding uts_sem as a writer while accessing userspace memory allows a
namespace admin to stall all processes that attempt to take uts_sem.
Instead, move data through stack buffers and don't access userspace memory
while uts_sem is held.

Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoiommu/vt-d: Fix dev iotlb pfsid use
Jacob Pan [Thu, 7 Jun 2018 16:57:00 +0000 (09:57 -0700)]
iommu/vt-d: Fix dev iotlb pfsid use

commit 1c48db44924298ad0cb5a6386b88017539be8822 upstream.

PFSID should be used in the invalidation descriptor for flushing
device IOTLBs on SRIOV VFs.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: "Ashok Raj" <ashok.raj@intel.com>
Cc: "Lu Baolu" <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoiommu/vt-d: Add definitions for PFSID
Jacob Pan [Thu, 7 Jun 2018 16:56:59 +0000 (09:56 -0700)]
iommu/vt-d: Add definitions for PFSID

commit 0f725561e168485eff7277d683405c05b192f537 upstream.

When SRIOV VF device IOTLB is invalidated, we need to provide
the PF source ID such that IOMMU hardware can gauge the depth
of invalidation queue which is shared among VFs. This is needed
when device invalidation throttle (DIT) capability is supported.

This patch adds bit definitions for checking and tracking PFSID.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: "Ashok Raj" <ashok.raj@intel.com>
Cc: "Lu Baolu" <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/tlb: Remove tlb_remove_table() non-concurrent condition
Peter Zijlstra [Wed, 22 Aug 2018 15:30:14 +0000 (17:30 +0200)]
mm/tlb: Remove tlb_remove_table() non-concurrent condition

commit a6f572084fbee8b30f91465f4a085d7a90901c57 upstream.

Will noted that only checking mm_users is incorrect; we should also
check mm_count in order to cover CPUs that have a lazy reference to
this mm (and could do speculative TLB operations).

If removing this turns out to be a performance issue, we can
re-instate a more complete check, but in tlb_table_flush() eliding the
call_rcu_sched().

Fixes: 267239116987 ("mm, powerpc: move the RCU page-table freeing into generic code")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: tegra: Fix Tegra30 Cardhu PCA954x reset
Jon Hunter [Tue, 3 Jul 2018 08:59:47 +0000 (09:59 +0100)]
ARM: tegra: Fix Tegra30 Cardhu PCA954x reset

commit 6e1811900b6fe6f2b4665dba6bd6ed32c6b98575 upstream.

On all versions of Tegra30 Cardhu, the reset signal to the NXP PCA9546
I2C mux is connected to the Tegra GPIO BB0. Currently, this pin on the
Tegra is not configured as a GPIO but as a special-function IO (SFIO)
that is multiplexing the pin to an I2S controller. On exiting system
suspend, I2C commands sent to the PCA9546 are failing because there is
no ACK. Although it is not possible to see exactly what is happening
to the reset during suspend, by ensuring it is configured as a GPIO
and driven high, to de-assert the reset, the failures are no longer
seen.

Please note that this GPIO is also used to drive the reset signal
going to the camera connector on the board. However, given that there
is no camera support currently for Cardhu, this should not have any
impact.

Fixes: 40431d16ff11 ("ARM: tegra: enable PCA9546 on Cardhu")
Cc: stable@vger.kernel.org
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoNFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
Trond Myklebust [Tue, 14 Aug 2018 21:55:56 +0000 (17:55 -0400)]
NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()

commit 8618289c46556fd4dd259a1af02ccc448032f48d upstream.

We must drop the lock before we can sleep in referring_call_exists().

Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Fixes: 045d2a6d076a ("NFSv4.1: Delay callback processing...")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoNFSv4: Fix locking in pnfs_generic_recover_commit_reqs
Trond Myklebust [Tue, 14 Aug 2018 21:25:37 +0000 (17:25 -0400)]
NFSv4: Fix locking in pnfs_generic_recover_commit_reqs

commit d0fbb1d8a194c0ec0180c1d073ad709e45503a43 upstream.

The use of the inode->i_lock was converted to a mutex, but we forgot
to remove the old inode unlock/lock() pair that allowed the layout
segment to be put inside the loop.

Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Fixes: e824f99adaaf1 ("NFSv4: Use a mutex to protect the per-inode commit...")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoNFSv4 client live hangs after live data migration recovery
Bill Baker [Tue, 19 Jun 2018 21:24:58 +0000 (16:24 -0500)]
NFSv4 client live hangs after live data migration recovery

commit 0f90be132cbf1537d87a6a8b9e80867adac892f6 upstream.

After a live data migration event at the NFS server, the client may send
I/O requests to the wrong server, causing a live hang due to repeated
recovery events.  On the wire, this will appear as an I/O request failing
with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
NFS4ERR_BADSSESSION is returned because the session ID being used was
issued by the other server and is not valid at the old server.

The failure is caused by async worker threads having cached the transport
(xprt) in the rpc_task structure.  After the migration recovery completes,
the task is redispatched and the task resends the request to the wrong
server based on the old value still present in tk_xprt.

The solution is to recompute the tk_xprt field of the rpc_task structure
so that the request goes to the correct server.

Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Fixes: fb43d17210ba ("SUNRPC: Use the multipath iterator to assign a ...")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopnfs/blocklayout: off by one in bl_map_stripe()
Dan Carpenter [Wed, 4 Jul 2018 09:59:58 +0000 (12:59 +0300)]
pnfs/blocklayout: off by one in bl_map_stripe()

commit 0914bb965e38a055e9245637aed117efbe976e91 upstream.

"dev->nr_children" is the number of children which were parsed
successfully in bl_parse_stripe().  It could be all of them and then, in
that case, it is equal to v->stripe.volumes_count.  Either way, the >
should be >= so that we don't go beyond the end of what we're supposed
to.

Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoblock, bfq: return nbytes and not zero from struct cftype .write() method
Maciej S. Szmigiero [Wed, 15 Aug 2018 21:56:45 +0000 (23:56 +0200)]
block, bfq: return nbytes and not zero from struct cftype .write() method

commit fc8ebd01deeb12728c83381f6ec923e4a192ffd3 upstream.

The value that struct cftype .write() method returns is then directly
returned to userspace as the value returned by write() syscall, so it
should be the number of bytes actually written (or consumed) and not zero.

Returning zero from write() syscall makes programs like /bin/echo or bash
spin.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxtensa: increase ranges in ___invalidate_{i,d}cache_all
Max Filippov [Sat, 11 Aug 2018 05:21:22 +0000 (22:21 -0700)]
xtensa: increase ranges in ___invalidate_{i,d}cache_all

commit fec3259c9f747c039f90e99570540114c8d81a14 upstream.

Cache invalidation macros use cache line size to iterate over
invalidated cache lines, assuming that all cache ways are invalidated by
single instruction, but xtensa ISA recommends to not assume that for
future compatibility:
  In some implementations all ways at index Addry-1..z are invalidated
  regardless of the specified way, but for future compatibility this
  behavior should not be assumed.

Iterate over all cache ways in ___invalidate_icache_all and
___invalidate_dcache_all.

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxtensa: limit offsets in __loop_cache_{all,page}
Max Filippov [Sat, 11 Aug 2018 03:43:48 +0000 (20:43 -0700)]
xtensa: limit offsets in __loop_cache_{all,page}

commit be75de25251f7cf3e399ca1f584716a95510d24a upstream.

When building kernel for xtensa cores with big cache lines (e.g. 128
bytes or more) __loop_cache_all and __loop_cache_page may generate
assembly instructions with immediate fields that are too big. This
results in the following build errors:

  arch/xtensa/mm/misc.S: Assembler messages:
  arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '256'
  arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '384'
  arch/xtensa/kernel/head.S: Assembler messages:
  arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '256'
  arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '384'
  arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '256'
  arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '384'
  arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '256'
  arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '384'

Add parameter max_immed to these macros and use it to limit values of
immediate operands. Extract common code of these macros into the new
macro __loop_cache_unroll.

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages
Paul Mackerras [Thu, 23 Aug 2018 00:08:58 +0000 (10:08 +1000)]
KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages

commit 8cfbdbdc24815417a3ab35101ccf706b9a23ff17 upstream.

Commit 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in
the pinned physical page", 2018-07-17) added some checks to ensure
that guest DMA mappings don't attempt to map more than the guest is
entitled to access. However, errors in the logic mean that legitimate
guest requests to map pages for DMA are being denied in some
situations. Specifically, if the first page of the range passed to
mm_iommu_get() is mapped with a normal page, and subsequent pages are
mapped with transparent huge pages, we end up with mem->pageshift ==
0. That means that the page size checks in mm_iommu_ua_to_hpa() and
mm_iommu_up_to_hpa_rm() will always fail for every page in that
region, and thus the guest can never map any memory in that region for
DMA, typically leading to a flood of error messages like this:

  qemu-system-ppc64: VFIO_MAP_DMA: -22
  qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument)

The logic errors in mm_iommu_get() are:

  (a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte()
      call (meaning that find_linux_pte() returns the pte for the
      first address in the range, not the address we are currently up
      to);
  (b) use of 'pageshift' as the variable to receive the hugepage shift
      returned by find_linux_pte() - for a normal page this gets set
      to 0, leading to us setting mem->pageshift to 0 when we conclude
      that the pte returned by find_linux_pte() didn't match the page
      we were looking at;
  (c) comparing 'compshift', which is a page order, i.e. log base 2 of
      the number of pages, with 'pageshift', which is a log base 2 of
      the number of bytes.

To fix these problems, this patch introduces 'cur_ua' to hold the
current user address and uses that in the find_linux_pte() call;
introduces 'pteshift' to hold the hugepage shift found by
find_linux_pte(); and compares 'pteshift' with 'compshift +
PAGE_SHIFT' rather than 'compshift'.

The patch also moves the local_irq_restore to the point after the PTE
pointer returned by find_linux_pte() has been dereferenced because
otherwise the PTE could change underneath us, and adds a check to
avoid doing the find_linux_pte() call once mem->pageshift has been
reduced to PAGE_SHIFT, as an optimization.

Fixes: 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: VMX: fixes for vmentry_l1d_flush module parameter
Paolo Bonzini [Wed, 22 Aug 2018 14:43:39 +0000 (16:43 +0200)]
KVM: VMX: fixes for vmentry_l1d_flush module parameter

commit 0027ff2a75f9dcf0537ac0a65c5840b0e21a4950 upstream.

Two bug fixes:

1) missing entries in the l1d_param array; this can cause a host crash
if an access attempts to reach the missing entry. Future-proof the get
function against any overflows as well.  However, the two entries
VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
not be accepted by the parse function, so disable them there.

2) invalid values must be rejected even if the CPU does not have the
bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)

... and a small refactoring, since the .cmd field is redundant with
the index in the array.

Reported-by: Bandan Das <bsd@redhat.com>
Cc: stable@vger.kernel.org
Fixes: a7b9020b06ec6d7c3f3b0d4ef1a9eba12654f4f7
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoPM / sleep: wakeup: Fix build error caused by missing SRCU support
zhangyi (F) [Tue, 14 Aug 2018 02:34:42 +0000 (10:34 +0800)]
PM / sleep: wakeup: Fix build error caused by missing SRCU support

commit 3df6f61fff49632492490fb6e42646b803a9958a upstream.

Commit ea0212f40c6 (power: auto select CONFIG_SRCU) made the code in
drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to
select CONFIG_SRCU in Kconfig, which leads to the following build
error if CONFIG_SRCU is not selected somewhere else:

drivers/built-in.o: In function `wakeup_source_remove':
(.text+0x3c6fc): undefined reference to `synchronize_srcu'
drivers/built-in.o: In function `pm_print_active_wakeup_sources':
(.text+0x3c7a8): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `pm_print_active_wakeup_sources':
(.text+0x3c84c): undefined reference to `__srcu_read_unlock'
drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
(.text+0x3d1d8): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
(.text+0x3d228): undefined reference to `__srcu_read_unlock'
drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
(.text+0x3d24c): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
(.text+0x3d29c): undefined reference to `__srcu_read_unlock'
drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu'

Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled.

Fixes: ea0212f40c6 (power: auto select CONFIG_SRCU)
Cc: 4.2+ <stable@vger.kernel.org> # 4.2+
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
[ rjw: Minor subject/changelog fixups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocpufreq: governor: Avoid accessing invalid governor_data
Henry Willard [Wed, 15 Aug 2018 00:01:02 +0000 (17:01 -0700)]
cpufreq: governor: Avoid accessing invalid governor_data

commit 2a3eb51e30b9ac66fe1b75877627a7e4aaeca24a upstream.

If cppc_cpufreq.ko is deleted at the same time that tuned-adm is
changing profiles, there is a small chance that a race can occur
between cpufreq_dbs_governor_exit() and cpufreq_dbs_governor_limits()
resulting in a system failure when the latter tries to use
policy->governor_data that has been freed by the former.

This patch uses gov_dbs_data_mutex to synchronize access.

Fixes: e788892ba3cc (cpufreq: governor: Get rid of governor events)
Signed-off-by: Henry Willard <henry.willard@oracle.com>
[ rjw: Subject, minor white space adjustment ]
Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrivers/block/zram/zram_drv.c: fix bug storing backing_dev
Peter Kalauskas [Wed, 22 Aug 2018 04:54:02 +0000 (21:54 -0700)]
drivers/block/zram/zram_drv.c: fix bug storing backing_dev

commit c8bd134a4bddafe5917d163eea73873932c15e83 upstream.

The call to strlcpy in backing_dev_store is incorrect. It should take
the size of the destination buffer instead of the size of the source
buffer.  Additionally, ignore the newline character (\n) when reading
the new file_name buffer. This makes it possible to set the backing_dev
as follows:

echo /dev/sdX > /sys/block/zram0/backing_dev

The reason it worked before was the fact that strlcpy() copies 'len - 1'
bytes, which is strlen(buf) - 1 in our case, so it accidentally didn't
copy the trailing new line symbol.  Which also means that "echo -n
/dev/sdX" most likely was broken.

Signed-off-by: Peter Kalauskas <peskal@google.com>
Link: http://lkml.kernel.org/r/20180813061623.GC64836@rodete-desktop-imager.corp.google.com
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoovl: fix wrong use of impure dir cache in ovl_iterate()
Amir Goldstein [Tue, 17 Jul 2018 13:05:38 +0000 (16:05 +0300)]
ovl: fix wrong use of impure dir cache in ovl_iterate()

commit 67810693077afc1ebf9e1646af300436cb8103c2 upstream.

Only upper dir can be impure, but if we are in the middle of
iterating a lower real dir, dir could be copied up and marked
impure. We only want the impure cache if we started iterating
a real upper dir to begin with.

Aditya Kali reported that the following reproducer hits the
WARN_ON(!cache->refcount) in ovl_get_cache():

 docker run --rm drupal:8.5.4-fpm-alpine \
    sh -c 'cd /var/www/html/vendor/symfony && \
           chown -R www-data:www-data . && ls -l .'

Reported-by: Aditya Kali <adityakali@google.com>
Tested-by: Aditya Kali <adityakali@google.com>
Fixes: 4edb83bb1041 ('ovl: constant d_ino for non-merge dirs')
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomfd: hi655x: Fix regmap area declared size for hi655x
Rafael David Tinoco [Fri, 6 Jul 2018 17:28:33 +0000 (14:28 -0300)]
mfd: hi655x: Fix regmap area declared size for hi655x

commit 6afebb70ee7a4bde106dc1a875e7ac7997248f84 upstream.

Fixes https://bugs.linaro.org/show_bug.cgi?id=3903

LTP Functional tests have caused a bad paging request when triggering
the regmap_read_debugfs() logic of the device PMIC Hi6553 (reading
regmap/f8000000.pmic/registers file during read_all test):

Unable to handle kernel paging request at virtual address ffff0
[ffff00000984e000] pgd=0000000077ffe803, pud=0000000077ffd803,0
Internal error: Oops: 96000007 [#1] SMP
...
Hardware name: HiKey Development Board (DT)
...
Call trace:
 regmap_mmio_read8+0x24/0x40
 regmap_mmio_read+0x48/0x70
 _regmap_bus_reg_read+0x38/0x48
 _regmap_read+0x68/0x170
 regmap_read+0x50/0x78
 regmap_read_debugfs+0x1a0/0x308
 regmap_map_read_file+0x48/0x58
 full_proxy_read+0x68/0x98
 __vfs_read+0x48/0x80
 vfs_read+0x94/0x150
 SyS_read+0x6c/0xd8
 el0_svc_naked+0x30/0x34
Code: aa1e03e0 d503201f f9400280 8b334000 (39400000)

Investigations have showed that, when triggered by debugfs read()
handler, the mmio regmap logic was reading a bigger (16k) register area
than the one mapped by devm_ioremap_resource() during hi655x-pmic probe
time (4k).

This commit changes hi655x's max register, according to HW specs, to be
the same as the one declared in the pmic device in hi6220's dts, fixing
the issue.

Cc: <stable@vger.kernel.org> #v4.9 #v4.14 #v4.16 #v4.17
Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agouprobes: Use synchronize_rcu() not synchronize_sched()
Steven Rostedt (VMware) [Thu, 9 Aug 2018 19:37:59 +0000 (15:37 -0400)]
uprobes: Use synchronize_rcu() not synchronize_sched()

commit 016f8ffc48cb01d1e7701649c728c5d2e737d295 upstream.

While debugging another bug, I was looking at all the synchronize*()
functions being used in kernel/trace, and noticed that trace_uprobes was
using synchronize_sched(), with a comment to synchronize with
{u,ret}_probe_trace_func(). When looking at those functions, the data is
protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
is using the wrong synchronize_*() function.

Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agolivepatch: Validate module/old func name length
Kamalesh Babulal [Fri, 20 Jul 2018 09:46:42 +0000 (15:16 +0530)]
livepatch: Validate module/old func name length

commit 6e9df95b76cad18f7b217bdad7bb8a26d63b8c47 upstream.

livepatch module author can pass module name/old function name with more
than the defined character limit. With obj->name length greater than
MODULE_NAME_LEN, the livepatch module gets loaded but waits forever on
the module specified by obj->name to be loaded. It also populates a /sys
directory with an untruncated object name.

In the case of funcs->old_name length greater then KSYM_NAME_LEN, it
would not match against any of the symbol table entries. Instead loop
through the symbol table comparing them against a nonexisting function,
which can be avoided.

The same issues apply, to misspelled/incorrect names. At least gatekeep
the modules with over the limit string length, by checking for their
length during livepatch module registration.

Cc: stable@vger.kernel.org
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoprintk/tracing: Do not trace printk_nmi_enter()
Steven Rostedt (VMware) [Wed, 5 Sep 2018 20:29:49 +0000 (16:29 -0400)]
printk/tracing: Do not trace printk_nmi_enter()

commit d1c392c9e2a301f38998a353f467f76414e38725 upstream.

I hit the following splat in my tests:

------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
Call Trace:
 do_idle+0x33/0x202
 cpu_startup_entry+0x61/0x63
 start_secondary+0x18e/0x1ed
 startup_32_smp+0x164/0x168
irq event stamp: 18773830
hardirqs last  enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
softirqs last  enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
---[ end trace b7c64aa79e17954a ]---

After a bit of debugging, I found what was happening. This would trigger
when performing "perf" with a high NMI interrupt rate, while enabling and
disabling function tracer. Ftrace uses breakpoints to convert the nops at
the start of functions to calls to the function trampolines. The breakpoint
traps disable interrupts and this makes calls into lockdep via the
trace_hardirqs_off_thunk in the entry.S code. What happens is the following:

  do_idle {

    [interrupts enabled]

    <interrupt> [interrupts disabled]
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET
    test if pt_regs say return to interrupts enabled [yes]
    TRACE_IRQS_ON [lockdep says irqs are on]

    <nmi>
nmi_enter() {
    printk_nmi_enter() [traced by ftrace]
    [ hit ftrace breakpoint ]
    <breakpoint exception>
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET [return from breakpoint]
   test if pt_regs say interrupts enabled [no]
   [iret back to interrupt]
   [iret back to code]

    tick_nohz_idle_enter() {

lockdep_assert_irqs_enabled() [lockdep say no!]

Although interrupts are indeed enabled, lockdep thinks it is not, and since
we now do asserts via lockdep, it gives a false warning. The issue here is
that printk_nmi_enter() is called before lockdep_off(), which disables
lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
confused.

Cc: stable@vger.kernel.org
Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI")
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotracing/blktrace: Fix to allow setting same value
Steven Rostedt (VMware) [Thu, 16 Aug 2018 20:08:37 +0000 (16:08 -0400)]
tracing/blktrace: Fix to allow setting same value

commit 757d9140072054528b13bbe291583d9823cde195 upstream.

Masami Hiramatsu reported:

  Current trace-enable attribute in sysfs returns an error
  if user writes the same setting value as current one,
  e.g.

    # cat /sys/block/sda/trace/enable
    0
    # echo 0 > /sys/block/sda/trace/enable
    bash: echo: write error: Invalid argument
    # echo 1 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
    bash: echo: write error: Device or resource busy

  But this is not a preferred behavior, it should ignore
  if new setting is same as current one. This fixes the
  problem as below.

    # cat /sys/block/sda/trace/enable
    0
    # echo 0 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable

Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotracing: Do not call start/stop() functions when tracing_on does not change
Steven Rostedt (VMware) [Wed, 1 Aug 2018 19:40:57 +0000 (15:40 -0400)]
tracing: Do not call start/stop() functions when tracing_on does not change

commit f143641bfef9a4a60c57af30de26c63057e7e695 upstream.

Currently, when one echo's in 1 into tracing_on, the current tracer's
"start()" function is executed, even if tracing_on was already one. This can
lead to strange side effects. One being that if the hwlat tracer is enabled,
and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
start() function is called again which will recreate another kernel thread,
and make it unable to remove the old one.

Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de
Cc: stable@vger.kernel.org
Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file")
Reported-by: Erica Bugden <erica.bugden@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agortc: omap: fix potential crash on power off
Johan Hovold [Wed, 4 Jul 2018 09:05:55 +0000 (11:05 +0200)]
rtc: omap: fix potential crash on power off

commit 5c8b84f410b3819d14cb1ebf32e4b3714b5a6e0b upstream.

Do not set the system power-off callback and omap power-off rtc pointer
until we're done setting up our device to avoid leaving stale pointers
around after a late probe error.

Fixes: 97ea1906b3c2 ("rtc: omap: Support ext_wakeup configuration")
Cc: stable <stable@vger.kernel.org> # 4.9
Cc: Marcin Niestroj <m.niestroj@grinn-global.com>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovmw_balloon: fix VMCI use when balloon built into kernel
Nadav Amit [Tue, 19 Jun 2018 23:00:27 +0000 (16:00 -0700)]
vmw_balloon: fix VMCI use when balloon built into kernel

commit c3cc1b0fc27508da53fe955a3b23d03964410682 upstream.

Currently, when all modules, including VMCI and VMware balloon are built
into the kernel, the initialization of the balloon happens before the
VMCI is probed. As a result, the balloon fails to initialize the VMCI
doorbell, which it uses to get asynchronous requests for balloon size
changes.

The problem can be seen in the logs, in the form of the following
message:
"vmw_balloon: failed to initialize vmci doorbell"

The driver would work correctly but slightly less efficiently, probing
for requests periodically. This patch changes the balloon to be
initialized using late_initcall() instead of module_init() to address
this issue. It does not address a situation in which VMCI is built as a
module and the balloon is built into the kernel.

Fixes: 48e3d668b790 ("VMware balloon: Enable notification via VMCI")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovmw_balloon: VMCI_DOORBELL_SET does not check status
Nadav Amit [Tue, 19 Jun 2018 23:00:26 +0000 (16:00 -0700)]
vmw_balloon: VMCI_DOORBELL_SET does not check status

commit ce664331b2487a5d244a51cbdd8cb54f866fbe5d upstream.

When vmballoon_vmci_init() sets a doorbell using VMCI_DOORBELL_SET, for
some reason it does not consider the status and looks at the result.
However, the hypervisor does not update the result - it updates the
status. This might cause VMCI doorbell not to be enabled, resulting in
degraded performance.

Fixes: 48e3d668b790 ("VMware balloon: Enable notification via VMCI")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovmw_balloon: do not use 2MB without batching
Nadav Amit [Tue, 19 Jun 2018 23:00:25 +0000 (16:00 -0700)]
vmw_balloon: do not use 2MB without batching

commit 5081efd112560d3febb328e627176235b250d59d upstream.

If the hypervisor sets 2MB batching is on, while batching is cleared,
the balloon code breaks. In this case the legacy mechanism is used with
2MB page. The VM would report a 2MB page is ballooned, and the
hypervisor would only take the first 4KB.

While the hypervisor should not report such settings, make the code more
robust by not enabling 2MB support without batching.

Fixes: 365bd7ef7ec8e ("VMware balloon: Support 2m page ballooning.")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovmw_balloon: fix inflation of 64-bit GFNs
Nadav Amit [Tue, 19 Jun 2018 23:00:24 +0000 (16:00 -0700)]
vmw_balloon: fix inflation of 64-bit GFNs

commit 09755690c6b7c1eabdc4651eb3b276f8feb1e447 upstream.

When balloon batching is not supported by the hypervisor, the guest
frame number (GFN) must fit in 32-bit. However, due to a bug, this check
was mistakenly ignored. In practice, when total RAM is greater than
16TB, the balloon does not work currently, making this bug unlikely to
happen.

Fixes: ef0f8f112984 ("VMware balloon: partially inline vmballoon_reserve_page.")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoextcon: Release locking when sending the notification of connector state
Chanwoo Choi [Thu, 14 Jun 2018 02:16:29 +0000 (11:16 +0900)]
extcon: Release locking when sending the notification of connector state

commit 8a9dbb779fe882325b9a0238494a7afaff2eb444 upstream.

Previously, extcon used the spinlock before calling the notifier_call_chain
to prevent the scheduled out of task and to prevent the notification delay.
When spinlock is locked for sending the notification, deadlock issue
occured on the side of extcon consumer device. To fix this issue,
extcon consumer device should always use the work. it is always not
reasonable to use work.

To fix this issue on extcon consumer device, release locking when sending
the notification of connector state.

Fixes: ab11af049f88 ("extcon: Add the synchronization extcon APIs to support the notification")
Cc: stable@vger.kernel.org
Cc: Roger Quadros <rogerq@ti.com>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoiio: ad9523: Fix return value for ad952x_store()
Lars-Peter Clausen [Fri, 27 Jul 2018 06:42:45 +0000 (09:42 +0300)]
iio: ad9523: Fix return value for ad952x_store()

commit 9a5094ca29ea9b1da301b31fd377c0c0c4c23034 upstream.

A sysfs write callback function needs to either return the number of
consumed characters or an error.

The ad952x_store() function currently returns 0 if the input value was "0",
this will signal that no characters have been consumed and the function
will be called repeatedly in a loop indefinitely. Fix this by returning
number of supplied characters to indicate that the whole input string has
been consumed.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Fixes: cd1678f96329 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoiio: ad9523: Fix displayed phase
Lars-Peter Clausen [Mon, 25 Jun 2018 08:03:07 +0000 (11:03 +0300)]
iio: ad9523: Fix displayed phase

commit 5a4e33c1c53ae7d4425f7d94e60e4458a37b349e upstream.

Fix the displayed phase for the ad9523 driver. Currently the most
significant decimal place is dropped and all other digits are shifted one
to the left. This is due to a multiplication by 10, which is not necessary,
so remove it.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Fixes: cd1678f9632 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoiio: sca3000: Fix missing return in switch
Gustavo A. R. Silva [Sat, 7 Jul 2018 17:44:01 +0000 (12:44 -0500)]
iio: sca3000: Fix missing return in switch

commit c5b974bee9d2ceae4c441ae5a01e498c2674e100 upstream.

The IIO_CHAN_INFO_LOW_PASS_FILTER_3DB_FREQUENCY case is missing a
return and will fall through to the default case and errorenously
return -EINVAL.

Fix this by adding in missing *return ret*.

Fixes: 626f971b5b07 ("staging:iio:accel:sca3000 Add write support to the low pass filter control")
Reported-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoDrivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()
Dexuan Cui [Thu, 2 Aug 2018 03:08:23 +0000 (03:08 +0000)]
Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()

commit d3b26dd7cb0e3433bfd3c1d4dcf74c6039bb49fb upstream.

Before setting channel->rescind in vmbus_rescind_cleanup(), we should make
sure the channel callback won't run any more, otherwise a high-level
driver like pci_hyperv, which may be infinitely waiting for the host VSP's
response and notices the channel has been rescinded, can't safely give
up: e.g., in hv_pci_protocol_negotiation() -> wait_for_response(), it's
unsafe to exit from wait_for_response() and proceed with the on-stack
variable "comp_pkt" popped. The issue was originally spotted by
Michael Kelley <mikelley@microsoft.com>.

In vmbus_close_internal(), the patch also minimizes the range protected by
disabling/enabling channel->callback_event: we don't really need that for
the whole function.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agouart: fix race between uart_put_char() and uart_shutdown()
Tycho Andersen [Fri, 6 Jul 2018 16:24:57 +0000 (10:24 -0600)]
uart: fix race between uart_put_char() and uart_shutdown()

commit a5ba1d95e46ecaea638ddd7cd144107c783acb5d upstream.

We have reports of the following crash:

    PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0"
    #0 [ffff88085c6db710] machine_kexec at ffffffff81046239
    #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248
    #2 [ffff88085c6db830] oops_end at ffffffff81008ae7
    #3 [ffff88085c6db860] no_context at ffffffff81050b8f
    #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75
    #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83
    #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e
    #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c
    #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122
    [exception RIP: uart_put_char+149]
    RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006
    RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081
    RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120
    RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320
    R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000
    R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544
    #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c
    #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b
    #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2
    #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b
    #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a
    #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016
    #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194
    #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a
    #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2
    #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d
    #20 [ffff88085c6dbeb0] kthread at ffffffff81096384
    #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f

after slogging through some dissasembly:

ffffffff814b6720 <uart_put_char>:
ffffffff814b6720: 55                    push   %rbp
ffffffff814b6721: 48 89 e5              mov    %rsp,%rbp
ffffffff814b6724: 48 83 ec 20           sub    $0x20,%rsp
ffffffff814b6728: 48 89 1c 24           mov    %rbx,(%rsp)
ffffffff814b672c: 4c 89 64 24 08        mov    %r12,0x8(%rsp)
ffffffff814b6731: 4c 89 6c 24 10        mov    %r13,0x10(%rsp)
ffffffff814b6736: 4c 89 74 24 18        mov    %r14,0x18(%rsp)
ffffffff814b673b: e8 b0 8e 58 00        callq  ffffffff81a3f5f0 <mcount>
ffffffff814b6740: 4c 8b a7 88 02 00 00  mov    0x288(%rdi),%r12
ffffffff814b6747: 45 31 ed              xor    %r13d,%r13d
ffffffff814b674a: 41 89 f6              mov    %esi,%r14d
ffffffff814b674d: 49 83 bc 24 70 01 00  cmpq   $0x0,0x170(%r12)
ffffffff814b6754: 00 00
ffffffff814b6756: 49 8b 9c 24 80 01 00  mov    0x180(%r12),%rbx
ffffffff814b675d: 00
ffffffff814b675e: 74 2f                 je     ffffffff814b678f <uart_put_char+0x6f>
ffffffff814b6760: 48 89 df              mov    %rbx,%rdi
ffffffff814b6763: e8 a8 67 58 00        callq  ffffffff81a3cf10 <_raw_spin_lock_irqsave>
ffffffff814b6768: 41 8b 8c 24 78 01 00  mov    0x178(%r12),%ecx
ffffffff814b676f: 00
ffffffff814b6770: 89 ca                 mov    %ecx,%edx
ffffffff814b6772: f7 d2                 not    %edx
ffffffff814b6774: 41 03 94 24 7c 01 00  add    0x17c(%r12),%edx
ffffffff814b677b: 00
ffffffff814b677c: 81 e2 ff 0f 00 00     and    $0xfff,%edx
ffffffff814b6782: 75 23                 jne    ffffffff814b67a7 <uart_put_char+0x87>
ffffffff814b6784: 48 89 c6              mov    %rax,%rsi
ffffffff814b6787: 48 89 df              mov    %rbx,%rdi
ffffffff814b678a: e8 e1 64 58 00        callq  ffffffff81a3cc70 <_raw_spin_unlock_irqrestore>
ffffffff814b678f: 44 89 e8              mov    %r13d,%eax
ffffffff814b6792: 48 8b 1c 24           mov    (%rsp),%rbx
ffffffff814b6796: 4c 8b 64 24 08        mov    0x8(%rsp),%r12
ffffffff814b679b: 4c 8b 6c 24 10        mov    0x10(%rsp),%r13
ffffffff814b67a0: 4c 8b 74 24 18        mov    0x18(%rsp),%r14
ffffffff814b67a5: c9                    leaveq
ffffffff814b67a6: c3                    retq
ffffffff814b67a7: 49 8b 94 24 70 01 00  mov    0x170(%r12),%rdx
ffffffff814b67ae: 00
ffffffff814b67af: 48 63 c9              movslq %ecx,%rcx
ffffffff814b67b2: 41 b5 01              mov    $0x1,%r13b
ffffffff814b67b5: 44 88 34 0a           mov    %r14b,(%rdx,%rcx,1)
ffffffff814b67b9: 41 8b 94 24 78 01 00  mov    0x178(%r12),%edx
ffffffff814b67c0: 00
ffffffff814b67c1: 83 c2 01              add    $0x1,%edx
ffffffff814b67c4: 81 e2 ff 0f 00 00     and    $0xfff,%edx
ffffffff814b67ca: 41 89 94 24 78 01 00  mov    %edx,0x178(%r12)
ffffffff814b67d1: 00
ffffffff814b67d2: eb b0                 jmp    ffffffff814b6784 <uart_put_char+0x64>
ffffffff814b67d4: 66 66 66 2e 0f 1f 84  data32 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff814b67db: 00 00 00 00 00

for our build, this is crashing at:

    circ->buf[circ->head] = c;

Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf)
protected by the "per-port mutex", which based on uart_port_check() is
state->port.mutex. Indeed, the lock acquired in uart_put_char() is
uport->lock, i.e. not the same lock.

Anyway, since the lock is not acquired, if uart_shutdown() is called, the
last chunk of that function may release state->xmit.buf before its assigned
to null, and cause the race above.

To fix it, let's lock uport->lock when allocating/deallocating
state->xmit.buf in addition to the per-port mutex.

v2: switch to locking uport->lock on allocation/deallocation instead of
    locking the per-port mutex in uart_put_char. Note that since
    uport->lock is a spin lock, we have to switch the allocation to
    GFP_ATOMIC.
v3: move the allocation outside the lock, so we can switch back to
    GFP_KERNEL

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodm crypt: don't decrease device limits
Mikulas Patocka [Fri, 10 Aug 2018 15:23:56 +0000 (11:23 -0400)]
dm crypt: don't decrease device limits

commit bc9e9cf0401f18e33b78d4c8a518661b8346baf7 upstream.

dm-crypt should only increase device limits, it should not decrease them.

This fixes a bug where the user could creates a crypt device with 1024
sector size on the top of scsi device that had 4096 logical block size.
The limit 4096 would be lost and the user could incorrectly send
1024-I/Os to the crypt device.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodm cache metadata: set dirty on all cache blocks after a crash
Ilya Dryomov [Thu, 9 Aug 2018 10:38:28 +0000 (12:38 +0200)]
dm cache metadata: set dirty on all cache blocks after a crash

commit 5b1fe7bec8a8d0cc547a22e7ddc2bd59acd67de4 upstream.

Quoting Documentation/device-mapper/cache.txt:

  The 'dirty' state for a cache block changes far too frequently for us
  to keep updating it on the fly.  So we treat it as a hint.  In normal
  operation it will be written when the dm device is suspended.  If the
  system crashes all cache blocks will be assumed dirty when restarted.

This got broken in commit f177940a8091 ("dm cache metadata: switch to
using the new cursor api for loading metadata") in 4.9, which removed
the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
flag) when loading cache blocks.  This results in data corruption on an
unclean shutdown with dirty cache blocks on the fast device.  After the
crash those blocks are considered clean and may get evicted from the
cache at any time.  This can be demonstrated by doing a lot of reads
to trigger individual evictions, but uncache is more predictable:

  ### Disable auto-activation in lvm.conf to be able to do uncache in
  ### time (i.e. see uncache doing flushing) when the fix is applied.

  # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb
  # vgcreate vg_cache /dev/vdb /dev/vdc
  # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb
  # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc
  # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc
  # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev
  # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev
  # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  # dmsetup status vg_cache-lv_slowdev
  0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw -
                                                            ^^^^
                                7065 * 64k = 441M yet to be written to the slow device
  # echo b >/proc/sysrq-trigger

  # vgchange -ay vg_cache
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  # lvconvert --uncache vg_cache/lv_slowdev
  Flushing 0 blocks for cache vg_cache/lv_slowdev.
  Logical volume "lv_cachedev" successfully removed
  Logical volume vg_cache/lv_slowdev is not cached.
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
  0fe00010:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................

This is the case with both v1 and v2 cache pool metatata formats.

After applying this patch:

  # vgchange -ay vg_cache
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  # lvconvert --uncache vg_cache/lv_slowdev
  Flushing 3724 blocks for cache vg_cache/lv_slowdev.
  ...
  Flushing 71 blocks for cache vg_cache/lv_slowdev.
  Logical volume "lv_cachedev" successfully removed
  Logical volume vg_cache/lv_slowdev is not cached.
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................

Cc: stable@vger.kernel.org
Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodm cache metadata: save in-core policy_hint_size to on-disk superblock
Mike Snitzer [Thu, 2 Aug 2018 20:08:52 +0000 (16:08 -0400)]
dm cache metadata: save in-core policy_hint_size to on-disk superblock

commit fd2fa95416188a767a63979296fa3e169a9ef5ec upstream.

policy_hint_size starts as 0 during __write_initial_superblock().  It
isn't until the policy is loaded that policy_hint_size is set in-core
(cmd->policy_hint_size).  But it never got recorded in the on-disk
superblock because __commit_transaction() didn't deal with transfering
the in-core cmd->policy_hint_size to the on-disk superblock.

The in-core cmd->policy_hint_size gets initialized by metadata_open()'s
__begin_transaction_flags() which re-reads all superblock fields.
Because the superblock's policy_hint_size was never properly stored, when
the cache was created, hints_array_available() would always return false
when re-activating a previously created cache.  This means
__load_mappings() always considered the hints invalid and never made use
of the hints (these hints served to optimize).

Another detremental side-effect of this oversight is the cache_check
utility would fail with: "invalid hint width: 0"

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodm thin: stop no_space_timeout worker when switching to write-mode
Hou Tao [Thu, 2 Aug 2018 08:18:24 +0000 (16:18 +0800)]
dm thin: stop no_space_timeout worker when switching to write-mode

commit 75294442d896f2767be34f75aca7cc2b0d01301f upstream.

Now both check_for_space() and do_no_space_timeout() will read & write
pool->pf.error_if_no_space.  If these functions run concurrently, as
shown in the following case, the default setting of "queue_if_no_space"
can get lost.

precondition:
    * error_if_no_space = false (aka "queue_if_no_space")
    * pool is in Out-of-Data-Space (OODS) mode
    * no_space_timeout worker has been queued

CPU 0:                          CPU 1:
// delete a thin device
process_delete_mesg()
// check_for_space() invoked by commit()
set_pool_mode(pool, PM_WRITE)
    pool->pf.error_if_no_space = \
     pt->requested_pf.error_if_no_space

// timeout, pool is still in OODS mode
do_no_space_timeout
    // "queue_if_no_space" config is lost
    pool->pf.error_if_no_space = true
    pool->pf.mode = new_mode

Fix it by stopping no_space_timeout worker when switching to write mode.

Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires")
Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodm integrity: change 'suspending' variable from bool to int
Mikulas Patocka [Tue, 3 Jul 2018 18:13:25 +0000 (20:13 +0200)]
dm integrity: change 'suspending' variable from bool to int

commit c21b16392701543d61e366dca84e15fe7f0cf0cf upstream.

Early alpha processors can't write a byte or short atomically - they
read 8 bytes, modify the byte or two bytes in registers and write back
8 bytes.

The modification of the variable "suspending" may race with
modification of the variable "failed".  Fix this by changing
"suspending" to an int.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree()
Tomas Bortoli [Fri, 20 Jul 2018 09:27:30 +0000 (11:27 +0200)]
net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree()

commit 430ac66eb4c5b5c4eb846b78ebf65747510b30f1 upstream.

The patch adds the flush in p9_mux_poll_stop() as it the function used by
p9_conn_destroy(), in turn called by p9_fd_close() to stop the async
polling associated with the data regarding the connection.

Link: http://lkml.kernel.org/r/20180720092730.27104-1-tomasbortoli@gmail.com
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Reported-by: syzbot+39749ed7d9ef6dfb23f6@syzkaller.appspotmail.com
To: Eric Van Hensbergen <ericvh@gmail.com>
To: Ron Minnich <rminnich@sandia.gov>
To: Latchesar Ionkov <lucho@ionkov.net>
Cc: Yiwen Jiang <jiangyiwen@huwei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/9p/client.c: version pointer uninitialized
Tomas Bortoli [Mon, 9 Jul 2018 22:29:43 +0000 (00:29 +0200)]
net/9p/client.c: version pointer uninitialized

commit 7913690dcc5e18e235769fd87c34143072f5dbea upstream.

The p9_client_version() does not initialize the version pointer. If the
call to p9pdu_readf() returns an error and version has not been allocated
in p9pdu_readf(), then the program will jump to the "error" label and will
try to free the version pointer. If version is not initialized, free()
will be called with uninitialized, garbage data and will provoke a crash.

Link: http://lkml.kernel.org/r/20180709222943.19503-1-tomasbortoli@gmail.com
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Reported-by: syzbot+65c6b72f284a39d416b4@syzkaller.appspotmail.com
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years ago9p/virtio: fix off-by-one error in sg list bounds check
jiangyiwen [Fri, 3 Aug 2018 04:11:34 +0000 (12:11 +0800)]
9p/virtio: fix off-by-one error in sg list bounds check

commit 23cba9cbde0bba05d772b335fe5f66aa82b9ad19 upstream.

Because the value of limit is VIRTQUEUE_NUM, if index is equal to
limit, it will cause sg array out of bounds, so correct the judgement
of BUG_ON.

Link: http://lkml.kernel.org/r/5B63D5F6.6080109@huawei.com
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reported-By: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
piaojun [Wed, 25 Jul 2018 03:13:16 +0000 (11:13 +0800)]
fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed

commit 3111784bee81591ea2815011688d28b65df03627 upstream.

In my testing, v9fs_fid_xattr_set will return successfully even if the
backend ext4 filesystem has no space to store xattr key-value. That will
cause inconsistent behavior between front end and back end. The reason is
that lsetxattr will be triggered by p9_client_clunk, and unfortunately we
did not catch the error. This patch will catch the error to notify upper
caller.

p9_client_clunk (in 9p)
  p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid);
    v9fs_clunk (in qemu)
      put_fid
        free_fid
          v9fs_xattr_fid_clunk
            v9fs_co_lsetxattr
              s->ops->lsetxattr
                ext4_xattr_user_set (in host ext4 filesystem)

Link: http://lkml.kernel.org/r/5B57EACC.2060900@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years ago9p: fix multiple NULL-pointer-dereferences
Tomas Bortoli [Fri, 27 Jul 2018 11:05:58 +0000 (13:05 +0200)]
9p: fix multiple NULL-pointer-dereferences

commit 10aa14527f458e9867cf3d2cc6b8cb0f6704448b upstream.

Added checks to prevent GPFs from raising.

Link: http://lkml.kernel.org/r/20180727110558.5479-1-tomasbortoli@gmail.com
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Reported-by: syzbot+1a262da37d3bead15c39@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRDMA/rxe: Set wqe->status correctly if an unexpected response is received
Bart Van Assche [Tue, 26 Jun 2018 15:39:36 +0000 (08:39 -0700)]
RDMA/rxe: Set wqe->status correctly if an unexpected response is received

commit 61b717d041b1976530f68f8b539b2e3a7dd8e39c upstream.

Every function that returns COMPST_ERROR must set wqe->status to another
value than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code
path for which this is not yet the case.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoib_srpt: Fix a use-after-free in srpt_close_ch()
Bart Van Assche [Mon, 2 Jul 2018 21:08:18 +0000 (14:08 -0700)]
ib_srpt: Fix a use-after-free in srpt_close_ch()

commit 995250959d22fc341b5424e3343b0ce5df672461 upstream.

Avoid that KASAN reports the following:

BUG: KASAN: use-after-free in srpt_close_ch+0x4f/0x1b0 [ib_srpt]
Read of size 4 at addr ffff880151180cb8 by task check/4681

CPU: 15 PID: 4681 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xa4/0xf5
 print_address_description+0x6f/0x270
 kasan_report+0x241/0x360
 __asan_load4+0x78/0x80
 srpt_close_ch+0x4f/0x1b0 [ib_srpt]
 srpt_set_enabled+0xf7/0x1e0 [ib_srpt]
 srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
 configfs_write_file+0x14e/0x1d0 [configfs]
 __vfs_write+0xd2/0x3b0
 vfs_write+0x101/0x270
 ksys_write+0xab/0x120
 __x64_sys_write+0x43/0x50
 do_syscall_64+0x77/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocxl: Fix wrong comparison in cxl_adapter_context_get()
Vaibhav Jain [Wed, 4 Jul 2018 15:28:33 +0000 (20:58 +0530)]
cxl: Fix wrong comparison in cxl_adapter_context_get()

commit ef6cb5f1a048fdf91ccee6d63d2bfa293338502d upstream.

Function atomic_inc_unless_negative() returns a bool to indicate
success/failure. However cxl_adapter_context_get() wrongly compares
the return value against '>=0' which will always be true. The patch
fixes this comparison to '==0' there by also fixing this compile time
warning:

drivers/misc/cxl/main.c:290 cxl_adapter_context_get()
warn: 'atomic_inc_unless_negative(&adapter->contexts_num)' is unsigned

Fixes: 70b565bbdb91 ("cxl: Prevent adapter reset if an active context exists")
Cc: stable@vger.kernel.org # v4.9+
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/powernv/pci: Work around races in PCI bridge enabling
Benjamin Herrenschmidt [Fri, 17 Aug 2018 07:30:39 +0000 (17:30 +1000)]
powerpc/powernv/pci: Work around races in PCI bridge enabling

commit db2173198b9513f7add8009f225afa1f1c79bcc6 upstream.

The generic code is racy when multiple children of a PCI bridge try to
enable it simultaneously.

This leads to drivers trying to access a device through a
not-yet-enabled bridge, and this EEH errors under various
circumstances when using parallel driver probing.

There is work going on to fix that properly in the PCI core but it
will take some time.

x86 gets away with it because (outside of hotplug), the BIOS enables
all the bridges at boot time.

This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's
suitable for backporting to stable and distros, while the proper PCI
fix will probably be significantly more invasive.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoPCI: Add wrappers for dev_printk()
Frederick Lawler [Thu, 18 Jan 2018 18:55:24 +0000 (12:55 -0600)]
PCI: Add wrappers for dev_printk()

commit 7506dc7989933235e6fc23f3d0516bdbf0f7d1a8 upstream.

Add PCI-specific dev_printk() wrappers and use them to simplify the code
slightly.  No functional change intended.

Signed-off-by: Frederick Lawler <fred@fredlawl.com>
[bhelgaas: squash into one patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[only take the pci.h portion of this patch, to make backporting stuff
easier over time - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
Mahesh Salgaonkar [Tue, 7 Aug 2018 14:16:46 +0000 (19:46 +0530)]
powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.

commit cd813e1cd7122f2c261dce5b54d1e0c97f80e1a5 upstream.

During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.

Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.

  Severe Machine check interrupt [Recovered]
    NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: d00000000ca70000
  cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
      pc: c0000000009694c0: vsnprintf+0x80/0x480
      lr: c0000000009698e0: vscnprintf+0x20/0x60
      sp: c0000000fc777830
     msr: 8000000002009033
     dar: a803a30c000000d0
    current = 0xc00000000bc9ef00
    paca    = 0xc00000001eca5c00  softe: 3  irq_happened: 0x01
      pid   = 8860, comm = insmod
  vscnprintf+0x20/0x60
  vprintk_emit+0xb4/0x4b0
  vprintk_func+0x5c/0xd0
  printk+0x38/0x4c
  init_module+0x1c0/0x338 [bork_kernel]
  do_one_initcall+0x54/0x230
  do_init_module+0x8c/0x248
  load_module+0x12b8/0x15b0
  sys_finit_module+0xa8/0x110
  system_call+0x58/0x6c
  --- Exception: c00 (System Call) at 00007fff8bda0644
  SP (7fffdfbfe980) is in userspace

This patch fixes this issue.

Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org # v3.15+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/fadump: handle crash memory ranges array index overflow
Hari Bathini [Mon, 6 Aug 2018 20:42:45 +0000 (02:12 +0530)]
powerpc/fadump: handle crash memory ranges array index overflow

commit 1bd6a1c4b80a28d975287630644e6b47d0f977a5 upstream.

Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e22 ("memblock: Add array resizing support").

On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in crash or other system failures
like below:

  task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000
  NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180
  REGS: c00000000b73b570 TRAP: 0300   Tainted: G          L   X  (4.4.140+)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22004484  XER: 20000000
  CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0
  ...
  NIP [c000000000047df4] smp_send_reschedule+0x24/0x80
  LR [c0000000000f9e58] resched_curr+0x138/0x160
  Call Trace:
    resched_curr+0x138/0x160 (unreliable)
    check_preempt_curr+0xc8/0xf0
    ttwu_do_wakeup+0x38/0x150
    try_to_wake_up+0x224/0x4d0
    __wake_up_common+0x94/0x100
    ep_poll_callback+0xac/0x1c0
    __wake_up_common+0x94/0x100
    __wake_up_sync_key+0x70/0xa0
    sock_def_readable+0x58/0xa0
    unix_stream_sendmsg+0x2dc/0x4c0
    sock_sendmsg+0x68/0xa0
    ___sys_sendmsg+0x2cc/0x2e0
    __sys_sendmsg+0x5c/0xc0
    SyS_socketcall+0x36c/0x3f0
    system_call+0x3c/0x100

as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting array size limit.

Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program headers.")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Just use PAGE_SIZE directly, fixup variable placement]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoFix kexec forbidding kernels signed with keys in the secondary keyring to boot
Yannik Sembritzki [Thu, 16 Aug 2018 13:05:23 +0000 (14:05 +0100)]
Fix kexec forbidding kernels signed with keys in the secondary keyring to boot

commit ea93102f32244e3f45c8b26260be77ed0cc1d16c upstream.

The split of .system_keyring into .builtin_trusted_keys and
.secondary_trusted_keys broke kexec, thereby preventing kernels signed by
keys which are now in the secondary keyring from being kexec'd.

Fix this by passing VERIFY_USE_SECONDARY_KEYRING to
verify_pefile_signature().

Fixes: d3bfe84129f6 ("certs: Add a secondary system keyring that can be added to dynamically")
Signed-off-by: Yannik Sembritzki <yannik@sembritzki.me>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: kexec@lists.infradead.org
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>