Chuck Anderson [Mon, 23 Jan 2017 22:29:42 +0000 (14:29 -0800)]
Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/drivers: (66 commits)
ib/mlx4: add msi-x allocation kernel msg logging
NVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata
nvme: Limit command retries
nvme: avoid cqe corruption when update at the same time as read
NVMe: Don't unmap controller registers on reset
net: ena: change the return type of ena_set_push_mode() to be void.
net: ena: Fix error return code in ena_device_init()
net: ena: Remove unnecessary pci_set_drvdata()
net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
bnxt_en: Add interface to support RDMA driver.
bnxt_en: Refactor the driver registration function with firmware.
bnxt_en: Reserve RDMA resources by default.
bnxt_en: Improve completion ring allocation for VFs.
bnxt_en: Move function reset to bnxt_init_one().
bnxt_en: Enable MSIX early in bnxt_init_one().
bnxt_en: Add bnxt_set_max_func_irqs().
bnxt_en: Add PFC statistics.
bnxt_en: Implement DCBNL to support host-based DCBX.
bnxt_en: Update firmware header file to latest 1.6.0.
bnxt_en: Re-factor bnxt_setup_tc().
...
Chuck Anderson [Mon, 23 Jan 2017 22:28:05 +0000 (14:28 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/upstream-cherry-picks:
Don't feed anything but regular iovec's to blk_rq_map_user_iov
crypto: algif_hash - Only export and import on sockets with data
Qing Huang [Fri, 16 Dec 2016 00:03:58 +0000 (16:03 -0800)]
ib/mlx4: add msi-x allocation kernel msg logging
Kernel msg prints are added in the mlx4 driver when enabling msi-x
vectors during device initialization. This would help us to debug
issues when we encounter errors in this area on both bare metal and
VM.
Linus Torvalds [Wed, 7 Dec 2016 00:18:14 +0000 (16:18 -0800)]
Don't feed anything but regular iovec's to blk_rq_map_user_iov
In theory we could map other things, but there's a reason that function
is called "user_iov". Using anything else (like splice can do) just
confuses it.
Reported-and-tested-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0)
Orabug: 25230657
CVE: CVE-2016-9576 Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Conflicts:
block/blk-map.c
Chuck Anderson [Mon, 23 Jan 2017 07:24:30 +0000 (23:24 -0800)]
Merge branch topic/uek-4.1/sparc of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/sparc:
Revert "sparc64: struct adi_caps should use __u64, not u64"
SPARC64: ds driver: Make memory allocations ATOMIC and enhance debugging
sparc64: Add symbolic access to M7 performance counters to perf
sonoma: perf: add support for sonoma (s7) into perf
sparc64:M8 cpu recognition typo fix
sparc64: Add M7 hardware cache events into perf
sparc64: Fix the watchdog corrupting performance counters
sparc64: Fix incorrect counting when using multiple perf counters
sparc64: Fix a race condition when stopping performance counters
sparc64: Stop performance counter before updating
sparc64: enable cpu hotplug feature for UEK4
sparc64: release thirds level cache reference for cpu hotplug feature
sparc64: fix compile warning section mismatch in find_node()
sparc64: fix sun4v_build_irq NULL pointer dereference
SPARC64: ldmvsw: tx queue stuck in stopped state after LDC reset
sparc: Implement watchdog_nmi_enable and watchdog_nmi_disable
sparc64: Setup a scheduling domain for highest level cache.
David Vrabel [Fri, 9 Dec 2016 14:41:13 +0000 (14:41 +0000)]
xenbus: fix deadlock on writes to /proc/xen/xenbus
/proc/xen/xenbus does not work correctly. A read blocked waiting for
a xenstore message holds the mutex needed for atomic file position
updates. This blocks any writes on the same file handle, which can
deadlock if the write is needed to unblock the read.
Clear FMODE_ATOMIC_POS when opening this device to always get
character-device-like semantics.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
Orabug: 25425387
(cherry picked from commit 581d21a2d02a798ee34e56dbfa13f891b3a90c30)
Jira: OCC-36718 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Chuck Anderson [Mon, 23 Jan 2017 06:47:45 +0000 (22:47 -0800)]
Merge branch topic/uek-4.1/stable-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/stable-cherry-picks: (187 commits)
ext4: verify extent header depth
nfsd: check permissions when setting ACLs
posix_acl: Add set_posix_acl
sysv, ipc: fix security-layer leaking
dm: set DMF_SUSPENDED* _before_ clearing DMF_NOFLUSH_SUSPENDING
dm rq: fix the starting and stopping of blk-mq queues
dm flakey: error READ bios during the down_interval
CIFS: Fix a possible invalid memory access in smb2_query_symlink()
fs/cifs: make share unaccessible at root level mountable
Input: i8042 - break load dependency between atkbd/psmouse and i8042
module: Invalidate signatures on force-loaded modules
Documentation/module-signing.txt: Note need for version info if reusing a key
net/irda: fix NULL pointer dereference on memory allocation failure
fs/dcache.c: avoid soft-lockup in dput()
iscsi-target: Fix panic when adding second TCP connection to iSCSI session
audit: fix a double fetch in audit_log_single_execve_arg()
Fix broken audit tests for exec arg len
audit: Fix check of return value of strnlen_user()
cifs: fix crash due to race in hmac(md5) handling
dm: fix second blk_delay_queue() parameter to be in msec units not jiffies
...
Aaron Young [Wed, 23 Nov 2016 16:02:02 +0000 (11:02 -0500)]
SPARC64: ds driver: Make memory allocations ATOMIC and enhance debugging
This patch fixes the following issues:
1. BUG 25107317 - Kernel Panic: Watchdog HARD LOCKUP out of ds_cap_fini()
2. BUG 24787856 - Forward port 19811909 - Unnecessary
warning - ldom_req_sp_token
BUG 25107317 appears to be caused by the ds driver allocating memory using
the GFP_KERNEL flag (which can result in sleeping) while holding a spinlock.
Sleeping while holding a spinlock violates the kernel's locking rules and resulted in the panic.
To fix BUG 24787856, the error message in question was changed to a
printk_once() which will result in the message only appearing once
in the console log instead of repeatedly.
The debugging facility in the driver was also enhanced by adding 3 separate
debug levels for the ds driver debug messages.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com> Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com> Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Orabug: 25107317, 24787856
(cherry picked from commit f3bf272f0512120708a2966a7916b51c34efe56d) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Aldridge [Thu, 19 May 2016 10:54:58 +0000 (03:54 -0700)]
sparc64: Add symbolic access to M7 performance counters to perf
This commit provides symbolic access to every performance counter
provided in the M7. The 'perf list' command can be used to provide
a complete list of these new events, which will be reported as
shown below.
Br_mispred OR cpu/Br_mispred/ [Kernel PMU event]
Br_taken OR cpu/Br_taken/ [Kernel PMU event]
Br_tgt_mispred OR cpu/Br_tgt_mispred/ [Kernel PMU event]
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Acked-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 39f70b2fa98ea10931133ab983f521c70cb7429f)
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit f39f00c4536c8c6ca0585a200a56894c2c158743) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Aldridge [Fri, 24 Jun 2016 13:17:25 +0000 (06:17 -0700)]
sparc64: Fix the watchdog corrupting performance counters
There is a race condition in perf_event_grab_pmc() which
means that we do not increment the active_events count correctly
when a new event is added. Ultimately, we end up with a negative
value for the active_events count. This means that the next time
we try to add a new event the watchdog will not be stopped
correctly and corruption of the performance count will
be observed.
Note: In sparc64 land the watchdog is implemented using one
of the performance counters.
This issue is fixed by moving the mutex lock so that
it encompasses the whole critical section in
perf_event_grab_pmc().
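As a rough userspace illustration of the fix described above (not the kernel code), the sketch below keeps the check, the watchdog side effect, and the counter update under one lock. The class and method names are hypothetical; only the active_events name comes from the commit message.

```python
import threading


class PmcState:
    """Toy stand-in for the shared PMC bookkeeping (hypothetical names)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.active_events = 0
        self.watchdog_running = True

    def grab(self):
        # The lock spans the check, the side effect, and the increment,
        # so no other thread can observe a half-updated state.
        with self.lock:
            if self.active_events == 0:
                self.watchdog_running = False  # first user stops the watchdog
            self.active_events += 1

    def release(self):
        with self.lock:
            self.active_events -= 1
            if self.active_events == 0:
                self.watchdog_running = True  # last user restarts it


def stress(state, workers=8, rounds=1000):
    """Hammer grab/release from several threads and report the final state."""
    def worker():
        for _ in range(rounds):
            state.grab()
            state.release()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.active_events, state.watchdog_running
```

With the whole section locked, the count returns to zero and the watchdog flag is restored no matter how the threads interleave.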
Dave Aldridge [Tue, 29 Mar 2016 10:57:14 +0000 (03:57 -0700)]
sparc64: Fix incorrect counting when using multiple perf counters
Commit 165050c1 introduced a change to the way we deal with
performance counter overflow interrupts. This change had the
side effect that when a performance counter overflow was
detected it assumed all performance counters in use
had overflowed. Thus, when using multiple performance
counters the event counting was incorrect.
This commit fixes this incorrect counting behaviour.
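A minimal sketch of the counting difference, assuming a simplified model in which each overflow stands for a fixed number of events. The function names and the fixed period are illustrative assumptions, not taken from the driver.

```python
def credit_all(counters, period):
    """Behaviour described as the side effect: every counter in use is
    treated as overflowed, so every total is credited."""
    return {name: total + period for name, total in counters.items()}


def credit_overflowed(counters, overflowed, period):
    """Intended behaviour: credit only the counters whose overflow
    bit was actually set."""
    return {
        name: total + (period if name in overflowed else 0)
        for name, total in counters.items()
    }


if __name__ == "__main__":
    totals = {"cycles": 0, "instructions": 0}
    print(credit_all(totals, 1000))
    print(credit_overflowed(totals, {"cycles"}, 1000))
```

With two counters in use and only one overflowing, the first function inflates the second counter's total while the second leaves it untouched.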
Dave Aldridge [Fri, 4 Nov 2016 16:56:07 +0000 (09:56 -0700)]
sparc64: Fix a race condition when stopping performance counters
When stopping a performance counter that is close to overflowing,
there is a race condition that can occur between writing to the
PCRx register to stop the counter (and also clearing the PCRx.ov
bit at the same time) vs the performance counter overflowing and
setting the PCRx.ov bit in the PCRx register.
The result of this race condition is that we occasionally miss
a performance counter overflow interrupt, which in turn leads
to incorrect event counting.
This race condition has been observed when counting cpu cycles.
To fix this issue when stopping a performance counter,
we simply allow it to continue counting and overflow before
stopping it. This allows the performance counter overflow
interrupt to be generated and acted upon.
This fix is applied for M7, T5 and T4 devices.
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Signed-off-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
(cherry picked from commit e5b7619e1de2f3e0dd858f632bc08ce64c344245) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Aldridge [Fri, 4 Mar 2016 11:18:45 +0000 (03:18 -0800)]
sparc64: Stop performance counter before updating
In order to reliably clear the PCRx.ov bit when updating a
performance counter value, we need to stop it counting first.
If we do not do this, then we can miss performance counter
overflow events.
Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit a53c94ca8afc7a7603ff3c1154d81abb113a9e71)
Reviewed-by: Chris Hyser <chris.hyser@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit c33aebff52457ee7d0bacc922dc23b07cee4139a)
Thomas Tai [Fri, 11 Nov 2016 15:46:10 +0000 (10:46 -0500)]
sparc64: fix compile warning section mismatch in find_node()
A compile warning was introduced by a commit that fixed find_node().
This patch fixes the warning by moving find_node() into the __init
section. find_node() is only used by memblock_nid_range(), which is
only used by add_node_ranges(), an __init function, so find_node() and
memblock_nid_range() should also be inside the __init section.
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
(cherry picked from commit e58d08f923190fc4dc2a1962710f84672c2bc9b2) Signed-off-by: Allen Pais <allen.pais@oracle.com>
sun4v_build_irq() assumes the given irq number is valid and uses
it to get the handler pointer. The pointer is dereferenced
without being checked, which causes a kernel panic.
The cause of the invalid irq is that the tx/rx irqs are never
freed during device removal, so irq numbers end up exhausted during
continuous device add/removal tests.
The tx/rx irqs are allocated during vio_device_probe() using irq_alloc()
and cookie_assign(). To free them, cookie_unassign() and
irq_free() are called when the device is removed.
Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit 80043637b8fb1eabc16ab5947019f4dcdbb8c79f) Signed-off-by: Allen Pais <allen.pais@oracle.com>
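The resource-balance argument in the message above can be sketched with a toy IRQ pool. This is a simplified model under assumed names (IrqPool, VioDevice); it does not mirror the sun4v code, only the idea that every irq taken at probe time must be returned at removal and that an exhausted allocator must be checked for.

```python
class IrqPool:
    """Fixed-size pool standing in for the platform's irq numbers."""

    def __init__(self, size):
        self.free = list(range(size))

    def alloc(self):
        if not self.free:
            return None  # exhausted: callers must check before use
        return self.free.pop()

    def release(self, irq):
        self.free.append(irq)


class VioDevice:
    """Hypothetical device holding one tx and one rx irq."""

    def __init__(self, pool):
        self.pool = pool
        self.tx_irq = None
        self.rx_irq = None

    def probe(self):
        tx = self.pool.alloc()
        rx = self.pool.alloc()
        if tx is None or rx is None:
            # Undo the partial allocation instead of using an invalid irq.
            for irq in (tx, rx):
                if irq is not None:
                    self.pool.release(irq)
            return False
        self.tx_irq, self.rx_irq = tx, rx
        return True

    def remove(self):
        # Pair every allocation made in probe() with a release.
        for irq in (self.tx_irq, self.rx_irq):
            if irq is not None:
                self.pool.release(irq)
        self.tx_irq = self.rx_irq = None


def add_remove_cycles(pool_size, cycles):
    """Repeated add/remove; True if the pool is intact afterwards."""
    pool = IrqPool(pool_size)
    for _ in range(cycles):
        dev = VioDevice(pool)
        if not dev.probe():
            return False
        dev.remove()
    return len(pool.free) == pool_size
```

If remove() skipped the release, a pool of four irqs would be exhausted after two add/remove cycles, which is the exhaustion the commit message describes.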
Aaron Young [Wed, 2 Nov 2016 17:00:29 +0000 (13:00 -0400)]
SPARC64: ldmvsw: tx queue stuck in stopped state after LDC reset
The following patch fixes an issue with the ldmvsw driver where
the network connection of a guest domain becomes non-functional after
the guest domain has panic'd and rebooted.
The root cause was determined to be from the following series of
events:
1. Guest domain panics - resulting in the guest no longer processing
network packets (from ldmvsw driver)
2. The ldmvsw driver (in the control domain) eventually exerts flow
control due to no more available tx drings and stops the tx queue
for the guest domain
3. The LDC of the network connection for the guest is reset when
the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver and the ldmvsw
responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is
the normal method to re-enable the tx queue. But the ACK never comes
because the tx queue was cleared due to the LDC reset.
To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on a LDC reset. This prevents the ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Orabug: 24714685
(cherry picked from commit d84ad41602ceb070c05d2633bc09d81f66796e15) Signed-off-by: Allen Pais <allen.pais@oracle.com>
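The five-step sequence above can be traced with a small state model. This is only a sketch under assumed names (TxQueue, wake_queue); the real driver works on dring descriptors and netdev queues, which are not modelled here.

```python
class TxQueue:
    """Toy tx queue with flow control driven by a fixed-size ring."""

    def __init__(self, ring_size):
        self.ring_size = ring_size
        self.pending = 0
        self.stopped = False

    def send(self):
        if self.stopped:
            return False
        self.pending += 1
        if self.pending == self.ring_size:
            self.stopped = True  # flow control: no free tx descriptors
        return True

    def data_ack(self):
        # Normal path: the peer acknowledges data and the queue resumes.
        self.pending = 0
        self.stopped = False

    def ldc_reset(self, wake_queue=True):
        # The reset clears outstanding work, so no ACK will arrive for it.
        self.pending = 0
        if wake_queue:
            self.stopped = False  # the fix: re-enable the queue here


if __name__ == "__main__":
    q = TxQueue(ring_size=2)
    q.send()
    q.send()
    q.ldc_reset(wake_queue=False)
    print("old behaviour, can send:", q.send())
    q.ldc_reset()
    print("fixed behaviour, can send:", q.send())
```

Passing wake_queue=False reproduces the stuck state: the ring is empty but the queue stays stopped because nothing is left to acknowledge.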
Babu Moger [Thu, 13 Oct 2016 17:36:48 +0000 (10:36 -0700)]
sparc: Implement watchdog_nmi_enable and watchdog_nmi_disable
Implement functions watchdog_nmi_enable and watchdog_nmi_disable
to enable/disable nmi watchdogs. Sparc uses arch specific nmi watchdog
handler. Currently, we do not have a way to enable/disable nmi watchdog
dynamically. With these patches we can enable or disable arch
specific nmi watchdogs using proc or sysctl interface.
Example commands.
To enable: echo 1 > /proc/sys/kernel/nmi_watchdog
To disable: echo 0 > /proc/sys/kernel/nmi_watchdog
It can also be achieved using the sysctl parameter kernel.nmi_watchdog
Signed-off-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 43e96774e0a338e883e9ced9e717424df126b153) Signed-off-by: Allen Pais <allen.pais@oracle.com>
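For scripted use, the same toggle can be wrapped in a small helper. This is a sketch: the helper names are hypothetical, writing the real file needs root privileges, and the path parameter exists only so the functions can be exercised against an ordinary file.

```python
from pathlib import Path

# Path taken from the example commands in the commit message.
NMI_WATCHDOG = Path("/proc/sys/kernel/nmi_watchdog")


def set_nmi_watchdog(enabled, path=NMI_WATCHDOG):
    """Write 1 or 0 to the control file, like the echo commands above."""
    Path(path).write_text("1\n" if enabled else "0\n")


def get_nmi_watchdog(path=NMI_WATCHDOG):
    """Return True if the control file currently reads 1."""
    return Path(path).read_text().strip() == "1"
```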
Atish Patra [Thu, 20 Oct 2016 00:33:29 +0000 (18:33 -0600)]
sparc64: Setup a scheduling domain for highest level cache.
Each scheduler domain should represent a level of the hierarchy made up
of cores sharing a similar property. Currently, no separate
scheduler domain is defined for the cores that share
the last level cache. As a result, the scheduler fails to take
advantage of cache locality while migrating tasks during load
balancing.
Here are the cpu masks currently present for sparc that are/can
be used in scheduler domain construction.
cpu_core_map : set based on the cores that share l1 cache.
core_core_sib_map : is set based on the socket id or max cache id.
The prior SPARC notion of socket was defined as highest level of
shared cache. However, the MD record on T7 platforms now describes
the CPUs that share the physical socket and this is no longer tied
to shared cache.
That's why a separate cpu mask needs to be created that truly
represents the highest level of shared cache for all platforms.
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Chris Hyser <chris.hyser@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1e655ca52bb2727471f20cf4d8f62b4b9f69e6fc) Signed-off-by: Allen Pais <allen.pais@oracle.com>
PCIe analyzer tracing by Oracle and Samsung revealed an erratum in Samsung's
firmware for EPIC SSDs where invalid completion entries can occur in the admin
queue and IO queues when the queues straddle an 8MB DMA address boundary.
This patch limits admin queue depth to 64 for EPIC SSDs.
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Keith Busch [Wed, 14 Dec 2016 22:38:35 +0000 (14:38 -0800)]
nvme: Limit command retries
Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
was exceeded, which takes an unnecessarily long period of time.
This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.
The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 25256529
Conflicts:
Patched the commits manually due to the lack of core.c file
drivers/nvme/host/pci.c
drivers/nvme/host/nvme.h
Signed-off-by: Ashok Vairavan <ashok.vairavan@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
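The retry policy described in this commit can be sketched as follows. It is a simplified model, not the driver's code: the function names are hypothetical, the timeout check is left out, and only the default of 5 and the "count attempts made" meaning of the retries field come from the message.

```python
MAX_RETRIES = 5  # default named in the commit message


def should_retry(dnr_set, retries_so_far, max_retries=MAX_RETRIES):
    """Retry only if the controller did not set DNR (Do Not Retry)
    and the attempt budget is not used up."""
    if dnr_set:
        return False
    return retries_so_far < max_retries


def submit(command, max_retries=MAX_RETRIES):
    """Run command() until it succeeds or may no longer be retried.

    command() returns (ok, dnr_set). The result is (ok, retries_used),
    where retries_used counts retries attempted, not retries allowed.
    """
    retries = 0
    while True:
        ok, dnr_set = command()
        if ok:
            return True, retries
        if not should_retry(dnr_set, retries, max_retries):
            return False, retries
        retries += 1
```

A command that always fails without DNR is therefore attempted once plus max_retries more times and then failed, instead of being retried until the timeout expires.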
Marta Rybczynska [Sun, 18 Dec 2016 18:21:19 +0000 (10:21 -0800)]
nvme: avoid cqe corruption when update at the same time as read
Make sure the CQE phase (validity) is read before the rest of the
structure. The phase bit is the highest address and the CQE
read will happen on most platforms from lower to upper addresses
and will be done by multiple non-atomic loads. If the structure
is updated by PCI during the reads from the processor, the
processor may get a corrupted copy.
The addition of the new nvme_cqe_valid function that verifies
the validity bit also allows refactoring of the other CQE read
sequences.
Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d783e0bd02e700e7a893ef4fa71c69438ac1c276)
Orabug: 24960824
Conflicts:
nvme_poll() function is not available in UEK4QU2. Resolved
the conflicts around nvme poll function.
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
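The "check the phase first, then read the rest" rule can be illustrated with a toy completion ring. Python has no memory-ordering concerns, so this only shows the order of checks and the phase flip on wrap-around; the class, field, and helper names are assumptions, loosely following the nvme_cqe_valid name in the message.

```python
from dataclasses import dataclass


@dataclass
class Cqe:
    result: int = 0
    command_id: int = 0
    status: int = 0  # bit 0 is the phase tag


class CompletionQueue:
    def __init__(self, depth):
        self.entries = [Cqe() for _ in range(depth)]
        self.head = 0
        self.phase = 1  # phase expected for new entries on the first pass

    def cqe_valid(self):
        # Look only at the phase bit; nothing else is read yet.
        return (self.entries[self.head].status & 1) == self.phase

    def poll(self):
        done = []
        while self.cqe_valid():
            # The entry is known to be complete, so the remaining
            # fields can be read safely.
            done.append(self.entries[self.head].command_id)
            self.head += 1
            if self.head == len(self.entries):
                self.head = 0
                self.phase ^= 1  # expected phase flips on each wrap
        return done


def device_post(cq, slot, command_id, phase):
    """Model of the device writing an entry, phase bit last."""
    entry = cq.entries[slot]
    entry.command_id = command_id
    entry.status = (entry.status & ~1) | phase
```

An entry whose phase bit does not match the expected phase is never read beyond that bit, so a half-written entry cannot be consumed.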
This patch changes the return type of ena_set_push_mode() to be void,
as it always returns 0.
Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 184b49c89f39f5c5ad262a6456248284e10984c6) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Fix to return a negative error code from the invalid dma width
error handling case instead of 0.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6e22066fd02b675260b980b3e42b7d616a9839c5) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 557bc7d44d52d52374bc72e9cc3b0beb41026886) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
This is a driver for the ENA family of networking devices.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1738cd3ed342294360d6a74d4e58800004bff854) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Although an extent tree depth of 5 should be enough for the worst
case of 2**32 extents of length 1, the extent tree code does not
currently merge nodes which are less than half-full with a sibling
node, or shrink the tree depth if possible. So it's possible, at
least in theory, for the tree depth to be greater than 5. However,
even in the worst case, a tree depth of 32 is highly unlikely, and if
the file system is maliciously corrupted, an insanely large eh_depth
can cause memory allocation failures that will trigger kernel warnings
(here, eh_depth = 65280):
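As a rough illustration of the sanity check this message argues for, the sketch below parses a 12-byte ext4 extent header and rejects an implausible depth. The limit of 32 is an assumption drawn from the "tree depth of 32 is highly unlikely" remark, and the function name and error strings are hypothetical.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A
MAX_SANE_DEPTH = 32  # assumed limit, see the lead-in


def check_extent_header(raw, max_depth=MAX_SANE_DEPTH):
    """Return None if the header looks sane, else a short error string.

    Layout (little-endian): eh_magic, eh_entries, eh_max, eh_depth as
    16-bit fields, followed by a 32-bit eh_generation.
    """
    magic, entries, max_entries, depth, _generation = struct.unpack_from(
        "<HHHHI", raw
    )
    if magic != EXT4_EXT_MAGIC:
        return "invalid magic"
    if entries > max_entries:
        return "invalid eh_entries"
    if depth > max_depth:
        return "too large eh_depth"
    return None


if __name__ == "__main__":
    corrupted = struct.pack("<HHHHI", EXT4_EXT_MAGIC, 1, 4, 65280, 0)
    print(check_extent_header(corrupted))
```

Rejecting the header up front means nothing is ever allocated based on the corrupted depth value.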
Use set_posix_acl, which includes proper permission checks, instead of
calling ->set_acl directly. Without this anyone may be able to grant
themselves permissions to a file by setting the ACL.
Lock the inode to make the new checks atomic with respect to set_acl.
(Also, nfsd was the only caller of set_acl not locking the inode, so I
suspect this may fix other races.)
This also simplifies the code, and ensures our ACLs are checked by
posix_acl_valid.
The permission checks and the inode locking were lost with commit 4ac7249e, which changed nfsd to use the set_acl inode operation directly
instead of going through xattr handlers.
Reported-by: David Sinquin <david@sinquin.eu>
[agreunba@redhat.com: use set_posix_acl] Fixes: 4ac7249e Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 28a6d048eb841a6bca10558a6c9e30ec5ca2b1af) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.
Running LTP msgsnd06 with kmemleak gives the following:
Otherwise, there is potential for both DMF_SUSPENDED* and
DMF_NOFLUSH_SUSPENDING to not be set during dm_suspend() -- which is
definitely _not_ a valid state.
This fix, in conjunction with "dm rq: fix the starting and stopping of
blk-mq queues", addresses the potential for request-based DM multipath's
__multipath_map() to see !dm_noflush_suspending() during suspend.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 834ced1a13bdf510eae34d08bae1f1ae49b33141) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Improve dm_stop_queue() to cancel any requeue_work. Also, have
dm_start_queue() and dm_stop_queue() clear/set the QUEUE_FLAG_STOPPED
for the blk-mq request_queue.
On suspend dm_stop_queue() handles stopping the blk-mq request_queue
BUT: even though the hw_queues are marked BLK_MQ_S_STOPPED at that point
there is still a race that is allowing block/blk-mq.c to call ->queue_rq
against a hctx that it really shouldn't. Add a check to
dm_mq_queue_rq() that guards against this rarity (albeit _not_
race-free).
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # must patch dm.c on < 4.8 kernels Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 655fe78746d0b9141fe763535fc16d6652665c13) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
When the corrupt_bio_byte feature was introduced it caused READ bios to
no longer be errored with -EIO during the down_interval. This had to do
with the complexity of needing to submit READs if the corrupt_bio_byte
feature was used.
Fix it so READ bios are properly errored with -EIO; doing so early in
flakey_map() as long as there isn't a match for the corrupt_bio_byte
feature.
While following a symbolic link we receive err_buf from SMB2_open().
The validity of the SMB2 error response is checked earlier
in smb2_check_message(), but the symbolic link payload is not checked at all.
Fix it by adding such checks.
Cc: Dan Carpenter <dan.carpenter@oracle.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit a3b180a9da61b9be52e1bcf8ff54b4cea3ce332c) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
if, when mounting //HOST/share/sub/dir/foo we can query /sub/dir/foo but
not any of the path components above:
- store the /sub/dir/foo prefix in the cifs super_block info
- in the superblock, set root dentry to the subpath dentry (instead of
the share root)
- set a flag in the superblock to remember it
- use prefixpath when building path from a dentry
fixes bso#8950
Signed-off-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit b7e61a108f9fccd8d1b90ffe62704b929ba841eb) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we
have a hard load dependency between i8042 and atkbd which prevents
keyboard from working on Gen2 Hyper-V VMs.
> hyperv_keyboard invokes serio_interrupt(), which needs a valid serio
> driver like atkbd.c. atkbd.c depends on libps2.c because it invokes
> ps2_command(). libps2.c depends on i8042.c because it invokes
> i8042_check_port_owner(). As a result, hyperv_keyboard actually
> depends on i8042.c.
>
> For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a
> Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m
> rather than =y, atkbd.ko can't load because i8042.ko can't load(due to
> no i8042 device emulated) and finally hyperv_keyboard can't work and
> the user can't input: https://bugs.archlinux.org/task/39820
> (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y)
To break the dependency we move away from using i8042_check_port_owner()
and instead allow the serio port owner to specify a mutex that clients should use
to serialize the PS/2 command stream.
Reported-by: Mark Laws <mdl@60hz.org> Tested-by: Mark Laws <mdl@60hz.org> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit b5e8e7f655d3d01430c357bdebe72a6ddc19e9a7) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signing a module should only make it trusted by the specific kernel it
was built for, not anything else. Loading a signed module meant for a
kernel with a different ABI could have interesting effects.
Therefore, treat all signatures as invalid when a module is
force-loaded.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 6ac9857245bfb71d836f46db817b0c11e3e4bf69) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signing a module should only make it trusted by the specific kernel it
was built for, not anything else. If a module signing key is used for
multiple ABI-incompatible kernels, the modules need to include enough
version information to distinguish them.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit e9071d07878c865449c5afb062e6d305c62d1e85) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) arguments for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
record(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 634a3fc5f16470e9b78ccd7ce643305122d5ebb2) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
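The double fetch can be illustrated in userspace with a fetch callback that returns different data on each call. This is a simplified sketch: the function names are hypothetical, and the "needs hex" rule (a double quote or any byte outside 0x21..0x7e) is an approximation of the audit check rather than a copy of it.

```python
def needs_hex(data):
    """True if the argument contains bytes that must not be logged raw."""
    return any(b < 0x21 or b > 0x7E or b == 0x22 for b in data)


def log_arg_double_fetch(fetch):
    """Flawed pattern: decide on one copy, log another."""
    encode = needs_hex(fetch())  # first fetch: decide how to log
    data = fetch()               # second fetch: may have changed
    if encode:
        return data.hex().upper()
    return '"' + data.decode("ascii", "replace") + '"'


def log_arg_single_fetch(fetch):
    """Fixed pattern: fetch once, then scan and log the same buffer."""
    data = fetch()
    if needs_hex(data):
        return data.hex().upper()
    return '"' + data.decode("ascii") + '"'


def racing_fetch():
    """Simulate user memory that changes between two reads."""
    values = iter([b"ls", b'x"\n'])
    return lambda: next(values)
```

With the racing source, the double-fetch version emits a raw quote and newline into the record, while the single-fetch version logs exactly the bytes it inspected.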
The "fix" in commit 0b08c5e5944 ("audit: Fix check of return value of
strnlen_user()") didn't fix anything, it broke things. As reported by
Steven Rostedt:
"Yes, strnlen_user() returns 0 on fault, but if you look at what len is
set to, than you would notice that on fault len would be -1"
because we just subtracted one from the return value. So testing
against 0 doesn't test for a fault condition, it tests against a
perfectly valid empty string.
Also fix up the usual braindamage wrt using WARN_ON() inside a
conditional - make it part of the conditional and remove the explicit
unlikely() (which is already part of the WARN_ON*() logic, exactly so
that you don't have to write unreadable code).
Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Paul Moore <pmoore@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit a4664afa0dffd5340c61511d3da14e30bfd01517) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
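The arithmetic behind this fix is easy to check in isolation. The model below is a sketch: strnlen_user_model is a stand-in that returns the length including the terminating NUL, or 0 on fault, as the messages describe, and the classifier names are hypothetical.

```python
def strnlen_user_model(s, fault=False):
    """Length including the trailing NUL, or 0 if the access faulted."""
    return 0 if fault else len(s) + 1


def broken_is_fault(ret):
    """The broken test: compares against 0 after subtracting one."""
    length = ret - 1
    return length == 0


def fixed_is_fault(ret):
    """The corrected test: a fault leaves the length at -1."""
    length = ret - 1
    return length < 0


if __name__ == "__main__":
    for label, ret in (
        ("fault", strnlen_user_model("", fault=True)),
        ("empty string", strnlen_user_model("")),
        ("ls", strnlen_user_model("ls")),
    ):
        print(label, broken_is_fault(ret), fixed_is_fault(ret))
```

The broken test flags a valid empty string and misses the real fault; the corrected test does the opposite.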
strnlen_user() returns 0 when it hits fault, not -1. Fix the test in
audit_log_single_execve_arg(). Luckily this shouldn't ever happen unless
there's a kernel bug so it's mostly a cosmetic fix.
CC: Paul Moore <pmoore@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit a49b282f08d96cd73838e4e1a5ace747d432ba7d) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The secmech hmac(md5) structures are present in the TCP_Server_Info
struct and can be shared among multiple CIFS sessions. However, the
server mutex is not currently held when these structures are allocated
and used, which can lead to kernel crashes, as in the scenario below:
Commit d548b34b062 ("dm: reduce the queue delay used in dm_request_fn
from 100ms to 10ms") always intended the value to be 10 msecs -- it
just expressed it in jiffies because earlier commit 7eaceaccab ("block:
remove per-queue plugging") did.
Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: d548b34b062 ("dm: reduce the queue delay used in dm_request_fn from 100ms to 10ms") Cc: stable@vger.kernel.org # 4.1+ -- stable@ backports must be applied to drivers/md/dm.c Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit abf9569225763ab83a538530454f7d280fd08e4a) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.
EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
Aborting journal on device loop0-8.
EXT4-fs (loop0): Remounting filesystem read-only
EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
[...]
See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list") Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 881052c264a9a44481f1a29d4877478e54b4c690) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
When opening a file with O_CREAT flag, check to see if the file opened
is an existing directory.
This prevents the directory from being opened which subsequently causes
a crash when the close function for directories cifs_closedir() is called
which frees up the file->private_data memory while the file is still
listed on the open file list for the tcon.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 9b01eafbc9514e71056e0a1a4714606385a431a4) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
If s_reserved_gdt_blocks is extremely large, it's possible for
ext4_init_block_bitmap(), which is called when ext4 sets up an
uninitialized block bitmap, to corrupt random kernel memory. Add the
same checks which e2fsck has --- it must never be larger than
blocksize / sizeof(__u32) --- and then add a backup check in
ext4_init_block_bitmap() in case the superblock gets modified after
the file system is mounted.
If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up. In that case, the iput() on the journal inode can
end up causing a BUG().
Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.
Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data") Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang") Cc: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit e19f0ec5aeb659cb7dd2bf6e4f1e842f5ad71fcf) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
deadlock in ext4_writepages() which was previously much harder to hit.
After this commit xfstest generic/130 reproduces the deadlock on small
filesystems.
The problem happens when ext4_do_update_inode() sets LARGE_FILE feature
and marks current inode handle as synchronous. That subsequently results
in ext4_journal_stop() called from ext4_writepages() to block waiting for
transaction commit while still holding page locks, reference to io_end,
and some prepared bio in mpd structure each of which can possibly block
transaction commit from completing and thus results in deadlock.
Fix the problem by releasing page locks, io_end reference, and
submitting prepared bio before calling ext4_journal_stop().
[ Changed to defer the call to ext4_journal_stop() only if the handle
is synchronous. --tytso ]
Reported-and-tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 906d6f4d9cdc8509c505f29f6146ec627fef2f06) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:
ext4_lblk_t last = lblock + len - 1;
if (len == 0 || lblock > last)
return 0;
since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).
Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()") Fixes: 2f974865f ("ext4: check for zero length extent explicitly") Cc: Eryu Guan <guaneryu@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit c580d82e1d5532b785e339450d43f82e5a8b4e79) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
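The wraparound behind the ext4_valid_extent() fix above can be reproduced in userspace. This is a minimal sketch, assuming a 32-bit unsigned stand-in for ext4_lblk_t; the names lblk_t, valid_extent_old(), valid_extent_new() and demo_extent_boundary() are hypothetical and only mirror the two tests quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ext4_lblk_t: a 32-bit unsigned logical block. */
typedef uint32_t lblk_t;

/* Old test from the commit message: computes the last block of the extent. */
int valid_extent_old(lblk_t lblock, lblk_t len)
{
	lblk_t last = (lblk_t)(lblock + len - 1);

	if (len == 0 || lblock > last)
		return 0;
	return 1;
}

/* New test: rejects len == 0 and any extent whose end wraps around. */
int valid_extent_new(lblk_t lblock, lblk_t len)
{
	if ((lblk_t)(lblock + len) <= lblock)
		return 0;
	return 1;
}

/* Usage: the boundary case from the commit message. */
void demo_extent_boundary(void)
{
	assert(valid_extent_old(UINT32_MAX, 1) == 1);	/* wrongly accepted */
	assert(valid_extent_new(UINT32_MAX, 1) == 0);	/* rejected */
}
```

The explicit casts keep the arithmetic at 32 bits even on a platform where int is wider, which is what makes the wraparound visible.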
Backport of caaee6234d05a58c5b4d05e7bf766131b810a657 ("ptrace: use fsuid,
fsgid, effective creds for fs access checks") to v4.1 failed to update the
mode parameter in the mm_access() call in pagemap_read() to have one of the
new PTRACE_MODE_*CREDS flags.
Attempting to read any other process' pagemap results in a WARN()
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
Then a NULL slot and non-zero iter.tags are passed to radix_tree_next_slot(),
leading to a crash:
Currently, osd_weight and osd_state fields are updated in the encoding
order. This is wrong, because an incremental map may look like e.g.
new_up_client: { osd=6, addr=... } # set osd_state and addr
new_state: { osd=6, xorstate=EXISTS } # clear osd_state
Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down). After
applying new_up_client, osd_state is changed to EXISTS | UP. Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state. A non-existent OSD is considered down by the
mapping code
        for (i = 0; i < pg->pg_temp.len; i++) {
                if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
                        if (ceph_can_shift_osds(pi))
                                continue;

                        temp->osds[temp->size++] = CRUSH_ITEM_NONE;
and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:
[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
and hung rbds on the client:
[ 493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[ 493.566805] rbd: rbd0: result -6 xferred 400000
[ 493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed
Fixes: http://tracker.ceph.com/issues/14901 Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 6831c98ce0b8a3e88db64aa224372effd0dcc694) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
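The ordering problem in the libceph fix above comes down to an OR followed by an XOR on the same flag word. The sketch below is illustrative only: the flag values and the helper names apply_up_client(), apply_new_state() and demo_osd_state_order() are assumptions, not the real libceph definitions.

```c
#include <assert.h>

/* Illustrative flag values; treat them as assumptions, not the real header. */
#define OSD_EXISTS (1u << 0)
#define OSD_UP     (1u << 1)

/* new_up_client: mark the OSD up (a newly up OSD also exists). */
unsigned int apply_up_client(unsigned int state)
{
	return state | OSD_EXISTS | OSD_UP;
}

/* new_state: flip the bits named in xorstate. */
unsigned int apply_new_state(unsigned int state, unsigned int xorstate)
{
	return state ^ xorstate;
}

void demo_osd_state_order(void)
{
	unsigned int start = OSD_EXISTS;	/* osd6 exists but is down */
	unsigned int bad, good;

	/* Encoding order: ends up "!EXISTS but UP". */
	bad = apply_new_state(apply_up_client(start), OSD_EXISTS);
	assert(bad == OSD_UP);

	/* new_state first, then new_up_client: ends up EXISTS | UP. */
	good = apply_up_client(apply_new_state(start, OSD_EXISTS));
	assert(good == (OSD_EXISTS | OSD_UP));
}
```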
The size of an individual keymap in drivers/tty/vt/keyboard.c is NR_KEYS,
which is currently 256, whereas the number of keys/buttons in an input device
(and therefore in key_down) is much larger: KEY_CNT, currently 768. That can
cause an out-of-bounds access when we do
sym = U(key_maps[0][k]);
with large 'k'.
To fix it we should not attempt iterating beyond the smaller of NR_KEYS and
KEY_CNT.
Also while at it let's switch to for_each_set_bit() instead of open-coding
it.
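The bound described above can be sketched in userspace as follows. NR_KEYS and KEY_CNT use the values quoted in the commit message; everything else (key_is_down(), sum_pressed_syms(), demo_keymap_bound(), and the plain loop standing in for for_each_set_bit()) is a hypothetical illustration rather than the kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_KEYS 256	/* size of one keymap, per the commit message */
#define KEY_CNT 768	/* number of input keys/buttons, per the commit message */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS ((KEY_CNT + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Return nonzero if bit k is set in the key_down bitmap. */
int key_is_down(const unsigned long *key_down, size_t k)
{
	return (int)((key_down[k / BITS_PER_LONG] >> (k % BITS_PER_LONG)) & 1UL);
}

/* Sum keymap entries of pressed keys, never indexing keymap[] past NR_KEYS. */
unsigned long sum_pressed_syms(const unsigned long *key_down,
			       const unsigned short *keymap)
{
	size_t limit = NR_KEYS < KEY_CNT ? NR_KEYS : KEY_CNT;
	unsigned long sum = 0;
	size_t k;

	for (k = 0; k < limit; k++)
		if (key_is_down(key_down, k))
			sum += keymap[k];
	return sum;
}

void demo_keymap_bound(void)
{
	unsigned long key_down[BITMAP_LONGS] = { 0 };
	unsigned short keymap[NR_KEYS] = { 0 };

	keymap[5] = 42;
	key_down[5 / BITS_PER_LONG] |= 1UL << (5 % BITS_PER_LONG);
	key_down[700 / BITS_PER_LONG] |= 1UL << (700 % BITS_PER_LONG);

	/* Key 700 is pressed but lies beyond the keymap, so it is skipped. */
	assert(sum_pressed_syms(key_down, keymap) == 42);
}
```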
Fix a memory leak on probe error of the airspy usb device driver.
The problem is triggered when more than 64 usb devices register with
v4l2 of type VFL_TYPE_SDR or VFL_TYPE_SUBDEV.
The memory leak is caused by the probe function of the airspy driver
mishandling errors and not freeing the corresponding control structures
when an error occurs registering the device to v4l2 core.
A badusb device can emulate 64 of these devices, and then through
continual emulated connect/disconnect of the 65th device, cause the
kernel to run out of RAM and crash, thus causing a local DoS
vulnerability.
It's possible to isolate some freepages in a pageblock and then fail
split_free_page() due to the low watermark check. In this case, we hit
VM_BUG_ON() because the freeing scanner terminated early without a
contended lock or enough freepages.
This should never have been a VM_BUG_ON() since it's not a fatal
condition. It should have been a VM_WARN_ON() at best, or even handled
gracefully.
Regardless, we need to terminate anytime the full pageblock scan was not
done. The logic belongs in isolate_freepages_block(), so handle its
state gracefully by terminating the pageblock loop and making a note to
restart at the same pageblock next time since it was not possible to
complete the scan this time.
[rientjes@google.com: don't rescan pages in a pageblock] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1607111244150.83138@chino.kir.corp.google.com Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606291436300.145590@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Minchan Kim <minchan@kernel.org> Tested-by: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit fe071fb0d4e9fd40fe7c46c6a9f8f23d5f27e92f) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Handling the position where compaction free scanner should restart
(stored in cc->free_pfn) got more complex with commit e14c720efdd7 ("mm,
compaction: remember position within pageblock in free pages scanner").
Currently the position is updated in each loop iteration of
isolate_freepages(), although it should be enough to update it only when
breaking from the loop. There's also an extra check outside the loop that
updates the position in case we have met the migration scanner.
This can be simplified if we move the test for having isolated enough
from the for-loop header next to the test for contention, and
determining the restart position only in these cases. We can reuse the
isolate_start_pfn variable for this instead of setting cc->free_pfn
directly. Outside the loop, we can simply set cc->free_pfn to current
value of isolate_start_pfn without any extra check.
Also add a VM_BUG_ON to catch possible mistake in the future, in case we
later add a new condition that terminates isolate_freepages_block()
prematurely without also considering the condition in
isolate_freepages().
Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit ca0d868322c49b0d6ee4dfaae94a28e12969552c) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The chmap ctls assigned to PCM streams are freed in the PCM disconnect
callback. However, since the disconnect callback isn't called when
the card gets freed before registering, the chmap ctls may still be
left assigned. They are eventually freed together with other ctls,
but it may cause an Oops at pcm_chmap_ctl_private_free(), as the
function refers to the assigned PCM stream, while the PCM objects have
been already freed beforehand.
The fix is to free the chmap ctls also at PCM free callback, not only
at PCM disconnect.
Right now when a new overlay inode is created, we initialize overlay
inode's ->i_mode from underlying inode ->i_mode but we retain only
file type bits (S_IFMT) and discard permission bits.
This patch changes it and retains permission bits too. This should allow
overlay to do permission checks on overlay inode itself in task context.
[SzM] It also fixes clearing suid/sgid bits on write.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 31534f8fead7e0dff7ba68bd5dfcf6a9dfe908bc) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Before 4bacc9c9234c ("overlayfs: Make f_path...") file->f_path pointed to
the underlying file, hence suid/sgid removal on write worked fine.
After that patch file->f_path pointed to the overlay file, and the file
mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid
removal simply stopped working.
The fix is to copy the mode bits, but then ovl_setattr() needs to clear
ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in
the next patch copy the mode.
Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit cb75f65fe798bcac694f6bde299c52d31bdc8e96) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
When I pulled in 4.1.28 into my stable 4.1-rt tree and ran the tests,
it crashed with a severe OOM killing everything. I then tested 4.1.28
without -rt and it had the same issue. I did a bisect between 4.1.27
and 4.1.28 and found that the bug started at:
commit 8f182270dfec "mm/swap.c: flush lru pvecs on compound page
arrival"
Looking at that patch and what's in mainline, I see that there's a
mismatch in one of the hunks:
As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and
CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
both Intel and AMD systems. Doing any kind of hardware capability
checks in the driver as a prerequisite was wrong anyway: With the
hypervisor being in charge, all such checking should be done by it. If
ACPI data gets uploaded despite some missing capability, the hypervisor
is free to ignore part or all of that data.
Ditch the entire check_prereq() function, and do the only valid check
(xen_initial_domain()) in the caller in its place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit db86fac6fe0f05f02be9fbc5fcfa236f3209076c) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
It fixed a local root exploit but also introduced a dependency on
the lower file system implementing an mmap operation just to open a file,
which is a bit of a heavy hammer. The right fix is to have mmap depend
on the existence of the mmap handler instead.
A qeth_card contains a napi_struct linked to the net_device during
device probing. This struct must be deleted when removing the qeth
device, otherwise a panic on oops can occur when qeth devices are
repeatedly removed and added.
Fixes: a1c3ed4c9ca ("qeth: NAPI support for l2 and l3 discipline") Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Tested-by: Alexander Klein <ALKL@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit df3f23d87fdbbb76f41949be380d1bdb094f033a) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
get_task_ioprio() accesses the task->io_context without holding the task
lock and thus can race with exit_io_context(), leading to a
use-after-free. The reproducer below hits this within a few seconds on
my 4-core QEMU VM:
int main(int argc, char **argv)
{
pid_t pid, child;
long nproc, i;
Fix boot crash that triggers if this driver is built into a kernel and
run on non-AMD systems.
AMD northbridge users call amd_cache_northbridges(), and it returns
a negative value to signal that we weren't able to cache/detect any
northbridges on the system.
At least, it should do so, as all its callers expect it to. But it
returns a negative value only when kmalloc() fails.
Fix it to return -ENODEV if there are no NBs cached. Otherwise, amd_nb
users such as amd64_edac, which rely on it to know whether they should
load or not, get loaded on systems like Intel Xeons where they
shouldn't.
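The return-value contract described above can be illustrated with a small sketch. The functions cache_northbridges(), driver_init() and demo_no_northbridges() are hypothetical stand-ins, not the kernel's amd_nb code; the only point is that "nothing detected" must be reported as a negative errno so callers decline to load.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cache routine: 0 on success, negative errno on failure. */
int cache_northbridges(size_t detected)
{
	if (detected == 0)
		return -ENODEV;	/* nothing found: tell callers not to load */
	return 0;
}

/* Hypothetical caller, in the style of a driver init function. */
int driver_init(size_t detected)
{
	int err = cache_northbridges(detected);

	if (err < 0)
		return err;	/* bail out on systems without northbridges */
	return 0;
}

void demo_no_northbridges(void)
{
	assert(driver_init(0) == -ENODEV);	/* e.g. a non-AMD system */
	assert(driver_init(2) == 0);
}
```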
If we fall back to using LSI on the Croc or Crocodile chip we need to
clear the interrupt so we don't hang the system.
Cc: <stable@vger.kernel.org> Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 9e0303eb05f9c4f8920570a4b047e2ae3a7964eb) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
'commpage_bak' is allocated with 'sizeof(struct echoaudio)' bytes.
We then copy 'sizeof(struct comm_page)' bytes in it.
On my system, smatch complains because one is 2960 and the other is 3072.
The well-spotted fallocate undo fix is good in most cases, but not when
fallocate failed on the very first page. index 0 then passes lend -1
to shmem_undo_range(), and that has two bad effects: (a) that it will
undo every fallocation throughout the file, unrestricted by the current
range; but more importantly (b) it can cause the undo to hang, because
lend -1 is treated as truncation, which makes it keep on retrying until
every page has gone, but those already fully instantiated will never go
away. Big thank you to xfstests generic/269 which demonstrates this.
Fixes: b9b4bb26af01 ("tmpfs: don't undo fallocate past its last page") Cc: stable@vger.kernel.org Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 116c75f642ad4a6d267f399c9f1fe8b91fb822c5) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
A system call trace trigger on entry allows the tracing
process to inspect and potentially change the traced
process's registers.
Account for that by reloading the %g1 (syscall number)
and %i0-%i5 (syscall argument) values. We need to be
careful to revalidate the range of %g1, and reload the
system call table entry it corresponds to into %l7.
Reported-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d2e4e89ae871295c539334f50368bb48f74c3caf) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
When we free cb->skb after a dump, we do it after releasing the
lock. This means that a new dump could have started in the
meantime and we'll end up freeing their skb instead of ours.
This patch saves the skb and module before we unlock so we free
the right memory.
Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit e39cd93be0009ae4548a737756a947d2030956ab) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
This adds a name to each buf_ops structure, so that if
a verifier fails we can print the type of verifier that
failed it. Should be a slight debugging aid, I hope.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 186e7c38727f7a9fecbf238bbff9675e83842a99) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
In the very unlikely case __tcp_retransmit_skb() cannot use the cloning
done in tcp_transmit_skb(), we need to refresh skb_mstamp before doing
the copy and transmit, otherwise TCP TS val will be an exact copy of
the original transmit.
Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 90eb6718b9db5c145f7c2d4a14df6a4b8d96e7b3) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Stack object "dte_facilities" is allocated in x25_rx_call_request(),
and is supposed to be initialized in x25_negotiate_facilities().
However, 5 fields (8 bytes in total) are not initialized. This
object is then copied to userland via copy_to_user(), so an infoleak
occurs.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit b2b95b3fbd93c910210922809f6c4d24be172b1c) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
get_bridge_ifindices() is used from the old "deviceless" bridge ioctl
calls which aren't called with rtnl held. The comment above says that it is
called with rtnl but that is not really the case.
Here's a sample output from a test ASSERT_RTNL() which I put in
get_bridge_ifindices and executed "brctl show":
[ 957.422726] RTNL: assertion failed at net/bridge//br_ioctl.c (30)
[ 957.422925] CPU: 0 PID: 1862 Comm: brctl Tainted: G W O 4.6.0-rc4+ #157
[ 957.423009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[ 957.423009] 0000000000000000 ffff880058adfdf0 ffffffff8138dec5 0000000000000400
[ 957.423009] ffffffff81ce8380 ffff880058adfe58 ffffffffa05ead32 0000000000000001
[ 957.423009] 00007ffec1a444b0 0000000000000400 ffff880053c19130 0000000000008940
[ 957.423009] Call Trace:
[ 957.423009] [<ffffffff8138dec5>] dump_stack+0x85/0xc0
[ 957.423009] [<ffffffffa05ead32>] br_ioctl_deviceless_stub+0x212/0x2e0 [bridge]
[ 957.423009] [<ffffffff81515beb>] sock_ioctl+0x22b/0x290
[ 957.423009] [<ffffffff8126ba75>] do_vfs_ioctl+0x95/0x700
[ 957.423009] [<ffffffff8126c159>] SyS_ioctl+0x79/0x90
[ 957.423009] [<ffffffff8163a4c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
Since it only reads bridge ifindices, we can use rcu to safely walk the net
device list. Also remove the wrong rtnl comment above.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 806d70c7da5bc0dc43e54fe0362fc0fcf573bfe2) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Use htons instead of unconditionally byte swapping nexthdr. On
little-endian systems shifting the byte is correct behavior, but it results in
incorrect csums on big-endian architectures.
Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Carol Soto <clsoto@us.ibm.com> Tested-by: Carol Soto <clsoto@us.ibm.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 8bba1625512245771bdb2cb1502697228fe7e1b2) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
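The difference between htons() and an unconditional shift, as described in the mlx4_en fix above, can be seen in a short userspace sketch. The helpers nexthdr_to_be16(), nexthdr_shifted() and demo_nexthdr() are hypothetical; only htons() from <arpa/inet.h> is a real API here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Endian-safe: htons() swaps bytes only on little-endian hosts. */
uint16_t nexthdr_to_be16(uint8_t nexthdr)
{
	return htons(nexthdr);
}

/* Unconditional shift: equals htons() only when the host is little-endian. */
uint16_t nexthdr_shifted(uint8_t nexthdr)
{
	return (uint16_t)(nexthdr << 8);
}

void demo_nexthdr(void)
{
	uint16_t v = nexthdr_to_be16(6);	/* 6 is IPPROTO_TCP */
	unsigned char bytes[2];

	memcpy(bytes, &v, sizeof(v));
	/* Network byte order: most significant byte first on every host. */
	assert(bytes[0] == 0 && bytes[1] == 6);
}
```

On a big-endian host the shifted value has a different in-memory layout from the htons() result, which matches the incorrect checksums the commit describes.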
The stack object “map” has a total size of 32 bytes. Its last 4
bytes are padding generated by compiler. These padding bytes are
not initialized and sent out via “nla_put”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 9a9390bcf56680c487a8e4c89c813a48bfedc4b6) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The stack object “info” has a total size of 12 bytes. Its last byte
is padding which is not initialized and leaked via “put_cmsg”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 5923f46563d1ce74c1f1178cba5a67735bb83e6d) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
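The padding leaks in the two fixes above share one common remedy: clear the whole object before filling in its members. The sketch below assumes a hypothetical struct demo_info and helpers fill_info(), all_bytes_zero() and demo_padding(); it is not the kernel structure from either commit. It relies on mainstream compilers leaving padding untouched by member stores, which is the usual practical assumption rather than a language guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record; on common ABIs it carries trailing padding. */
struct demo_info {
	uint32_t id;
	uint16_t port;
	uint8_t  flag;
};

/* Zero the whole object first so padding never holds stale stack data. */
void fill_info(struct demo_info *info, uint32_t id, uint16_t port,
	       uint8_t flag)
{
	memset(info, 0, sizeof(*info));
	info->id = id;
	info->port = port;
	info->flag = flag;
}

/* Return 1 if every byte of the object representation is zero. */
int all_bytes_zero(const void *obj, size_t len)
{
	const unsigned char *p = obj;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}

void demo_padding(void)
{
	struct demo_info info;

	memset(&info, 0xAA, sizeof(info));	/* simulate stale stack contents */
	fill_info(&info, 0, 0, 0);
	assert(all_bytes_zero(&info, sizeof(info)));
}
```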
When the bottom qdisc decides to, for example, drop some packet,
it calls qdisc_tree_decrease_qlen() to update the queue length
for all its ancestors. We need to update the backlog too, to
keep the stats on the root qdisc accurate.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 236094acb4b5c224d68d3b279941d24d555078d4) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
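The idea of updating both counters on every ancestor can be sketched with a simplified parent chain. The struct demo_qdisc and the functions tree_reduce_backlog() and demo_backlog() are hypothetical; the real qdisc hierarchy is looked up differently and is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified qdisc node with a link to its parent. */
struct demo_qdisc {
	struct demo_qdisc *parent;
	unsigned int qlen;	/* packets queued at or below this node */
	unsigned int backlog;	/* bytes queued at or below this node */
};

/* Propagate a drop of n packets totalling len bytes to every ancestor. */
void tree_reduce_backlog(struct demo_qdisc *sch, unsigned int n,
			 unsigned int len)
{
	struct demo_qdisc *q;

	for (q = sch->parent; q != NULL; q = q->parent) {
		q->qlen -= n;
		q->backlog -= len;
	}
}

void demo_backlog(void)
{
	struct demo_qdisc root = { NULL, 3, 4500 };
	struct demo_qdisc leaf = { &root, 3, 4500 };

	/* The leaf drops one 1500-byte packet and adjusts its own counters. */
	leaf.qlen -= 1;
	leaf.backlog -= 1500;
	tree_reduce_backlog(&leaf, 1, 1500);

	assert(root.qlen == 2 && root.backlog == 3000);
}
```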
Remove nearly duplicated code and prepare for the following patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 11316d7eef2230000bb4fa3d4b9056690fda3ef2) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
When multiple skb are TX-completed in a row, we might incorrectly keep
a timestamp of a prior skb and cause extra work.
Fixes: ec693d47010e8 ("net/mlx4_en: Add HW timestamping (TS) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit eeee948a652e9ee02643bf031cd613339ccc6864) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
After commit fbd40ea0180a ("ipv4: Don't do expensive useless work
during inetdev destroy.") when deleting an interface,
fib_del_ifaddr() can be executed without any primary address
present on the dead interface.
The above is safe, but triggers some "bug: prim == NULL" warnings.
This commit avoids the warning if the in_dev is dead.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 2c5ac2bfe56da842b7c84e99e9811f118b6f7a17) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
A failure in validate_xmit_skb_list() triggered an unconditional call
to dev_requeue_skb with skb=NULL. This slowly grows the queue
discipline's qlen count until all traffic through the queue stops.
We take the optimistic approach and continue running the queue after a
failure since it is unknown if later packets also will fail in the
validate path.
Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock") Signed-off-by: Lars Persson <larper@axis.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 5730fd5d72071c9ae96929292351029ea56de7c0) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Because we fail to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.
Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].
Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 4b5223be98e1972e177f319159a29eb3bab2720e) Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
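The fix above amounts to copying the meaningful address bytes and clearing the rest of the fixed-size buffer. Here is a minimal userspace sketch, assuming a hypothetical struct demo_mclist and buffer size DEMO_ADDR_LEN; set_mc_addr() and demo_mc_addr() are illustrative names, not the kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ADDR_LEN 32	/* hypothetical fixed buffer size */

struct demo_mclist {
	unsigned short alen;
	unsigned char addr[DEMO_ADDR_LEN];
};

/* Copy alen address bytes and clear the tail so no stale bytes remain. */
int set_mc_addr(struct demo_mclist *i, const unsigned char *addr,
		unsigned short alen)
{
	if (alen > sizeof(i->addr))
		return -1;
	i->alen = alen;
	memcpy(i->addr, addr, alen);
	memset(i->addr + alen, 0, sizeof(i->addr) - alen);
	return 0;
}

void demo_mc_addr(void)
{
	const unsigned char mac[6] = { 1, 2, 3, 4, 5, 6 };
	struct demo_mclist entry;
	size_t k;

	memset(&entry, 0xFF, sizeof(entry));	/* simulate stale heap contents */
	assert(set_mc_addr(&entry, mac, sizeof(mac)) == 0);
	for (k = sizeof(mac); k < sizeof(entry.addr); k++)
		assert(entry.addr[k] == 0);
}
```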