www.infradead.org Git - users/dwmw2/linux.git/log

s390/mm: fix race on mm->context.flush_mm

commit 60f07c8ec5fae06c23e9fd7bab67dabce92b3414 upstream.

The order in __tlb_flush_mm_lazy is to flush TLB first and then clear
the mm->context.flush_mm bit. This can lead to missed flushes as the
bit can be set anytime, the order needs to be the other way aronud.

But this leads to a different race, __tlb_flush_mm_lazy may be called
on two CPUs concurrently. If mm->context.flush_mm is cleared first then
another CPU can bypass __tlb_flush_mm_lazy although the first CPU has
not done the flush yet. In a virtualized environment the time until the
flush is finally completed can be arbitrarily long.

Add a spinlock to serialize __tlb_flush_mm_lazy and use the function
in finish_arch_post_lock_switch as well.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/mm: fix local TLB flushing vs. detach of an mm address space

commit b3e5dc45fd1ec2aa1de6b80008f9295eb17e0659 upstream.

The local TLB flushing code keeps an additional mask in the mm.context,
the cpu_attach_mask. At the time a global flush of an address space is
done the cpu_attach_mask is copied to the mm_cpumask in order to avoid
future global flushes in case the mm is used by a single CPU only after
the flush.

Trouble is that the reset of the mm_cpumask is racy against the detach
of an mm address space by switch_mm. The current order is first the
global TLB flush and then the copy of the cpu_attach_mask to the
mm_cpumask. The order needs to be the other way around.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()

commit 3ef0c7a730de0bae03d86c19570af764fa3c4445 upstream.

As we want to remove spin_unlock_wait() and replace it with explicit
spin_lock()/spin_unlock() calls, we can use this to simplify the
locking.

In addition:
- Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
- The new code avoids the backwards loop.

Only slightly tested, I did not manage to trigger calls to
nf_conntrack_all_lock().

V2: With improved comments, to clearly show how the barriers
pair.

Fixes: b16c29191dc8 ("netfilter: nf_conntrack: use safer way to lock all buckets")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI: pciehp: Report power fault only once until we clear it

commit 7612b3b28c0b900dcbcdf5e9b9747cc20a1e2455 upstream.

When a power fault occurs, the power controller sets Power Fault Detected
in the Slot Status register, and pciehp_isr() queues an INT_POWER_FAULT
event to handle it.

It also clears Power Fault Detected, but since nothing has yet changed to
correct the power fault, the power controller will likely set it again
immediately, which may cause an infinite loop when pcie_isr() rechecks
Slot Status.

Fix that by masking off Power Fault Detected from new events if the driver
hasn't seen the power fault clear from the previous handling attempt.

Fixes: fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones")
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog, pull test out and add comment]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI: shpchp: Enable bridge bus mastering if MSI is enabled

commit 48b79a14505349a29b3e20f03619ada9b33c4b17 upstream.

An SHPC may generate MSIs to notify software about slot or controller
events (SHPC spec r1.0, sec 4.7). A PCI device can only generate an MSI if
it has bus mastering enabled.

Enable bus mastering if the bridge contains an SHPC that uses MSI for event
notifications.

Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARC: Re-enable MMU upon Machine Check exception

commit 1ee55a8f7f6b7ca4c0c59e0b4b4e3584a085c2d3 upstream.

I recently came upon a scenario where I would get a double fault
machine check exception tiriggered by a kernel module.
However the ensuing crash stacktrace (ksym lookup) was not working
correctly.

Turns out that machine check auto-disables MMU while modules are allocated
in kernel vaddr spapce.

This patch re-enables the MMU before start printing the stacktrace
making stacktracing of modules work upon a fatal exception.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: moved code into low level handler to avoid in 2 places]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Apply trace_clock changes to instance max buffer

commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52 upstream.

Currently trace_clock timestamps are applied to both regular and max
buffers only for global trace. For instance trace, trace_clock
timestamps are applied only to regular buffer. But, regular and max
buffers can be swapped, for example, following a snapshot. So, for
instance trace, bad timestamps can be seen following a snapshot.
Let's apply trace_clock timestamps to instance max buffer as well.

Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com
Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers")
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Fix clear of RECORDED_TGID flag when disabling trace event

commit 7685ab6c58557c6234f3540260195ecbee7fc4b3 upstream.

When disabling one trace event, the RECORDED_TGID flag in the event
file is not correctly cleared. It's clearing RECORDED_CMD flag when
it should clear RECORDED_TGID flag.

Link: http://lkml.kernel.org/r/1504589806-8425-1-git-send-email-chuhu@redhat.com
Cc: Joel Fernandes <joelaf@google.com>
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Add barrier to trace_printk() buffer nesting modification

commit 3d9622c12c8873911f4cc0ccdabd0362c2fca06b upstream.

trace_printk() uses 4 buffers, one for each context (normal, softirq, irq
and NMI), such that it does not need to worry about one context preempting
the other. There's a nesting counter that gets incremented to figure out
which buffer to use. If the context gets preempted by another context which
calls trace_printk() it will increment the counter and use the next buffer,
and restore the counter when it is finished.

The problem is that gcc may optimize the modification of the buffer nesting
counter and it may not be incremented in memory before the buffer is used.
If this happens, and the context gets interrupted by another context, it
could pick the same buffer and corrupt the one that is being used.

Compiler barriers need to be added after the nesting variable is incremented
and before it is decremented to prevent usage of the context buffers by more
than one context at the same time.

Cc: Andy Lutomirski <luto@kernel.org>
Fixes: e2ace00117 ("tracing: Choose static tp_printk buffer by explicit nesting count")
Hat-tip-to: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ftrace: Fix memleak when unregistering dynamic ops when tracing disabled

commit edb096e00724f02db5f6ec7900f3bbd465c6c76f upstream.

If function tracing is disabled by the user via the function-trace option or
the proc sysctl file, and a ftrace_ops that was allocated on the heap is
unregistered, then the shutdown code exits out without doing the proper
clean up. This was found via kmemleak and running the ftrace selftests, as
one of the tests unregisters with function tracing disabled.

# cat kmemleak
unreferenced object 0xffffffffa0020000 (size 4096):
  comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s)
  hex dump (first 32 bytes):
    55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89  U.t$.UH...t$.UH.
    e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c  .H......H.D$PH.L
  backtrace:
    [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0
    [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0
    [<ffffffff8109697f>] module_alloc+0x4f/0x90
    [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420
    [<ffffffff81249947>] ftrace_startup+0xe7/0x300
    [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90
    [<ffffffff81263786>] trace_selftest_ops+0x204/0x397
    [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624
    [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7
    [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192
    [<ffffffff81002230>] do_one_initcall+0x90/0x1e2
    [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe
    [<ffffffff81d61ec3>] kernel_init+0x13/0x122
    [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40
    [<ffffffffffffffff>] 0xffffffffffffffff

Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ftrace: Fix selftest goto location on error

commit 46320a6acc4fb58f04bcf78c4c942cc43b20f986 upstream.

In the second iteration of trace_selftest_ops(), the error goto label is
wrong in the case where trace_selftest_test_global_cnt is off. In the
case of error, it leaks the dynamic ops that was allocated.

Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ftrace: Fix debug preempt config name in stack_tracer_{en,dis}able

commit 60361e12d01676e23a8de89a5ef4a349ae97f616 upstream.

stack_tracer_disable()/stack_tracer_enable() had been using the wrong
name for the config symbol to enable their preempt-debugging checks --
fix with a word swap.

Link: http://lkml.kernel.org/r/20170831154036.4xldyakmmhuts5x7@hatter.bewilderbeest.net
Fixes: 8aaf1ee70e ("tracing: Rename trace_active to disable_stack_tracer and inline its modification")
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mailbox: bcm-flexrm-mailbox: Fix mask used in CMPL_START_ADDR_VALUE()

commit 6d2061b981af165d3e45462e0804b5a1f2f4c7bc upstream.

The mask used in CMPL_START_ADDR_VALUE() should be 27bits instead of
26bits. This incorrect mask was causing completion writes to 40bits
physical address fail.

This patch fixes mask used in CMPL_START_ADDR_VALUE() macro.

Fixes: dbc049eee730 ("mailbox: Add driver for Broadcom FlexRM
ring manager")

Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: qla2xxx: Fix an integer overflow in sysfs code

commit e6f77540c067b48dee10f1e33678415bfcc89017 upstream.

The value of "size" comes from the user.  When we add "start + size" it
could lead to an integer overflow bug.

It means we vmalloc() a lot more memory than we had intended.  I believe
that on 64 bit systems vmalloc() can succeed even if we ask it to
allocate huge 4GB buffers.  So we would get memory corruption and likely
a crash when we call ha->isp_ops->write_optrom() and ->read_optrom().

Only root can trigger this bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=194061
Fixes: b7cc176c9eb3 ("[SCSI] qla2xxx: Allow region-based flash-part accesses.")
Reported-by: shqking <shqking@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: qla2xxx: Use fabric name for Get Port Speed command

commit b2e8ae3f0e342a3308b4573790bd42528e51885a upstream.

The Get Port Speed switch command needs the fabric port name of the
remote device. Current code uses the registered WWPN.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: qla2xxx: Use BIT_6 to acquire FAWWPN from switch

commit fcc5b5cd726c0779cd689362aea82cc9d5a61346 upstream.

If FA-WWPN feature disabled on the switch side and enabled for the
adapter, then driver would update the port name with switch port name.

This patch fixes issue by checking correct BIT flag to validate.

Fixes: 41dc529a4602 ("qla2xxx: Improve RSCN handling in driver")
Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: qla2xxx: Fix target multiqueue configuration

commit b7edfa235effb4b4a9816c2345620b11609c123e upstream.

Following error will be logged in to message file while trying to
configure target with multiqueue.

"Cmd 0x1f aborted with timeout since ISP Abort is pending"
"qla25xx_init_queues Rsp que: 1 init failed."

Fixes: 82de802ad46e ("scsi: qla2xxx: Preparation for Target MQ.")
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Michael Hernandez <michael.hernandez@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: qla2xxx: Correction to vha->vref_count timeout

commit 6e98095f8fb6d98da34c4e6c34e69e7c638d79c0 upstream.

Fix incorrect second argument for wait_event_timeout()

Fixes: c4a9b538ab2a ("qla2xxx: Allow vref count to timeout on vport delete.")
Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: qla2xxx: Update fw_started flags at qpair creation.

commit e6373f33a6bba0de9f543f4a7faeaaa536c62997 upstream.

Fixes: 4b60c82736d0 ("scsi: qla2xxx: Add fw_started flags to qpair")
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE

commit 3e0097499839e0fe3af380410eababe5a47c4cf9 upstream.

When calling SG_GET_REQUEST_TABLE ioctl only a half-filled table is
returned; the remaining part will then contain stale kernel memory
information. This patch zeroes out the entire table to avoid this
issue.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: sg: factor out sg_fill_request_table()

commit 4759df905a474d245752c9dc94288e779b8734dd upstream.

Factor out sg_fill_request_table() for better readability.

[mkp: typos, applied by hand]

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: storvsc: fix memory leak on ring buffer busy

commit 0208eeaa650c5c866a3242201678a19e6dc4a14e upstream.

When storvsc is sending I/O to Hyper-v, it may allocate a bigger buffer
descriptor for large data payload that can't fit into a pre-allocated
buffer descriptor. This bigger buffer is freed on return path.

If I/O request to Hyper-v fails due to ring buffer busy, the storvsc
allocated buffer descriptor should also be freed.

[mkp: applied by hand]

Fixes: be0cf6ca301c ("scsi: storvsc: Set the tablesize based on the information given by the host")
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead

commit eb3fe263a48b0d27b229c213929c4cb3b1b39a0f upstream.

After a kill adapter, since the cmd_status is not set, the IOCTLs will
be hung in driver resulting in application hang. Set cmd_status
MFI_STAT_WRONG_STATE when completing pended IOCTLs.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: megaraid_sas: Check valid aen class range to avoid kernel panic

commit 91b3d9f0069c8307d0b3a4c6843b65a439183318 upstream.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: megaraid_sas: set minimum value of resetwaittime to be 1 secs

commit e636a7a430f41efb0ff2727960ce61ef9f8f6769 upstream.

Setting resetwaittime to 0 during a FW fault will result in driver not
calling the OCR.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: megaraid_sas: mismatch of allocated MFI frame size and length exposed in MFI MPT pass through command

commit ed2983f458bed9dc827ec60c8486253b1669bb52 upstream.

Driver allocated 256 byte MFI frames bytes but while sending MFI frame
(embedded inside chain frame of MPT frame) to firmware, driver sets the
length as 4k. This results in DMA read error messages during boot.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: aacraid: Fix command send race condition

commit 1ae948fa4f00f3a2823e7cb19a3049ef27dd6947 upstream.

This fixes a potential race condition observed on Power systems.

Several places throughout the aacraid driver call aac_fib_send or
similar to send a command to the aacraid adapter, then check the return
code to determine if the command was actually sent to the adapter, then
update the phase field in the scsi command scratch pad area to track
that the firmware now owns this command.  However, there is nothing that
ensures that by the time the aac_fib_send function returns and we go to
write to the scsi command, that the command hasn't already completed and
the scsi command has been freed.  This was causing random crashes in the
TCP stack which was tracked down to be caused by memory that had been a
struct request + scsi_cmnd being now used for an skbuff. Memory
poisoning was enabled in the kernel to debug this which showed that the
last owner of the memory that had been freed was aacraid and that it was
a struct request.  The memory that was corrupted was the exact data
pattern of AAC_OWNER_FIRMWARE and it was at the same offset that aacraid
writes, which is scsicmd->SCp.phase. The patch below resolves this
issue.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: qedi: off by one in qedi_get_cmd_from_tid()

commit fa2d9d6e894e096678a50ef0f65f7a8c3d8a40b8 upstream.

The > here should be >= or we end up reading one element beyond the end
of the qedi->itt_map[] array. The qedi->itt_map[] array is allocated in
qedi_alloc_itt().

Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: zfcp: trace high part of "new" 64 bit SCSI LUN

commit 5d4a3d0a2ff23799b956e5962b886287614e7fad upstream.

Complements debugging aspects of the otherwise functionally complete
v3.17 commit 9cb78c16f5da ("scsi: use 64-bit LUNs").

While I don't have access to a target exporting 3 or 4 level LUNs,
I did test it by explicitly attaching a non-existent fake 4 level LUN
by means of zfcp sysfs attribute "unit_add".
In order to see corresponding trace records of otherwise successful
events, we had to increase the trace level of area SCSI and HBA to 6.

$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level
$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level

$ echo 0x4011402240334044 > \
  /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add

Example output formatted by an updated zfcpdbf from the s390-tools
package interspersed with kernel messages at scsi_logging_level=4605:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : scsla_1
LUN            : 0x4011402240334044
WWPN           : 0x50050763031bd327
D_ID           : 0x00......
Adapter status : 0x5400050b
Port status    : 0x54000001
LUN status     : 0x41000000
Ready count    : 0x00000001
Running count  : 0x00000000
ERP want       : 0x01
ERP need       : 0x01

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 6
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : fs_norm
Request ID     : 0x<inquiry2-req-id>
Request status : 0x00000010
FSF cmnd       : 0x00000001
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001
Prot stat qual : ........ ........ 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x...
|
Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 6
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_nor
Request ID     : 0x<inquiry2-req-id>
SCSI ID        : 0x00000000
SCSI LUN       : 0x40224011
SCSI LUN high  : 0x40444033 <=======================
SCSI result    : 0x00000000
SCSI retries   : 0x00
SCSI allowed   : 0x03
SCSI scribble  : 0x<inquiry2-req-id>
SCSI opcode    : 12000000 a4000000 00000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000000 00000000
                 00000000 00000000

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0
scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \
no device added

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 9cb78c16f5da ("scsi: use 64-bit LUNs")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response

commit fdb7cee3b9e3c561502e58137a837341f10cbf8b upstream.

At the default trace level, we only trace unsuccessful events including
FSF responses.

zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to
decide on an unsuccessful response. However, this is only one of multiple
possible sources determining a failed struct zfcp_fsf_req.

An FSF request can also "fail" if its response runs into an ERP timeout
or if it gets dismissed because a higher level recovery was triggered
[trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()].
FSF requests with ERP timeout are:
FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or
FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports,
FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN.
One example is slow queue processing which can cause follow-on errors,
e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out.
In order to see the root cause, we need to see late responses even if the
channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
Example trace records formatted with zfcpdbf from the s390-tools package:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : fcegpf1
LUN            : 0xffffffffffffffff
WWPN           : 0x<WWPN>
D_ID           : 0x00<D_ID>
Adapter status : 0x5400050b
Port status    : 0x41200000
LUN status     : 0x00000000
Ready count    : 0x00000001
Running count  : 0x...
ERP want       : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
ERP need       : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
|
Timestamp      : ... 30 seconds later
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 2
Tag            : erscf_2
LUN            : 0xffffffffffffffff
WWPN           : 0x<WWPN>
D_ID           : 0x00<D_ID>
Adapter status : 0x5400050b
Port status    : 0x41200000
LUN status     : 0x00000000
Request ID     : 0x<request_ID>
ERP status     : 0x10000000 ZFCP_STATUS_ERP_TIMEDOUT
ERP step       : 0x0800 ZFCP_ERP_STEP_PORT_OPENING
ERP action     : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
ERP count      : 0x00
|
Timestamp      : ... later than previous record
Area           : HBA
Subarea        : 00
Level          : 5 > default level => 3 <= default level
Exception      : -
CPU ID         : 00
Caller         : ...
Record ID      : 1
Tag            : fs_qtcb => fs_rerr
Request ID     : 0x<request_ID>
Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED
| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd       : 0x00000005
FSF sequence no: 0x...
FSF issued     : ... > 30 seconds ago
FSF stat       : 0x00000000 FSF_GOOD
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001 FSF_PROT_GOOD
Prot stat qual : 00000000 00000000 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x00000000
QTCB log length: ...
QTCB log info  : ...

In case of problems detecting that new responses are waiting on the input
queue, we sooner or later trigger adapter recovery due to an FSF request
timeout (trace tag "fsrth_1").
FSF requests with FSF request timeout are:
typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also
FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports,
FSF_QTCB_FCP_CMND for task management function (LUN / target reset).
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.

In a theroretical case, inject code can create an erroneous FSF request
on purpose. If data router is enabled, it uses deferred error reporting.
A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and
SAM_STAT_GOOD. But on writing the read data to host memory via DMA,
it can still fail, e.g. if an intentionally wrong scatter list does not
provide enough space. Rather than getting an unsuccessful response,
we get a QDIO activate check which in turn triggers adapter recovery.
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.
Example trace records formatted with zfcpdbf from the s390-tools package:

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 6 > default level => 3 <= default level
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : fs_norm => fs_rerr
Request ID     : 0x<request_ID2>
Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED
| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd       : 0x00000001
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000 FSF_GOOD
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001 FSF_PROT_GOOD
Prot stat qual : ........ ........ 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x...
|
Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x<request_ID2>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x000e0000 DID_TRANSPORT_DISRUPTED
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_ID2>
SCSI opcode    : 28... Read(10)
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000000 00000000
                                         ^^ SAM_STAT_GOOD
                 00000000 00000000

Only with luck in both above cases, we could see a follow-on trace record
of an unsuccesful event following a successful but late FSF response with
FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests
resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED
[On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets
ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
However, the reason for this follow-on trace was invisible because the
corresponding HBA trace record was missing at the default trace level
(by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open").

On adapter recovery, after we had shut down the QDIO queues, we perform
unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED
for each pending FSF request in zfcp_fsf_req_dismiss_all().
In order to find the root cause, we need to see all pseudo responses even
if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.

Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED
or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr".

It does not matter that there are numerous places which set
ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response
early. These cases are based on protocol status != FSF_PROT_GOOD or
== FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default
as trace tag "fs_perr" or "fs_ferr" respectively.

NB: The trace record with tag "fssrh_1" for status read buffers on dismiss
all remains. zfcp_fsf_req_complete() handles this and returns early.
All other FSF request types are handled separately and as described above.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features")
Fixes: 2e261af84cdb ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records

commit 12c3e5754c8022a4f2fd1e9f00d19e99ee0d3cc1 upstream.

If the FCP_RSP UI has optional parts (FCP_SNS_INFO or FCP_RSP_INFO) and
thus does not fit into the fsp_rsp field built into a SCSI trace record,
trace the full FCP_RSP UI with all optional parts as payload record
instead of just FCP_SNS_INFO as payload and
a 1 byte RSP_INFO_CODE part of FCP_RSP_INFO built into the SCSI record.

That way we would also get the full FCP_SNS_INFO in case a
target would ever send more than
min(SCSI_SENSE_BUFFERSIZE==96, ZFCP_DBF_PAY_MAX_REC==256)==96.

The mandatory part of FCP_RSP IU is only 24 bytes.
PAYload costs at least one full PAY record of 256 bytes anyway.
We cap to the hardware response size which is only FSF_FCP_RSP_SIZE==128.
So we can just put the whole FCP_RSP IU with any optional parts into
PAYload similarly as we do for SAN PAY since v4.9 commit aceeffbb59bb
("zfcp: trace full payload of all SAN records (req,resp,iels)").
This does not cause any additional trace records wasting memory.

Decoded trace records were confusing because they showed a hard-coded
sense data length of 96 even if the FCP_RSP_IU field FCP_SNS_LEN showed
actually less.

Since the same commit, we set pl_len for SAN traces to the full length of a
request/response even if we cap the corresponding trace.
In contrast, here for SCSI traces we set pl_len to the pre-computed
length of FCP_RSP IU considering SNS_LEN or RSP_LEN if valid.
Nonetheless we trace a hardcoded payload of length FSF_FCP_RSP_SIZE==128
if there were optional parts.
This makes it easier for the zfcpdbf tool to format only the relevant
part of the long FCP_RSP UI buffer. And any trailing information is still
available in the payload trace record just in case.

Rename the payload record tag from "fcp_sns" to "fcp_riu" to make the new
content explicit to zfcpdbf which can then pick a suitable field name such
as "FCP rsp IU all:" instead of "Sense info :"
Also, the same zfcpdbf can still be backwards compatible with "fcp_sns".

Old example trace record before this fix, formatted with the tool zfcpdbf
from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU id         : ..
Caller         : 0x...
Record id      : 1
Tag            : rsl_err
Request id     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000002
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_id>
SCSI opcode    : 00000000 00000000 00000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000202 00000000
                                       ^^==FCP_SNS_LEN_VALID
                 00000020 00000000
                 ^^^^^^^^==FCP_SNS_LEN==32
Sense len      : 96 <==min(SCSI_SENSE_BUFFERSIZE,ZFCP_DBF_PAY_MAX_REC)
Sense info     : 70000600 00000018 00000000 29000000
                 00000400 00000000 00000000 00000000
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous

New example trace records with this fix:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000002
SCSI retries   : 0x00
SCSI allowed   : 0x03
SCSI scribble  : 0x<request_id>
SCSI opcode    : a30c0112 00000000 02000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000a02 00000200
                 00000020 00000000
FCP rsp IU len : 56
FCP rsp IU all : 00000000 00000000 00000a02 00000200
                                       ^^=FCP_RESID_UNDER|FCP_SNS_LEN_VALID
                 00000020 00000000 70000500 00000018
                 ^^^^^^^^==FCP_SNS_LEN
                                   ^^^^^^^^^^^^^^^^^
                 00000000 240000cb 00011100 00000000
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 00000000 00000000
                 ^^^^^^^^^^^^^^^^^==FCP_SNS_INFO

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : lr_okay
Request ID     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000000
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_id>
SCSI opcode    : <CDB of unrelated SCSI command passed to eh handler>
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000100 00000000
                 00000000 00000008
FCP rsp IU len : 32
FCP rsp IU all : 00000000 00000000 00000100 00000000
                                       ^^==FCP_RSP_LEN_VALID
                 00000000 00000008 00000000 00000000
                          ^^^^^^^^==FCP_RSP_LEN
                                   ^^^^^^^^^^^^^^^^^==FCP_RSP_INFO

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: zfcp: fix missing trace records for early returns in TMF eh handlers

commit 1a5d999ebfc7bfe28deb48931bb57faa8e4102b6 upstream.

For problem determination we need to see that we were in scsi_eh
as well as whether and why we were successful or not.

The following commits introduced new early returns without adding
a trace record:

v2.6.35 commit a1dbfddd02d2
("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
on fc_block_scsi_eh() returning != 0 which is FAST_IO_FAIL,

v2.6.30 commit 63caf367e1c9
("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
on not having gotten an FSF request after the maximum number of retry
attempts and thus could not issue a TMF and has to return FAILED.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA

commit 9fe5d2b2fd30aa8c7827ec62cbbe6d30df4fe3e3 upstream.

Without this fix we get SCSI trace records on task management functions
which cannot be correlated to HBA trace records because all fields
related to the FSF request are empty (zero).
Also, the FCP_RSP_IU is missing as well as any sense data if available.

This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement
of zfcp debug features") introducing trace records for TMFs but
hard coding NULL for a possibly existing TMF FSF request.
The scsi_cmnd scribble is also zero or unrelated for the TMF request
so it also could not lookup a suitable FSF request from there.

A broken example trace record formatted with zfcpdbf from the s390-tools
package:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : lr_fail
Request ID     : 0x0000000000000000
                   ^^^^^^^^^^^^^^^^ no correlation to HBA record
SCSI ID        : 0x<scsitarget>
SCSI LUN       : 0x<scsilun>
SCSI result    : 0x000e0000
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x0000000000000000
SCSI opcode    : 2a000017 3bb80000 08000000 00000000
FCP rsp inf cod: 0x00
                   ^^ no TMF response
FCP rsp IU     : 00000000 00000000 00000000 00000000
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 00000000 00000000
                 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU
Sense len      : ...
^^^^^^^^^^^^^^^^^^^^ no sense data length
Sense info     : ...
^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present

There are some true cases where we really do not have an FSF request:
"rsl_fai" from zfcp_dbf_scsi_fail_send() called for early
returns / completions in zfcp_scsi_queuecommand(),
"abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from
zfcp_scsi_eh_abort_handler() where we did not get as far,
"lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're
successful and do not need to do anything because adapter stopped.
For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi().

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records

commit 975171b4461be296a35e83ebd748946b81cf0635 upstream.

v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records
(req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a884
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.

While it already saves space by not dumping any empty residual entries
of the large successful GPN_FT response (4 pages), there are seldom cases
where the GPN_FT response is unsuccessful and likely does not have
FC_NS_FID_LAST set in fp_flags so we did not cap the trace record.
We typically see such case for an initiator WWPN, which is not in any zone.

Cap unsuccessful responses to at least the actual basic CT_IU response
plus whatever fits the SAN trace record built-in "payload" buffer
just in case there's trailing information
of which we would at least see the existence and its beginning.

In order not to erroneously cap successful responses, we need to swap
calling the trace function and setting the CT / ELS status to success (0).

Example trace record pair formatted with zfcpdbf:

Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : fssct_1
Request ID     : 0x<request_id>
Destination ID : 0x00fffffc
SAN req short  : 01000000 fc020000 01720ffc 00000000
                 00000008
SAN req length : 20
|
Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 2
Tag            : fsscth2
Request ID     : 0x<request_id>
Destination ID : 0x00fffffc
SAN resp short : 01000000 fc020000 80010000 00090700
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
SAN resp length: 16384
San resp info  : 01000000 fc020000 80010000 00090700
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]

The fix saves all but one of the previously associated 64 PAYload trace
record chunks of size 256 bytes each.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)")
Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path

commit a099b7b1fc1f0418ab8d79ecf98153e1e134656e upstream.

Up until now zfcp would just ignore the FCP_RESID_OVER flag in the FCP
response IU. When this flag is set, it is possible, in regards to the
FCP standard, that the storage-server processes the command normally, up
to the point where data is missing and simply ignores those.

In this case no CHECK CONDITION would be set, and because we ignored the
FCP_RESID_OVER flag we resulted in at least a data loss or even
-corruption as a follow-up error, depending on how the
applications/layers on top behave. To prevent this, we now set the
host-byte of the corresponding scsi_cmnd to DID_ERROR.

Other storage-behaviors, where the same condition results in a CHECK
CONDITION set in the answer, don't need to be changed as they are
handled in the mid-layer already.

Following is an example trace record decoded with zfcpdbf from the
s390-tools package. We forcefully injected a fc_dl which is one byte too
small:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x...
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00070000
                     ^^DID_ERROR
SCSI retries   : 0x..
SCSI allowed   : 0x..
SCSI scribble  : 0x...
SCSI opcode    : 2a000000 00000000 08000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000400 00000001
                                       ^^fr_flags==FCP_RESID_OVER
                                         ^^fr_status==SAM_STAT_GOOD
                                            ^^^^^^^^fr_resid
                 00000000 00000000

As of now, we don't actively handle to possibility that a response IU
has both flags - FCP_RESID_OVER and FCP_RESID_UNDER - set at once.

Reported-by: Luke M. Hopkins <lmhopkin@us.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 553448f6c483 ("[SCSI] zfcp: Message cleanup")
Fixes: ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled

commit 71b8e45da51a7b64a23378221c0a5868bd79da4f upstream.

Since commit db007fc5e20c ("[SCSI] Command protection operation"),
scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to
SCSI_PROT_NORMAL.
Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand()
to only access any of scsi_prot_sg...() if
(scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).

Do the same thing for zfcp, which introduced DIX support with
commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for
DIF/DIX").

Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp,
because the regular SCSI command with DIX protection data, that scsi_eh
re-uses in scsi_send_eh_cmnd(), of course still has
(scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the
FCP channel hardware.

This causes scsi_eh_test_devices() to have (finish_cmds == 0)
[not SCSI device is online or not scsi_eh_tur() failed]
so regular SCSI commands, that caused / were affected by scsi_eh,
are moved to work_q and scsi_eh_test_devices() itself returns false.
In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs()
beyond host reset to finally scsi_eh_offline_sdevs()
which sets affected SCSI devices offline with the following kernel message:

"kernel: sd H:0:T:L: Device offlined - not ready after error recovery"

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

skd: Submit requests to firmware before triggering the doorbell

commit 5fbd545cd3fd311ea1d6e8be4cedddd0ee5684c7 upstream.

Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:

(skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

skd: Avoid that module unloading triggers a use-after-free

commit 7277cc67b3916eed47558c64f9c9c0de00a35cda upstream.

Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:

WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
refcount_t: underflow; use-after-free.
CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1
Workqueue: events work_for_cpu_fn
Call Trace:
dump_stack+0x63/0x84
__warn+0xcb/0xf0
warn_slowpath_fmt+0x5a/0x80
refcount_sub_and_test+0x70/0x80
refcount_dec_and_test+0x11/0x20
kobject_put+0x1f/0x50
blk_put_queue+0x15/0x20
disk_release+0xae/0xf0
device_release+0x32/0x90
kobject_release+0x67/0x170
kobject_put+0x2b/0x50
put_disk+0x17/0x20
skd_destruct+0x5c/0x890 [skd]
skd_pci_probe+0x124d/0x13a0 [skd]
local_pci_probe+0x42/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x19e/0x470
worker_thread+0x1dc/0x4a0
kthread+0x125/0x140
ret_from_fork+0x25/0x30

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/bitmap: disable bitmap_resize for file-backed bitmaps.

commit e8a27f836f165c26f867ece7f31eb5c811692319 upstream.

bitmap_resize() does not work for file-backed bitmaps.
The buffer_heads are allocated and initialized when
the bitmap is read from the file, but resize doesn't
read from the file, it loads from the internal bitmap.
When it comes time to write the new bitmap, the bh is
non-existent and we crash.

The common case when growing an array involves making the array larger,
and that normally means making the bitmap larger. Doing
that inside the kernel is possible, but would need more code.
It is probably easier to require people who use file-backed
bitmaps to remove them and re-add after a reshape.

So this patch disables the resizing of arrays which have
file-backed bitmaps. This is better than crashing.

Reported-by: Zhilong Liu <zlliu@suse.com>
Fixes: d60b479d177a ("md/bitmap: add bitmap_resize function to allow bitmap resizing.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/bitmap: copy correct data for bitmap super

commit 8031c3ddc70ab93099e7d1814382dba39f57b43e upstream.

raid5 cache could write bitmap superblock before bitmap superblock is
initialized. The bitmap superblock is less than 512B. The current code will
only copy the superblock to a new page and write the whole 512B, which will
zero the the data after the superblock. Unfortunately the data could include
bitmap, which we should preserve. The patch will make superblock read do 4k
chunk and we always copy the 4k data to new page, so the superblock write will
old data to disk and we don't change the bitmap.

Reported-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: directly insert blk-mq request from blk_insert_cloned_request()

commit 157f377beb710e84bd8bc7a3c4475c0674ebebd7 upstream.

A NULL pointer crash was reported for the case of having the BFQ IO
scheduler attached to the underlying blk-mq paths of a DM multipath
device.  The crash occured in blk_mq_sched_insert_request()'s call to
e->type->ops.mq.insert_requests().

Paolo Valente correctly summarized why the crash occured with:
"the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
blk_rq_prep_clone) creates a cloned request without invoking
e->type->ops.mq.prepare_request for the target elevator e.  The cloned
request is therefore not initialized for the scheduler, but it is
however inserted into the scheduler by blk_mq_sched_insert_request."

All said, a request-based DM multipath device's IO scheduler should be
the only one used -- when the original requests are issued to the
underlying paths as cloned requests they are inserted directly in the
underlying dispatch queue(s) rather than through an additional elevator.

But commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO
schedulers") switched blk_insert_cloned_request() from using
blk_mq_insert_request() to blk_mq_sched_insert_request().  Which
incorrectly added elevator machinery into a call chain that isn't
supposed to have any.

To fix this introduce a blk-mq private blk_mq_request_bypass_insert()
that blk_insert_cloned_request() calls to insert the request without
involving any elevator that may be attached to the cloned request's
request_queue.

Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: Relax a check in blk_start_queue()

commit 4ddd56b003f251091a67c15ae3fe4a5c5c5e390a upstream.

Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:

WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
RIP: 0010:blk_start_queue+0x84/0xa0
Call Trace:
skd_unquiesce_dev+0x12a/0x1d0 [skd]
skd_complete_internal+0x1e7/0x5a0 [skd]
skd_complete_other+0xc2/0xd0 [skd]
skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
skd_isr+0x14f/0x180 [skd]
irq_forced_thread_fn+0x2a/0x70
irq_thread+0x144/0x1a0
kthread+0x125/0x140
ret_from_fork+0x2a/0x40

Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc: Fix DAR reporting when alignment handler faults

commit f9effe925039cf54489b5c04e0d40073bb3a123d upstream.

Anton noticed that if we fault part way through emulating an unaligned
instruction, we don't update the DAR to reflect that.

The DAR value is eventually reported back to userspace as the address
in the SEGV signal, and if userspace is using that value to demand
fault then it can be confused by us not setting the value correctly.

This patch is ugly as hell, but is intended to be the minimal fix and
back ports easily.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/pseries: Don't attempt to acquire drc during memory hot add for assigned lmbs

commit afb5519fdb346201728040cab4e08ce53e7ff4fd upstream.

Check if an LMB is assigned before attempting to call dlpar_acquire_drc
in order to avoid any unnecessary rtas calls. This substantially
reduces the running time of memory hot add on lpars with large amounts
of memory.

[mpe: We need to explicitly set rc to 0 in the success case, otherwise
the compiler might think we use rc without initialising it.]

Fixes: c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/powernv/npu: Move tlb flush before launching ATSD

commit bab9f954aaf352127725a9b7920226abdb65b604 upstream.

The nest MMU tlb flush needs to happen before the GPU translation
shootdown is launched to avoid the GPU refilling its tlb with stale
nmmu translations prior to the nmmu flush completing.

Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cxl: Fix driver use count

commit 197267d0356004a31c4d6b6336598f5dff3301e1 upstream.

cxl keeps a driver use count, which is used with the hash memory model
on p8 to know when to upgrade local TLBIs to global and to trigger
callbacks to manage the MMU for PSL8.

If a process opens a context and closes without attaching or fails the
attachment, the driver use count is never decremented. As a
consequence, TLB invalidations remain global, even if there are no
active cxl contexts.

We should increment the driver use count when the process is attaching
to the cxl adapter, and not on open. It's not needed before the
adapter starts using the context and the use count is decremented on
the detach path, so it makes more sense.

It affects only the user api. The kernel api is already doing The
Right Thing.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Fixes: 7bb5d91a4dda ("cxl: Rework context lifetimes")
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix quota inconsistency during orphan cleanup for read-only mounts

commit 95f1fda47c9d8738f858c3861add7bf0a36a7c0b upstream.

Quota does not get enabled for read-only mounts if filesystem
has quota feature, so that quotas cannot updated during orphan
cleanup, which will lead to quota inconsistency.

This patch turn on quotas during orphan cleanup for this case,
make sure quotas can be updated correctly.

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix incorrect quotaoff if the quota feature is enabled

commit b0a5a9589decd07db755d6a8d9c0910d96ff7992 upstream.

Current ext4 quota should always "usage enabled" if the
quota feautre is enabled. But in ext4_orphan_cleanup(), it
turn quotas off directly (used for the older journaled
quota), so we cannot turn it on again via "quotaon" unless
umount and remount ext4.

Simple reproduce:

  mkfs.ext4 -O project,quota /dev/vdb1
  mount -o prjquota /dev/vdb1 /mnt
  chattr -p 123 /mnt
  chattr +P /mnt
  touch /mnt/aa /mnt/bb
  exec 100<>/mnt/aa
  rm -f /mnt/aa
  sync
  echo c > /proc/sysrq-trigger

  #reboot and mount
  mount -o prjquota /dev/vdb1 /mnt
  #query status
  quotaon -Ppv /dev/vdb1
  #output
  quotaon: Cannot find mountpoint for device /dev/vdb1
  quotaon: No correct mountpoint specified.

This patch add check for journaled quotas to avoid incorrect
quotaoff when ext4 has quota feautre.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets

commit 1bd8d6cd3e413d64e543ec3e69ff43e75a1cf1ea upstream.

In the ext4 implementations of SEEK_HOLE and SEEK_DATA, make sure we
return -ENXIO for negative offsets instead of banging around inside
the extent code and returning -EFSCORRUPTED.

Reported-by: Mateusz S <muttdini@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wcn36xx: Introduce mutual exclusion of fw configuration

commit 39efc7cc7ccf82d1cd946580cdb70760f347305a upstream.

As the association status changes the driver needs to configure the
hardware. This is done based on information in the "sta" acquired by
ieee80211_find_sta(), which requires the caller to ensure that the "sta"
is valid while its being used; generally by entering an rcu read
section.

But the operations acting on the "sta" has to communicate with the
firmware and may therefor sleep, resulting in the following report:

[   31.418190] BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:238
[   31.425919] in_atomic(): 0, irqs_disabled(): 0, pid: 34, name:
kworker/u8:1
[   31.434609] CPU: 0 PID: 34 Comm: kworker/u8:1 Tainted: G        W
4.12.0-rc4-next-20170607+ #993
[   31.441002] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC
(DT)
[   31.450380] Workqueue: phy0 ieee80211_iface_work
[   31.457226] Call trace:
[   31.461830] [<ffffff8008088c58>] dump_backtrace+0x0/0x260
[   31.464004] [<ffffff8008088f7c>] show_stack+0x14/0x20
[   31.469557] [<ffffff8008392e70>] dump_stack+0x98/0xb8
[   31.474592] [<ffffff80080e4330>] ___might_sleep+0xf0/0x118
[   31.479626] [<ffffff80080e43a8>] __might_sleep+0x50/0x88
[   31.485010] [<ffffff80088ff9a4>] mutex_lock+0x24/0x60
[   31.490479] [<ffffff8008595c38>] wcn36xx_smd_set_link_st+0x30/0x130
[   31.495428] [<ffffff8008591ed8>] wcn36xx_bss_info_changed+0x148/0x448
[   31.501504] [<ffffff80088ab3c4>]
ieee80211_bss_info_change_notify+0xbc/0x118
[   31.508102] [<ffffff80088f841c>] ieee80211_assoc_success+0x664/0x7f8
[   31.515220] [<ffffff80088e13d4>]
ieee80211_rx_mgmt_assoc_resp+0x144/0x2d8
[   31.521555] [<ffffff80088e1e20>]
ieee80211_sta_rx_queued_mgmt+0x190/0x698
[   31.528239] [<ffffff80088bc44c>] ieee80211_iface_work+0x234/0x368
[   31.535011] [<ffffff80080d81ac>] process_one_work+0x1cc/0x340
[   31.541086] [<ffffff80080d8368>] worker_thread+0x48/0x430
[   31.546814] [<ffffff80080de448>] kthread+0x108/0x138
[   31.552195] [<ffffff8008082ec0>] ret_from_fork+0x10/0x50

In order to ensure that the "sta" remains alive (and consistent) for the
duration of bss_info_changed() mutual exclusion has to be ensured with
sta_remove().

This is done by introducing a mutex to cover firmware configuration
changes, which is made to also ensure mutual exclusion between other
operations changing the state or configuration of the firmware. With
this we can drop the rcu read lock.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

regulator: cpcap: Fix standby mode

commit 91a024e80336528d12b67b5a2e636b9e4467d3ec upstream.

The original patch from Tony uses standby mode bit inverted, which is
not correct. This fixes all instances in the driver code for get & set
mode. This did not yet make problems, since mode has not been changed
by any mainline driver so far.

Fixes: 0ad4c07edd41 ("regulator: cpcap: Add basic regulator support")
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: AF_ALG - remove SGL terminator indicator when chaining

Fixed differently upstream as commit 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")

The SGL is MAX_SGL_ENTS + 1 in size. The last SG entry is used for the
chaining and is properly updated with the sg_chain invocation. During
the filling-in of the initial SG entries, sg_mark_end is called for each
SG entry. This is appropriate as long as no additional SGL is chained
with the current SGL. However, when a new SGL is chained and the last
SG entry is updated with sg_chain, the last but one entry still contains
the end marker from the sg_mark_end. This end marker must be removed as
otherwise a walk of the chained SGLs will cause a NULL pointer
dereference at the last but one SG entry, because sg_next will return
NULL.

The patch only applies to all kernels up to and including 4.13. The
patch 2d97591ef43d0587be22ad1b0d758d6df4999a0b added to 4.14-rc1
introduced a complete new code base which addresses this bug in
a different way. Yet, that patch is too invasive for stable kernels
and was therefore not marked for stable.

Fixes: 8ff590903d5fc ("crypto: algif_skcipher - User-space interface for skcipher operations")
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: caam/qi - properly set IV after {en,de}crypt

commit a68a193805224d90bedd94e9e8ac287600f07b78 upstream.

caam/qi needs a fix similar to what was done for caam/jr in
commit "crypto: caam/qi - properly set IV after {en,de}crypt",
to allow for ablkcipher/skcipher chunking/streaming.

Fixes: b189817cf789 ("crypto: caam/qi - add ablkcipher and authenc algorithms")
Suggested-by: David Gstir <david@sigma-star.at>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: caam/qi - fix typo in authenc alg driver name

commit 84ea95436b83884fa55780618ffaf4bbe3312166 upstream.

s/desi/des for echainiv(authenc(hmac(sha256),cbc(des))) alg.

Fixes: b189817cf7894 ("crypto: caam/qi - add ablkcipher and authenc algorithms")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: scompress - don't sleep with preemption disabled

commit 3c08377262880afc1621ab9cb6dbe7df47a6033d upstream.

Due to the use of per-CPU buffers, scomp_acomp_comp_decomp() executes
with preemption disabled, and so whether the CRYPTO_TFM_REQ_MAY_SLEEP
flag is set is irrelevant, since we cannot sleep anyway. So disregard
the flag, and use GFP_ATOMIC unconditionally.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: ccp - Fix XTS-AES-128 support on v5 CCPs

commit e652399edba99a5497f0d80f240c9075d3b43493 upstream.

Version 5 CCPs have some new requirements for XTS-AES: the type field
must be specified, and the key requires 512 bits, with each part
occupying 256 bits and padded with zeroes.

Signed-off-by: Gary R Hook <ghook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

docs: disable KASLR when debugging kernel

commit e604f1cb85367d2e5fd4cf253296d190996da81a upstream.

commit 6807c84652b0 ("x86: Enable KASLR by default") enables KASLR
by default on x86. While KASLR will confuse gdb which resolve kernel
symbol address from symbol table of vmlinux. We should turn off KASLR for
kernel debugging.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Reviewed-by: Kieran Bingham <kbingham@kernel.org>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Natale Patriciello <natale.patriciello@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MADDF|MSUBF>.D: Fix accuracy (64-bit case)

commit 2cfa58259f4b65b33ebe8f167019a1f89c6c3289 upstream.

Implement fused multiply-add with correct accuracy.

Fused multiply-add operation has better accuracy than respective
sequential execution of multiply and add operations applied on the
same inputs. This is because accuracy errors accumulate in latter
case.

This patch implements fused multiply-add with the same accuracy
as it is implemented in hardware, using 128-bit intermediate
calculations.

One test case example (raw bits) that this patch fixes:

MADDF.D fd,fs,ft:
fd = 0x00000ca000000000
fs = ft = 0x3f40624dd2f1a9fc

Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction")
Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction")
Signed-off-by: Douglas Leung <douglas.leung@imgtec.com>
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16891/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MADDF|MSUBF>.S: Fix accuracy (32-bit case)

commit b3b8e1eb27c523e32b6a8aa7ec8ac4754456af57 upstream.

Implement fused multiply-add with correct accuracy.

Fused multiply-add operation has better accuracy than respective
sequential execution of multiply and add operations applied on the
same inputs. This is because accuracy errors accumulate in latter
case.

This patch implements fused multiply-add with the same accuracy
as it is implemented in hardware, using 64-bit intermediate
calculations.

One test case example (raw bits) that this patch fixes:

MADDF.S fd,fs,ft:
fd = 0x22575225
fs = ft = 0x3727c5ac

Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction")
Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction")
Signed-off-by: Douglas Leung <douglas.leung@imgtec.com>
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16890/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Clean up "maddf_flags" enumeration

commit ae11c0619973ffd73a496308d8a1cb5e1a353737 upstream.

Fix definition and usage of "maddf_flags" enumeration. Avoid duplicate
definition and apply more common capitalization.

This patch does not change any scenario. It just makes MADDF and
MSUBF emulation code more readable and easier to maintain, and
hopefully prevents future bugs as well.

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16889/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix some cases of zero inputs

commit 7cf64ce4d37f1b4f44365fcf77f565d523819dcd upstream.

Fix the cases of <MADDF|MSUBF>.<D|S> when any of two multiplicands is
+0 or -0, and the third input is also +0 or -0. Depending on the signs
of inputs, certain special cases must be handled.

A relevant example:

MADDF.S fd,fs,ft:
If fs contains +0.0, ft contains -0.0, and fd contains 0.0, fd is
going to contain +0.0 (without this patch, it used to contain -0.0).

Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction")
Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction")
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16888/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix some cases of infinite inputs

commit 0c64fe6348687f0e1cea9a608eae9d351124a73a upstream.

Fix the cases of <MADDF|MSUBF>.<D|S> when any of two multiplicands is
infinity. The correct behavior in such cases is affected by the nature
of third input. Cases of addition of infinities with opposite signs
and subtraction of infinities with same signs may arise and must be
handles separately. Also, the value od flags argument (that determines
whether the instruction is MADDF or MSUBF) affects the outcome.

Relevant examples:

MADDF.S fd,fs,ft:
  If fs contains +inf, ft contains +inf, and fd contains -inf, fd is
  going to contain indef (without this patch, it used to contain
  -inf).

MSUBF.S fd,fs,ft:
  If fs contains +inf, ft contains 1.0, and fd contains +0.0, fd is
  going to contain -inf (without this patch, it used to contain +inf).

Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction")
Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction")
Signed-off-by: Douglas Leung <douglas.leung@imgtec.com>
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16887/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix NaN propagation

commit e840be6e7057757befc3581e1699e30fe7f0dd51 upstream.

Fix the cases of <MADDF|MSUBF>.<D|S> when any of three inputs is any
NaN. Correct behavior of <MADDF|MSUBF>.<D|S> fd, fs, ft is following:

  - if any of inputs is sNaN, return a sNaN using following rules: if
    only one input is sNaN, return that one; if more than one input is
    sNaN, order of precedence for return value is fd, fs, ft
  - if no input is sNaN, but at least one of inputs is qNaN, return a
    qNaN using following rules: if only one input is qNaN, return that
    one; if more than one input is qNaN, order of precedence for
    return value is fd, fs, ft

The previous code contained correct handling of some above cases, but
not all. Also, such handling was scattered into various cases of
"switch (CLPAIR(xc, yc))" statement, and elsewhere. With this patch,
this logic is placed in one place, and "switch (CLPAIR(xc, yc))" is
significantly simplified.

A relevant example:

MADDF.S fd,fs,ft:
  If fs contains qNaN1, ft contains qNaN2, and fd contains qNaN3, fd
  is going to contain qNaN3 (without this patch, it used to contain
  qNaN1).

Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction")
Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction")
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16886/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs

commit 304bfe473e70523e591fb1c9223289d355e0bdcb upstream.

Fix following special cases for MINA>.<D|S>:

  - if one of the inputs is zero, and the other is subnormal, normal,
    or infinity, the  value of the former should be returned (that is,
    a zero).
  - if one of the inputs is infinity, and the other input is normal,
    or subnormal, the value of the latter should be returned.

The previous implementation's logic for such cases was incorrect - it
appears as if it implements MAXA, and not MINA instruction.

A relevant example:

MINA.S fd,fs,ft:
  If fs contains 100.0, and ft contains 0.0, fd is going to contain
  0.0 (without this patch, it used to contain 100.0).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16885/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs

commit 3444c4eb534c20e44f0d6670b34263efaf8b531f upstream.

Fix the value returned by <MAXA|MINA>.<D|S> fd,fs,ft, if both inputs
are infinite. The previous implementation returned always the value
contained in ft in such cases. The correct behavior is specified
in Mips instruction set manual and is as follows:

    fs    ft        MAXA     MINA
  ---------------------------------
    inf   inf        inf      inf
    inf  -inf        inf     -inf
   -inf   inf        inf     -inf
   -inf  -inf       -inf     -inf

A relevant example:

MAXA.S fd,fs,ft:
  If fs contains +inf, and ft contains -inf, fd is going to contain
  +inf (without this patch, it used to contain -inf).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16884/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs

commit 1a41b3b441508ae63b1a9ec699ec94065739eb60 upstream.

Fix the value returned by <MAXA|MINA>.<D|S>, if the inputs are normal
fp numbers of the same absolute value, but opposite signs.

A relevant example:

MAXA.S fd,fs,ft:
If fs contains -3.0, and ft contains +3.0, fd is going to contain
+3.0 (without this patch, it used to contain -3.0).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16883/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative

commit aabf5cf02e22ebc4e541adf835910f388b6c3e65 upstream.

Fix the value returned by <MAX|MIN>.<D|S>, if both inputs are negative
normal fp numbers. The previous logic did not take into account that
if both inputs have the same sign, there should be separate treatment
of the cases when both inputs are negative and when both inputs are
positive.

A relevant example:

MAX.S fd,fs,ft:
If fs contains -5.0, and ft contains -7.0, fd is going to contain
-5.0 (without this patch, it used to contain -7.0).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16882/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero

commit 15560a58bfd4ff82cdd16b2270d4ef9b06d2cc4d upstream.

Fix the value returned by <MAX|MAXA|MIN|MINA>.<D|S>, if both inputs
are zeros. The right behavior in such cases is stated in instruction
reference manual and is as follows:

   fs  ft       MAX     MIN       MAXA    MINA
  ---------------------------------------------
    0   0        0       0         0       0
    0  -0        0      -0         0      -0
   -0   0        0      -0         0      -0
   -0  -0       -0      -0        -0      -0

Prior to this patch, some of the above cases were yielding correct
results. However, for the sake of code consistency, all such cases
are rewritten in this patch.

A relevant example:

MAX.S fd,fs,ft:
  If fs contains +0.0, and ft contains -0.0, fd is going to contain
  +0.0 (without this patch, it used to contain -0.0).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16881/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation

commit e78bf0dc4789bdea1453595ae89e8db65918e22e upstream.

Fix the value returned by <MAX|MAXA|MIN|MINA>.<D|S> fd,fs,ft, if both
inputs are quiet NaNs. The <MAX|MAXA|MIN|MINA>.<D|S> specifications
state that the returned value in such cases should be the quiet NaN
contained in register fs.

A relevant example:

MAX.S fd,fs,ft:
If fs contains qNaN1, and ft contains qNaN2, fd is going to contain
qNaN1 (without this patch, it used to contain qNaN2).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16880/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: i8042 - add Gigabyte P57 to the keyboard reset table

commit 697c5d8a36768b36729533fb44622b35d56d6ad0 upstream.

Similar to other Gigabyte laptops, the touchpad on P57 requires a
keyboard reset to detect Elantech touchpad correctly.

BugLink: https://bugs.launchpad.net/bugs/1594214
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pinctrl/amd: save pin registers over suspend/resume

commit 79d2c8bede2c93f9432d7da0bc2f76a195c90fc0 upstream.

The touchpad in the Asus laptop models X505BA/BP and X542BA/BP is
unresponsive after suspend/resume. The following error appears during
resume:

i2c_hid i2c-ELAN1300:00: failed to reset device.

The problem here is that i2c_hid does not notice the interrupt being
generated at this point, because the GPIO is no longer configured
for interrupts.

Fix this by saving pinctrl-amd pin registers during suspend and
restoring them at resume time.

Based on code from pinctrl-intel.

Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pinctrl: samsung: Fix NULL pointer exception on external interrupts on S3C24xx

commit cee7413d84044a0c1919a7c70a2d761ae24390de upstream.

After commit 8b1bd11c1f8f ("pinctrl: samsung: Add the support the
multiple IORESOURCE_MEM for one pin-bank"), the S3C24xx (and probably
S3C64xx as well) fails:

Unable to handle kernel NULL pointer dereference at virtual address 000000a8
...
(s3c24xx_demux_eint4_7) from [<c004469c>] (__handle_domain_irq+0x6c/0xcc)
(__handle_domain_irq) from [<c0009444>] (s3c24xx_handle_irq+0x6c/0x12c)
(s3c24xx_handle_irq) from [<c000e5fc>] (__irq_svc+0x5c/0x78)

Mentioned commit moved the pointer to controller's base IO memory address
from each controller's driver data (samsung_pinctrl_drv_data) to per-bank
structure (samsung_pin_bank). The external interrupt demux
handlers (s3c24xx_demux_eint()) tried to get this base address from opaque
pointer stored under irq_chip data:

struct irq_data *irqd = irq_desc_get_irq_data(desc);
struct samsung_pin_bank *bank = irq_data_get_irq_chip_data(irqd);
...
pend = readl(bank->eint_base + EINTPEND_REG);

which is wrong because this is hardware irq and it bank was never set
for this irq_chip.

For S3C24xx and S3C64xx, this partially reverts mentioned commit by
bringing back the virt_base stored under each controller's driver data
(samsung_pinctrl_drv_data). This virt_base address will be now
duplicated:
- samsung_pinctrl_drv_data->virt_base: used on S3C24xx and S3C64xx,
- samsung_pin_bank->pctl_base: used on Exynos.

Fixes: 8b1bd11c1f8f ("pinctrl: samsung: Add the support the multiple IORESOURCE_MEM for one pin-bank")
Cc: Sergio Prado <sergio.prado@e-labworks.com>
Reported-by: Sergio Prado <sergio.prado@e-labworks.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Lihua Yao <ylhuajnu@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pinctrl: samsung: Fix invalid register offset used for Exynos5433 external interrupts

commit af0b0baa89953aed07034725023371b2fa50a1e6 upstream.

When setting the pin function for external interrupts, the driver used
wrong IO memory address base. The pin function register is always under
pctl_base, not the eint_base.

By updating wrong register, the external interrupts for chosen GPIO
would not work at all and some other GPIO might be configured to wrong
value. For example on Exynos5433-based boards, the external interrupts
for gpf{1-5}-X GPIOs should not work at all (driver toggled reserved
registers from ALIVE bank instead).

Platforms other than Exynos5433 should not be affected as eint_base
equals pctl_base in such case.

Fixes: 8b1bd11c1f8f ("pinctrl: samsung: Add the support the multiple IORESOURCE_MEM for one pin-bank")
Reported-by: Tomasz Figa <tomasz.figa@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Tested-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: fix __tty_insert_flip_char regression

commit 8a5a90a2a477b86a3dc2eaa5a706db9bfdd647ca upstream.

Sergey noticed a small but fatal mistake in __tty_insert_flip_char,
leading to an oops in an interrupt handler when using any serial
port.

The problem is that I accidentally took the tty_buffer pointer
before calling __tty_buffer_request_room(), which replaces the
buffer. This moves the pointer lookup to the right place after
allocating the new buffer space.

Fixes: 979990c62848 ("tty: improve tty_insert_flip_char() fast path")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: improve tty_insert_flip_char() slow path

commit 065ea0a7afd64d6cf3464bdd1d8cd227527e2045 upstream.

While working on improving the fast path of tty_insert_flip_char(),
I noticed that by calling tty_buffer_request_room(), we needlessly
move to the separate flag buffer mode for the tty, even when all
characters use TTY_NORMAL as the flag.

This changes the code to call __tty_buffer_request_room() with the
correct flag, which will then allocate a regular buffer when it rounds
out of space but no special flags have been used. I'm guessing that
this is the behavior that Peter Hurley intended when he introduced
the compacted flip buffers.

Fixes: acc0f67f307f ("tty: Halve flip buffer GFP_ATOMIC memory consumption")
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: improve tty_insert_flip_char() fast path

commit 979990c6284814617d8f2179d197f72ff62b5d85 upstream.

kernelci.org reports a crazy stack usage for the VT code when CONFIG_KASAN
is enabled:

drivers/tty/vt/keyboard.c: In function 'kbd_keycode':
drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

The problem is that tty_insert_flip_char() gets inlined many times into
kbd_keycode(), and also into other functions, and each copy requires 128
bytes for stack redzone to check for a possible out-of-bounds access on
the 'ch' and 'flags' arguments that are passed into
tty_insert_flip_string_flags as a variable-length string.

This introduces a new __tty_insert_flip_char() function for the slow
path, which receives the two arguments by value. This completely avoids
the problem and the stack usage goes back down to around 100 bytes.

Without KASAN, this is also slightly better, as we don't have to
spill the arguments to the stack but can simply pass 'ch' and 'flag'
in registers, saving a few bytes in .text for each call site.

This should be backported to linux-4.0 or later, which first introduced
the stack sanitizer in the kernel.

Fixes: c420f167db8c ("kasan: enable stack instrumentation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: read reg in each iterator of psp_wait_for loop

commit 2890decfd9969cac21067ca0c734fbccaf74d634 upstream.

v2: fix the SOS loading failure for PSP v3.1

Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Acked-by: Huang Rui <ray.huang@amd.com> (v1)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: xpad - validate USB endpoint type during probe

commit 122d6a347329818419b032c5a1776e6b3866d9b9 upstream.

We should only see devices with interrupt endpoints. Ignore any other
endpoints that we find, so we don't send try to send them interrupt URBs
and trigger a WARN down in the USB stack.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smp/hotplug: Handle removal correctly in cpuhp_store_callbacks()

commit 0c96b27305faf06c068b45e07d28336c80dac286 upstream.

If cpuhp_store_callbacks() is called for CPUHP_AP_ONLINE_DYN or
CPUHP_BP_PREPARE_DYN, which are the indicators for dynamically allocated
states, then cpuhp_store_callbacks() allocates a new dynamic state. The
first allocation in each range returns CPUHP_AP_ONLINE_DYN or
CPUHP_BP_PREPARE_DYN.

If cpuhp_remove_state() is invoked for one of these states, then there is
no protection against the allocation mechanism. So the removal, which
should clear the callbacks and the name, gets a new state assigned and
clears that one.

As a consequence the state which should be cleared stays initialized. A
consecutive CPU hotplug operation dereferences the state callbacks and
accesses either freed or reused memory, resulting in crashes.

Add a protection against this by checking the name argument for NULL. If
it's NULL it's a removal. If not, it's an allocation.

[ tglx: Added a comment and massaged changelog ]

Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface")
Signed-off-by: Ethan Barnes <ethan.barnes@sandisk.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.or>
Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
Cc: Sebastian Siewior <bigeasy@linutronix.d>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/DM2PR04MB398242FC7776D603D9F99C894A60@DM2PR04MB398.namprd04.prod.outlook.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

srcu: Provide ordering for CPU not involved in grace period

commit 35732cf9dd38b1efb0f2f22c91c61b51337d1ac3 upstream.

Tree RCU guarantees that every online CPU has a memory barrier between
any given grace period and any of that CPU's RCU read-side sections that
must be ordered against that grace period.  Since RCU doesn't always
know where read-side critical sections are, the actual implementation
guarantees order against prior and subsequent non-idle non-offline code,
whether in an RCU read-side critical section or not.  As a result, there
does not need to be a memory barrier at the end of synchronize_rcu()
and friends because the ordering internal to the grace period has
ordered every CPU's post-grace-period execution against each CPU's
pre-grace-period execution, again for all non-idle online CPUs.

In contrast, SRCU can have non-idle online CPUs that are completely
uninvolved in a given SRCU grace period, for example, a CPU that
never runs any SRCU read-side critical sections and took no part in
the grace-period processing.  It is in theory possible for a given
synchronize_srcu()'s wakeup to be delivered to a CPU that was completely
uninvolved in the prior SRCU grace period, which could mean that the
code following that synchronize_srcu() would end up being unordered with
respect to both the grace period and any pre-existing SRCU read-side
critical sections.

This commit therefore adds an smp_mb() to the end of __synchronize_srcu(),
which prevents this scenario from occurring.

Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/mlx5: Fix cached MR allocation flow

commit 4c25b7a39005c9243a492b577c3e940eeac36a25 upstream.

When we have a miss in one order of the mkey cache, we try to get
an mkey from a higher order.

We still need to check that the higher order can be used with UMR
before using it. Otherwise, we will get an mkey with 0 entries and
the post send operation that is used to fill it will complete with
the following error:

mlx5_0:dump_cqe:275:(pid 0): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 25000025 49ce59d2

Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/{qib, hfi1}: Avoid flow control testing for RDMA write operation

commit 5b0ef650bd0f820e922fcc42f1985d4621ae19cf upstream.

Section 9.7.7.2.5 of the 1.3 IBTA spec clearly says that receive
credits should never apply to RDMA write.

qib and hfi1 were doing that. The following situation will result
in a QP hang:
- A prior SEND or RDMA_WRITE with immmediate consumed the last
credit for a QP using RC receive buffer credits
- The prior op is acked so there are no more acks
- The peer ULP fails to post receive for some reason
- An RDMA write sees that the credits are exhausted and waits
- The peer ULP posts receive buffers
- The ULP posts a send or RDMA write that will be hung

The fix is to avoid the credit test for the RDMA write operation.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/hfi1: Revert egress pkey check enforcement

commit ecdb19f4b513033e6f2c4326cd5b81e04393e5e1 upstream.

Current code has some serious flaws. Disarm the flag
pending an appropriate patch.

Fixes: 53526500f301 ("IB/hfi1: Permanently enable P_Key checking in HFI")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

<linux/uaccess.h>: Fix copy_in_user() declaration

commit f58e76c1c551c7577b25a6fe493d82f5214331b7 upstream.

copy_in_user() copies data from user-space address @from to user-
space address @to. Hence declare both @from and @to as user-space
pointers.

Fixes: commit d597580d3737 ("generic ...copy_..._user primitives")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

orangefs: Don't clear SGID when inheriting ACLs

commit b5accbb0dfae36d8d36cd882096943c98d5ede15 upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by creating __orangefs_set_acl() function that does not
call posix_acl_update_mode() and use it when inheriting ACLs. That
prevents SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: stable@vger.kernel.org
CC: Mike Marshall <hubcap@omnibond.com>
CC: pvfs2-developers@beowulf-underground.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 4.13.3

xfs: fix compiler warnings

commit 7bf7a193a90cadccaad21c5970435c665c40fe27 upstream.

Fix up all the compiler warnings that have crept in.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid5: release/flush io in raid5_do_work()

commit 9c72a18e46ebe0f09484cce8ebf847abdab58498 upstream.

In raid5, there are scenarios where some ios are deferred to a later
time, and some IO need a flush to complete. To make sure we make
progress with these IOs, we need to call the following functions:

    flush_deferred_bios(conf);
    r5l_flush_stripe_to_raid(conf->log);

Both of these functions are called in raid5d(), but missing in
raid5_do_work(). As a result, these functions are not called
when multi-threading (group_thread_cnt > 0) is enabled. This patch
adds calls to these function to raid5_do_work().

Note for stable branches:

  r5l_flush_stripe_to_raid(conf->log) is need for 4.4+
  flush_deferred_bios(conf) is only needed for 4.11+

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1/10: reset bio allocated from mempool

commit 208410b546207cfc4c832635fa46419cfa86b4cd upstream.

Data allocated from mempool doesn't always get initialized, this happens when
the data is reused instead of fresh allocation. In the raid1/10 case, we must
reinitialize the bios.

Reported-by: Jonathan G. Underwood <jonathan.underwood@gmail.com>
Fixes: f0250618361d(md: raid10: don't use bio's vec table to manage resync pages)
Fixes: 98d30c5812c3(md: raid1: don't use bio's vec table to manage resync pages)
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: use kmem_free to free return value of kmem_zalloc

commit 6c370590cfe0c36bcd62d548148aa65c984540b7 upstream.

In function xfs_test_remount_options(), kfree() is used to free memory
allocated by kmem_zalloc(). But it is better to use kmem_free().

Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: open code end_buffer_async_write in xfs_finish_page_writeback

commit 8353a814f2518dcfa79a5bb77afd0e7dfa391bb1 upstream.

Our loop in xfs_finish_page_writeback, which iterates over all buffer
heads in a page and then calls end_buffer_async_write, which also
iterates over all buffers in the page to check if any I/O is in flight
is not only inefficient, but also potentially dangerous as
end_buffer_async_write can cause the page and all buffers to be freed.

Replace it with a single loop that does the work of end_buffer_async_write
on a per-page basis.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: don't set v3 xflags for v2 inodes

commit dd60687ee541ca3f6df8758f38e6f22f57c42a37 upstream.

Reject attempts to set XFLAGS that correspond to di_flags2 inode flags
if the inode isn't a v3 inode, because di_flags2 only exists on v3.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix incorrect log_flushed on fsync

commit 47c7d0b19502583120c3f396c7559e7a77288a68 upstream.

When calling into _xfs_log_force{,_lsn}() with a pointer
to log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffers

xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.

This optimization is incorrect in the following sequence of events:

Task A                    Task B
-------------------------------------------------------
xfs_file_fsync()
   _xfs_log_force_lsn()
     xlog_sync()
        [submit PREFLUSH]
                           xfs_file_fsync()
                             file_write_and_wait_range()
                               [submit WRITE X]
                               [endio  WRITE X]
                             _xfs_log_force_lsn()
                               xlog_wait()
        [endio  PREFLUSH]

The write X is not guarantied to be on persistent storage
when PREFLUSH request in completed, because write A was submitted
after the PREFLUSH request, but xfs_file_fsync() of task A will
be notified of log_flushed=1 and will skip explicit flush.

If the system crashes after fsync of task A, write X may not be
present on disk after reboot.

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block io operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops of a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
  the location in the log
- Then replay log onto device for each mark, mount fs and compare
  md5 of file to stored value

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: disable per-inode DAX flag

commit 742d84290739ae908f1b61b7d17ea382c8c0073a upstream.

Currently flag switching can be used to easily crash the kernel. Disable
the per-inode DAX flag until that is sorted out.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: relog dirty buffers during swapext bmbt owner change

commit 2dd3d709fc4338681a3aa61658122fa8faa5a437 upstream.

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: disallow marking previously dirty buffers as ordered

commit a5814bceea48ee1c57c4db2bd54b0c0246daf54a upstream.

Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone when not used in situations where it is
guaranteed a buffer has not been previously modified (such as new
metadata allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: move bmbt owner change to last step of extent swap

commit 6fb10d6d22094bc4062f92b9ccbcee2f54033d04 upstream.

The extent swap operation currently resets bmbt block owners before
the inode forks are swapped. The bmbt buffers are marked as ordered
so they do not have to be physically logged in the transaction.

This use of ordered buffers is not safe as bmbt buffers may have
been previously physically logged. The bmbt owner change algorithm
needs to be updated to physically log buffers that are already dirty
when/if they are encountered. This means that an extent swap will
eventually require multiple rolling transactions to handle large
btrees. In addition, all inode related changes must be logged before
the bmbt owner change scan begins and can roll the transaction for
the first time to preserve fs consistency via log recovery.

In preparation for such fixes to the bmbt owner change algorithm,
refactor the bmbt scan out of the extent fork swap code to the last
operation before the transaction is committed. Update
xfs_swap_extent_forks() to only set the inode log flags when an
owner change scan is necessary. Update xfs_swap_extents() to trigger
the owner change based on the inode log flags. Note that since the
owner change now occurs after the extent fork swap, the inode btrees
must be fixed up with the inode number of the current inode (similar
to log recovery).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: skip bmbt block ino validation during owner change

commit 99c794c639a65cc7b74f30a674048fd100fe9ac8 upstream.

Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If the transaction
commits to the log and the system crashes such that recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: don't log dirty ranges for ordered buffers

commit 8dc518dfa7dbd079581269e51074b3c55a65a880 upstream.

Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.

Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: refactor buffer logging into buffer dirtying helper

commit 9684010d38eccda733b61106765e9357cf436f65 upstream.

xfs_trans_log_buf() is responsible for logging the dirty segments of
a buffer along with setting all of the necessary state on the
transaction, buffer, bli, etc., to ensure that the associated items
are marked as dirty and prepared for I/O. We have a couple use cases
that need to to dirty a buffer in a transaction without actually
logging dirty ranges of the buffer. One existing use case is
ordered buffers, which are currently logged with arbitrary ranges to
accomplish this even though the content of ordered buffers is never
written to the log. Another pending use case is to relog an already
dirty buffer across rolled transactions within the deferred
operations infrastructure. This is required to prevent a held
(XFS_BLI_HOLD) buffer from pinning the tail of the log.

Refactor xfs_trans_log_buf() into a new function that contains all
of the logic responsible to dirty the transaction, lidp, buffer and
bli. This new function can be used in the future for the use cases
outlined above. This patch does not introduce functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>