Chuck Anderson [Thu, 6 Apr 2017 07:17:44 +0000 (00:17 -0700)]
Merge branch topic/uek-4.1/sparc of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/sparc: (28 commits)
megaraid: Fix unaligned warning
sparc64: Restrict number of processes
SPARC64: vds_blk_rw() does not handle drives with q->limits.chunk_sectors > 0
sparc64: Improve boot time by per cpu map update
arch/sparc: memblock resizes are not handled properly
SPARC64: LDOM vnet "Got unexpected MCAST reply"
ldmvsw: disable tso and gso for bridge operations
ldmvsw: update and simplify version string
sunvnet: remove extra rcu_read_unlocks
sunvnet: straighten up message event handling logic
sunvnet: add memory barrier before check for tx enable
sunvnet: update version and version printing
sunvnet: remove unused variable in maybe_tx_wakeup
sunvnet: make sunvnet common code dynamically loadable
hwrng: n2 - update version info
hwrng: n2 - support new hardware register layout
hwrng: n2 - add device data descriptions
hwrng: n2 - limit error spewage when self-test fails
hwrng: n2 - Attach on T5/M5, T7/M7 SPARC CPUs
tcp: fix tcp_fastopen unaligned access complaints on sparc
...
Allen Pais [Wed, 29 Mar 2017 08:51:58 +0000 (14:21 +0530)]
megaraid: Fix unaligned warning
The MegaRAID userland descriptor structures do not properly align
pointers on their natural boundaries. This causes warnings to be issued
when storcli or the SNMP daemon are in use.
Quiesce the warning until the user-kernel interface has been fixed.
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 069af368ac74dc0130f91836b9f85f7cd5b18749) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
sparc64: Restrict number of processes
If the number of processes exceeds the number of supported context IDs
then there are random segfaults seen in the user programs.
The data collected when debugging this bug showed that two processes
with the same context IDs were present in the TLB shared between them
thus resulting in incorrect translations. Since the bug occurs after the
kernel hits the max context ID supported by the processor, the context
wraparound code found in get_new_mmu_context(...) and
smp_new_mmu_context_version_client(...) is under suspicion.
The plan is that this will get fixed when we implement the "context domain"
feature for sparc in the kernel. For now this patch temporarily
restricts the number of processes allowed by the kernel based on the
number of context IDs supported by the processor. This way we never reach
that condition of having incorrect translations.
For non-root users, fork will fail if the number of existing processes is
greater than (max_user_nctx - 100), where max_user_nctx is the maximum
number of context IDs supported by the processor. For the root user, fork
will fail if the number of existing processes reaches max_user_nctx.
Extra context IDs are given to root to recover the system if users reach
their limit and cannot recover the system (i.e. cannot even execute
'kill' to reduce the number of processes).
Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
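The fork limits described in this commit message can be sketched as a small model. This is only an illustration of the stated policy, not the kernel code: the function name fork_allowed, the constant ROOT_RESERVE, and the value 8192 used in the usage note are assumptions made for the example.

```python
ROOT_RESERVE = 100  # context IDs held back for root, per the commit message


def fork_allowed(nr_processes: int, max_user_nctx: int, is_root: bool) -> bool:
    """Return True if a new process may be created under the described policy.

    Hypothetical helper: it models the text of the commit message only.
    """
    if is_root:
        # root: fork fails once the process count reaches max_user_nctx
        return nr_processes < max_user_nctx
    # non-root: fork fails when the count is greater than (max_user_nctx - 100)
    return nr_processes <= max_user_nctx - ROOT_RESERVE
```

With an assumed max_user_nctx of 8192, a non-root fork is refused as soon as more than 8092 processes exist, while root can keep forking until 8192 processes exist.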
George Kennedy [Wed, 15 Feb 2017 20:17:58 +0000 (15:17 -0500)]
SPARC64: vds_blk_rw() does not handle drives with q->limits.chunk_sectors > 0
Drives with q->limits.chunk_sectors > 0 are not properly handled by
vds_blk_rw(). Drives such as NVMe set chunk_sectors to indicate a performance
boundary (see call to blk_queue_chunk_sectors() in nvme_alloc_ns()). Currently,
when vds_blk_rw() calls bio_add_page() and the chunk_sectors boundary would be
crossed, bio_add_page() returns zero and vds_blk_rw() fails with -EIO.
The proposed fix now adds an additional check to vds_blk_rw() when
bio_add_page() returns zero that checks for bio->bi_iter.bi_size != 0. If
bi_size != 0, it indicates that a page or pages have been successfully added by
bio_add_page(). When this added condition has been hit, exit the for loop in
vds_blk_rw() and submit the outstanding IOs and continue.
Signed-off-by: George Kennedy <george.kennedy@oracle.com> Reviewed-By: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Orabug: 25373818 Signed-off-by: Allen Pais <allen.pais@oracle.com>
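The control flow of the fix can be modelled roughly as follows. This is a simplified Python simulation, not the vds driver: the Bio class, its add_page() method and build_bios() are hypothetical stand-ins for bio_add_page() and the loop in vds_blk_rw(), and sizes are counted in sectors.

```python
import errno


class Bio:
    """Toy stand-in for a block I/O request that may not cross a chunk boundary."""

    def __init__(self, start_sector, chunk_sectors):
        self.start = start_sector
        self.size = 0  # sectors added so far (models bio->bi_iter.bi_size)
        self.chunk = chunk_sectors

    def add_page(self, sectors):
        """Return the sectors added, or 0 if the chunk boundary would be crossed."""
        if self.chunk:
            room = self.chunk - (self.start % self.chunk)
            if self.size + sectors > room:
                return 0
        self.size += sectors
        return sectors


def build_bios(start_sector, n_pages, sectors_per_page, chunk_sectors):
    """Split a request into (start, size) pieces, one per submitted bio."""
    bios = []
    sector = start_sector
    remaining = n_pages
    while remaining:
        bio = Bio(sector, chunk_sectors)
        while remaining:
            if bio.add_page(sectors_per_page) == 0:
                if bio.size != 0:
                    break  # pages were already added: submit this bio and continue
                raise OSError(errno.EIO, "page cannot be added to an empty bio")
            remaining -= 1
        bios.append((bio.start, bio.size))  # models submitting the outstanding I/O
        sector += bio.size
    return bios
```

In this model the added check is the `bio.size != 0` branch: a zero return from add_page() is only treated as an error when nothing has been added to the current bio.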
Atish Patra [Mon, 13 Feb 2017 02:32:58 +0000 (19:32 -0700)]
sparc64: Improve boot time by per cpu map update
Currently, smp_fill_in_sib_core_maps is invoked during cpu_up to set up
all the core/sibling maps correctly. This takes O(n^2) time
as it iterates over all the online cpus twice when each cpu comes online.
This increases smp_init() execution time quadratically, leading to a
higher boot time.
Optimize the code path by comparing only the current cpu with online
cpus and set the maps for both the cpus simultaneously. Take this
opportunity to merge all three for loops into one as well. Here is
the smp_init() time before and after the fix.
Number of cpus:   before fix:   after the fix:
512               2.30s         .283s
1024              14.23s        .493s
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
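The idea of comparing only the newly onlined cpu against the online set, and updating both maps at once, can be sketched like this. It is a minimal model under stated assumptions: bring_cpu_online, core_id and core_map are hypothetical names, and only one kind of sibling map is modelled.

```python
def bring_cpu_online(cpu, core_id, online, core_map):
    """Add `cpu` to `online` and update core sibling maps in a single pass.

    Only the new cpu is compared against the online cpus, and the map
    entries of both cpus are set together, so each call is O(n) rather
    than rebuilding every pair.
    """
    online.add(cpu)
    core_map[cpu] = {cpu}
    for other in online:
        if core_id[other] == core_id[cpu]:
            core_map[cpu].add(other)
            core_map[other].add(cpu)
```

The table above is consistent with this change in scaling: doubling from 512 to 1024 cpus multiplied smp_init() time by roughly 6.2 before the fix (2.30s to 14.23s) and by roughly 1.7 after it (.283s to .493s).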
Pavel Tatashin [Wed, 8 Feb 2017 21:02:39 +0000 (16:02 -0500)]
arch/sparc: memblock resizes are not handled properly
In add_node_ranges() when a memblock resize happens, the iterator keeps using
the previously freed array. This bug causes hangs on machines where there are
over 128 memory blocks during boot, for example on machines where the memory
interleaving is small.
The problem is seen on T4-4 because it can have 2T of memory, and memory
is interleaved at 8G. So we have 2T/8G = 256 regions to set node IDs. The
starting size of regions array is 128. Thus, we have to double at least one
time (actually we have to double twice because some memory is already
reserved and thus we need more than 256 regions). We start using an
incorrect pointer to the array after the first doubling.
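The arithmetic in this message can be checked with a short sketch. The helper names are hypothetical, and the doubling rule (grow until the capacity is at least the number of regions) is an assumption about how the array is resized.

```python
TIB = 2**40
GIB = 2**30


def regions_needed(total_bytes, interleave_bytes):
    """Number of interleaved regions that need a node ID."""
    return total_bytes // interleave_bytes


def doublings_needed(regions, initial_capacity=128):
    """How many times a doubling array must grow to hold `regions` entries."""
    capacity, doublings = initial_capacity, 0
    while capacity < regions:
        capacity *= 2
        doublings += 1
    return doublings
```

Under these assumptions 2T of memory interleaved at 8G gives 256 regions, which needs one doubling from 128; a single extra region (for example because some memory is already reserved) pushes that to two.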
George Kennedy [Wed, 8 Feb 2017 01:37:37 +0000 (20:37 -0500)]
SPARC64: LDOM vnet "Got unexpected MCAST reply"
Handle unexpected MCAST reply as a debug warning the same as is done in
Solaris 12. Please see bug 24954702 for details.
Signed-off-by: George Kennedy <george.kennedy@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Orabug: 24954702 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Mon, 13 Feb 2017 18:57:04 +0000 (10:57 -0800)]
ldmvsw: disable tso and gso for bridge operations
The ldmvsw driver is specifically for supporting the ldom virtual
networking by running in the primary ldom and using the LDC to connect
the remaining ldoms to the outside world via a bridge. With TSO and GSO
supported while connected to the bridge, things tend to misbehave as seen
in our case by delayed packets, enough to begin triggering retransmits
and affecting overall throughput. By turning off advertised support for
TSO and GSO we restore stable traffic flow through the bridge.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc221a34ac473b444a7cfdd0c152b4c71f79326b) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7602011f59cc32ebc3a5f9058d6ba11b096c8c50) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Mon, 13 Feb 2017 18:57:02 +0000 (10:57 -0800)]
sunvnet: remove extra rcu_read_unlocks
The RCU read lock is grabbed first thing in sunvnet_start_xmit_common()
so it always needs to be released. This removes the conditional release
in the dropped packet error path and removes a couple of superfluous
calls in the middle of the code.
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit daa86e50f649fccadafc53994ddc4254d75a008b) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Mon, 13 Feb 2017 18:57:01 +0000 (10:57 -0800)]
sunvnet: straighten up message event handling logic
The use of gotos for handling the incoming events made this code
harder to read and support than it should be. This patch straightens
out and clears up the logic.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bf091f3f362b3c562a18bbf7a2d3e2f3a36eba1d) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Mon, 13 Feb 2017 18:57:00 +0000 (10:57 -0800)]
sunvnet: add memory barrier before check for tx enable
In order to allow the underlying LDC and outstanding memory operations
to potentially catch up with the driver's Tx requests, add a memory
barrier before checking again for available tx descriptors.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fd263fb6e718c5bdf35cbc1de4f781c71794d2a4) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Mon, 13 Feb 2017 18:56:59 +0000 (10:56 -0800)]
sunvnet: update version and version printing
There have been several changes since the first version of this code, so
we bump the version number. While we're at it, we can simplify the
version printing a bit and drop a couple lines of code.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f2f3e210bffe5c8f8b30d0b0c7b0f733ff5db334) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Sowmini Varadhan [Mon, 13 Feb 2017 18:56:58 +0000 (10:56 -0800)]
sunvnet: remove unused variable in maybe_tx_wakeup
The vio_dring_state *dr variable is unused in maybe_tx_wakeup().
As the comments indicate, we call maybe_tx_wakeup() whenever we
get a STOPPED LDC message on the port. If the queue is stopped,
we want to wake it up so that we will send another START message
at the next TX and trigger the consumer to drain the dring.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d4aa89cc2bbe021722c946eb11b21ebb0f13c825) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Mon, 13 Feb 2017 18:56:57 +0000 (10:56 -0800)]
sunvnet: make sunvnet common code dynamically loadable
When the sunvnet_common code was split out for use by both sunvnet
and the newer ldmvsw, it was made into a static kernel library, which
limits the usefulness of sunvnet and ldmvsw as loadables, since most
of the real work is being done in the shared code. Also, this is
simply dead code in kernels that aren't running the LDoms.
This patch makes the sunvnet_common into a dynamically loadable
module and makes sunvnet and ldmvsw dependent on sunvnet_common.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2493b842f258e14938f278e44ecc26970dfabbf0) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0ff1436fb2e3da085f7177d03ce4362c45b75d57) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 07e25d43be8502bd8ab6122c4f6449ebf30e98f7) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Thu, 12 Jan 2017 18:52:47 +0000 (10:52 -0800)]
hwrng: n2 - add device data descriptions
Since we're going to need to keep track of more than just one
attribute of the hardware, we'll change the use of the data field
from the match struct from a single flag to a struct pointer.
This patch adds the struct template and initial descriptions.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit becbc4940ad8e8ff560e1ceee33d9bb4fe4c9225) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Thu, 12 Jan 2017 18:52:46 +0000 (10:52 -0800)]
hwrng: n2 - limit error spewage when self-test fails
If the self-test fails, it probably won't actually suddenly
start working. Currently, this causes an endless spew of
error messages on the console and in the logs, so this patch
adds a limiter to the test.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit db602a7f940a71870c17e39bcbe4e4d7a4a8273e) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit c1e9b3b0eea12899b7749571af21cc60822cf2b6) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Thu, 12 Jan 2017 22:24:58 +0000 (14:24 -0800)]
tcp: fix tcp_fastopen unaligned access complaints on sparc
Fix up a data alignment issue on sparc by swapping the order
of the cookie byte array field with the length field in
struct tcp_fastopen_cookie, and making it a proper union
to clean up the typecasting.
This addresses log complaints like these:
log_unaligned: 113 callbacks suppressed
Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 003c941057eaa868ca6fedd29a274c863167230d) Signed-off-by: Allen Pais <allen.pais@oracle.com>
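The effect of reordering the fields can be illustrated with ctypes, which follows the platform's C layout rules. The layouts below are assumptions reconstructed from the description, not copied from the kernel header: the field names, the 16-byte cookie size, and the 32-bit word view inside the union are all hypothetical.

```python
import ctypes

COOKIE_MAX = 16  # assumed cookie size in bytes


class OldCookie(ctypes.Structure):
    """Length first: the byte array starts at an odd offset."""
    _fields_ = [
        ("len", ctypes.c_int8),
        ("val", ctypes.c_uint8 * COOKIE_MAX),
    ]


class CookieData(ctypes.Union):
    """Byte array overlaid with a wider view, so the union is word aligned."""
    _fields_ = [
        ("val", ctypes.c_uint8 * COOKIE_MAX),
        ("words", ctypes.c_uint32 * 4),  # hypothetical wider member
    ]


class NewCookie(ctypes.Structure):
    """Cookie data first, length after it."""
    _fields_ = [
        ("data", CookieData),
        ("len", ctypes.c_int8),
    ]
```

In the old layout the cookie bytes start at offset 1, so reading them as wider words is an unaligned access on strict-alignment machines such as sparc. In the new layout the data starts at offset 0 and the struct as a whole takes the union's 4-byte alignment, and the union removes the need for casts.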
Liam R. Howlett [Tue, 31 Jan 2017 20:34:08 +0000 (12:34 -0800)]
vds: Add physical block support
Version 1.2 of the virtual IO device protocol added physical block
support. Start sending the underlying physical block device size.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Orabug: 19420123 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Aldridge [Fri, 17 Feb 2017 15:00:38 +0000 (07:00 -0800)]
sparc64: Add missing hardware capabilities for M7
Some M7 hardware capabilities were not being reported
correctly. This commit fixes the issue by adding definitions
for all the missing capabilities from both the Machine
Descriptor and the Compatibility Feature Register.
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Jag Raman [Tue, 7 Mar 2017 21:41:31 +0000 (16:41 -0500)]
sparc64: VDC threads in guest domain do not resume after primary domain reboot
Prevents VDC threads from hanging while waiting for primary
domain to come back up. Ensures that all waiting VDC threads
are woken up when primary domain comes back up.
Liam R. Howlett [Wed, 1 Feb 2017 19:09:26 +0000 (14:09 -0500)]
sunvdc: Add support for setting physical sector size
Physical sector size is supported in v1.2 of the vDisk protocol and
should be set if available. If protocol version 1.2 is used and the
physical disk size is unavailable, then the disk is considered busy.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(Cherry-pick of upstream f41e54616ca1a199f6c17228f26082ccdaaab3de)
Orabug: 19420123 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Atish Patra [Wed, 1 Mar 2017 02:18:26 +0000 (19:18 -0700)]
sparc64: create/destroy cpu sysfs dynamically
Currently, cpu/cpuX represents maximum number of possible
cpus in a domain. Those cpu sysfs directories also do
not change as we add/remove cpus via ldom manager.
Update sysfs so that it represents number of present cpus
in the domain. As a result, cpu sysfs is also updated
dynamically upon cpu add/removal.
Before the fix:
[root@ca-sparc76 ~]# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -n-cv- UART 32 32G 0.2% 0.2% 11m
[root@ca-sparc76 ~]# getconf _NPROCESSORS_CONF
512
[root@ca-sparc76 ~]# ldm set-vcpu 64 primary
[root@ca-sparc76 ~]# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -n-cv- UART 64 32G 0.0% 0.0% 12m
[root@ca-sparc76 ~]# getconf _NPROCESSORS_CONF
512
-------------------------------------------------------------------------
After the fix:
[root@ca-sparc76 ~]# getconf _NPROCESSORS_CONF
32
[root@ca-sparc76 ~]# ldm set-vcpu 64 primary
[root@ca-sparc76 ~]# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -n-cv- UART 64 32G 0.0% 0.0% 12m
[root@ca-sparc76 ~]# getconf _NPROCESSORS_CONF
64
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Khalid Aziz [Tue, 7 Mar 2017 18:26:10 +0000 (11:26 -0700)]
sparc64: Do not retain old VM_SPARC_ADI flag when protection changes on page
When protection on a memory page is changed with mprotect(), old
arch-specific VM flags on the page are retained. This patch clears
old VM_SPARC_ADI flag when protection is changed since mprotect() is
potentially being invoked to disable ADI on the page. This code will
add VM_SPARC_ADI flag back if the new protection includes it.
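The clear-then-restore behaviour can be sketched as a bitmask operation. The bit values and the PROT_ADI name below are hypothetical and chosen only for illustration; the sketch models just the ADI bit, not the rest of the mprotect() flag handling.

```python
VM_READ = 0x1
VM_SPARC_ADI = 0x100  # hypothetical bit position
PROT_READ = 0x1
PROT_ADI = 0x10       # hypothetical protection bit requesting ADI


def new_vm_flags(old_flags, prot):
    """Drop any stale ADI flag, then add it back only if `prot` asks for it."""
    flags = old_flags & ~VM_SPARC_ADI
    if prot & PROT_ADI:
        flags |= VM_SPARC_ADI
    return flags
```

Without the clearing step, a page that once had ADI enabled would keep the flag even after an mprotect() call that no longer requests it.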
Aaron Young [Fri, 17 Feb 2017 23:18:53 +0000 (18:18 -0500)]
SPARC64: VIO: Support for virtual-device MD node probing
This update adds support to the mdesc/vio infrastructure to
probe for "virtual-device" nodes in the MD. The vio
module will create sysfs device files for these nodes which
can be accessed by user space code (such as udev). In addition,
VIO drivers can now probe for these MD nodes if the need arises.
This functionality will serve as part of the fix for
BUG 24841906.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com> Reviewed-By: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Orabug: 24841906
dtrace: fix handling of save_stack_trace sentinel (x86 only)
On x86 only, when save_stack_trace() writes fewer stack frames to the
buffer than there is space for, a ULONG_MAX is added as sentinel. The
DTrace code was mistakenly treating the buffer as always ending with a
ULONG_MAX.
Orabug: 25727046 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
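The corrected handling can be sketched as follows. This is a model of the behaviour described above, not the DTrace code: strip_sentinel is a hypothetical helper, and a 64-bit unsigned long is assumed for ULONG_MAX.

```python
ULONG_MAX = 2**64 - 1  # assumes a 64-bit unsigned long


def strip_sentinel(entries):
    """Drop a trailing ULONG_MAX only when one is actually present.

    A completely filled buffer has no sentinel, so its last entry is a
    real stack frame and must be kept.
    """
    if entries and entries[-1] == ULONG_MAX:
        return entries[:-1]
    return entries
```

Code that always discards the final entry loses a real frame whenever the trace fills the buffer exactly, which is the mistaken assumption this commit describes.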
Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Tested-by: Pierre Orzechowski <pierre.e.orzechowski@oracle.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Jake Oshins [Fri, 3 Mar 2017 07:28:52 +0000 (23:28 -0800)]
PCI: hv: Microsoft changes in support of RHEL and UEK4
This patch layers changes made by Microsoft in a GitHub repo to support
RHEL kernel versions. It eliminates the IRQ Domain dependencies of the
initial commit into mainline. This has a few modifications for OL7/UEK4.
Orabug: 25507635 Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Chuck Anderson [Thu, 9 Mar 2017 09:52:49 +0000 (01:52 -0800)]
Merge branch topic/uek-4.1/dtrace of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/dtrace:
dtrace: get rid of dtrace_gethrtime
dtrace: drop spurious debugging left in by accident
dtrace: comtinuing the FBT implementation and fixes
dtrace: ensure DTrace can use get_user_pages safely
dtrace: enable paranoid mode and IST shift for xen_int3
dtrace: ensure we skip the entire SDT probe point
dtrace: add ip SDT provider
Chuck Anderson [Thu, 9 Mar 2017 09:50:02 +0000 (01:50 -0800)]
Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/drivers: (262 commits)
scsi: qla2xxx: Fix apparent cut-n-paste error.
scsi: qla2xxx: Fix Target mode handling with Multiqueue changes.
scsi: qla2xxx: Add Block Multi Queue functionality.
scsi: qla2xxx: Add multiple queue pair functionality.
qla2xxx: Add irq affinity notification
scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
be2net: get rid of custom busy poll code
be2net: fix initial MAC setting
be2net: fix MAC addr setting on privileged BE3 VFs
be2net: don't delete MAC on close on unprivileged BE3 VFs
be2net: fix status check in be_cmd_pmac_add()
be2net: Increase skb headroom size to 256 bytes
be2net: Add DEVSEC privilege to SET_HSW_CONFIG command.
be2net: do not call napi_hash_del()
be2net: Enable VF link state setting for BE3
be2net: Fix TX stats for TSO packets
be2net: Update Copyright string in be_hw.h
be2net: NCSI FW section should be properly updated with ethtool for BE3
be2net: Provide an alternate way to read pf_num for BEx chips
be2net: mark symbols static where possible
...
Chuck Anderson [Thu, 9 Mar 2017 07:36:20 +0000 (23:36 -0800)]
Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/drivers: (289 commits)
Input: vmmouse - remove port reservation
Input: vmmouse - fix absolute device registration
bnxt_en: use eth_hw_addr_random()
bnxt_en: fix pci cleanup in bnxt_init_one() failure path
bnxt_en: Fix NULL pointer dereference in a failure path during open.
bnxt_en: Reject driver probe against all bridge devices
bnxt_en: Added PCI IDs for BCM57452 and BCM57454 ASICs
bnxt_en: Fix bnxt_setup_tc() error message.
bnxt_en: Print FEC settings as part of the linkup dmesg.
bnxt_en: Do not setup PHY unless driving a single PF.
bnxt_en: Add hardware NTUPLE filter for encapsulated packets.
bnxt_en: Allow NETIF_F_NTUPLE to be enabled on VFs.
bnxt_en: Fix ethtool -l pre-set max combined channel.
bnxt_en: Retry failed NVM_INSTALL_UPDATE with defragmentation flag.
bnxt_en: Update to firmware interface spec 1.7.0.
bnxt_en: Refactor tx completion path.
bnxt_en: Add a set of TX rings to support XDP.
bnxt_en: Add tx ring mapping logic.
bnxt_en: Centralize logic to reserve rings.
bnxt_en: Use event bit map in RX path.
...
Chuck Anderson [Thu, 9 Mar 2017 07:34:05 +0000 (23:34 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/upstream-cherry-picks:
Btrfs: fix crash on fsync when using overlayfs v4
vfio/pci: Hide broken INTx support from user
crypto: cryptd - Assign statesize properly
crypto: ghash-clmulni - Fix load failure
USB: digi_acceleport: do sanity checking for the number of ports
ksplice: add sysctls for determining Ksplice features.
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
Chuck Anderson [Thu, 9 Mar 2017 04:24:57 +0000 (20:24 -0800)]
Merge branch topic/uek-4.1/sparc of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/sparc: (32 commits)
sparc: fix kernel panic caused by vio handshake
sparc64: Add sensible read values for /proc/<pid>/sparc_adi
sparc64: Add ability to set the mcde state for a process
sparc64: Add proc files specific to ADI
sparc64: add mcd_on_by_default
Revert "sparc: fix intermittent LDom hang waiting for vdc_port_up"
sparc64: Add support for ADI (Application Data Integrity)
sparc64: Add support for ADI register fields, ASIs and traps
mm: Add functions to support extra actions on swap in/out
signals, sparc: Add signal codes for ADI violations
sparc64: shut down to OBP correctly
sparc64: fix for user probes in high memory
sparc64: Use online cpus instead of present cpus during hotplug.
sparc64: Update cpumaps correctly during hotplug.
sparc: fix intermittent LDom hang waiting for vdc_port_up
arch/sparc: Add a dedicated clear_page and clear_user_page for M7
sparc64: perf: Enable dynamic tracepoints when using perf probe
SPARC64: UEK4 LDOMS DOMAIN SERVICES UPDATE 7
arch/sparc: Fix indexing msi_msiqid_table and msi_irq_table
arch/sparc: Clear msi_msiqid_table during teardown
...
Chuck Anderson [Thu, 9 Mar 2017 04:16:08 +0000 (20:16 -0800)]
Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/drivers: (200 commits)
scsi: megaraid-sas: request irqs later
scsi: megaraid_sas: add in missing white spaces in error messages text
scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
scsi: megaraid_sas: driver version upgrade
scsi: megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map
scsi: megaraid_sas: Send SYNCHRONIZE_CACHE for VD to firmware
scsi: megaraid_sas: Do not fire DCMDs during PCI shutdown/detach
scsi: megaraid_sas: Send correct PhysArm to FW for R1 VD downgrade
scsi: megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset
scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
scsi: megaraid_sas: clean function declarations in megaraid_sas_base.c up
scsi: megaraid_sas: add in missing white space in error message text
scsi: megaraid_sas: Fix the search of first memory bar
scsi: megaraid_sas: Use memdup_user() rather than duplicating its implementation
megaraid_sas: Fix probing cards without io port
megaraid_sas: Do not fire MR_DCMD_PD_LIST_QUERY to controllers which do not support it
megaraid_sas: Downgrade two success messages to info
megaraid_sas: driver version upgrade
megaraid_sas: task management code optimizations
megaraid_sas: call ISR function to clean up pending replies in OCR path
...
Chuck Anderson [Thu, 9 Mar 2017 04:00:03 +0000 (20:00 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/upstream-cherry-picks: (280 commits)
dm btree: fix bufio buffer leaks in dm_btree_del() error path
ipv4: keep skb->dst around in presence of IP options
ip6_gre: fix ip6gre_err() invalid reads
watchdog: hpwdt: changed maintainer information
watchdog: hpwdt: add support for iLO5
watchdog: hpwdt: remove email address from doc
watchdog: hpwdt: Adjust documentation to match latest kernel module parameters.
hpwdt: use nmi_panic() when kernel panics in NMI handler
panic: change nmi_panic from macro to function
watchdog/hpwdt: Fix build on certain configs
watchdog/hpwdt: Create stack frame in asminline_call()
x86/asm: Add C versions of frame pointer macros
x86/asm: Clean up frame pointer macros
watchdog: hpwdt: HP rebranding
panic, x86: Allow CPUs to save registers even if looping in NMI context
watchdog: hpwdt: Add support for WDIOC_SETOPTIONS
kvm: fix page struct leak in handle_vmon
bnx2: use READ_ONCE() instead of barrier()
bnx2: Wait for in-flight DMA to complete at probe stage
bnx2: fix locking when netconsole is used
...
Thomas Tai [Mon, 13 Feb 2017 14:50:05 +0000 (06:50 -0800)]
sparc: fix kernel panic caused by vio handshake
During an hours-long reboot test, the primary prints out multiple TX trigger
errors followed by a VIO handshake panic. The TX trigger error happens
because the primary ldmvsw detects that the ldc channel is down. In this
situation, the ldc operation is aborted, the tx and rx queues are then
flushed. The problem is that the rx queue may contain a LDC_EVENT_RESET
sent by the guest. It causes the primary to think that the ldc channel
is not in reset state. When the guest comes up again, the handshake is
out of sequence and thus causes handshake panic.
The TX trigger error would not have happened if the LDC_EVENT_RESET was
received before the TX checked the ldc link state. This is the reason
why the panic happens intermittently.
This patch checks for the connection reset and changes the ldc state to
reset. The reset logic is taken from existing vnet_event_napi() ldc_ctrl:
code path.
Khalid Aziz [Fri, 2 Dec 2016 19:45:37 +0000 (12:45 -0700)]
sparc64: Add sensible read values for /proc/<pid>/sparc_adi
This patch makes the value read from /proc/<pid>/sparc_adi consistent
across platforms that support ADI and ones that do not. When ADI is
not available for a process, either because the process is an anonymous
process on an ADI-capable platform or because it is running on a
non-ADI platform, a read from /proc/<pid>/sparc_adi always returns a
value of -1. This patch also updates the documentation file with
the values for the sparc_adi proc file.
Eric Snowberg [Thu, 17 Nov 2016 21:27:36 +0000 (13:27 -0800)]
sparc64: Add ability to set the mcde state for a process
turn off version checking (PSTATE.mcde) to avoid tripping over ADI
versions in flux. This has been partially remedied by using non-faulting
loads.
However, there is still a need to turn off PSTATE.mcde in memory dump
functions. This is to determine if an address is readable. If the
address is unreadable, the dump shows the memory contents as "********"
instead of a 4-byte hex value.
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Chuck Anderson [Sat, 18 Feb 2017 06:15:35 +0000 (22:15 -0800)]
sparc64: add mcd_on_by_default
Add the global variable mcd_on_by_default and support for the kernel boot arg
"mcd_on_by_default" which causes mcd_on_by_default = 1 if the kernel is
adi_capable().
Based on the code in commit:
sparc64: Enable Application Data Integrity for m7 and newer processors
Required by commit:
sparc64: Add proc files specific to ADI
Orabug: 22713162 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Khalid Aziz [Wed, 15 Feb 2017 19:57:59 +0000 (12:57 -0700)]
sparc64: Add support for ADI (Application Data Integrity)
ADI is a new feature supported on SPARC M7 and newer processors to allow
hardware to catch rogue accesses to memory. ADI is supported for data
fetches only and not instruction fetches. An app can enable ADI on its
data pages, set version tags on them and use versioned addresses to
access the data pages. Upper bits of the address contain the version
tag. On M7 processors, upper four bits (bits 63-60) contain the version
tag. If a rogue app attempts to access ADI enabled data pages, its
access is blocked and processor generates an exception. Please see
Documentation/sparc/adi.txt for further details.
This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
MCD (Memory Corruption Detection) on selected memory ranges, enable
TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
version tags on page swap out/in or migration. It also adds handlers for
traps related to MCD. ADI is not enabled by default for any task. A task
must explicitly enable ADI on a memory range and set version tag for ADI
to be effective for the task.
This initial implementation supports saving and restoring one tag per
page. A page must use same version tag across the entire page for the
tag to survive swap and migration. Swap support infrastructure in this
patch allows for this capability to be expanded to store/restore more
than one tag per page in future.
This is a backport of patch sent upstream and brings UEK code closer to
upstream patch v6.
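The versioned-address scheme described for M7 (version tag in bits 63-60) can be sketched with plain bit operations. The helper names are hypothetical; only the bit positions come from the text above.

```python
ADI_TAG_SHIFT = 60
ADI_TAG_MASK = 0xF << ADI_TAG_SHIFT
ADDR_BITS_MASK = (1 << 64) - 1


def set_version(addr, version):
    """Return `addr` with its upper four bits replaced by `version`."""
    if not 0 <= version <= 0xF:
        raise ValueError("version tag must fit in four bits")
    return (addr & ~ADI_TAG_MASK & ADDR_BITS_MASK) | (version << ADI_TAG_SHIFT)


def get_version(addr):
    """Extract the version tag from bits 63-60."""
    return (addr >> ADI_TAG_SHIFT) & 0xF


def strip_version(addr):
    """Return the address with the version tag bits cleared."""
    return addr & ~ADI_TAG_MASK & ADDR_BITS_MASK
```

An access through a versioned address is only allowed by the hardware when the tag in the address matches the tag set on the memory; this sketch covers the address arithmetic only.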
Khalid Aziz [Wed, 18 Jan 2017 17:59:26 +0000 (10:59 -0700)]
sparc64: Add support for ADI register fields, ASIs and traps
SPARC M7 processor adds new control register fields, ASIs and a new
trap to support the ADI (Application Data Integrity) feature. This
patch adds definitions for these register fields, ASIs and a handler
for the new precise memory corruption detected trap.
This is a backport of patch sent upstream and brings UEK code in sync
with upstream patch v6.
Khalid Aziz [Wed, 18 Jan 2017 17:36:21 +0000 (10:36 -0700)]
mm: Add functions to support extra actions on swap in/out
If a processor supports special metadata for a page, for example ADI
version tags on SPARC M7, this metadata must be saved when the page is
swapped out. The same metadata must be restored when the page is swapped
back in. This patch adds two new architecture specific functions -
arch_do_swap_page() to be called when a page is swapped in,
arch_unmap_one() to be called when a page is being unmapped for swap
out.
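The save/restore idea can be sketched in userspace as follows. This is only an illustration: the mock_ functions, the mock_page structure and the side table are hypothetical stand-ins and do not reflect the real hook signatures:

```c
#include <stdint.h>

#define MOCK_SWAP_SLOTS 8

/* Hypothetical page carrying a single metadata tag (one tag per page). */
struct mock_page {
    uint8_t adi_tag;
};

/* Hypothetical side table holding the saved tag for each swap slot. */
static uint8_t mock_saved_tag[MOCK_SWAP_SLOTS];

/* Stand-in for the swap-out hook: remember the tag before unmapping. */
static inline int mock_arch_unmap_one(const struct mock_page *pg,
                                      unsigned int slot)
{
    if (slot >= MOCK_SWAP_SLOTS)
        return -1;
    mock_saved_tag[slot] = pg->adi_tag;
    return 0;
}

/* Stand-in for the swap-in hook: restore the saved tag on the new page. */
static inline int mock_arch_do_swap_page(struct mock_page *pg,
                                         unsigned int slot)
{
    if (slot >= MOCK_SWAP_SLOTS)
        return -1;
    pg->adi_tag = mock_saved_tag[slot];
    return 0;
}
```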
This is a backport of patch sent upstream and brings UEK code in sync
with upstream patch v6.
Khalid Aziz [Thu, 5 Jan 2017 18:46:54 +0000 (11:46 -0700)]
signals, sparc: Add signal codes for ADI violations
SPARC M7 processor introduces a new feature - Application Data
Integrity (ADI). ADI allows MMU to catch rogue accesses to memory.
When a rogue access occurs, MMU blocks the access and raises an
exception. In response to the exception, kernel sends the offending
task a SIGSEGV with si_code that indicates the nature of exception.
This patch adds three new signal codes specific to ADI feature:
1. ADI is not enabled for the address and task attempted to access
memory using ADI
2. Task attempted to access memory using wrong ADI tag and caused
a deferred exception.
3. Task attempted to access memory using wrong ADI tag and caused
a precise exception.
This is a backport of patch sent upstream and brings UEK code closer to
upstream patch v6.
The command "shutdown -h -H now" should shut the system down to the
OBP, however the machine was being powered off in the LDOM case.
In the LDOM case, the "reboot-command" variable must be set to
the string "noop" and then ldom_reboot() must be called.
This will make the OBP ignore the setting of "auto-boot?" after it
completes the reset. This causes the system to stop at the ok prompt.
Signed-off-by: Larry Bassel <larry.bassel@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
When returning from the user probe code into the userspace process, PC & NPC
are truncated to 32 bits.
Because shared libraries get loaded very high in the virtual address
space of the process, placing a user probe inside a shared library makes the
kernel return into the process at the wrong address, causing it to segfault
most of the time.
This patch prevents truncating PC and NPC.
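A small illustration of the failure mode, assuming a made-up shared library address. mock_truncate_pc() is a hypothetical helper that mimics the old behaviour, not code from the patch:

```c
#include <stdint.h>

/* Mimic the buggy return path: keep only the low 32 bits of the PC. */
static inline uint64_t mock_truncate_pc(uint64_t pc)
{
    return pc & UINT64_C(0xFFFFFFFF);
}

/* On SPARC the next PC of a sequential instruction is PC + 4. */
static inline uint64_t mock_npc(uint64_t pc)
{
    return pc + 4;
}
```

A PC below 4 GB survives the truncation unchanged, which is consistent with the problem showing up for probes placed in shared libraries mapped high in the address space.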
Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com> Reviewed-by: David Aldridge <david.j.aldridge@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Atish Patra [Mon, 23 Jan 2017 21:40:35 +0000 (14:40 -0700)]
sparc64: Use online cpus instead of present cpus during hotplug.
As per the hotplug documentation, online cpu maps should be
updated if cpu hotplug happens via sysfs. Thus, all other
cpu maps should be updated based on the online cpus instead
of present cpus. The following example illustrates the issue
if cpu maps are updated based on present cpus.
[root@ca-sparc64 hackbench]# echo 1 > /sys/devices/system/cpu/cpu2/online
[root@ca-sparc64 hackbench]# cat /sys/devices/system/cpu/cpu1/topology/core_siblings_list
0-255
[root@ca-sparc64 hackbench]# cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
0-7
This is wrong because cpu0 is still offline.
After the fix:
[root@ca-sparc64 hackbench]# cat /sys/devices/system/cpu/cpu1/topology/core_siblings_list
1-255
[root@ca-sparc64 hackbench]# cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
1-7
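The difference can be sketched with plain bitmasks, using one bit per cpu on a hypothetical 8-thread core. mock_siblings() is an illustrative helper and not the kernel's cpumask API:

```c
#include <stdint.h>

/* Sibling set = cpus of the same core that are also in the reference mask. */
static inline uint64_t mock_siblings(uint64_t core_mask, uint64_t ref_mask)
{
    return core_mask & ref_mask;
}
```

With cpus 0-7 present and cpu0 offline, intersecting with the present mask still reports cpu0 as a sibling, while intersecting with the online mask does not.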
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Chris Hyser <chris.hyser@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Atish Patra [Mon, 23 Jan 2017 21:39:24 +0000 (14:39 -0700)]
sparc64: Update cpumaps correctly during hotplug.
Currently, numa_cpu_mask is not updated when cpus are
hotplugged, resulting in an incorrect number of cpus reported
by lscpu/numactl. Moreover, cpu_core_sib_cache_map is
also not cleared when a cpu goes offline.
Update both masks correctly whenever a cpu goes online or
offline.
Thomas Tai [Tue, 17 Jan 2017 19:43:32 +0000 (11:43 -0800)]
sparc: fix intermittent LDom hang waiting for vdc_port_up
When an LDom boots, sunvdc probes the disk using the LDC channel.
If the channel was previously configured, we need to wait for
the channel state to change from UP to RESETTING so that the
seqid is properly reset in the primary. Otherwise the primary
will expect that the ldc packet contains a seqid other than 0.
Also disable ldc hypervisor interrupt before calling vio_port_up,
because interrupts can happen once ldc_bind is called. Disabling the
interrupt ensures everything is configured before getting an interrupt
request.
Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Babu Moger [Wed, 18 Jan 2017 01:21:44 +0000 (17:21 -0800)]
arch/sparc: Add a dedicated clear_page and clear_user_page for M7
Add a dedicated clear_page and clear_user_page for M7.
This avoids multiple checks that are not required and
eliminates about 30 instructions for each call.
A latency reduction of about 3 to 4 percent was seen in some cases.
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
A couple of indexing fixes.
1. Fix indexing of pbm->msi_msiqid_table. It is initialized
based on pbm->msi_first (not pbm->msiq_first as previously done).
Here is how it is initialized (see sparc64_setup_msi_irq):
pbm->msi_msiqid_table[msi - pbm->msi_first] = msiqid;
2. In set_related_affinity, we don't need to subtract msi_first as
the loop is indexed from 0 to the size of the table.
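The indexing rule from point 1 can be sketched as below. struct mock_pbm and the accessors are hypothetical reductions of the real structure, kept only to show that reads must use the same base (msi_first) as the initialization:

```c
#define MOCK_MSI_NUM 8

/* Hypothetical, reduced view of the PBM state relevant to the fix. */
struct mock_pbm {
    unsigned int msi_first;   /* first MSI number owned by this PBM */
    unsigned int msiq_first;  /* first MSI queue id; NOT the table base */
    unsigned int msi_msiqid_table[MOCK_MSI_NUM];
};

/* Store the queue id for an MSI, indexed relative to msi_first. */
static inline void mock_set_msiqid(struct mock_pbm *pbm, unsigned int msi,
                                   unsigned int msiqid)
{
    pbm->msi_msiqid_table[msi - pbm->msi_first] = msiqid;
}

/* Look the queue id up again using the same base as the store. */
static inline unsigned int mock_get_msiqid(const struct mock_pbm *pbm,
                                           unsigned int msi)
{
    return pbm->msi_msiqid_table[msi - pbm->msi_first];
}
```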
Saves time when smp_flush_tlb_page/smp_flush_tlb_pending
is called during do_exit(...). Without this patch, killing
processes had a performance bottleneck in these functions
due to unnecessary xcalls made to flush TLBs.
Reviewed-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Henry Willard <henry.willard@oracle.com> Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Liam R. Howlett [Thu, 5 Jan 2017 20:58:41 +0000 (15:58 -0500)]
sparc64: Zero pages on allocation for mondo and error queues.
Error queues use a non-zero first word to detect if the queues are full.
Using pages that have not been zeroed may result in false positive
overflow events. These queues are set up once during boot so zeroing
all mondo and error queue pages is safe.
Note that this does not always occur because the page allocation for
these queues is so early in the boot cycle that higher-numbered CPUs get
fresh pages. It is only when traps are serviced by lower-numbered CPUs
that were given already used pages that this issue is exposed.
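A userspace sketch of why stale pages matter, assuming a hypothetical entry size of eight 64-bit words. calloc() stands in for a zeroed page allocation, and the mock_ names are illustrative only:

```c
#include <stdint.h>
#include <stdlib.h>

#define MOCK_ENTRY_WORDS 8

/* An entry counts as occupied when its first word is non-zero. */
static inline int mock_entry_in_use(const uint64_t *entry)
{
    return entry[0] != 0;
}

/* Allocate a queue whose entries all start out zeroed (i.e. free). */
static inline uint64_t *mock_alloc_queue(size_t entries)
{
    return calloc(entries * MOCK_ENTRY_WORDS, sizeof(uint64_t));
}
```

An entry that still holds leftover data from a previous user of the page would be reported as occupied even though nothing was ever queued.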
Liam R. Howlett [Thu, 22 Dec 2016 02:57:42 +0000 (21:57 -0500)]
sparc64: Don't panic on user mode non-resumable errors
Send a SIGBUS to the offending process on all userspace non-resumable
traps. This prevents userspace applications from creating a kernel
panic. The siginfo will return the code BUS_ADRERR and a valid address
if possible.
David S. Miller [Thu, 27 Oct 2016 16:04:54 +0000 (09:04 -0700)]
sparc64: Handle extremely large kernel TLB range flushes more gracefully.
When the vmalloc area gets fragmented, and because the firmware
mapping area sits between where modules live and the vmalloc area, we
can sometimes receive requests for enormous kernel TLB range flushes.
When this happens the cpu just spins flushing billions of pages and
this triggers the NMI watchdog and other problems.
We took care of this on the TSB side by doing a linear scan of the
table once we pass a certain threshold.
Do something similar for the TLB flush, however we are limited by
the TLB flush facilities provided by the different chip variants.
First of all we use a (mostly arbitrary) cut-off of 256K which is
about 32 pages. This can be tuned in the future.
The huge range code path for each chip works as follows:
1) On spitfire we flush all non-locked TLB entries using diagnostic
accesses.
2) On cheetah we use the "flush all" TLB flush.
3) On sun4v/hypervisor we do a TLB context flush on context 0, which
unlike previous chips does not remove "permanent" or locked
entries.
We could probably do something better on spitfire, such as limiting
the flush to kernel TLB entries or even doing range comparisons.
However that probably isn't worth it since those chips are old and
the TLB only had 64 entries.
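The cut-off decision can be sketched as follows. The 256K threshold is taken from the text; the enum, the helper name and the choice of a strict "greater than" comparison are assumptions made for illustration:

```c
#include <stdint.h>

/* Cut-off from the text above: 256K, i.e. 32 pages of 8K each. */
#define MOCK_FLUSH_CUTOFF (UINT64_C(256) * 1024)

enum mock_flush_kind {
    MOCK_FLUSH_PER_PAGE,  /* walk the range one page at a time */
    MOCK_FLUSH_ALL        /* use the chip's bulk flush path */
};

/* Pick a strategy for flushing the kernel range [start, end). */
static inline enum mock_flush_kind mock_pick_flush(uint64_t start,
                                                   uint64_t end)
{
    return (end - start) > MOCK_FLUSH_CUTOFF ? MOCK_FLUSH_ALL
                                             : MOCK_FLUSH_PER_PAGE;
}
```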
Reported-by: James Clarke <jrtc27@jrtc27.com> Tested-by: James Clarke <jrtc27@jrtc27.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a74ad5e660a9ee1d071665e7e8ad822784a2dc7f) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a236441bb69723032db94128761a469030c3fe6d) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 830cda3f9855ff092b0e9610346d110846fc497c) Signed-off-by: Allen Pais <allen.pais@oracle.com>
David S. Miller [Wed, 26 Oct 2016 02:43:17 +0000 (19:43 -0700)]
sparc64: Handle extremely large kernel TSB range flushes sanely.
If the number of pages we are flushing is more than twice the number
of entries in the TSB, just scan the TSB table for matches rather
than probing each and every page in the range.
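A simplified model of the two paths, using a hypothetical 16-entry direct-mapped table in place of the real TSB. The entry layout, index function and names are assumptions; only the "more than twice the number of entries" rule comes from the text:

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT   13
#define MOCK_TSB_NENTRIES 16

struct mock_tsb_ent {
    uint64_t vaddr;
    int valid;
};

static struct mock_tsb_ent mock_tsb[MOCK_TSB_NENTRIES];

static inline size_t mock_tsb_index(uint64_t vaddr)
{
    return (size_t)((vaddr >> MOCK_PAGE_SHIFT) % MOCK_TSB_NENTRIES);
}

static inline void mock_tsb_insert(uint64_t vaddr)
{
    struct mock_tsb_ent *ent = &mock_tsb[mock_tsb_index(vaddr)];

    ent->vaddr = vaddr;
    ent->valid = 1;
}

/* Flush [start, end); returns 1 if the table-scan path was taken. */
static inline int mock_flush_range(uint64_t start, uint64_t end)
{
    uint64_t npages = (end - start) >> MOCK_PAGE_SHIFT;
    uint64_t v;
    size_t i;

    if (npages > 2 * MOCK_TSB_NENTRIES) {
        /* Huge range: one pass over the table, invalidate matches. */
        for (i = 0; i < MOCK_TSB_NENTRIES; i++) {
            struct mock_tsb_ent *ent = &mock_tsb[i];

            if (ent->valid && ent->vaddr >= start && ent->vaddr < end)
                ent->valid = 0;
        }
        return 1;
    }

    /* Small range: probe the slot for each page in the range. */
    for (v = start; v < end; v += UINT64_C(1) << MOCK_PAGE_SHIFT) {
        struct mock_tsb_ent *ent = &mock_tsb[mock_tsb_index(v)];

        if (ent->valid && ent->vaddr == v)
            ent->valid = 0;
    }
    return 0;
}
```

The scan path costs a fixed number of steps equal to the table size, regardless of how many pages the range covers.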
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 849c498766060a16aad5b0e0d03206726e7d2fa4) Signed-off-by: Allen Pais <allen.pais@oracle.com>
David S. Miller [Tue, 25 Oct 2016 23:23:26 +0000 (16:23 -0700)]
sparc64: Fix illegal relative branches in hypervisor patched TLB code.
When we copy code over to patch another piece of code, we can only use
PC-relative branches that target code within that piece of code.
Such PC-relative branches cannot be made to external symbols because
the patch moves the location of the code and thus modifies the
relative address of external symbols.
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b429ae4d5b565a71dfffd759dfcd4f6c093ced94) Signed-off-by: Allen Pais <allen.pais@oracle.com>
2. CPU DR related problems including 'length too big' errors and hangs. With
these new fixes, >256 vcpus can be successfully added/removed from a guest
domain. As part of this fix, a new scheme for reusing event data memory
buffers was implemented.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com> Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Orabug: 23171935, 24848179 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Commit 093df73771ba ("scsi: qla2xxx: Fix Target mode handling with
Multiqueue changes.") introduces two bodies of code that look similar
but with s/req/rsp/ in the second instance. But in one case, it looks
like this conversion was missed.
Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Reviewed-by: Laurence Oberman <loberman@redhat.com> Acked-by: Quinn Tran <Quinn.Tran@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
- Fix race condition between dpc_thread accessing Multiqueue resources
and qla2x00_remove_one thread trying to free resource.
- Fix out of order free for Multiqueue resources. Also, Multiqueue
interrupts need a workqueue. Interrupts need to be stopped before
the wq can be destroyed.
Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Tell the SCSI layer how many hardware queues we have based on the number
of max queue pairs created. The number of max queue pairs created will
depend on the MSI-X vector count.
This feature can be turned on via CONFIG_SCSI_MQ_DEFAULT or by passing
scsi_mod.use_blk_mq=Y as a parameter to the kernel.
Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com> Signed-off-by: Michael Hernandez <michael.hernandez@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Replace the existing multiple queue functionality with a framework
that allows for the creation of pairs of request and response queues,
either at start of day or dynamically.
Queue pair creation depends on the module parameter "ql2xmqsupport",
which needs to be enabled to create queue pairs.
Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com> Signed-off-by: Michael Hernandez <michael.hernandez@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
A system can get hung task timeouts if a qlogic board fails during
initialization (if the board breaks again or fails the init). The hang
involves the scsi scan.
In a nutshell, since commit beb9e315e6e0 ("qla2xxx: Prevent removal and
board_disable race"):
...it is possible to have freed ha (base_vha->hw) early by a call to
qla2x00_remove_one when pdev->enable_cnt equals zero:
if (!atomic_read(&pdev->enable_cnt)) {
scsi_host_put(base_vha->host);
kfree(ha);
pci_set_drvdata(pdev, NULL);
return;
Almost always, the scsi_host_put above frees the vha structure
(attached to the end of the Scsi_Host we're putting) since it's the last
put, and life is good. However, if we are entering this routine because
the adapter has broken sometime during initialization AND a scsi scan is
already in progress (and has done its own scsi_host_get), vha will not
be freed. What's worse, the scsi scan will access the freed ha structure
through qla2xxx_scan_finished:
if (time > vha->hw->loop_reset_delay * HZ)
return 1;
The scsi scan keeps checking to see if a scan is complete by calling
qla2xxx_scan_finished. There is a timeout value that limits the length
of time a scan can take (hw->loop_reset_delay, usually set to 5
seconds), but this definition is in the data structure (hw) that can get
freed early.
This can yield unpredictable results, the worst of which is that the
scsi scan can hang indefinitely. This happens when the freed structure
gets reused and loop_reset_delay gets overwritten with garbage, which
the scan obliviously uses as its timeout value.
The fix for this is simple: at the top of qla2xxx_scan_finished, check
for the UNLOADING bit in the vha structure (vha is not freed at this
point). If UNLOADING is set, we exit the scan for this adapter
immediately. After this last reference to the ha structure, we'll exit
the scan for this adapter, and continue on.
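The shape of the check can be sketched with mock structures. Everything here (struct names, the boolean flag, MOCK_HZ) is a hypothetical reduction of the driver's types, meant only to show that the early exit happens before vha->hw is dereferenced:

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_HZ 100

struct mock_hw {
    unsigned long loop_reset_delay;  /* scan timeout in seconds */
};

struct mock_vha {
    bool unloading;       /* stand-in for the UNLOADING bit */
    struct mock_hw *hw;   /* may already be freed once unloading is set */
};

/* Returns 1 when the scan should stop for this adapter, 0 otherwise. */
static inline int mock_scan_finished(const struct mock_vha *vha,
                                     unsigned long time)
{
    /* Check the flag first so that vha->hw is never touched afterwards. */
    if (vha->unloading)
        return 1;
    if (time > vha->hw->loop_reset_delay * MOCK_HZ)
        return 1;
    return 0;
}
```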
This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against a
RHEL7 driver variant, but the bug and patch are equally relevant to
the upstream driver.
Fixes: beb9e315e6e0 ("qla2xxx: Prevent removal and board_disable race") Cc: <stable@vger.kernel.org> # v3.18+ Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com> Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Kris Van Hees [Wed, 1 Mar 2017 04:37:11 +0000 (23:37 -0500)]
dtrace: continuing the FBT implementation and fixes
This commit continues the implementation of Function Boundary Tracing
(FBT) and fixes various problems with the original implementation and
other things in DTrace that it caused to break. It is done as a single
commit due to the intertwined nature of the code it touches.
1. We were only handling unaligned memory access traps as part of the
NOFAULT access protection. This commit adds handling of data and
instruction access traps.
2. When an OOPS takes place, we now add output about whether we are
in DTrace probe context and what the last probe was that was being
processed (if any). That last data item isn't guaranteed to always
have a valid value. But it is helpful.
3. New ustack stack walker implementation (moved from module to kernel
for consistency and because we need access to low level structures
like the page tables) for both x86 and sparc. The new code avoids
any locking or sleeping. The new user stack walker is accessed as
a sub-function of dtrace_stacktrace(), selected using the flags
field of stacktrace_state_t.
4. We added a new field to the dtrace_psinfo_t structure (ustack) to
hold the bottom address of the stack. This is needed in the stack
walker (specifically for x86) to know when we have reached the end
of the stack. It is initialized from copy_process (in DTrace
specific code) when stack_start is passed as parameter to clone.
It is also set from dtrace_psinfo_alloc() (which is generally called
from performing an exec), and there it gets its value from the
mm->start_stack value.
5. The FBT black lists have been updated with functions that may be
invoked during probe processing. In addition, for x86_64 we added
explicit filtering of functions that start with insn_* or inat_*
because they are used for instruction analysis during probe
processing.
6. On sparc64, per-cpu data is accessed by means of a global register
that holds the base address for this memory area. Some assembler
code clobbers that register in some cases, so it is not safe to
depend on this in probe context. Instead, we explicitly access
the data based on the smp_processor_id().
7. We added a new CPU DTrace flag (CPU_DTRACE_PROBE_CTX) to flag that
we are processing in DTrace probe context. It is primarily used
to detect attempts of re-entry into dtrace_probe().
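The re-entry detection described in item 7 can be sketched as below. The flag value, the mock_cpu_core structure and both helpers are hypothetical; only the idea of a per-CPU "in probe context" flag comes from the text:

```c
#include <stdint.h>

/* Hypothetical bit value; the real flag value is not given in the text. */
#define MOCK_CPU_DTRACE_PROBE_CTX 0x1u

struct mock_cpu_core {
    uint32_t flags;
};

/* Returns 1 if probe processing may start, 0 on a re-entry attempt. */
static inline int mock_probe_enter(struct mock_cpu_core *cpu)
{
    if (cpu->flags & MOCK_CPU_DTRACE_PROBE_CTX)
        return 0;
    cpu->flags |= MOCK_CPU_DTRACE_PROBE_CTX;
    return 1;
}

/* Leave probe context so that the next probe firing is accepted again. */
static inline void mock_probe_exit(struct mock_cpu_core *cpu)
{
    cpu->flags &= ~MOCK_CPU_DTRACE_PROBE_CTX;
}
```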
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Orabug: 21220305
Orabug: 24829326
Kris Van Hees [Mon, 27 Feb 2017 15:39:07 +0000 (10:39 -0500)]
dtrace: ensure DTrace can use get_user_pages safely
The processing of the DTrace-specific FOLL_IMMED flag was not robust
enough. We could still get into a situation where cond_resched() was
called (which is bad) or where the VMA area would get extended (which
is also bad). The only code that passes this flag is DTrace support
code, and when the flag is not passed, the execution flow is not at all
affected.
Orabug: 25640153 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com> Reviewed-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Kris Van Hees [Fri, 24 Feb 2017 23:40:40 +0000 (18:40 -0500)]
dtrace: enable paranoid mode and IST shift for xen_int3
The Xen PVM path into an INT3 trap was not using paranoid=1 mode nor was
it using an IST shift as is done for HW INT3 traps. This interferes with
the instruction emulation code check based on the handler return value.
Orabug: 25580519 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Peter Zijlstra [Tue, 28 Feb 2017 17:18:01 +0000 (09:18 -0800)]
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try and move the same pre-existing software group
into a hardware context.
... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.
That very same commit failed to recognise sys_perf_event_context() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.
So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).
Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.
Reported-by: Di Shen (Keen Lab) Tested-by: John Dias <joaodias@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Min Chong <mchong@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 321027c1fe77f892f4ea07846aeae08cefbbb290)
Duplicate perf events are handled by setting appropriate return value and redirecting
the flow to 'err_locked' goto label followed by 'err_context' label. In UEK4, 'err_locked'
goto label is not available. Hence, the operations under this label are performed before
redirecting the flow to 'err_context' label.
The conversion is generally straightforward. We convert the filesystem from
a global cache to a per-fs one. Similarly to ext4, the tricky part is that
the xattr block corresponding to a found mbcache entry can get freed before
we get the buffer lock for that block. So we have to check whether the entry
is still valid after getting the buffer lock.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit be0726d33cb8f411945884664924bed3cb8c70ee) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The conversion is generally straightforward. The only tricky part is
that xattr block corresponding to found mbcache entry can get freed
before we get buffer lock for that block. So we have to check whether
the entry is still valid after getting buffer lock.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 82939d7999dfc1f1998c4b1c12e2f19edbdff272) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>