www.infradead.org Git - users/jedix/linux-maple.git/log

SPARC64: LDOM vnet "Got unexpected MCAST reply"

Handle unexpected MCAST reply as a debug warning the same as is done in
Solaris 12. Please see bug 24954702 for details.

Signed-off-by: George Kennedy <george.kennedy@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Orabug: 24954702
Signed-off-by: Allen Pais <allen.pais@oracle.com>

ldmvsw: disable tso and gso for bridge operations

The ldmvsw driver is specifically for supporting the ldom virtual
networking by running in the primary ldom and using the LDC to connect
the remaining ldoms to the outside world via a bridge. With TSO and GSO
supported while connected to the bridge, things tend to misbehave as seen
in our case by delayed packets, enough to begin triggering retransmits
and affecting overall throughput. By turning off advertised support for
TSO and GSO we restore stable traffic flow through the bridge.
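
As a rough illustration of what "turning off advertised support" means,
the sketch below models a device feature mask as a plain bit field and
clears the segmentation-offload bits. The flag names and values are
placeholders invented for this example, not the kernel's own feature
definitions, and the actual change lives in the ldmvsw driver.

```c
#include <stdint.h>

/* Hypothetical feature bits; the values are illustrative only. */
#define SKETCH_F_SG   (1u << 0)  /* scatter/gather */
#define SKETCH_F_CSUM (1u << 1)  /* checksum offload */
#define SKETCH_F_TSO  (1u << 2)  /* TCP segmentation offload */
#define SKETCH_F_GSO  (1u << 3)  /* generic segmentation offload */

/* Return the feature mask a bridge-facing port would advertise:
 * everything the device supports except TSO and GSO. */
uint32_t sketch_bridge_features(uint32_t supported)
{
	return supported & ~(SKETCH_F_TSO | SKETCH_F_GSO);
}
```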

Orabug: 23293104

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc221a34ac473b444a7cfdd0c152b4c71f79326b)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

ldmvsw: update and simplify version string

New version and simplify the print code.

Orabug: 23293104

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7602011f59cc32ebc3a5f9058d6ba11b096c8c50)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvnet: remove extra rcu_read_unlocks

The RCU read lock is grabbed first thing in sunvnet_start_xmit_common()
so it always needs to be released. This removes the conditional release
in the dropped packet error path and removes a couple of superfluous
calls in the middle of the code.

Orabug: 23293104

Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit daa86e50f649fccadafc53994ddc4254d75a008b)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvnet: straighten up message event handling logic

The use of gotos for handling the incoming events made this code
harder to read and support than it should be. This patch straightens
out and clears up the logic.

Orabug: 23293104

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bf091f3f362b3c562a18bbf7a2d3e2f3a36eba1d)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvnet: add memory barrier before check for tx enable

In order to allow the underlying LDC and outstanding memory operations
to potentially catch up with the driver's Tx requests, add a memory
barrier before checking again for available tx descriptors.
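
The "check, barrier, check again" pattern can be sketched in portable
C11 as below. This is only a userspace analogy under assumed names
(sketch_ring, sketch_tx_avail); the driver uses the kernel's own
barrier primitives and descriptor ring layout.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_ring {
	atomic_uint prod;   /* next slot the driver will fill */
	atomic_uint cons;   /* next slot the peer will consume */
	unsigned int size;  /* total descriptors in the ring */
};

/* Free descriptors = ring size minus slots produced but not consumed. */
unsigned int sketch_tx_avail(struct sketch_ring *r)
{
	unsigned int used =
		atomic_load_explicit(&r->prod, memory_order_relaxed) -
		atomic_load_explicit(&r->cons, memory_order_relaxed);

	return r->size - used;
}

/* Returns true if the queue must stay stopped. */
bool sketch_tx_should_stop(struct sketch_ring *r, unsigned int needed)
{
	if (sketch_tx_avail(r) >= needed)
		return false;
	/* Full barrier so the re-check below can observe consumer
	 * progress that was published before this point. */
	atomic_thread_fence(memory_order_seq_cst);
	return sketch_tx_avail(r) < needed;
}
```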

Orabug: 23293104

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fd263fb6e718c5bdf35cbc1de4f781c71794d2a4)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvnet: update version and version printing

There have been several changes since the first version of this code, so
we bump the version number. While we're at it, we can simplify the
version printing a bit and drop a couple lines of code.

Orabug: 23293104

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f2f3e210bffe5c8f8b30d0b0c7b0f733ff5db334)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvnet: remove unused variable in maybe_tx_wakeup

The vio_dring_state *dr variable is unused in maybe_tx_wakeup().
As the comments indicate, we call maybe_tx_wakeup() whenever we
get a STOPPED LDC message on the port. If the queue is stopped,
we want to wake it up so that we will send another START message
at the next TX and trigger the consumer to drain the dring.

Orabug: 23293104

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d4aa89cc2bbe021722c946eb11b21ebb0f13c825)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvnet: make sunvnet common code dynamically loadable

When the sunvnet_common code was split out for use by both sunvnet
and the newer ldmvsw, it was made into a static kernel library, which
limits the usefulness of sunvnet and ldmvsw as loadables, since most
of the real work is being done in the shared code. Also, this is
simply dead code in kernels that aren't running the LDoms.

This patch makes the sunvnet_common into a dynamically loadable
module and makes sunvnet and ldmvsw dependent on sunvnet_common.

Orabug: 23293104

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2493b842f258e14938f278e44ecc26970dfabbf0)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

hwrng: n2 - update version info

Orabug: 25127795

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0ff1436fb2e3da085f7177d03ce4362c45b75d57)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

hwrng: n2 - support new hardware register layout

Add the new register layout constants and the requisite logic
for using them.

Orabug: 25127795

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 07e25d43be8502bd8ab6122c4f6449ebf30e98f7)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

hwrng: n2 - add device data descriptions

Since we're going to need to keep track of more than just one
attribute of the hardware, we'll change the use of the data field
from the match struct from a single flag to a struct pointer.
This patch adds the struct template and initial descriptions.

Orabug: 25127795

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit becbc4940ad8e8ff560e1ceee33d9bb4fe4c9225)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

hwrng: n2 - limit error spewage when self-test fails

If the self-test fails, it probably won't actually suddenly
start working. Currently, this causes an endless spew of
error messages on the console and in the logs, so this patch
adds a limiter to the test.
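
One plausible shape for such a limiter is a simple failure counter, as
sketched here. The cap of 3 and all names are assumptions made for the
example; the message above does not state the driver's actual limit or
mechanism.

```c
#include <stdbool.h>

#define SKETCH_MAX_REPORTS 3  /* hypothetical cap, not the driver's value */

struct sketch_selftest {
	unsigned int failures;  /* self-test failures reported so far */
};

/* Record a failed self-test; return true while it is still worth logging. */
bool sketch_should_report(struct sketch_selftest *st)
{
	if (st->failures < SKETCH_MAX_REPORTS) {
		st->failures++;
		return true;
	}
	return false;
}
```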

Orabug: 25127795

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit db602a7f940a71870c17e39bcbe4e4d7a4a8273e)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

hwrng: n2 - Attach on T5/M5, T7/M7 SPARC CPUs

n2rng: Attach on T5/M5, T7/M7 SPARC CPUs

(space to tab fixes after variable names)

Orabug: 25127795

Signed-off-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit c1e9b3b0eea12899b7749571af21cc60822cf2b6)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

tcp: fix tcp_fastopen unaligned access complaints on sparc

Fix up a data alignment issue on sparc by swapping the order
of the cookie byte array field with the length field in
struct tcp_fastopen_cookie, and making it a proper union
to clean up the typecasting.

This addresses log complaints like these:
    log_unaligned: 113 callbacks suppressed
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
    Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
    Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
    Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
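
The layout change can be illustrated with two stand-in structs. They
are simplified models, not the kernel definition: the 16-byte size and
the uint64_t view inside the union are assumptions used here only to
show how field order and a union affect alignment.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_COOKIE_MAX 16  /* assumed cookie size for this sketch */

/* Length first: the byte array starts at offset 1, so reading it
 * as wider words is an unaligned access on strict architectures. */
struct sketch_cookie_old {
	int8_t  len;
	uint8_t val[SKETCH_COOKIE_MAX];
};

/* Array first, inside a union with a wider member: the bytes now
 * share the alignment of the 64-bit view and no cast is needed. */
struct sketch_cookie_new {
	union {
		uint8_t  val[SKETCH_COOKIE_MAX];
		uint64_t words[SKETCH_COOKIE_MAX / 8];
	};
	int8_t len;
};
```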

Orabug: 25163405

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 003c941057eaa868ca6fedd29a274c863167230d)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

vds: Add physical block support

Version 1.2 of the virtual IO device protocol added physical block
support. Start sending the underlying physical block device size.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Orabug: 19420123
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Add missing hardware capabilities for M7

Some M7 hardware capabilities were not being reported
correctly. This commit fixes the issue by adding definitions
for all the missing capabilities from both the Machine
Descriptor and the Compatibility Feature Register.

Orabug: 25555746

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: Fix vds_vtoc_set_default debug with large disks

Fix vds_vtoc_set_default debug, which breaks with large capacity drives (i.e. 1.6TB).

Signed-off-by: George Kennedy <george.kennedy@oracle.com>
Reviewed-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Orabug: 25423802
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: VDC threads in guest domain do not resume after primary domain reboot

Prevents VDC threads from hanging while waiting for primary
domain to come back up. Ensures that all waiting VDC threads
are woken up when primary domain comes back up.

Orabug: 25519961

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvdc: Add support for setting physical sector size

Physical sector size is supported in v1.2 of the vDisk protocol and
should be set if available. If protocol version 1.2 is used and the
physical disk size is unavailable, then the disk is considered busy.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(Cherry-pick of upstream f41e54616ca1a199f6c17228f26082ccdaaab3de)

Orabug: 19420123
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: create/destroy cpu sysfs dynamically

Currently, cpu/cpuX represents maximum number of possible
cpus in a domain. Those cpu sysfs directories also do
not change as we add/remove cpus via ldom manager.

Update sysfs so that it represents number of present cpus
in the domain. As a result, cpu sysfs is also updated
dynamically upon cpu add/removal.

Orabug: 21775890
Orabug: 25216469

Before the fix:
[root@ca-sparc76 ~]# ldm list
NAME             STATE      FLAGS   CONS VCPU  MEMORY UTIL  NORM  UPTIME
primary          active     -n-cv-  UART 32    32G    0.2%  0.2%  11m

[root@ca-sparc76 ~]# getconf _NPROCESSORS_CONF
512
[root@ca-sparc76 ~]# ldm set-vcpu 64 primary
[root@ca-sparc76 ~]# ldm list
NAME             STATE      FLAGS   CONS VCPU  MEMORY UTIL  NORM  UPTIME
primary          active     -n-cv-  UART 64    32G    0.0%  0.0%  12m
[root@ca-sparc76 ~]# getconf _NPROCESSORS_CONF
512
-------------------------------------------------------------------------
After the fix:
[root@ca-sparc76 ~]# getconf _NPROCESSORS_CONF
32
[root@ca-sparc76 ~]# ldm set-vcpu 64 primary
[root@ca-sparc76 ~]# ldm list
NAME             STATE      FLAGS   CONS  VCPU  MEMORY UTIL  NORM  UPTIME
primary          active     -n-cv-  UART  64    32G    0.0%  0.0%  12m
[root@ca-sparc76 ~]# getconf _NPROCESSORS_CONF
64

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Do not retain old VM_SPARC_ADI flag when protection changes on page

When protection on a memory page is changed with mprotect(), old
arch-specific VM flags on the page are retained. This patch clears
old VM_SPARC_ADI flag when protection is changed since mprotect() is
potentially being invoked to disable ADI on the page. This code will
add VM_SPARC_ADI flag back if the new protection includes it.
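
The "clear, then re-add if requested" logic can be sketched as follows.
The bit values and helper names are hypothetical and chosen only for
this example; the kernel's flag values and mprotect() code path differ.

```c
#include <stdint.h>

/* Hypothetical bit positions; the kernel's real values differ. */
#define SKETCH_PROT_ADI     (1u << 4)   /* protection request bit */
#define SKETCH_VM_SPARC_ADI (1u << 8)   /* arch-specific VM flag */

/* Translate the requested protection into arch-specific VM flags. */
uint32_t sketch_calc_vm_prot_bits(uint32_t prot)
{
	return (prot & SKETCH_PROT_ADI) ? SKETCH_VM_SPARC_ADI : 0;
}

/* Compute flags after a protection change: drop the stale ADI flag
 * first, then add it back only if the new protection asks for it. */
uint32_t sketch_mprotect_flags(uint32_t old_flags, uint32_t new_prot)
{
	uint32_t flags = old_flags & ~SKETCH_VM_SPARC_ADI;

	return flags | sketch_calc_vm_prot_bits(new_prot);
}
```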

Orabug: 25641371

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: VIO: Support for virtual-device MD node probing

This update adds support to the mdesc/vio infrastructure to
probe for "virtual-device" nodes in the MD. The vio
module will create sysfs device files for these nodes which
can be accessed by user space code (such as udev). In addition,
VIO drivers can now probe for these MD nodes if the need arises.

This functionality will serve as part of the fix for
BUG 24841906.

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Reviewed-By: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Orabug: 24841906

sparc: fix kernel panic caused by vio handshake

During an hours-long reboot test, the primary prints out multiple TX trigger
errors followed by a VIO handshake panic. The TX trigger error happens
because the primary ldmvsw detects that the ldc channel is down. In this
situation, the ldc operation is aborted, the tx and rx queue are then
flushed. The problem is that the rx queue may contain a LDC_EVENT_RESET
sent by the guest. It causes the primary to think that the ldc channel
is not in reset state. When the guest comes up again, the handshake is
out of sequence and thus causes a handshake panic.

The TX trigger error would not have happened if the LDC_EVENT_RESET was
received before the TX checked the ldc link state. This is the reason
why the panic happens intermittently.

This patch checks for the connection reset and changes the ldc state to
reset. The reset logic is taken from existing vnet_event_napi() ldc_ctrl:
code path.

Orabug: 23476613
Orabug: 25064864

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

sparc64: Add sensible read values for /proc/<pid>/sparc_adi

This patch makes value read from /proc/<pid>/sparc_adi consistent
across platforms that support ADI and ones that do not. When ADI is
not available for a process either due to process being an anonymous
process on an ADI-capable platform or the process is running on a
non-ADI platform, a read from /proc/<pid>/sparc_adi always reads a
value of -1. This patch updates the documentation file as well with
the values for sparc_adi proc file.

Orabug: 25173120

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Add ability to set the mcde state for a process

turn off version checking (PSTATE.mcde) to avoid tripping over ADI
versions in flux. This has been partially remedied by using non-faulting
loads.

However, there is still a need to turn off PSTATE.mcde in memory dump
functions. This is to determine if an address is readable. If the
address is unreadable, the dump shows the memory contents as "********"
instead of a 4-byte hex value.

Orabug: 25130002

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Add proc files specific to ADI

This patch adds /proc/sys/kernel/mcd_on_by_default and
/proc/<pid>/sparc_adi files. These files allow userspace access to
change ADI parameters.

Orabug: 22713162

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: add mcd_on_by_default

Add the global variable mcd_on_by_default and support for the kernel boot arg
"mcd_on_by_default" which causes mcd_on_by_default = 1 if the kernel is
adi_capable().

Based on the code in commit:
sparc64: Enable Application Data Integrity for m7 and newer processors
Required by commit:
sparc64: Add proc files specific to ADI

Orabug: 22713162
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

Revert "sparc: fix intermittent LDom hang waiting for vdc_port_up"

This reverts commit 94ac2958dd26064af74f49a966e3b7e3bd4dccfe.

Orabug: 25409637

sparc64: Add support for ADI (Application Data Integrity)

ADI is a new feature supported on SPARC M7 and newer processors to allow
hardware to catch rogue accesses to memory. ADI is supported for data
fetches only and not instruction fetches. An app can enable ADI on its
data pages, set version tags on them and use versioned addresses to
access the data pages. Upper bits of the address contain the version
tag. On M7 processors, upper four bits (bits 63-60) contain the version
tag. If a rogue app attempts to access ADI enabled data pages, its
access is blocked and processor generates an exception. Please see
Documentation/sparc/adi.txt for further details.
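
The address arithmetic for the M7 case (version tag in bits 63-60) can
be sketched as pure integer manipulation. The helper names are made up
for this example, and the code only computes values; it does not
dereference versioned addresses or touch any ADI hardware state.

```c
#include <stdint.h>

#define SKETCH_ADI_SHIFT 60                      /* tag occupies bits 63-60 */
#define SKETCH_ADI_MASK  (0xfULL << SKETCH_ADI_SHIFT)

/* Place a 4-bit version tag in the upper bits of an address. */
uint64_t sketch_adi_set_tag(uint64_t addr, unsigned int tag)
{
	return (addr & ~SKETCH_ADI_MASK) |
	       ((uint64_t)(tag & 0xf) << SKETCH_ADI_SHIFT);
}

/* Extract the version tag from a versioned address. */
unsigned int sketch_adi_get_tag(uint64_t addr)
{
	return (unsigned int)((addr & SKETCH_ADI_MASK) >> SKETCH_ADI_SHIFT);
}
```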

This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
MCD (Memory Corruption Detection) on selected memory ranges, enable
TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
version tags on page swap out/in or migration. It also adds handlers for
traps related to MCD. ADI is not enabled by default for any task. A task
must explicitly enable ADI on a memory range and set version tag for ADI
to be effective for the task.

This initial implementation supports saving and restoring one tag per
page. A page must use same version tag across the entire page for the
tag to survive swap and migration. Swap support infrastructure in this
patch allows for this capability to be expanded to store/restore more
than one tag per page in future.

This is a backport of patch sent upstream and brings UEK code closer to
upstream patch v6.

Orabug: 22713162

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>

sparc64: Add support for ADI register fields, ASIs and traps

SPARC M7 processor adds new control register fields, ASIs and a new
trap to support the ADI (Application Data Integrity) feature. This
patch adds definitions for these register fields, ASIs and a handler
for the new precise memory corruption detected trap.

This is a backport of patch sent upstream and brings UEK code in sync
with upstream patch v6.

Orabug: 22713162

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>

mm: Add functions to support extra actions on swap in/out

If a processor supports special metadata for a page, for example ADI
version tags on SPARC M7, this metadata must be saved when the page is
swapped out. The same metadata must be restored when the page is swapped
back in. This patch adds two new architecture specific functions -
arch_do_swap_page() to be called when a page is swapped in,
arch_unmap_one() to be called when a page is being unmapped for swap
out.
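
A toy model of the save/restore pairing is shown below. The function
names echo the hooks described above but are prefixed with sketch_ on
purpose: the signatures, the slot-indexed side table and the single
tag per page are simplifications assumed for this example, not the
kernel interfaces.

```c
#include <stdint.h>

#define SKETCH_SLOTS 8   /* hypothetical number of swap slots */

struct sketch_page {
	uint8_t adi_tag;   /* one version tag for the whole page */
};

static uint8_t sketch_saved_tags[SKETCH_SLOTS];

/* Stand-in for the unmap-for-swap-out hook: remember the tag. */
void sketch_unmap_one(const struct sketch_page *pg, unsigned int slot)
{
	sketch_saved_tags[slot % SKETCH_SLOTS] = pg->adi_tag;
}

/* Stand-in for the swap-in hook: put the remembered tag back. */
void sketch_do_swap_page(struct sketch_page *pg, unsigned int slot)
{
	pg->adi_tag = sketch_saved_tags[slot % SKETCH_SLOTS];
}
```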

This is a backport of patch sent upstream and brings UEK code in sync
with upstream patch v6.

Orabug: 22713162

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>

signals, sparc: Add signal codes for ADI violations

SPARC M7 processor introduces a new feature - Application Data
Integrity (ADI). ADI allows MMU to catch rogue accesses to memory.
When a rogue access occurs, MMU blocks the access and raises an
exception. In response to the exception, kernel sends the offending
task a SIGSEGV with si_code that indicates the nature of exception.
This patch adds three new signal codes specific to ADI feature:

1. ADI is not enabled for the address and task attempted to access
   memory using ADI
2. Task attempted to access memory using wrong ADI tag and caused
   a deferred exception.
3. Task attempted to access memory using wrong ADI tag and caused
   a precise exception.

This is a backport of patch sent upstream and brings UEK code closer to
upstream patch v6.

Orabug: 22713162

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>

sparc64: shut down to OBP correctly

Orabug: 23467092

The command "shutdown -h -H now" should shut the system down to the
OBP, however the machine was being powered off in the LDOM case.

In the LDOM case, the "reboot-command" variable must be set to
the string "noop" and then ldom_reboot() must be called.
This will make the OBP ignore the setting of "auto-boot?" after it
completes the reset. This causes the system to stop at the ok prompt.

Signed-off-by: Larry Bassel <larry.bassel@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: fix for user probes in high memory

Orabug: 25428066

When returning from the user probe code into userspace process, PC & NPC are
truncated to 32 bits.

Because shared libraries get loaded very high in the virtual address
space of the process, placing a user probe inside a shared library makes the
kernel return into the process at the wrong address, causing it to segfault
most of the time.

This patch prevents truncating PC and NPC.
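
The effect of the truncation can be shown with plain integer
arithmetic. The two helpers are illustrative stand-ins, not the
kernel's uprobe return path, and any address passed to them is just an
example of a high userspace mapping.

```c
#include <stdint.h>

/* Model of the bug: the saved program counter loses its upper half. */
uint64_t sketch_resume_pc_truncated(uint64_t pc)
{
	return pc & 0xffffffffULL;
}

/* Model of the fix: the full 64-bit value is preserved. */
uint64_t sketch_resume_pc(uint64_t pc)
{
	return pc;
}
```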

Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com>
Reviewed-by: David Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Use online cpus instead of present cpus during hotplug.

As per the hotplug documentation, online cpu maps should be
updated if cpu hotplug happens via sysfs. Thus, all other
cpu maps should be updated based on the online cpus instead
of present cpus. The following example illustrates the issue
if cpu maps are updated based on present cpus.

Before the fix on a T7-2:

[root@ca-sparc64 hackbench]#
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0-7
[root@ca-sparc64 hackbench]# echo 0 > /sys/devices/system/cpu/cpu0/online
[root@ca-sparc64 hackbench]# echo 0 > /sys/devices/system/cpu/cpu2/online
[root@ca-sparc64 hackbench]#
cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
1,3-7
[root@ca-sparc64 hackbench]#
cat /sys/devices/system/cpu/cpu1/topology/core_siblings_list
1,3-255

[root@ca-sparc64 hackbench]# echo 1 > /sys/devices/system/cpu/cpu2/online
[root@ca-sparc64 hackbench]#
cat /sys/devices/system/cpu/cpu1/topology/core_siblings_list
0-255
[root@ca-sparc64 hackbench]#
cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
0-7
This is wrong because cpu0 is still offline.

After the fix:
[root@ca-sparc64 hackbench]#
cat /sys/devices/system/cpu/cpu1/topology/core_siblings_list
1-255
[root@ca-sparc64 hackbench]#
cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
1-7
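
The difference between the two outputs comes down to which cpu map the
sibling mask is intersected with. A minimal sketch, using a 64-bit
word as a stand-in for a cpumask (the helper name is hypothetical):

```c
#include <stdint.h>

/* Sibling list = CPUs sharing the core, restricted to a given cpu map. */
uint64_t sketch_siblings(uint64_t core_members, uint64_t cpu_map)
{
	return core_members & cpu_map;
}
```

With cpus 0-7 in one core and cpu0 offline, intersecting with the
present map keeps bit 0 set, while intersecting with the online map
clears it, which matches the 1-7 list shown after the fix.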

Orabug: 25472256

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Update cpumaps correctly during hotplug.

Currently, numa_cpu_mask is not updated when cpus are
hotplugged, resulting in an incorrect number of cpus reported
by lscpu/numactl. Moreover, cpu_core_sib_cache_map is
also not cleared when cpu goes offline.

Update both the masks correctly whenever cpu goes online/
offline.

Orabug: 25144324

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc: fix intermittent LDom hang waiting for vdc_port_up

When an LDom boots, sunvdc probes the disk using the LDC channel.
If the channel was previously configured, we need to wait for
the channel state to change from UP to RESETTING so that the
seqid is properly reset in the primary. Otherwise the primary
will expect that the ldc packet contains a seqid other than 0.

Also disable ldc hypervisor interrupt before calling vio_port_up,
because interrupts can happen once ldc_bind is called. Disabling the
interrupt ensures everything is configured before getting an interrupt
request.

orabug: 25409637

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

arch/sparc: Add a dedicated clear_page and clear_user_page for M7

Adding a dedicated clear_page and clear_user_page for M7.
Avoids multiple checks which are really not required.
This eliminates about 30 instructions for each call.
Seen about 3 to 4 percent latency reduction in some cases.

Orabug: 25456049

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: perf: Enable dynamic tracepoints when using perf probe

This commit enables the use of dynamic tracepoints (kprobes) when
using the perf probe command.

Orabug: 24925615

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: UEK4 LDOMS DOMAIN SERVICES UPDATE 7

This update fixes the following issues for LDom domain services on UEK4:

1. Kernel watchdog panic when unbinding guest domains. This panic was
due to the ds driver accessing a freed data structure out of ds_remove().

2. "no service registered for UNREG_REQ handle" error messages on the console
when ldmd is restarted.

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Reviewed-By: Bijan Mottahedeh <Bijan.Mottahedeh@oracle.com>
Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Orabug: 25408406, 25366664
Signed-off-by: Allen Pais <allen.pais@oracle.com>

arch/sparc: Fix indexing msi_msiqid_table and msi_irq_table

Orabug: 25391918

Couple of indexing fixes.
1. Fix indexing pbm->msi_msiqid_table. It is initialized
   based off of pbm->msi_first (not pbm->msiq_first as previously done).
   Here is how it is initialized (see sparc64_setup_msi_irq):
   pbm->msi_msiqid_table[msi - pbm->msi_first] = msiqid;

2. In set_related_affinity, we don't need to subtract msi_first as
   the loop is indexed from 0 to size of the table.
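
The first point is about using the same base for stores and lookups.
A minimal sketch with a cut-down stand-in for the PBM structure (the
table size and helper names are assumptions; callers are expected to
pass an msi within the table's range):

```c
#include <stdint.h>

#define SKETCH_MSI_NUM 8   /* hypothetical table size */

struct sketch_pbm {
	unsigned int msi_first;    /* first MSI number owned by this PBM */
	unsigned int msiq_first;   /* first MSI queue id (a different base) */
	uint32_t msi_msiqid_table[SKETCH_MSI_NUM];
};

/* Store with msi_first as the base, as the initialization does. */
void sketch_set_msiqid(struct sketch_pbm *pbm, unsigned int msi,
		       uint32_t msiqid)
{
	pbm->msi_msiqid_table[msi - pbm->msi_first] = msiqid;
}

/* Look up with the same base; using msiq_first here would read
 * a different slot than the one that was written. */
uint32_t sketch_get_msiqid(const struct sketch_pbm *pbm, unsigned int msi)
{
	return pbm->msi_msiqid_table[msi - pbm->msi_first];
}
```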

(cherry picked from uek2 commit 57d31847c9f2011314de8ea98c06616f91c5dbb8)

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Tested-by: Dmitry Klochkov <dmitry.klochkov@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

arch/sparc: Clear msi_msiqid_table during teardown

Orabug: 25391918

teardown_msi_irq needs to clear msi_msiqid_table in PBM.

(cherry picked from uek2 commit 77264d74588ae4c59682c561707471a4accfed2a)

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Tested-by: Dmitry Klochkov <dmitry.klochkov@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Skip flushing TLBs if there are no mm_users

Orabug: 25379970

Saves time when smp_flush_tlb_page/smp_flush_tlb_pending
is called during do_exit(...). Without this patch, killing
processes had a performance bottleneck in these functions
due to unnecessary xcalls made to flush TLBs.
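
The early-exit check can be sketched as below. The struct is a toy
stand-in with only the user count, and the helper name is
hypothetical; the real test sits inside the SMP TLB flush functions
named above.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_mm {
	atomic_int mm_users;   /* tasks still using this address space */
};

/* Decide whether a cross-CPU TLB flush is worth sending. */
bool sketch_need_tlb_flush(struct sketch_mm *mm)
{
	return atomic_load(&mm->mm_users) != 0;
}
```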

Reviewed-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Henry Willard <henry.willard@oracle.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: This fixes the numa_node attributes displayed in sysfs.

Orabug: 22748961

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Zero pages on allocation for mondo and error queues.

Error queues use a non-zero first word to detect if the queues are full.
Using pages that have not been zeroed may result in false positive
overflow events. These queues are set up once during boot so zeroing
all mondo and error queue pages is safe.

Note that this does not always occur because the page allocation for
these queues is so early in the boot cycle that higher number CPUs get
fresh pages. It is only when traps are serviced with lower number CPUs
who were given already used pages that this issue is exposed.

orabug: 23054018

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Don't panic on user mode non-resumable errors

Send a SIGBUS to the offending process on all userspace non-resumable
traps. This prevents userspace applications from creating a kernel
panic. The siginfo will return the code BUS_ADRERR and a valid address
if possible.

orabug: 23054018

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: affine strand irq stacks

    Like the subject says let us NUMA affine the per strand softirq and
    hardirq stacks.

    This has been boot tested on T7-4 and T4-1.

    Ported to UEK4

Orabug: 23050718

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Handle extremely large kernel TLB range flushes more gracefully.

When the vmalloc area gets fragmented, and because the firmware
mapping area sits between where modules live and the vmalloc area, we
can sometimes receive requests for enormous kernel TLB range flushes.

When this happens the cpu just spins flushing billions of pages and
this triggers the NMI watchdog and other problems.

We took care of this on the TSB side by doing a linear scan of the
table once we pass a certain threshold.

Do something similar for the TLB flush, however we are limited by
the TLB flush facilities provided by the different chip variants.

First of all we use a (mostly arbitrary) cut-off of 256K which is
about 32 pages.  This can be tuned in the future.

The huge range code path for each chip works as follows:

1) On spitfire we flush all non-locked TLB entries using diagnostic
   accesses.

2) On cheetah we use the "flush all" TLB flush.

3) On sun4v/hypervisor we do a TLB context flush on context 0, which
   unlike previous chips does not remove "permanent" or locked
   entries.

We could probably do something better on spitfire, such as limiting
the flush to kernel TLB entries or even doing range comparisons.
However that probably isn't worth it since those chips are old and
the TLB only had 64 entries.
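
The cut-off arithmetic can be checked with a small sketch. It assumes
an 8K base page size, which is what makes 256K come out to 32 pages;
the helper names are invented, and whether a range of exactly 256K
takes the slow path is a detail of the real assembly that is not
modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 8192ULL               /* 8K base page, assumed */
#define SKETCH_CUTOFF    (256ULL * 1024ULL)    /* 256K, from the text */

/* True when a range is large enough to take the huge range path. */
bool sketch_use_full_flush(uint64_t start, uint64_t end)
{
	return (end - start) > SKETCH_CUTOFF;
}

/* Number of base pages covered by the cut-off. */
uint64_t sketch_cutoff_pages(void)
{
	return SKETCH_CUTOFF / SKETCH_PAGE_SIZE;
}
```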

Orabug: 25499527

Reported-by: James Clarke <jrtc27@jrtc27.com>
Tested-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a74ad5e660a9ee1d071665e7e8ad822784a2dc7f)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix illegal relative branches in hypervisor patched TLB cross-call code.

Just like the non-cross-call TLB flush handlers, the cross-call ones need
to avoid doing PC-relative branches outside of their code blocks.

Orabug: 25499527

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a236441bb69723032db94128761a469030c3fe6d)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix instruction count in comment for __hypervisor_flush_tlb_pending.

Noticed by James Clarke.

Orabug: 25499527

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 830cda3f9855ff092b0e9610346d110846fc497c)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Handle extremely large kernel TSB range flushes sanely.

If the number of pages we are flushing is more than twice the number
of entries in the TSB, just scan the TSB table for matches rather
than probing each and every page in the range.
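
The decision described above reduces to one comparison. The sketch
assumes 8K pages (shift of 13) and follows the wording "more than
twice"; the function name is hypothetical and the exact comparison in
the kernel source may differ at the boundary.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 13   /* 8K pages, assumed */

/* Choose between probing every page and scanning the whole table. */
bool sketch_scan_whole_tsb(uint64_t start, uint64_t end,
			   uint64_t tsb_entries)
{
	uint64_t pages = (end - start) >> SKETCH_PAGE_SHIFT;

	return pages > 2 * tsb_entries;
}
```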

Based upon a patch and report by James Clarke.

Orabug: 25499527

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 849c498766060a16aad5b0e0d03206726e7d2fa4)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix illegal relative branches in hypervisor patched TLB code.

When we copy code over to patch another piece of code, we can only use
PC-relative branches that target code within that piece of code.

Such PC-relative branches cannot be made to external symbols because
the patch moves the location of the code and thus modifies the
relative address of external symbols.

Use an absolute jmpl to fix this problem.
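
The effect can be shown with a little address arithmetic. All addresses below are made up for illustration; only the relationship between them matters.

```python
def pc_relative_target(pc, disp):
    """Address reached by a PC-relative branch located at `pc`."""
    return pc + disp


# Hypothetical addresses, chosen only for illustration.
TEMPLATE_ADDR = 0x1000  # where the replacement code was assembled
PATCH_SITE = 0x5000     # where the code is copied when patching
EXTERNAL_SYM = 0x9000   # a symbol outside the copied block
BRANCH_OFF = 0x20       # offset of the branch inside the block
LOCAL_OFF = 0x40        # offset of a label inside the block

# Displacements are fixed when the template is assembled and linked.
disp_external = EXTERNAL_SYM - (TEMPLATE_ADDR + BRANCH_OFF)
disp_local = LOCAL_OFF - BRANCH_OFF

# After the copy, the branch executes at PATCH_SITE + BRANCH_OFF.
external_after = pc_relative_target(PATCH_SITE + BRANCH_OFF, disp_external)
local_after = pc_relative_target(PATCH_SITE + BRANCH_OFF, disp_local)

print(hex(external_after))  # 0xd000: off by PATCH_SITE - TEMPLATE_ADDR
print(hex(local_after))     # 0x5040: still the label inside the copy
```

The branch to the local label moves with the block and stays correct, while the branch to the external symbol misses by exactly the distance the code was moved. An absolute jump does not depend on where the code runs.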

Orabug: 25499527

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b429ae4d5b565a71dfffd759dfcd4f6c093ced94)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: UEK4 LDOMS DOMAIN SERVICES UPDATE 6

This update fixes the following issues for LDom domain services on UEK4:

1. Error messages displayed on the console when guest domains are stopped
   such as:

ldc_print: id=0x11 flags=0x7 state=CONNECTED cstate=0x0 hsstate=0x10
        rx_h=0x2b40 rx_t=0x2b40 rx_n=512
        tx_h=0x4440 tx_t=0x4440 tx_n=512
        rcv_nxt=635 snd_nxt=723
ds-3: ds_disconnect_service_client: failed to send UNREG_REQ for handle
700000001 (1)

2. CPU DR related problems including 'length too big' errors and hangs. With
   these new fixes, >256 vcpus can be successfully added/removed from a guest
   domain. As part of this fix, a new scheme for reusing event data memory
   buffers was implemented.

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Orabug: 23171935, 24848179
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc: Optimized memset, memcpy, copy_to_user, copy_from_user for M7

New algorithm that takes advantage of the M7 block init store
ASI, i.e., overlapping pipelines and miss buffer filling.
Full details are in the code comments.

Ported from the following UEK2 commits:
http://ca-git.us.oracle.com/?p=linux-uek-2.6.39-sparc.git;a=commit;h=c58ef937e442830c362d1ab20a35a1c61b409827
http://ca-git.us.oracle.com/?p=linux-uek-2.6.39-sparc.git;a=commit;h=322d6f95ade517f4e180545f23fa731b2d748b33
http://ca-git.us.oracle.com/?p=linux-uek-2.6.39-sparc.git;a=commit;h=bc0b4ae6b87fbb28bd816320d22ae6c6a2393865

Orabug: 25120741

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

Revert "sparc64: struct adi_caps should use __u64, not u64"

This reverts commit 04b6750492f8551a82a0336803922f736917639a.

Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: ds driver: Make memory allocations ATOMIC and enhance debugging

This patch fixes the following issues:

1. BUG 25107317 - Kernel Panic: Watchdog HARD LOCKUP out of ds_cap_fini()
2. BUG 24787856 - Forward port 19811909 - Unnecessary
warning - ldom_req_sp_token

BUG 25107317 appears to be caused by the ds driver allocating memory with
the GFP_KERNEL flag (which can sleep) while holding a spinlock. Sleeping
while holding a spinlock is not allowed in the kernel and resulted in the
panic.

To fix BUG 24787856, the error message in question was changed to a
printk_once() which will result in the message only appearing once
in the console log instead of repeatedly.

The debugging facility in the driver was also enhanced by adding 3 separate
debug levels for the ds driver debug messages.

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Orabug: 25107317, 24787856
(cherry picked from commit f3bf272f0512120708a2966a7916b51c34efe56d)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Add symbolic access to M7 performance counters to perf

This commit provides symbolic access to every performance counter
provided in the M7. The 'perf list' command can be used to provide
a complete list of these new events, which will be reported as
shown below.

Br_mispred OR cpu/Br_mispred/                      [Kernel PMU event]
Br_taken OR cpu/Br_taken/                          [Kernel PMU event]
Br_tgt_mispred OR cpu/Br_tgt_mispred/              [Kernel PMU event]

Orabug: 23313970

Note: This commit is based on a cherry-pick of the following:
3bc29d39f2cb5ba72d945d79f82dd0c98dc55643
bd91767dfdbee52537ec3f1454c8c2cf0cf77a84

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Acked-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 39f70b2fa98ea10931133ab983f521c70cb7429f)

sonoma: perf: add support for sonoma (s7) into perf

This commit ensures that perf will now recognise that
it is running on a sonoma device and will initialise
correctly.

Orabug: 24931042

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit f39f00c4536c8c6ca0585a200a56894c2c158743)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64:M8 cpu recognition typo fix

(cherry picked from commit 764d030ec66da2e0be166af0fac0f36f1f4aacae)
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 6deca46c941b66734021c2feff6eb9a1eef8d173)

sparc64: Add M7 hardware cache events into perf

Use the enhanced performance instrumentation provided
in the M7 to enable the following hardware cache
events in perf.

L1-dcache-load-misses
L1-dcache-loads
L1-dcache-prefetches
L1-dcache-store-misses
L1-dcache-stores
L1-icache-load-misses
L1-icache-loads
L1-icache-prefetches
LLC-load-misses
LLC-loads
LLC-prefetches
LLC-store-misses
LLC-stores
branch-load-misses
dTLB-load-misses
dTLB-store-misses
iTLB-load-misses

Orabug: 24621144

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit b1d3b6ce6d4a3e5cf88a16c1a99bf37e0b805131)
(cherry picked from commit 16f97e434978b46f8b92d911b907478a4fb3d00a)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix the watchdog corrupting performance counters

There is a race condition in perf_event_grab_pmc() which
means that we do not increment the active_events count correctly
when a new event is added. Ultimately, we end up with a negative
value for the active_events count. This means that the next time
we try to add a new event the watchdog will not be stopped
correctly and corruption of the performance counters will
be observed.

Note: In sparc64 land the watchdog is implemented using one
of the performance counters.

This issue is fixed by moving the mutex lock so that
it encompasses the whole critical section in
perf_event_grab_pmc().

Orabug: 23106709

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit 54ed00318fec5db3fab1b035ade5d95926d84799)
(cherry picked from commit d9ad125578c9f2fa015beb9dc10bd3d1eb9004ec)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix incorrect counting when using multiple perf counters

Commit 165050c1 introduced a change to the way we deal with
performance counter overflow interrupts. This change had the
side effect that when a performance counter overflow was
detected it assumed all performance counters in use
had overflowed. Thus, when using multiple performance
counters the event counting was incorrect.

This commit fixes this incorrect counting behaviour.
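
A toy model of the difference, not the kernel's handler: the function name, the dictionary of counters and the `period` parameter are all assumptions made for illustration.

```python
def handle_overflow_irq(counts, overflowed, period, assume_all=False):
    """Toy model of a performance counter overflow interrupt handler.

    counts     -- dict: counter index -> events accounted so far
    overflowed -- set of counter indices whose overflow bit is set
    period     -- events represented by one counter wrap (hypothetical)
    assume_all -- True reproduces the faulty behaviour described above
    """
    for idx in counts:
        # Only counters that actually overflowed should be accounted.
        if assume_all or idx in overflowed:
            counts[idx] += period
    return counts
```

With two counters in use and only counter 1 overflowing, the faulty variant credits both counters with a full period, while the corrected variant credits only counter 1.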

Orabug: 23106709

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit ef4dab8459ac6dd32538dc9448caf55ab68c2231)
(cherry picked from commit 741d96c0e37d7a73e17433355bb5bf513f2053af)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix a race condition when stopping performance counters

When stopping a performance counter that is close to overflowing,
there is a race condition that can occur between writing to the
PCRx register to stop the counter (and also clearing the PCRx.ov
bit at the same time) vs the performance counter overflowing and
setting the PCRx.ov bit in the PCRx register.
The result of this race condition is that we occasionally miss
a performance counter overflow interrupt, which in turn leads
to incorrect event counting.
This race condition has been observed when counting cpu cycles.
To fix this issue when stopping a performance counter,
we simply allow it to continue counting and overflow before
stopping it. This allows the performance counter overflow
interrupt to be generated and acted upon.
This fix is applied for M7, T5 and T4 devices.

Note: This commit is based on the following commits:
8b9b5b404e754e5c271341f5d7ea4797374c9844
a2d17bc33bdcc1cefd84bca44f2fd27075b16058
960f1607bec735e8da7dbd5df818da0a2e2b0305

Orabug: 22876587

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
(cherry picked from commit e5b7619e1de2f3e0dd858f632bc08ce64c344245)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Stop performance counter before updating

In order to reliably clear the PCRx.ov bit when updating a
performance counter value, we need to stop it counting first.
If we do not do this, then we can miss performance counter
overflow events.

Orabug: 22876587

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit 6de93dc001ed2f440ed3881722934fbda2de0d4f)
(cherry picked from commit b36dd4d8040cd53f7e8de5a1d145be483d185105)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: enable cpu hotplug feature for UEK4

This patch provides users with an option to
disable/enable cpu at runtime by writing to
/sys/devices/system/cpu/cpuX/online field.

Eg:
$ echo [0/1] > /sys/devices/system/cpu/cpu2/online

Orabug: 24946811
Orabug: 22546196

Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit a53c94ca8afc7a7603ff3c1154d81abb113a9e71)

sparc64: release third level cache reference for cpu hotplug feature

This crash was seen on a T7-4 and is related to the introduction of
the 3rd level caching patch.

issue: sysfs: cannot create duplicate filename '/devices/system/cpu/cpu0/cache'

Orabug: 24841354

Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit c33aebff52457ee7d0bacc922dc23b07cee4139a)

sparc64: fix compile warning section mismatch in find_node()

A commit that fixed find_node() introduced a section mismatch compile
warning. This patch fixes the warning by moving find_node() into the
__init section. find_node() is only used by memblock_nid_range(), which
is in turn only used by add_node_ranges(), an __init function, so both
find_node() and memblock_nid_range() belong in the __init section.

Orabug: 24674753

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
(cherry picked from commit e58d08f923190fc4dc2a1962710f84672c2bc9b2)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: fix sun4v_build_irq NULL pointer dereference

sun4v_build_irq assumes the given irq number is valid and uses
it to get the handler pointer. The pointer is dereferenced
without being checked, which causes a kernel panic.

The cause of the invalid irq is that the tx/rx irqs were never
freed during device removal, so irq numbers end up exhausted during
a continuous device add/removal test.

The tx/rx irqs are allocated during vio_device_probe() using irq_alloc()
and cookie_assign(). To free them, cookie_unassign() and
irq_free() are called when the device is removed.

Orabug: 23082240

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit 80043637b8fb1eabc16ab5947019f4dcdbb8c79f)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: ldmvsw: tx queue stuck in stopped state after LDC reset

The following patch fixes an issue with the ldmvsw driver where
the network connection of a guest domain becomes non-functional after
the guest domain has panic'd and rebooted.

The root cause was determined to be from the following series of
events:

1. Guest domain panics - resulting in the guest no longer processing
   network packets (from ldmvsw driver)
2. The ldmvsw driver (in the control domain) eventually exerts flow
   control due to no more available tx drings and stops the tx queue
   for the guest domain
3. The LDC of the network connection for the guest is reset when
   the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver and the ldmvsw
   responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is
   the normal method to re-enable the tx queue. But the ACK never comes
   because the tx queue was cleared due to the LDC reset.

To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on a LDC reset. This prevents the ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.
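
The sequence of events and the fix can be modelled as a small state machine. The class and method names are hypothetical and the ring is reduced to a plain list; this is not the driver's code.

```python
class TxQueue:
    """Minimal model of a tx queue with descriptor-ring flow control."""

    def __init__(self, ring_size):
        self.ring_size = ring_size
        self.pending = []
        self.stopped = False

    def xmit(self, pkt):
        if self.stopped:
            return False
        self.pending.append(pkt)
        if len(self.pending) == self.ring_size:
            # No free tx descriptors left: exert flow control.
            self.stopped = True
        return True

    def data_ack(self):
        # Normal path: the peer consumed the descriptors, wake the queue.
        self.pending.clear()
        self.stopped = False

    def ldc_reset(self, wake=True):
        # The reset clears the queue, so no DATA ACK will ever arrive.
        self.pending.clear()
        if wake:
            # The fix: re-enable the queue as part of reset handling.
            self.stopped = False
```

Calling `ldc_reset(wake=False)` on a stopped queue reproduces the stuck state, since `xmit()` keeps returning False. The default `ldc_reset()` leaves the queue usable again.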

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Orabug: 24714685
(cherry picked from commit d84ad41602ceb070c05d2633bc09d81f66796e15)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc: Implement watchdog_nmi_enable and watchdog_nmi_disable

Implement functions watchdog_nmi_enable and watchdog_nmi_disable
to enable/disable nmi watchdogs. Sparc uses an arch-specific nmi watchdog
handler. Currently, we do not have a way to enable/disable the nmi watchdog
dynamically. With these patches we can enable or disable the arch-specific
nmi watchdogs using the proc or sysctl interface.

Example commands.
To enable: echo 1 > /proc/sys/kernel/nmi_watchdog
To disable: echo 0 > /proc/sys/kernel/nmi_watchdog

It can also be achieved using the sysctl parameter kernel.nmi_watchdog.

Orabug: 24796651

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 43e96774e0a338e883e9ced9e717424df126b153)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Setup a scheduling domain for highest level cache.

Each scheduler domain should group cores that share a similar
property, with the domains forming a hierarchy. Currently, no
separate scheduler domain is defined for the cores that share
the last level cache. As a result, the scheduler fails to take
advantage of cache locality while migrating tasks during load
balancing.

Here are the cpu masks currently present for sparc that are/can
be used in scheduler domain construction.
cpu_core_map : set based on the cores that share the l1 cache.
core_core_sib_map : set based on the socket id or max cache id.
The prior SPARC notion of a socket was defined as the highest level of
shared cache. However, the MD record on T7 platforms now describes
the CPUs that share the physical socket and this is no longer tied
to shared cache.

That's why a separate cpu mask needs to be created that truly
represents the highest level of shared cache for all platforms.

Modified after cherry picked from upstream commit.
d624716b6c67e60681180786564b92ddb521148a
The implementation is largely based on Chris's patches.

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1e655ca52bb2727471f20cf4d8f62b4b9f69e6fc)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: PORT LDMVSW DRIVER TO UEK4

      Port of the new ldmvsw (Ldoms Virtual Switch) driver to UEK4.
      This code has already been submitted and accepted
      into the mainline Linux kernel.

      The ldmvsw is very similar in function to the existing sunvnet driver. The
      sunvnet driver is therefore split to put the code common to both drivers
      into the kernel for use by both drivers when loaded (see sunvnet_common.c/h).

Orabug: 23215917

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 361afffe35368dc23d2c9df6d7797ccf9af8fe57)

SPARC64: Fix bad FP register calculation

An additional problem was found in handle_ldf_stq
after adding the fix for the SIGFPE on no-fault
load.  The calculation for freg is incorrect when
a single precision load is being handled. This
causes %f1 to be seen as %f32 etc, and the incorrect
register ends up being overwritten.  This code
sequence demonstrates the problem:

ldd [%g1], %f32         ! g1 = valid address
lda [%i3] ASI_PNF, %f1  ! i3 = invalid address
std %f32, [%g1]         ! %f32 is mangled

This is corrected by basing the freg calculation on
the load size.
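
Why %f1 can be mistaken for %f32 follows from the SPARC V9 register encoding: for double and quad registers, the low bit of the 5-bit rd field holds bit 5 of the register number, while single precision registers use the field as-is. The sketch below illustrates that encoding; the function name is hypothetical and this is not the kernel's handle_ldf_stq code.

```python
def decode_freg(insn, size_bytes):
    """Decode the floating point destination register of a load.

    insn       -- 32-bit instruction word (rd is in bits 29:25)
    size_bytes -- 4 for single precision, 8 or 16 for double/quad
    """
    rd = (insn >> 25) & 0x1F
    if size_bytes == 4:
        # Single precision: the field is the register number (%f0-%f31).
        return rd
    # Double/quad: bit 0 of the field is bit 5 of the register number.
    return (rd & 0x1E) | ((rd & 1) << 5)


insn = 1 << 25                # rd field = 1
print(decode_freg(insn, 4))   # 1  -> %f1
print(decode_freg(insn, 8))   # 32 -> %f32
```

Applying the double precision rule to a single precision load is what turns %f1 into %f32.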

Orabug: 24942761

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

SPARC64: Respect no-fault ASI for floating exceptions

Floating point load instructions using ASI_PNF or other
no-fault ASIs should never cause a SIGFPE. A store-quad
instruction should naturally fault if a non-quad register
is given, but this constraint should not apply to loads,
which may be single precision, double, or quad, and the
only constraint should be that the target register type
be appropriate for the precision of the load. A bug in
handle_ldf_stq() unnecessarily restricts no-fault loads
to quad registers, and causes a floating point exception
if one is not given. This restriction is removed.

Orabug: 24942761

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc64: Fixes NUMA node cpulist sysfs file in single NUMA node case.

Forward port 23175351 to UEK4

The sysfs file /sys/devices/system/node/node0/cpulist is incorrect in the
single node case on sun4v machines, as the machine description record in this
case does not contain any NUMA information. Previously, a default list from
0 to NR_CPUS was used. This file is read by utilities such as
'numactl --hardware' and lscpu to show CPU-to-node assignment.

In order to fix this issue, the numa_cpumask_lookup_table is cleared at
bootup. Whenever an extra cpu is brought up via __cpu_up, the corresponding
cpu mask is set in the numa_cpumask_lookup_table.

Orabug: 24500614
Orabug: 22546851

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>

sparc64: Cleans up PRIQ error and debugging messages.

The lowest level arch dependent interrupt routines cannot actually
propagate any error back to the calling driver for irq
request/enable/disable and setting affinity, so PRIQ error messages need to
communicate failures in a more traceable way. The original error messages,
which were more for internal debugging than regular usage, have also been
improved and made controllable via the command line parameter priq=dbg.

Orabug: 24010412

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit 89c31d4dd664cd2edc1f6d14aa62c75acfb0d172)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc: Remove console spam during kdump

Before executing the crash kernel, the panicking kernel cleans up the
irq state of the machine. This code contains a warning when cleaning up
unbound MSIs. Repeating this warning for each one floods the console and
can cause a waiting thread to time out before the other cpus have
completed.

This patch removes the warning and increases the time allowed for all
the cpus to complete the machine_capture_other_strands() function.

Orabug: 23585248

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 30e77b09b134b3ec049b01cfd8754c774da493b9)
(cherry picked from commit 67657418d64c3cee2945370614c38d4516fb0ea1)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: kdump: set crashing_cpu for panic

crashing_cpu was only being set in die_if_kernel() but not when a
crash dump is initiated from panic(). Move the initialization to
machine_crash_shutdown().

Also call bust_spinlocks() from die_if_kernel() to get rid of a warning
in smp_call_function_many(). It's already called in the panic path.

Orabug: 23585248

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 1b588be700fac73edd07c015ff53aecba5d92bec)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc: kexec: Don't mess with the tl register

I meddled with things I didn't fully understand while implementing commit
b43bc8f0 - "sparc64: add missing code for crash_setup_regs()"

I had changed the tl register in order to read tstate, tpc, etc. without
really knowing what I was doing. This can be a disaster if the crashing
thread takes another interrupt. Currently, the crash utility doesn't
even use those values. They are found on the stack instead.

Orabug: 23585248

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 80eb7e28d3c719bbe3af56de5a5a8c68b764dbb9)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: VDS should try indefinitely to allocate IO pages

Orabug: 24924152

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Use block layer BIO-based interface for VDC IO requests

Orabug: 24823012

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Enable virtual disk protocol out of order execution

Orabug: 24815498

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Chris Hyser <Chris.Hyser@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

ipmi: Fix NULL pointer access and double free panic.

Orabug: 24697944

When an ldom or the hardware does not support ldc, the ipmi_si module
sets the smi interface pointer to NULL after ldc channel
detection fails. However, the ipmi_si module then crashes during
unload because a NULL check is missing.

Add the smi interface NULL check and assign the workqueue to
NULL during cleanup to avoid a double free panic.

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: David Aldridge <david.j.aldridge@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit f2546771efb0c6402a5ea65dac9c5dbce18150e6)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

ipmi: Update ipmi driver as per new vldc interface

Orabug: 23748821

Currently, the ipmi driver fakes itself as a userland process
to access the ipmi vldc channel.

This patch uses the new, cleaner vldc kernel interface that was added
for the ipmi driver.

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 7a0d1deac3289130680a5ab1626c609b76c9f053)

ipmi: Fix ipmi driver for ilom reset scenario

Orabug: 24407542

The IPMI driver is left with a stale vldc file pointer if the ILOM resets,
so the driver fails to work after the reset is complete.
The IPMI driver needs to close that file pointer and open another after
the ILOM reset is complete.

This is achieved by trying to open the vldc file every 15 seconds
in process context. As neither vldc nor ldc can detect an ILOM reset,
this is the best possible approach for the problem.

This is based on Rob's patch for the mc reset fix.

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit cf5139791a8241fcab1f59c1da0a9058def661f2)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: vcc fixes

Orabug: 24653154

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Reviewed-By: Bijan Mottahedeh <Bijan.Mottahedeh@oracle.com>
Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit ef9086623414761b55a23d18dfc8565e7d1d7659)

sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()

Synonym created.
Grant succeeded.
kernel BUG at include/asm-generic/pgtable.h:576!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
oracle_8114_cdb(8114): Kernel bad sw trap 5 [#1]
CPU: 120 PID: 8114 Comm: oracle_8114_cdb Not tainted
4.1.12-61.7.1.el6uek.rc1.sparc64 #1
task: fff8400700a24d60 ti: fff8400700bc4000 task.ti: fff8400700bc4000
TSTATE: 0000004411e01607 TPC: 00000000004609f8 TNPC: 00000000004609fc Y:
00000005    Not tainted
TPC: <gup_huge_pmd+0x198/0x1e0>
g0: 000000000001c000 g1: 0000000000ef3954 g2: 0000000000000000 g3:
0000000000000001
g4: fff8400700a24d60 g5: fff8001fa5c10000 g6: fff8400700bc4000 g7:
0000000000000720
o0: 0000000000bc5058 o1: 0000000000000240 o2: 0000000000006000 o3:
0000000000001c00
o4: 0000000000000000 o5: 0000048000080000 sp: fff8400700bc6ab1 ret_pc:
00000000004609f0
RPC: <gup_huge_pmd+0x190/0x1e0>
l0: fff8400700bc74fc l1: 0000000000020000 l2: 0000000000002000 l3:
0000000000000000
l4: fff8001f93250950 l5: 000000000113f800 l6: 0000000000000004 l7:
0000000000000000
i0: fff8400700ca46a0 i1: bd0000085e800453 i2: 000000026a0c4000 i3:
000000026a0c6000
i4: 0000000000000001 i5: fff800070c958de8 i6: fff8400700bc6b61 i7:
0000000000460dd0
I7: <gup_pud_range+0x170/0x1a0>
Call Trace:
[0000000000460dd0] gup_pud_range+0x170/0x1a0
[0000000000460e84] get_user_pages_fast+0x84/0x120
[00000000006f5a18] iov_iter_get_pages+0x98/0x240
[00000000005fa744] do_direct_IO+0xf64/0x1e00
[00000000005fbbc0] __blockdev_direct_IO+0x360/0x15a0
[00000000101f74fc] ext4_ind_direct_IO+0xdc/0x400 [ext4]
[00000000101af690] ext4_ext_direct_IO+0x1d0/0x2c0 [ext4]
[00000000101af86c] ext4_direct_IO+0xec/0x220 [ext4]
[0000000000553bd4] generic_file_read_iter+0x114/0x140
[00000000005bdc2c] __vfs_read+0xac/0x100
[00000000005bf254] vfs_read+0x54/0x100
[00000000005bf368] SyS_pread64+0x68/0x80

Orabug: 24665642

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Initialize xl_hugepage_shift to 0

Currently, this global is incorrectly initialized
to the default hugepage size (HPAGE_SHIFT), which
causes non-8M hugepages to fail to initialize.

Orabug: 24439278

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64:mm/hugetlb: Set correct huge_pte_count index for 8M hugepages

Both set_huge_pte_at(...) and huge_ptep_get_and_clear(...)
call real_hugepage_size_to_pte_count_idx(hugepage_size) when adjusting
huge_pte_count. For 8MB/4MB the huge_pte_count index computed is 1 (one).
This is incorrect because this index is for xl_hugepages. As a result,
the tsb grow code in the mm fault path does not grow the tsb for 8MB/4MB
hugepages.

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Orabug: 24490586
(cherry picked from commit c928d6fccaa59bd4b6cffc904144fa67a4726ff6)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix accounting issues used to size TSBs

Orabug: 24478985

As pages are allocated by a task, counters in the mm and mm_context
structures are used to track these allocations.  These counters are
then used to size the task's TSBs.  This patch addresses issues where
counts are not maintained properly, and TSBs of the incorrect size
are created for the task.

- hugetlb pages are not included in a task's RSS calculations.  However,
  the routine do_sparc64_fault() calculates the size of base TSB block
  by subtracting total size of hugetlb pages from RSS.  Since hugetlb
  size is likely larger than RSS, a negative value is passed as an
  unsigned value to the routine which allocates the TSB block.  The
  'negative unsigned' value appears as a really big value and results in
  a maximum sized base TSB being allocated.  This is the case for almost
  all tasks using hugetlb pages.

  THP pages are also counted in huge_pte_count[MM_PTES_HUGE].  And
  unlike hugetlb pages, THP pages are included in a task's RSS.
  Therefore, hugetlb and THP pages cannot both be counted in
  huge_pte_count[MM_PTES_HUGE].

  Add a new counter thp_pte_count for THP pages, and use this value for
  adjusting RSS to size the base TSB.

- In order to save memory, THP makes use of a huge zero page.  This huge
  zero page does not count against a task's RSS, but it does consume TSB
  entries.  Therefore, count huge zero page entries in
  huge_pte_count[MM_PTES_HUGE].

- Accounting of THP pages is done in the routine set_pmd_at().
  Unfortunately, this does not catch the case where a THP page is split.
  To handle this case, decrement the count in pmdp_invalidate().
  pmdp_invalidate is only called when splitting a THP.  However, 'sanity
  checks' are added in case it is ever called for other purposes.

- huge_pte_count[MM_PTES_HUGE] tracks the number of HPAGE_SIZE (8M) pages
  used by the task.  This value is used to size the TSB for HPAGE_SIZE
  pages.  However, for each HPAGE_SIZE (8M) there are two REAL_HPAGE_SIZE
  (4M) pages.  The TSB contains an entry for each REAL_HPAGE_SIZE page.
  Therefore, the number of REAL_HPAGE_SIZE pages used by the task should
  be used to size the MM_PTES_HUGE TSB.  A new compile time constant
  REAL_HPAGE_PER_HPAGE is used to multiply huge_pte_count[MM_PTES_HUGE]
  before sizing the TSB.
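
Two of the points above come down to simple arithmetic, sketched here in user space. The function names are hypothetical; the 64-bit mask stands in for C unsigned long arithmetic, and the page sizes are the 8K base page and the 8M/4M huge page sizes named above.

```python
MASK64 = (1 << 64) - 1            # models a 64-bit unsigned long
PAGE_SIZE = 8 * 1024              # sparc64 base page
HPAGE_SIZE = 8 * 1024 * 1024
REAL_HPAGE_SIZE = 4 * 1024 * 1024
REAL_HPAGE_PER_HPAGE = HPAGE_SIZE // REAL_HPAGE_SIZE


def base_tsb_pages(rss_pages, huge_pages):
    """RSS minus the base pages covered by `huge_pages` huge pages,
    truncated to 64 bits the way an unsigned subtraction would be."""
    return (rss_pages - huge_pages * (HPAGE_SIZE // PAGE_SIZE)) & MASK64


def huge_tsb_entries(huge_pte_count):
    """TSB entries needed: one per REAL_HPAGE_SIZE page."""
    return huge_pte_count * REAL_HPAGE_PER_HPAGE


# hugetlb pages are not in RSS, so subtracting them wraps around:
print(base_tsb_pages(100, 1) > (1 << 63))   # True: a "really big value"
# THP pages are in RSS, so subtracting thp_pte_count stays sane:
print(base_tsb_pages(2048, 1))              # 1024
print(huge_tsb_entries(3))                  # 6
```

The first call shows why almost every hugetlb task ended up with a maximum sized base TSB before the fix, and the last shows the factor of two applied when sizing the huge page TSB.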

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Tested-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
(cherry picked from commit 417fc85e759b6d4c4602fbdbdd5375ec5ddf2cb0)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix irq stack bootmem allocation.

Currently, irq stack bootmem is allocated for all possible cpus
before the nr_cpus value changes the list of possible cpus. As a result,
boot memory is wasted unnecessarily.

Move the irq stack bootmem allocation so that it happens after
the possible cpu list is modified based on the nr_cpus value.

Orabug: 23050718

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
(cherry picked from commit d192d0743aa6dbca0991900c490c06111b4bd86c)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix cpu_possible_mask if nr_cpus is set

Orabug: 23297558

If the kernel boot parameter nr_cpus is set, it should define the number
of CPUs that can ever be available in the system, i.e.
cpu_possible_mask. setup_nr_cpu_ids() overrides nr_cpu_ids based
on the cpu_possible_mask during kernel initialization. If
cpu_possible_mask is not set based on the nr_cpus value, earlier parts
of the kernel will have been initialized using the nr_cpus value, leading
to a kernel crash.

Set cpu_possible_mask based on nr_cpus value. Thus setup_nr_cpu_ids()
becomes redundant and does not corrupt nr_cpu_ids value.
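
A small model of the interaction, with CPU masks represented as Python sets. fill_possible_map() is a hypothetical helper, and setup_nr_cpu_ids() here only mimics the "highest possible cpu + 1" behaviour described above.

```python
def fill_possible_map(detected_cpus, nr_cpus):
    """Build the possible mask, capped by the nr_cpus boot parameter."""
    return set(range(min(detected_cpus, nr_cpus)))


def setup_nr_cpu_ids(possible_mask):
    """Derive nr_cpu_ids from the possible mask: highest cpu id + 1."""
    return max(possible_mask) + 1


# Without the cap, nr_cpu_ids is overridden to the detected CPU count,
# although earlier code was already sized for nr_cpus=64.
uncapped = set(range(256))
print(setup_nr_cpu_ids(uncapped))   # 256

# With the cap, recomputing nr_cpu_ids yields the same value again.
capped = fill_possible_map(256, 64)
print(setup_nr_cpu_ids(capped))     # 64
```

Once the mask is capped, the later recomputation is harmless because it can only reproduce the value that was already in use.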

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
(cherry picked from commit f539e5b332d8d969301bc43f076d905569c2b12c)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix PMD check during page table walk

Currently, the check for PMD_HUGE during the page table
walk uses an incorrect instruction sequence:

be,pt %xcc, 700f;
andcc REG1, REG2, %g0;

This sequence is incorrect since branch decision is
made *before* 'andcc' in the delay slot is executed.
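
The ordering problem can be modelled with the zero flag alone. Both functions are illustrative; the second shows one ordering that avoids the problem and is not taken from the patch itself.

```python
def be_with_andcc_in_delay_slot(z_before, reg1, reg2):
    """Model 'be' followed by 'andcc REG1, REG2, %g0' in its delay slot.

    Returns (branch_taken, z_after).
    """
    branch_taken = z_before           # 'be' tests the Z flag as it is now
    z_after = (reg1 & reg2) == 0      # the delay slot executes afterwards
    return branch_taken, z_after


def andcc_then_be(reg1, reg2):
    """Set the condition codes first, then branch on them."""
    z = (reg1 & reg2) == 0
    return z                          # branch taken iff the AND was zero
```

For `reg1 = 0b01`, `reg2 = 0b10` and a stale Z flag of False, the first form does not branch even though the AND is zero, while the second form does.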

Orabug: 24353511

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

vldc driver: provide kernel driver interfaces

Orabug: 24601126

Forward port 22804422 to UEK4-QU3 - VLDC driver should expose
services...

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix sentinel page table entry for 16G

Currently no page table trimming is done for 16G pages
so _PAGE_PMD_HUGE must not be set for 16G. Also, for
this size, trimming would be done at PUD level, so
this flag should not be set anyway.

Orabug: 24353511

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Trim page tables for 2G pages

Currently mapping a 2G page requires 256*1024 PTE entries.
This results in large amounts of RAM being used just for
storing page tables. We now use 256 PMD entries to map a
2G page, which is much more space efficient.
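
The entry counts follow directly from the page sizes. The sketch below assumes the 8K base page, the 8M range covered by one PMD entry, and 8-byte table entries; the entry size is an assumption made only to estimate the memory used.

```python
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30
PAGE_SIZE = 8 * KB     # range mapped by one PTE
PMD_SIZE = 8 * MB      # range mapped by one PMD entry
ENTRY_BYTES = 8        # assumed size of one table entry


def entries_needed(mapping_size, granule):
    """Number of table entries needed to map `mapping_size` bytes."""
    return mapping_size // granule


ptes = entries_needed(2 * GB, PAGE_SIZE)
pmds = entries_needed(2 * GB, PMD_SIZE)
print(ptes, ptes * ENTRY_BYTES // KB, "KB")   # 262144 2048 KB
print(pmds, pmds * ENTRY_BYTES // KB, "KB")   # 256 2 KB
```

Under these assumptions a single 2G page goes from about 2 MB of PTEs to 2 KB of PMD entries.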

Orabug: 23109070

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
(cherry picked from commit d3c88b8f27645c14cbb220570e5945abb0989d19)
(cherry picked from commit 768096d7916fefc497f397b0675455a754ee8a5b)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Trim page tables at PMD for hugepages

For PMD aligned (8M) hugepages, we currently allocate
all four page table levels, which is wasteful. We now
allocate only down to the PMD level, which reduces the
memory used by page tables.

Orabug: 22630259

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
(cherry picked from commit 5d2c7930a4d3bf3ca560048052d638d7efa67e36)
(cherry picked from commit abefebd73e204979661a818ac31cf455d110a672)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

vcc driver fixes

Orabug:24319080 - hang on a mutex out of vcc_open()
Orabug:24326005 - UEK4 kernel panic tty_ldisc_flush vcc_close

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Reviewed-By: Bijan Mottahedeh <Bijan.Mottahedeh@oracle.com>
Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

LDOMS DOMAIN SERVICES UPDATE 5

Orabug: 24601099

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>