Babu Moger [Thu, 13 Oct 2016 17:36:48 +0000 (10:36 -0700)]
sparc: Implement watchdog_nmi_enable and watchdog_nmi_disable
Implement functions watchdog_nmi_enable and watchdog_nmi_disable
to enable/disable nmi watchdogs. Sparc uses arch specific nmi watchdog
handler. Currently, we do not have a way to enable/disable nmi watchdog
dynamically. With these patches we can enable or disable arch
specific nmi watchdogs using proc or sysctl interface.
Example commands.
To enable: echo 1 > /proc/sys/kernel/nmi_watchdog
To disable: echo 0 > /proc/sys/kernel/nmi_watchdog
It can also achieved using the sysctl parameter kernel.nmi_watchdog
Signed-off-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 43e96774e0a338e883e9ced9e717424df126b153) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Atish Patra [Thu, 20 Oct 2016 00:33:29 +0000 (18:33 -0600)]
sparc64: Setup a scheduling domain for highest level cache.
Individual scheduler domain should consist different hierarchy
consisting of cores sharing similar property. Currently, no
scheduler domain is defined separately for the cores that shares
the last level cache. As a result, the scheduler fails to take
advantage of cache locality while migrating tasks during load
balancing.
Here are the cpu masks currently present for sparc that are/can
be used in scheduler domain construction.
cpu_core_map : set based on the cores that shares l1 cache.
core_core_sib_map : is set based on the socket id or max cache id.
The prior SPARC notion of socket was defined as highest level of
shared cache. However, the MD record on T7 platforms now describes
the CPUs that share the physical socket and this is no longer tied
to shared cache.
That's why a separate cpu mask needs to be created that truly
represent highest level of shared cache for all platforms.
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Chris Hyser <chris.hyser@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1e655ca52bb2727471f20cf4d8f62b4b9f69e6fc) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Allen Pais [Mon, 30 May 2016 07:42:15 +0000 (13:12 +0530)]
SPARC64: PORT LDMVSW DRIVER TO UEK4
Port of the new ldmvsw (Ldoms Virtual Switch) driver to UEK4.
This code has already been submitted and accepted
into the mainline Linux kernel.
The ldmvsw is very similar in function to the existing sunvnet driver. The
sunvnet driver is therefore split to put the code common to both drivers
into the kernel for use by both drivers when loaded (see sunvnet_common.c/h).
Signed-off-by: Aaron Young <aaron.young@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 361afffe35368dc23d2c9df6d7797ccf9af8fe57)
Rob Gardner [Sun, 1 Nov 2015 23:51:34 +0000 (16:51 -0700)]
SPARC64: Fix bad FP register calculation
An additional problem was found in handle_ldf_stq
after adding the fix for the SIGFPE on no-fault
load. The calculation for freg is incorrect when
a single precision load is being handled. This
causes %f1 to be seen as %f32 etc, and the incorrect
register ends up being overwritten. This code
sequence demonstrates the problem:
ldd [%g1], %f32 ! g1 = valid address
lda [%i3] ASI_PNF, %f1 ! i3 = invalid address
std %f32, [%g1] ! %f32 is mangled
This is corrected by basing the freg calculation on
the load size.
Rob Gardner [Fri, 30 Oct 2015 19:18:00 +0000 (13:18 -0600)]
SPARC64: Respect no-fault ASI for floating exceptions
Floating point load instructions using ASI_PNF or other
no-fault ASIs should never cause a SIGFPE. A store-quad
instruction should naturally fault if a non-quad register
is given, but this constraint should not apply to loads,
which may be single precision, double, or quad, and the
only constraint should be that the target register type
be appropriate for the precision of the load. A bug in
handle_ldf_stq() unnecessarily restricts no-fault loads
to quad registers, and causes a floating point exception
if one is not given. This restriction is removed.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
The sysfs file /sys/devices/system/node/node0/cpulist is incorrect in the
single node case on sun4v machines as the machine description record in this
case does not contain any NUMA information. A default list from 0 to NR_CPUS
was used prior. This file is read by utilities such as 'numactl --hardware'
and lscpu to show CPU-to-node assignment.
In order to fix this issue, the numa_cpumask_lookup_table is cleared at
bootup. Whenever an extra cpu is bringup via __cpu_up, the corresponding
cpu mask is set in the numa_cpumask_lookup_table.
chris hyser [Fri, 23 Sep 2016 16:27:07 +0000 (09:27 -0700)]
sparc64: Cleans up PRIQ error and debugging messages.
Given that the lowest level arch dependent interrupt routines cannot actually
propagate any error back to the calling driver in the case of irq
request/enable/disable and setting affinity, PRIQ error messages need to
communicate failures in a more traceable way. The original error messages which
were more for internal debugging than regular usage have also been improved as
well as made controllable via a command line parameter priq=dbg.
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit 89c31d4dd664cd2edc1f6d14aa62c75acfb0d172) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Kleikamp [Fri, 17 Jun 2016 14:51:04 +0000 (09:51 -0500)]
sparc: Remove console spam during kdump
Before executing the crash kernel, the panicking kernel cleans up the
irq state of the machine. This code contains a warning when cleaning up
unbound MSIs. Repeating this warning for each one floods the console and
can cause a waiting thread to time out before the other cpus have
completed.
This patch removes the warning and increases the time allowed for all
the cpus to complete the machine_capture_other_strands() function.
Dave Kleikamp [Mon, 27 Jun 2016 16:30:17 +0000 (11:30 -0500)]
sparc64: kdump: set crashing_cpu for panic
crashing_cpu was only being set in die_if_kernel() but not when a
crash dump is initiated from panic(). Move the initialization to
machine_crash_shutdown().
Also call bust_spinlocks() from die_if_kernel() to get rid of a warning
in smp_call_function_many(). It's already called in the panic path.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 1b588be700fac73edd07c015ff53aecba5d92bec) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Kleikamp [Thu, 23 Jun 2016 19:04:30 +0000 (14:04 -0500)]
sparc: kexec: Don't mess with the tl register
I meddled with things I didn't fully understand while implementing commit b43bc8f0 - "sparc64: add missing code for crash_setup_regs()"
I had changed the tl register in order to read tstate, tpc, etc. without
really knowing what I was doing. This can be a disaster if the crashing
thread takes another interrupt. Currently, the crash utitility doesn't
even use those values. They are found on the stack instead.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 80eb7e28d3c719bbe3af56de5a5a8c68b764dbb9) Signed-off-by: Allen Pais <allen.pais@oracle.com>
In case of a ldom/hardware not supporting ldc, ipmi_si module
will set the smi interface pointer to NULL after ldc channel
detection failure. However, ipmi_si module will crash during
unload due to absence of NULL check.
Add the smi interface null check and assign the workqueue to
NULL during cleanup to avoid double free panic.
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: David Aldridge <david.j.aldridge@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit f2546771efb0c6402a5ea65dac9c5dbce18150e6) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Currently, ipmi driver fakes it self as a userland process
to access ipmi vldc channel.
This patch uses new cleaner vldc kernel interface that is added
for ipmi driver.
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 7a0d1deac3289130680a5ab1626c609b76c9f053)
IPMI driver will have a stale vldc file pointer if ILOM resets.
Thus, IPMI drivers failed to work after the reset is complete.
IPMI driver need to close that file pointer and open another after
ilom reset is complete.
This is achieved by trying to open vldc file in every 15 seconds
in a process context. As vldc or ldc can not detect a ILOM reset,
this is the best possible approach for the problem.
This is based on Rob's patch for mc reset fix. Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit cf5139791a8241fcab1f59c1da0a9058def661f2) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Vijay Kumar [Sun, 2 Oct 2016 21:40:18 +0000 (15:40 -0600)]
sparc64:mm/hugetlb: Set correct huge_pte_count index for 8M hugepages
Both set_huge_pte_at(...) and huge_ptep_get_and_clear(...)
call real_hugepage_size_to_pte_count_idx(hugepage_size) when adjusting
huge_pte_count. For 8MB/4MB the huge_pte_count index computed is 1(one).
This is incorrect because this index is for xl_hugepages. So the tsb
grow code in the mm fault path does not grow the tsb for 8MB/4MB
hugepages.
Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Orabug: 24490586
(cherry picked from commit c928d6fccaa59bd4b6cffc904144fa67a4726ff6) Signed-off-by: Allen Pais <allen.pais@oracle.com>
As pages are allocated by a task, counters in the mm and mm_context
structures are used to track these allocations. These counters are
then used to size the task's TSBs. This patch addresses issues where
counts are not maintained properly, and TSBs of the incorrect size
are created for the task.
- hugetlb pages are not included in a task's RSS calculations. However,
the routine do_sparc64_fault() calculates the size of base TSB block
by subtracting total size of hugetlb pages from RSS. Since hugetlb
size is likely larger than RSS, a negative value is passed as an
unsigned value to the routine which allocates the TSB block. The
'negative unsigned' value appears as a really big value and results in
a maximum sized base TSB being allocated. This is the case for almost
all tasks using hugetlb pages.
THP pages are also counted in huge_pte_count[MM_PTES_HUGE]. And
unlike hugetlb pages, THP pages are included in a task's RSS.
Therefore, both hugetlb and THP can not be counted for in
huge_pte_count[MM_PTES_HUGE].
Add a new counter thp_pte_count for THP pages, and use this value for
adjusting RSS to size the base TSB.
- In order to save memory, THP makes use of a huge zero page. This huge
zero page does not count against a task's RSS, but it does consume TSB
entries. Therefore, count huge zero page entries in
huge_pte_count[MM_PTES_HUGE].
- Accounting of THP pages is done in the routine set_pmd_at().
Unfortunately, this does not catch the case where a THP page is split.
To handle this case, decrement the count in pmdp_invalidate().
pmdp_invalidate is only called when splitting a THP. However, 'sanity
checks' are added in case it is ever called for other purposes.
- huge_pte_count[MM_PTES_HUGE] tracks the number of HPAGE_SIZE (8M) pages
used by the task. This value is used to size the TSB for HPAGE_SIZE
pages. However, for each HPAGE_SIZE (8M) there are two REAL_HPAGE_SIZE
(4M) pages. The TSB contains an entry for each REAL_HPAGE_SIZE page.
Therefore, the number of REAL_HPAGE_SIZE pages used by the task should
be used to size the MM_PTES_HUGE TSB. A new compile time constant
REAL_HPAGE_PER_HPAGE is used to multiply huge_pte_count[MM_PTES_HUGE]
before sizing the TSB.
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Tested-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
(cherry picked from commit 417fc85e759b6d4c4602fbdbdd5375ec5ddf2cb0) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Currently, irq stack bootmem is allocated for all possible cpus
before nr_cpus value changes the list of possible cpus. As a result,
there is unnecessary wastage of bootmemory.
Move the irq stack bootmem allocation so that it happens after
possible cpu list is modified based on nr_cpus value.
If kernel boot parameter nr_cpus is set, it should define the number
of CPUs that can ever be available in the system i.e.
cpu_possible_mask. setup_nr_cpu_ids() overrides the nr_cpu_ids based
on the cpu_possible_mask during kernel initialization. If
cpu_possible_mask is not set based on the nr_cpus value, earlier part
of the kernel would be initialized using nr_cpus value leading to a
kernel crash.
Set cpu_possible_mask based on nr_cpus value. Thus setup_nr_cpu_ids()
becomes redundant and does not corrupt nr_cpu_ids value.
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
(cherry picked from commit f539e5b332d8d969301bc43f076d905569c2b12c) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Nitin Gupta [Thu, 25 Aug 2016 18:33:27 +0000 (11:33 -0700)]
sparc64: Fix sentinel page table entry for 16G
Currently no page table trimming is done for 16G pages
so _PAGE_PMD_HUGE must not be set for 16G. Also, for
this size, trimming would be done at PUD level, so
this flag should not be set anyways.
Nitin Gupta [Thu, 2 Jun 2016 22:14:42 +0000 (15:14 -0700)]
sparc64: Trim page tables for 2G pages
Currently mapping a 2G page requires 256*1024 PTE entries.
This results in large amounts of RAM to be used just for
storing page tables. We now use 256 PMD entries to map a
2G page which is much more space efficient.
Nitin Gupta [Fri, 27 May 2016 21:58:13 +0000 (14:58 -0700)]
sparc64: Trim page tables at PMD for hugepages
For PMD aligned (8M) hugepages, we currently allocate
all four page table levels which is wasteful. We now
allocate till PMD level only which saves memory usage
from page tables.
Signed-off-by: Larry Bassel <larry.bassel@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
All signal frames must be at least 16-byte aligned, because that is
the alignment we explicitly create when we build signal return stack
frames.
All stack pointers must be at least 8-byte aligned.
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
arch/sparc/kernel/signal32.c - modified patch context so that it would apply
Signed-off-by: Larry Bassel <larry.bassel@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Number of context IDs supported by the hardware
is reported via machine descriptor for sun4v
systems. For systems > T3, 16 bits are used
to represent context ID in the HW. For these
systems the context ID wrap around happens if
there are more that 65536 processes running
simultaneously. For systems older than that
13 bits are used and the context ID wraps around
if there are 8192 processes running simultaneously.
Reviewed-by: Babu Moger <babu.moger@oracle.com> Acked-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
David S. Miller [Sun, 29 May 2016 03:41:12 +0000 (20:41 -0700)]
sparc64: Fix return from trap window fill crashes.
We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.
Otherwise we can get an OOPS that looks like this:
The window trap handlers are slightly clever, the trap table entries for them are
composed of two pieces of code. First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.
And the way this works is that if any of those memory accesses
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took. In this way, the fault handler
doesn't have to know if it was a spill or a fill that it's handling
the fault for. It just always branches to the last instruction in
the parent trap's handler.
All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.
On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons. The largest being that from Niagara and onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).
This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary. Now if you look at them, we'll see at the end:
And oops, all three cases are handled like a fault.
This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.
So in the case where the stack was unaligned, the unaligned trap
handler sets up the arg registers one way, and then we branched to
the fault handler which expects them setup another way.
So the FAULT_TYPE_* value ends up basically being garbage, and
randomly would generate the backtrace seen above.
David S. Miller [Wed, 25 May 2016 19:51:20 +0000 (12:51 -0700)]
sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
On cheetahplus chips we take the ctx_alloc_lock in order to
modify the TLB lookup parameters for the indexed TLBs, which
are stored in the context register.
This is called with interrupts disabled, however ctx_alloc_lock
is an IRQ safe lock, therefore we must take acquire/release it
properly with spin_{lock,unlock}_irq().
Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reported-by: Ilya Malakhov <ilmalakhovthefirst@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Allen Pais <allen.pais@oracle.com>
max_active determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq. For example,
with @max_active of 16, at most 16 work items of the wq can be
executing at the same time per CPU.
Currently, for a bound wq, the maximum limit for @max_active is 512
and the default value used when 0 is specified is 256. For an unbound
wq, the limit is higher of 512 and 4 * num_possible_cpus(). These
values are chosen sufficiently high such that they are not the
limiting factor while providing protection in runaway cases.
The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
may queue at the same time. Unless there is a specific need for
throttling the number of active work items, specifying '0' is
recommended.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Reviewed-by: Liam Merwick <Liam.Merwick@oracle.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
(cherry picked from commit b584786e611e8e8a28830386e8b3db8874d794c5)
(cherry picked from commit f2559a96b70562267f01d5bb62ef44aa9f0c0cd8) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Nitin Gupta [Thu, 26 May 2016 21:56:19 +0000 (14:56 -0700)]
sparc64: Reduce TLB flushes during hugepte changes
During hugepage map/unmap, TSB and TLB flushes are currently
issued at every PAGE_SIZE'd boundary which is unnecessary.
We now issue the flush at REAL_HPAGE_SIZE boundaries only.
Without this patch workloads which unmap a large hugepage
backed VMA region get CPU lockups due to excessive TLB
flush calls.
Dwight Engen [Fri, 16 Jan 2015 22:19:39 +0000 (17:19 -0500)]
sunvdc: don't dereference port->disk before disk probe finishes
If the backing file for a vdisk is not present in the service domain an
ldc reset can occur during the initial port/disk probing. The ldc reset
logic was dereferencing port->disk, which may not have been setup yet.
Guard against this case.
chris hyser [Tue, 17 May 2016 20:05:35 +0000 (13:05 -0700)]
sparc64: This patch adds PRIQ support.
This patch supports INT_A through INT_D interrupts as described
by the Open Firmware device tree as well as MSI vectors registered
by PCIe drivers. pci=nomsi may not work though frankly that makes no
sense on a SPARC machine.
The command line parameter priq=off reverts to prior MSIEQ interrupt
mechanism.
chris hyser [Thu, 19 May 2016 20:05:47 +0000 (13:05 -0700)]
sparc64: Enable aggressive setting of PCIe MPS settings
This patch connects SPARC PCIe into the generic PCIe framework enabling
MPS and MRRS to be set aggressively subject to the standard command line
flags. To enable put "pci=pcie_bus_perf" on command line.
chris hyser [Tue, 17 May 2016 16:39:25 +0000 (09:39 -0700)]
sparc64: Allow redirection of MSI/MSI-X IRQs
Allows redirection of MSI/MSI-X IRQs by finding appropriate MSIEQ and
re-routing its IRQ. Also handles driver IRQs sharing the same MSIEQ.
Affinity masks for all such shared interrupts as well as MSIQ IRQ
are modified. Note, based on the HW sharing this patch can change
related driver IRQs in an invisible manner. While confusing and not
desirable, this is an artifact of the HW design.
Rob Gardner [Tue, 9 Feb 2016 22:38:05 +0000 (15:38 -0700)]
IPMI: Driver for Sparc T4/T5/T7 Platforms
Functional IPMI interface driver for Sparc T4/T5/T7. This will
probably also work for other platforms that use an iLOM channel
for IPMI services, including older and future ones, though these
have not been tested.
This driver provides the transport between the IPMI message layer
and the Sparc platform IPMI endpoint in iLOM. The Virtual Logical
Domain Channel (VLDC) driver claims the host endpoint, and we call
it to move data to/from iLOM. So there is an unusual dependency
on another loadable module which requires several compromises
until we work out a plan to restructure the VLDC driver to provide
a cleaner interface:
* An artificial symbolic dependency on vldc is created so that
"modprobe ipmi_si" will ensure that vldc is loaded also.
* ipmi_vldc uses filp_open/kernel_read/kernel_write on device
files provided by vldc, ie, /sys/class/vldc/ipmi/mode and
/dev/vldc/ipmi.
Bug 22804422 has been created to deal with these issues.
Sending this driver upstream is on hold until we work out these
issues. Also, the vldc driver itself has not yet been sent upstream
and that is obviously a prerequisite.
Commit: 5075a47f3765e778b45367ba4873c1bd08b21d0e
fix-up code base for v4.1.12-46 merge
should not have removed "#include <linux/hugetlb.h>"
Add it back in after applying adfc71b605:
fix-up - add back include of linux/dtrace_os.h
so that it will merge with master.
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Commit: bd52d0fd57c96146f8d1838588753ab9dabcd2fe:
sparc64: Log warning for invalid hugepages boot param
removes "#include <linux/dtrace_os.h>" from arch/sparc/mm/fault_64.c.
That header file is needed by dtrace. Add it back in.
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Aaron Young [Tue, 1 Mar 2016 15:47:02 +0000 (07:47 -0800)]
SPARC64: UEK4 LDOMS DOMAIN SERVICES UPDATE 3
This update provides the following fix for LDom domain services on UEK4.
1. Add an event to the vlds driver which is used to signal
process(es) using libds that the vlds /dev devices have been updated.
When it receives this event, libds will refresh/update it's internal
list of vlds devices allowing the list to stay immediately up-to-date
when vlds devices have changed. This event fixes some DR related libds
problems found during regression testing due to libds internal vlds
device list becoming stale.
Signed-off-by: Aaron Young <aaron.young@oracle.com> Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Orabug: 22853109
Alexandre Chartre [Thu, 17 Mar 2016 10:31:24 +0000 (03:31 -0700)]
Interface to mark SR-IOV device ready for use by LDoms guest
Add a iov_ready file to all PCI devices (/sys/bus/pci/devices/*/iov_ready).
The iov_ready file is write only, and mapped to the pci_iov_dev_ready
hypervisor call, which is used to indicate that a PCI device is ready
or no longer ready to be shared with other domains
Write "1" to the file to indicate that the PCI device is ready.
For example:
Vijay Kumar [Wed, 9 Mar 2016 19:48:38 +0000 (11:48 -0800)]
sparc64: Log warning for invalid hugepages boot param
When an invalid hugepage param is mentioned in kernel boot param,
appropriate warning should be logged to indicate if it's not
a) software supported
b) MMU support for xl_hugepagesz
c) xl_hugepagesz not in use
Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Acked-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Orabug: 22729791 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Note: Resending this patch. There is no change in this patch since v1.
Jalap?no was verified repaired.
Now to find performance issues.
One performance issue is subordinate page table state (SPTS). The SPTS will
be tricky because of protection changes for COW and other. For example,
a 2Gb hugepage will have 1UL << (31-23) PMD entries. Do we want 256 IPI-s
for a hugepage TTE(pte) change?
chris hyser [Fri, 1 Apr 2016 19:44:22 +0000 (12:44 -0700)]
sparc64: Fix I/O NUMA parsing and sysfs display code.
I/O NUMA node parsing has been broke since T5 and did not work on
T7. The code also did not correctly handle PCIe root complexes
crossbar connected to multiple memory/cpu NUMA nodes. Additionally,
the numa_node attributes displayed in sysfs were incorrect.
Example: T7-4 showing round-robin spread of multiply connected root
complexes.
[ 3723.288247] /pci@305: On NUMA node 0
[ 3723.363398] /pci@304: On NUMA node 2
[ 3723.437486] /pci@307: On NUMA node 0
[ 3723.510510] /pci@306: On NUMA node 2
[ 3723.582582] /pci@313: On NUMA node 0
[ 3723.655276] /pci@308: On NUMA node 2
[ 3723.728077] /pci@302: On NUMA node 0
[ 3723.800774] /pci@30a: On NUMA node 2
[ 3723.874895] /pci@309: On NUMA node 0
[ 3723.947089] /pci@301: On NUMA node 2
[ 3724.020218] /pci@30b: On NUMA node 1
[ 3724.092902] /pci@300: On NUMA node 3
[ 3724.167630] /pci@303: On NUMA node 1
[ 3724.240287] /pci@30c: On NUMA node 3
[ 3724.312245] /pci@312: On NUMA node 1
[ 3724.384857] /pci@30e: On NUMA node 3
[ 3724.457482] /pci@30d: On NUMA node 1
[ 3724.531679] /pci@310: On NUMA node 3
[ 3724.603621] /pci@30f: On NUMA node 1
[ 3724.675695] /pci@311: On NUMA node 3
chris hyser [Thu, 7 Apr 2016 20:55:56 +0000 (13:55 -0700)]
sparc64: Set up core sibling list correctly for T7.
The important definition of core sibling is that some level of cache is shared.
The prior SPARC notion of socket was defined as highest level of shared cache.
On T7 platforms, the MD record now describes the CPUs that share the physical
socket and this is no longer tied to shared cache. This patch correctly
separates these two concepts.
chris hyser [Thu, 7 Apr 2016 19:12:05 +0000 (12:12 -0700)]
sparc64: Fix CPU package information in /sys
CPU package information in
/sys/bus/cpu/devices/cpu*/topology/physical_package_id
is inconisistent with the use by tools such as irqbalance. This patch
uses the socket ID to be consistent and useful.
chris hyser [Thu, 7 Apr 2016 19:32:48 +0000 (12:32 -0700)]
sparc64: Add 3rd level cache info to /sys
This patch pulls line size and cache size info from the machine description and
adds l3 caches files to /sys/bus/cpu/devices/cpu* directories. It also
structures the information in the same directory hierachy as x86 so that user
programs like irqbalance can find the needed information to work correctly.
Rob Gardner [Sun, 27 Mar 2016 22:39:13 +0000 (16:39 -0600)]
sparc64: Add lightweight syscall mechanism for lwp_info
This patch introduces a new "light weight" system call
mechanism which has the ability to retrieve small bits
of information and/or perform minor computations without
the need for a full blown save/switch/restore context.
Solaris provides _lwp_info(), which returns basically the
same information as getrusage(RUSAGE_THREAD) but much faster.
This is used extensively by the database code, and returns
the utime and stime for the calling thread.
(This patch also provides a fast getcpu function just as
a demonstration of how additional calls might be added.
Unlike x86, there is no unprivileged instruction to do this,
and so it is a fairly expensive system call.)
Allen Pais [Tue, 29 Mar 2016 08:50:33 +0000 (14:20 +0530)]
sparc64:piggback program generates a.out header with incorrect section sizes
piggyback in uek for SPARC generates an a.out that has section sizes that are
too large. This causes problems when booting with OpenBoot because OpenBoot
uses those sizes to map and copy the image to its specified VA and runs into
unmapped memory during the copies.
Signed-off-by: Jose Marchesi <jose.marchesi@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit bd99ee7ceffb1a472ccd8841dd7011d15e7fa258)
wim.coekaerts@oracle.com [Fri, 29 Jan 2016 17:39:38 +0000 (09:39 -0800)]
Add sun4v_wdt watchdog driver
This driver adds sparc hypervisor watchdog support. The default
timeout is 60 seconds and the range is between 1 and 31536000 seconds. Both watchdog-resolution and
watchdog-max-timeout MD properties settings are supported.
Signed-off-by: Wim Coekaerts <wim.coekaerts@oracle.com> Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit eccc96426978c0fa963f8712077ecb6247f0e57e)
SR-IOV code looks for arch specific data while enabling
VFs. When VF device is added, driver probe function makes set
of calls to initialize the pci device. Because the VF device is
added different way than the normal PF device(which happens via
of_create_pci_dev for sparc), some of the arch specific initialization
does not happen for VF device. That causes panic when archdata is
accessed.
To fix this, I have used already defined weak function
pcibios_setup_device to copy archdata from PF to VF.
Also verified the fix.
Signed-off-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit be81c7e3cc48d3ff8b26021be3fd49e997743cbc)
chris hyser [Thu, 4 Feb 2016 21:14:43 +0000 (13:14 -0800)]
sparc64: enable "relaxed ordering" in IOMMU mappings
Enable relaxed ordering for memory writes in IOMMU TSB entry from
dma_4v_map_page() and dma_4v_map_sg() when dma_attrs
DMA_ATTR_WEAK_ORDERING is set. This requires vPCI version 2.0 API.
chris hyser [Thu, 4 Feb 2016 20:07:03 +0000 (12:07 -0800)]
sparc64: Enable PCI IOMMU version 2 API
Enable Version 2 of the PCI IOMMU API needed for advanced features
such as PCI Relaxed Ordering and greater than 2 GB DMA address
space per root complex.
Sowmini Varadhan [Tue, 2 Feb 2016 18:41:56 +0000 (10:41 -0800)]
sunvnet: perf tracepoint invocations to trace LDC state machine
Use sunvnet perf trace macros to monitor LDC message exchange state.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5fa4282fdb6d30937abcf1b1a9d367aaf472178a)
Sowmini Varadhan [Tue, 2 Feb 2016 18:41:55 +0000 (10:41 -0800)]
sunvnet: Add support for perf LDC event tracing
Add perf event macros for support of tracing and instrumentation
of LDC state machine
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 61cf74d322a9d8ef172251e32c3008cf60964b70)
- Disable cpu timer only for hot-remove and not for hot-add
- Update interrupt affinities before interrupt redistribution
- Default to simple round-robin interrupt redistribution for ldoms
Tushar Dave [Thu, 7 Jan 2016 23:24:26 +0000 (15:24 -0800)]
sparc64: bypass iommu to use 64bit address space
This patch is internal only not for UPSTREAM. This is a temporary
workaround based on UEK2 commit c1a12ed1d125
("sparc64: enable iommu bypass workaround for IB. This is temporary.")
Current design of sparc iommu is based on iommu V1 APIs which at max
can have 2G/8K DMA addresses. Due to this, kernel entity (e.g. i40e,
PSIF) requesting more than 2G/8K DMA addresses does not work at all.
This patch adds temporary workaround to remedy this issue by bypassing
iommu.
When 64bit iommu implementation is complete, this workaround will be
reverted.
Dave Kleikamp [Thu, 4 Feb 2016 16:43:48 +0000 (10:43 -0600)]
sparc64: call crash_kexec() directly from die_if_kernel()
A direct call to crash_kexec() here allows the crashing register state
to be saved to the PT_NOTE. When called from panic(), a new register
state is created which is less useful.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Occasionally, the crash kernel will fail to configure a virtual disk
because the hypervisor leaves an old request in the rx queue even after
it is reconfigured in ldc_bind(). Fix this with a call to ldc_rx_reset().
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Dave Kleikamp [Fri, 20 Mar 2015 21:38:42 +0000 (16:38 -0500)]
sparc64: restore prom_cif_stack
Commit ef3e035c stopped using the firmware stack and thus stopped saving
it's location in p1275buf. However, kexec wants to be using the firmware
stack when launching the new kernel.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>