www.infradead.org Git - users/jedix/linux-maple.git/log

sparc64: Fix cpu_possible_mask if nr_cpus is set

Orabug: 23297558

If the kernel boot parameter nr_cpus is set, it should define the number
of CPUs that can ever be available in the system, i.e.
cpu_possible_mask. setup_nr_cpu_ids() overrides nr_cpu_ids based
on cpu_possible_mask during kernel initialization. If
cpu_possible_mask is not set based on the nr_cpus value, earlier parts
of the kernel are initialized using the nr_cpus value while nr_cpu_ids
is later derived from the larger mask, and this mismatch leads to a
kernel crash.

Set cpu_possible_mask based on the nr_cpus value. setup_nr_cpu_ids()
then becomes redundant and no longer corrupts the nr_cpu_ids value.
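A minimal userspace model of the idea, assuming a 64-bit word stands in for cpu_possible_mask (bit N = CPU N). `limit_possible_mask()` and `model_nr_cpu_ids()` are hypothetical names; the second mirrors how setup_nr_cpu_ids() derives nr_cpu_ids as the highest possible CPU plus one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: keep only the first nr_cpus CPUs present in the mask. */
static uint64_t limit_possible_mask(uint64_t present, unsigned int nr_cpus)
{
	uint64_t possible = 0;
	unsigned int cpu, count = 0;

	assert(nr_cpus > 0);
	for (cpu = 0; cpu < 64 && count < nr_cpus; cpu++) {
		if (present & (UINT64_C(1) << cpu)) {
			possible |= UINT64_C(1) << cpu;
			count++;
		}
	}
	return possible;
}

/* Model of what setup_nr_cpu_ids() computes: last possible CPU + 1. */
static unsigned int model_nr_cpu_ids(uint64_t possible)
{
	unsigned int cpu, ids = 0;

	for (cpu = 0; cpu < 64; cpu++)
		if (possible & (UINT64_C(1) << cpu))
			ids = cpu + 1;
	return ids;
}
```

With 8 present CPUs and nr_cpus=4, the trimmed mask yields nr_cpu_ids = 4; the untrimmed mask would yield 8 and disagree with anything already sized from nr_cpus.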

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
(cherry picked from commit f539e5b332d8d969301bc43f076d905569c2b12c)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix PMD check during page table walk

Currently, the check for PMD_HUGE during the page table
walk uses an incorrect instruction sequence:

be,pt %xcc, 700f;
andcc REG1, REG2, %g0;

This sequence is incorrect since the branch decision is
made *before* the 'andcc' in the delay slot is executed.
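The C model below is only an illustration, not SPARC code: a struct holding just the Z flag stands in for %xcc, and all names are hypothetical. It shows why a conditional branch whose delay slot holds the flag-setting andcc tests stale condition codes; the "fixed" variant assumes the andcc is simply moved ahead of the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the integer condition codes: only the Z flag. */
struct ccr {
	bool z;
};

static void andcc(struct ccr *cc, uint64_t a, uint64_t b)
{
	cc->z = (a & b) == 0;
}

/* Buggy order: "be" samples Z before the delay-slot andcc runs. */
static bool branch_taken_buggy(struct ccr cc, uint64_t reg1, uint64_t reg2)
{
	bool taken = cc.z;          /* be,pt %xcc decides on the old Z */
	andcc(&cc, reg1, reg2);     /* delay slot: updates Z too late */
	return taken;
}

/* Fixed order: set the condition codes first, then branch on them. */
static bool branch_taken_fixed(struct ccr cc, uint64_t reg1, uint64_t reg2)
{
	andcc(&cc, reg1, reg2);
	return cc.z;
}
```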

Orabug: 24353511

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

vldc driver: provide kernel driver interfaces

Orabug: 24601126

Forward port 22804422 to UEK4-QU3 - VLDC driver should expose
services...

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix sentinel page table entry for 16G

Currently no page table trimming is done for 16G pages,
so _PAGE_PMD_HUGE must not be set for 16G. Also, for
this size, trimming would be done at the PUD level, so
this flag should not be set anyway.

Orabug: 24353511

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Trim page tables for 2G pages

Currently, mapping a 2G page requires 256*1024 PTE entries.
This results in large amounts of RAM being used just for
storing page tables. We now use 256 PMD entries to map a
2G page, which is much more space efficient.
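The numbers follow from the page table geometry. This sketch assumes sparc64's 8K base pages, 8M covered per PMD entry and 8-byte table entries; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sparc64 geometry for this sketch. */
#define SKETCH_PAGE_SHIFT  13  /* 8K base pages */
#define SKETCH_PMD_SHIFT   23  /* one PMD entry covers 8M */
#define SKETCH_ENTRY_BYTES 8   /* one 64-bit table entry */

/* Entries needed to map `size` bytes at a level covering 1 << shift bytes. */
static uint64_t entries_to_map(uint64_t size, unsigned int level_shift)
{
	assert(size % (UINT64_C(1) << level_shift) == 0);
	return size >> level_shift;
}

/* Bytes of page table needed for those entries. */
static uint64_t table_bytes(uint64_t size, unsigned int level_shift)
{
	return entries_to_map(size, level_shift) * SKETCH_ENTRY_BYTES;
}
```

For one 2G page this gives 262144 PTEs (2M of table) versus 256 PMD entries (2K).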

Orabug: 23109070

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
(cherry picked from commit d3c88b8f27645c14cbb220570e5945abb0989d19)
(cherry picked from commit 768096d7916fefc497f397b0675455a754ee8a5b)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Trim page tables at PMD for hugepages

For PMD-aligned (8M) hugepages, we currently allocate
all four page table levels, which is wasteful. We now
allocate only down to the PMD level, which reduces the
memory used by page tables.

Orabug: 22630259

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
(cherry picked from commit 5d2c7930a4d3bf3ca560048052d638d7efa67e36)
(cherry picked from commit abefebd73e204979661a818ac31cf455d110a672)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

vcc driver fixes

Orabug:24319080 - hang on a mutex out of vcc_open()
Orabug:24326005 - UEK4 kernel panic tty_ldisc_flush vcc_close

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Reviewed-By: Bijan Mottahedeh <Bijan.Mottahedeh@oracle.com>
Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

LDOMS DOMAIN SERVICES UPDATE 5

Orabug: 24601099

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Support reserving memory with memmap=xxx$yyy

The kernel command line parameter memmap= was supported
on several other architectures but not on SPARC (it was
ignored on SPARC).

Add support for the memmap=xxx$yyy command line parameter
(sparc64/UEK4 only). The patch is based on the existing
code for the "tile" architecture.

There are other types of memmap= command lines that
are only supported on x86 and are e820-specific.
These were not implemented.
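A minimal userspace sketch of parsing the xxx$yyy form, assuming the generic meaning of memmap=nn$ss (reserve nn bytes starting at address ss). `parse_size()` is a hypothetical stand-in for the kernel's memparse() that only handles K/M/G suffixes, and `parse_memmap_reserve()` is likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Number with optional K/M/G suffix; *end is left at s if no digits. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	if (*end == s)
		return 0;
	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse the value of "memmap=" in the form size$start. */
static bool parse_memmap_reserve(const char *arg, uint64_t *start, uint64_t *size)
{
	char *p;

	*size = parse_size(arg, &p);
	if (p == arg || *p != '$')
		return false;
	arg = p + 1;
	*start = parse_size(arg, &p);
	return p != arg && *p == '\0';
}
```

For example, "64M$0x10000000" reserves 64M starting at 0x10000000.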

Orabug: 22662762

Signed-off-by: Larry Bassel <larry.bassel@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc: Harden signal return frame checks.

    Orabug: 23303740

    [ Upstream commit d11c2a0de2824395656cf8ed15811580c9dd38aa ]

    All signal frames must be at least 16-byte aligned, because that is
    the alignment we explicitly create when we build signal return stack
    frames.

    All stack pointers must be at least 8-byte aligned.
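The two rules reduce to simple mask tests. This is a hedged sketch with hypothetical helper names; whether the sparc64 stack bias is added to a pointer before it is tested depends on the ABI and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signal frames are built 16-byte aligned, so anything else is bogus. */
static bool signal_frame_ok(uint64_t frame_addr)
{
	return (frame_addr & 15) == 0;
}

/* Stack pointers must be at least 8-byte aligned. */
static bool stack_pointer_ok(uint64_t sp)
{
	return (sp & 7) == 0;
}
```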

Signed-off-by: David S. Miller <davem@davemloft.net>
    Conflicts:

    arch/sparc/kernel/signal32.c - modified patch context so that it would apply

Signed-off-by: Larry Bassel <larry.bassel@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Support User Probes for Sparc

Orabug: 23523685 Support User Probes in OLS / uek4.1

Signed-off-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Use HW supported number of context ID bits

Orabug: 24449941

The number of context IDs supported by the hardware
is reported via the machine descriptor on sun4v
systems. For systems newer than T3, 16 bits are used
to represent the context ID in the HW. On these
systems the context ID wraps around if there are
more than 65536 processes running simultaneously.
Older systems use 13 bits, and the context ID wraps
around if there are more than 8192 processes running
simultaneously.
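The wraparound points are just powers of two. In the sketch below, `next_context_id()` is a hypothetical allocator step that treats rollover to 0 as the wrap; the real allocator's reserved context and generation handling are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of distinct context IDs for a given hardware width. */
static unsigned long nr_context_ids(unsigned int ctx_bits)
{
	assert(ctx_bits > 0 && ctx_bits < 32);
	return 1UL << ctx_bits;
}

/* Advance to the next ID, reporting a wraparound on rollover to 0. */
static unsigned long next_context_id(unsigned long ctx, unsigned int ctx_bits,
				     bool *wrapped)
{
	unsigned long next = (ctx + 1) & (nr_context_ids(ctx_bits) - 1);

	*wrapped = (next == 0);
	return next;
}
```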

Reviewed-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix return from trap window fill crashes.

We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.

Otherwise we can get an OOPS that looks like this:

ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002    Not tainted
TPC: <do_sparc64_fault+0x5c4/0x700>
g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
RPC: <do_sparc64_fault+0x5bc/0x700>
l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
I7: <user_rtt_fill_fixup+0x6c/0x7c>
Call Trace:
[0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c

The window trap handlers are slightly clever: the trap table entries for them are
composed of two pieces of code.  First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.

The userland register window fill handler is:

add %sp, STACK_BIAS + 0x00, %g1; \
ldxa [%g1 + %g0] ASI, %l0; \
mov 0x08, %g2; \
mov 0x10, %g3; \
ldxa [%g1 + %g2] ASI, %l1; \
mov 0x18, %g5; \
ldxa [%g1 + %g3] ASI, %l2; \
ldxa [%g1 + %g5] ASI, %l3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %l4; \
ldxa [%g1 + %g2] ASI, %l5; \
ldxa [%g1 + %g3] ASI, %l6; \
ldxa [%g1 + %g5] ASI, %l7; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i0; \
ldxa [%g1 + %g2] ASI, %i1; \
ldxa [%g1 + %g3] ASI, %i2; \
ldxa [%g1 + %g5] ASI, %i3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i4; \
ldxa [%g1 + %g2] ASI, %i5; \
ldxa [%g1 + %g3] ASI, %i6; \
ldxa [%g1 + %g5] ASI, %i7; \
restored; \
retry; nop; nop; nop; nop; \
b,a,pt %xcc, fill_fixup_dax; \
b,a,pt %xcc, fill_fixup_mna; \
b,a,pt %xcc, fill_fixup;

And the way this works is that if any of those memory accesses
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took.  In this way, the fault handler
doesn't have to know if it was a spill or a fill that it's handling
the fault for.  It just always branches to the last instruction in
the parent trap's handler.

For example, for a regular fault, the code goes:

winfix_trampoline:
rdpr %tpc, %g3
or %g3, 0x7c, %g3
wrpr %g3, %tnpc
done

All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.
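As a small C illustration (not kernel code), the arithmetic works out as follows. Counting the userland fill handler above gives 32 four-byte instructions, so its last three branches sit at offsets 0x74, 0x78 and 0x7c. `winfix_fault_target()` reproduces the `or %g3, 0x7c, %g3` step; the enum names and `winfix_slot_target()` are hypothetical helpers for selecting the other two slots.

```c
#include <assert.h>
#include <stdint.h>

#define WINTRAP_HANDLER_SIZE 0x80  /* 32 instructions of 4 bytes each */

/* Byte offsets of the three trailing branches within a handler. */
enum winfix_slot {
	SLOT_FILL_FIXUP_DAX = 0x74,  /* data access exception */
	SLOT_FILL_FIXUP_MNA = 0x78,  /* memory address not aligned */
	SLOT_FILL_FIXUP     = 0x7c,  /* regular fault */
};

/* What winfix_trampoline computes for a regular fault: tpc | 0x7c. */
static uint64_t winfix_fault_target(uint64_t tpc)
{
	return tpc | SLOT_FILL_FIXUP;
}

/* Generalization to any slot (an assumption of this sketch). */
static uint64_t winfix_slot_target(uint64_t tpc, enum winfix_slot slot)
{
	return (tpc & ~(uint64_t)(WINTRAP_HANDLER_SIZE - 1)) | slot;
}
```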

On return from trap, we have to pull the register window in, but we do
this by hand instead of just executing a "restore" instruction for
several reasons.  The largest is that from Niagara onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).

This is executed inline via the FILL_*_RTRAP handlers.  rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary.  Now if you look at them, you'll see at the end:

    ba,a,pt    %xcc, user_rtt_fill_fixup;
    ba,a,pt    %xcc, user_rtt_fill_fixup;
    ba,a,pt    %xcc, user_rtt_fill_fixup;

And oops, all three cases are handled like a fault.

This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.

So in the case where the stack was unaligned, the unaligned trap
handler sets up the arg registers one way, and then we branched to
the fault handler, which expects them set up another way.

So the FAULT_TYPE_* value ends up basically being garbage, and
would randomly generate the backtrace seen above.

Orabug: 24671126

Reported-by: Nick Alcock <nix@esperi.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Take ctx_alloc_lock properly in hugetlb_setup().

On cheetahplus chips we take the ctx_alloc_lock in order to
modify the TLB lookup parameters for the indexed TLBs, which
are stored in the context register.

This is called with interrupts enabled; however, ctx_alloc_lock
is an IRQ-safe lock, therefore we must acquire/release it
properly with spin_{lock,unlock}_irq().

Orabug: 24671126

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix sparc64_set_context stack handling.

Like a signal return, we should use synchronize_user_stack() rather
than flush_user_windows().

Orabug: 24671126

Reported-by: Ilya Malakhov <ilmalakhovthefirst@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: vds kernel BUG at fs/buffer.c:1269!

Orabug: 24376791

Interrupts must be enabled before the fini call.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Virtual disk IO should handle VDS module removal and reinsertion

Orabug: 24319792

Virtual disk IO should handle module removal and reinsertion while IO
is active between clients and the server.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: support for identifying Sonoma 2 systems

Needed for Sonoma 2 software support

Orabug: 22960812
Signed-off-by: Joe Moriarty <joe.moriarty@oracle.com>
Acked-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 8a9b7d9b25a3ad54bc41294f93ce814038f01c70)
(cherry picked from commit a8ce3853635573a42e49df9c8b7e87bf35656561)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sonoma:correctly recognize sonoma cpu type

Orabug: 23041920

Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Joe Moriarty <joe.moriarty@oracle.com>
(cherry picked from commit 72eaed0f66615fe000a63feb7350ba51bf040e06)
(cherry picked from commit 70a67f6bfc281c92e9422b1672d0cae30da178df)

sparc64: Set VDS workqueue max_active argument to 0

Orabug: 23565322

Based on

https://www.kernel.org/doc/Documentation/workqueue.txt

The recommended value for max_active is 0:

max_active:

max_active determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq.  For example,
with @max_active of 16, at most 16 work items of the wq can be
executing at the same time per CPU.

Currently, for a bound wq, the maximum limit for @max_active is 512
and the default value used when 0 is specified is 256.  For an unbound
wq, the limit is higher of 512 and 4 * num_possible_cpus().  These
values are chosen sufficiently high such that they are not the
limiting factor while providing protection in runaway cases.

The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
may queue at the same time.  Unless there is a specific need for
throttling the number of active work items, specifying '0' is
recommended.
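The VDS change amounts to passing 0 as the max_active argument of alloc_workqueue(). The sketch below models only the resolution rules quoted above; the macro and function names are hypothetical, and applying the same 256 default to unbound queues is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Constants as described in the quoted workqueue documentation. */
#define SKETCH_WQ_MAX_ACTIVE 512
#define SKETCH_WQ_DFL_ACTIVE (SKETCH_WQ_MAX_ACTIVE / 2)  /* 256 */

static int sketch_max_active(int requested, bool unbound, int num_possible_cpus)
{
	int limit = SKETCH_WQ_MAX_ACTIVE;

	/* Unbound limit: the higher of 512 and 4 * num_possible_cpus(). */
	if (unbound && 4 * num_possible_cpus > limit)
		limit = 4 * num_possible_cpus;

	if (requested == 0)            /* 0 selects the default */
		requested = SKETCH_WQ_DFL_ACTIVE;
	if (requested > limit)         /* clamp to the limit */
		requested = limit;
	return requested;
}
```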

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liam Merwick <Liam.Merwick@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
(cherry picked from commit b584786e611e8e8a28830386e8b3db8874d794c5)
(cherry picked from commit f2559a96b70562267f01d5bb62ef44aa9f0c0cd8)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Reduce TLB flushes during hugepte changes

During hugepage map/unmap, TSB and TLB flushes are currently
issued at every PAGE_SIZE'd boundary which is unnecessary.
We now issue the flush at REAL_HPAGE_SIZE boundaries only.

Without this patch, workloads that unmap a large hugepage-backed
VMA region get CPU lockups due to excessive TLB
flush calls.
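The scale of the saving is easy to estimate. This sketch assumes sparc64's 8K PAGE_SIZE and a 4M REAL_HPAGE_SIZE; `flush_count()` is a hypothetical helper that counts one flush per granule.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sparc64 sizes for this sketch. */
#define SKETCH_PAGE_SIZE       (UINT64_C(8) << 10)  /* 8K */
#define SKETCH_REAL_HPAGE_SIZE (UINT64_C(4) << 20)  /* 4M */

/* Flushes issued when unmapping `len` bytes, one per `granule`. */
static uint64_t flush_count(uint64_t len, uint64_t granule)
{
	return (len + granule - 1) / granule;
}
```

Unmapping a 1G hugepage-backed region drops from 131072 flushes to 256 under these assumptions.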

Orabug: 23071722

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
(cherry picked from commit b42a694198cca38e8cdb3f601266bf591ba3291d)
(cherry picked from commit fdc7f39ae632a9ec0114c59090131d2db7dd7682)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: kernel panic -- vds_bh_reset

Orabug: 23199936

The panic is an assertion failure in fs/buffer.c:1269

static inline void check_irqs_on(void)
{
         BUG_ON(irqs_disabled());	/* <-- fs/buffer.c:1269 */
}

The vds reset path calls the backend fini routine which eventually calls
the file close interface:

         vds_vio_lock(vio, flags);
         vds_be_fini(port);

vds_vio_lock() grabs a spin lock and disables local IRQs, hence
the eventual assertion failure.

The fix is to add a new r/w mutex to protect backend state and move the
vds_vio_lock() call after vds_be_fini().

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liam Merwick <Liam.Merwick@oracle.com>
(cherry picked from commit 6e33112afcdd654ada7c9414a1c4d83278533911)
(cherry picked from commit e62908110662f009f2449df5faae496ac43a1d65)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

vds_blk_rw() should check bio_alloc() NULL return value

Orabug: 22934031

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
(cherry picked from commit fec4ca1085a268c38a4a12c6119322aaf2f87698)
(cherry picked from commit d590a3711158228194ea31da0f1fea612bd13c05)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvdc: don't dereference port->disk before disk probe finishes

If the backing file for a vdisk is not present in the service domain, an
LDC reset can occur during the initial port/disk probing. The LDC reset
logic was dereferencing port->disk, which may not have been set up yet.
Guard against this case.

Orabug: 20362258

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
(cherry picked from commit cd6d3705da958b5db625272eb8733ab79a045f87)
(cherry picked from commit bee156ac9cad00f6a39417217c454085645c3d62)
(cherry picked from commit 476306db27c9a6bcd2e8012047ba06a0af16b734)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: This patch adds PRIQ support.

This patch supports INT_A through INT_D interrupts as described
by the Open Firmware device tree as well as MSI vectors registered
by PCIe drivers. pci=nomsi may not work, though frankly that option makes
no sense on a SPARC machine.

The command line parameter priq=off reverts to the prior MSIEQ interrupt
mechanism.

OraBug: 22748924

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit d4c668861f91dfe6f5fa1a809218a8c46dc76c9b)
(cherry picked from commit c47c2d2a53856b25843e07c78f42f45a17661d2c)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Enable aggressive setting of PCIe MPS settings

This patch connects SPARC PCIe into the generic PCIe framework enabling
MPS and MRRS to be set aggressively subject to the standard command line
flags. To enable it, put "pci=pcie_bus_perf" on the kernel command line.

Orabug: 21149334

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit 5e5b08ede2c5b6cbf39e20f91097ca2435ea286e)
(cherry picked from commit 8b9a1855f68978d437605b0267ba448399303511)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Allow redirection of MSI/MSI-X IRQs

Allows redirection of MSI/MSI-X IRQs by finding appropriate MSIEQ and
re-routing its IRQ. Also handles driver IRQs sharing the same MSIEQ.
Affinity masks for all such shared interrupts, as well as for the MSIQ IRQ,
are modified. Note that, because of this HW sharing, the patch can silently
change the affinity of related driver IRQs. While confusing and not
desirable, this is an artifact of the HW design.

Orabug: 22749960

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit 914901044c1a028185326eb1a3c8821cab8845be)
(cherry picked from commit 98069630a58903f6ec29aaf784f24b44c27a0db0)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: use COMMAND_LINE_SIZE for boot string

Orabug: 19722011

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 1a9bf6b57dbfcc4ec8e8d98bd20b7975f4b4934f)
(cherry picked from commit 017214c5742ee92e3270024c4cce1cecb793ac1c)

sparc64: crypto camellia opcode error fix

Orabug: 23128525

Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit fc1b755de4250245961b226da41d10f066467926)
(cherry picked from commit c6cb169240529eb974a5846e55e622001973b79f)
(cherry picked from commit 8d140eb4166d127dc0f64595ec17e441beb4b47c)

sparc64: node_random needs attention

Seriously node_random will have to be hooked into sparc.

Orabug: 23128525

Signed-off-by: Bob Picco <bob.picco@oracle.com>
(cherry picked from commit 9c8ab6e8096ddf1814df8503cdd10ee83b4ddf9a)
(cherry picked from commit 640de4e021c24037420ddb4c52cc91b002d72ad7)

Conflicts:
kernel/fork.c
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 07bb1228d250a4e3003ccf317da22c06310b8607)
(cherry picked from commit cdeddef5edf0de95cff6ca8c8b7efe94322492d7)

sparc64: nr_cpus and nodes_shift

This is being done for M5 and the like.

To go beyond the NR_CPUS limit of 4064, the issue in the CPU mondo code
(init_cpu_send_mondo_info) needs to be addressed; that appears possible.

Orabug: 23128525

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit d8ce9cc00181bfa65865c6d1624f0dcb3d048a7b)
(cherry picked from commit ce753fe5d10682939912242c9880935472f1e195)

Conflicts:
arch/sparc/Kconfig

Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 48df0cc664ddfb2f03dc1afbb3ff01a3192c9723)
(cherry picked from commit fdb632ef6a250d24887003cc35b72744183c8642)

sparc64: struct adi_caps should use __u64, not u64

struct adi_caps uses u64 as the type for its field, but u64 is not
defined for include/uapi headers. Change it to __u64.
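The general rule for uapi headers is to use only the double-underscore types exported by <linux/types.h>, since kernel-internal types such as u64 are not visible to userspace. A minimal sketch; `struct example_uapi_caps` is hypothetical and is not the real adi_caps layout.

```c
#include <assert.h>
#include <linux/types.h>   /* exports __u64 to userspace */

/* Hypothetical uapi-style structure: __u64 compiles in userspace, u64 would not. */
struct example_uapi_caps {
	__u64 value;
};
```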

Orabug: 22713162

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
(cherry picked from commit 4b47a697322066fcd6cf0f4637dece26da3525fc)

Conflicts:
include/uapi/linux/prctl.h

Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 858864aea91eb7a1337cedd70a01ffb3fb5d898a)
(cherry picked from commit acf5580a66da8aae303a42af49de27d7500651cc)

IPMI: Driver for Sparc T4/T5/T7 Platforms

Functional IPMI interface driver for Sparc T4/T5/T7. This will
probably also work for other platforms that use an iLOM channel
for IPMI services, including older and future ones, though these
have not been tested.

This driver provides the transport between the IPMI message layer
and the Sparc platform IPMI endpoint in iLOM. The Virtual Logical
Domain Channel (VLDC) driver claims the host endpoint, and we call
it to move data to/from iLOM. So there is an unusual dependency
on another loadable module which requires several compromises
until we work out a plan to restructure the VLDC driver to provide
a cleaner interface:

* An artificial symbolic dependency on vldc is created so that
   "modprobe ipmi_si" will ensure that vldc is loaded also.

* ipmi_vldc uses filp_open/kernel_read/kernel_write on device
   files provided by vldc, i.e., /sys/class/vldc/ipmi/mode and
   /dev/vldc/ipmi.

Bug 22804422 has been created to deal with these issues.

Sending this driver upstream is on hold until we work out these
issues. Also, the vldc driver itself has not yet been sent upstream
and that is obviously a prerequisite.

Orabug: 22658348

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit 6083e586b068ae159c8335adc2d210e7b7f66d27)
(cherry picked from commit 9944e6442b962c2945f2a59ef7c6ff81d0e95172)
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit e76d315b514e424a8623c51eaa526a6d2ac52a89)
(cherry picked from commit dfcab0a3eef7ebd4cc2fda9865f42ff114b46459)

SPARC64: UEK4 LDOMS DOMAIN SERVICES UPDATE 4

This update provides the following fixes for LDom domain services on UEK4.

1. DS service registration with major ver 0.
2. Kernel NULL pointer dereference in vlds driver at shutdown.

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Reviewed-by: Liam Merwick <Liam.Merwick@oracle.com>
Orabug: 23292083,23504208

fix-up - add back include of linux/hugetlb.h

Orabug: 22729791

Commit:
5075a47f3765e778b45367ba4873c1bd08b21d0e
fix-up code base for v4.1.12-46 merge
should not have removed "#include <linux/hugetlb.h>"
Add it back in after applying adfc71b605:
fix-up - add back include of linux/dtrace_os.h
so that it will merge with master.

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

fix-up - add back include of linux/dtrace_os.h

Orabug: 22729791

Commit:
bd52d0fd57c96146f8d1838588753ab9dabcd2fe:
sparc64: Log warning for invalid hugepages boot param
removes "#include <linux/dtrace_os.h>" from arch/sparc/mm/fault_64.c.
That header file is needed by dtrace. Add it back in.

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

fix-up code base for v4.1.12-46 merge

Orabug: 22729791

Commit:
bd52d0fd57c96146f8d1838588753ab9dabcd2fe:
sparc64: Log warning for invalid hugepages boot param
Removes "#include <linux/dtrace_os.h>" from arch/sparc/mm/fault_64.c.

The topic/uek-4.1/sparc code base has:
    #include <linux/context_tracking.h>
    #include <linux/hugetlb.h>
    #include <linux/dtrace_os.h>

v4.1.12-46 has:
    #include <linux/context_tracking.h>
    #include <linux/dtrace_os.h>

Remove "#include <linux/hugetlb.h>" so that bd52d0fd57 will merge with
v4.1.12-46.

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

SPARC64: UEK4 LDOMS DOMAIN SERVICES UPDATE 3

  This update provides the following fix for LDom domain services on UEK4.

  1. Add an event to the vlds driver which is used to signal
  process(es) using libds that the vlds /dev devices have been updated.
  When it receives this event, libds will refresh/update its internal
  list of vlds devices allowing the list to stay immediately up-to-date
  when vlds devices have changed. This event fixes some DR related libds
  problems found during regression testing due to libds internal vlds
  device list becoming stale.

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
  Orabug: 22853109

(cherry picked from commit 1722beeac0278656731b8f0da394fb2f6a0b17b3)
(cherry picked from commit 80d160e808f283c1438e8681eedfee287f53a955)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

Interface to mark SR-IOV device ready for use by LDoms guest

Add an iov_ready file to all PCI devices (/sys/bus/pci/devices/*/iov_ready).
The iov_ready file is write-only and mapped to the pci_iov_dev_ready
hypervisor call, which is used to indicate that a PCI device is ready
or no longer ready to be shared with other domains.

Write "1" to the file to indicate that the PCI device is ready.
For example:

# echo 1 > /sys/bus/pci/devices/0001:03:00.0/iov_ready

Write "0" to the file to indicate that the PCI device is no longer ready.
For example:

# echo 0 > /sys/bus/pci/devices/0001:03:00.0/iov_ready
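A minimal C sketch of the same operation as the shell examples, for a program that wants to toggle the flag itself. The helper names are hypothetical; the path layout is the one shown above. Writing the real file requires this patch and sufficient privileges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /sys/bus/pci/devices/<bdf>/iov_ready. Returns 0 on success. */
static int iov_ready_path(char *buf, size_t len, const char *bdf)
{
	int n;

	if (strchr(bdf, '/'))  /* keep the path inside the devices directory */
		return -1;
	n = snprintf(buf, len, "/sys/bus/pci/devices/%s/iov_ready", bdf);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of "echo 1 > path" (ready) or "echo 0 > path" (not ready). */
static int write_iov_ready(const char *path, int ready)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(ready ? "1\n" : "0\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```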

Orabug: 22909608

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 14f6e82264924f1db2cad628edae7964caa9c03e)

Merge remote-tracking branch 'shaggy/kexec-uek4-sparc' into sparc

Orabug: 21864391

Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Log warning for invalid hugepages boot param

When an invalid hugepage parameter is given on the kernel command line,
an appropriate warning should be logged to indicate whether:
a) the size is not supported by software
b) the MMU does not support the xl_hugepagesz size
c) xl_hugepagesz is not in use

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Acked-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Orabug: 22729791
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: xl-hugepages

Note: Resending this patch. There is no change in this patch since v1.

Jalapeño was verified as repaired.

Now to find performance issues.

One performance issue is subordinate page table state (SPTS). The SPTS will
be tricky because of protection changes for COW and others. For example,
a 2GB hugepage will have 1UL << (31-23) PMD entries. Do we want 256 IPIs
for a hugepage TTE (pte) change?

Signed-off-by: Bob Picco <bpicco@meloft.net>
(cherry picked from commit ece059b2e2581a2dcda3fb1ca35cd31258f6ed03)

Conflicts:
arch/sparc/include/asm/mmu_64.h
arch/sparc/mm/fault_64.c

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Acked-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Orabug: 22729791
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix I/O NUMA parsing and sysfs display code.

I/O NUMA node parsing has been broken since T5 and did not work on
T7. The code also did not correctly handle PCIe root complexes
crossbar-connected to multiple memory/CPU NUMA nodes. Additionally,
the numa_node attributes displayed in sysfs were incorrect.

Example: T7-4 showing round-robin spread of multiply connected root
complexes.

[ 3723.288247] /pci@305: On NUMA node 0
[ 3723.363398] /pci@304: On NUMA node 2
[ 3723.437486] /pci@307: On NUMA node 0
[ 3723.510510] /pci@306: On NUMA node 2
[ 3723.582582] /pci@313: On NUMA node 0
[ 3723.655276] /pci@308: On NUMA node 2
[ 3723.728077] /pci@302: On NUMA node 0
[ 3723.800774] /pci@30a: On NUMA node 2
[ 3723.874895] /pci@309: On NUMA node 0
[ 3723.947089] /pci@301: On NUMA node 2
[ 3724.020218] /pci@30b: On NUMA node 1
[ 3724.092902] /pci@300: On NUMA node 3
[ 3724.167630] /pci@303: On NUMA node 1
[ 3724.240287] /pci@30c: On NUMA node 3
[ 3724.312245] /pci@312: On NUMA node 1
[ 3724.384857] /pci@30e: On NUMA node 3
[ 3724.457482] /pci@30d: On NUMA node 1
[ 3724.531679] /pci@310: On NUMA node 3
[ 3724.603621] /pci@30f: On NUMA node 1
[ 3724.675695] /pci@311: On NUMA node 3
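The output above suggests a round-robin spread over the nodes each root complex is connected to (alternating 0/2 for the first ten, then 1/3). A minimal model of such a policy follows; the structure and function are hypothetical, and the real machine description parsing is not shown.

```c
#include <assert.h>

/* Root complexes reachable from the same node set share one cursor. */
struct node_set {
	const int *nodes;  /* NUMA nodes reachable via the crossbar */
	int nr_nodes;
	int next;          /* round-robin cursor */
};

static int assign_numa_node(struct node_set *set)
{
	int node;

	assert(set->nr_nodes > 0);
	node = set->nodes[set->next];
	set->next = (set->next + 1) % set->nr_nodes;
	return node;
}
```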

Orabug: 22748961

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>

sparc64: Set up core sibling list correctly for T7.

The important definition of core sibling is that some level of cache is shared.
The prior SPARC notion of a socket was defined as the highest level of shared cache.
On T7 platforms, the MD record now describes the CPUs that share the physical
socket and this is no longer tied to shared cache. This patch correctly
separates these two concepts.

Before:
[root@ca-sparc30 topology]# cat core_siblings_list
32-63,128-223

After:
[root@ca-sparc30 topology]# cat core_siblings_list
32-63

OraBug 22748950

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>

sparc64: Fix CPU package information in /sys

CPU package information in
/sys/bus/cpu/devices/cpu*/topology/physical_package_id
is inconsistent with its use by tools such as irqbalance. This patch
uses the socket ID to be consistent and useful.

Orabug: 22748950

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>

sparc64: Add 3rd level cache info to /sys

This patch pulls line size and cache size info from the machine description and
adds L3 cache files to /sys/bus/cpu/devices/cpu* directories. It also
structures the information in the same directory hierarchy as x86 so that user
programs like irqbalance can find the information they need to work correctly.

> ls /sys/bus/cpu/devices/cpu*
clock_tick           l1_dcache_size       l2_cache_line_size  l3_cache_size
crash_notes          l1_icache_line_size  l2_cache_size       node0
l1_dcache_line_size  l1_icache_size       l3_cache_line_size  topology

Sample results on a T7-4:

> cat /sys/bus/cpu/devices/cpu*/l3*
64
8388608

/sys/bus/cpu/devices/cpu*/cache/index0/:
coherency_line_size  level  shared_cpu_list  shared_cpu_map  size  type

/sys/bus/cpu/devices/cpu*/cache/index1/:
coherency_line_size  level  shared_cpu_list  shared_cpu_map  size  type

/sys/bus/cpu/devices/cpu*/cache/index2/:
coherency_line_size  level  shared_cpu_list  shared_cpu_map  size  type

/sys/bus/cpu/devices/cpu*/cache/index3/:
coherency_line_size  level  shared_cpu_list  shared_cpu_map  size  type

cat /sys/bus/cpu/devices/cpu32/cache/index3/*
64
3
32-63,128-223
0,ffffffff,ffffffff,ffffffff,00000000,00000000,ffffffff,00000000
8388608
Unified
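Userspace tools read these files as plain text. As a minimal sketch (a hypothetical helper, not irqbalance code), a cpulist value such as the shared_cpu_list above can be counted like this; 32-63,128-223 covers 128 CPUs, which matches the 128 bits set in the shared_cpu_map line. The stride syntax some kernels accept is not handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Count CPUs in a cpulist string; returns -1 on malformed input. */
static int cpulist_count(const char *s)
{
	int total = 0;

	while (*s) {
		char *end;
		long lo = strtol(s, &end, 10), hi = lo;

		if (end == s)
			return -1;
		if (*end == '-') {          /* a range such as 32-63 */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}
		total += (int)(hi - lo + 1);
		if (*end == ',')
			end++;
		else if (*end != '\0' && *end != '\n')
			return -1;
		s = end;
		if (*s == '\n')             /* sysfs values end in a newline */
			break;
	}
	return total;
}
```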

Orabug: 22748950

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>

sparc64: Add lightweight syscall mechanism for lwp_info

This patch introduces a new "light weight" system call
mechanism which has the ability to retrieve small bits
of information and/or perform minor computations without
the need for a full-blown save/switch/restore context.

Solaris provides _lwp_info(), which returns basically the
same information as getrusage(RUSAGE_THREAD) but much faster.
This is used extensively by the database code, and returns
the utime and stime for the calling thread.

(This patch also provides a fast getcpu function just as
a demonstration of how additional calls might be added.
Unlike x86, there is no unprivileged instruction to do this,
and so it is a fairly expensive system call.)
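For comparison, this is the conventional way a Linux thread obtains the same per-thread utime and stime via getrusage(RUSAGE_THREAD), i.e. the full system-call path the lightweight mechanism is meant to avoid. The struct and function names are hypothetical, and the new lightweight call's interface is not modeled.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc exposes RUSAGE_THREAD under this */
#endif
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

#ifndef RUSAGE_THREAD
#define RUSAGE_THREAD 1        /* Linux value, in case headers hide it */
#endif

struct thread_cpu_times {
	long long utime_us;    /* user time of the calling thread */
	long long stime_us;    /* system time of the calling thread */
};

static int get_thread_cpu_times(struct thread_cpu_times *t)
{
	struct rusage ru;

	if (getrusage(RUSAGE_THREAD, &ru) != 0)
		return -1;
	t->utime_us = (long long)ru.ru_utime.tv_sec * 1000000 + ru.ru_utime.tv_usec;
	t->stime_us = (long long)ru.ru_stime.tv_sec * 1000000 + ru.ru_stime.tv_usec;
	return 0;
}
```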

Orabug: 22952506

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>

sparc64: correctly recognize sparc M8 cpu

This patch detects Sparc M8 cpu type.

Orabug: 23130139

Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit d3ae0cafd1576f4660c9b44fa08b4cecee04f8a8)

sparc64: correctly recognize Sonoma chips

The following patch adds support for correctly
recognizing Sonoma chips.

cpu : Unknown SUN4V CPU
fpu : Unknown SUN4V FPU
pmu : Unknown SUN4V PMU

Orabug: 22088766

Signed-off-by: Allen Pais <allen.pais@oracle.com>

arch/sparc: Sonoma epsc group patch

Needed for Sonoma IB software support.

Orabug: 23055865
Signed-off-by: Joe Moriarty <joe.moriarty@oracle.com>
Acked-by: Babu Moger <babu.moger@oracle.com>
(cherry picked from commit bf199cb83bd83643b223a0504d2f41ed793812ef)

arch/sparc: Sonoma piggyback patch

Needed for Sonoma IB software support.

Orabug: 23055807
Signed-off-by: Joe Moriarty <joe.moriarty@oracle.com>
Acked-by: Karl Volz <karl.volz@oracle.com>
(cherry picked from commit a63fc712b3ceb2007ad5da3db6a5cae27d906208)

sparc64: piggyback program generates a.out header with incorrect section sizes

The piggyback program in UEK for SPARC generates an a.out header with section
sizes that are too large. This causes problems when booting with OpenBoot,
because OpenBoot uses those sizes to map and copy the image to its specified
VA and runs into unmapped memory during the copies.

This is a minimal fix.

Orabug: 21793535

Signed-off-by: Jose Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit bd99ee7ceffb1a472ccd8841dd7011d15e7fa258)

Add sun4v_wdt watchdog driver

This driver adds SPARC hypervisor watchdog support. The default
timeout is 60 seconds, and the range is 1 to 31536000 seconds.
Both the watchdog-resolution and watchdog-max-timeout MD property
settings are supported.
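The upper limit is simply one non-leap year: 365 * 86400 = 31536000 seconds. The C sketch below shows how an effective maximum could be derived from an optional watchdog-max-timeout property, and how a timeout is converted before arming. It assumes the property is reported in milliseconds and that the hypervisor call takes milliseconds. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits quoted in the commit message; names are illustrative. */
#define WDT_DEFAULT_TIMEOUT_S 60
#define WDT_MIN_TIMEOUT_S     1
#define WDT_MAX_TIMEOUT_S     31536000u   /* 365 * 86400 s */

/*
 * Effective maximum timeout in seconds. md_max_ms stands in for the
 * watchdog-max-timeout MD property (assumed to be in ms); has_prop says
 * whether the property exists. Returns 0 if the property is unusable.
 */
static uint32_t wdt_effective_max_s(bool has_prop, uint64_t md_max_ms)
{
    if (!has_prop)
        return WDT_MAX_TIMEOUT_S;
    if (md_max_ms < (uint64_t)WDT_MIN_TIMEOUT_S * 1000)
        return 0;                        /* below the 1 s minimum */
    if (md_max_ms < (uint64_t)WDT_MAX_TIMEOUT_S * 1000)
        return (uint32_t)(md_max_ms / 1000);
    return WDT_MAX_TIMEOUT_S;
}

/* Convert a timeout in seconds to ms for the (assumed) hypervisor call. */
static uint64_t wdt_timeout_to_ms(uint32_t timeout_s)
{
    return (uint64_t)timeout_s * 1000;
}
```

A property that is smaller than the driver's own limit shrinks the maximum. One that is larger leaves the driver's limit in place.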

Signed-off-by: Wim Coekaerts <wim.coekaerts@oracle.com>
Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit eccc96426978c0fa963f8712077ecb6247f0e57e)

Revert "Add sun4v_wdt watchdog driver"

This reverts commit d153ba1897b562594f98b91e12d62e69018ad990.

sparc/PCI: Fix for panic while enabling SR-IOV

Orabug: 22659268

We noticed this panic while enabling SR-IOV in sparc.

ixgbe 0002:03:00.0 eth4: SR-IOV enabled with 2 VFs
ixgbe 0002:03:00.0: Multiqueue Enabled: Rx Queue count = 4, Tx Queue count = 4
ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver - version 2.12.1-k
ixgbevf: Copyright (c) 2009 - 2012 Intel Corporation.
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 000000000000160c
tsk->{mm,active_mm}->pgd = fff8000408238000
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
modprobe(3335): Oops [#1]
CPU: 2 PID: 3335 Comm: modprobe Tainted: G           OE 4.1.12-32.el6uek.sparc64 #1
task: fff8000406f35ca0 ti: fff8000404630000 task.ti: fff8000404630000
TSTATE: 0000008411001606 TPC: 0000000000438ee0 TNPC: 0000000000438f0c
Y: 00000000 Tainted: G           OE
TPC: <dma_supported+0x20/0x80>
g0: 0000000080000001 g1: 0000000000000000 g2: 00000000ffffffff g3: 0000000000000003
g4: fff8000406f35ca0 g5: fff800041896a000 g6: fff8000404630000 g7: 0000000000000000
o0: 0000000000000000 o1: 0000000000000300 o2: 0000000000000001 o3: 0000000000000004
o4: fff80004122160d8 o5: 0000000000e37ae5 sp: fff8000404632951 ret_pc: 00000000007215d0
RPC: <pci_enable_device+0x10/0x40>
l0: fff8000404410090 l1: fff800040441013e l2: 00000000007a768c l3: 0000000000000001
l4: fff80004046331d8 l5: fff80004046331f0 l6: e000000000000000 l7: 0040000000000000
i0: fff8000404410090 i1: ffffffffffffffff i2: fff8000404410180 i3: 00000000004ace40
i4: 0000000000000000 i5: 2000000000000000 i6: fff8000404632a01 i7: 0000000010550ea0
I7: <ixgbevf_probe+0x80/0x4c0 [ixgbevf]>
Call Trace:
[0000000010550ea0] ixgbevf_probe+0x80/0x4c0 [ixgbevf]
[00000000007229d4] local_pci_probe+0x34/0xa0
[0000000000722ae8] pci_call_probe+0xa8/0xe0
[0000000000722dd0] pci_device_probe+0x50/0x80
[000000000079c1c0] really_probe+0x140/0x420
[000000000079c4e4] driver_probe_device+0x44/0xa0
[000000000079c5c8] __driver_attach+0x88/0xa0
[000000000079a3cc] bus_for_each_dev+0x6c/0xa0
[000000000079bd5c] driver_attach+0x1c/0x40
[000000000079ae1c] bus_add_driver+0x17c/0x220
[000000000079cd94] driver_register+0x74/0x120
[0000000000722ebc] __pci_register_driver+0x3c/0x60
[0000000010558048] ixgbevf_init_module+0x48/0x5c [ixgbevf]
[0000000000426bb8] do_one_initcall+0xb8/0x200
[00000000004e5f8c] do_init_module+0x4c/0x1c0
[00000000004e6f48] load_module+0x5e8/0x780
Disabling lock debugging due to kernel taint
Caller[0000000010550ea0]: ixgbevf_probe+0x80/0x4c0 [ixgbevf]
Caller[00000000007229d4]: local_pci_probe+0x34/0xa0
Caller[0000000000722ae8]: pci_call_probe+0xa8/0xe0
Caller[0000000000722dd0]: pci_device_probe+0x50/0x80
Caller[000000000079c1c0]: really_probe+0x140/0x420
Caller[000000000079c4e4]: driver_probe_device+0x44/0xa0
Caller[000000000079c5c8]: __driver_attach+0x88/0xa0
Caller[000000000079a3cc]: bus_for_each_dev+0x6c/0xa0
Caller[000000000079bd5c]: driver_attach+0x1c/0x40
Caller[000000000079ae1c]: bus_add_driver+0x17c/0x220
Caller[000000000079cd94]: driver_register+0x74/0x120
Caller[0000000000722ebc]: __pci_register_driver+0x3c/0x60
Caller[0000000010558048]: ixgbevf_init_module+0x48/0x5c [ixgbevf]
Caller[0000000000426bb8]: do_one_initcall+0xb8/0x200
Caller[00000000004e5f8c]: do_init_module+0x4c/0x1c0
Caller[00000000004e6f48]: load_module+0x5e8/0x780
Caller[00000000004e7184]: SyS_init_module+0xa4/0xe0
Caller[0000000000406254]: linux_sparc_syscall+0x34/0x44
Caller[0000000000103490]: 0x103490
Instruction DUMP: 8530b020  80a64002  1860000c <c200625c>
840e4001  80a08001  02400009  90102001  c45e2088
Kernel panic - not syncing: Fatal exception
Press Stop-A (L1-A) to return to the boot prom
---[ end Kernel panic - not syncing: Fatal exception

Details:
Here is the call sequence:
virtfn_add->__mlx4_init_one->dma_set_mask->dma_supported

The panic happened at line 760 (file arch/sparc/kernel/iommu.c):

758 int dma_supported(struct device *dev, u64 device_mask)
759 {
760         struct iommu *iommu = dev->archdata.iommu;
761         u64 dma_addr_mask = iommu->dma_addr_mask;
762
763         if (device_mask >= (1UL << 32UL))
764                 return 0;
765
766         if ((device_mask & dma_addr_mask) == dma_addr_mask)
767                 return 1;
768
769 #ifdef CONFIG_PCI
770         if (dev_is_pci(dev))
771                 return pci64_dma_supported(to_pci_dev(dev), device_mask);
772 #endif
773
774         return 0;
775 }
776 EXPORT_SYMBOL(dma_supported);

The same panic also happened with the Intel ixgbe driver.

SR-IOV code looks for arch-specific data while enabling
VFs. When a VF device is added, the driver probe function makes a set
of calls to initialize the PCI device. Because the VF device is
added differently from the normal PF device (which, on sparc, happens
via of_create_pci_dev), some of the arch-specific initialization
does not happen for the VF device. This causes a panic when archdata
is accessed.

To fix this, I have used the already defined weak function
pcibios_setup_device to copy archdata from the PF to the VF.
The fix has also been verified.
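The following user-space model shows the shape of this fix. A VF inherits its PF's arch data at setup time, so a later dma_supported()-style lookup finds a valid IOMMU pointer instead of NULL. The structures are heavily simplified. model_pcibios_setup_device() and model_dma_supported() are hypothetical stand-ins for pcibios_setup_device() and dma_supported(). The NULL check exists only so the model can report the failure instead of crashing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct iommu { unsigned long dma_addr_mask; };
struct dev_archdata { struct iommu *iommu; int numa_node; };
struct pci_dev {
    bool is_virtfn;                 /* true for an SR-IOV VF */
    struct pci_dev *physfn;         /* parent PF when is_virtfn */
    struct dev_archdata archdata;
};

/* Model of the fix: a VF copies the PF's arch data during setup. */
static void model_pcibios_setup_device(struct pci_dev *dev)
{
    if (dev->is_virtfn && dev->physfn)
        dev->archdata = dev->physfn->archdata;
}

/* Model of the faulting path: dma_supported() reads archdata.iommu. */
static int model_dma_supported(const struct pci_dev *dev, unsigned long mask)
{
    const struct iommu *iommu = dev->archdata.iommu;

    if (iommu == NULL)      /* the real code had no guard: it oopsed here */
        return -1;
    return (mask & iommu->dma_addr_mask) == iommu->dma_addr_mask;
}
```

Before setup, the VF's iommu pointer is NULL. After setup, it points at the PF's IOMMU.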

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit be81c7e3cc48d3ff8b26021be3fd49e997743cbc)

sparc64: enable "relaxed ordering" in IOMMU mappings

Enable relaxed ordering for memory writes in the IOMMU TSB entries
created by dma_4v_map_page() and dma_4v_map_sg() when the
DMA_ATTR_WEAK_ORDERING dma_attr is set. This requires the vPCI version 2.0 API.
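Here is a minimal sketch of how the attribute word for a TSB entry might be assembled. Read permission is always granted. Write permission is added unless the device only reads memory. Relaxed ordering is added only when weak ordering was requested and the vPCI 2.0 API is in use. The bit values, enum, and function names are illustrative assumptions, not the hypervisor ABI or the kernel's identifiers.

```c
/* Illustrative attribute bits; values are assumptions. */
#define MAP_ATTR_READ          0x1u
#define MAP_ATTR_WRITE         0x2u
#define MAP_ATTR_RELAXED_ORDER 0x4u

/* Hypothetical stand-ins for the DMA direction and attribute inputs. */
enum dma_dir { DMA_DIR_TO_DEVICE, DMA_DIR_FROM_DEVICE, DMA_DIR_BIDIRECTIONAL };
#define ATTR_WEAK_ORDERING 0x1u

/*
 * Build the attribute word for one TSB entry. Relaxed ordering is only
 * requested when the caller asked for weak ordering and the negotiated
 * vPCI API major version is at least 2.
 */
static unsigned int tsb_entry_attrs(enum dma_dir dir, unsigned int attrs,
                                    unsigned int vpci_major)
{
    unsigned int prot = MAP_ATTR_READ;

    if (dir != DMA_DIR_TO_DEVICE)
        prot |= MAP_ATTR_WRITE;          /* device may write to memory */
    if ((attrs & ATTR_WEAK_ORDERING) && vpci_major >= 2)
        prot |= MAP_ATTR_RELAXED_ORDER;
    return prot;
}
```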

Orabug: 19245907

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit d61b9f04493d2a0508c58b6f663c86d6441e1c42)

sparc64: Enable PCI IOMMU version 2 API

Enable version 2 of the PCI IOMMU API, which is needed for advanced
features such as PCI relaxed ordering and a DMA address space greater
than 2 GB per root complex.

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit f79d44785c80c5e626ee026e4e001f0c30958a82)

sunvnet: perf tracepoint invocations to trace LDC state machine

Use sunvnet perf trace macros to monitor LDC message exchange state.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5fa4282fdb6d30937abcf1b1a9d367aaf472178a)

sunvnet: Add support for perf LDC event tracing

Add perf event macros to support tracing and instrumentation
of the LDC state machine.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 61cf74d322a9d8ef172251e32c3008cf60964b70)

LDoms CPU Hotplug - fix interrupt redistribution.

Orabug: 22623753

- Disable cpu timer only for hot-remove and not for hot-add
- Update interrupt affinities before interrupt redistribution
- Default to simple round-robin interrupt redistribution for ldoms
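Simple round-robin redistribution gives each interrupt to the next online CPU in turn. Below is a minimal C sketch that assumes the caller supplies the list of online CPU ids. The function and parameter names are hypothetical. The real patch also updates interrupt affinities before redistributing, which this sketch does not model.

```c
#include <stddef.h>

/*
 * Assign IRQ i to online[i % n_online]: a plain round-robin spread.
 * target[i] receives the CPU chosen for IRQ i.
 * Returns 0, or -1 if there are no online CPUs.
 */
static int distribute_irqs_rr(const int *online, size_t n_online,
                              int *target, size_t n_irqs)
{
    if (n_online == 0)
        return -1;
    for (size_t i = 0; i < n_irqs; i++)
        target[i] = online[i % n_online];
    return 0;
}
```

With CPUs {0, 2, 5} online, seven IRQs land on 0, 2, 5, 0, 2, 5, 0.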

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
(cherry picked from commit 40110bd3bf1d2188719cca6f7a32df7d722f42be)
(cherry picked from commit 69910784aff4cad929ed7b15b744249da57ffc01)

LDoms CPU Hotplug - dynamic mondo queue allocation.

Orabug: 22620474

- Allocate mondo queues for present cpus only at boot time
- Allocate mondo queues dynamically and with proper alignment at hot-add
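This sketch assumes "proper alignment" for a sun4v mondo queue means the queue base must be aligned to the queue's total size. It uses C11 aligned_alloc() to model that requirement. The 64-byte entry size and the helper name are assumptions. In-kernel code would use the kernel's page allocator rather than aligned_alloc().

```c
#include <stdint.h>
#include <stdlib.h>

#define MONDO_ENTRY_SIZE 64  /* assumption: bytes per queue entry */

/*
 * Allocate a queue of `entries` entries whose base address is aligned to
 * the queue's total size. entries must be a power of two, so the size is
 * a power of two and a multiple of the alignment, as aligned_alloc needs.
 */
static void *alloc_mondo_queue(size_t entries)
{
    size_t size = entries * MONDO_ENTRY_SIZE;

    if (entries == 0 || (entries & (entries - 1)) != 0)
        return NULL;                    /* reject non-power-of-two sizes */
    return aligned_alloc(size, size);   /* alignment == size */
}
```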

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
(cherry picked from commit 41f763e66dcbdb72632cd0675e2990085e47a527)
(cherry picked from commit fb59288f6c4e17eca98a6a38512c0c05d55ac8e9)

sparc64: bypass iommu to use 64bit address space

This patch is internal only, not for upstream. It is a temporary
workaround based on UEK2 commit c1a12ed1d125
("sparc64: enable iommu bypass workaround for IB. This is temporary.")

The current sparc IOMMU design is based on the IOMMU V1 APIs, which can
map at most 2G/8K DMA addresses. As a result, kernel entities (e.g. i40e,
PSIF) that request more than 2G/8K DMA addresses do not work at all.
This patch adds a temporary workaround that remedies this issue by
bypassing the IOMMU.

When the 64-bit IOMMU implementation is complete, this workaround will
be reverted.
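Reading "2G/8K" as a 2 GB DMA window divided into 8 KB IOMMU pages, the V1 limit is 2^31 / 2^13 = 2^18 = 262144 translation entries. The C sketch below just does that arithmetic. The macro and function names are illustrative. It also ignores fragmentation of the window, so real allocations can fail before the limit is reached.

```c
#include <stdint.h>

/* Figures from the text: a 2 GB DMA window mapped with 8 KB IOMMU pages. */
#define DMA_WINDOW_BYTES (2ULL << 30)
#define IOMMU_PAGE_BYTES (8ULL << 10)

/* Translation entries available in the V1 window: 2^31 / 2^13 = 2^18. */
static uint64_t iommu_v1_entries(void)
{
    return DMA_WINDOW_BYTES / IOMMU_PAGE_BYTES;
}

/* Whether a request for `pages` 8 KB DMA pages can fit at all. */
static int fits_v1_window(uint64_t pages)
{
    return pages <= iommu_v1_entries();
}
```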

Orabug: 21149316
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
(cherry picked from commit d751c5e1e6575b1dc119383045ba488e0d30de4d)
(cherry picked from commit 2ecc8426003036609fc447c3cf2dcf54139770cf)

vmcore: quiet zero PT_NOTE warning

When creating a crashdump, this warning shows up on the console
for every offline cpu:

Warning: Zero PT_NOTE entries found

For an ldom, every possible inactive cpu shows up as offline, so let's
just print the warning once.
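To print a message only the first time it occurs, a static flag is enough. Inside the kernel, the pr_warn_once() helper does this. Below is a user-space sketch of the idea. The function name is hypothetical, and this is not the actual vmcore change.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Emit the zero-PT_NOTE warning only on the first call; later offline
 * CPUs are skipped silently. Returns true if the warning was printed.
 */
static bool warn_zero_pt_note(void)
{
    static bool warned;   /* zero-initialized: not yet printed */

    if (warned)
        return false;
    warned = true;
    fprintf(stderr, "Warning: Zero PT_NOTE entries found\n");
    return true;
}
```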

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc64: call crash_kexec() directly from die_if_kernel()

A direct call to crash_kexec() here allows the crashing register state
to be saved to the PT_NOTE. When called from panic(), a new register
state is created which is less useful.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc: After kexec, ldc_bind needs to reset rx_head

Orabug: 21627005

Occasionally, the crash kernel will fail to configure a virtual disk
because the hypervisor leaves an old request in the rx queue even after
it is reconfigured in ldc_bind(). Fix this with a call to ldc_rx_reset().
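An LDC receive queue can be thought of as a ring with a head (the next entry to consume) and a tail (the next entry the producer fills). Anything between the two is pending. The minimal model below assumes that ldc_rx_reset() discards stale entries by moving the head up to the tail. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Simplified receive ring: head = next to consume, tail = next to fill. */
struct rx_queue {
    uint32_t head;
    uint32_t tail;
};

/* Something is pending whenever head and tail differ. */
static int rx_pending(const struct rx_queue *q)
{
    return q->head != q->tail;
}

/* Model of a reset: drop any leftover entries by advancing head to tail. */
static void rx_reset(struct rx_queue *q)
{
    q->head = q->tail;
}
```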

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc: add FORCE_MAX_ZONEORDER

Allow for a memory allocation large enough to load a kernel for
kexec in contiguous memory.
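FORCE_MAX_ZONEORDER sets the buddy allocator's MAX_ORDER, which caps the largest physically contiguous allocation. This sketch assumes sparc64's 8 KB base page and the MAX_ORDER meaning of kernels from this era (orders 0 through MAX_ORDER - 1 are available). Under those assumptions, the cap is PAGE_SIZE << (MAX_ORDER - 1). The message does not say which value this patch selects, so the checks use 11 (the common default) and 13 purely for illustration.

```c
#include <stdint.h>

#define SPARC64_PAGE_SIZE 8192ULL   /* 8 KB base page */

/* Largest contiguous block when orders 0..max_order-1 exist. */
static uint64_t max_contig_bytes(unsigned int max_order)
{
    return max_order ? SPARC64_PAGE_SIZE << (max_order - 1) : 0;
}
```

With MAX_ORDER 11 the cap is 8 MB. Raising it to 13 allows 32 MB contiguous blocks.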

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

reserve memory for elfcorehdr

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc64: crash kernel may only use reserved memory

In order to preserve the memory state of the original kernel, the
crash kernel may only use the memory reserved for it.

kexec-tools passes the reserved memory through the HdrS structure.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc: add crash dump support

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc64: handle booting kernel from shim

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: capture obp information during boot

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: restore prom_cif_stack

Commit ef3e035c stopped using the firmware stack and thus stopped saving
its location in p1275buf. However, kexec wants to use the firmware
stack when launching the new kernel.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc64: define CONFIG_KEXEC

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: new files: kexec_shim.S and machine_kexec.c

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

kexec: Make kimage_alloc_pages() available to arch code

The sparc64 code wants to call this directly.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

sparc64: add and call reserve_crashkernel

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

kexec: Add kimage_arch_load_normal_segment to generic code

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: kexec support for head_64.S

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

Add kexec parameters to HdrS

kexec needs to pass some values to the to-be-executed kernel. Like
SILO, it finds the HdrS structure and modifies it. Increment the
version to 0x0302.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

kexec: Define KEXEC_ARCH_SPARC64

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: add arch/sparc/include/asm/kexec.h

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: define KEXEC_BASE

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: add sparc64_elf_core_copy_regs

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc: add sun4v_mmu_unmap_perm_addr

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc: add pci_sun4v_msiq_tear_down

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: chip handler IRQ cookie checking

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: check for stopped cpu in smp_boot_one_cpu

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: call set_irq_reqs around generic_smp_call_function_interrupt call

original patch by Bob Picco

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>

sparc64: Fix perf performance counter overflow calculation

If sparc_perf_event_update() is called between performance counter
overflow interrupts, everything is fine and the total event count
calculation is correct. If, however, sparc_perf_event_update() is
only called when the performance counter overflows, the counter wrap
is not taken into consideration. This leaves us with an incorrect
value for the total event count.

This patch fixes this issue by taking the counter overflow situation
into consideration.
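A common way to account for a single wrap of an N-bit hardware counter is to do the subtraction in the top N bits of an unsigned 64-bit word and then shift back down. The sketch assumes 32-bit counters. It illustrates the general technique, not necessarily the exact change made in sparc_perf_event_update(). It also cannot detect more than one wrap between readings.

```c
#include <stdint.h>

#define PIC_BITS 32  /* assumption: 32-bit performance counter */

/*
 * Events counted between two raw readings of a counter that may have
 * wrapped once. Shifting both values to the top of a 64-bit word makes
 * the subtraction wrap at PIC_BITS; the unsigned right shift then
 * brings the true delta back down without sign extension.
 */
static uint64_t counter_delta(uint64_t prev_raw, uint64_t new_raw)
{
    const unsigned int shift = 64 - PIC_BITS;
    uint64_t delta = (new_raw << shift) - (prev_raw << shift);

    return delta >> shift;
}
```

For example, going from 0xfffffff0 to 0x10 across a wrap gives 0x20 events rather than a huge or negative number.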

Orabug: 22607658

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit 6c89361408f964ad2c2c29200987aece3a7c222d)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix for perf event counts sometimes reported as negative numbers

Use an unsigned type to prevent sign extension in the calculation
of the difference between the previous and current counts
obtained from the performance instrumentation counters.

Orabug: 22607658

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit b0fb8b78a2cc452512296ce5bec1fa927ebf867e)
(cherry picked from commit da8cc212a978a1f54cadadeaabed46d9d5f839b3)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix numa node distance initialization

Orabug: 22495713

Currently, the NUMA node distance matrix is initialized only
when a machine descriptor (MD) exists. However, sun4u
machines (e.g. Sun Blade 2500) do not have an MD, and thus
their distance values were left uninitialized. The initialization
is now moved so that it happens on both sun4u and sun4v.
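Here is a sketch of the kind of default initialization that can run on every machine. The diagonal gets the kernel's conventional LOCAL_DISTANCE (10), and every other entry gets REMOTE_DISTANCE (20). Platforms with an MD can then overwrite entries with their own values. The table size and names are illustrative, not the sparc64 code.

```c
#define MAX_NODES       8    /* illustrative */
#define LOCAL_DISTANCE  10   /* conventional local distance */
#define REMOTE_DISTANCE 20   /* conventional default remote distance */

static int numa_distance[MAX_NODES][MAX_NODES];

/* Defaults for all platforms: local on the diagonal, remote elsewhere. */
static void numa_init_distances(void)
{
    for (int i = 0; i < MAX_NODES; i++)
        for (int j = 0; j < MAX_NODES; j++)
            numa_distance[i][j] = (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}
```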

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 36beca6571c941b28b0798667608239731f9bc3a)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: fix incorrect sign extension in sys_sparc64_personality

The value returned by sys_personality has type "long int".
It is saved to a variable of type "int", which is not a problem
yet because the type of task_struct->personality is "unsigned int".
The problem is the sign extension from "int" to "long int"
that happens on return from sys_sparc64_personality.

For example, a userspace call personality((unsigned) -EINVAL) will
result in any subsequent personality call, including the absolutely
harmless read-only personality(0xffffffff) call, failing with
errno set to EINVAL.
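The arithmetic behind this example can be checked directly. On Linux, EINVAL is 22, so (unsigned) -EINVAL is 0xffffffea. Passing that bit pattern through an int and widening it to 64 bits gives -22. Because the return value is in the range the syscall return path treats as -errno (-4095 to -1), the call appears to fail with EINVAL. Keeping the value 64-bit wide preserves 4294967274. In the sketch, the sign extension is computed arithmetically to avoid implementation-defined conversions. The helper names are hypothetical.

```c
#include <stdint.h>

/* What widening a 32-bit value through a signed int produces. */
static int64_t widened_via_int(uint32_t pers)
{
    return (pers & 0x80000000u) ? (int64_t)pers - ((int64_t)1 << 32)
                                : (int64_t)pers;
}

/* Keeping the value in a 64-bit type preserves it unchanged. */
static int64_t widened_via_long(uint32_t pers)
{
    return (int64_t)pers;
}

/* Returns in [-4095, -1] are reported to userspace as -errno. */
static int looks_like_error(int64_t ret)
{
    return ret >= -4095 && ret <= -1;
}
```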

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 525fd5a94e1be0776fa652df5c687697db508c91)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Add ADI capability to cpu capabilities

Add ADI (Application Data Integrity) capability to cpu capabilities list.
ADI capability allows virtual addresses to be encoded with a tag in
bits 63-60. This tag serves as an access control key for the regions
of virtual addresses with ADI enabled and a key set on them. The
hypervisor encodes this capability as "adp" in the "hwcap-list"
property of the machine description.
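Since the tag occupies bits 63-60, it can be set and extracted with plain masking. Below is a minimal C sketch. The macro and function names are illustrative, not kernel identifiers. It only manipulates the address bits and performs no hardware checking.

```c
#include <stdint.h>

#define ADI_TAG_SHIFT 60                          /* tag lives in bits 63-60 */
#define ADI_TAG_MASK  (0xFULL << ADI_TAG_SHIFT)

/* Return va with its 4-bit ADI tag replaced by `tag`. */
static uint64_t adi_set_tag(uint64_t va, unsigned int tag)
{
    return (va & ~ADI_TAG_MASK) | ((uint64_t)(tag & 0xF) << ADI_TAG_SHIFT);
}

/* Extract the ADI tag from a tagged virtual address. */
static unsigned int adi_get_tag(uint64_t va)
{
    return (unsigned int)((va & ADI_TAG_MASK) >> ADI_TAG_SHIFT);
}
```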

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 82924e542f20e645bc7de86e2889fe3fb0858566)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sunvnet: Initialize network_header and transport_header in vnet_rx_one()

vnet_fullcsum() accesses ip_hdr() and the transport header to compute
the checksum for IPv4 packets, so these need to be initialized in the
skb created in vnet_rx_one().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Add support for polling to the sunhv serial driver.

Orabug: 21793591

Signed-off-by: Greg Onufer <greg.onufer@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>

Add sun4v_wdt watchdog driver

This driver adds SPARC hypervisor watchdog support. The timeout is set in
milliseconds, since that is the supported granularity, and the driver
honors the settings of both the watchdog-resolution and
watchdog-max-timeout MD properties.

Note that most watchdog drivers use a timeout in seconds. This driver
takes timeout_ms as a module parameter, with the time given in ms.

In this driver, the default is 60000 ms, or 60 seconds.

This driver also modifies hvcalls.S and changes sun4v_mach0_set_watchdog
so that NULL can be passed as the second parameter. This removes the
need to pass &time_remaining, which is not useful.

Signed-off-by: Wim Coekaerts <wim.coekaerts@oracle.com>

sparc64: Make memory allocations ATOMIC to fix lockdep warnings

Orabug: 22392548

Memory allocations are done with non-atomic GFP flags while holding a
spinlock taken with spin_lock_irqsave(). This was caught by lockdep as
shown below.

Make these allocations ATOMIC, as these functions are always
called while holding the lock.

------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2649 lockdep_trace_alloc+0xc0/0xe8()
Modules linked in:
Call Trace:
[000000000047a304] warn_slowpath_common+0x4c/0x6c
[000000000047a340] warn_slowpath_null+0x1c/0x2c
[00000000004b0c70] lockdep_trace_alloc+0xc0/0xe8
[0000000000554b64] kmem_cache_alloc_trace+0x18/0x1a8
[00000000004540c8] ds_add_service_provider+0x58/0x120
[0000000000454204] ds_add_builtin_services+0x74/0xac
[000000000045450c] ds_probe+0x2d0/0x448
[0000000000450d54] vio_device_probe+0xb0/0xd4
[0000000000736e08] driver_probe_device+0x13c/0x234
[0000000000736f60] __driver_attach+0x60/0x8c
[0000000000736440] bus_for_each_dev+0x4c/0x9c
[0000000000736b18] driver_attach+0x1c/0x30
[0000000000735c58] bus_add_driver+0xcc/0x260
[00000000007373d4] driver_register+0xc0/0x170
[0000000000451414] vio_register_driver+0x18/0x40
[0000000000cd2db4] ds_init+0x140/0x170
---[ end trace 139ce121c98e96c9 ]---

Also fixed the deadlock scenario shown by lockdep.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.39-uek2-bm #35
-------------------------------------------------------
ldoms-ds/526 is trying to acquire lock:
(&(&ds->ds_lock)->rlock){-.-...}, at:
[<0000000000455000>] ds_callout_thread+0x148/0x594

but task is already holding lock:
(ds_data_lock){......}, at:
[<0000000000454fe4>] ds_callout_thread+0x12c/0x594

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (ds_data_lock){......}:
       [<00000000008fc1f0>] _raw_spin_lock_irqsave+0x38/0x78
       [<0000000000454538>] ds_probe+0x2fc/0x448
       [<0000000000450d54>] vio_device_probe+0xb0/0xd4
       [<0000000000736e08>] driver_probe_device+0x13c/0x234
       [<0000000000736f60>] __driver_attach+0x60/0x8c
       [<0000000000736440>] bus_for_each_dev+0x4c/0x9c
       [<0000000000736b18>] driver_attach+0x1c/0x30
       [<0000000000735c58>] bus_add_driver+0xcc/0x260
       [<00000000007373d4>] driver_register+0xc0/0x170
       [<0000000000451414>] vio_register_driver+0x18/0x40
       [<0000000000cd2db4>] ds_init+0x140/0x170
       [<0000000000426e30>] do_one_initcall+0x70/0x150
       [<0000000000cca218>] kernel_init+0x100/0x190
       [<000000000042b82c>] kernel_thread+0x38/0x50
       [<00000000008e8474>] rest_init+0x20/0xc8

-> #0 (&(&ds->ds_lock)->rlock){-.-...}:
       [<00000000004b4ac8>] lock_acquire+0xa4/0xbc
       [<00000000008fc1f0>] _raw_spin_lock_irqsave+0x38/0x78
       [<0000000000455000>] ds_callout_thread+0x148/0x594
       [<000000000049ce5c>] kthread+0x64/0x78
       [<000000000042b82c>] kernel_thread+0x38/0x50
       [<000000000049cf84>] kthreadd+0x114/0x168

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(ds_data_lock);
                               lock(&(&ds->ds_lock)->rlock);
                               lock(ds_data_lock);
  lock(&(&ds->ds_lock)->rlock);

*** DEADLOCK ***

1 lock held by ldoms-ds/526:
#0:  (ds_data_lock){......}, at:
[<0000000000454fe4>] ds_callout_thread+0x12c/0x594

stack backtrace:
Call Trace:
[00000000004b1f54] print_circular_bug+0x2b4/0x2c4
[00000000004b3d4c] __lock_acquire+0x1428/0x1c08
[00000000004b4ac8] lock_acquire+0xa4/0xbc
[00000000008fc1f0] _raw_spin_lock_irqsave+0x38/0x78
[0000000000455000] ds_callout_thread+0x148/0x594
[000000000049ce5c] kthread+0x64/0x78
[000000000042b82c] kernel_thread+0x38/0x50
[000000000049cf84] kthreadd+0x114/0x168
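This report is a classic ABBA inversion. The probe path appears to take ds_lock and then ds_data_lock, while the callout thread takes them in the opposite order. The usual cure is a single global acquisition order. The following is a toy, single-threaded C model of that rule, not lockdep itself. The rank values (ds_lock before ds_data_lock) are an assumption for illustration, since the message does not say which order the fix adopted.

```c
#include <stdbool.h>

/* Fixed ranks: locks must be taken in increasing rank (assumed order). */
enum lock_rank { RANK_DS_LOCK = 1, RANK_DS_DATA_LOCK = 2 };

#define MAX_HELD 8

struct held_locks {
    int ranks[MAX_HELD];   /* ranks currently held, strictly increasing */
    int depth;
};

/* Record an acquisition; returns false if it would invert the order. */
static bool acquire(struct held_locks *h, int rank)
{
    if (h->depth == MAX_HELD)
        return false;
    if (h->depth > 0 && rank <= h->ranks[h->depth - 1])
        return false;      /* would create an ABBA-style dependency */
    h->ranks[h->depth++] = rank;
    return true;
}

static void release_all(struct held_locks *h)
{
    h->depth = 0;
}
```

In this model, the callout thread's sequence (ds_data_lock, then ds_lock) is rejected. The probe path's sequence is accepted.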

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Aaron Young <aaron.young@oracle.com>
(cherry picked from commit a065c92b422de2d2c21f81bc6c189bac07f57e8b)

sunvnet: hack to work around Solaris VIO bug

Orabug: 21895216

Port the workaround used for Bug 20455702:

This patch works around a problem where Solaris drops packets bound for
physical NICs (i.e., off host) that are using LSO and do not have the VIO v7
descriptor flags VNET_PKT_HASH, VNET_PKT_HCK_IPV4_HDRCKSUM,
VNET_PKT_HCK_FULLCKSUM set along with VNET_PKT_IPV4_LSO.

This patch can't go upstream because it doesn't actually support output
hashing (all packets will hash to '0'). The full and IPv4 header checksum
computations caused by the flags are unnecessary for Linux, but only affect
destinations through the vswitch.
(cherry picked from commit 3839694e54df457997025775894f954ea3185aff)

sparc64: fix FP corruption in user copy functions

Short story: Exception handlers used by some copy_to_user() and
copy_from_user() functions do not diligently clean up floating point
register usage, and this can result in a user process seeing invalid
values in floating point registers. This sometimes makes the process
fail.

Long story: Several cpu-specific (NG4, NG2, U1, U3) memcpy functions
use floating point registers and VIS alignaddr/faligndata to
accelerate data copying when source and dest addresses don't align
well. Linux uses a lazy scheme for saving floating point registers; it
is not done upon entering the kernel since it's a very expensive
operation. Rather, it is done only when needed. If the kernel ends up
not using FP regs during the course of some trap or system call, then
it can return to user space without saving or restoring them.

The various memcpy functions begin their FP code with VISEntry (or a
variation thereof), which saves the FP regs. They conclude their FP
code with VISExit (or a variation) which essentially marks the FP regs
"clean", i.e., they contain no unsaved values. fprs.FPRS_FEF is turned
off so that a lazy restore will be triggered when/if the user process
accesses floating point regs again.

The bug is that the user copy variants of memcpy, copy_from_user() and
copy_to_user(), employ an exception handling mechanism to detect faults
when accessing user space addresses, and when this handler is invoked,
an immediate return from the function is forced, and VISExit is not
executed, thus leaving the fprs register in an indeterminate state,
but often with fprs.FPRS_FEF set and one or more dirty bits. This
results in a return to user space with invalid values in the FP regs,
and since fprs.FPRS_FEF is on, no lazy restore occurs.

This bug affects copy_to_user() and copy_from_user() for NG4, NG2,
U3, and U1. All are fixed by using a new exception handler for those
loads and stores that are done during the time between VISEntry and
VISExit.

n.b. In NG4memcpy, the problematic code can be triggered by a copy
size greater than 128 bytes and an unaligned source address. This bug
is known to be the cause of random user process memory corruptions
while perf is running with the callgraph option (i.e., perf record -g).
This occurs because perf uses copy_from_user() to read user stacks,
and may fault when it follows a stack frame pointer off to an
invalid page. Validation checks on the stack address just obscure
the underlying problem.

Orabug: 22506897

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a7c5724b5c17775ca8ea2fd9906d8a7e37337cce)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Don't set %pil in rtrap_nmi too early

Commit 28a1f53 delays setting %pil to avoid potential
hardirq stack overflow in the common rtrap_irq path.
Setting %pil also needs to be delayed in the rtrap_nmi
path for the same reason.

Orabug: 22322473

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ca04a4ce0d5131471c5a1fac76899dc2d9d3f36)