www.infradead.org Git - users/jedix/linux-maple.git/log

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/rpm-build:
  perf: build TUI by default by pulling in slang and linking it statically
  configs: ol6: set with_headers, with_dtrace defaults to 0
  smartpqi: enable driver in uek config files
  Add the CONFIG_DEBUG_SET_MODULE_RONX option to OL6

Merge branch topic/uek-4.1/pmem of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/pmem: (222 commits)
  libnvdimm, dax: record the specified alignment of a dax-device instance
  libnvdimm, dax: reserve space to store labels for device-dax
  libnvdimm, dax: introduce device-dax infrastructure
  libnvdimm: cleanup nvdimm_namespace_common_probe(), kill 'host'
  mm, dax: fix livelock, allow dax pmd mappings to become writeable
  dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()
  tools/testing/libnvdimm: cleanup mock resource lookup
  block: protect rw_page against device teardown
  fix kABI breakage caused by "block: generic request_queue reference counting"
  block: generic request_queue reference counting
  fix kABI breakage caused by "block: use an atomic_t for mq_freeze_depth"
  block: use an atomic_t for mq_freeze_depth
  dax: guarantee page aligned results from bdev_direct_access()
  dax: increase granularity of dax_clear_blocks() operations
  pmem, dax: clean up clear_pmem()
  xfs: fix recursive splice read locking with DAX
  xfs: per-filesystem stats counter implementation
  xfs: per-filesystem stats in sysfs
  xfs: pass xfsstats structures to handlers and macros
  xfs: consolidate sysfs ops
  ...

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/ofed:
  RDS: don't commit to queue till transport connection is up
  RDS: restrict socket connection reset to CAP_NET_ADMIN
  xsigo: Fix crash in accessing xve proc l2 entries
  xsigo: Fix race in freeing aged Forwarding table entry
  xsigo: Schedule while uninterruptible

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/drivers:
  NVMe: reverse IO direction for VUC command code F7
  Call i40e_client_get_params only after the instance is checked
  scsi: smartpqi: raid bypass lba calculation fix
  scsi: smartpqi: bump driver version
  scsi: smartpqi: add smartpqi.txt
  scsi: smartpqi: update Kconfig
  scsi: smartpqi: remove timeout for cache flush operations
  scsi: smartpqi: scsi queuecommand cleanup
  scsi: smartpqi: minor tweaks to update time support
  scsi: smartpqi: minor function reformating
  scsi: smartpqi: correct event acknowledgment timeout issue
  scsi: smartpqi: correct controller offline issue
  scsi: smartpqi: add kdump support
  scsi: smartpqi: enhance reset logic
  scsi: smartpqi: enhance drive offline informational message
  scsi: smartpqi: simplify spanning
  scsi: smartpqi: change tmf macro names
  scsi: smartpqi: change aio sg processing
  aacraid: remove wildcard for series 9 controllers
  smartpqi: initial commit of Microsemi smartpqi driver

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks:
  xfs: validate metadata LSNs against log on v5 superblocks
  IB/ipoib: move back IB LL address into the hard header
  net: preserve IP control block during GSO segmentation
  xfs: fix broken multi-fsb buffer logging
  xfs: Split default quota limits by quota type

Merge branch topic/uek-4.1/stable-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/stable-cherry-picks:
  crypto: skcipher - Fix blkcipher walk OOM crash
  crypto: cryptd - initialize child shash_desc on import
  crypto: scatterwalk - Fix test in scatterwalk_done
  crypto: gcm - Filter out async ghash if necessary
  PKCS#7: pkcs7_validate_trust(): initialize the _trusted output argument
  crypto: public_key: select CRYPTO_AKCIPHER
  crypto: hash - Fix page length clamping in hash walk

perf: build TUI by default by pulling in slang and linking it statically

Orabug: 25161079

xfs: validate metadata LSNs against log on v5 superblocks

From a45086e27dfa21a4b39134f7505c8f60a3ecdec4 Mon Sep 17 00:00:00 2001

Since the onset of v5 superblocks, the LSN of the last modification has
been included in a variety of on-disk data structures. This LSN is used
to provide log recovery ordering guarantees (e.g., to ensure an older
log recovery item is not replayed over a newer target data structure).

While this works correctly from the point a filesystem is formatted and
mounted, userspace tools have some problematic behaviors that defeat
this mechanism. For example, xfs_repair historically zeroes out the log
unconditionally (regardless of whether corruption is detected). If this
occurs, the LSN of the filesystem is reset and the log is now in a
problematic state with respect to on-disk metadata structures that might
have a larger LSN. Until either the log catches up to the highest
previously used metadata LSN or each affected data structure is modified
and written out without incident (which resets the metadata LSN), log
recovery is susceptible to filesystem corruption.

This problem is ultimately addressed and repaired in the associated
userspace tools. The kernel is still responsible to detect the problem
and notify the user that something is wrong. Check the superblock LSN at
mount time and fail the mount if it is invalid. From that point on,
trigger verifier failure on any metadata I/O where an invalid LSN is
detected. This results in a filesystem shutdown and guarantees that we
do not log metadata changes with invalid LSNs on disk. Since this is a
known issue with a known recovery path, present a warning to instruct
the user how to recover.

Orabug: 25062171

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

IB/ipoib: move back IB LL address into the hard header

Orabug: 24469379

[ Backport of upstream commit fc791b633515 ]

After the commit 9207f9d45b0a ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroy the IPoIB address information cached there,
causing a severe performance regression, as better described here:

http://marc.info/?l=linux-kernel&m=146787279825501&w=2

This change moves the data cached by the IPoIB driver from the
skb control lock into the IPoIB hard header, as done before
the commit 936d7de3d736 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
In order to avoid GRO issue, on packet reception, the IPoIB driver
stash into the skb a dummy pseudo header, so that the received
packets have actually a hard header matching the declared length.
To avoid changing the connected mode maximum mtu, the allocated
head buffer size is increased by the pseudo header length.

After this commit, IPoIB performances are back to pre-regression
value.

v2 -> v3: rebased
v1 -> v2: avoid changing the max mtu, increasing the head buf size

Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

net: preserve IP control block during GSO segmentation

Orabug: 24469379

[ Upstream commit 9207f9d45b0ad071baa128e846d7e7ed85016df3 ]

Skb_gso_segment() uses skb control block during segmentation.
This patch adds 32-bytes room for previous control block which
will be copied into all resulting segments.

This patch fixes kernel crash during fragmenting forwarded packets.
Fragmentation requires valid IP CB in skb for clearing ip options.
Also patch removes custom save/restore in ovs code, now it's redundant.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit abefd1b4087b9b5e83e7b4e7689f8b8e3cb2899c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

RDS: don't commit to queue till transport connection is up

A rouge application can flood the send queue by targeting a dead
or non-existing node. Don't commit any messages to the queue
till the legitimate connection to the peer is established.
Let application retry so that only legit connections can
get on to the send queue.

Orabug: 25393611
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: restrict socket connection reset to CAP_NET_ADMIN

Normal users not suppose to need/have access to the transport
connection reset.

Orabug:25393611
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

configs: ol6: set with_headers, with_dtrace defaults to 0

This makes OL6 config match OL7 config for these default %defines. Later
directives conditionally set with_headers if sparc64, and with_dtrace if
sparc64|x86_64, overriding the defaults.

Orabug: 25257401
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>

NVMe: reverse IO direction for VUC command code F7

Orabug: 25258071

Samsung uses D2H command with Vendor Uniq Command (VUC) code F7
(the 0th bit of which is 1) for retrieving memory dump. In UEK4,
Bit 0 of the D2H command code has to be 0. Because of this voilation,
the nvmecli is unable to do crash and memory dumps in UEK4.

As the Samsung firmware can only understand VUC command code F7,
the IO direction is reversed for this vendor command code to
retrieve memory and crash dump.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

xfs: fix broken multi-fsb buffer logging

Orabug: 24400444
Upstream-commit: a3916e528b917851a4d2379e2fd2579ad5f2b5a7

Multi-block buffers are logged based on buffer offset in
xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
the buffer and marks the associated range to be logged in the
xfs_buf_log_format bitmap for that mapping. This code is broken,
however, in that it marks the actual buffer offsets of the associated
range in each bitmap rather than shifting to the byte range for that
particular mapping.

For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte
of the second mapping in the buffer. This means byte 0 of the second log
format bitmap should be tagged as dirty. Instead, the current code marks
byte offset 4096 of the second log format bitmap, which is invalid and
potentially out of range of the mapping.

As a result of this, the log item format code invoked at transaction
commit time is not be able to correctly identify what parts of the
buffer to copy into log vectors. This can lead to NULL log vector
pointer dereferences in CIL push context if the item format code was not
able to locate any dirty ranges at all. This crash has been reproduced
on a 4k FSB filesystem using 16k directory blocks where an unlink
operation happened not to log anything in the first block of the
mapping. The logged offsets were all over 4k, marked as such in the
subsequent log format mappings, and thus left the transaction with an
xfs_log_item that is marked DIRTY but without any logged regions.

Further, even when the logged regions are marked correctly in the buffer
log format bitmaps, the format code doesn't copy the correct ranges of
the buffer into the log. This means that any logged region beyond the
first block of a multi-block buffer is subject to corruption after a
crash and log recovery sequence. This is due to a failure to convert the
mapping bm_len field from basic blocks to bytes in the buffer offset
tracking code in xfs_buf_item_format().

Update xfs_buf_item_log() to convert buffer offsets to segment relative
offsets when logging multi-block buffers. This ensures that the modified
regions of a buffer are logged correctly and avoids the aforementioned
crash. Also update xfs_buf_item_format() to correctly track the source
offset into the buffer for the log vector formatting code. This ensures
that the correct data is copied into the log.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: Split default quota limits by quota type

Orabug: 24399524
Upstream-commit: be6079461abf796e29d02b450a16908f4bf58f6c

Default quotas are globally set due historical reasons. IRIX only
supported user and project quotas, and default quota was only
applied to user quotas.

In Linux, when a default quota is set, all different quota types
inherits the same default value.

An user with a quota limit larger than the default quota value, will
still be limited to the default value because the group quotas also
inherits the default quotas. Unless the group which the user belongs
to have a custom quota limit set.

This patch aims to split the default quota value by quota type.
Allowing each quota type having different default values.

Default time limits are still set globally. XFS does not set a
per-user/group timer, but a single global timer. For changing this
behavior, some changes should be made in user-space tools another
bugs being fixed.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xsigo: Fix crash in accessing xve proc l2 entries

Orabug: 25165085

When accessing l2tables using /proc/driver/xve/devices/
system panics if path is created and there is no associated
Transmit QP. Added a check for validating tx structure before
using it.

Reported-by: Jie zhu <jie.x.zhu@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

Call i40e_client_get_params only after the instance is checked

We can avoid the minor bit of work by calling check params after we
check for the client instance, since we're about to return early in
cases where we do not have a client. But, more importantly, this
change prevents a panic from a NULL pointer when attempting to
get the client params.

Orabug: 25159384

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

smartpqi: enable driver in uek config files

Orabug: 25144431

Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: raid bypass lba calculation fix

Orabug: 25144431

In the ioaccel path, the calculation of the starting LBA for
READ(6)/WRITE(6) SCSI commands does not take into account the most
significant 5 bits of the LBA: it only uses the least significant 16
bits of the starting LBA.

Reported-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e018ef572ba4ff17caa9e82d5e1b5cea0d76f903)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: bump driver version

Orabug: 25144431

[mkp: fixed typo]

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 699bed758b1313f97a5ac78848090e8357d12ab1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add smartpqi.txt

Orabug: 25144431

added Documentation/scsi/smartpqi.txt

[mkp: applied by hand]

Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 425b490b2aa745740ea3618e1cdcc2bc37c0d996)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update Kconfig

Orabug: 25144431

The aacraid driver will not managage Microsemi smartpqi controllers, but
will still manage older aacraid devices.

Updated help section.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e8a31ebae1669f05254430d2fced99d77c63fc10)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: remove timeout for cache flush operations

Orabug: 25144431

Some cache flush operations can take longer than the timeout value. Best
to not impose a time limit to handle all cases.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d48f8fad1e435eff26c29e8e109c1a50c441e533)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: scsi queuecommand cleanup

Orabug: 25144431

minor cleanup of scsi queue command function

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7d81d2b8714ec72462a99875acbf2f976402f3f1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: minor tweaks to update time support

Orabug: 25144431

minor tweaks to update time support

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4fbebf1a779d9f6890ddc1df90c497b161dfb34c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: minor function reformating

Orabug: 25144431

reformatted pqi_num_elements_free() to match the rest of the driver

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit df7a1fcfc4761e658b60739e2ff4cd148afcae89)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct event acknowledgment timeout issue

Orabug: 25144431

the driver no longer waits for the firmware to consume
the event ack IU.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5e6429df9c8b3ab9e0a8d18af5248692ebe41871)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct controller offline issue

Orabug: 25144431

Fixes: 6c223761e 'smartpqi: initial commit of Microsemi smartpqi driver'
Fixed a bug where the driver would not free all of the
controller resources if the controller ever went offline.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e57a1f9b2fa4326ec289f1d03c658184ed6addb8)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add kdump support

Orabug: 25144431

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ff6abb7383d2eec6c8c988ff661352e66f245686)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: enhance reset logic

Orabug: 25144431

Eliminated timeout from LUN reset logic.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 14bb215d09de98a8e95fa2bb1b8f35b79672c5df)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: enhance drive offline informational message

Orabug: 25144431

Made a couple of error messages more verbose.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e58081a714275d1490e470bdaf1b5dc23043cf2a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: simplify spanning

Orabug: 25144431

Removed the workaround for the transition to spanning.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 77668f412dbcd6a9dd04c92f0b170c5b5182a5fb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: change tmf macro names

Orabug: 25144431

small change to make code look cleaner

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b17f048658c9b1bc8ac1d9a54b223f740c70f8fd)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: change aio sg processing

Orabug: 25144431

Take advantage of controller improvements.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a60eec0251fe1bfc0cd549c073591e6657761158)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

aacraid: remove wildcard for series 9 controllers

Orabug: 25144431

Depends on smartpqi driver adoption

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>

smartpqi: initial commit of Microsemi smartpqi driver

Orabug: 25144431

This initial commit contains Microsemi's smartpqi module.

[mkp: Minor tweaks to apply to 4.9/scsi-queue]

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6c223761eb5482dca2bd981d0a800c4aba3c9009)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xsigo: Fix race in freeing aged Forwarding table entry

Orabug: 25129729

Fixed a race in case of freeing an Aged Forwarding entry
When there is no path associated with a Forwarding entry and
if it ages out uVNIC driver starts aging it ,if a new path with
different Forwarding entry (different mac but same remote guid
and qpn) creates a path present code ends up deleting
that newly created path.

Added a variable del_in_progress to detect and avoid race if
forwarding table is accessed during deletion.

Added protection in one case, while accessing forwarding table entry.
Cleaned up unwanted messages

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

xsigo: Schedule while uninterruptible

Orabug: 25097469

Fix the case where uVNIC driver calls msleep while
holding spinlock in case path creation failure.

Reported-by: Haakon bugge <haakon.bugge@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

Add the CONFIG_DEBUG_SET_MODULE_RONX option to OL6

Set the CONFIG_DEBUG_SET_MODULE_RONX option in the OL6 configuration,
this changes the page permissions of a loadable module, code and RO
data are set to RX, leaving off the write permission for the page,
this causes modify access to be trapped and provides enhanced
security.

Orabug: 24910950
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/rpm-build:
Enable config options for IEEE 802.1AE driver

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks: (113 commits)
  packet: fix race condition in packet_set_ring
  net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
  ALSA: pcm : Call kill_fasync() in stream lock
  netlink: Fix dump skb leak/double free
  rcu: Fix soft lockup for rcu_nocb_kthread
  mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
  sctp: validate chunk len before actually using it
  kvm: raise KVM_SOFT_MAX_VCPUS to support more vcpus
  netfilter: nfnetlink: fix splat due to incorrect socket memory accounting in skbuff clones
  netfilter: nfnetlink: avoid recurrent netns lookups in call_batch
  netfilter: nf_tables: fix wrong destroy anonymous sets if binding fails
  netfilter: nf_tables: use reverse traversal commit_list in nf_tables_abort
  ixgbevf: Handle previously-freed msix_entries
  PCI: pciehp: Prioritize data-link event over presence detect
  PCI: pciehp: Leave power indicator on when enabling already-enabled slot
  net: Fix use after free in the recvmmsg exit path
  signals: avoid unnecessary taking of sighand->siglock
  audit: fix a double fetch in audit_log_single_execve_arg()
  KEYS: Fix short sprintf buffer in /proc/keys show function
  tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT
  ...

packet: fix race condition in packet_set_ring

When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.

This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.

The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.

Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 84ac7260236a49c79eede91617700174c2c19b0c)

Orabug: 25209594
CVE: CVE-2016-8655
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net: avoid signed overflows for SO_{SND|RCV}BUFFORCE

CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...

Note that before commit 82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.

This needs to be backported to all known linux kernels.

Again, many thanks to syzkaller team for discovering this gem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b98b0bc8c431e3ceb4b26b0dfc8db509518fb290)

Orabug: 25203090
CVE: CVE-2016-9793
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ALSA: pcm : Call kill_fasync() in stream lock

Currently kill_fasync() is called outside the stream lock in
snd_pcm_period_elapsed().  This is potentially racy, since the stream
may get released even during the irq handler is running.  Although
snd_pcm_release_substream() calls snd_pcm_drop(), this doesn't
guarantee that the irq handler finishes, thus the kill_fasync() call
outside the stream spin lock may be invoked after the substream is
detached, as recently reported by KASAN.

As a quick workaround, move kill_fasync() call inside the stream
lock.  The fasync is rarely used interface, so this shouldn't have a
big impact from the performance POV.

Ideally, we should implement some sync mechanism for the proper finish
of stream and irq handler.  But this oneliner should suffice for most
cases, so far.

Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 3aa02cb664c5fb1042958c8d1aa8c35055a2ebc4)

Conflicts:

sound/core/pcm_lib.c

Orabug: 25203165
CVE: CVE-2016-9794
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

netlink: Fix dump skb leak/double free

When we free cb->skb after a dump, we do it after releasing the
lock. This means that a new dump could have started in the time
being and we'll end up freeing their skb instead of ours.

This patch saves the skb and module before we unlock so we free
the right memory.

Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 92964c79b357efd980812c4de5c1fd2ec8bb5520)

Orabug: 25203221
CVE: CVE-2016-9806
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

rcu: Fix soft lockup for rcu_nocb_kthread

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
[  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
[  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
[  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
[  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
[  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
[  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
[  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
[  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
[  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
[  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not reliquishing the
CPU while doing so.  This commit therefore adds an cond_resched_rcu_qs()
within the loop to allow other tasks to run.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
(cherry picked from commit bedc1969150d480c462cdac320fa944b694a7162)

Orabug: 25165170
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]

This fixes CVE-2016-8650.

If mpi_powm() is given a zero exponent, it wants to immediately return
either 1 or 0, depending on the modulus.  However, if the result was
initalised with zero limb space, no limbs space is allocated and a
NULL-pointer exception ensues.

Fix this by allocating a minimal amount of limb space for the result when
the 0-exponent case when the result is 1 and not touching the limb space
when the result is 0.

This affects the use of RSA keys and X.509 certificates that carry them.

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
PGD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff8804011944c0 task.stack: ffff880401294000
RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
Stack:
ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
Call Trace:
[<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
[<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
[<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
[<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
[<ffffffff8132a95c>] rsa_verify+0x9d/0xee
[<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
[<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
[<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
[<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
[<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
[<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
[<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
[<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
[<ffffffff812fe227>] SyS_add_key+0x154/0x19e
[<ffffffff81001c2b>] do_syscall_64+0x80/0x191
[<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
RSP <ffff880401297ad8>
CR2: 0000000000000000
---[ end trace d82015255d4a5d8d ]---

Basically, this is a backport of a libgcrypt patch:

http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526

Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
cc: linux-ima-devel@lists.sourceforge.net
cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit f5527fffff3f002b0a6b376163613b82f69de073)

Orabug: 25151496
CVE: CVE-2016-8650
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

sctp: validate chunk len before actually using it

Andrey Konovalov reported that KASAN detected that SCTP was using a slab
beyond the boundaries. It was caused because when handling out of the
blue packets in function sctp_sf_ootb() it was checking the chunk len
only after already processing the first chunk, validating only for the
2nd and subsequent ones.

The fix is to just move the check upwards so it's also validated for the
1st chunk.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bf911e985d6bbaa328c20c3e05f4eb03de11fdd6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/sctp/sm_statefuns.c

Orabug: 25142846
CVE: CVE-2016-9555
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

kvm: raise KVM_SOFT_MAX_VCPUS to support more vcpus

Orabug: 24820591

Despite the name, the kernel's manifest constant KVM_SOFT_MAX_VCPUS
defines the hard maximum number of vcpus that libvirt will support.
(When "--vcpus <n>" is specified on the virt-install command line,
the utility queries the kernel via ioctl(..., KVM_CAP_NR_VCPUS)
and uses the returned value as its limit. Right now, the ioctl
implementation simply returns KVM_SOFT_MAX_VCPUS.)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
(cherry picked from commit dc6eb6241ff41cd2e5e73bc6ebd71a754fb5333c)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
arch/x86/include/asm/kvm_host.h

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

netfilter: nfnetlink: fix splat due to incorrect socket memory accounting in skbuff clones

If we attach the sk to the skb from nfnetlink_rcv_batch(), then
netlink_skb_destructor() will underflow the socket receive memory
counter and we get warning splat when releasing the socket.

$ cat /proc/net/netlink
sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
ffff8800ca903000 12  0      00000000 -54144   0        0 2        0        17942
                                     ^^^^^^

Rmem above shows an underflow.

And here below the warning splat:

[ 1363.815976] WARNING: CPU: 2 PID: 1356 at net/netlink/af_netlink.c:958 netlink_sock_destruct+0x80/0xb9()
[...]
[ 1363.816152] CPU: 2 PID: 1356 Comm: kworker/u16:1 Tainted: G        W       4.4.0-rc1+ #153
[ 1363.816155] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
[ 1363.816160] Workqueue: netns cleanup_net
[ 1363.816163]  0000000000000000 ffff880119203dd0 ffffffff81240204 0000000000000000
[ 1363.816169]  ffff880119203e08 ffffffff8104db4b ffffffff813d49a1 ffff8800ca771000
[ 1363.816174]  ffffffff81a42b00 0000000000000000 ffff8800c0afe1e0 ffff880119203e18
[ 1363.816179] Call Trace:
[ 1363.816181]  <IRQ>  [<ffffffff81240204>] dump_stack+0x4e/0x79
[ 1363.816193]  [<ffffffff8104db4b>] warn_slowpath_common+0x9a/0xb3
[ 1363.816197]  [<ffffffff813d49a1>] ? netlink_sock_destruct+0x80/0xb9

skb->sk was only needed to lookup for the netns, however we don't need
this anymore since 633c9a840d0b ("netfilter: nfnetlink: avoid recurrent
netns lookups in call_batch") so this patch removes this manual socket
assignment to resolve this problem.

Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
(cherry picked from commit bd678e09dc1797bc0e2e536b6b268e7cf46e0184)

Orabug: 24749203
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

netfilter: nfnetlink: avoid recurrent netns lookups in call_batch

Pass the net pointer to the call_batch callback functions so we can skip
recurrent lookups.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
(cherry picked from commit 633c9a840d0bf1cce690f3165bdacd8ab412949e)

Orabug: 24749203
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

netfilter: nf_tables: fix wrong destroy anonymous sets if binding fails

When we add a nft rule like follows:
# nft add rule filter test tcp dport vmap {1: jump test}
-ELOOP error will be returned, and the anonymous set will be
destroyed.

But after that, nf_tables_abort will also try to remove the
element and destroy the set, which was already destroyed and
freed.

If we add a nft wrong rule, nft_tables_abort will do the cleanup
work rightly, so nf_tables_set_destroy call here is redundant and
wrong, remove it.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit a02f424863610a0a7abd80c468839e59cfa4d0d8)

Oragbug: 24749203
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

netfilter: nf_tables: use reverse traversal commit_list in nf_tables_abort

When we use 'nft -f' to submit rules, it will build multiple rules into
one netlink skb to send to kernel, kernel will process them one by one.
meanwhile, it add the trans into commit_list to record every commit.
if one of them's return value is -EAGAIN, status |= NFNL_BATCH_REPLAY
will be marked. after all the process is done. it will roll back all the
commits.

now kernel use list_add_tail to add trans to commit, and use
list_for_each_entry_safe to roll back. which means the order of adding
and rollback is the same. that will cause some cases cannot work well,
even trigger call trace, like:

1. add a set into table foo  [return -EAGAIN]:
   commit_list = 'add set trans'
2. del foo:
   commit_list = 'add set trans' -> 'del set trans' -> 'del tab trans'
then nf_tables_abort will be called to roll back:
firstly process 'add set trans':
                   case NFT_MSG_NEWSET:
                        trans->ctx.table->use--;
                        list_del_rcu(&nft_trans_set(trans)->list);

  it will del the set from the table foo, but it has removed when del
  table foo [step 2], then the kernel will panic.

the right order of rollback should be:
  'del tab trans' -> 'del set trans' -> 'add set trans'.
which is opposite with commit_list order.

so fix it by rolling back commits with reverse order in nf_tables_abort.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit a907e36d54e0ff836e55e04531be201bf6b4d8c8)

Orabug: 24749203
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ixgbevf: Handle previously-freed msix_entries

Orabug: 24568240

The msix_entries memory can be freed by a previous suspend or
remove, so don't crash on close when it isn't there. Also only
clear the interrupts when the interface is up, because there
aren't any when it is not up.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit eeffceee421b17a1d484679d738f278fbaa01384)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

PCI: pciehp: Prioritize data-link event over presence detect

If Slot Status indicates changes in both Data Link Layer Status and
Presence Detect, prioritize the Link status change.

When both events are observed, pciehp currently relies on the Slot Status
Presence Detect State (PDS) to agree with the Link Status Data Link Layer
Active status. The Presence Detect State, however, may be set to 1 through
out-of-band presence detect even if the link is down, which creates
conflicting events.

Since the Link Status accurately reflects the reachability of the
downstream bus, the Link Status event should take precedence over a
Presence Detect event. Skip checking the PDC status if we handled a link
event in the same handler.

Orabug: 25312751

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Altered slightly for UEK4.1
(cherry picked from commit 385895fef6b5f4723e33d0e58251c45bc708132d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

PCI: pciehp: Leave power indicator on when enabling already-enabled slot

If an error occurs when enabling a slot, pciehp_power_thread() turns off
the power indicator. But if the only error is that the slot was already
enabled, we should leave the power indicator on.

Return success if called to enable an already-enabled slot.
This is in the same spirit of the special handling for EEXISTS when
pciehp_configure_device() determines the slot devices already exist.

Orabug: 25312751

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
(cherry picked from commit c4ae2adedb38240be5a1a16588406980b948a3e7)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net: Fix use after free in the recvmmsg exit path

Orabug: 25298601
CVE: CVE-2016-7117

The syzkaller fuzzer hit the following use-after-free:

  Call Trace:
   [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
   [<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
   [<     inline     >] SYSC_recvmmsg net/socket.c:2281
   [<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
   [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
  arch/x86/entry/entry_64.S:185

And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.

Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 34b88a68f26a75e4fded796f1a49c40f82234b7d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

signals: avoid unnecessary taking of sighand->siglock

When running certain database workload on a high-end system with many
CPUs, it was found that spinlock contention in the sigprocmask syscalls
became a significant portion of the overall CPU cycles as shown below.

  9.30%  9.30%  905387  dataserver  /proc/kcore 0x7fff8163f4d2
  [k] _raw_spin_lock_irq
            |
            ---_raw_spin_lock_irq
               |
               |--99.34%-- __set_current_blocked
               |          sigprocmask
               |          sys_rt_sigprocmask
               |          system_call_fastpath
               |          |
               |          |--50.63%-- __swapcontext
               |          |          |
               |          |          |--99.91%-- upsleepgeneric
               |          |
               |          |--49.36%-- __setcontext
               |          |          ktskRun

Looking further into the swapcontext function in glibc, it was found that
the function always call sigprocmask() without checking if there are
changes in the signal mask.

A check was added to the __set_current_blocked() function to avoid taking
the sighand->siglock spinlock if there is no change in the signal mask.
This will prevent unneeded spinlock contention when many threads are
trying to call sigprocmask().

With this patch applied, the spinlock contention in sigprocmask() was
gone.

Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c7be96af89d4b53211862d8599b2430e8900ed92)

Orabug: 25255325
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

audit: fix a double fetch in audit_log_single_execve_arg()

Orabug: 25059936
CVE: CVE-2016-6136

There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) argumnets for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1].  Of course this leaves a window of
opportunity for an unsavory application to munge with the data.

This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
records(s).  In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).

As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:

* https://github.com/linux-audit/audit-testsuite/issues/25

[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.

[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data.  I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation).  The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.

Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit 43761473c254b45883a64441dd0bc85a42f3645c)

Conflict:

kernel/auditsc.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

KEYS: Fix short sprintf buffer in /proc/keys show function

Orabug: 25306361
CVE: CVE-2016-7042

Fix a short sprintf buffer in proc_keys_show(). If the gcc stack protector
is turned on, this can cause a panic due to stack corruption.

The problem is that xbuf[] is not big enough to hold a 64-bit timeout
rendered as weeks:

(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
$2 = 30500568904943

That's 14 chars plus NUL, not 11 chars plus NUL.

Expand the buffer to 16 chars.

I think the unpatched code apparently works if the stack-protector is not
enabled because on a 32-bit machine the buffer won't be overflowed and on a
64-bit machine there's a 64-bit aligned pointer at one side and an int that
isn't checked again on the other side.

The panic incurred looks something like:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81352ebe
CPU: 0 PID: 1692 Comm: reproducer Not tainted 4.7.2-201.fc24.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
0000000000000086 00000000fbbd2679 ffff8800a044bc00 ffffffff813d941f
ffffffff81a28d58 ffff8800a044bc98 ffff8800a044bc88 ffffffff811b2cb6
ffff880000000010 ffff8800a044bc98 ffff8800a044bc30 00000000fbbd2679
Call Trace:
[<ffffffff813d941f>] dump_stack+0x63/0x84
[<ffffffff811b2cb6>] panic+0xde/0x22a
[<ffffffff81352ebe>] ? proc_keys_show+0x3ce/0x3d0
[<ffffffff8109f7f9>] __stack_chk_fail+0x19/0x30
[<ffffffff81352ebe>] proc_keys_show+0x3ce/0x3d0
[<ffffffff81350410>] ? key_validate+0x50/0x50
[<ffffffff8134db30>] ? key_default_cmp+0x20/0x20
[<ffffffff8126b31c>] seq_read+0x2cc/0x390
[<ffffffff812b6b12>] proc_reg_read+0x42/0x70
[<ffffffff81244fc7>] __vfs_read+0x37/0x150
[<ffffffff81357020>] ? security_file_permission+0xa0/0xc0
[<ffffffff81246156>] vfs_read+0x96/0x130
[<ffffffff81247635>] SyS_read+0x55/0xc0
[<ffffffff817eb872>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Reported-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ondrej Kozina <okozina@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 03dab869b7b239c4e013ec82aea22e181e441cfc)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT

Orabug: 24811361

Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ebf5926a001fd0d0d3fdfc2c39c0c8ff0c846c1a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/turbostat: allow user to alter DESTDIR and PREFIX

Orabug: 24811361

When run
make -C tools DESTDIR=/my/nice/dir turbostat_install
get a message
install: cannot create regular file '/usr/bin/turbostat': Permission denied

Allow user to alter DESTDIR and PREFIX variables.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ac485cb4b3b79fb7ef979cb243ee16e1a1ffa767)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: work around RC6 counter wrap

Orabug: 24811361

Sometimes the rc6 sysfs counter spontaneously resets,
causing turbostat prints a very large number
as it tries to calcuate % = 100 * (old - new) / interval

When we see (old > new), print ***.**% instead
of a bogus huge number.

Note that this detection is not fool-proof, as the counter
could reset several times and still result in new > old.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 9185e988e9d5bb70b690362e84bb2e4a9d71f2c5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: initial KBL support

Orabug: 24811361

KBL is similar to SKL

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit cdc57272ea0a0e952c4609b56e157e4d0ec8e956)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: initial SKX support

Orabug: 24811361

SKX has a lot in common with HSX

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ec53e594c65ab099ca784d62b6f4c191e3a4d7cc)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: decode BXT TSC frequency via CPUID

Orabug: 24811361

Hard-code BXT ART to 19200MHz, so turbostat --debug
can fully enumerate TSC:

CPUID(0x15): eax_crystal: 3 ebx_tsc: 186 ecx_crystal_hz: 0
TSC: 1190 MHz (19200000 Hz * 186 / 3 / 1000000)

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e8efbc80db5e824ce2382d5e65429b6b493e71e2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: initial BXT support

Orabug: 24811361

Broxton has a lot in common with SKL

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e4085d543e256aff6606ba99ed257f7c06685f3b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: print IRTL MSRs

Orabug: 24811361

Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.

IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 5a63426e2a18775ed05b20e3bc90c68bacb1f68a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: SGX state should print only if --debug

Orabug: 24811361

The CPUID.SGX bit was printed, even if --debug was used

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 8ae7225591fd15aac89769cbebb3b5ecc8b12fe5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: bugfix: TDP MSRs print bits fixing

Orabug: 24811361

MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF

MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF

And the same modification to MSR_CONFIG_TDP_LEVEL_2.

MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 685b535b2cdb9cdf354321f8af9ed17dcf19d19f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump

Orabug: 24811361

MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 6c34f160df82ae07bfff5b4c51f50622e9c2759e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: call __cpuid() instead of __get_cpuid()

Orabug: 24811361

turbostat already checks whether calling each cpuid leavf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().

syntax only, no functional change

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 5aea2f7f645b27635b856311dee5b775d277c686)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: indicate SMX and SGX support

Orabug: 24811361

SGX presence is related to a SKL power workaround,
so lets show when that is enabled.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit aa8d8cc79af16e16da04efff1c1a72b1ea4a9e7e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: detect and work around syscall jitter

Orabug: 24811361

The accuracy of Bzy_Mhz and Busy% depend on reading
the TSC, APERF, and MPERF close together in time.

When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.

A dummy APERF read of APERF makes this problem
much more rare. Apparently this 1st systemn call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.

For the cases that dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 0102b06747c7d24e334d2b27c4b43eed693676f1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: show GFX%rc6

Orabug: 24811361

The column "GFX%c6" show the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.

This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit fdf676e51f301d207586d9bac509b8ce055bae8a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: show GFXMHz

Orabug: 24811361

Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz

This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 27d47356b6dfa92042a17a0b474f08910d4c8e8f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: show IRQs per CPU

Orabug: 24811361

The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement inteval. This information comes from
the difference between /proc/interrupts shapshots made before
and after the measurement interval.

The first row, the system summary, shows the sum of the IRQS
for all CPUs during that interval.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 562a2d377bb9882c49debc9e1be7127a1717e242)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: make fewer systems calls

Orabug: 24811361

skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.

The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 36229897ba966bb0dc9e060222ff17b198252367)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: fix compiler warnings

Orabug: 24811361

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 58cc30a4e6086d542302e04eea39b2a438e4c5dd)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
tools/power/x86/turbostat/turbostat.c

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: add --out option for saving output in a file

Orabug: 24811361

By default...

Turbostat --debug gconfiguration info goes to stderr.

In FORK mode, turbostat statistics go to stderr.

In PERIODIC mode, turbostat statistics go to stdout.

These defaults do not change, but an option "--out file"
will send all output above only to the specified file.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit b7d8c1483bbf6ec9d2dd76d6a1c91a38c3f6ac35)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: re-name "%Busy" field to "Busy%"

Orabug: 24811361

some tools processing turbostat output
have difficulty with items that begin with %...

Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 75d2e44e60490ba1fee076a5f4dcfbdc8598e8c4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding

Orabug: 24811361

Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
  for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
  parsing code
- added x200 to list of architectures that do not support Nahlem compatible
  definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
  bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
  (logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
  same order as implementations for other platforms

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit cbf97abaf3689652bcddc0741dc49629d1838142)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: Intel Xeon x200: fix erroneous bclk value

Orabug: 24811361

x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.

prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 121b48bb77187cf2ed3053e147d2e6c1e864083c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: allow sub-sec intervals

Orabug: 24811361

turbostat -i interval_sec

will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 2a0609c02e6558df6075f258af98a54a74b050ff)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: fix various build warnings

Orabug: 24811361

When building with gcc 6 we're getting various build warnings that just
require some trivial function declaration and call fixes:

  turbostat.c: In function ‘dump_cstate_pstate_config_info’:
  turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
   dump_cstate_pstate_config_info(family, model)
  turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
  turbostat.c: In function ‘get_tdp’:
  turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
   double get_tdp(model)
  turbostat.c: In function ‘perf_limit_reasons_probe’:
  turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
   void perf_limit_reasons_probe(family, model)
  turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit 1b69317d2dc80bc8e1d005e1a771c4f5bff3dabe)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: Decode MSR_MISC_PWR_MGMT

Orabug: 24811361

This MSR is helpful to show if P-state HW coordination
is enabled or disabled.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit f0057310b40efe9f797ff337e9464e6a6fb9d782)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: decode HWP registers

Orabug: 24811361

# turbostat --debug
...
CPUID(6): ... HWP, HWPnotify, HWPwindow, HWPepp, HWPpkg ...
...
cpu0: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu0: MSR_HWP_CAPABILITIES: 0x01050916 (high 0x16 guar 0x9 eff 0x5 low 0x1)
cpu0: MSR_HWP_REQUEST: 0x80001604 (min 0x4 max 0x16 des 0x0 epp 0x80 window 0x0 pkg 0x0)
cpu0: MSR_HWP_INTERRUPT: 0x00000001 (EN_Guaranteed_Perf_Change, Dis_Excursion_Min)
cpu0: MSR_HWP_STATUS: 0x00000000 (No-Guaranteed_Perf_Change, No-Excursion_Min)

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 7f5c258e1ce1e0909d3195694ac79f143051e513)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency

Orabug: 24811361

This CPUID leaf is available on Skylake:

CPUID(0x16): base_mhz: 1500 max_mhz: 2200 bus_mhz: 100

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 61a87ba7893a256d86c7eea6a7ab10d38ccac9b2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
tools/power/x86/turbostat/turbostat.c

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: decode more CPUID fields

Orabug: 24811361

for debugging, dump a few more fields:

CPUID(1): SSE3 MONITOR EIST TM2 TSC MSR ACPI-TM TM

cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MONITOR)

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 69807a638f91524ed75027f808cd277417ecee7a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: use new name for MSR_PLATFORM_INFO

Orabug: 24811361

MSR_PLATFORM_INFO is the new name for MSR_NHM_PLATFORM_INFO

no functional change

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ec0adc539b8bf59b7c00db0748671f6594b77843)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: bugfix: print MAX_NON_TURBO_RATIO

Orabug: 24811361

MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=6 lock=0)
should print all 7 bits of MAX_NON_TURBO_RATIO (in decimal):
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=22 lock=0)

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 759d2a932b82009a7039ef5567e7dcba153ce123)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: simplify Bzy_MHz calculation

Orabug: 24811361

Bzy_MHz = TSC_delta*tsc_tweak/APERF_delta/MPERF_delta/measurement_interval

becomes

Bzy_MHz = base_mhz/APERF_delta/MPERF_delta

on systems which support MSR_NHM_PLATFORM_INFO.

base_mhz is calculated directly from the base_ratio
reported in MSR_NHM_PLATFORM_INFO * bclk,
and bclk is discovered via MSR or cpuid.

This reduces the dependency of Bzy_MHz calculation on the TSC.
Previously, there were 4 TSC readings required in each caculation,
the raw TSC delta combined with the measurement_interval.
This also removes the "tsc_tweak" correction factor used when
TSC runs on a different base clock from the CPU's bclk.

After this change, tsc_tweak is used only for %Busy.

The end-result should be a Bzy_MHz result slightly less prone to jitter.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 21ed5574d1622118b49b0c6342acc8d27d0799be)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: SKL: Adjust for TSC difference from base frequency

Orabug: 24811361

On a Skylake with 1500MHz base frequency,
the TSC runs at 1512MHz.

This is because the TSC is no longer in the n*100 MHz BCLK domain,
but is now in the m*24MHz crystal clock domain. (24 MHz * 63 = 1512 MHz)

This adds error to several calculations in turbostat,
unless the TSC sample sizes are adjusted for this difference.

Note that calculations in the time domain are immune
from this issue, as the timing sub-system has already
calibrated the TSC against a known wall clock.

AVG_MHz = APERF_delta/measurement_interval

need no adjustment.  APERF_delta is in the BCLK domain,
and measurement_interval is in the time domain.

TSC_MHz  =  TSC_delta/measurement_interval

needs no adjustment -- as we really do want to report
the actual measured TSC delta here, and measurement_interval
is in the accurate time domain.

%Busy = MPERF_delta/TSC_delta

needs adjustment to use TSC_BCLK_DOMAIN_delta.
TSC_BCLK_DOMAIN_delta = TSC_delta * base_hz / tsc_hz

Bzy_MHz = TSC_delta/APERF_delta/MPERF_delta/measurement_interval

need adjustment as above.

No other metrics in turbostat need to be adjusted.

Before:

     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     550   24.84    2216    1512
       0    2191   98.73    2219    1514
       2       0    0.01    2130    1512
       1       9    0.43    2016    1512
       3       2    0.08    2016    1512

After:

     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     550   25.05    2198    1512
       0    2190   99.62    2199    1512
       2       0    0.01    2152    1512
       1       9    0.46    2000    1512
       3       2    0.10    2000    1512

Note that in this example, the "Before" Bzy_MHz
was reported as exceeding the 2200 max turbo rate.
Also, even a pinned spin loop would not be reported
as over 99% busy.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit a2b7b74945dbfe5d734eafe8aa52f9f1f8bc6931)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: KNL workaround for %Busy and Avg_MHz

Orabug: 24811361

KNL increments APERF and MPERF every 1024 clocks.
This is compliant with the architecture specification,
which requires that only the ratio of APERF/MPERF need be valid.

However, turbostat takes advantage of the fact that these
two MSRs increment every un-halted clock
at the actual and base frequency:

AVG_MHz = APERF_delta/measurement_interval

%Busy = MPERF_delta/TSC_delta

This quirk is needed for these calculations to also work on KNL,
which would otherwise show a value 1024x smaller than expected.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit b2b34dfe4d9aa4c468fc363b3b666974783ed1f9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: IVB Xeon: fix --debug regression

Orabug: 24811361

Staring in Linux-4.3-rc1,
commit 6fb3143b561c ("tools/power turbostat: dump CONFIG_TDP")
touches MSR 0x648, which is not supported on IVB-Xeon.
This results in "turbostat --debug" exiting on those systems:

turbostat: /dev/cpu/2/msr offset 0x648 read failed: Input/output error

Remove IVB-Xeon from the list of machines supporting with that MSR.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 756357b8e4b072fd5ee86421f794e071a348802b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: fix typo on DRAM column in Joules-mode

< RAM_W
> RAM_J

Reported-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit bd6906ed3d7a00d55c9bd368a640ef83bb487d1d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: fix parameter passing for forked command

Orabug: 24811361

turbostat supports forked command when sampling cpu state. However,
the forked command is not allowed to be executed with options, otherwise
turbostat might regard these options as invalid turbostat options.

For example:

./turbostat stress -c 4 -t 10
./turbostat: unrecognized option '-t'

Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit a01e72fbc41e322ed229465de8b595a7e3ec6549)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: dump CONFIG_TDP

Orabug: 24811361

Config TDP is a feature that allows parts to be configured
for different thermal limits after they have left the factory.

This can have an effect on the operation of the part,
particularly in determiniing...

Max Non-turbo Ratio
Turbo Activation Ratio

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 6fb3143b561c4a7865e5513eeb02d42ef38e8173)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: cpu0 is no longer hard-coded, so update output

Orabug: 24811361

The --debug option reads a number of per-package MSRs.
Previously we explicitly read them on cpu0, but recently
turbostat changed to read them on the current "base_cpu".

Update the print-out to reflect base_cpu, rather than
the hard-coded cpu0.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit bfae2052265cde825afaba35eb3a4d3889432734)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>