www.infradead.org Git - users/jedix/linux-maple.git/log

fm10k: Correct MTU for jumbo frames

Based on hardware testing, the host interface supports up to 15368 bytes
as the maximum frame size. To determine the correct MTU, we subtract 8
for the internal switch tag, 14 for the L2 header, and 4 for the
appended FCS header, resulting in 15342 bytes of payload for our maximum
MTU on jumbo frames.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 8c7ee6d2cacc7794a91875ef5fd8284b4a900d8c)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: do not assume VF always has 1 queue

It is possible that the PF has not yet assigned resources to the VF.
Although rare, this could result in the VF attempting to read queues it
does not own and result in FUM or THI faults in the PF. To prevent this,
check queue 0 before we continue in init_hw_vf.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 1340181fe435ccb8ca2f996b8680bd9566860619)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: fix memory leak

This was detected by Coverity.
The function skb_cow_head leaves skb alone on failure, so caller needs
to free.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25394529
(cherry picked from commit 6f97532ef05e49f1998a09f8359b83d00a7b3229)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: use snprintf() instead of sprintf() to avoid buffer overflow

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit f6f19f8bb9afcd0e9970fe51b5affa3063af4499)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: TRIVIAL remove unnecessary comma

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 15aa49cb99c128e1484b6373382e264588067cab)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: create "correct" header for the remote end on connect

When we connect to the mailbox, we insert a fake disconnect header so
that the code does not see an invalid header and thus instantly error
every time we bring up the mailbox. However, we incorrectly record the
tail and head from the local perspective. Since the remote end shouldn't
have anything for us, add a "create_fake_disconnect_hdr" function which
inverts the TAIL and HEAD fields. This enables us to connect without any
errors of either TAIL or HEAD incorrectness, and prevents creating
extraneous error messages. This is necessary now since mbx_reset_work
does not actually reset the Tx FIFO head and tail pointers, thus head
and tail might not be equivalent on a reconnect.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit afadfd224f53106e4fd52f3b6885a93431a5a6f5)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: drop transmitted messages in Tx FIFO as part of reset_work

This patch fixes a corner case issue with the PF/VF mailbox code.
Currently, fm10k_mbx_reset_work clears various state about the mailbox.
However, it does not clear the Tx FIFO head/tail pointers. We can't
simply clear these pointers as we unintentionally drop untransmitted
messages without error.

Doing nothing results in a possible phantom re-transmission of messages,
since we leave tx.head and tx.tail intact, but clear the tx_pulled and
tail_len values. This means that the PF could continuously re-send a
message which triggers a reset in the VF. Upon reset, the VF will
re-receive the same message after a reconnect.

If we reset the tx.head and tx.tail pointers completely, we end up
dropping some messages that were pending before connect. This results in
missing LPORT_MSG_READY bits, and VFs will end up reporting no link.

However, we can resolve both issues by simply incrementing head to
account for the already transmitted messages, before we reset tx_pulled.
We do this via the same logic as fm10k_mbx_head_pull.

We account for the tail_len which includes all data not yet transmitted,
once we account for the acked data which means re-reading the HEAD
variable from the message header. Then, we drop messages until we've
dropped more than the new tx_pulled value. At this point, resetting
tail_len and tx_pulled, but not tx.head and tx.tail will result in
prevention of the phantom message. It also prevents us from dropping
untransmitted messages upon attempting to Tx into a connect or
disconnect header.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 4b09728e9d34170c375f41d9b454f067ce39591f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: remove comment about rtnl_lock around mbx operations

This comment is no longer true due to a couple of mailbox locking
refactors, and we now don't actually do any rtnl protected operations
directly in the mailbox path. Remove this comment as it is factually
incorrect and confusing.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 8427672abdc9e405f86755fad5a035511a5b1534)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: fix iov_msg_mac_vlan_pf VID checks

The VF will send a message to request multicast addresses with the
default VID. In the current code, if the PF has statically assigned a
VLAN to a VF, then the VF will not get the multicast addresses. Fix up
all of the various VLAN messages to use identical checks (since each
check was different). Also use set as a variable, so that it simplifies
our check for whether VLAN matches the pf_vid.

The new logic will allow set of a VLAN if it is zero, automatically
converting to the default VID. Otherwise it will allow setting the PF
VID, or any VLAN if PF has not statically assigned a VLAN. This is
consistent behavior, and allows VF to request either 0 or the
default_vid without silently failing.

Note that we need the check for zero since VFs might not get the default
VID message in time to actually request non-zero VLANs.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 9adbac599a71bc25a2617850ffcaa4388dc5c20d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: Only trigger data path reset if fabric is up

This change makes it so that we only trigger the data path reset if the
fabric is ready to handle traffic. The general idea is to avoid
triggering the reset unless the switch API is ready for us. Otherwise
we can just postpone the reset until we receive a switch ready
notification.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit ac98100359e098d03dbd98783ca4becaf2ea7ec3)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: re-enable VF after a full reset on detection of a Malicious event

Modify behavior of Malicious Driver Detection events. Presently, the
hardware disables the VF queues and re-assigns them to the PF. This
causes the VF in question to continuously Tx hang, because it assumes
that it can transmit over the queues in question. For transient events,
this results in continuous logging of malicious events.

New behavior is to reset the LPORT and VF state, so that the VF will
have to reset and re-enable itself. This does mean that malicious VFs
will possibly be able to continue and attempt malicious events again.
However, it is expected that system administrators will step in and
manually remove or disable the VF in question.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 95f4f8da644256d8c0ff5bab2c93ba33d2a42cd8)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: TRIVIAL fix typo in fm10k_netdev.c

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 5c2d642fd0cf0f4f7396dfd3754bc03f6e50e359)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: TRIVIAL fix up ordering of __always_unused and style

Fix some style issues in debugfs code, and correct ordering of void and
__always_unused. Technically, the order does not matter, but preferred
style is to put the macro between the type and name.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 6fbc6b358b3f63e701d15e9d330a3b0dbb03f802)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: remove is_slot_appropriate

This function is no longer used now that we have updated fm10k_slot_warn
functionality.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 855c40fc31ca2392845558235f7e92e406936fbe)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: don't store sw_vid at reset

If we store the sw_vid at reset of PF, then we accidentally prevent the
VF from receiving the message to update its default VID. This only
occurs if the VF is created before the PF has come up, which is the
standard way of creating VFs when using the module parameter.

This fixes an issue where we request the incorrect MAC/VLAN
combinations, and prevents us from accidentally reporting some frames as
VLAN tagged.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit b655a5c735867c1a80e7ceaa1f1c45d1dd34909f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: Report MAC address on driver load

This change adds the MAC address to the list of values recorded on driver
load. The MAC address represents the serial number of the unit and allows
us to track the value should a card be replaced in a system.

The log message should now be similar in output to that of ixgbe.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 0ff36676a3778d0655933ace201fca7c11b4e8b5)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: update netdev perm_addr during reinit, instead of at up

Update the netdev permanent address during fm10k_reinit enables the user
to immediately see the new MAC address on the VF even if the device
isn't up. The previous code required that the device by opened before
changes would appear.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit bdc7f5902d22e7297b8f0d7a8d6ed8429cdba636)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: update fm10k_slot_warn to use pcie_get_minimum link

This is useful in cases where we connect to a slot at Gen3, but the slot
is behind a bus which only connected at Gen2. This generally only
happens when a PCIe switch is in the sequence of devices, and can be
very confusing when you see slow performance with no obvious cause.

I am aware this patch has a few lines that break 80 characters, but
there does not seem to be a readable way to format them to less than 80
characters. Suggestions welcome.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 106c07a49506359f7663770ef33f4997d68316c1)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: disable service task during suspend

The service task reads some registers as part of its normal routine,
even while the interface is down. Normally this is ok. However, during
suspend we have disabled the PCI device. Due to this, registers will
read in the same way as a surprise-remove event. Disable the service
task while we suspend, and re-enable it after we resume. If we don't do
this, the device could be UP when you suspend and come back from resume
as closed (since fm10k closes the device when it gets a surprise
remove).

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit e40296628bec7400f529927eef4bc87cb425a22a)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: fix iov_msg_lport_state_pf issue

When a VF issues an LPORT_STATE request to enable a port that is already
enabled, the PF will first disable the VF LPORT. Then it should
re-enable the VF again with the new requested settings. This ensures
that any switch rules are cleared by deleting the LPORT on the switch.
However, the flow is bugged because we actually check if the VF is
enabled at the end, and thus don't re-enable it. Fix the flow so that we
actually clear the enabled flags as part of our removal of the LPORT.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit ee4373e7d74696821e47faf1b70f779697ddf77b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: remove err_no reference in fm10k_mbx.c

The reference to err_no was left around after a previous code refactor.
We never use the value, and it doesn't seem to be used in side a hidden
macro reference. Discovered via cppcheck.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit d18c438884137609bf838a703972126862db97f4)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: fix incorrect DIR_NEVATIVE bit in 1588 code

The SYSTIME_CFG.Adjust Direction bit is actually supposed to indicate
that the adjustment is positive. Fix the code to align correctly with
hardware and documentation.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 646725a7c9cbdefd8df4ae7b3217a992ad64a4cd)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: pack TLV overlay structures

This patch adds the __attribute__((packed)) indicator to some structures
which are overlayed onto a TLV message. These structures must be packed
as small as possible in order to correctly align when copied into the
mailbox buffer. Without doing so, the receiving mailbox code incorrectly
parses the values and we get invalid message responses from the switch
manager software.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 7fef39322ce7a0c1bbb5b48bef61b6c1aef73b96)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: re-map all possible VF queues after a VFLR

During initialization, the VF counts its rings by walking the TQDLOC
registers. This works only if the TQMAP/RQMAP registers are set to map
all of the out-of-bound rings back to the first one. This allows the VF
to cleanly detect when it has run out of queues. Update the PF code so
that it resets the empty TQMAP/RQMAP registers post-VFLR to prevent
innocent VF drivers from triggering malicious driver events.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit fba341d5cab38db68eed061cf20e161d967795c6)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: force LPORT delete when updating VLAN or MAC address

Currently, we don't notify the switch at all when the PF
administratively sets a new VLAN or MAC address. This causes the old
addresses to remain valid on the switch table. Since the PF is
overriding any configuration done directly by the VF, we choose to
simply re-create the LPORT for the VF. This does mean that all rules for
the VF will be dropped when we set something directly via the PF, but it
prevents some weird issues where the MAC/VLAN table retains some stale
configuration.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit a38488f54004330071f4ec90c3cf52dc7646468e)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: use dma_set_mask_and_coherent in fm10k_probe

This patch cleans up the use of dma_get_required_mask and uses the
simpler dma_set_mask_and_coherent function instead of doing these as
separate steps.

I removed the dma_get_required_mask call because based on some minimal
testing it appears that either (a) we're not doing the right thing with
the call or (b) we don't need it anyways. If the value returned is
<48bits, we'll end up trying with 48 bits anyways. If it's over 48bits,
fm10k can't support that anyways, and we should try 48bits. If 48bits
fails, we'll fallback to 32bits. This cleans up some very funky code.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit c04ae58e2b6d836325914e7b4aa25f43da1be3df)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: trivial fixup message style to include a colon

Also use %d for error values, since printing in hexadecimal is probably
not helpful.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 0197cde62a0a39fdee606557b8cef6573a83e866)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: remove extraneous NULL check on l2_accel

l2_accel was checked for NULL at the top of fm10k_dfwd_del_station, and
we return if it is not defined. Due to this, we already know it can't be
null here so a separate check is meaningless. Discovered via cppcheck.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit f1f3322eb411ce875b8b9e2de59ee8d55fb3a33b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: add call to fm10k_clean_all_rx_rings in fm10k_down

This prevents a memory leak in fm10k_set_ringparams. The leak occurs
because we go down, change ring parameters, and then come up. However,
fm10k_down on its own is not clearing the Rx rings. Since fm10k_up
assumes the rings are clean we basically drop the buffers and leak a
bunch of memory. Eventually we hit dirty page faults and reboot the
system. This issue does not occur elsewhere because other flows that
involve fm10k_down go through fm10k_close which immediately called
fm10k_free_all_rx_resources which properly cleans the rings.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit ec6acb801e7b2908c24a60c8aabf47c3e40508a4)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: fix incorrect free on skb in ts_tx_enqueue

This patch resolves a bug in the ts_tx_enqueue code responsible for a
NULL pointer dereference and invalid access of the skb list. We
incorrectly freed the actual skb we found instead of our copy. Thus the
skb queue is essentially invalidated. Resolve this by freeing our clone
in the cases where we did not add it to the queue. This also avoids the
skb memory leak caused by failure to free the clone.

[  589.719320] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  589.722344] IP: [<ffffffffa0310e60>] fm10k_ts_tx_subtask+0xb0/0x160 [fm10k]
[  589.723796] PGD 0
[  589.725228] Oops: 0000 [#1] SMP

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit c23544b196e72716244108fb173f2965e9eafd1a)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: move setting shinfo inside ts_tx_enqueue

This patch simplifies the code flow for setting the IN_PROGRESS bit of
the shinfo for an skb we will be timestamping.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit e075996ebd14dcd3d1a45172316141fe26d891fa)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: use correct ethernet driver Tx timestamp function

skb_complete_tx_timestamp is intended for use by PHY drivers which
implement a different method of returning timestamps. This method is
intended to be used after a PHY driver accepts a cloned packet via its
phy_driver.txtstamp function. It is not correct to use in the standard
ethernet driver such as fm10k. This patch fixes the following possible
kernel panic.

[ 2744.552896] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  OE  3.19.3-200.fc21.x86_64 #1
[ 2744.552899] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.8x23.060520140825 06/05/2014
[ 2744.552901]  0000000000000000 2f4c8b10ea3f9848 ffff88081ee03a38 ffffffff8176e215
[ 2744.552906]  0000000000000000 0000000000000000 ffff88081ee03a78 ffffffff8109bc1a
[ 2744.552910]  ffff88081ee03c50 ffff88080e55fc00 ffff88080e55fc00 ffffffff81647c50
[ 2744.552914] Call Trace:
[ 2744.552917]  <IRQ>  [<ffffffff8176e215>] dump_stack+0x45/0x57
[ 2744.552931]  [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
[ 2744.552936]  [<ffffffff81647c50>] ? skb_queue_purge+0x20/0x40
[ 2744.552941]  [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
[ 2744.552946]  [<ffffffff81646911>] skb_release_head_state+0xe1/0xf0
[ 2744.552950]  [<ffffffff81647b26>] skb_release_all+0x16/0x30
[ 2744.552954]  [<ffffffff81647ba6>] kfree_skb+0x36/0x90
[ 2744.552958]  [<ffffffff81647c50>] skb_queue_purge+0x20/0x40
[ 2744.552964]  [<ffffffff81751f8d>] packet_sock_destruct+0x1d/0x90
[ 2744.552968]  [<ffffffff81642053>] __sk_free+0x23/0x140
[ 2744.552973]  [<ffffffff81642189>] sk_free+0x19/0x20
[ 2744.552977]  [<ffffffff81647d60>] skb_complete_tx_timestamp+0x50/0x60
[ 2744.552988]  [<ffffffffa02eee40>] fm10k_ts_tx_hwtstamp+0xd0/0x100 [fm10k]
[ 2744.552994]  [<ffffffffa02e054e>] fm10k_1588_msg_pf+0x12e/0x140 [fm10k]
[ 2744.553002]  [<ffffffffa02edf1d>] fm10k_tlv_msg_parse+0x8d/0xc0 [fm10k]
[ 2744.553010]  [<ffffffffa02eb2d0>] fm10k_mbx_dequeue_rx+0x60/0xb0 [fm10k]
[ 2744.553016]  [<ffffffffa02ebf98>] fm10k_sm_mbx_process+0x178/0x3c0 [fm10k]
[ 2744.553022]  [<ffffffffa02e09ca>] fm10k_msix_mbx_pf+0xfa/0x360 [fm10k]
[ 2744.553030]  [<ffffffff811030a7>] ? get_next_timer_interrupt+0x1f7/0x270
[ 2744.553036]  [<ffffffff810f2a47>] handle_irq_event_percpu+0x77/0x1a0
[ 2744.553041]  [<ffffffff810f2bab>] handle_irq_event+0x3b/0x60
[ 2744.553045]  [<ffffffff810f5d6e>] handle_edge_irq+0x6e/0x120
[ 2744.553054]  [<ffffffff81017414>] handle_irq+0x74/0x140
[ 2744.553061]  [<ffffffff810bb54a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 2744.553066]  [<ffffffff8177777f>] do_IRQ+0x4f/0xf0
[ 2744.553072]  [<ffffffff8177556d>] common_interrupt+0x6d/0x6d
[ 2744.553074]  <EOI>  [<ffffffff81609b16>] ? cpuidle_enter_state+0x66/0x160
[ 2744.553084]  [<ffffffff81609b01>] ? cpuidle_enter_state+0x51/0x160
[ 2744.553087]  [<ffffffff81609cf7>] cpuidle_enter+0x17/0x20
[ 2744.553092]  [<ffffffff810de101>] cpu_startup_entry+0x321/0x3c0
[ 2744.553098]  [<ffffffff81764497>] rest_init+0x77/0x80
[ 2744.553103]  [<ffffffff81d4f02c>] start_kernel+0x4a4/0x4c5
[ 2744.553107]  [<ffffffff81d4e120>] ? early_idt_handlers+0x120/0x120
[ 2744.553110]  [<ffffffff81d4e4d7>] x86_64_start_reservations+0x2a/0x2c
[ 2744.553114]  [<ffffffff81d4e62b>] x86_64_start_kernel+0x152/0x175

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 608bb146ff48588c43bf6332912ea8796768fb39)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: ignore invalid multicast address entries

This change fixes an issue with adding an invalid multicast address
using the iproute2 tool (ip maddr add <MADDR> dev <dev>). The iproute2
tool and the kernel do not validate or filter the multicast addresses
when adding them to the multicast list. Thus, when synchronizing this
list with an invalid entry, the action will be aborted with an error
since the fm10k driver currently validates the list. Consequently,
multicast entries beyond the invalid one will not be processed and
communicated with the switch via the mailbox. This change makes it so
that invalid addresses will simply be skipped and allows synchronizing
the full list to proceed.

Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 745136a8b72d638f3ee53a2b6a63f11c64a59937)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: fold fm10k_pull_tail into fm10k_add_rx_frag

This change folds the fm10k_pull_tail call into fm10k_add_rx_frag. The
advantage to doing this is that the fragment doesn't have to be modified
after it is added to the skb.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 1a8782e59ff9bb0ed4b75d6a346a2ed212bd2031)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: bump driver version

We haven't bumped the driver version in a while despite many fixes being
pulled in from the out-of-tree Sourceforge driver. Update the version to
match.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit e3b6e95d070cbca5e82279fea99e2dba8e38f960)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: TRIVIAL cleanup order at top of fm10k_xmit_frame

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 03d13a51fb4494f3bf47f65e2be00e56c36d2b63)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: Update adaptive ITR algorithm

The existing adaptive ITR algorithm is overly restrictive. It throttles
incorrectly for various traffic rates, and does not produce good
performance. The algorithm now allows for more interrupts per second,
and does some calculation to help improve for smaller packet loads. In
addition, take into account the new itr_scale from the hardware which
indicates how much to scale due to PCIe link speed.

Reported-by: Matthew Vick <matthew.vick@intel.com>
Reported-by: Alex Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 242722dd3d0af32703d4ebb4af63c92a2c85b835)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: introduce ITR_IS_ADAPTIVE macro

Define a macro for identifying when the itr value is dynamic or
adaptive. The concept was taken from i40e. This helps make clear what
the check is, and reduces the line length to something more reasonable
in a few places.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 584373f5b98aed81ff5a432d91b6e16d7554a5c9)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: Fix handling of NAPI budget when multiple queues are enabled per vector

This patch corrects an issue in which the polling routine would increase
the budget for Rx to at least 1 per queue if multiple queues were present.
This would result in Rx packets being processed when the budget was 0 which
is meant to indicate that no Rx can be handled.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 9f872986479b6e0543eb5c615e5f9491bb04e5c1)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

drivers/net/intel: use napi_complete_done()

As per Eric Dumazet's previous patches:
(see commit (24d2e4a50737) - tg3: use napi_complete_done())

Quoting verbatim:
Using napi_complete_done() instead of napi_complete() allows
us to use /sys/class/net/ethX/gro_flush_timeout

GRO layer can aggregate more packets if the flush is delayed a bit,
without having to set too big coalescing parameters that impact
latencies.
</end quote>

Tested
configuration: low latency via ethtool -C ethx adaptive-rx off
rx-usecs 10 adaptive-tx off tx-usecs 15
workload: streaming rx using netperf TCP_MAERTS

igb:
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result:  941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589

Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
Local  Remote  Local  Remote  Xfered   Per                 Per
Recv   Send    Recv   Send             Recv (avg)          Send (avg)
    8       8      0       0 1176930056  1475.36    797726   16384.00  71905

MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result:  941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763

Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
Local  Remote  Local  Remote  Xfered   Per                 Per
Recv   Send    Recv   Send             Recv (avg)          Send (avg)
    8       8      0       0 1175182320  50476.00     23282   16384.00  71816

i40e:
Hard to test because the traffic is incoming so fast (24Gb/s) that GRO
always receives 87kB, even at the highest interrupt rate.

Other drivers were only compile tested.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 32b3e08fff60494cd1d281a39b51583edfd2b18f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: do not use enum as boolean

Check for actual value NETREG_UNINITIALIZED in case it ever changes from
the current value of zero.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit b4a5127b03b7f0b06dcaf08a878ffc93ad53fa82)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: add support for extra debug statistics

Add a private ethtool flag to enable display of these statistics, which
are generally less useful. However, sometimes it can be useful for
debugging purposes. The most useful portion is the ability to see what
the PF thinks the VF mailboxes look like.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 80043f3bf5bdb187566620a8f183c15b94e961cb)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: send traffic on default VID to VLAN device if we have one

This patch ensures that VLAN traffic on the default VID will go to the
corresponding VLAN device if it exists. To do this, mask the rx_ring VID
if we have an active VLAN on that VID.

For this to work correctly, we need to update fm10k_process_skb_fields
to correctly mask off the VLAN_PRIO_MASK bits and compare them
separately, otherwise we incorrectly compare the priority bits with the
cleared flag. This also happens to fix a related bug where having
priority bits set causes us to incorrectly classify traffic.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit e71c9318428fb16de808b497e0229994010d32a1)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

fm10k: Don't assume page fragments are page size

This change pulls out the optimization that assumed that all fragments
would be limited to page size. That hasn't been the case for some time now
and to assume this is incorrect as the TCP allocator can provide up to a
32K page fragment.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit aae072e363bed4e91c00d57f753c799276ddb161)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

x86/mce: Detect local MCEs properly

Orabug: 25384378

Check the MCG_STATUS_LMCES bit on Intel to verify that current MCE is
local. It is always local on AMD.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Massaged it a bit. Reflowed comments. Shut up -Wmaybe-uninitialized. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462019637-16474-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit fead35c68926682c90c995f22b48f1c8d78865c1)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

x86/mce: Handle Local MCE events

Orabug: 25384378

Add the necessary changes to do_machine_check() to be able to
process MCEs signaled as local MCEs. Typically, only recoverable
errors (SRAR type) will be Signaled as LMCE. The architecture
does not restrict to only those errors, however.

When errors are signaled as LMCE, there is no need for the MCE
handler to perform rendezvous with other logical processors
unlike earlier processors that would broadcast machine check
errors.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 243d657eaf540db882f73497060da5a4f7d86a90)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

x86/mce: Add infrastructure to support Local MCE

Orabug: 25384378

Initialize and prepare for handling LMCEs. Add a boot-time
option to disable LMCEs.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[ Simplify stuff, align statements for better readability, reflow comments; kill
unused lmce_clear(); save us an MSR write if LMCE is already enabled. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 88d538672ea26223bca08225bc49f4e65e71683d)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

x86/mce: Add Local MCE definitions

Orabug: 25384378

Add required definitions to support Local Machine Check
Exceptions.

Historically, machine check exceptions on Intel x86 processors
have been broadcast to all logical processors in the system.
Upcoming CPUs will support an opt-in mechanism to request some
machine check exceptions be delivered to a single logical
processor experiencing the fault.

See http://www.intel.com/sdm Volume 3, System Programming Guide,
chapter 15 for more information on MSRs and documentation on
Local MCE.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit bc12edb8739be10fae9f86691a3708d36d00b4f8)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)

Orabug: 25291653
CVE: CVE-2016-9588

commit ef85b67385436ddc1998f45f1d6a210f935b3388 upstream.

When L2 exits to L0 due to "exception or NMI", software exceptions
(#BP and #OF) for which L1 has requested an intercept should be
handled by L1 rather than L0. Previously, only hardware exceptions
were forwarded to L1.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3f618a0b872fea38c7d1d1f79eda40f88c6466c2)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

Revert "crypto: aead - Convert top level interface to new style"

Orabug: 25243093

Revert this commit because old_crypt()/old_encrypt() are buggy
temporary interface.

This reverts commit 21579deee39629f2a68a37c42e89bdad53d88ebf.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

Revert "crypto: aead - Add new interface with single SG list"

Orabug: 25243093

Revert this commit because old_crypt()/old_encrypt() are buggy
temporary interface.

This reverts commit 10b7e301331eaf963b628025075ca4af756b9868.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: aesni - fix failing setkey for rfc4106-gcm-aesni

Orabug: 25243093

rfc4106(gcm(aes)) uses ctr(aes) to generate hash key. ctr(aes) needs
chainiv, but the chainiv gets initialized after aesni_intel when both
are statically linked so the setkey fails.
This patch forces aesni_intel to be initialized after chainiv.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0fbafd06bdde938884f7326548d3df812b267c3c)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: skcipher - Fix corner case in crypto_lookup_skcipher

Orabug: 25243093

When the user explicitly states that they don't care whether the
algorithm has been tested (type = CRYPTO_ALG_TESTED and mask = 0),
there is a corner case where we may erroneously return ENOENT.

This patch fixes it by correcting the logic in the test.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 26739535206e819946b0740347c09c94c4e48ba9)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: skcipher - Copy iv from desc even for 0-len walks

Orabug: 25243093

Some ciphers actually support encrypting zero length plaintexts. For
example, many AEAD modes support this. The resulting ciphertext for
those winds up being only the authentication tag, which is a result of
the key, the iv, the additional data, and the fact that the plaintext
had zero length. The blkcipher constructors won't copy the IV to the
right place, however, when using a zero length input, resulting in
some significant problems when ciphers call their initialization
routines, only to find that the ->iv parameter is uninitialized. One
such example of this would be using chacha20poly1305 with a zero length
input, which then calls chacha20, which calls the key setup routine,
which eventually OOPSes due to the uninitialized ->iv member.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 70d906bc17500edfa9bdd8c8b7e59618c7911613)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: gcm - Fix IV buffer size in crypto_gcm_setkey

Orabug: 25243093

The cipher block size for GCM is 16 bytes, and thus the CTR transform
used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
the code currently reserves only 8 bytes for the IV, causing
an out-of-bounds access in the CTR transform. This patch fixes
the issue by setting the size of the IV buffer to 16 bytes.

Fixes: 84c911523020 ("[CRYPTO] gcm: Add support for async ciphers")
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 50d2e6dc1f83db0563c7d6603967bf9585ce934b)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: ahash - Add padding in crypto_ahash_extsize

Orabug: 25243093

The function crypto_ahash_extsize did not include padding when
computing the tfm context size. This patch fixes this by using
the generic crypto_alg_extsize helper.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 2495cf25f60e085b35beb9b215235dfe1ca4f200)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: scatterwalk - Add no-copy support to copychunks

Orabug: 25243093

The function ablkcipher_done_slow is pretty much identical to
scatterwalk_copychunks except that it doesn't actually copy as
the processing hasn't been completed yet.

This patch allows scatterwalk_copychunks to be used in this case
by specifying out == 2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 85eccddee401ae81067e763516889780b5545160)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: scatterwalk - Hide PageSlab call to optimise away flush_dcache_page

Orabug: 25243093

On architectures where flush_dcache_page is not needed, we will
end up generating all the code up to the PageSlab call. This is
because PageSlab operates on a volatile pointer and thus cannot
be optimised away.

This patch works around this by checking whether flush_dcache_page
is needed before we call PageSlab which then allows PageSlab to be
compiled awy.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 160544075f2a4028209721723a51f16add7b08b9)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: scatterwalk - Add missing sg_init_table to scatterwalk_ffwd

Orabug: 25243093

We need to call sg_init_table as otherwise the first entry may
inadvertently become the last.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit fdaef75f66bba5999a94f3cd9156bf353ba2ef98)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: scatterwalk - Check for same address in map_and_copy

Orabug: 25243093

This patch adds a check for in scatterwalk_map_and_copy to avoid
copying from the same address to the same address. This is going
to be used for IV copying in AEAD IV generators.

There is no provision for partial overlaps.

This patch also uses the new scatterwalk_ffwd instead of doing
it by hand in scatterwalk_map_and_copy.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 74412fd5d71b6eda0beb302aa467da000f0d530c)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

macsec: fix negative refcnt on parent link

Orabug: 25243093

When creation of a macsec device fails because an identical device
already exists on this link, the current code decrements the refcnt on
the parent link (in ->destructor for the macsec device), but it had not
been incremented yet.

Move the dev_hold(parent_link) call earlier during macsec device
creation.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0759e552bce7257db542e203a01a9ef8843c751e)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

macsec: RXSAs don't need to hold a reference on RXSCs

Orabug: 25243093

Following the previous patch, RXSCs are held and properly refcounted in
the RX path (instead of being implicitly held by their SA), so the SA
doesn't need to hold a reference on its parent RXSC.

This also avoids panics on module unload caused by the double layer of
RCU callbacks (call_rcu frees the RXSA, which puts the final reference
on the RXSC and allows to free it in its own call_rcu) that commit
b196c22af5c3 ("macsec: add rcu_barrier() on module exit") didn't
protect against.
There were also some refcounting bugs in macsec_add_rxsa where I didn't
put the reference on the RXSC on the error paths, which would lead to
memory leaks.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 36b232c880c99fc03e135198c7c08d3d4b4f83ab)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

macsec: fix reference counting on RXSC in macsec_handle_frame

Orabug: 25243093

Currently, we lookup the RXSC without taking a reference on it. The
RXSA holds a reference on the RXSC, but the SA and SC could still both
disappear before we take a reference on the SA.

Take a reference on the RXSC in macsec_handle_frame.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c78ebe1df01f4ef3fb07be1359bc34df6708d99c)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Conflicts:
drivers/net/macsec.c

macsec: ensure rx_sa is set when validation is disabled

Orabug: 25243093

macsec_decrypt() is not called when validation is disabled and so
macsec_skb_cb(skb)->rx_sa is not set; but it is used later in
macsec_post_decrypt(), ensure that it's always initialized.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e3a3b626010a14fe067f163c2c43409d5afcd2a9)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: testmgr - don't copy from source IV too much

Orabug: 25243093

While the destination buffer 'iv' is MAX_IVLEN size,
the source 'template[i].iv' could be smaller, thus
memcpy may read read invalid memory.
Use crypto_skcipher_ivsize() to get real ivsize
and pass it to memcpy.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 84cba178a3b88efe2668a9039f70abda072faa21)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

gcm - Fix rfc4543 decryption crash

Orabug: 25243093

This bug has already bee fixed upstream since 4.2. However, it
was fixed during the AEAD conversion so no fix was backported to
the older kernels.

When we do an RFC 4543 decryption, we will end up writing the
ICV beyond the end of the dst buffer. This should lead to a
crash but for some reason it was never noticed.

This patch fixes it by only writing back the ICV for encryption.

Fixes: d733ac90f9fe ("crypto: gcm - fix rfc4543 to handle async...")
Reported-by: Patrick Meyer <patrick.meyer@vasgard.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: tcrypt - Handle async return from crypto_ahash_init

Orabug: 25243093

The function crypto_ahash_init can also be asynchronous just
like update and final. So all callers must be able to handle
an async return.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 43a9607d86e8fb110b596d300dbaae895c198fed)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: tcrypt - Fix AEAD speed tests

Orabug: 25243093

The AEAD speed tests doesn't do a wait_for_completition,
if the return value is EINPROGRESS or EBUSY.
Fixing it here.
Also add a test case for gcm(aes).

Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 1425d2d17f7309c65e2bd124e0ce8ace743b17f2)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: qat: fix issue when mapping assoc to internal AD struct

Orabug: 25243093

This patch fixes an issue when building an internal AD representation.
We need to check assoclen and not only blindly loop over assoc sgl.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ecb479d07a4e2db980965193511dbf2563d50db5)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: testmgr - fix overlap in chunked tests again

Orabug: 25243093

Commit 7e4c7f17cde2 ("crypto: testmgr - avoid overlap in chunked tests")
attempted to address a problem in the crypto testmgr code where chunked
test cases are copied to memory in a way that results in overlap.

However, the fix recreated the exact same issue for other chunked tests,
by putting IDX3 within 492 bytes of IDX1, which causes overlap if the
first chunk exceeds 492 bytes, which is the case for at least one of
the xts(aes) test cases.

So increase IDX3 by another 1000 bytes.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 04b46fbdea5e31ffd745a34fa61269a69ba9f47a)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: testmgr - avoid overlap in chunked tests

Orabug: 25243093

The IDXn offsets are chosen such that tap values (which may go up to
255) end up overlapping in the xbuf allocation. In particular, IDX1
and IDX3 are too close together, so update IDX3 to avoid this issue.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7e4c7f17cde280079db731636175b1732be7188c)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone

Orabug: 25243093

The offset advance in scatterwalk_pagedone not only is unnecessary,
but it was also buggy when it was needed by scatterwalk_copychunks.
As the latter has long ago been fixed to call scatterwalk_advance
directly, we can remove this unnecessary offset adjustment.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 28cf86fafdd663cfcad3c5a5fe9869f1fa01b472)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start

Orabug: 25243093

Nothing bad will happen even if sg->length is zero, so there is
no point in keeping this BUG_ON in scatterwalk_start.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 2ee732d57496b8365819dfb958bc1ff04fcd4cac)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: cryptd - Use crypto_grab_aead

Orabug: 25483918
Orabug: 25243093

As AEAD has switched over to using frontend types, the function
crypto_init_spawn must not be used since it does not specify a
frontend type. Otherwise it leads to a crash when the spawn is
used.

This patch fixes it by switching over to crypto_grab_aead instead.

Fixes: 5d1d65f8bea6 ("crypto: aead - Convert top level interface to new style")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 9b8c456e081e7eca856ad9b2a92980a68887f533)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: testmgr - fix out of bound read in __test_aead()

Orabug: 25243093

__test_aead() reads MAX_IVLEN bytes from template[i].iv, but the
actual length of the initialisation vector can be shorter.
The length of the IV is already calculated earlier in the
function. Let's just reuses that. Also the IV length is currently
calculated several time for no reason. Let's fix that too.
This fix an out-of-bound error detected by KASan.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit abfa7f4357e3640fdee87dfc276fd0f379fb5ae6)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

crypto: algif_aead - fix for multiple operations on AF_ALG sockets

Orabug: 25243093

The tsgl scatterlist must be re-initialized after each
operation. Otherwise the sticky bits in the page_link will corrupt the
list with pre-mature termination or false chaining.

Signed-off-by: Lars Persson <larper@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit bf433416e67597ba105ece55b3136557874945db)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

netvsc: fix incorrect receive checksum offloading

The Hyper-V netvsc driver was looking at the incorrect status bits
in the checksum info. It was setting the receive checksum unnecessary
flag based on the IP header checksum being correct. The checksum
flag is skb is about TCP and UDP checksum status. Because of this
bug, any packet received with bad TCP checksum would be passed
up the stack and to the application causing data corruption.
The problem is reproducible via netcat and netem.

This had a side effect of not doing receive checksum offload
on IPv6. The driver was also also always doing checksum offload
independent of the checksum setting done via ethtool.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25219569

(cherry picked from commit e52fed7177f74382f742c27de2cc5314790aebb6)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

KVM: x86: drop error recovery in em_jmp_far and em_ret_far

Orabug: 25190929
CVE: CVE-2016-9756

em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.
We could have fixed that by always saving and restoring the CS, but we
take a simpler approach and just break any guest that manages to fail
as the error recovery is error-prone and modern CPUs don't need emulator
for this.

Found by syzkaller:

  WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   [...]
  Call Trace:
   [...] __dump_stack lib/dump_stack.c:15
   [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
   [...] panic+0x1b7/0x3a3 kernel/panic.c:179
   [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
   [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
   [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
   [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
   [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
   [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
   [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
   [...] complete_emulated_io arch/x86/kvm/x86.c:6870
   [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
   [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
   [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
   [...] vfs_ioctl fs/ioctl.c:43
   [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
   [...] SYSC_ioctl fs/ioctl.c:694
   [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
   [...] entry_SYSCALL_64_fastpath+0x1f/0xc2

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit 2117d5398c81554fbf803f5fd1dc55eb78216c0c)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

hv: do not lose pending heartbeat vmbus packets

The host keeps sending heartbeat packets independent of the
guest responding to them. Even though we respond to the heartbeat messages at
interrupt level, we can have situations where there maybe multiple heartbeat
messages pending that have not been responded to. For instance this occurs when the
VM is paused and the host continues to send the heartbeat messages.
Address this issue by draining and responding to all
the heartbeat messages that maybe pending.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 25144648

(cherry picked from commit 407a3aee6ee2d2cb46d9ba3fc380bc29f35d020c)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

tcp: take care of truncations done by sk_filter()

Orabug: 25104761
CVE: CVE-2016-8645

With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()

Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.

We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq

Many thanks to syzkaller team and Marco for giving us a reproducer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Reported-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ac6e780070e30e4c35bd395acfe9191e6268bdd3)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
include/net/tcp.h
net/ipv4/tcp_ipv4.c

rose: limit sk_filter trim to payload

Orabug: 25104761
CVE: CVE-2016-8645

Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.

Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.

Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f4979fcea7fd36d8e2f556abef86f80e0d5af1ba)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
net/core/filter.c

tipc: check minimum bearer MTU

Orabug: 25063416
CVE: CVE-2016-8632

Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.

As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.

Conflicts:
net/tipc/bearer.c
net/tipc/bearer.h

Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3de81b758853f0b29c61e246679d20b513c4cfec)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

fix minor infoleak in get_user_ex()

get_user_ex(x, ptr) should zero x on failure. It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1c109fabbd51863475cd12ac206bdd249aee35af)

Orabug: 25063299
CVE-2016-9178

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: arcmsr: Simplify user_len checking

Do the user_len check first and then the ver_addr allocation so that we
can save us the kfree() on the error path when user_len is >
ARCMSR_API_DATA_BUFLEN.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Marco Grassi <marco.gra@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tomas Henzl <thenzl@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4bd173c30792791a6daca8c64793ec0a4ae8324f)

Orabug: 24710898
CVE-2016-7425

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()

We need to put an upper bound on "user_len" so the memcpy() doesn't
overflow.

Cc: <stable@vger.kernel.org>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7bc2b55a5c030685b399bb65b6baa9ccc3d1f167)

Orabug: 24710898
CVE-2016-7425

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

tmpfs: clear S_ISGID when setting posix ACLs

This change was missed the tmpfs modification in In CVE-2016-7097
commit 073931017b49 ("posix_acl: Clear SGID bit when setting
file permissions")
It can test by xfstest generic/375, which failed to clear
setgid bit in the following test case on tmpfs:

  touch $testfile
  chown 100:100 $testfile
  chmod 2755 $testfile
  _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile

Signed-off-by: Gu Zheng <guzheng1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 497de07d89c1410d76a15bec2bb41f24a2a89f31)

Orabug: 24587481
CVE-2016-7097

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

posix_acl: Clear SGID bit when setting file permissions

When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows to bypass the check in chmod(2). Fix that.

References: CVE-2016-7097
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 073931017b49d9458aa351605b43a7e34598caef)

Orabug: 24587481
CVE-2016-7097

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
fs/f2fs/acl.c
fs/orangefs/acl.c

coredump: Ensure proper size of sparse core files

If the last section of a core file ends with an unmapped or zero page,
the size of the file does not correspond with the last dump_skip() call.
gdb complains that the file is truncated and can be confusing to users.

After all of the vma sections are written, make sure that the file size
is no smaller than the current file position.

This problem can be demonstrated with gdb's bigcore testcase on the
sparc architecture.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Orabug: 22106344
Mainline v4.10 commit 4d22c75d4c7b5c5f4bd31054f09103ee490878fd

dccp: fix freeing skb too early for IPV6_RECVPKTINFO

In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
is forcibly freed via __kfree_skb in dccp_rcv_state_process if
dccp_v6_conn_request successfully returns.

However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
is saved to ireq->pktopts and the ref count for skb is incremented in
dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
in dccp_rcv_state_process.

Fix by calling consume_skb instead of doing goto discard and therefore
calling __kfree_skb.

Similar fixes for TCP:

fb7e2399ec17f1004c0e0ccfd17439f8759ede01 [TCP]: skb is unexpectedly freed.
0aea76d35c9651d55bbaf746e7914e5f9ae5a25d tcp: SYN packets are now
simply consumed

Orabug: 25585296
CVE-2017-6074

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Aniket Alshi <aniket.alshi@oracle.com>
(cherry picked from commit 5edabca9d4cff7f1f2b68f0bac55ef99d9798ba4)
Reviewed-by: Brian Maly <brian.maly@oracle.com>

net: Documentation: Fix default value tcp_limit_output_bytes

Commit c39c4c6abb89 ("tcp: double default TSQ output bytes limit")
updated default value for tcp_limit_output_bytes

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 821b414405a78c3d38921c2545b492eb974d3814)
Oracle-Bug: 25424818
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Raj Matharasi <raj.matharasi@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

tcp: double default TSQ output bytes limit

Xen virtual network driver has higher latency than a physical NIC.
Having only 128K as limit for TSQ introduced 30% regression in guest
throughput.

This patch raises the limit to 256K. This reduces the regression to 8%.
This buys us more time to work out a proper solution in the long run.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c39c4c6abb89d24454b63798ccbae12b538206a5)
Oracle-Bug: 25424818
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Raj Matharasi <raj.matharasi@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

scsi: qla2xxx: Get mutex lock before checking optrom_state

There is a race condition with qla2xxx optrom functions where one thread
might modify optrom buffer, optrom_state while other thread is still
reading from it.

In couple of crashes, it was found that we had successfully passed the
following 'if' check where we confirm optrom_state to be
QLA_SREADING. But by the time we acquired mutex lock to proceed with
memory_read_from_buffer function, some other thread/process had already
modified that option rom buffer and optrom_state from QLA_SREADING to
QLA_SWAITING. Then we got ha->optrom_buffer 0x0 and crashed the system:

        if (ha->optrom_state != QLA_SREADING)
                return 0;

        mutex_lock(&ha->optrom_mutex);
        rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
            ha->optrom_region_size);
        mutex_unlock(&ha->optrom_mutex);

With current optrom function we get following crash due to a race
condition:

[ 1479.466679] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1479.466707] IP: [<ffffffff81326756>] memcpy+0x6/0x110
[...]
[ 1479.473673] Call Trace:
[ 1479.474296]  [<ffffffff81225cbc>] ? memory_read_from_buffer+0x3c/0x60
[ 1479.474941]  [<ffffffffa01574dc>] qla2x00_sysfs_read_optrom+0x9c/0xc0 [qla2xxx]
[ 1479.475571]  [<ffffffff8127e76b>] read+0xdb/0x1f0
[ 1479.476206]  [<ffffffff811fdf9e>] vfs_read+0x9e/0x170
[ 1479.476839]  [<ffffffff811feb6f>] SyS_read+0x7f/0xe0
[ 1479.477466]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b

Below patch modifies qla2x00_sysfs_read_optrom,
qla2x00_sysfs_write_optrom functions to get the mutex_lock before
checking ha->optrom_state to avoid similar crashes.

The patch was applied and tested and same crashes were no longer
observed again.

Orabug: 25344639

Tested-by: Milan P. Gandhi <mgandhi@redhat.com>
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c7702b8c22712a06080e10f1d2dee1a133ec8809)
Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

kvm: x86: Check memopp before dereference (CVE-2016-8630)

Orabug: 25133227
CVE: CVE-2016-8630

Commit 41061cdb98 ("KVM: emulate: do not initialize memopp") removes a
check for non-NULL under incorrect assumptions. An undefined instruction
with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt
to dereference a null pointer here.

Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
Message-Id: <1477592752-126650-2-git-send-email-osh@google.com>
Signed-off-by: Owen Hofmann <osh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d9092f52d7e61dd1557f2db2400ddb430e85937e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

firewire: net: guard against rx buffer overflows

Orabug: 25063191
CVE: CVE-2016-8633

The IP-over-1394 driver firewire-net lacked input validation when
handling incoming fragmented datagrams.  A maliciously formed fragment
with a respectively large datagram_offset would cause a memcpy past the
datagram buffer.

So, drop any packets carrying a fragment with offset + length larger
than datagram_size.

In addition, ensure that
  - GASP header, unfragmented encapsulation header, or fragment
    encapsulation header actually exists before we access it,
  - the encapsulated datagram or fragment is of nonzero size.

Reported-by: Eyal Itkin <eyal.itkin@gmail.com>
Reviewed-by: Eyal Itkin <eyal.itkin@gmail.com>
Fixes: CVE 2016-8633
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
(cherry picked from commit 667121ace9dbafb368618dbabcf07901c962ddac)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

USB: usbfs: fix potential infoleak in devio

Orabug: 23267548
CVE: CVE-2016-4482

The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
are padding bytes which are not initialized and leaked to userland
via “copy_to_user”.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 681fef8380eb818c0b845fca5d2ab1dcbab114ee)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

usbnet: cleanup after bind() in probe()

In case bind() works, but a later error forces bailing
in probe() in error cases work and a timer may be scheduled.
They must be killed. This fixes an error case related to
the double free reported in
http://www.spinics.net/lists/netdev/msg367669.html
and needs to go on top of Linus' fix to cdc-ncm.

Orabug: 23070825
CVE-2016-3951

Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1666984c8625b3db19a9abc298931d35ab7bc64b)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind

usbnet_link_change will call schedule_work and should be
avoided if bind is failing. Otherwise we will end up with
scheduled work referring to a netdev which has gone away.

Instead of making the call conditional, we can just defer
it to usbnet_probe, using the driver_info flag made for
this purpose.

Orabug: 23070825
CVE-2016-3951

Fixes: 8a34b0ae8778 ("usbnet: cdc_ncm: apply usbnet_link_change")
Reported-by: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4d06dd537f95683aba3651098ae288b7cbff8274)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

cdc_ncm: Add support for moving NDP to end of NCM frame

NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should be always usable, still
stay on the safe side.

This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position), a Huawei E3531 (also working both
ways) and an E3372 (which mandates NDP to be after indexed datagrams).

V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
V2->V3:
- Properly account for the NDP size when writing new packets to SKB

Orabug: 23070825
CVE-2016-3951

Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4a0e3e989d66bb7204b163d9cfaa7fa96d0f2023)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

x86/mm/32: Enable full randomization on i386 and X86_32

Currently on i386 and on X86_64 when emulating X86_32 in legacy mode, only
the stack and the executable are randomized but not other mmapped files
(libraries, vDSO, etc.). This patch enables randomization for the
libraries, vDSO and mmap requests on i386 and in X86_32 in legacy mode.

By default on i386 there are 8 bits for the randomization of the libraries,
vDSO and mmaps which only uses 1MB of VA.

This patch preserves the original randomness, using 1MB of VA out of 3GB or
4GB. We think that 1MB out of 3GB is not a big cost for having the ASLR.

The first obvious security benefit is that all objects are randomized (not
only the stack and the executable) in legacy mode which highly increases
the ASLR effectiveness, otherwise the attackers may use these
non-randomized areas. But also sensitive setuid/setgid applications are
more secure because currently, attackers can disable the randomization of
these applications by setting the ulimit stack to "unlimited". This is a
very old and widely known trick to disable the ASLR in i386 which has been
allowed for too long.

Another trick used to disable the ASLR was to set the ADDR_NO_RANDOMIZE
personality flag, but fortunately this doesn't work on setuid/setgid
applications because there is security checks which clear Security-relevant
flags.

This patch always randomizes the mmap_legacy_base address, removing the
possibility to disable the ASLR by setting the stack to "unlimited".

Orabug: 23070708
CVE-2016-3672

Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Acked-by: Ismael Ripoll Ripoll <iripoll@upv.es>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1457639460-5242-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8b8addf891de8a00e4d39fc32f93f7c5eb8feceb)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>