]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
8 years agofm10k: Only trigger data path reset if fabric is up
Alexander Duyck [Wed, 24 Jun 2015 20:34:49 +0000 (13:34 -0700)]
fm10k: Only trigger data path reset if fabric is up

This change makes it so that we only trigger the data path reset if the
fabric is ready to handle traffic.  The general idea is to avoid
triggering the reset unless the switch API is ready for us.  Otherwise
we can just postpone the reset until we receive a switch ready
notification.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit ac98100359e098d03dbd98783ca4becaf2ea7ec3)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: re-enable VF after a full reset on detection of a Malicious event
Jacob Keller [Wed, 24 Jun 2015 20:34:48 +0000 (13:34 -0700)]
fm10k: re-enable VF after a full reset on detection of a Malicious event

Modify behavior of Malicious Driver Detection events. Presently, the
hardware disables the VF queues and re-assigns them to the PF. This
causes the VF in question to continuously Tx hang, because it assumes
that it can transmit over the queues in question. For transient events,
this results in continuous logging of malicious events.

New behavior is to reset the LPORT and VF state, so that the VF will
have to reset and re-enable itself. This does mean that malicious VFs
will possibly be able to continue and attempt malicious events again.
However, it is expected that system administrators will step in and
manually remove or disable the VF in question.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 95f4f8da644256d8c0ff5bab2c93ba33d2a42cd8)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: TRIVIAL fix typo in fm10k_netdev.c
Jacob Keller [Wed, 24 Jun 2015 20:34:47 +0000 (13:34 -0700)]
fm10k: TRIVIAL fix typo in fm10k_netdev.c

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 5c2d642fd0cf0f4f7396dfd3754bc03f6e50e359)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: TRIVIAL fix up ordering of __always_unused and style
Jacob Keller [Wed, 24 Jun 2015 20:34:44 +0000 (13:34 -0700)]
fm10k: TRIVIAL fix up ordering of __always_unused and style

Fix some style issues in debugfs code, and correct ordering of void and
__always_unused. Technically, the order does not matter, but preferred
style is to put the macro between the type and name.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 6fbc6b358b3f63e701d15e9d330a3b0dbb03f802)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: remove is_slot_appropriate
Jacob Keller [Wed, 24 Jun 2015 20:34:41 +0000 (13:34 -0700)]
fm10k: remove is_slot_appropriate

This function is no longer used now that we have updated fm10k_slot_warn
functionality.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 855c40fc31ca2392845558235f7e92e406936fbe)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: don't store sw_vid at reset
Jacob Keller [Fri, 19 Jun 2015 17:56:10 +0000 (10:56 -0700)]
fm10k: don't store sw_vid at reset

If we store the sw_vid at reset of PF, then we accidentally prevent the
VF from receiving the message to update its default VID. This only
occurs if the VF is created before the PF has come up, which is the
standard way of creating VFs when using the module parameter.

This fixes an issue where we request the incorrect MAC/VLAN
combinations, and prevents us from accidentally reporting some frames as
VLAN tagged.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit b655a5c735867c1a80e7ceaa1f1c45d1dd34909f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: Report MAC address on driver load
Alexander Duyck [Fri, 19 Jun 2015 02:41:10 +0000 (19:41 -0700)]
fm10k: Report MAC address on driver load

This change adds the MAC address to the list of values recorded on driver
load.  The MAC address represents the serial number of the unit and allows
us to track the value should a card be replaced in a system.

The log message should now be similar in output to that of ixgbe.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 0ff36676a3778d0655933ace201fca7c11b4e8b5)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: update netdev perm_addr during reinit, instead of at up
Jacob Keller [Mon, 15 Jun 2015 22:00:56 +0000 (15:00 -0700)]
fm10k: update netdev perm_addr during reinit, instead of at up

Update the netdev permanent address during fm10k_reinit enables the user
to immediately see the new MAC address on the VF even if the device
isn't up. The previous code required that the device by opened before
changes would appear.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit bdc7f5902d22e7297b8f0d7a8d6ed8429cdba636)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: update fm10k_slot_warn to use pcie_get_minimum link
Jacob Keller [Mon, 15 Jun 2015 22:00:55 +0000 (15:00 -0700)]
fm10k: update fm10k_slot_warn to use pcie_get_minimum link

This is useful in cases where we connect to a slot at Gen3, but the slot
is behind a bus which only connected at Gen2. This generally only
happens when a PCIe switch is in the sequence of devices, and can be
very confusing when you see slow performance with no obvious cause.

I am aware this patch has a few lines that break 80 characters, but
there does not seem to be a readable way to format them to less than 80
characters. Suggestions welcome.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 106c07a49506359f7663770ef33f4997d68316c1)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: disable service task during suspend
Jacob Keller [Mon, 15 Jun 2015 22:00:51 +0000 (15:00 -0700)]
fm10k: disable service task during suspend

The service task reads some registers as part of its normal routine,
even while the interface is down. Normally this is ok. However, during
suspend we have disabled the PCI device. Due to this, registers will
read in the same way as a surprise-remove event. Disable the service
task while we suspend, and re-enable it after we resume. If we don't do
this, the device could be UP when you suspend and come back from resume
as closed (since fm10k closes the device when it gets a surprise
remove).

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit e40296628bec7400f529927eef4bc87cb425a22a)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: fix iov_msg_lport_state_pf issue
Jacob Keller [Wed, 3 Jun 2015 23:31:12 +0000 (16:31 -0700)]
fm10k: fix iov_msg_lport_state_pf issue

When a VF issues an LPORT_STATE request to enable a port that is already
enabled, the PF will first disable the VF LPORT. Then it should
re-enable the VF again with the new requested settings. This ensures
that any switch rules are cleared by deleting the LPORT on the switch.
However, the flow is bugged because we actually check if the VF is
enabled at the end, and thus don't re-enable it. Fix the flow so that we
actually clear the enabled flags as part of our removal of the LPORT.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit ee4373e7d74696821e47faf1b70f779697ddf77b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: remove err_no reference in fm10k_mbx.c
Jacob Keller [Wed, 3 Jun 2015 23:31:11 +0000 (16:31 -0700)]
fm10k: remove err_no reference in fm10k_mbx.c

The reference to err_no was left around after a previous code refactor.
We never use the value, and it doesn't seem to be used in side a hidden
macro reference. Discovered via cppcheck.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit d18c438884137609bf838a703972126862db97f4)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: fix incorrect DIR_NEVATIVE bit in 1588 code
Jacob Keller [Wed, 3 Jun 2015 23:31:10 +0000 (16:31 -0700)]
fm10k: fix incorrect DIR_NEVATIVE bit in 1588 code

The SYSTIME_CFG.Adjust Direction bit is actually supposed to indicate
that the adjustment is positive. Fix the code to align correctly with
hardware and documentation.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 646725a7c9cbdefd8df4ae7b3217a992ad64a4cd)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: pack TLV overlay structures
Jacob Keller [Wed, 3 Jun 2015 23:31:09 +0000 (16:31 -0700)]
fm10k: pack TLV overlay structures

This patch adds the __attribute__((packed)) indicator to some structures
which are overlayed onto a TLV message. These structures must be packed
as small as possible in order to correctly align when copied into the
mailbox buffer. Without doing so, the receiving mailbox code incorrectly
parses the values and we get invalid message responses from the switch
manager software.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 7fef39322ce7a0c1bbb5b48bef61b6c1aef73b96)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: re-map all possible VF queues after a VFLR
Jacob Keller [Wed, 3 Jun 2015 23:31:08 +0000 (16:31 -0700)]
fm10k: re-map all possible VF queues after a VFLR

During initialization, the VF counts its rings by walking the TQDLOC
registers. This works only if the TQMAP/RQMAP registers are set to map
all of the out-of-bound rings back to the first one. This allows the VF
to cleanly detect when it has run out of queues. Update the PF code so
that it resets the empty TQMAP/RQMAP registers post-VFLR to prevent
innocent VF drivers from triggering malicious driver events.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit fba341d5cab38db68eed061cf20e161d967795c6)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: force LPORT delete when updating VLAN or MAC address
Jacob Keller [Wed, 3 Jun 2015 23:31:07 +0000 (16:31 -0700)]
fm10k: force LPORT delete when updating VLAN or MAC address

Currently, we don't notify the switch at all when the PF
administratively sets a new VLAN or MAC address. This causes the old
addresses to remain valid on the switch table. Since the PF is
overriding any configuration done directly by the VF, we choose to
simply re-create the LPORT for the VF. This does mean that all rules for
the VF will be dropped when we set something directly via the PF, but it
prevents some weird issues where the MAC/VLAN table retains some stale
configuration.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit a38488f54004330071f4ec90c3cf52dc7646468e)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: use dma_set_mask_and_coherent in fm10k_probe
Jacob Keller [Tue, 16 Jun 2015 20:41:43 +0000 (13:41 -0700)]
fm10k: use dma_set_mask_and_coherent in fm10k_probe

This patch cleans up the use of dma_get_required_mask and uses the
simpler dma_set_mask_and_coherent function instead of doing these as
separate steps.

I removed the dma_get_required_mask call because based on some minimal
testing it appears that either (a) we're not doing the right thing with
the call or (b) we don't need it anyways. If the value returned is
<48bits, we'll end up trying with 48 bits anyways. If it's over 48bits,
fm10k can't support that anyways, and we should try 48bits. If 48bits
fails, we'll fallback to 32bits. This cleans up some very funky code.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit c04ae58e2b6d836325914e7b4aa25f43da1be3df)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: trivial fixup message style to include a colon
Jacob Keller [Tue, 16 Jun 2015 20:40:32 +0000 (13:40 -0700)]
fm10k: trivial fixup message style to include a colon

Also use %d for error values, since printing in hexadecimal is probably
not helpful.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 0197cde62a0a39fdee606557b8cef6573a83e866)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: remove extraneous NULL check on l2_accel
Jacob Keller [Wed, 3 Jun 2015 23:31:04 +0000 (16:31 -0700)]
fm10k: remove extraneous NULL check on l2_accel

l2_accel was checked for NULL at the top of fm10k_dfwd_del_station, and
we return if it is not defined. Due to this, we already know it can't be
null here so a separate check is meaningless. Discovered via cppcheck.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit f1f3322eb411ce875b8b9e2de59ee8d55fb3a33b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: add call to fm10k_clean_all_rx_rings in fm10k_down
Jacob Keller [Wed, 3 Jun 2015 23:31:02 +0000 (16:31 -0700)]
fm10k: add call to fm10k_clean_all_rx_rings in fm10k_down

This prevents a memory leak in fm10k_set_ringparams. The leak occurs
because we go down, change ring parameters, and then come up. However,
fm10k_down on its own is not clearing the Rx rings. Since fm10k_up
assumes the rings are clean we basically drop the buffers and leak a
bunch of memory. Eventually we hit dirty page faults and reboot the
system. This issue does not occur elsewhere because other flows that
involve fm10k_down go through fm10k_close which immediately called
fm10k_free_all_rx_resources which properly cleans the rings.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit ec6acb801e7b2908c24a60c8aabf47c3e40508a4)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: fix incorrect free on skb in ts_tx_enqueue
Jacob Keller [Wed, 3 Jun 2015 23:31:01 +0000 (16:31 -0700)]
fm10k: fix incorrect free on skb in ts_tx_enqueue

This patch resolves a bug in the ts_tx_enqueue code responsible for a
NULL pointer dereference and invalid access of the skb list. We
incorrectly freed the actual skb we found instead of our copy. Thus the
skb queue is essentially invalidated. Resolve this by freeing our clone
in the cases where we did not add it to the queue. This also avoids the
skb memory leak caused by failure to free the clone.

[  589.719320] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  589.722344] IP: [<ffffffffa0310e60>] fm10k_ts_tx_subtask+0xb0/0x160 [fm10k]
[  589.723796] PGD 0
[  589.725228] Oops: 0000 [#1] SMP

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit c23544b196e72716244108fb173f2965e9eafd1a)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: move setting shinfo inside ts_tx_enqueue
Jacob Keller [Wed, 3 Jun 2015 23:31:00 +0000 (16:31 -0700)]
fm10k: move setting shinfo inside ts_tx_enqueue

This patch simplifies the code flow for setting the IN_PROGRESS bit of
the shinfo for an skb we will be timestamping.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit e075996ebd14dcd3d1a45172316141fe26d891fa)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: use correct ethernet driver Tx timestamp function
Jacob Keller [Wed, 3 Jun 2015 23:30:59 +0000 (16:30 -0700)]
fm10k: use correct ethernet driver Tx timestamp function

skb_complete_tx_timestamp is intended for use by PHY drivers which
implement a different method of returning timestamps. This method is
intended to be used after a PHY driver accepts a cloned packet via its
phy_driver.txtstamp function. It is not correct to use in the standard
ethernet driver such as fm10k. This patch fixes the following possible
kernel panic.

[ 2744.552896] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  OE  3.19.3-200.fc21.x86_64 #1
[ 2744.552899] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.8x23.060520140825 06/05/2014
[ 2744.552901]  0000000000000000 2f4c8b10ea3f9848 ffff88081ee03a38 ffffffff8176e215
[ 2744.552906]  0000000000000000 0000000000000000 ffff88081ee03a78 ffffffff8109bc1a
[ 2744.552910]  ffff88081ee03c50 ffff88080e55fc00 ffff88080e55fc00 ffffffff81647c50
[ 2744.552914] Call Trace:
[ 2744.552917]  <IRQ>  [<ffffffff8176e215>] dump_stack+0x45/0x57
[ 2744.552931]  [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
[ 2744.552936]  [<ffffffff81647c50>] ? skb_queue_purge+0x20/0x40
[ 2744.552941]  [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
[ 2744.552946]  [<ffffffff81646911>] skb_release_head_state+0xe1/0xf0
[ 2744.552950]  [<ffffffff81647b26>] skb_release_all+0x16/0x30
[ 2744.552954]  [<ffffffff81647ba6>] kfree_skb+0x36/0x90
[ 2744.552958]  [<ffffffff81647c50>] skb_queue_purge+0x20/0x40
[ 2744.552964]  [<ffffffff81751f8d>] packet_sock_destruct+0x1d/0x90
[ 2744.552968]  [<ffffffff81642053>] __sk_free+0x23/0x140
[ 2744.552973]  [<ffffffff81642189>] sk_free+0x19/0x20
[ 2744.552977]  [<ffffffff81647d60>] skb_complete_tx_timestamp+0x50/0x60
[ 2744.552988]  [<ffffffffa02eee40>] fm10k_ts_tx_hwtstamp+0xd0/0x100 [fm10k]
[ 2744.552994]  [<ffffffffa02e054e>] fm10k_1588_msg_pf+0x12e/0x140 [fm10k]
[ 2744.553002]  [<ffffffffa02edf1d>] fm10k_tlv_msg_parse+0x8d/0xc0 [fm10k]
[ 2744.553010]  [<ffffffffa02eb2d0>] fm10k_mbx_dequeue_rx+0x60/0xb0 [fm10k]
[ 2744.553016]  [<ffffffffa02ebf98>] fm10k_sm_mbx_process+0x178/0x3c0 [fm10k]
[ 2744.553022]  [<ffffffffa02e09ca>] fm10k_msix_mbx_pf+0xfa/0x360 [fm10k]
[ 2744.553030]  [<ffffffff811030a7>] ? get_next_timer_interrupt+0x1f7/0x270
[ 2744.553036]  [<ffffffff810f2a47>] handle_irq_event_percpu+0x77/0x1a0
[ 2744.553041]  [<ffffffff810f2bab>] handle_irq_event+0x3b/0x60
[ 2744.553045]  [<ffffffff810f5d6e>] handle_edge_irq+0x6e/0x120
[ 2744.553054]  [<ffffffff81017414>] handle_irq+0x74/0x140
[ 2744.553061]  [<ffffffff810bb54a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 2744.553066]  [<ffffffff8177777f>] do_IRQ+0x4f/0xf0
[ 2744.553072]  [<ffffffff8177556d>] common_interrupt+0x6d/0x6d
[ 2744.553074]  <EOI>  [<ffffffff81609b16>] ? cpuidle_enter_state+0x66/0x160
[ 2744.553084]  [<ffffffff81609b01>] ? cpuidle_enter_state+0x51/0x160
[ 2744.553087]  [<ffffffff81609cf7>] cpuidle_enter+0x17/0x20
[ 2744.553092]  [<ffffffff810de101>] cpu_startup_entry+0x321/0x3c0
[ 2744.553098]  [<ffffffff81764497>] rest_init+0x77/0x80
[ 2744.553103]  [<ffffffff81d4f02c>] start_kernel+0x4a4/0x4c5
[ 2744.553107]  [<ffffffff81d4e120>] ? early_idt_handlers+0x120/0x120
[ 2744.553110]  [<ffffffff81d4e4d7>] x86_64_start_reservations+0x2a/0x2c
[ 2744.553114]  [<ffffffff81d4e62b>] x86_64_start_kernel+0x152/0x175

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 608bb146ff48588c43bf6332912ea8796768fb39)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: ignore invalid multicast address entries
Jacob Keller [Wed, 3 Jun 2015 23:30:58 +0000 (16:30 -0700)]
fm10k: ignore invalid multicast address entries

This change fixes an issue with adding an invalid multicast address
using the iproute2 tool (ip maddr add <MADDR> dev <dev>). The iproute2
tool and the kernel do not validate or filter the multicast addresses
when adding them to the multicast list. Thus, when synchronizing this
list with an invalid entry, the action will be aborted with an error
since the fm10k driver currently validates the list. Consequently,
multicast entries beyond the invalid one will not be processed and
communicated with the switch via the mailbox. This change makes it so
that invalid addresses will simply be skipped and allows synchronizing
the full list to proceed.

Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 745136a8b72d638f3ee53a2b6a63f11c64a59937)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: fold fm10k_pull_tail into fm10k_add_rx_frag
Alexander Duyck [Thu, 23 Apr 2015 04:49:25 +0000 (21:49 -0700)]
fm10k: fold fm10k_pull_tail into fm10k_add_rx_frag

This change folds the fm10k_pull_tail call into fm10k_add_rx_frag.  The
advantage to doing this is that the fragment doesn't have to be modified
after it is added to the skb.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit 1a8782e59ff9bb0ed4b75d6a346a2ed212bd2031)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: bump driver version
Jacob Keller [Mon, 26 Oct 2015 21:38:40 +0000 (14:38 -0700)]
fm10k: bump driver version

We haven't bumped the driver version in a while despite many fixes being
pulled in from the out-of-tree Sourceforge driver. Update the version to
match.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529
(cherry picked from commit e3b6e95d070cbca5e82279fea99e2dba8e38f960)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: TRIVIAL cleanup order at top of fm10k_xmit_frame
Jacob Keller [Fri, 16 Oct 2015 17:57:11 +0000 (10:57 -0700)]
fm10k: TRIVIAL cleanup order at top of fm10k_xmit_frame

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 03d13a51fb4494f3bf47f65e2be00e56c36d2b63)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: Update adaptive ITR algorithm
Jacob Keller [Fri, 16 Oct 2015 17:57:07 +0000 (10:57 -0700)]
fm10k: Update adaptive ITR algorithm

The existing adaptive ITR algorithm is overly restrictive. It throttles
incorrectly for various traffic rates, and does not produce good
performance. The algorithm now allows for more interrupts per second,
and does some calculation to help improve for smaller packet loads. In
addition, take into account the new itr_scale from the hardware which
indicates how much to scale due to PCIe link speed.

Reported-by: Matthew Vick <matthew.vick@intel.com>
Reported-by: Alex Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 242722dd3d0af32703d4ebb4af63c92a2c85b835)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: introduce ITR_IS_ADAPTIVE macro
Jacob Keller [Fri, 16 Oct 2015 17:57:06 +0000 (10:57 -0700)]
fm10k: introduce ITR_IS_ADAPTIVE macro

Define a macro for identifying when the itr value is dynamic or
adaptive. The concept was taken from i40e. This helps make clear what
the check is, and reduces the line length to something more reasonable
in a few places.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 584373f5b98aed81ff5a432d91b6e16d7554a5c9)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: Fix handling of NAPI budget when multiple queues are enabled per vector
Alexander Duyck [Tue, 22 Sep 2015 21:35:35 +0000 (14:35 -0700)]
fm10k: Fix handling of NAPI budget when multiple queues are enabled per vector

This patch corrects an issue in which the polling routine would increase
the budget for Rx to at least 1 per queue if multiple queues were present.
This would result in Rx packets being processed when the budget was 0 which
is meant to indicate that no Rx can be handled.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 9f872986479b6e0543eb5c615e5f9491bb04e5c1)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agodrivers/net/intel: use napi_complete_done()
Jack Vogel [Wed, 25 Jan 2017 21:51:40 +0000 (13:51 -0800)]
drivers/net/intel: use napi_complete_done()

As per Eric Dumazet's previous patches:
(see commit (24d2e4a50737) - tg3: use napi_complete_done())

Quoting verbatim:
Using napi_complete_done() instead of napi_complete() allows
us to use /sys/class/net/ethX/gro_flush_timeout

GRO layer can aggregate more packets if the flush is delayed a bit,
without having to set too big coalescing parameters that impact
latencies.
</end quote>

Tested
configuration: low latency via ethtool -C ethx adaptive-rx off
rx-usecs 10 adaptive-tx off tx-usecs 15
workload: streaming rx using netperf TCP_MAERTS

igb:
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result:  941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589

Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
Local  Remote  Local  Remote  Xfered   Per                 Per
Recv   Send    Recv   Send             Recv (avg)          Send (avg)
    8       8      0       0 1176930056  1475.36    797726   16384.00  71905

MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result:  941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763

Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
Local  Remote  Local  Remote  Xfered   Per                 Per
Recv   Send    Recv   Send             Recv (avg)          Send (avg)
    8       8      0       0 1175182320  50476.00     23282   16384.00  71816

i40e:
Hard to test because the traffic is incoming so fast (24Gb/s) that GRO
always receives 87kB, even at the highest interrupt rate.

Other drivers were only compile tested.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 32b3e08fff60494cd1d281a39b51583edfd2b18f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: do not use enum as boolean
Jacob Keller [Tue, 25 Aug 2015 00:02:00 +0000 (17:02 -0700)]
fm10k: do not use enum as boolean

Check for actual value NETREG_UNINITIALIZED in case it ever changes from
the current value of zero.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit b4a5127b03b7f0b06dcaf08a878ffc93ad53fa82)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: add support for extra debug statistics
Jacob Keller [Thu, 2 Jul 2015 00:38:36 +0000 (17:38 -0700)]
fm10k: add support for extra debug statistics

Add a private ethtool flag to enable display of these statistics, which
are generally less useful. However, sometimes it can be useful for
debugging purposes. The most useful portion is the ability to see what
the PF thinks the VF mailboxes look like.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit 80043f3bf5bdb187566620a8f183c15b94e961cb)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: send traffic on default VID to VLAN device if we have one
Jacob Keller [Wed, 24 Jun 2015 20:34:46 +0000 (13:34 -0700)]
fm10k: send traffic on default VID to VLAN device if we have one

This patch ensures that VLAN traffic on the default VID will go to the
corresponding VLAN device if it exists. To do this, mask the rx_ring VID
if we have an active VLAN on that VID.

For this to work correctly, we need to update fm10k_process_skb_fields
to correctly mask off the VLAN_PRIO_MASK bits and compare them
separately, otherwise we incorrectly compare the priority bits with the
cleared flag. This also happens to fix a related bug where having
priority bits set causes us to incorrectly classify traffic.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit e71c9318428fb16de808b497e0229994010d32a1)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofm10k: Don't assume page fragments are page size
Alexander Duyck [Tue, 16 Jun 2015 18:47:12 +0000 (11:47 -0700)]
fm10k: Don't assume page fragments are page size

This change pulls out the optimization that assumed that all fragments
would be limited to page size.  That hasn't been the case for some time now
and to assume this is incorrect as the TCP allocator can provide up to a
32K page fragment.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 25394529

(cherry picked from commit aae072e363bed4e91c00d57f753c799276ddb161)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agox86/mce: Detect local MCEs properly
Yazen Ghannam [Sat, 30 Apr 2016 12:33:57 +0000 (14:33 +0200)]
x86/mce: Detect local MCEs properly

Orabug: 25384378

Check the MCG_STATUS_LMCES bit on Intel to verify that current MCE is
local. It is always local on AMD.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Massaged it a bit. Reflowed comments. Shut up -Wmaybe-uninitialized. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462019637-16474-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit fead35c68926682c90c995f22b48f1c8d78865c1)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agox86/mce: Handle Local MCE events
Ashok Raj [Thu, 4 Jun 2015 16:55:24 +0000 (18:55 +0200)]
x86/mce: Handle Local MCE events

Orabug: 25384378

Add the necessary changes to do_machine_check() to be able to
process MCEs signaled as local MCEs. Typically, only recoverable
errors (SRAR type) will be Signaled as LMCE. The architecture
does not restrict to only those errors, however.

When errors are signaled as LMCE, there is no need for the MCE
handler to perform rendezvous with other logical processors
unlike earlier processors that would broadcast machine check
errors.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 243d657eaf540db882f73497060da5a4f7d86a90)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agox86/mce: Add infrastructure to support Local MCE
Ashok Raj [Thu, 4 Jun 2015 16:55:23 +0000 (18:55 +0200)]
x86/mce: Add infrastructure to support Local MCE

Orabug: 25384378

Initialize and prepare for handling LMCEs. Add a boot-time
option to disable LMCEs.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[ Simplify stuff, align statements for better readability, reflow comments; kill
  unused lmce_clear(); save us an MSR write if LMCE is already enabled. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 88d538672ea26223bca08225bc49f4e65e71683d)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agox86/mce: Add Local MCE definitions
Ashok Raj [Thu, 4 Jun 2015 16:55:22 +0000 (18:55 +0200)]
x86/mce: Add Local MCE definitions

Orabug: 25384378

Add required definitions to support Local Machine Check
Exceptions.

Historically, machine check exceptions on Intel x86 processors
have been broadcast to all logical processors in the system.
Upcoming CPUs will support an opt-in mechanism to request some
machine check exceptions be delivered to a single logical
processor experiencing the fault.

See http://www.intel.com/sdm Volume 3, System Programming Guide,
chapter 15 for more information on MSRs and documentation on
Local MCE.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit bc12edb8739be10fae9f86691a3708d36d00b4f8)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agokvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)
Jim Mattson [Mon, 12 Dec 2016 19:01:37 +0000 (11:01 -0800)]
kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)

Orabug: 25291653
CVE: CVE-2016-9588

commit ef85b67385436ddc1998f45f1d6a210f935b3388 upstream.

When L2 exits to L0 due to "exception or NMI", software exceptions
(#BP and #OF) for which L1 has requested an intercept should be
handled by L1 rather than L0. Previously, only hardware exceptions
were forwarded to L1.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3f618a0b872fea38c7d1d1f79eda40f88c6466c2)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agoRevert "crypto: aead - Convert top level interface to new style"
Ethan Zhao [Wed, 15 Feb 2017 03:54:18 +0000 (12:54 +0900)]
Revert "crypto: aead - Convert top level interface to new style"

Orabug: 25243093

Revert this commit because old_crypt()/old_encrypt() are buggy
temporary interface.

This reverts commit 21579deee39629f2a68a37c42e89bdad53d88ebf.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agoRevert "crypto: aead - Add new interface with single SG list"
Ethan Zhao [Wed, 15 Feb 2017 03:50:45 +0000 (12:50 +0900)]
Revert "crypto: aead - Add new interface with single SG list"

Orabug: 25243093

Revert this commit because old_crypt()/old_encrypt() are buggy
temporary interface.

This reverts commit 10b7e301331eaf963b628025075ca4af756b9868.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: aesni - fix failing setkey for rfc4106-gcm-aesni
Tadeusz Struk [Sat, 27 Jun 2015 06:56:38 +0000 (15:56 +0900)]
crypto: aesni - fix failing setkey for rfc4106-gcm-aesni

Orabug: 25243093

rfc4106(gcm(aes)) uses ctr(aes) to generate hash key. ctr(aes) needs
chainiv, but the chainiv gets initialized after aesni_intel when both
are statically linked so the setkey fails.
This patch forces aesni_intel to be initialized after chainiv.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0fbafd06bdde938884f7326548d3df812b267c3c)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: skcipher - Fix corner case in crypto_lookup_skcipher
Herbert Xu [Thu, 23 Apr 2015 08:34:47 +0000 (16:34 +0800)]
crypto: skcipher - Fix corner case in crypto_lookup_skcipher

Orabug: 25243093

When the user explicitly states that they don't care whether the
algorithm has been tested (type = CRYPTO_ALG_TESTED and mask = 0),
there is a corner case where we may erroneously return ENOENT.

This patch fixes it by correcting the logic in the test.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 26739535206e819946b0740347c09c94c4e48ba9)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: skcipher - Copy iv from desc even for 0-len walks
Jason A. Donenfeld [Sun, 6 Dec 2015 01:51:37 +0000 (02:51 +0100)]
crypto: skcipher - Copy iv from desc even for 0-len walks

Orabug: 25243093

Some ciphers actually support encrypting zero length plaintexts. For
example, many AEAD modes support this. The resulting ciphertext for
those winds up being only the authentication tag, which is a result of
the key, the iv, the additional data, and the fact that the plaintext
had zero length. The blkcipher constructors won't copy the IV to the
right place, however, when using a zero length input, resulting in
some significant problems when ciphers call their initialization
routines, only to find that the ->iv parameter is uninitialized. One
such example of this would be using chacha20poly1305 with a zero length
input, which then calls chacha20, which calls the key setup routine,
which eventually OOPSes due to the uninitialized ->iv member.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 70d906bc17500edfa9bdd8c8b7e59618c7911613)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: gcm - Fix IV buffer size in crypto_gcm_setkey
Ondrej Mosnáček [Fri, 23 Sep 2016 08:47:32 +0000 (10:47 +0200)]
crypto: gcm - Fix IV buffer size in crypto_gcm_setkey

Orabug: 25243093

The cipher block size for GCM is 16 bytes, and thus the CTR transform
used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
the code currently reserves only 8 bytes for the IV, causing
an out-of-bounds access in the CTR transform. This patch fixes
the issue by setting the size of the IV buffer to 16 bytes.

Fixes: 84c911523020 ("[CRYPTO] gcm: Add support for async ciphers")
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 50d2e6dc1f83db0563c7d6603967bf9585ce934b)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: ahash - Add padding in crypto_ahash_extsize
Herbert Xu [Wed, 29 Jun 2016 10:03:47 +0000 (18:03 +0800)]
crypto: ahash - Add padding in crypto_ahash_extsize

Orabug: 25243093

The function crypto_ahash_extsize did not include padding when
computing the tfm context size.  This patch fixes this by using
the generic crypto_alg_extsize helper.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 2495cf25f60e085b35beb9b215235dfe1ca4f200)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: scatterwalk - Add no-copy support to copychunks
Herbert Xu [Tue, 12 Jul 2016 05:17:55 +0000 (13:17 +0800)]
crypto: scatterwalk - Add no-copy support to copychunks

Orabug: 25243093

The function ablkcipher_done_slow is pretty much identical to
scatterwalk_copychunks except that it doesn't actually copy as
the processing hasn't been completed yet.

This patch allows scatterwalk_copychunks to be used in this case
by specifying out == 2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 85eccddee401ae81067e763516889780b5545160)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: scatterwalk - Hide PageSlab call to optimise away flush_dcache_page
Herbert Xu [Mon, 1 Jun 2015 08:22:03 +0000 (16:22 +0800)]
crypto: scatterwalk - Hide PageSlab call to optimise away flush_dcache_page

Orabug: 25243093

On architectures where flush_dcache_page is not needed, we will
end up generating all the code up to the PageSlab call.  This is
because PageSlab operates on a volatile pointer and thus cannot
be optimised away.

This patch works around this by checking whether flush_dcache_page
is needed before we call PageSlab which then allows PageSlab to be
compiled awy.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 160544075f2a4028209721723a51f16add7b08b9)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: scatterwalk - Add missing sg_init_table to scatterwalk_ffwd
Herbert Xu [Wed, 27 May 2015 06:37:27 +0000 (14:37 +0800)]
crypto: scatterwalk - Add missing sg_init_table to scatterwalk_ffwd

Orabug: 25243093

We need to call sg_init_table as otherwise the first entry may
inadvertently become the last.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit fdaef75f66bba5999a94f3cd9156bf353ba2ef98)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: scatterwalk - Check for same address in map_and_copy
Herbert Xu [Thu, 21 May 2015 07:11:12 +0000 (15:11 +0800)]
crypto: scatterwalk - Check for same address in map_and_copy

Orabug: 25243093

This patch adds a check for in scatterwalk_map_and_copy to avoid
copying from the same address to the same address.  This is going
to be used for IV copying in AEAD IV generators.

There is no provision for partial overlaps.

This patch also uses the new scatterwalk_ffwd instead of doing
it by hand in scatterwalk_map_and_copy.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 74412fd5d71b6eda0beb302aa467da000f0d530c)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agomacsec: fix negative refcnt on parent link
Sabrina Dubroca [Fri, 29 Jul 2016 13:37:55 +0000 (15:37 +0200)]
macsec: fix negative refcnt on parent link

Orabug: 25243093

When creation of a macsec device fails because an identical device
already exists on this link, the current code decrements the refcnt on
the parent link (in ->destructor for the macsec device), but it had not
been incremented yet.

Move the dev_hold(parent_link) call earlier during macsec device
creation.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0759e552bce7257db542e203a01a9ef8843c751e)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agomacsec: RXSAs don't need to hold a reference on RXSCs
Sabrina Dubroca [Fri, 29 Jul 2016 13:37:54 +0000 (15:37 +0200)]
macsec: RXSAs don't need to hold a reference on RXSCs

Orabug: 25243093

Following the previous patch, RXSCs are held and properly refcounted in
the RX path (instead of being implicitly held by their SA), so the SA
doesn't need to hold a reference on its parent RXSC.

This also avoids panics on module unload caused by the double layer of
RCU callbacks (call_rcu frees the RXSA, which puts the final reference
on the RXSC and allows to free it in its own call_rcu) that commit
b196c22af5c3 ("macsec: add rcu_barrier() on module exit") didn't
protect against.
There were also some refcounting bugs in macsec_add_rxsa where I didn't
put the reference on the RXSC on the error paths, which would lead to
memory leaks.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 36b232c880c99fc03e135198c7c08d3d4b4f83ab)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agomacsec: fix reference counting on RXSC in macsec_handle_frame
Sabrina Dubroca [Fri, 29 Jul 2016 13:37:53 +0000 (15:37 +0200)]
macsec: fix reference counting on RXSC in macsec_handle_frame

Orabug: 25243093

Currently, we lookup the RXSC without taking a reference on it.  The
RXSA holds a reference on the RXSC, but the SA and SC could still both
disappear before we take a reference on the SA.

Take a reference on the RXSC in macsec_handle_frame.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c78ebe1df01f4ef3fb07be1359bc34df6708d99c)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Conflicts:
drivers/net/macsec.c

8 years agomacsec: ensure rx_sa is set when validation is disabled
Beniamino Galvani [Tue, 26 Jul 2016 10:24:53 +0000 (12:24 +0200)]
macsec: ensure rx_sa is set when validation is disabled

Orabug: 25243093

macsec_decrypt() is not called when validation is disabled and so
macsec_skb_cb(skb)->rx_sa is not set; but it is used later in
macsec_post_decrypt(), ensure that it's always initialized.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e3a3b626010a14fe067f163c2c43409d5afcd2a9)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: testmgr - don't copy from source IV too much
Andrey Ryabinin [Thu, 10 Sep 2015 10:11:55 +0000 (13:11 +0300)]
crypto: testmgr - don't copy from source IV too much

Orabug: 25243093

While the destination buffer 'iv' is MAX_IVLEN size,
the source 'template[i].iv' could be smaller, thus
memcpy may read read invalid memory.
Use crypto_skcipher_ivsize() to get real ivsize
and pass it to memcpy.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 84cba178a3b88efe2668a9039f70abda072faa21)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agogcm - Fix rfc4543 decryption crash
Herbert Xu [Thu, 17 Mar 2016 17:42:00 +0000 (02:42 +0900)]
gcm - Fix rfc4543 decryption crash

Orabug: 25243093

This bug has already bee fixed upstream since 4.2.  However, it
was fixed during the AEAD conversion so no fix was backported to
the older kernels.

When we do an RFC 4543 decryption, we will end up writing the
ICV beyond the end of the dst buffer.  This should lead to a
crash but for some reason it was never noticed.

This patch fixes it by only writing back the ICV for encryption.

Fixes: d733ac90f9fe ("crypto: gcm - fix rfc4543 to handle async...")
Reported-by: Patrick Meyer <patrick.meyer@vasgard.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: tcrypt - Handle async return from crypto_ahash_init
Herbert Xu [Wed, 22 Apr 2015 03:02:27 +0000 (11:02 +0800)]
crypto: tcrypt - Handle async return from crypto_ahash_init

Orabug: 25243093

The function crypto_ahash_init can also be asynchronous just
like update and final.  So all callers must be able to handle
an async return.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 43a9607d86e8fb110b596d300dbaae895c198fed)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: tcrypt - Fix AEAD speed tests
Vutla, Lokesh [Tue, 7 Jul 2015 15:31:49 +0000 (21:01 +0530)]
crypto: tcrypt - Fix AEAD speed tests

Orabug: 25243093

The AEAD speed tests doesn't do a wait_for_completition,
if the return value is EINPROGRESS or EBUSY.
Fixing it here.
Also add a test case for gcm(aes).

Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 1425d2d17f7309c65e2bd124e0ce8ace743b17f2)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: qat: fix issue when mapping assoc to internal AD struct
Tadeusz Struk [Fri, 5 Jun 2015 22:52:13 +0000 (15:52 -0700)]
crypto: qat: fix issue when mapping assoc to internal AD struct

Orabug: 25243093

This patch fixes an issue when building an internal AD representation.
We need to check assoclen and not only blindly loop over assoc sgl.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ecb479d07a4e2db980965193511dbf2563d50db5)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: testmgr - fix overlap in chunked tests again
Ard Biesheuvel [Thu, 8 Dec 2016 08:23:52 +0000 (08:23 +0000)]
crypto: testmgr - fix overlap in chunked tests again

Orabug: 25243093

Commit 7e4c7f17cde2 ("crypto: testmgr - avoid overlap in chunked tests")
attempted to address a problem in the crypto testmgr code where chunked
test cases are copied to memory in a way that results in overlap.

However, the fix recreated the exact same issue for other chunked tests,
by putting IDX3 within 492 bytes of IDX1, which causes overlap if the
first chunk exceeds 492 bytes, which is the case for at least one of
the xts(aes) test cases.

So increase IDX3 by another 1000 bytes.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 04b46fbdea5e31ffd745a34fa61269a69ba9f47a)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: testmgr - avoid overlap in chunked tests
Ard Biesheuvel [Mon, 5 Dec 2016 18:42:23 +0000 (18:42 +0000)]
crypto: testmgr - avoid overlap in chunked tests

Orabug: 25243093

The IDXn offsets are chosen such that tap values (which may go up to
255) end up overlapping in the xbuf allocation. In particular, IDX1
and IDX3 are too close together, so update IDX3 to avoid this issue.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7e4c7f17cde280079db731636175b1732be7188c)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
Herbert Xu [Tue, 12 Jul 2016 05:17:58 +0000 (13:17 +0800)]
crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone

Orabug: 25243093

The offset advance in scatterwalk_pagedone not only is unnecessary,
but it was also buggy when it was needed by scatterwalk_copychunks.
As the latter has long ago been fixed to call scatterwalk_advance
directly, we can remove this unnecessary offset adjustment.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 28cf86fafdd663cfcad3c5a5fe9869f1fa01b472)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
Herbert Xu [Tue, 12 Jul 2016 05:17:59 +0000 (13:17 +0800)]
crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start

Orabug: 25243093

Nothing bad will happen even if sg->length is zero, so there is
no point in keeping this BUG_ON in scatterwalk_start.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 2ee732d57496b8365819dfb958bc1ff04fcd4cac)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: cryptd - Use crypto_grab_aead
Herbert Xu [Thu, 21 May 2015 07:10:57 +0000 (15:10 +0800)]
crypto: cryptd - Use crypto_grab_aead

Orabug: 25483918
Orabug: 25243093

As AEAD has switched over to using frontend types, the function
crypto_init_spawn must not be used since it does not specify a
frontend type.  Otherwise it leads to a crash when the spawn is
used.

This patch fixes it by switching over to crypto_grab_aead instead.

Fixes: 5d1d65f8bea6 ("crypto: aead - Convert top level interface to new style")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 9b8c456e081e7eca856ad9b2a92980a68887f533)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: testmgr - fix out of bound read in __test_aead()
Jerome Marchand [Wed, 3 Feb 2016 12:58:12 +0000 (13:58 +0100)]
crypto: testmgr - fix out of bound read in __test_aead()

Orabug: 25243093

__test_aead() reads MAX_IVLEN bytes from template[i].iv, but the
actual length of the initialisation vector can be shorter.
The length of the IV is already calculated earlier in the
function. Let's just reuses that. Also the IV length is currently
calculated several time for no reason. Let's fix that too.
This fix an out-of-bound error detected by KASan.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit abfa7f4357e3640fdee87dfc276fd0f379fb5ae6)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agocrypto: algif_aead - fix for multiple operations on AF_ALG sockets
Lars Persson [Tue, 25 Aug 2015 09:59:15 +0000 (11:59 +0200)]
crypto: algif_aead - fix for multiple operations on AF_ALG sockets

Orabug: 25243093

The tsgl scatterlist must be re-initialized after each
operation. Otherwise the sticky bits in the page_link will corrupt the
list with pre-mature termination or false chaining.

Signed-off-by: Lars Persson <larper@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit bf433416e67597ba105ece55b3136557874945db)
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agonetvsc: fix incorrect receive checksum offloading
Stephen Hemminger [Mon, 24 Oct 2016 04:32:47 +0000 (21:32 -0700)]
netvsc: fix incorrect receive checksum offloading

The Hyper-V netvsc driver was looking at the incorrect status bits
in the checksum info. It was setting the receive checksum unnecessary
flag based on the IP header checksum being correct. The checksum
flag is skb is about TCP and UDP checksum status. Because of this
bug, any packet received with bad TCP checksum would be passed
up the stack and to the application causing data corruption.
The problem is reproducible via netcat and netem.

This had a side effect of not doing receive checksum offload
on IPv6. The driver was also also always doing checksum offload
independent of the checksum setting done via ethtool.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25219569

(cherry picked from commit e52fed7177f74382f742c27de2cc5314790aebb6)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agoKVM: x86: drop error recovery in em_jmp_far and em_ret_far
Radim Krčmář [Wed, 23 Nov 2016 20:15:00 +0000 (21:15 +0100)]
KVM: x86: drop error recovery in em_jmp_far and em_ret_far

Orabug: 25190929
CVE: CVE-2016-9756

em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.
We could have fixed that by always saving and restoring the CS, but we
take a simpler approach and just break any guest that manages to fail
as the error recovery is error-prone and modern CPUs don't need emulator
for this.

Found by syzkaller:

  WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   [...]
  Call Trace:
   [...] __dump_stack lib/dump_stack.c:15
   [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
   [...] panic+0x1b7/0x3a3 kernel/panic.c:179
   [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
   [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
   [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
   [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
   [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
   [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
   [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
   [...] complete_emulated_io arch/x86/kvm/x86.c:6870
   [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
   [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
   [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
   [...] vfs_ioctl fs/ioctl.c:43
   [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
   [...] SYSC_ioctl fs/ioctl.c:694
   [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
   [...] entry_SYSCALL_64_fastpath+0x1f/0xc2

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit 2117d5398c81554fbf803f5fd1dc55eb78216c0c)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agohv: do not lose pending heartbeat vmbus packets
Long Li [Wed, 5 Oct 2016 23:57:46 +0000 (16:57 -0700)]
hv: do not lose pending heartbeat vmbus packets

The host keeps sending heartbeat packets independent of the
guest responding to them.  Even though we respond to the heartbeat messages at
interrupt level, we can have situations where there maybe multiple heartbeat
messages pending that have not been responded to. For instance this occurs when the
VM is paused and the host continues to send the heartbeat messages.
Address this issue by draining and responding to all
the heartbeat messages that maybe pending.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 25144648

(cherry picked from commit 407a3aee6ee2d2cb46d9ba3fc380bc29f35d020c)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
8 years agotcp: take care of truncations done by sk_filter()
Eric Dumazet [Thu, 10 Nov 2016 21:12:35 +0000 (13:12 -0800)]
tcp: take care of truncations done by sk_filter()

Orabug: 25104761
CVE: CVE-2016-8645

With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()

Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.

We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq

Many thanks to syzkaller team and Marco for giving us a reproducer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Reported-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ac6e780070e30e4c35bd395acfe9191e6268bdd3)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
include/net/tcp.h
net/ipv4/tcp_ipv4.c

8 years agorose: limit sk_filter trim to payload
Willem de Bruijn [Tue, 12 Jul 2016 22:18:56 +0000 (18:18 -0400)]
rose: limit sk_filter trim to payload

Orabug: 25104761
CVE: CVE-2016-8645

Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.

Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.

Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f4979fcea7fd36d8e2f556abef86f80e0d5af1ba)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
net/core/filter.c

8 years agotipc: check minimum bearer MTU
Michal Kubeček [Fri, 2 Dec 2016 08:33:41 +0000 (09:33 +0100)]
tipc: check minimum bearer MTU

Orabug: 25063416
CVE: CVE-2016-8632

Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.

As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.

Conflicts:
    net/tipc/bearer.c
    net/tipc/bearer.h

Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3de81b758853f0b29c61e246679d20b513c4cfec)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
8 years agofix minor infoleak in get_user_ex()
Al Viro [Thu, 15 Sep 2016 01:35:29 +0000 (02:35 +0100)]
fix minor infoleak in get_user_ex()

get_user_ex(x, ptr) should zero x on failure.  It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1c109fabbd51863475cd12ac206bdd249aee35af)

Orabug: 25063299
CVE-2016-9178

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agoscsi: arcmsr: Simplify user_len checking
Borislav Petkov [Fri, 23 Sep 2016 11:22:26 +0000 (13:22 +0200)]
scsi: arcmsr: Simplify user_len checking

Do the user_len check first and then the ver_addr allocation so that we
can save us the kfree() on the error path when user_len is >
ARCMSR_API_DATA_BUFLEN.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Marco Grassi <marco.gra@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tomas Henzl <thenzl@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4bd173c30792791a6daca8c64793ec0a4ae8324f)

Orabug: 24710898
CVE-2016-7425

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agoscsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()
Dan Carpenter [Thu, 15 Sep 2016 13:44:56 +0000 (16:44 +0300)]
scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()

We need to put an upper bound on "user_len" so the memcpy() doesn't
overflow.

Cc: <stable@vger.kernel.org>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7bc2b55a5c030685b399bb65b6baa9ccc3d1f167)

Orabug: 24710898
CVE-2016-7425

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agotmpfs: clear S_ISGID when setting posix ACLs
Gu Zheng [Mon, 9 Jan 2017 01:34:48 +0000 (09:34 +0800)]
tmpfs: clear S_ISGID when setting posix ACLs

This change was missed the tmpfs modification in In CVE-2016-7097
commit 073931017b49 ("posix_acl: Clear SGID bit when setting
file permissions")
It can test by xfstest generic/375, which failed to clear
setgid bit in the following test case on tmpfs:

  touch $testfile
  chown 100:100 $testfile
  chmod 2755 $testfile
  _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile

Signed-off-by: Gu Zheng <guzheng1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 497de07d89c1410d76a15bec2bb41f24a2a89f31)

Orabug: 24587481
CVE-2016-7097

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agoposix_acl: Clear SGID bit when setting file permissions
Jan Kara [Mon, 19 Sep 2016 15:39:09 +0000 (17:39 +0200)]
posix_acl: Clear SGID bit when setting file permissions

When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok().  Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows to bypass the check in chmod(2).  Fix that.

References: CVE-2016-7097
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 073931017b49d9458aa351605b43a7e34598caef)

Orabug: 24587481
CVE-2016-7097

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
 Conflicts:
fs/f2fs/acl.c
fs/orangefs/acl.c

8 years agocoredump: Ensure proper size of sparse core files
Dave Kleikamp [Wed, 11 Jan 2017 19:25:00 +0000 (13:25 -0600)]
coredump: Ensure proper size of sparse core files

If the last section of a core file ends with an unmapped or zero page,
the size of the file does not correspond with the last dump_skip() call.
gdb complains that the file is truncated and can be confusing to users.

After all of the vma sections are written, make sure that the file size
is no smaller than the current file position.

This problem can be demonstrated with gdb's bigcore testcase on the
sparc architecture.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Orabug: 22106344
Mainline v4.10 commit 4d22c75d4c7b5c5f4bd31054f09103ee490878fd

8 years agodccp: fix freeing skb too early for IPV6_RECVPKTINFO
Andrey Konovalov [Thu, 16 Feb 2017 16:22:46 +0000 (17:22 +0100)]
dccp: fix freeing skb too early for IPV6_RECVPKTINFO

In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
is forcibly freed via __kfree_skb in dccp_rcv_state_process if
dccp_v6_conn_request successfully returns.

However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
is saved to ireq->pktopts and the ref count for skb is incremented in
dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
in dccp_rcv_state_process.

Fix by calling consume_skb instead of doing goto discard and therefore
calling __kfree_skb.

Similar fixes for TCP:

fb7e2399ec17f1004c0e0ccfd17439f8759ede01 [TCP]: skb is unexpectedly freed.
0aea76d35c9651d55bbaf746e7914e5f9ae5a25d tcp: SYN packets are now
simply consumed

Orabug: 25585296
CVE-2017-6074

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Aniket Alshi <aniket.alshi@oracle.com>
(cherry picked from commit 5edabca9d4cff7f1f2b68f0bac55ef99d9798ba4)
Reviewed-by: Brian Maly <brian.maly@oracle.com>
8 years agonet: Documentation: Fix default value tcp_limit_output_bytes
Niklas Cassel [Mon, 9 Nov 2015 14:59:00 +0000 (15:59 +0100)]
net: Documentation: Fix default value tcp_limit_output_bytes

Commit c39c4c6abb89 ("tcp: double default TSQ output bytes limit")
updated default value for tcp_limit_output_bytes

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 821b414405a78c3d38921c2545b492eb974d3814)
Oracle-Bug: 25424818
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Raj Matharasi <raj.matharasi@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agotcp: double default TSQ output bytes limit
Wei Liu [Wed, 3 Jun 2015 10:10:42 +0000 (11:10 +0100)]
tcp: double default TSQ output bytes limit

Xen virtual network driver has higher latency than a physical NIC.
Having only 128K as limit for TSQ introduced 30% regression in guest
throughput.

This patch raises the limit to 256K. This reduces the regression to 8%.
This buys us more time to work out a proper solution in the long run.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c39c4c6abb89d24454b63798ccbae12b538206a5)
Oracle-Bug: 25424818
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Raj Matharasi <raj.matharasi@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agoscsi: qla2xxx: Get mutex lock before checking optrom_state
Milan P. Gandhi [Sat, 24 Dec 2016 16:32:46 +0000 (22:02 +0530)]
scsi: qla2xxx: Get mutex lock before checking optrom_state

There is a race condition with qla2xxx optrom functions where one thread
might modify optrom buffer, optrom_state while other thread is still
reading from it.

In couple of crashes, it was found that we had successfully passed the
following 'if' check where we confirm optrom_state to be
QLA_SREADING. But by the time we acquired mutex lock to proceed with
memory_read_from_buffer function, some other thread/process had already
modified that option rom buffer and optrom_state from QLA_SREADING to
QLA_SWAITING. Then we got ha->optrom_buffer 0x0 and crashed the system:

        if (ha->optrom_state != QLA_SREADING)
                return 0;

        mutex_lock(&ha->optrom_mutex);
        rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
            ha->optrom_region_size);
        mutex_unlock(&ha->optrom_mutex);

With current optrom function we get following crash due to a race
condition:

[ 1479.466679] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1479.466707] IP: [<ffffffff81326756>] memcpy+0x6/0x110
[...]
[ 1479.473673] Call Trace:
[ 1479.474296]  [<ffffffff81225cbc>] ? memory_read_from_buffer+0x3c/0x60
[ 1479.474941]  [<ffffffffa01574dc>] qla2x00_sysfs_read_optrom+0x9c/0xc0 [qla2xxx]
[ 1479.475571]  [<ffffffff8127e76b>] read+0xdb/0x1f0
[ 1479.476206]  [<ffffffff811fdf9e>] vfs_read+0x9e/0x170
[ 1479.476839]  [<ffffffff811feb6f>] SyS_read+0x7f/0xe0
[ 1479.477466]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b

Below patch modifies qla2x00_sysfs_read_optrom,
qla2x00_sysfs_write_optrom functions to get the mutex_lock before
checking ha->optrom_state to avoid similar crashes.

The patch was applied and tested and same crashes were no longer
observed again.

Orabug: 25344639

Tested-by: Milan P. Gandhi <mgandhi@redhat.com>
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c7702b8c22712a06080e10f1d2dee1a133ec8809)
Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agokvm: x86: Check memopp before dereference (CVE-2016-8630)
Owen Hofmann [Thu, 27 Oct 2016 18:25:52 +0000 (11:25 -0700)]
kvm: x86: Check memopp before dereference (CVE-2016-8630)

Orabug: 25133227
CVE: CVE-2016-8630

Commit 41061cdb98 ("KVM: emulate: do not initialize memopp") removes a
check for non-NULL under incorrect assumptions. An undefined instruction
with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt
to dereference a null pointer here.

Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
Message-Id: <1477592752-126650-2-git-send-email-osh@google.com>
Signed-off-by: Owen Hofmann <osh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d9092f52d7e61dd1557f2db2400ddb430e85937e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agofirewire: net: guard against rx buffer overflows
Stefan Richter [Sat, 29 Oct 2016 19:28:18 +0000 (21:28 +0200)]
firewire: net: guard against rx buffer overflows

Orabug: 25063191
CVE: CVE-2016-8633

The IP-over-1394 driver firewire-net lacked input validation when
handling incoming fragmented datagrams.  A maliciously formed fragment
with a respectively large datagram_offset would cause a memcpy past the
datagram buffer.

So, drop any packets carrying a fragment with offset + length larger
than datagram_size.

In addition, ensure that
  - GASP header, unfragmented encapsulation header, or fragment
    encapsulation header actually exists before we access it,
  - the encapsulated datagram or fragment is of nonzero size.

Reported-by: Eyal Itkin <eyal.itkin@gmail.com>
Reviewed-by: Eyal Itkin <eyal.itkin@gmail.com>
Fixes: CVE 2016-8633
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
(cherry picked from commit 667121ace9dbafb368618dbabcf07901c962ddac)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agoUSB: usbfs: fix potential infoleak in devio
Kangjie Lu [Tue, 3 May 2016 20:32:16 +0000 (16:32 -0400)]
USB: usbfs: fix potential infoleak in devio

Orabug: 23267548
CVE: CVE-2016-4482

The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
are padding bytes which are not initialized and leaked to userland
via “copy_to_user”.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 681fef8380eb818c0b845fca5d2ab1dcbab114ee)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agousbnet: cleanup after bind() in probe()
Oliver Neukum [Mon, 7 Mar 2016 10:31:10 +0000 (11:31 +0100)]
usbnet: cleanup after bind() in probe()

In case bind() works, but a later error forces bailing
in probe() in error cases work and a timer may be scheduled.
They must be killed. This fixes an error case related to
the double free reported in
http://www.spinics.net/lists/netdev/msg367669.html
and needs to go on top of Linus' fix to cdc-ncm.

Orabug: 23070825
CVE-2016-3951

Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1666984c8625b3db19a9abc298931d35ab7bc64b)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agocdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
Bjørn Mork [Mon, 7 Mar 2016 20:15:36 +0000 (21:15 +0100)]
cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind

usbnet_link_change will call schedule_work and should be
avoided if bind is failing. Otherwise we will end up with
scheduled work referring to a netdev which has gone away.

Instead of making the call conditional, we can just defer
it to usbnet_probe, using the driver_info flag made for
this purpose.

Orabug: 23070825
CVE-2016-3951

Fixes: 8a34b0ae8778 ("usbnet: cdc_ncm: apply usbnet_link_change")
Reported-by: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4d06dd537f95683aba3651098ae288b7cbff8274)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agocdc_ncm: Add support for moving NDP to end of NCM frame
Enrico Mioso [Wed, 8 Jul 2015 11:05:57 +0000 (13:05 +0200)]
cdc_ncm: Add support for moving NDP to end of NCM frame

NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should be always usable, still
stay on the safe side.

This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position), a Huawei E3531 (also working both
ways) and an E3372 (which mandates NDP to be after indexed datagrams).

V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
V2->V3:
- Properly account for the NDP size when writing new packets to SKB

Orabug: 23070825
CVE-2016-3951

Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4a0e3e989d66bb7204b163d9cfaa7fa96d0f2023)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agox86/mm/32: Enable full randomization on i386 and X86_32
Hector Marco-Gisbert [Thu, 10 Mar 2016 19:51:00 +0000 (20:51 +0100)]
x86/mm/32: Enable full randomization on i386 and X86_32

Currently on i386 and on X86_64 when emulating X86_32 in legacy mode, only
the stack and the executable are randomized but not other mmapped files
(libraries, vDSO, etc.). This patch enables randomization for the
libraries, vDSO and mmap requests on i386 and in X86_32 in legacy mode.

By default on i386 there are 8 bits for the randomization of the libraries,
vDSO and mmaps which only uses 1MB of VA.

This patch preserves the original randomness, using 1MB of VA out of 3GB or
4GB. We think that 1MB out of 3GB is not a big cost for having the ASLR.

The first obvious security benefit is that all objects are randomized (not
only the stack and the executable) in legacy mode which highly increases
the ASLR effectiveness, otherwise the attackers may use these
non-randomized areas. But also sensitive setuid/setgid applications are
more secure because currently, attackers can disable the randomization of
these applications by setting the ulimit stack to "unlimited". This is a
very old and widely known trick to disable the ASLR in i386 which has been
allowed for too long.

Another trick used to disable the ASLR was to set the ADDR_NO_RANDOMIZE
personality flag, but fortunately this doesn't work on setuid/setgid
applications because there is security checks which clear Security-relevant
flags.

This patch always randomizes the mmap_legacy_base address, removing the
possibility to disable the ASLR by setting the stack to "unlimited".

Orabug: 23070708
CVE-2016-3672

Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Acked-by: Ismael Ripoll Ripoll <iripoll@upv.es>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1457639460-5242-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8b8addf891de8a00e4d39fc32f93f7c5eb8feceb)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
8 years agoDon't feed anything but regular iovec's to blk_rq_map_user_iov
Linus Torvalds [Wed, 7 Dec 2016 00:18:14 +0000 (16:18 -0800)]
Don't feed anything but regular iovec's to blk_rq_map_user_iov

In theory we could map other things, but there's a reason that function
is called "user_iov".  Using anything else (like splice can do) just
confuses it.

Reported-and-tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0)

Orabug: 25230657
CVE: CVE-2016-9576
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Conflicts:
block/blk-map.c

8 years agocrypto: algif_hash - Only export and import on sockets with data
Herbert Xu [Sun, 1 Nov 2015 09:11:19 +0000 (17:11 +0800)]
crypto: algif_hash - Only export and import on sockets with data

Orabug: 25097996
CVE: CVE-2016-8646

The hash_accept call fails to work on sockets that have not received
any data.  For some algorithm implementations it may cause crashes.

This patch fixes this by ensuring that we only export and import on
sockets that have received data.

Cc: stable@vger.kernel.org
Reported-by: Harsh Jain <harshjain.prof@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Stephan Mueller <smueller@chronox.de>
(cherry picked from commit 4afa5f9617927453ac04b24b584f6c718dfb4f45)
Signed-off-by: Aniket Alshi <aniket.alshi@oracle.com>
8 years agouserfaultfd: fix SIGBUS resulting from false rwsem wakeups
Andrea Arcangeli [Thu, 12 Jan 2017 01:18:45 +0000 (12:18 +1100)]
userfaultfd: fix SIGBUS resulting from false rwsem wakeups

Orabug: 21685254

With >=32 CPUs the userfaultfd selftest triggered a graceful but
unexpected SIGBUS because VM_FAULT_RETRY was returned by
handle_userfault() despite the UFFDIO_COPY wasn't completed.

This seems caused by rwsem waking the thread blocked in handle_userfault()
and we can't run up_read() before the wait_event sequence is complete.

Keeping the wait_even sequence identical to the first one, would require
running userfaultfd_must_wait() again to know if the loop should be
repeated, and it would also require retaking the rwsem and revalidating
the whole vma status.

It seems simpler to wait the targeted wakeup so that if false wakeups
materialize we still wait for our specific wakeup event, unless of course
there are signals or the uffd was released.

Debug code collecting the stack trace of the wakeup showed this:

$ ./userfaultfd 100 99999
nr_pages: 25600, nr_pages_per_cpu: 800
bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2
bounces: 99997, mode: rnd ver poll, Bus error (core dumped)

   [<ffffffff8102e19b>] save_stack_trace+0x2b/0x50
   [<ffffffff8110b1d6>] try_to_wake_up+0x2a6/0x580
   [<ffffffff8110b502>] wake_up_q+0x32/0x70
   [<ffffffff8113d7a0>] rwsem_wake+0xe0/0x120
   [<ffffffff8148361b>] call_rwsem_wake+0x1b/0x30
   [<ffffffff81131d5b>] up_write+0x3b/0x40
   [<ffffffff812280fc>] vm_mmap_pgoff+0x9c/0xc0
   [<ffffffff81244b79>] SyS_mmap_pgoff+0x1a9/0x240
   [<ffffffff810228f2>] SyS_mmap+0x22/0x30
   [<ffffffff81842dfc>] entry_SYSCALL_64_fastpath+0x1f/0xbd
   [<ffffffffffffffff>] 0xffffffffffffffff
FAULT_FLAG_ALLOW_RETRY missing 70
CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G        W       4.8.0+ #30
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
 0000000000000000 ffff880218027d40 ffffffff814749f6 ffff8802180c4a80
 ffff880231ead4c0 ffff880218027e18 ffffffff812e0102 ffffffff81842b1c
 ffff880218027dd0 ffff880218082960 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff814749f6>] dump_stack+0xb8/0x112
 [<ffffffff812e0102>] handle_userfault+0x572/0x650
 [<ffffffff81842b1c>] ? _raw_spin_unlock_irq+0x2c/0x50
 [<ffffffff812407b4>] ? handle_mm_fault+0xcf4/0x1520
 [<ffffffff81240d7e>] ? handle_mm_fault+0x12be/0x1520
 [<ffffffff81240d8b>] handle_mm_fault+0x12cb/0x1520
 [<ffffffff81483588>] ? call_rwsem_down_read_failed+0x18/0x30
 [<ffffffff810572c5>] __do_page_fault+0x175/0x500
 [<ffffffff810576f1>] trace_do_page_fault+0x61/0x270
 [<ffffffff81050739>] do_async_page_fault+0x19/0x90
 [<ffffffff81843ef5>] async_page_fault+0x25/0x30

This always happens when the main userfault selftest thread is running
clone() while glibc runs either mprotect or mmap (both taking mmap_sem
down_write()) to allocate the thread stack of the background threads,
while locking/userfault threads already run at full throttle and are
susceptible to false wakeups that may cause handle_userfault() to return
before than expected (which results in graceful SIGBUS at the next
attempt).

This was reproduced only with >=32 CPUs because the loop to start the
thread where clone() is too quick with fewer CPUs, while with 32 CPUs
there's already significant activity on ~32 locking and userfault threads
when the last background threads are started with clone().

This >=32 CPUs SMP race condition is likely reproducible only with the
selftest because of the much heavier userfault load it generates if
compared to real apps.

We'll have to allow "one more" VM_FAULT_RETRY for the WP support and a
patch floating around that provides it also hidden this problem but in
reality only is successfully at hiding the problem.  False wakeups could
still happen again the second time handle_userfault() is invoked, even if
it's a so rare race condition that getting false wakeups twice in a row is
impossible to reproduce.  This full fix is needed for correctness, the
only alternative would be to allow VM_FAULT_RETRY to be returned
infinitely.  With this fix the WP support can stick to a strict "one more"
VM_FAULT_RETRY logic (no need of returning it infinite times to avoid the
SIGBUS).

Link: http://lkml.kernel.org/r/20170111005535.13832-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next next-20170117
 commit d08f4a51bce6143390469c92e01d201ac4d68890)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agouserfaultfd: hugetlbfs: fix add copy_huge_page_from_user for hugetlb userfaultfd...
Andrew Morton [Mon, 16 Jan 2017 21:38:04 +0000 (13:38 -0800)]
userfaultfd: hugetlbfs: fix add copy_huge_page_from_user for hugetlb userfaultfd support

Orabug: 21685254

Was in Andrew's patch series on January 17, 2017 as:
userfaultfd hugetlbfs fix __mcopy_atomic_hugetlb retry error processing fix fix

kunmap() takes a page*, per Hugh

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Ported to UEK ]
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agouserfaultfd: hugetlbfs: reserve count on error in __mcopy_atomic_hugetlb
Mike Kravetz [Thu, 12 Jan 2017 01:19:18 +0000 (12:19 +1100)]
userfaultfd: hugetlbfs: reserve count on error in __mcopy_atomic_hugetlb

Orabug: 21685254

If __mcopy_atomic_hugetlb exits with an error, put_page will be called if
a huge page was allocated and needs to be freed.  If a reservation was
associated with the huge page, the PagePrivate flag will be set.  Clear
PagePrivate before calling put_page/free_huge_page so that the global
reservation count is not incremented.

Link: http://lkml.kernel.org/r/20161216144821.5183-26-aarcange@redhat.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next next-20170117
 commit afcc2616d4284859d8f70bbdfa6c9ca92fbb08ed)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agouserfaultfd: hugetlbfs: gup: support VM_FAULT_RETRY
Andrea Arcangeli [Thu, 12 Jan 2017 01:19:17 +0000 (12:19 +1100)]
userfaultfd: hugetlbfs: gup: support VM_FAULT_RETRY

Orabug: 21685254

Add support for VM_FAULT_RETRY to follow_hugetlb_page() so that
get_user_pages_unlocked/locked and "nonblocking/FOLL_NOWAIT" features will
work on hugetlbfs.  This is required for fully functional userfaultfd
non-present support on hugetlbfs.

Link: http://lkml.kernel.org/r/20161216144821.5183-25-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next next-20170117
 commit b33127bd2a0367d093bfeb1abd147754ff90e670)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
 Conflicts:
mm/hugetlb.c

8 years agouserfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges
Mike Kravetz [Thu, 12 Jan 2017 01:19:17 +0000 (12:19 +1100)]
userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges

Orabug: 21685254

Add routine userfaultfd_huge_must_wait which has the same functionality as
the existing userfaultfd_must_wait routine.  Only difference is that new
routine must handle page table structure for hugepmd vmas.

Link: http://lkml.kernel.org/r/20161216144821.5183-24-aarcange@redhat.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next next-20170117
 commit 36a121cb303d54b24bc4e590faf813daec1025d7)
[ Ported to UEK ]
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agouserfaultfd: hugetlbfs: add userfaultfd_hugetlb test
Mike Kravetz [Thu, 12 Jan 2017 01:19:16 +0000 (12:19 +1100)]
userfaultfd: hugetlbfs: add userfaultfd_hugetlb test

Orabug: 21685254

Test userfaultfd hugetlb functionality by using the existing testing
method (in userfaultfd.c).  Instead of an anonymous memeory, a hugetlbfs
file is mmap'ed private.  In this way fallocate hole punch can be used to
release pages.  This is because madvise(MADV_DONTNEED) is not supported
for huge pages.

Use the same file, but create wrappers for allocating ranges and releasing
pages.  Compile userfaultfd.c with HUGETLB_TEST defined to produce an
executable to test userfaultfd hugetlb functionality.

Link: http://lkml.kernel.org/r/20161216144821.5183-23-aarcange@redhat.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next next-20170117
 commit 44d5f4b70ff826f43791aa0fc18ca8fd02f1b432)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
tools/testing/selftests/vm/Makefile
tools/testing/selftests/vm/run_vmtests

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agouserfaultfd: hugetlbfs: allow registration of ranges containing huge pages
Mike Kravetz [Thu, 12 Jan 2017 01:19:16 +0000 (12:19 +1100)]
userfaultfd: hugetlbfs: allow registration of ranges containing huge pages

Orabug: 21685254

Expand the userfaultfd_register/unregister routines to allow VM_HUGETLB
vmas.  huge page alignment checking is performed after a VM_HUGETLB vma is
encountered.

Also, since there is no UFFDIO_ZEROPAGE support for huge pages do not
return that as a valid ioctl method for huge page ranges.

Link: http://lkml.kernel.org/r/20161216144821.5183-22-aarcange@redhat.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next next-20170117
 commit 6be4576b101b7026f72ec240f393c1dd5dfa02da)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
fs/userfaultfd.c

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agouserfaultfd: hugetlbfs: add userfaultfd hugetlb hook
Mike Kravetz [Thu, 12 Jan 2017 01:19:16 +0000 (12:19 +1100)]
userfaultfd: hugetlbfs: add userfaultfd hugetlb hook

Orabug: 21685254

When processing a hugetlb fault for no page present, check the vma to
determine if faults are to be handled via userfaultfd.  If so, drop the
hugetlb_fault_mutex and call handle_userfault().

Link: http://lkml.kernel.org/r/20161216144821.5183-21-aarcange@redhat.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next next-20170117
 commit 20609d5667d3db6545062527036875e68451086a)
[ Ported to UEK ]
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>