Do not write the TX doorbell if skb->xmit_more is set unless the TX
queue is full.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4d172f21cefe896df8477940269b8d52129f8c87) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Older chips require the doorbells to be written twice, but newer chips
do not. Add a new common function bnxt_db_write() to write all
doorbells appropriately depending on the chip. Eliminating the extra
doorbell on newer chips has a significant performance improvement
on pktgen.
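The combined effect of the two doorbell changes above can be sketched as follows. This is a hypothetical userspace simplification, not the actual bnxt_en code: the function names, the write-counting stub, and the boolean flags are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int doorbell_writes;          /* counts MMIO writes in this sketch */

static void write32(uint32_t val)    /* stands in for writel() */
{
	(void)val;
	doorbell_writes++;
}

/* Older chips require the doorbell to be written twice, newer chips once. */
static void db_write(bool legacy_chip, uint32_t val)
{
	write32(val);
	if (legacy_chip)
		write32(val);
}

/* Skip the TX doorbell when xmit_more is set, unless the queue is full. */
static void tx_doorbell(bool legacy_chip, bool xmit_more, bool queue_full,
			uint32_t prod)
{
	if (xmit_more && !queue_full)
		return;              /* defer: more packets are coming */
	db_write(legacy_chip, prod);
}
```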
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 434c975a8fe2f70b70ac09ea5ddd008e0528adfa) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
drivers/net/ethernet/broadcom/bnxt/bnxt.h
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Add additional chip definitions and macros for all supported chips.
Add a new macro BNXT_CHIP_P4_PLUS for the newer generation of chips and
use the macro to properly determine the features supported by these
newer chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3284f9e1ab505b41fa604c81e4b3271c6b88cdcb) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
When bnxt_en gets a PCI shutdown call, we need to have a new callback
to inform the RDMA driver to do proper shutdown and removal.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0efd2fc65c922dff207ff10a776a7a33e0e3c7c5) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c7ef35eb0c8d0b58d2d5ae5be599e6aa730361b2) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The new short message format is used on the new BCM57454 VFs. Each
firmware message is a fixed 16-byte message sent using the standard
firmware communication channel. The short message has a DMA address
pointing to the legacy long firmware message.
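A rough picture of such a fixed 16-byte short message, with a DMA address pointing at the legacy long message. The field names and layout here are assumptions for illustration; the real definition lives in the bnxt HSI headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a fixed 16-byte "short" firmware message.
 * Field names are illustrative, not the actual HSI definition. */
struct short_fw_msg {
	uint16_t req_type;    /* command identifier */
	uint16_t signature;   /* marks this as a short-format message */
	uint16_t size;        /* size of the legacy long message */
	uint16_t target_id;
	uint64_t req_addr;    /* DMA address of the long message buffer */
};
```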
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e605db801bdeb9d94cccbd4a2f641030067ef008) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Otherwise, all the host based DCBX settings from lldpad will fail if the
firmware DCBX agent is running.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f667724b99ad1afc91f16064d8fb293d2805bd57) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
In the current code, bnxt_dcb_init() is called too early before we
determine if the firmware DCBX agent is running or not. As a result,
we are not setting the DCB_CAP_DCBX_HOST and DCB_CAP_DCBX_LLD_MANAGED
flags properly to report to DCBNL.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 87fe603274aa9889c05cca3c3e45675e1997cb13) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Shannon Nelson [Tue, 4 Apr 2017 18:41:35 +0000 (11:41 -0700)]
bnxt: add dma mapping attributes
On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING
attribute in our dma mapping in order to get the expected performance
out of the receive path. Setting this boosts a simple iperf receive
session from 2 Gbe to 23.4 Gbe.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
We have the number of longs, but we need to calculate the number of
bytes required.
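The fix described amounts to a sizeof multiply. As a generic illustration of the bug pattern (not the driver's actual code):

```c
#include <assert.h>
#include <stddef.h>

/* A length counted in longs was used where a byte count was required;
 * converting is a multiply by sizeof(unsigned long). */
static size_t longs_to_bytes(size_t nlongs)
{
	return nlongs * sizeof(unsigned long);
}
```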
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ac45bd93a5035c2f39c9862b8b6ed692db0fdc87) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
This change restricts the PF in multi-host mode from setting any port
level PHY configuration. The settings are controlled by firmware in
Multi-Host mode.
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9e54e322ded40f424dcb5a13508e2556919ce12a) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Check the additional flag in bnxt_hwrm_func_qcfg() before allowing
DCBX to be done in host mode.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7d63818a35851cf00867248d5ab50a8fe8df5943) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Added support for 100G link speed reporting for Broadcom BCM57454
ASIC in ethtool command.
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com> Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 38a21b34aacd4db7b7b74c61afae42ea6718448d) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The .ndo_get_vf_config() is returning the wrong qos attribute. Fix
the code that checks and reports the qos and spoofchk attributes. The
BNXT_VF_QOS and BNXT_VF_LINK_UP flags should not be set by default
during init time.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f0249056eaf2b9a17b2b76a6e099e9b7877e187d) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
When the driver gets the RoCE app priority set/delete call through DCBNL,
the driver will send the information to the firmware to set up the
priority VLAN tag for RDMA traffic.
[ New version using the common ETH_P_IBOE constant in if_ether.h ]
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a82fba8dbfb522bd19b1644bf599135680fd0122) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The current code enables up to the maximum MSIX vectors in the PCIE
config space without considering the max completion rings available.
An MSIX vector is only useful when it has an associated completion
ring, so it is better to cap it.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 68a946bb81e07ed0e59a99e0c068d091ed42cc1b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 67fea463fd873492ab641459a6d1af0e9ea3c9ce) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
It is necessary to disable autoneg before enabling PHY loopback,
otherwise link won't come up.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 91725d89b97acea168a94c577d999801c3b3bcfb) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The mac loopback self test operates in polling mode. To support that,
we need to add functions to open and close the NIC half way. The half
open mode allows the rings to operate without IRQ and NAPI. We
use the XDP transmit function to send the loopback packet.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f7dc1ea6c4c1f31371b7098d6fae0d49dc6cdff1) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Add the basic infrastructure and only firmware tests initially.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit eb51365846bc418687af4c4f41b68b6e84cdd449) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Add suspend/resume callbacks using the newer dev_pm_ops method.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f65a2044a8c988adf16788c51c04ac10dbbdb494) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
And add functions to set and free magic packet filter.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5282db6c794fed3ea8b399bc5305c4078e084f7b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8e202366dd752564d7f090ba280cc51cbf7bbbd9) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Add pci shutdown method to put device in the proper WoL and power state.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d196ece740bf337aa25731cd8cb44660a2a227dd) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Add code to driver probe function to check if the device is WoL capable
and if Magic packet WoL filter is currently set.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c1ef146a5bd3b286d5c3eb2c9f631b38647c76d3) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8eb992e876a88de7539b1b9e132dd171d865cd2f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
In bnxt_free_rx_skbs(), which is called to free up all RX buffers during
shutdown, we need to unmap the page if we are running in XDP mode.
Fixes: c61fb99cae51 ("bnxt_en: Add RX page mode support.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3ed3a83e3f3871c57b18cef09b148e96921236ed) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 23e12c893489ed12ecfccbf866fc62af1bead4b0) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Net device reset can fail when the h/w or f/w is in a bad state.
Subsequent netdevice open fails in bnxt_hwrm_stat_ctx_alloc().
The cleanup invokes bnxt_hwrm_resource_free() which in turn
calls bnxt_disable_int(). In this routine, the code segment
	if (ring->fw_ring_id != INVALID_HW_RING_ID)
		BNXT_CP_DB(cpr->cp_doorbell, cpr->cp_raw_cons);
results in NULL pointer dereference as cpr->cp_doorbell is not yet
initialized, and fw_ring_id is zero.
The fix is to initialize cpr fw_ring_id to INVALID_HW_RING_ID before
bnxt_init_chip() is invoked.
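A minimal userspace model of the fix: the guard in the interrupt-disable path only works if fw_ring_id starts out as INVALID_HW_RING_ID rather than zero. All structure and function names below are stand-ins for the driver's, not its real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_HW_RING_ID ((uint16_t)-1)

struct cp_ring {
	uint16_t fw_ring_id;
	uint32_t *cp_doorbell;   /* NULL until the ring is initialized */
};

static int doorbells_rung;

static void disable_int(struct cp_ring *cpr)
{
	/* would dereference a NULL doorbell if fw_ring_id were 0 here */
	if (cpr->fw_ring_id != INVALID_HW_RING_ID) {
		*cpr->cp_doorbell = 0;
		doorbells_rung++;
	}
}

static void init_ring_struct(struct cp_ring *cpr)
{
	cpr->fw_ring_id = INVALID_HW_RING_ID;  /* the fix: not zero */
	cpr->cp_doorbell = NULL;
}
```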
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2247925f0942dc4e7c09b1cde45ca18461d94c5f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
In some situations, the firmware will return 0 for autoneg supported
speed. This may happen if the firmware detects no SFP module, for
example. The driver should ignore this so that we don't end up with
an invalid autoneg setting with nothing advertised. When SFP module
is inserted, we'll get the updated settings from firmware at that time.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 520ad89a54edea84496695d528f73ddcf4a52ea4) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Set DCB_CAP_DCBX_HOST capability flag only if the firmware LLDP agent
is not running.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc39f885a9c3bdbff0a96ecaf07b162a78eff6e4) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
If we call bnxt_reset_task() due to tx timeout, we should call
bnxt_ulp_stop() to inform the RDMA driver about the error and the
impending reset.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b386cd362ffea09d05c56bfa85d104562e860647) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
The firmware call to do function reset is done too late. It is causing
the rings that have been reserved to be freed. In NPAR mode, this bug
is causing us to run out of rings.
Fixes: 391be5c27364 ("bnxt_en: Implement new scheme to reserve tx rings.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3c2217a675bac22afb149166e0de71809189850d) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Dave Carroll [Thu, 22 Jun 2017 18:44:09 +0000 (12:44 -0600)]
Initialize fiblink list head during fib initialization
The fiblink pointer is used extensively for the sync mode driver, and has
never been initialized. Initialize it during the fib_setup operation, and
for copied fibs.
Dave Carroll [Fri, 16 Jun 2017 20:59:27 +0000 (14:59 -0600)]
aacraid: Update scsi_host_template to use tagged commands
Most of the non-multi-queue scsi drivers were updated to include
.use_blk_tags in the scsi_host_template, however the aacraid driver
was left out. This will cause the inbox driver to fail to
allocate a tagged fib.
Update the scsi_host_template to include .use_blk_tags
Signed-off-by: Dave Carroll <david.carroll@microsemi.com> Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
syzkaller found a way to trigger double frees from ip_mc_drop_socket()
It turns out that we leave a copy of the parent mc_list at accept() time,
which is very bad.
Very similar to commit 8b485ce69876 ("tcp: do not inherit
fastopen_req from parent")
Initial report from Pray3r, completed by Andrey.
Thanks a lot to them !
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Pray3r <pray3r.z@gmail.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/ipv4/inet_connection_sock.c
When the ring buffer is full, the hw queue is stopped. When the blkif
interrupt consumes requests and makes free space in the ring buffer, the
hw queue is started again. But since starting the queue is protected by a
spin lock while stopping it is not, that can cause a race.
If the ring buffer is made empty in this window, the interrupt will never
come, the hw queue will be stopped forever, and all processes waiting for
pending io in the queue will hang.
blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the
behavior that drivers expect. However, commit 4e68a011428a changed
blk_mq_queue_reinit() to not remap queues for the case of CPU
hotplugging, inadvertently making blk_mq_update_nr_hw_queues() not remap
queues as well. This breaks, for example, NBD's multi-connection mode,
leaving the added hardware queues unused. Fix it by making
blk_mq_update_nr_hw_queues() explicitly remap the queues.
Fixes: 4e68a011428a ("blk-mq: don't redistribute hardware queues on a CPU hotplug event") Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ebe8bddb6e30d7a02775b9972099271fc5910f37)
Conflicts:
block/blk-mq.c
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Currently blk-mq will totally remap hardware context when a CPU hotplug
event happens, which causes major havoc for drivers, as they are never
told about this remapping. E.g. any carefully sorted out CPU affinity
will just be completely messed up.
The rebuild also doesn't really help for the common case of cpu
hotplug, which is soft onlining / offlining of cpus - in this case we
should just leave the queue and irq mapping as is. If it actually
worked it would have helped in the case of physical cpu hotplug,
although for that we'd need a way to actually notify the driver.
Note that drivers may already be able to accommodate such a topology
change on their own, e.g. using the reset_controller sysfs file in NVMe
will cause the driver to get things right for this case.
With the rebuild removed we will simply retain the queue mapping for
a soft offlined CPU that will work when it comes back online, and will
map any newly onlined CPU to queue 0 until the driver initiates
a rebuild of the queue map.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4e68a011428af3211facd932b4003b3fa3ef4faa)
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com> Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Without CONFIG_DEBUG_FS, we run into a link error:
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll':
qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect':
qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect':
qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work':
qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover'
This defines the symbol as a constant in this case, as there is no way to
set it to anything other than zero without DEBUG_FS. In addition, I'm renaming
it to qedi_do_not_recover in order to put it into a driver specific namespace,
as "do_not_recover" is a really bad name for a kernel-wide global identifier
when it is used only in one driver.
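The idiom described is roughly the following. This is a generic sketch of the pattern, not the actual qedi code:

```c
#include <assert.h>

/* Without CONFIG_DEBUG_FS the knob can never change, so define it as a
 * constant 0; every reference still compiles and links. */
#ifdef CONFIG_DEBUG_FS
extern int qedi_do_not_recover;   /* writable via a debugfs file */
#else
#define qedi_do_not_recover 0     /* constant when debugfs is absent */
#endif

static int ep_should_abort(void)
{
	return qedi_do_not_recover;   /* no undefined reference either way */
}
```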
Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The call to qedi_setup_int is not updating the return code rc, yet rc
is being checked for an error. Fix this by assigning rc to the return
code from the call to qedi_setup_int.
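The bug pattern and its fix, with a hypothetical stub standing in for qedi_setup_int:

```c
#include <assert.h>

static int qedi_setup_int_stub(void)
{
	return -22;   /* -EINVAL, simulating a setup failure */
}

static int probe_fixed(void)
{
	int rc;

	/* the fix: actually capture the return code before checking it;
	 * the bug was calling the function and then testing a stale rc */
	rc = qedi_setup_int_stub();
	if (rc)
		return rc;
	return 0;
}
```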
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
'conn_info' is malloced in qedi_iscsi_update_conn() and should be freed
before leaving from the error handling cases, otherwise it will cause
memory leak.
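A sketch of the leak fix: every error return taken after the allocation must free conn_info, which a goto-based cleanup path keeps in one place. All names here are illustrative, not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

static int update_conn(int fail_step)
{
	int rc = 0;
	char *conn_info = malloc(64);

	if (!conn_info)
		return -12;   /* -ENOMEM */
	if (fail_step == 1) {
		rc = -22;     /* -EINVAL */
		goto out;     /* the bug: returning here without freeing */
	}
out:
	free(conn_info);      /* single cleanup point on every path */
	return rc;
}
```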
Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Although on most systems va_end is a no-op, it is good practice to use
va_end on the function return path, especially since the va_start
documentation states:
"Each invocation of va_start() must be matched by a corresponding
invocation of va_end() in the same function."
Found with static analysis by CoverityScan, CIDs 1389477-1389479
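An illustration of the va_end fix described above: every va_start is matched by va_end on every return path, including early error returns.

```c
#include <assert.h>
#include <stdarg.h>

static int sum_ints(int count, ...)
{
	va_list ap;
	int total = 0;

	va_start(ap, count);
	if (count < 0) {
		va_end(ap);   /* the fix: don't return with ap still live */
		return -1;
	}
	while (count--)
		total += va_arg(ap, int);
	va_end(ap);
	return total;
}
```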
Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Manish Rangankar <manish.rangankar@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: QLogic-Storage-Upstream@cavium.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Log a message when we enter this situation:
1) we have already allocated the max number of available grants from the
hypervisor, and
2) we still need more (but the request fails because of 1)).
Sometimes the lack of grants causes IO hangs in xen_blkfront devices.
Adding this log will help debugging.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
We call skb_cow_data, which is good anyway to ensure we can actually
modify the skb as such (another error from prior). Now that we have the
number of fragments required, we can safely allocate exactly that amount
of memory.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5294b83086cc1c35b4efeca03644cf9d12282e5b) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/macsec.c
While this may appear as a humdrum one line change, it's actually quite
important. An sk_buff stores data in three places:
1. A linear chunk of allocated memory in skb->data. This is the easiest
one to work with, but it precludes using scatterdata since the memory
must be linear.
2. The array skb_shinfo(skb)->frags, which is of maximum length
MAX_SKB_FRAGS. This is nice for scattergather, since these fragments
can point to different pages.
3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff,
which in turn can have data in either (1) or (2).
The first two are rather easy to deal with, since they're of a fixed
maximum length, while the third one is not, since there can be
potentially limitless chains of fragments. Fortunately dealing with
frag_list is opt-in for drivers, so drivers don't actually have to deal
with this mess. For whatever reason, macsec decided it wanted pain, and
so it explicitly specified NETIF_F_FRAGLIST.
Because dealing with (1), (2), and (3) is insane, most users of sk_buff
doing any sort of crypto or paging operation calls a convenient function
called skb_to_sgvec (which happens to be recursive if (3) is in use!).
This takes a sk_buff as input, and writes into its output pointer an
array of scattergather list items. Sometimes people like to declare a
fixed size scattergather list on the stack; other times people like to
allocate a fixed size scattergather list on the heap. However, if you're
doing it in a fixed-size fashion, you really shouldn't be using
NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its
frag_list children aren't shared and then you check the number of
fragments in total required.)
Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're
using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will
overflow the heap, and disaster ensues.
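A toy model of the sizing problem: a fixed sgvec of MAX_SKB_FRAGS + 1 entries is only safe when frag_list chaining is not in play; with chaining, the required entry count is unbounded and must be computed first (the role skb_cow_data plays in the fix). The value of MAX_SKB_FRAGS and both helpers are illustrative.

```c
#include <assert.h>

#define MAX_SKB_FRAGS 17   /* illustrative; the real value is config dependent */

/* entries needed: linear part + page frags + frags of chained skbs */
static int sg_entries_needed(int linear, int nr_frags, int chained_frags)
{
	return linear + nr_frags + chained_frags;
}

/* a stack/heap sgvec of fixed size MAX_SKB_FRAGS + 1 */
static int fits_fixed_sgvec(int needed)
{
	return needed <= MAX_SKB_FRAGS + 1;
}
```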
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: stable@vger.kernel.org Cc: security@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4d6fa57b4dab0d77f4d8e9d9c73d1e63f6fe8fee) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/macsec.c
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.
Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long call
can violate those assumptions. This was observed to cause crashes.
Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.
So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.
As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.
We may also consider rejecting calls that have any extra garbage
appended. That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.
Reported-by: Tuomas Haanpää <thaan@synopsys.com> Reported-by: Ari Kauppi <ari@synopsys.com> Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
(cherry picked from commit e6838a29ecb484c97e4efef9429643b9851fba6e) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Paolo Abeni [Thu, 27 Apr 2017 17:29:34 +0000 (19:29 +0200)]
bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
On slave list updates, the bonding driver computes its hard_header_len
as the maximum of all enslaved devices's hard_header_len.
If the slave list is empty, e.g. on last enslaved device removal,
ETH_HLEN is used.
Since the bonding header_ops are set only when the first enslaved
device is attached, the above can lead to header_ops->create()
being called with the wrong skb headroom in place.
If bond0 is configured on top of ipoib devices, with the
following commands:
	ifup bond0
	for slave in $BOND_SLAVES_LIST; do
		ip link set dev $slave nomaster
	done
	ping -c 1 <ip on bond0 subnet>
we will obtain a skb_under_panic() with a similar call trace:
skb_push+0x3d/0x40
push_pseudo_header+0x17/0x30 [ib_ipoib]
ipoib_hard_header+0x4e/0x80 [ib_ipoib]
arp_create+0x12f/0x220
arp_send_dst.part.19+0x28/0x50
arp_solicit+0x115/0x290
neigh_probe+0x4d/0x70
__neigh_event_send+0xa7/0x230
neigh_resolve_output+0x12e/0x1c0
ip_finish_output2+0x14b/0x390
ip_finish_output+0x136/0x1e0
ip_output+0x76/0xe0
ip_local_out+0x35/0x40
ip_send_skb+0x19/0x40
ip_push_pending_frames+0x33/0x40
raw_sendmsg+0x7d3/0xb50
inet_sendmsg+0x31/0xb0
sock_sendmsg+0x38/0x50
SYSC_sendto+0x102/0x190
SyS_sendto+0xe/0x10
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
This change addresses the issue by avoiding updating the bonding device
hard_header_len when the slave list becomes empty, so that it is never
shrunk below the value used by header_ops->create().
The bug is there since commit 54ef31371407 ("[PATCH] bonding: Handle large
hard_header_len") but the panic can be triggered only since
commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard
header").
Reported-by: Norbert P <noe@physik.uzh.ch> Fixes: 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len") Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 19cdead3e2ef8ed765c5d1ce48057ca9d97b5094)
Rama Nichanamatlu [Tue, 27 Jun 2017 12:34:16 +0000 (05:34 -0700)]
[PATCH] RDS: Print failed rdma op details if failure is remote access
Improves diagnosability when RDMA op fails allowing this print to be
matched with prints on the responder side; which prints RDMA keys
which are prematurely purged by the owning process.
Rama Nichanamatlu [Tue, 27 Jun 2017 12:22:46 +0000 (05:22 -0700)]
[PATCH] RDS: When RDS socket is closed, print unreleased MR's
Improves diagnosability when the requester RDMA operation fails with
remote access error. This commit prints RDMA credentials which are
prematurely purged by owning process.
Qing Huang [Thu, 18 May 2017 23:33:53 +0000 (16:33 -0700)]
RDMA/core: not to set page dirty bit if it's already set.
This change will optimize kernel memory deregistration operations.
__ib_umem_release() used to call set_page_dirty_lock() against every
writable page in its memory region. Its purpose is to keep data
synced between CPU and DMA device when swapping happens after mem
deregistration ops. Now we choose not to set page dirty bit if it's
already set by kernel prior to calling __ib_umem_release(). This
reduces memory deregistration time by half or even more when we ran
application simulation test program.
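The optimization described above can be modeled as: skip the expensive set_page_dirty_lock() call when the page is already dirty. The page struct, stub, and counter are stand-ins for the kernel objects, not the real __ib_umem_release() code.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_page { bool dirty; };

static int dirty_lock_calls;

static void set_page_dirty_lock_stub(struct fake_page *p)
{
	dirty_lock_calls++;   /* models the expensive locked path */
	p->dirty = true;
}

static void release_page(struct fake_page *p, bool writable)
{
	/* the fix: test the dirty bit before taking the expensive path */
	if (writable && !p->dirty)
		set_page_dirty_lock_stub(p);
}
```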
8K + 256 (8448B) is an important message size for the RDBMS workload. Since
InfiniBand supports scatter-gather in hardware, there is no reason to
fragment each RDS message into PAGE_SIZE work requests. Hence, RDS fragment
sizes up to 16K have been introduced.
Fixes: 23f90cccfba4 ("RDS: fix the sg allocation based on actual msg sz")
The previous behavior was to allocate a contiguous memory buffer
corresponding to the size of the RDS message. Although this was
functionally correct, it put unnecessary pressure on the memory
allocation system.
This commit fixes that drawback by allocating
the buffer according to RDS_MAX_FRAG_SIZE.
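The sizing math above can be sketched as follows (a userspace illustration, with `rds_num_frags` a hypothetical helper, not the RDS code itself):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SIM      4096UL
#define RDS_MAX_FRAG_SIZE (16UL * 1024)   /* 16K fragments */

/* Number of scatter-gather fragments needed: ceil(msg_size / frag_size). */
static size_t rds_num_frags(size_t msg_size, size_t frag_size)
{
    return (msg_size + frag_size - 1) / frag_size;
}
```

With 16K fragments the 8448-byte RDBMS message fits in a single work request instead of three PAGE_SIZE ones.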
Sagi Grimberg [Tue, 6 Oct 2015 16:52:37 +0000 (19:52 +0300)]
xprtrdma: Don't require LOCAL_DMA_LKEY support for fastreg
There is no need to require LOCAL_DMA_LKEY support, as the
PD allocation makes sure that there is a local_dma_lkey. Also
correctly set a return value in the error path.
This caused a NULL pointer dereference in mlx5, which removed
support for LOCAL_DMA_LKEY.
Fixes: bb6c96d72879 ("xprtrdma: Replace global lkey with lkey local to PD") Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Orabug: 26151481
Alexander Duyck [Mon, 30 Jan 2017 20:29:35 +0000 (12:29 -0800)]
i40e/i40evf: Add support for mapping pages with DMA attributes
This patch adds support for DMA_ATTR_SKIP_CPU_SYNC and
DMA_ATTR_WEAK_ORDERING. By enabling both of these for the Rx path we
are able to see performance improvements on architectures that implement
either one due to the fact that page mapping and unmapping only has to
sync what is actually being used instead of the entire buffer. In addition,
enabling the weak ordering attribute provides a performance improvement
for architectures that can associate a memory ordering with a DMA buffer,
such as SPARC.
Change-ID: If176824e8231c5b24b8a5d55b339a6026738fc75 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26396243
Local modifications to account for dma_attr data type difference.
(cherry picked from commit 59605bc09630c2b577858c371edf89c099b5f925) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Christoph Hellwig [Fri, 23 Jun 2017 17:41:41 +0000 (10:41 -0700)]
block: defer timeouts to a workqueue
Timer context is not very useful for drivers to perform any meaningful abort
action from. So instead of calling the driver from this useless context,
defer it to a workqueue as soon as possible.
Note that while a delayed_work item would seem the right thing here I didn't
dare to use it due to the magic in blk_add_timer that pokes deep into timer
internals. But maybe this encourages Tejun to add a sensible API for that to
the workqueue API and we'll all be fine in the end :)
Contains a major update from Keith Busch:
"This patch removes synchronizing the timeout work so that the timer can
start a freeze on its own queue. The timer enters the queue, so timer
context can only start a freeze, but not wait for frozen."
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 287922eb0b186e2a5bf54fdd04b734c25c90035c)
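A single-threaded sketch of the deferral pattern (hypothetical names; the real code queues a work item and the workqueue runs it in process context):

```c
#include <assert.h>
#include <stdbool.h>

static bool work_pending_flag;
static bool in_timer_context;
static int  driver_timeouts_handled;
static bool handled_in_timer;

static void driver_timeout_handler(void)
{
    driver_timeouts_handled++;
    if (in_timer_context)
        handled_in_timer = true;   /* would be the old, buggy behavior */
}

/* Timer callback: only mark work pending, touch nothing else. */
static void blk_timer_fn(void)
{
    in_timer_context = true;
    work_pending_flag = true;      /* kick the work item */
    in_timer_context = false;
}

/* Workqueue worker: runs the driver's handler in process context. */
static void workqueue_run(void)
{
    if (work_pending_flag) {
        work_pending_flag = false;
        driver_timeout_handler();
    }
}
```

The driver's abort logic can now sleep, take mutexes, or freeze the queue, none of which is possible in timer context.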
Rob Gardner [Fri, 9 Jun 2017 04:36:24 +0000 (00:36 -0400)]
sparc64: Set valid bytes of misaligned no-fault loads
If a misaligned no-fault load (ldm* from ASI 0x82, primary no fault)
crosses a page boundary, and one of the pages causes an MMU miss
that cannot be resolved, then the kernel must load bytes from the
valid page into the high (or low) bytes of the destination register,
and must load zeros into the low (or high) bytes of the register.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Reviewed-by: Steve Sistare steven.sistare@oracle.com Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
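The byte-merging rule can be illustrated in userspace C (this is not the trap handler; SPARC registers are big-endian, so the valid bytes from the first page occupy the high end of the destination register):

```c
#include <assert.h>

/* Build the destination register value for a misaligned 8-byte
 * no-fault load where only the first nvalid bytes were resolvable:
 * valid bytes fill the high end, the rest are zero-filled. */
static unsigned long merge_nofault_load(const unsigned char *valid_bytes,
                                        int nvalid)
{
    unsigned long reg = 0;
    for (int i = 0; i < 8; i++) {
        reg <<= 8;
        if (i < nvalid)
            reg |= valid_bytes[i];   /* byte from the valid page */
        /* else: zero byte for the page whose MMU miss failed */
    }
    return reg;
}
```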
Babu Moger [Wed, 21 Jun 2017 23:22:09 +0000 (17:22 -0600)]
fs/fuse: Fix for correct number of numa nodes
When the fuse filesystem is mounted it sets up data structures
for all the available NUMA nodes (with -o numa). However,
it uses nr_node_ids, which is set to MAX_NUMNODES (16). This
causes a panic when kmalloc_node is called.
Pavel Tatashin [Thu, 15 Jun 2017 14:40:59 +0000 (10:40 -0400)]
sparc64: broken %tick frequency on spitfire cpus
After the early boot time stamps project, the %tick frequency is detected
incorrectly on spitfire cpus.
We must use the cpuid of the boot cpu to find the corresponding cpu node in
OpenBoot, and extract the clock-frequency property from there.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit eea9833453bd39e2f35325abb985d00486c8aa69) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Thu, 15 Jun 2017 14:40:58 +0000 (10:40 -0400)]
sparc64: use prom interface to get %stick frequency
Since we initialize time early, we must use the prom interface instead of
the open firmware driver, which is not yet initialized.
Also, use prom_getintdefault() instead of prom_getint() to be compatible
with the code before early boot timestamps project.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit fca4afe400cb68fe5a7f0a97fb1ba5cfdcb81675) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Mon, 12 Jun 2017 20:41:48 +0000 (16:41 -0400)]
sparc64: optimize functions that access tick
Replace read tick function pointers with the new hot-patched get_tick().
This optimizes the performance of functions such as: sched_clock()
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit eae3fc9871111e9bbc77dad5481a3e805e02ac46) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Mon, 12 Jun 2017 20:41:47 +0000 (16:41 -0400)]
sparc64: add hot-patched and inlined get_tick()
Add the new get_tick() function that is hot-patched during boot based on
processor we are booting on.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit 4929c83a6ce6584cb64381bf1407c487f67d588a) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Mon, 12 Jun 2017 20:41:46 +0000 (16:41 -0400)]
sparc64: initialize time early
In Linux it is possible to configure printk() to output a timestamp next to
every line. This is very useful for determining the slow parts of the boot
process, and also for avoiding regressions, as boot time is visible to
everyone.
Also, there are scripts that change these time stamps to intervals.
However, on larger machines these timestamps start appearing many seconds,
and even minutes, into the boot process. This patch gets the stick-frequency
property early from OpenBoot and uses its value to initialize time stamps
before the first printk() messages are printed.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit 83e8eb99d908da78e6eff7dd141f26626fe01d12) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Mon, 12 Jun 2017 20:41:45 +0000 (16:41 -0400)]
sparc64: improve modularity tick options
This patch prepares the code for early boot time stamps by making it more
modular.
- init_tick_ops() to initialize struct sparc64_tick_ops
- new sparc64_tick_ops operation get_frequency() which returns a
frequency
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit 89108c3423e8047cd0da73182ea09b9da190b57e) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Mon, 12 Jun 2017 20:41:44 +0000 (16:41 -0400)]
sparc64: optimize loads in clock_sched()
In clock_sched() we now have three loads:
- Function pointer
- quotient for multiplication
- offset
However, it is possible to improve performance substantially, by
guaranteeing that all three loads are from the same cacheline.
By moving these three values first in sparc64_tick_ops, and by having
tick_operations 64-byte aligned we guarantee this.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit 178bf2b9a20e866677bbca5cb521b09a8498c1d7) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
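The layout idea can be sketched and checked at compile/run time (field names here are approximate, not the exact sparc64_tick_ops definition):

```c
#include <assert.h>
#include <stddef.h>

/* Hot fields first, structure aligned to a 64-byte cache line, so the
 * three loads in clock_sched() all hit the same line. */
struct tick_ops_sim {
    unsigned long long (*get_tick)(void);    /* function pointer  */
    unsigned long ticks_per_nsec_quotient;   /* multiply quotient */
    unsigned long offset;                    /* boot-time offset  */
    /* ... colder fields (init, get_frequency, etc.) follow ... */
    unsigned long cold[8];
} __attribute__((aligned(64)));
```

On a 64-bit target the three hot fields occupy bytes 0..23, comfortably within one 64-byte line.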
Pavel Tatashin [Mon, 12 Jun 2017 20:41:43 +0000 (16:41 -0400)]
sparc64: show time stamps from zero
On most platforms, time is shown from the beginning of boot. This patch
adds an offset to sched_clock() for SPARC, so it also shows time from 0.
This means we will have one more load, but we saved one in an earlier patch.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit b5dd4d807f0fe7da67c5cc67b2ec681b60e4994b) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit b8a83fcb78c859b99807af4c8b0ab09f0f827a40) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Mon, 12 Jun 2017 20:41:41 +0000 (16:41 -0400)]
sparc64: remove trailing white spaces
Remove all trailing white spaces in these two files, a few changes
reported by checkpatch.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 24401250
Orabug: 25637776
(cherry picked from commit 68a792174d7f67c7d2108bf1cc55ab8a63fc4678) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Wed, 31 May 2017 15:25:25 +0000 (11:25 -0400)]
sparc64: delete old wrap code
The old method, which used an xcall and softint to get a new context id,
is deleted; it is replaced by a method that uses per_cpu_secondary_mm
without an xcall to perform the context wrap.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0197e41ce70511dc3b71f7fefa1a676e2b5cd60b) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Wed, 31 May 2017 15:25:24 +0000 (11:25 -0400)]
sparc64: new context wrap
The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap. This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a memory location in every process.
Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.
Several approaches were explored before settling on this one:
Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until everyone arrives).
This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.
Approach 2:
Handle wrap right during the mondo interrupt. Use etrap/rtrap to enter
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (This could be protected by
PSTATE_IE, but that degrades performance on M7 and older CPUs, as it takes
50 cycles for each access.) Also, we still need a global per-cpu array of
MMs to know where we need to load new contexts, otherwise we could change
context to a thread that is going away (if we received a mondo between
switch_mm() and switch_to() time). Finally, there are some issues with
window registers in rtrap() when context IDs are changed during CPU mondo time.
The approach in this patch is the simplest and has almost no impact on
runtime. We use the array of mm's whose secondary contexts were last
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of the version mismatch. But
the running processes do not need to be interrupted. And the wrap is
quicker, as we do not need to xcall and wait for everyone to receive and
complete the wrap.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a0582f26ec9dfd5360ea2f35dd9a1b026f8adda0) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
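A toy model of the wrap scheme (constants and names are hypothetical, with a tiny ID space for the demo): a context is `version << SHIFT | id`, and on wrap the versions of the per-cpu loaded contexts are bumped in place while their IDs are kept.

```c
#include <assert.h>

#define CTX_ID_BITS  3                        /* tiny ID space for the demo */
#define CTX_ID_MASK ((1UL << CTX_ID_BITS) - 1)

static unsigned long global_version = 1UL << CTX_ID_BITS;

/* Version check done in switch_mm()/get_new_mmu_context(). */
static int ctx_valid(unsigned long ctx)
{
    return (ctx & ~CTX_ID_MASK) == global_version;
}

/* Wrap: bump the version of every mm whose context is loaded on a CPU,
 * keeping its ID. (Held under ctx_alloc_lock in the real code.) */
static void wrap(unsigned long *percpu_loaded_ctx, int ncpus)
{
    global_version += 1UL << CTX_ID_BITS;
    for (int cpu = 0; cpu < ncpus; cpu++)
        percpu_loaded_ctx[cpu] = global_version |
                                 (percpu_loaded_ctx[cpu] & CTX_ID_MASK);
}
```

Running processes stay valid without any xcall; every other mm fails the version check and lazily takes the get_new_mmu_context() path.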
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7a5b4bbf49fe86ce77488a70c5dccfe2d50d7a2d) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Wed, 31 May 2017 15:25:22 +0000 (11:25 -0400)]
sparc64: redefine first version
CTX_FIRST_VERSION defines the first context version, but it also defines
the first context. This patch redefines it to only include the first
context version.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c4415235b2be0cc791572e8e7f7466ab8f73a2bf) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Wed, 31 May 2017 15:25:21 +0000 (11:25 -0400)]
sparc64: combine activate_mm and switch_mm
The only difference between these two functions is that in activate_mm we
unconditionally flush the context. However, there is no need to keep this
difference after fixing the bug where the cpumask was not reset on a wrap.
So, in this patch we combine them.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 14d0334c6748ff2aedb3f2f7fdc51ee90a9b54e7) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Pavel Tatashin [Wed, 31 May 2017 15:25:20 +0000 (11:25 -0400)]
sparc64: reset mm cpumask after wrap
After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:
1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 588974857359861891f478a070b1dc7ae04a3880) Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
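Fix 1 above can be modeled in a few lines (a userspace sketch with a bitmask standing in for mm_cpumask; all names hypothetical):

```c
#include <assert.h>

static unsigned long mm_cpumask_sim;   /* one bit per CPU */
static int tlb_flushes;

/* The fix: clear the cpumask whenever the mm gets a new context id,
 * e.g. after a wrap. */
static void get_new_mmu_context_sim(void)
{
    mm_cpumask_sim = 0;
}

/* A CPU flushes stale translations only on the process's first run
 * there since its last context id. */
static void switch_mm_sim(int cpu)
{
    if (!(mm_cpumask_sim & (1UL << cpu))) {
        tlb_flushes++;                 /* flush the old context id */
        mm_cpumask_sim |= 1UL << cpu;
    }
}
```

Without the reset, the second run after a wrap would skip the flush and run with stale TLB entries.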
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Borislav Petkov [Mon, 23 Jan 2017 18:35:07 +0000 (19:35 +0100)]
x86/ras/therm_throt: Do not log a fake MCE for thermal events
We log a fake bank 128 MCE to note that we're handling a CPU thermal
event. However, this confuses people into thinking that their hardware
generates MCEs. Hijacking MCA for logging thermal events is a gross
misuse anyway and it shouldn't have been done in the first place. And
besides we have other means for dealing with thermal events which are
much more suitable.
Trond Myklebust [Mon, 1 Aug 2016 17:36:08 +0000 (13:36 -0400)]
SUNRPC: Handle EADDRNOTAVAIL on connection failures
If the connect attempt immediately fails with an EADDRNOTAVAIL error, then
that means our choice of source port number was bad.
This error is expected when we set the SO_REUSEPORT socket option and we
have 2 sockets sharing the same source and destination address and port
combinations.
Kris Van Hees [Mon, 12 Jun 2017 13:33:05 +0000 (09:33 -0400)]
dtrace: add kprobe-unsafe addresses to FBT blacklist
Using the newly introduced API for adding entries to the FBT
blacklist, we make sure to register addresses that are unsafe for
kprobes, because they are unsafe for FBT as well.
Orabug: 26190412 Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Kris Van Hees [Mon, 12 Jun 2017 13:29:16 +0000 (09:29 -0400)]
dtrace: convert FBT blacklist to RB-tree
The blacklist for FBT was implemented as a sorted list, populated from
a static list of functions. In order to allow functions to be added
from other places (i.e. programmatically), it has been converted to an
RB-tree with an API to add functions and to traverse the list. It is
still possible to add functions by address or to add them by symbol
name, to be resolved into the corresponding address.
Orabug: 26190412 Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Sanath Kumar [Tue, 20 Jun 2017 03:17:29 +0000 (22:17 -0500)]
sparc64: Enable MGAG200 driver support
This driver enables the video console on T7 systems that use the
MGA G200e video device. The console can be used to view
kernel boot prints and to log in to the system.
Reviewed-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com> Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Orabug: 26170808 Reviewed-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Tom Hromatka [Wed, 21 Jun 2017 14:46:53 +0000 (08:46 -0600)]
memory: sparc64: Add privileged ADI driver
This patch adds an ADI driver for reading/writing MCD versions
using physical addresses from privileged user space processes.
This file maps linearly to physical memory at a ratio of
1:adi_blksz. A read (or write) of offset K in the file operates
upon physical address K * adi_blksz. The version information
is encoded as one version per byte. Intended consumers are
makedumpfile and crash.
Orabug: 2617080 Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com> Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Tom Hromatka [Wed, 21 Jun 2017 14:45:28 +0000 (08:45 -0600)]
sparc64: Export the adi_state structure
Orabug: 26170808 Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Jag Raman [Wed, 21 Jun 2017 15:23:50 +0000 (11:23 -0400)]
sparc64: sunvdc: skip vdisk response validation upon error
Skip validating the vdisk IO response from the vdisk server if the IO
request has failed.
sunvdc checks whether the size of the request processed by the
server matches the size of the request sent by vdc. This
is to ensure that partial IO completions are caught, since
they are not expected. In the case where the server reports an
error, it could set the size of IO processed to zero.
Therefore, validating the size of the request processed in the
case of an error could mis-classify the problem.
Introduce DAX2 support in the driver. This involves negotiating the right
version with hypervisor as well as exposing a new INIT_V2 ioctl. This new
ioctl will return failure if DAX2 is not present on the system, otherwise
it will attempt to initialize the DAX2. A user should first call INIT_V2
and on failure call INIT_V1. See Documentation/sparc/dax.txt for more
detail.
Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com> Reviewed-by: Sanath Kumar <sanath.s.kumar@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Aldridge [Tue, 30 May 2017 14:59:02 +0000 (08:59 -0600)]
sparc64: Exclude perf user callchain during critical sections
This fixes another cause of random segfaults and bus errors that
may occur while running perf with the callgraph (-g) option.
Critical sections beginning with spin_lock_irqsave() raise the
interrupt level to PIL_NORMAL_MAX (14) and intentionally do not block
performance counter interrupts, which arrive at PIL_NMI (15). So perf
code must be very careful about what it does since it might execute in
the middle of one of these critical sections. In particular, the
perf_callchain_user() path is problematic because it accesses user
space and may cause TLB activity as well as faults as it unwinds the
user stack.
One particular critical section occurs in switch_mm: if a perf interrupt
arrives in between load_secondary_context() and tsb_context_switch(),
then perf_callchain_user() could execute with
the context ID of one process, but with an active tsb for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.
Some potential solutions are:
1) Make critical sections run at PIL_NMI instead of PIL_NORMAL_MAX.
This would certainly eliminate the problem, but it would also prevent
perf from having any visibility into code running in these critical
sections, and it seems clear that PIL_NORMAL_MAX is used for just
this reason.
2) Protect this particular critical section by masking all interrupts,
either by setting %pil to PIL_NMI or by clearing pstate.ie around the
calls to load_secondary_context() and tsb_context_switch(). This approach
has a few drawbacks:
- It would only address this particular critical section, and would
have to be repeated in several other known places. There might be
other such critical sections that are not known.
- It has a performance cost which would be incurred at every context
switch, since it would require additional accesses to %pil or
%pstate.
- Turning off pstate.ie would require changing __tsb_context_switch(),
which expects to be called with pstate.ie on.
3) Avoid user space MMU activity entirely in perf_callchain_user() by
implementing a new copy_from_user() function that accesses the user
stack via physical addresses. This works, but requires quite a bit of
new code to get it to perform reasonably, i.e., caching of translations,
etc.
4) Allow the perf interrupt to happen in existing critical sections as
it does now, but have perf code detect that this is happening, and
skip any user callchain processing. This approach was deemed best, as
the change is extremely localized and covers both known and unknown
instances of perf interrupting critical sections. Perf has interrupted
a critical section when %pil == PIL_NORMAL_MAX at the time of the perf
interrupt.
Ordinarily, a function that has a pt_regs passed in can simply examine
(reg->tstate & TSTATE_PIL) to extract the interrupt level that was in
effect at the time of the perf interrupt. However, when the perf
interrupt occurs while executing in the kernel, the pt_regs pointer is
replaced with task_pt_regs(current) in perf_callchain(), and so
perf_callchain_user() does not have access to the state of the machine
at the time of the perf interrupt. To work around this, we check
(regs->tstate & TSTATE_PIL) in perf_event_nmi_handler() before calling
in to the arch independent part of perf, and temporarily change the
event attributes so that user callchain processing is avoided. Though
a user stack sample is not collected, this loss is not statistically
significant. Kernel call graph collection is not affected.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
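The gating check from approach 4 can be sketched as follows (the TSTATE_PIL bit position here is a placeholder for the demo, not necessarily the architectural value):

```c
#include <assert.h>

#define TSTATE_PIL_SHIFT  20                        /* placeholder position */
#define TSTATE_PIL       (0xfUL << TSTATE_PIL_SHIFT)
#define PIL_NORMAL_MAX    14

/* Before calling generic perf code, examine the PIL saved in
 * pt_regs->tstate; if the interrupted code ran at PIL_NORMAL_MAX it
 * was inside a critical section, so skip user callchain collection. */
static int user_callchain_allowed(unsigned long tstate)
{
    unsigned long pil = (tstate & TSTATE_PIL) >> TSTATE_PIL_SHIFT;
    return pil != PIL_NORMAL_MAX;
}
```

Kernel callchain collection is unaffected; only the (statistically insignificant) user stack sample is dropped for that interrupt.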
Anthony Yznaga [Tue, 13 Jun 2017 20:47:06 +0000 (13:47 -0700)]
sparc64: rtrap must set PSTATE.mcde before handling outstanding user work
The kernel must execute with PSTATE.mcde=1 for ADI version checking to
be enabled when the kernel reads or writes user memory mapped with ADI
enabled using versioned addresses. If PSTATE.mcde=0 then the MMU
interprets version bits in an address as address bits, and an access
attempt results in a data access exception. Until now setting
PSTATE.mcde=1 in the kernel has been handled only by patching etrap to
ensure that is set on entry into the kernel. However, there are code
paths in rtrap that overwrite PSTATE and inadvertently clear PSTATE.mcde
before additional execution happens in the kernel.
rtrap is executed to exit the kernel and return to user mode execution.
Before restoring registers and returning to user mode, rtrap checks for
work to do. The check is done with interrupts disabled, and if there is
work to do, then interrupts are enabled before calling a function to
complete the work after which interrupts are disabled again and the
check is repeated. Interrupts are disabled and enabled by overwriting
PSTATE. Possible work includes (but is not limited to) preemption,
signal delivery, and writing out buffered user register windows to the
stack. All of these may lead to accessing user addresses. In the case
of preemption, a resumed thread will run with PSTATE.mcde=0 until it
completes a return to user mode or is rescheduled on a CPU where
PSTATE.mcde is set. If the thread accesses ADI-enabled user memory with
a versioned address (e.g. to complete some I/O) in that timeframe then
the access will fail. To fix the problem, patch rtrap to set
PSTATE.mcde when interrupts are enabled before handling the work.
Orabug: 25853545 Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Shannon Nelson [Wed, 14 Jun 2017 22:43:37 +0000 (15:43 -0700)]
sunvnet: restrict advertized checksum offloads to just IP
As much as we'd like to play well with others, we really aren't
handling the checksums on non-IP protocol packets very well. This
is easily seen when trying to do TCP over ipv6 - the checksums are
garbage.
Here we restrict the checksum feature flag to just IP traffic so
that we aren't given work we can't yet do.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from commit 7e9191c54a36c864b901ea8ce56dc42f10c2f5ae) Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
Allen Pais [Wed, 21 Jun 2017 09:48:25 +0000 (15:18 +0530)]
arch/sparc: Avoid DCTI Couples
Avoid unintended DCTI couples. Use of DCTI couples is deprecated per the
Oracle SPARC Architecture notes below (Section 6.3.4.7 - DCTI Couples).
"A delayed control transfer instruction (DCTI) in the delay slot of another
DCTI is referred to as a DCTI couple. The use of DCTI couples is deprecated
in the Oracle SPARC Architecture; no new software should place a DCTI in
the delay slot of another DCTI, because on future Oracle SPARC Architecture
implementations DCTI couples may execute either slowly or differently than
the programmer assumes it will."
Babu Moger [Wed, 11 Jan 2017 00:13:02 +0000 (16:13 -0800)]
net/rds: Fix minor linker warnings
Fixes: fcdaab66 {IB/{core,ipoib},net/{mlx4,rds}}: Mark unload_allowed
as __initdata variable
Seeing this warning while building the kernel. Fix it.
MODPOST 1555 modules
WARNING: net/rds/rds_rdma.o(.text+0x1d8): Section mismatch in
reference from the function rds_rdma_init() to the variable
.init.data:unload_allowed
The function rds_rdma_init() references
the variable __initdata unload_allowed.
This is often because rds_rdma_init lacks a __initdata
annotation or the annotation of unload_allowed is wrong.
Babu Moger [Wed, 30 Mar 2016 20:28:41 +0000 (13:28 -0700)]
drivers/usb: Skip auto handoff for TI and RENESAS usb controllers
I have never seen auto handoff working on TI and RENESAS xhci
cards. Eventually, we force the handoff. This code forces the handoff
unconditionally, which saves 5 seconds of boot time for each card.
Added vendor/device id checks for the cards I have tested.
Vijay Kumar [Thu, 6 Oct 2016 19:06:11 +0000 (12:06 -0700)]
usb/core: Added devspec sysfs entry for devices behind the usb hub
Grub finds an incorrect of_node path for devices behind a usb hub.
Added a devspec sysfs entry for devices behind the usb hub so that the
right of_node path is returned during the grub sysfs walk for these
devices.
Vijay Kumar [Fri, 28 Oct 2016 20:59:57 +0000 (13:59 -0700)]
USB: core: let USB device know device node
Although most USB devices are hot-pluggable, some devices are hard
wired on the board, e.g., USB devices on HSIC and SSIC interfaces.
If these kinds of USB devices are multi-function and can supply
other interfaces like i2c or gpios for other devices, we may need to
describe them in the device tree.
In this commit, the "reg" property in the dts is used as the physical
port number to match the physical port number decided by the USB core;
if they are the same, then the device node is for the device we are
creating for the USB core.
Signed-off-by: Peter Chen <peter.chen@freescale.com> Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 69bec725985324e79b1c47ea287815ac4ddb0521)
Conflicts:
include/linux/usb/of.h
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com>
Orabug: 24785721 Signed-off-by: Allen Pais <allen.pais@oracle.com>
Allen Pais [Wed, 21 Jun 2017 09:16:50 +0000 (14:46 +0530)]
Improves clear_huge_page() using work queues
The idea is to exploit the parallelism available on large
multicore systems such as SPARC T7 systems to clear huge pages
in parallel with multiple worker threads.
Reviewed-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Kishore Pusukuri <kishore.kumar.pusukuri@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>