Alexander Duyck [Wed, 11 May 2011 07:18:47 +0000 (07:18 +0000)]
ixgbe: add support for displaying ntuple filters via the nfc interface
This code adds support for displaying the filters that were added via the
nfc interface. This is primarily to test the interface for now, but I am
also looking into the feasibility of moving all of the ntuple filter code
in ixgbe over to the nfc interface since it seems to be better implemented.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3e05334f8be83e8529f1cbf4f4dea06a4d51d676)
Alexander Duyck [Wed, 11 May 2011 07:18:41 +0000 (07:18 +0000)]
ixgbe: add basic support for setting and getting nfc controls
This change adds basic support for the obtaining of RSS ring counts.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 91cd94bfe4f00fccf692e32dfa86a9fad0d61280)
Alexander Duyck [Wed, 11 May 2011 07:18:36 +0000 (07:18 +0000)]
ixgbe: update perfect filter framework to support retaining filters
This change is meant to update the internal framework of ixgbe so that
perfect filters can be stored and tracked via software.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c04f6ca84866ef207e009a08e4c34ca241df7aa2)
Alexander Duyck [Fri, 20 May 2011 07:36:17 +0000 (07:36 +0000)]
ixgbe: fix flags relating to perfect filters to support coexistence
I am removing the requirement that Ntuple filters have the same
number of queues and requirements as ATR. As a result this change will
make it so that all the Ntuple flag does is disable ATR for now.
This change fixes an issue in which we were incorrectly re-enabling ATR
when we exited perfect filter mode. This was due to the fact that the
logic assumed RSS and DCB were mutually exclusive which is no longer the
case.
To correct this we just need to add a check to guarantee DCB is disabled
before re-enabling ATR.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 03ecf91aae757eeb70763a3393227c4597c87b23)
Alexander Duyck [Wed, 11 May 2011 07:18:26 +0000 (07:18 +0000)]
ixgbe: remove ntuple filtering
Due to numerous issues in ntuple filters it has been decided to move the
interface over to the network flow classification interface. As a first
step to achieving this I first need to remove the old ntuple interface.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b29a21694f7d12e40537e1e587ec47725849769b)
Vasu Dev [Wed, 11 May 2011 05:41:46 +0000 (05:41 +0000)]
ixgbe: setup per CPU PCI pool for FCoE DDP
Currently single PCI pool used across all CPUs and that
doesn't scales up as number of CPU increases, so this
patch adds per CPU PCI pool to setup udl and that aligns
well from FCoE stack as that already has per CPU exch locking.
Adds per CPU PCI alloc setup and free in
ixgbe_fcoe_ddp_pools_alloc and ixgbe_fcoe_ddp_pools_free,
use CPU specific pool during DDP setup.
Re-arranged ixgbe_fcoe struct to have fewer holes
along with adding pools ptr using pahole.
Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit dadbe85ac47f180fa1e3ef93b276ab7938b1a98b)
Emil Tantilov [Sat, 7 May 2011 07:40:20 +0000 (07:40 +0000)]
ixgbe: add support for Dell CEM
This patch adds support for Dell CEM (Comprehensive Embedded Management)).
This consists of informing the management firmware of the driver version
during probe on 82599 and X540 HW.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Evan Swanson <evan.swanson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9612de92e023bff0d1cd5725ee65293accc70c56)
John Fastabend [Tue, 26 Apr 2011 07:26:30 +0000 (07:26 +0000)]
ixgbe: DCB and perfect filters can coexist
Now flow directors perfect filters features can coexist with DCB.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b1bbdb206a52b7eb13c2e57ee794b90618f61002)
John Fastabend [Tue, 26 Apr 2011 07:26:25 +0000 (07:26 +0000)]
ixgbe: fix bit mask for DCB version
This bit mask is wrong DCBX_HOST is always set. It was missed up
until now because lldpad reprograms the device on a link
event. However this is still wrong and it is best not to be
mis-configured for some time immediately following ixgbe_up().
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6f70f6acc7321f9f6501157b29d2db8c475cf3b2)
John Fastabend [Tue, 26 Apr 2011 07:26:19 +0000 (07:26 +0000)]
ixgbe: setup redirection table for multiple packet buffers
Setup RSS redirection table to be compatible with multiple packet
buffers. Currently, this works on 82599 devices because the RSS
redirection index is masked by the number of queues per packet
buffer.
This sets the cap on the RSS table to maxq.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 86b4db3bcce714d6bdd8c056158821301624bf00)
John Fastabend [Tue, 26 Apr 2011 07:26:14 +0000 (07:26 +0000)]
ixgbe: DCB 82598 devices, tx_idx and rx_idx swapped
The tx_idx and rx_idx values are swapped on 82598 devices
with DCB enabled.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit aba70d5e6c1c188fe45c1a782060440b6c2ea759)
John Fastabend [Tue, 26 Apr 2011 07:26:08 +0000 (07:26 +0000)]
ixgbe: DCB use existing TX and RX queues
The number of TX and RX queues allocated depends on the device
type, the current features set, online CPUs, and various
compile flags.
To enable DCB with multiple queues and allow it to coexist with
all the features currently implemented it has to setup a valid
queue count. This is done at init time using the FDIR and RSS
max queue counts and allowing each TC to allocate a queue per
CPU.
DCB will now use available queues up to (8 x TCs) this is somewhat
arbitrary cap but allows DCB to use up to 64 queues. Its easy to
increase this later if that is needed.
This is prep work to enable Flow Director with DCB. After this
DCB can easily coexist with existing features and no longer
needs its own DCB feature ring.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit e901acd6fa5538436e08e8a862dd2c080297f852)
John Fastabend [Tue, 3 May 2011 02:26:48 +0000 (02:26 +0000)]
ixgbe: configure minimal packet buffers to support TC
ixgbe devices support different numbers of packet buffers either
8 or 4. Here we only allocate the minimal number of packet
buffers required to implement the net_devices number of traffic
classes.
Fewer traffic classes allows for larger packet buffers in
hardware. Also more Tx/Rx queues can be given to each
traffic class.
This patch is mostly about propagating the number of traffic
classes through the init path. Specifically this adds the 4TC
cases to the MRQC and MTQC setup routines. Also ixgbe_setup_tc()
was sanitized to handle other traffic class value.
Finally changing the number of packet buffers in the hardware
requires the device to reinit. So this moves the reinit work
from DCB into the main ixgbe_setup_tc() routine to consolidate
the reset code. Now dcbnl_xxx ops call ixgbe_setup_tc() to
configure packet buffers if needed.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8b1c0b24d9afd4a59a8aa9c778253bcff949395a)
John Fastabend [Tue, 26 Apr 2011 07:25:58 +0000 (07:25 +0000)]
ixgbe: consolidate MRQC and MTQC handling
The MRQC and MTQC registers are configured in the main
setup path but are also reconfigured in the DCB setup
path. The DCB path fixes the DCB configuration by configuring
the SECTXMINIFG gap which is required for DCB pause
to operate correctly.
This patch reduces the duplicate code and does all setup
in ixgbe_setup_mtqc() and ixgbe_setup_mrqc().
Additionally, this removes the IXGBE_QDE. This write never
set the WRITE bit in the register so the write was not
actually doing anything. Also this was to clear the register
but, it is never set and defaults to zero. If this is
needed for SRIOV it should be added correctly in a follow
up patch. But it's never been working so removing it here
should be OK.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 72a32f1f3f68b7d95e7151b5f88831fb9906416e)
John Fastabend [Mon, 2 May 2011 12:34:10 +0000 (12:34 +0000)]
ixgbe: consolidate packet buffer allocation
Consolidate packet buffer allocation currently being
done in the DCB path and main path. This allows the
feature set and packet buffer requirements to be done
once.
This is prep work to allow DCB to coexist with other
features namely, flow director.
CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 80605c6513207344d00b32e8d1e64bd34fdf1358)
John Fastabend [Tue, 26 Apr 2011 10:05:14 +0000 (10:05 +0000)]
ixgbe: dcbnl reduce duplicated code and indentation
Replace duplicated code in if/else branches with single
check and ixgbe_init_interrupt_scheme().
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 1fcd86b51179518f7e69164e37353fb59cd6301e)
- unify vlan and nonvlan rx path
- kill adapter->vlgrp and ixgbevf_vlan_rx_register
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dadcd65f52921456112183fde543fc214bb0a227)
Stephen Hemminger [Thu, 9 Jun 2011 02:58:39 +0000 (02:58 +0000)]
ixgbevf: remove unnecessary ampersands
Use standard format for net_device_ops (without &)
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Greg Rose <Gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c12db7695e1f52b27108c3dfde3cf655a0368c58)
When the Manageability Engine (ME) is enabled on 82579, it periodically
accesses some MAC CSR registers. There is an arbiter in hardware which
prevents simultaneous access of these registers by the host software, i.e.
the driver. There is a hardware bug in the aribter that signals a host
access of the registers later than it actually happens. A write of the
Transmit or Receive Descriptor Tail register could result in an incorrect
value if the driver and ME perform simultaneous accesses which could result
in an access to an invalid memory address. This would return an
Unsupported Request which could hang the hardware. Workaround the issue by
checking the FWSM register bit24 which is set by ME before it accesses the
MAC CSR registers.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c6e7f51e73c1bc6044bce989ec503ef2e4758d55)
Bruce Allan [Fri, 22 Jul 2011 06:21:56 +0000 (06:21 +0000)]
e1000e: Spurious interrupts & dropped packets with 82577/8/9 in half-duplex
On 82577/8/9 in half-duplex when a received packet is passed from the PHY
to the MAC, if too many preamble octects are stripped from the packet
before arriving at the MAC, it can be misintrepeted as an in-band message
rather than an actual frame. For example, if the frame contents resembled
an interrupt request in-band message, it would trigger a false interrupt.
In most cases, the packet is just dropped.
By reducing the number of preamble octets stripped from the beginning of
the frame when passing it from the PHY to the MAC, the MAC will interpret
the frame properly.
An additional uses of the magic PHY_REG(770, 16) have been updated with a
define introduced with this patch.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 1d2101a712b3b7281a19ff6d7bfc16c2ce9d3998)
Bruce Allan [Fri, 22 Jul 2011 06:22:02 +0000 (06:22 +0000)]
e1000e: increase driver version number
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 12440928dca77eccc8a793cf3cd83d017abbd7d6)
Bruce Allan [Fri, 29 Jul 2011 05:53:07 +0000 (05:53 +0000)]
e1000e: alternate MAC address update
If word 0x37 in the EEPROM is 0xFFFF _or_ 0x0000, then there is no
alternate MAC address in the EEPROM.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 244735f6ebccbf72a283db89472309f770e14c80)
Bruce Allan [Fri, 22 Jul 2011 06:21:35 +0000 (06:21 +0000)]
e1000e: do not disable receiver on 82574/82583
Due to a hardware erratum, the receiver on 82574 and 82583 should not be
stopped once it has been started.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7f99ae633884043c70f4cc4a03f43dad0f0ecba2)
Bruce Allan [Fri, 29 Jul 2011 05:52:51 +0000 (05:52 +0000)]
e1000e: minor re-order of #include files
The recent commit a6b7a407 when back-ported to the out-of-tree e1000e
driver caused a compilation error on older kernels which required a
re-ordering of the #include files. This cosmetic patch syncs the two
drivers for easier maintainability.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9fb7a5f77b26dedfcfa4e3a36fe207f818662bee)
Bruce Allan [Fri, 22 Jul 2011 06:21:41 +0000 (06:21 +0000)]
e1000e: remove unnecessary check for NULL pointer
The array shadow_ram is never NULL.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b9e06f70dc186f8353cc593f2b4609383b3be7a9)
after review of all intel drivers, found several instances where
drivers had the incorrect pattern of:
memory mapped write();
delay();
which should always be:
memory mapped write();
write flush(); /* aka memory mapped read */
delay();
explanation:
The reason for including the flush is that writes can be held
(posted) in PCI/PCIe bridges, but the read always has to complete
synchronously and therefore has to flush all pending writes to a
device. If a write is held and followed by a delay, the delay
means nothing because the write may not have reached hardware
(maybe even not until the next read)
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 945a51517cc0bd9e461f2018624dfc1faef9ddee)
Jeff Kirsher [Tue, 12 Jul 2011 16:10:12 +0000 (16:10 +0000)]
e1000e: use GFP_KERNEL allocations at init time
In process and sleep allowed context, favor GFP_KERNEL allocations over
GFP_ATOMIC ones.
-v2: fixed checkpatch.pl warnings
CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Ben Greear <greearb@candelatech.com> CC: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c2fed9965c60e1f989f57889357c557f7b907ab7)
This patch adds support for the Jumbo Frames feature on 82583 devices.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a3d72d5d01b82a86f3b16ca1918d2040b1acba8c)
Eric Dumazet [Mon, 11 Jul 2011 10:00:40 +0000 (10:00 +0000)]
e1000e: remove e1000_queue_stats
struct e1000_queue_stats is not used, lets remove it
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 995181b9529adbfecd6882c734ee702b5ed9226c)
Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3e714ad3c2a07ee120044b72222cc20c14959efb)
Jon Mason [Mon, 27 Jun 2011 07:43:47 +0000 (07:43 +0000)]
e1000e: remove unnecessary reads of PCI_CAP_ID_EXP
The PCIE capability offset is saved during PCI bus walking. It will
remove an unnecessary search in the PCI configuration space if this
value is referenced instead of reacquiring it.
Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 353064de8af3bf46757376db66c29fa87a9fda3a)
Bruce Allan [Thu, 19 May 2011 01:53:41 +0000 (01:53 +0000)]
e1000e: update driver version
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b3ccf26704f80cd9959c4d2a83c2278bcc93425c)
Bruce Allan [Fri, 13 May 2011 07:20:14 +0000 (07:20 +0000)]
e1000e: Clear host wakeup bit on 82577/8 without touching PHY page 800
The Host Wakeup Active bit in the PHY Port General Configuration register
(page 769 register 17) must be cleared after every PHY reset to prevent an
unexpected wake signal from the PHY. Originally, this was accomplished by
simply reading the PHY Wakeup Control register on page 800 which clears the
Host Wakeup Active bit as a side-effect. Unfortunately, a hardware bug on
the 82577 and 82578 PHY can cause unexpected behavior when registers on
page 800 are accessed while in gigabit mode.
This patch changes the remaining instances when the Host Wakeup Active bit
needs to be cleared while possibly in gigabit mode by accessing the Port
General Configuration register directly instead of accessing any register
on page 800.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3ebfc7c9a6177794e0a1635483bd64268bed5d3c)
Bruce Allan [Fri, 13 May 2011 07:20:09 +0000 (07:20 +0000)]
e1000e: access multiple PHY registers on same page at the same time
Doing a PHY page select can take a long time, relatively speaking. This
can cause a significant delay when updating a number of PHY registers on
the same page by unnecessarily setting the page for each PHY access. For
example when going to Sx, all the PHY wakeup registers (WUC, RAR[], MTA[],
SHRAR[], IP4AT[], IP6AT[], etc.) on 82577/8/9 need to be updated which
takes a long time which can cause issues when suspending.
This patch introduces new PHY ops function pointers to allow callers to
set the page directly and do any number of PHY accesses on that page.
This feature is currently only implemented for 82577, 82578 and 82579
PHYs for both the normally addressed registers as well as the special-
case addressing of the PHY wakeup registers on page 800. For the latter
registers, the existing function for accessing the wakeup registers has
been divided up into three- 1) enable access to the wakeup register page,
2) perform the register access and 3) disable access to the wakeup register
page. The two functions that enable/disable access to the wakeup register
page are necessarily available to the caller so that the caller can restore
the value of the Port Control (a.k.a. Wakeup Enable) register after the
wakeup register accesses are done.
All instances of writing to multiple PHY registers on the same page are
updated to use this new method and to acquire any PHY locking mechanism
before setting the page and performing the register accesses, and release
the locking mechanism afterward.
Some affiliated magic number cleanup is done as well.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2b6b168d52aa044363647cfff8bda5cef8068ca3)
Bruce Allan [Fri, 13 May 2011 07:20:03 +0000 (07:20 +0000)]
e1000e: do not schedule the Tx queue until ready
Start the Tx queue when the interface is brought up in e1000e_up() but do
not schedule the queue until link is up as detected in the watchdog task
which sets netif_carrier_on.
Also flush the descriptors and clean the Tx and Rx rings before resetting
the hardware when bringing the interface down otherwise there is a small
window where the watchdog task can be triggered with netif_carrier_off
and the Tx ring not yet empty which causes an additional and unnecessary
reset.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 400484fa65ead1bbc3e86ea79e7505182a31bce1)
Bruce Allan [Fri, 13 May 2011 07:19:53 +0000 (07:19 +0000)]
e1000e: log when swflag is cleared unexpectedly on ICH/PCH devices
Since EXTCNF_CTRL.SWFLAG (used in the ownership arbitration of shared
resources, e.g. the PHY shared between the s/w, f/w, and h/w clients)
can be cleared by any of those clients, log a debug message when
software attempts to clear it and it is already cleared unexpectedly.
And since the swflag is cleared by a hardware reset, the driver does
not need to do that, but the mutex acquired when the bit is set must
still be cleared.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c5caf4825b22957e4ad70fd94316e91ce8cfb51c)
Bruce Allan [Fri, 13 May 2011 07:19:48 +0000 (07:19 +0000)]
e1000e: 82579 intermittently disabled during S0->Sx
When repeatedly cycling Sx->S0 states with the network cable unplugged,
the 82579 PHY may not initialize as expected and may require a full power
cycle to recover functionality to the device. Workaround this by testing
access of the PHY registers after resuming; if that returns unexpected
results toggle the LANPHYPC signal to power cycle the PHY.
This is implemented in the new function e1000_resume_workarounds_pchlan()
which calls another new function, e1000_toggle_lanphypc_value_ich8lan(),
which has been created to reduce code duplication (same functionality
required by a previous workaround). Also, e1000e_disable_gig_wol_ich8lan
is now e1000_suspend_workarounds_ich8lan to better reflect what it does.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 99730e4c13c8344b02dd96108945b48d28c14c25)
Bruce Allan [Fri, 13 May 2011 07:19:42 +0000 (07:19 +0000)]
e1000e: disable far-end loopback mode on ESB2
The ESB2 LAN includes a debug feature that enables far-end loopback (FELB)
of the SerDes/Kumeran interface. This feature is activated when receiving
a sequence of symbols that includes a reserved codeword. On a perfect
link, FELB would never be activated. In the presence of bit errors, there
is a very small, but non-zero, probability of FELB being activated.
If the FELB is activated, the SerDes link becomes non-functional and must
be reset. It could also corrupt the switching tables in the switch since
the ESB2 is transmitting packets with a different source MAC address.
This patch disables the FELB feature.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d9b24135b972ccdd5f5174fba06c730e895e6daf)
Eric Dumazet [Tue, 12 Jul 2011 03:08:34 +0000 (20:08 -0700)]
net: introduce __netdev_alloc_skb_ip_align
RX rings should use GFP_KERNEL allocations if possible, add
__netdev_alloc_skb_ip_align() helper to ease this.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4915a0de43c3e9aef92005c1f94a8ff3a6cfced5)
Dan Carpenter [Thu, 26 Jan 2012 13:55:16 +0000 (16:55 +0300)]
xfs: fix acl count validation in xfs_acl_from_disk()
We applied a fix for CVE-2012-0038 fa8b18edd7 "xfs: validate acl count",
but there was a follow on patch which is not in our kernel. If count
was a negative then we could get by the new check.
From 093019cf1b18dd31b2c3b77acce4e000e2cbc9ce Mon Sep 17 00:00:00 2001
From: Xi Wang <xi.wang@gmail.com>
Date: Mon, 12 Dec 2011 21:55:52 +0000
Subject: [PATCH] xfs: fix acl count validation in xfs_acl_from_disk()
Commit fa8b18ed didn't prevent the integer overflow and possible
memory corruption. "count" can go negative and bypass the check.
Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Srinivas Eeda [Tue, 31 Jan 2012 22:37:19 +0000 (14:37 -0800)]
ocfs2: use spinlock irqsave for downconvert lock.patch
When ocfs2dc thread holds dc_task_lock spinlock and receives soft IRQ it
deadlock itself trying to get same spinlock in ocfs2_wake_downconvert_thread.
Below is the stack snippet.
The patch disables interrupts when acquiring dc_task_lock spinlock.
Chris Mason [Wed, 25 Jan 2012 18:47:40 +0000 (13:47 -0500)]
Btrfs: fix reservations in btrfs_page_mkwrite
Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error. But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.
This makes sure we only release if we were able to reserve.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 16 Jan 2012 13:13:11 +0000 (08:13 -0500)]
Btrfs: use larger system chunks
system chunks by default are very small. This makes them slightly
larger and also fixes the conditional checks to make sure we don't
allocate a billion of them at once.
Josef Bacik [Fri, 13 Jan 2012 17:09:22 +0000 (12:09 -0500)]
Btrfs: add a delalloc mutex to inodes for delalloc reservations
I was using i_mutex for this, but we're getting bogus lockdep warnings by doing
that and theres no real way to get rid of those, so just stop using i_mutex to
protect delalloc metadata reservations and use a delalloc mutex instead. This
shouldn't be contended often at all, only if you are writing and mmap writing to
the file at the same time. Thanks,
Josef Bacik [Fri, 2 Dec 2011 20:44:12 +0000 (15:44 -0500)]
Btrfs: protect orphan block rsv with spin_lock
We've been seeing warnings coming out of the orphan commit stuff forever from
ceph. Turns out it's because we're racing with checking if the orphan block
reserve is set, because we clear it outside of the spin_lock. So leave the
normal fastpath checks where they are, but take the spin_lock and _recheck_ to
make sure we haven't had an orphan block rsv added in the meantime. Then clear
the root's orphan block rsv and release the lock. With this patch a user said
the warnings went away and they usually showed up pretty soon after he started
ceph. Thanks,
Josef Bacik [Fri, 13 Jan 2012 00:10:12 +0000 (19:10 -0500)]
Btrfs: don't call btrfs_throttle in file write
Btrfs_throttle will make us wait if there is a currently committing transaction
until we can open new transactions, which is ridiculous since we don't actually
start any transactions within the file write path anyway, so all this does is
introduce big latencies if we have a sync/fsync heavy workload going on while
somebody else is trying to do work. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 45a8090e626ab470c91142954431a93846030b0d)
Josef Bacik [Fri, 13 Jan 2012 00:10:12 +0000 (19:10 -0500)]
Btrfs: release space on error in page_mkwrite
If updating the inode gave us an ENOSPC we were just returning in page_mkwrite,
which is a problem since we make our reservation right before trying to update
the inode, so fix the out label so that we actually free our reservation.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit ec39e180fd3188c983c94603634bfcd019f42ae7)
This is because of the wrong if condition, which is used to check if we should
subtract the bytes of the dropped range from i_blocks/i_bytes of i-node or not.
When we truncate a compressed extent, btrfs substracts the bytes of the whole
extent, it's wrong. We should substract the real size that we truncate, no
matter it is a compressed extent or not. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit f70a9a6b94af86fca069a7552ab672c31b457786)
Josef Bacik [Fri, 13 Jan 2012 00:10:12 +0000 (19:10 -0500)]
Btrfs: do not use btrfs_end_transaction_throttle everywhere
A user reported a problem where things like open with O_CREAT would take up to
30 seconds when he had nfs activity on the same mount. This is because all of
our quick metadata operations, like create, symlink etc all do
btrfs_end_transaction_throttle, which if the transaction is blocked will wait
for the commit to complete before it returns. This adds a ridiculous amount of
latency and isn't really needed. The normal btrfs_end_transaction will mark the
transaction as blocked and wake the transaction kthread up if it thinks the
transaction needs to end (this being in the running out of global reserve space
scenario), and this is all that is really needed since we've already done
everything we're going to do, we just need to return. This should help people
with the latency they were seeing when using synchronous heavy workloads.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 7ad85bb76a61801362701b77c5cee5aa09f35369)
Li Zefan [Wed, 7 Dec 2011 03:38:24 +0000 (11:38 +0800)]
Btrfs: fix possible deadlock when opening a seed device
The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:
Since seed device is readonly, there's no usable space in the filesystem.
Afterwards we add a sprout device to it, and the kernel creates a METADATA
block group and a SYSTEM block group where comes free space we can reserve,
but we still get revervation failure because the global block_rsv hasn't
been updated accordingly.
Li Zefan [Thu, 29 Dec 2011 06:47:27 +0000 (14:47 +0800)]
Btrfs: rewrite btrfs_trim_block_group()
There are various bugs in block group trimming:
- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (even
none will be trimmed). Even for those trimmed, not all the free space
in the bitmaps will be trimmed.
I rewrite btrfs_trim_block_group() and break it into two functions.
One is to trim extents only, and the other is to trim bitmaps only.
Before patching:
# fstrim -v /mnt/
/mnt/: 1496465408 bytes were trimmed
After patching:
# fstrim -v /mnt/
/mnt/: 2193768448 bytes were trimmed
Li Zefan [Thu, 1 Dec 2011 06:06:42 +0000 (14:06 +0800)]
Btrfs: simplfy calculation of stripe length for discard operation
For btrfs raid, while discarding a range of space, we'll need to know
the start offset and length to discard for each device, and it's done
in btrfs_map_block().
However the calculation is a bit complex for raid0 and raid10, so I
reimplement it based on a fact that:
Li Zefan [Thu, 1 Dec 2011 04:55:47 +0000 (12:55 +0800)]
Btrfs: don't pre-allocate btrfs bio
We pre-allocate a btrfs bio with fixed size, and then may re-allocate
memory if we find stripes are bigger than the fixed size. But this
pre-allocation is not necessary.
Also we don't have to calcuate the stripe number twice.
Alexandre Oliva [Fri, 14 Oct 2011 15:10:36 +0000 (12:10 -0300)]
Btrfs: revamp clustered allocation logic
Parameterize clusters on minimum total size, minimum chunk size and
minimum contiguous size for at least one chunk, without limits on
cluster, window or gap sizes. Don't tolerate any fragmentation for
SSD_SPREAD; accept it for metadata, but try to keep data dense.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 1bb91902dc90e25449893e693ad45605cb08fbe5)
Alexandre Oliva [Mon, 28 Nov 2011 14:36:17 +0000 (12:36 -0200)]
Btrfs: don't set up allocation result twice
We store the allocation start and length twice in ins, once right
after the other, but with intervening calls that may prevent the
duplicate from being optimized out by the compiler. Remove one of the
assignments.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit fc7c1077ceb99c35e5f9d0ce03dc7740565bb2bf)
Alexandre Oliva [Mon, 12 Dec 2011 06:48:19 +0000 (04:48 -0200)]
Btrfs: test free space only for unclustered allocation
Since the clustered allocation may be taking extents from a different
block group, there's no point in spin-locking and testing the current
block group free space before attempting to allocate space from a
cluster, even more so when we might refrain from even trying the
cluster in the current block group because, after the cluster was set
up, not enough free space remained. Furthermore, cluster creation
attempts fail fast when the block group doesn't have enough free
space, so the test was completely superfluous.
I've move the free space test past the cluster allocation attempt,
where it is more useful, and arranged for a cluster in the current
block group to be released before trying an unclustered allocation,
when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in
the cluster stands a chance of being combined with additional free
space in the block group so as to succeed in the allocation attempt.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit a5f6f719a5cd7caeee8ed8137cf3f94c3bbebc65)
Chris Mason [Fri, 6 Jan 2012 20:41:34 +0000 (15:41 -0500)]
Btrfs: lower the bar for chunk allocation
The chunk allocation code has tried to keep a pretty tight lid on creating new
metadata chunks. This is partially because in the past the reservation
code didn't give us an accurate idea of how much space was being used.
The new code is much more accurate, so we're able to get rid of some of these
checks.
Chris Mason [Fri, 6 Jan 2012 20:23:57 +0000 (15:23 -0500)]
Btrfs: run chunk allocations while we do delayed refs
Btrfs tries to batch extent allocation tree changes to improve performance
and reduce metadata trashing. But it doesn't allocate new metadata chunks
while it is doing allocations for the extent allocation tree.
This commit changes the delayed refence code to do chunk allocations if we're
getting low on room. It prevents crashes and improves performance.
Al Viro [Fri, 23 Dec 2011 12:58:13 +0000 (07:58 -0500)]
Btrfs: call d_instantiate after all ops are setup
This closes races where btrfs is calling d_instantiate too soon during
inode creation. All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully setup in memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 08c422c27f855d27b0b3d9fa30ebd938d4ae6f1f)
Chris Mason [Fri, 23 Dec 2011 12:53:00 +0000 (07:53 -0500)]
Btrfs: fix worker lock misuse in find_worker
Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.
This fixes both errors.
Signed-off-by: Chris Mason <chris.mason@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
(cherry picked from commit 8d532b2afb2eacc84588db709ec280a3d1219be3)
Konrad Rzeszutek Wilk [Tue, 24 Jan 2012 21:55:29 +0000 (16:55 -0500)]
xen/config: turn CONFIG_XEN_DEBUG_FS off.
That option makes the Xen spinlock code (xen/spinlock.c) accumulate
statistics about how many locks taken, time in slowpath, etc.
Good information during debugging but not in production.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Maxim Uvarov [Mon, 23 Jan 2012 20:08:00 +0000 (12:08 -0800)]
proc: clean up and fix /proc/<pid>/mem handling
Orabug: 13618927
CVE-2012-0056
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.
This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open. That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.
That is different from our previous behavior, but much simpler. If
somebody actually finds a load where this matters, we'll need to revert
this commit.
I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve. So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.
Reported-by: Jüri Aedla <asd@ut.ee> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
Maxim Uvarov [Sat, 21 Jan 2012 01:45:24 +0000 (17:45 -0800)]
add __init arguments to init functions
Fix following issues:
WARNING: vmlinux.o(.text+0x3aba): Section mismatch in reference from the function xen_align_and_add_e820_region() to the function .init.text:e820_add_region()
The function xen_align_and_add_e820_region() references
the function __init e820_add_region().
This is often because xen_align_and_add_e820_region lacks a __init
annotation or the annotation of e820_add_region is wrong.
WARNING: vmlinux.o(.text+0x2e9ec): Section mismatch in reference from the function acpi_map_cpu2node() to the variable .cpuinit.data:__apicid_to_node
The function acpi_map_cpu2node() references
the variable __cpuinitdata __apicid_to_node.
This is often because acpi_map_cpu2node lacks a __cpuinitdata
annotation or the annotation of __apicid_to_node is wrong.
WARNING: vmlinux.o(.text+0x2e9f1): Section mismatch in reference from the function acpi_map_cpu2node() to the function .cpuinit.text:numa_set_node()
The function acpi_map_cpu2node() references
the function __cpuinit numa_set_node().
This is often because acpi_map_cpu2node lacks a __cpuinit
annotation or the annotation of numa_set_node is wrong.
WARNING: vmlinux.o(.text+0x3f9b4): Section mismatch in reference from the function enable_iommus() to the function .init.text:iommu_set_device_table()
The function enable_iommus() references
the function __init iommu_set_device_table().
This is often because enable_iommus lacks a __init
annotation or the annotation of iommu_set_device_table is wrong.
Maxim Uvarov [Sun, 15 Jan 2012 20:08:20 +0000 (12:08 -0800)]
hpwdt: clean up set_memory_x call for 32 bit
1. addess has to be page aligned.
2. set_memory_x uses page size argument, not size.
Bug causes with following commit:
commit da28179b4e90dda56912ee825c7eaa62fc103797
Author: Mingarelli, Thomas <Thomas.Mingarelli@hp.com>
Date: Mon Nov 7 10:59:00 2011 +0100
watchdog: hpwdt: Changes to handle NX secure bit in 32bit path
Manish Rangankar [Fri, 2 Dec 2011 08:25:03 +0000 (13:55 +0530)]
qla4xxx: Fixed BFS with sendtargets as boot index.
If ql4xdisablesysfsboot = 0 and sendtargets entry as boot index then
driver does export sendtarget entries in sysfs but iscsistart does not
do discovery. So in this case let driver do the discovery and
login to the targets.
Nilesh Javali [Wed, 7 Dec 2011 08:22:31 +0000 (13:52 +0530)]
qla4xxx: Correct the default relogin timeout value
The ACB default timeout value is used to set the default
relogin timeout value. For ISP4022 adapters where
the ACB default value is set to 2560s, limit the relogin
timeout to 12s.
JIRA Key: IUEKR2ISCSI-8
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Nilesh Javali [Thu, 1 Dec 2011 09:06:04 +0000 (14:36 +0530)]
qla4xxx: Limit the ACB Default Timeout value to 12s
The ACB default timeout value is set to 2560s in the
ISP4022 firmware. This caused the driver to loop
for 2560s. Hence limit the default timeout at the driver
level to min 12s.
Also break out from the loop if the sendtargets list was empty.
JIRA Key: IUEKR2ISCSI-7
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>