Michael Chan [Wed, 20 Jul 2011 14:55:22 +0000 (14:55 +0000)]
cnic: Fix Context ID space calculation
Include FCoE CID space only for E2_PLUS devices. Remove old CID
offset adjustments that are no longer needed.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b37a41e390310429d4171b0f7b6c6eab04512dc0)
bnx2x: Implementation for netdev->ndo_fcoe_get_wwn
(Note: this was hand merged. Should we need ndo_setup_tc, revert/reapply and/or merge away.)
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 20 Jul 2011 14:55:25 +0000 (14:55 +0000)]
bnx2: Fix endian swapping on firmware version string
so that ethtool -i will display it correctly on big endian systems.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3aeb7d2243e55ddcad3c0b402e7b09619a67f5da)
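A minimal sketch of host-endian-independent unpacking. It assumes, hypothetically, that the version string is stored as 32-bit words with the first character in the most significant byte. Extracting bytes by shifting the word values, instead of casting the word array to `char *`, gives the same string on little- and big-endian hosts. `fw_words_to_string` is an illustrative name, not the bnx2 driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the bnx2 code: unpack nwords 32-bit words whose
 * first character is in the most significant byte into a NUL-terminated
 * string of at most outlen bytes (including the terminator). Shifting
 * values, rather than reinterpreting memory, makes the result the same on
 * little- and big-endian hosts. */
void fw_words_to_string(const uint32_t *words, size_t nwords,
                        char *out, size_t outlen)
{
    size_t pos = 0;

    if (outlen == 0)
        return;
    for (size_t i = 0; i < nwords; i++) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            if (pos + 1 >= outlen)
                goto done;           /* leave room for the terminator */
            out[pos++] = (char)((words[i] >> shift) & 0xff);
        }
    }
done:
    out[pos] = '\0';
}
```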
Michael Chan [Fri, 15 Jul 2011 06:53:58 +0000 (06:53 +0000)]
bnx2: Close device if tx_timeout reset fails
Based on original patch and description from Flavio Leitner <fbl@redhat.com>
When bnx2_reset_task() is called, it will stop,
(re)initialize and start the interface to restore
the working condition.
The bnx2_init_nic() calls bnx2_reset_nic() which will
reset the chip and then calls bnx2_free_skbs() to free
all the skbs.
The problem happens when bnx2_init_chip() fails because
bnx2_reset_nic() will just return skipping the ring
initializations at bnx2_init_all_rings(). Later, the
reset task starts the interface again and the system
crashes due to a NULL pointer access (no skb in the ring).
To fix it, we call dev_close() if bnx2_init_nic() fails.
One minor wrinkle to deal with is the cancel_work_sync()
call in bnx2_close() to cancel bnx2_reset_task(). The
call will wait forever because it is trying to cancel
itself and the workqueue will be stuck.
Since bnx2_reset_task() holds the rtnl_lock() and checks
for netif_running() before proceeding, there is no need
to cancel bnx2_reset_task() in bnx2_close() even if
bnx2_close() and bnx2_reset_task() are running concurrently.
The rtnl_lock() serializes the 2 calls.
We need to move the cancel_work_sync() call to
bnx2_remove_one() to make sure it is canceled before freeing
the netdev struct.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cd6340199f65cad63262db0fd561bdcfd69df3bd)
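The control flow described above can be modelled in user space as follows. This is a simplified sketch under stated assumptions: `struct nic`, `nic_init`, `nic_close` and `nic_reset_task` are hypothetical stand-ins for the driver's state and functions, and the rtnl_lock serialization is assumed to be held by the caller rather than implemented. It shows the two key points: a failed re-init closes the device instead of restarting it with empty rings, and close does not try to cancel the reset work it may be running from.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the reset flow; not the bnx2 driver. */
struct nic {
    bool running;        /* models netif_running() */
    bool rings_ready;    /* models the rings having been initialized */
    bool chip_init_ok;   /* injected result of chip initialization */
};

static int nic_init(struct nic *n)
{
    n->rings_ready = false;          /* the reset frees all skbs */
    if (!n->chip_init_ok)
        return -1;                   /* ring initialization is skipped */
    n->rings_ready = true;
    return 0;
}

static void nic_close(struct nic *n)
{
    /* No cancel of the reset work here: close may be called from that
     * work itself, and waiting for it would never complete. */
    n->running = false;
}

/* Assumed to be called with the (modelled) rtnl lock held. */
void nic_reset_task(struct nic *n)
{
    if (!n->running)
        return;                      /* closed concurrently; nothing to do */
    if (nic_init(n) != 0) {
        nic_close(n);                /* never restart with empty rings */
        return;
    }
    /* restart the interface here; the rings are populated */
}
```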
Michael Chan [Wed, 13 Jul 2011 17:24:22 +0000 (17:24 +0000)]
bnx2: Read iSCSI config from shared memory during ->probe()
The scratchpad location that we were reading from has not been
initialized yet during ->probe(), so we were getting inaccurate
information.
Update version to 2.1.10.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 41c2178adce37b249147063624f8a27b064b471e)
to help debug issues related to management firmware.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ecdbf6e0d555d353188647d1b2dee9a79db69c68)
Jon Mason [Mon, 27 Jun 2011 07:44:43 +0000 (07:44 +0000)]
bnx2: remove unnecessary read of PCI_CAP_ID_EXP
The PCIE capability offset is saved during PCI bus walking. It will
remove an unnecessary search in the PCI configuration space if this
value is referenced instead of reacquiring it. Also, pci_is_pcie is a
better way of determining if the device is PCIE or not (as it uses the
same saved PCIE capability offset).
Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e82760e7d6498d24d1a92f22767ba578c8980a6d)
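A user-space sketch of the idea: walk the PCI capability list once at enumeration time, cache the PCI Express capability offset, and answer later "is this PCIe?" queries from the cached value. The offsets used are the standard PCI ones (Status register at 0x06 with the capability-list bit 0x10, capabilities pointer at 0x34, PCI Express capability ID 0x10). `struct pdev` and the function names are hypothetical; in the kernel the cached value lives in `struct pci_dev` and is what `pci_is_pcie()` consults.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_STATUS          0x06  /* PCI Status register (low byte) */
#define CFG_STATUS_CAP_LIST 0x10  /* status bit: capability list present */
#define CFG_CAP_PTR         0x34  /* pointer to the first capability */
#define CAP_ID_EXP          0x10  /* PCI Express capability ID */

/* Hypothetical device model: 256 bytes of config space plus a cache. */
struct pdev {
    uint8_t cfg[256];
    uint8_t pcie_cap;   /* 0 = not PCIe; filled once at enumeration */
};

static uint8_t find_cap(const struct pdev *d, uint8_t id)
{
    if (!(d->cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
        return 0;
    uint8_t pos = d->cfg[CFG_CAP_PTR] & 0xfc;
    for (int ttl = 48; pos && ttl > 0; ttl--) {  /* guard against loops */
        if (d->cfg[pos] == id)
            return pos;
        pos = d->cfg[pos + 1] & 0xfc;            /* next-capability pointer */
    }
    return 0;
}

/* Done once, e.g. while walking the bus. */
void pdev_enumerate(struct pdev *d)
{
    d->pcie_cap = find_cap(d, CAP_ID_EXP);
}

/* Later queries reuse the cached offset instead of re-walking config space. */
bool pdev_is_pcie(const struct pdev *d)
{
    return d->pcie_cap != 0;
}
```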
Michael Chan [Wed, 13 Jul 2011 17:24:20 +0000 (17:24 +0000)]
cnic: Return proper error code if we fail to send netlink message
to allow iSCSI connection to fail faster instead of waiting for the
long timeout.
Update version to 2.5.6.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 558e4c758c4c7bf209325f5865189c6558860b2b)
Michael Chan [Wed, 13 Jul 2011 17:24:19 +0000 (17:24 +0000)]
cnic: Fix ring setup/shutdown code
Latest bnx2x driver uses different CID for the iSCSI rings, so
we need to pass it in the ring init data. The rx ring is also
zeroed out to prevent stale DMA addresses from being used after
shutdown.
The same cp local variable redefined inside the else branch is
also eliminated.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e1dd883cb15310dc2ded9995a1f1d8b1cb1e88f3)
Michael Chan [Wed, 13 Jul 2011 17:24:18 +0000 (17:24 +0000)]
cnic: Fix port_mode setting
CHIP_2_PORT_MODE was not set correctly.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b7d40315c9643034ac4b5c9dda480d0124416f89)
Michael Chan [Wed, 13 Jul 2011 17:24:17 +0000 (17:24 +0000)]
cnic: Replace get_random_bytes() with random32()
Suggested by Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 973e574e26cc8f4704e5d7f112fd566386e37f04)
Michael Chan [Mon, 20 Jun 2011 15:15:56 +0000 (15:15 +0000)]
cnic, bnx2i: Add support for new devices - 57800, 57810, and 57840
And change iSCSI RQ doorbell size from 16B to 64B to match new firmware.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f4b5ad26bcb983c493e131ff34b2fa60100c82e5)
These are the remainder casts after several specific
patches to remove netdev_priv and dev_priv.
Done via coccinelle script (and a little editing):
$ cat cast_void_pointer.cocci
@@
type T;
T *pt;
void *pv;
@@
- pt = (T *)pv;
+ pt = pv;
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Acked-By: Chris Snook <chris.snook@gmail.com> Acked-by: Jon Mason <jdmason@kudzu.us> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Dillow <dave@thedillows.org> Signed-off-by: David S. Miller <davem@davemloft.net>
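For reference, the transformation is safe because C converts `void *` implicitly to any object pointer type, so the removed casts were redundant (this differs from C++, where the cast is required). A small illustration; `struct priv` and `priv_alloc` are hypothetical names.

```c
#include <stdlib.h>

struct priv {
    int id;
};

/* Hypothetical example: the assignment from void * needs no cast in C. */
struct priv *priv_alloc(int id)
{
    void *pv = malloc(sizeof(struct priv));
    struct priv *pt = pv;          /* was: pt = (struct priv *)pv; */

    if (pt)
        pt->id = id;
    return pt;
}
```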
Deleting NPIV port causes a kernel panic when the NPIV port is in the same zone
as the physical port and shares the same LUN. This happens due to the fact that
vport destroy and unsolicited ELS are scheduled to run on the same workqueue,
and vport destroy destroys the lport and the unsolicited ELS tries to access
the invalid lport. This patch fixes this issue by maintaining a list of valid
lports and verifying if the lport is valid or not before accessing it.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
(cherry picked from commit d36b3279e157641c345b12eddb3db78fb42da80f)
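A minimal sketch of the "list of valid lports" check, assuming a simple singly linked registry. All names are hypothetical, not bnx2fc's. Locking is deliberately omitted to keep the sketch short; real code must protect the list with a lock shared by the vport-destroy path and the unsolicited-ELS path, and must do the membership check and the use of the lport under that same lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registry of live lports (unsynchronized for brevity). */
struct lport {
    int id;
    struct lport *next;
};

static struct lport *valid_lports;

void lport_register(struct lport *lp)
{
    lp->next = valid_lports;
    valid_lports = lp;
}

void lport_unregister(struct lport *lp)
{
    for (struct lport **pp = &valid_lports; *pp; pp = &(*pp)->next) {
        if (*pp == lp) {
            *pp = lp->next;
            return;
        }
    }
}

/* Check membership before dereferencing an lport captured earlier. */
bool lport_is_valid(const struct lport *lp)
{
    for (const struct lport *cur = valid_lports; cur; cur = cur->next)
        if (cur == lp)
            return true;
    return false;
}
```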
Vladislav Zolotarov [Tue, 14 Jun 2011 01:34:46 +0000 (01:34 +0000)]
bnx2x: Update date to 2011/06/13 and version to 1.70.00-0
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
(cherry picked from commit b96368e9365a4db7429da87cfd25bb54b24954f8)
Vladislav Zolotarov [Tue, 14 Jun 2011 01:33:51 +0000 (01:33 +0000)]
bnx2x: 57712 parity handling
- Added parity error handling support for the 57712 chip.
- Changed the parity recovery scheme from per-chip to per-engine.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
(cherry picked from commit c9ee92062424375fe6e73c4af5d52df289ccf9eb)
Vlad Zolotarov [Tue, 14 Jun 2011 11:33:44 +0000 (14:33 +0300)]
New 7.0 FW: bnx2x, cnic, bnx2i, bnx2fc
New FW/HSI (7.0):
- Added support for 578xx chips
- Improved HSI: the driver needs much less direct access to the FW internal
memory.
New implementation of the HSI handling layer in the bnx2x (bnx2x_sp.c):
- Introduced chip dependent objects that have chip independent interfaces
for configuration of MACs, multicast addresses, Rx mode, indirection table,
fast path queues and function initialization/cleanup.
- Objects functionality is based on the private function pointers, which
allows not only a per-chip but also PF/VF differentiation while still
preserving the same interface towards the driver.
- The objects' interface is not affected by HSI changes that do not require
providing new parameters, keeping the code outside bnx2x_sp.c invariant
with regard to such HSI changes.
Changes in a CNIC, bnx2fc and bnx2i modules due to the new HSI.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
(cherry picked from commit 619c5cb6885b936c44ae1422ef805b69c6291485)
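The "chip-dependent object with a chip-independent interface" pattern can be sketched with function pointers chosen once at init time. This is an illustrative model under assumptions: `struct mac_obj`, the variant enum and the functions are hypothetical and do not reproduce the bnx2x_sp.c API. Driver code calls through the interface and never tests the chip type itself.

```c
/* Hypothetical sketch; names are illustrative, not the bnx2x_sp.c API. */
enum chip_variant { VARIANT_OLD, VARIANT_NEW };

struct mac_obj {
    /* Chip-specific behaviour, selected once in mac_obj_init(). */
    int (*set_mac)(struct mac_obj *o, const unsigned char mac[6]);
    enum chip_variant programmed_by;   /* records which path ran (demo only) */
    unsigned char mac[6];
};

static void copy_mac(struct mac_obj *o, const unsigned char mac[6])
{
    for (int i = 0; i < 6; i++)
        o->mac[i] = mac[i];
}

static int set_mac_old(struct mac_obj *o, const unsigned char mac[6])
{
    copy_mac(o, mac);
    /* chip-specific programming for the older family would go here */
    o->programmed_by = VARIANT_OLD;
    return 0;
}

static int set_mac_new(struct mac_obj *o, const unsigned char mac[6])
{
    copy_mac(o, mac);
    /* chip-specific programming for the newer family would go here */
    o->programmed_by = VARIANT_NEW;
    return 0;
}

void mac_obj_init(struct mac_obj *o, enum chip_variant v)
{
    o->set_mac = (v == VARIANT_NEW) ? set_mac_new : set_mac_old;
}

/* Caller-facing entry point: identical for every chip variant. */
int driver_set_mac(struct mac_obj *o, const unsigned char mac[6])
{
    return o->set_mac(o, mac);
}
```

The same indirection could also distinguish PF from VF implementations, as the commit message notes, without changing `driver_set_mac`.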
Michael Chan [Tue, 14 Jun 2011 01:32:38 +0000 (01:32 +0000)]
cnic: Move indexing function pointers to struct kcq_info
The hardware indexing scheme for the FCoE kcq will change in the upcoming
firmware. This patch lets the driver cope with that change easily.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
(cherry picked from commit 59e5137357559ec60b2e72bdc3d5a7e22c47212b)
Vladislav Zolotarov [Tue, 14 Jun 2011 01:33:39 +0000 (01:33 +0000)]
bnx2x: Created bnx2x_sp
Moved the HSI dependent slow path code to a separate file.
Currently it contains the implementation of MACs, Rx mode,
multicast addresses, indirection table, fast path queue and function
configuration code.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
(cherry picked from commit 042181f5aa8833a8918e1a91cfaf292146ffc62c)
Yaniv Rosner [Tue, 31 May 2011 21:29:42 +0000 (21:29 +0000)]
bnx2x: Improve cl45 access methods
Instead of setting CL45 mode for every CL45 access, apply it once during initialization.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a198c142aacf82acad29e1752191bda8b451a0c7)
Yaniv Rosner [Tue, 31 May 2011 21:29:27 +0000 (21:29 +0000)]
bnx2x: Modify XGXS functions
Modify XGXS functions to follow the rest of the PHY scheme.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ec146a6f019923819f5ca381980248b6d154ca1a)
Yaniv Rosner [Tue, 31 May 2011 21:29:05 +0000 (21:29 +0000)]
bnx2x: Fix link status sync
Fix link status synchronization between the primary function and the other functions.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fd36a2e69e05f42ddfe388efe14e068c0d0c6cb7)
Yaniv Rosner [Tue, 31 May 2011 21:28:43 +0000 (21:28 +0000)]
bnx2x: Adjust BCM8726 module detection settings
Move BCM8726 module detection code into a separate function to be called only once during initialization.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 020c7e3f3cd38d41104c7f55d3d5732c5ac939be)
Yaniv Rosner [Tue, 31 May 2011 21:28:27 +0000 (21:28 +0000)]
bnx2x: Fix grammar and relocate code
This patch relocates some functions in preparation for the following patches, and also fixes some grammar mistakes.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9045f6b44a01737a84c5bb79f580dccce6806d80)
Yaniv Rosner [Tue, 31 May 2011 21:28:10 +0000 (21:28 +0000)]
bnx2x: Fix BCM84833 settings
Fix BCM84833 register settings.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bac27bd941454aaf40f7876ce3b487e303c4953d)
Yaniv Rosner [Tue, 31 May 2011 21:27:48 +0000 (21:27 +0000)]
bnx2x: Fix over current port display
On the 57712 chip, port numbers are enumerated per engine, so the port number displayed to the user must be adjusted.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 27d024321cf4fc0a96c41c3b0f3c123796734a63)
Yaniv Rosner [Tue, 31 May 2011 21:27:06 +0000 (21:27 +0000)]
bnx2x: Add TX fault check for fiber PHYs
If a TX fault is detected on fiber PHYs, declare the link down until the TX fault clears.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c688fe2fc0cab3a5d266f7f6fcb21f14e4ac39ba)
Yaniv Rosner [Tue, 31 May 2011 21:26:28 +0000 (21:26 +0000)]
bnx2x: Change return status type
Change return status from u8 to int.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fcf5b650832996bd857bb8f0b0b42097218f7fb8)
Yaniv Rosner [Tue, 31 May 2011 21:26:11 +0000 (21:26 +0000)]
bnx2x: Fix port type display
Display the current media type connected to the port in ethtool.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ac9e4286dc9e64dd2d937df7f8660bb5f260792)
Yaniv Rosner [Tue, 31 May 2011 21:25:55 +0000 (21:25 +0000)]
bnx2x: Add new phy BCM8722
Add support for new phy BCM8722.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e4d78f120c039bbd18ae449a6b2af3df83ca02bf)
Chris Mason [Wed, 25 Jan 2012 18:47:40 +0000 (13:47 -0500)]
Btrfs: fix reservations in btrfs_page_mkwrite
Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error. But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.
This makes sure we only release if we were able to reserve.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
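The rule "only release if we were able to reserve" amounts to tracking whether the reservation succeeded and consulting that flag on the error path. A hypothetical user-space model follows; `struct space`, `reserve`, `release` and `page_mkwrite_model` are illustrative and not btrfs functions, and `dirty_ok` injects the outcome of the later step that may also fail with ENOSPC.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical space accounting; not btrfs code. */
struct space {
    long free;
    long reserved;
};

static int reserve(struct space *s, long bytes)
{
    if (s->free < bytes)
        return -ENOSPC;
    s->free -= bytes;
    s->reserved += bytes;
    return 0;
}

static void release(struct space *s, long bytes)
{
    s->reserved -= bytes;
    s->free += bytes;
}

/* dirty_ok injects the result of the step after reserving. */
int page_mkwrite_model(struct space *s, long bytes, bool dirty_ok)
{
    bool reserved = false;
    int ret = reserve(s, bytes);

    if (ret == 0) {
        reserved = true;
        if (!dirty_ok)
            ret = -ENOSPC;
    }
    if (ret) {
        if (reserved)        /* only give back what we actually took */
            release(s, bytes);
        return ret;
    }
    return 0;                /* success: the reservation is kept */
}
```

Without the `reserved` flag, the first failure mode (reservation itself failed) would "release" space that was never taken and drive `reserved` negative.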
Chris Mason [Mon, 16 Jan 2012 13:13:11 +0000 (08:13 -0500)]
Btrfs: use larger system chunks
System chunks are very small by default. This makes them slightly
larger and also fixes the conditional checks to make sure we don't
allocate a billion of them at once.
Josef Bacik [Fri, 13 Jan 2012 17:09:22 +0000 (12:09 -0500)]
Btrfs: add a delalloc mutex to inodes for delalloc reservations
I was using i_mutex for this, but we're getting bogus lockdep warnings by doing
that and there's no real way to get rid of those, so just stop using i_mutex to
protect delalloc metadata reservations and use a delalloc mutex instead. This
shouldn't be contended often at all, only if you are writing and mmap writing to
the file at the same time. Thanks,
Josef Bacik [Fri, 2 Dec 2011 20:44:12 +0000 (15:44 -0500)]
Btrfs: protect orphan block rsv with spin_lock
We've been seeing warnings coming out of the orphan commit stuff forever from
ceph. Turns out it's because we're racing with checking if the orphan block
reserve is set, because we clear it outside of the spin_lock. So leave the
normal fastpath checks where they are, but take the spin_lock and _recheck_ to
make sure we haven't had an orphan block rsv added in the meantime. Then clear
the root's orphan block rsv and release the lock. With this patch a user said
the warnings went away and they usually showed up pretty soon after he started
ceph. Thanks,
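The "keep the fast-path check, then take the lock and recheck" pattern can be sketched in user space as below. Assumptions: a pthread mutex stands in for the kernel spinlock, the fields are C11 `_Atomic` so the lockless fast-path reads are well defined, and `struct root` and `maybe_clear_orphan_rsv` are hypothetical names. The path that adds orphans (and would set the reservation under the same lock) is not shown.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model; not btrfs code. */
struct root {
    pthread_mutex_t lock;
    _Atomic int orphan_items;      /* written under lock, read locklessly */
    void *_Atomic orphan_rsv;      /* written under lock, read locklessly */
};

/* Returns the reservation the caller should release, or NULL. */
void *maybe_clear_orphan_rsv(struct root *r)
{
    void *rsv = NULL;

    /* Cheap unlocked check; the answer may already be stale. */
    if (r->orphan_items != 0 || r->orphan_rsv == NULL)
        return NULL;

    pthread_mutex_lock(&r->lock);
    /* Recheck under the lock: an orphan may have been added meanwhile. */
    if (r->orphan_items == 0 && r->orphan_rsv != NULL) {
        rsv = r->orphan_rsv;
        r->orphan_rsv = NULL;
    }
    pthread_mutex_unlock(&r->lock);
    return rsv;
}
```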
Josef Bacik [Fri, 13 Jan 2012 00:10:12 +0000 (19:10 -0500)]
Btrfs: don't call btrfs_throttle in file write
Btrfs_throttle will make us wait if there is a currently committing transaction
until we can open new transactions, which is ridiculous since we don't actually
start any transactions within the file write path anyway, so all this does is
introduce big latencies if we have a sync/fsync heavy workload going on while
somebody else is trying to do work. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 45a8090e626ab470c91142954431a93846030b0d)
Josef Bacik [Fri, 13 Jan 2012 00:10:12 +0000 (19:10 -0500)]
Btrfs: release space on error in page_mkwrite
If updating the inode gave us an ENOSPC we were just returning in page_mkwrite,
which is a problem since we make our reservation right before trying to update
the inode, so fix the out label so that we actually free our reservation.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit ec39e180fd3188c983c94603634bfcd019f42ae7)
This is because of the wrong if condition, which is used to check if we should
subtract the bytes of the dropped range from i_blocks/i_bytes of i-node or not.
When we truncate a compressed extent, btrfs subtracts the bytes of the whole
extent, which is wrong. We should subtract the real size that we truncate,
whether or not the extent is compressed. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit f70a9a6b94af86fca069a7552ab672c31b457786)
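"The real size that we truncate" is the overlap between the extent's file range and the dropped range, independent of compression. A minimal sketch, assuming half-open byte ranges in file offsets; `dropped_bytes` is a hypothetical helper, not the btrfs function.

```c
#include <stdint.h>

/* Bytes of [extent_start, extent_end) that fall inside the dropped range
 * [drop_start, drop_end). Hypothetical helper; compression plays no role. */
uint64_t dropped_bytes(uint64_t extent_start, uint64_t extent_end,
                       uint64_t drop_start, uint64_t drop_end)
{
    uint64_t lo = extent_start > drop_start ? extent_start : drop_start;
    uint64_t hi = extent_end < drop_end ? extent_end : drop_end;

    return hi > lo ? hi - lo : 0;
}
```

For example, truncating a file to 32 KiB drops only 96 KiB of a 128 KiB extent starting at offset 0, not the full 128 KiB.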
Josef Bacik [Fri, 13 Jan 2012 00:10:12 +0000 (19:10 -0500)]
Btrfs: do not use btrfs_end_transaction_throttle everywhere
A user reported a problem where things like open with O_CREAT would take up to
30 seconds when he had nfs activity on the same mount. This is because all of
our quick metadata operations, like create, symlink etc all do
btrfs_end_transaction_throttle, which if the transaction is blocked will wait
for the commit to complete before it returns. This adds a ridiculous amount of
latency and isn't really needed. The normal btrfs_end_transaction will mark the
transaction as blocked and wake the transaction kthread up if it thinks the
transaction needs to end (this being in the running out of global reserve space
scenario), and this is all that is really needed since we've already done
everything we're going to do, we just need to return. This should help people
with the latency they were seeing when using synchronous heavy workloads.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 7ad85bb76a61801362701b77c5cee5aa09f35369)
Li Zefan [Wed, 7 Dec 2011 03:38:24 +0000 (11:38 +0800)]
Btrfs: fix possible deadlock when opening a seed device
The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:
Since the seed device is read-only, there's no usable space in the filesystem.
Afterwards we add a sprout device to it, and the kernel creates a METADATA
block group and a SYSTEM block group, which provide free space we can reserve,
but we still get a reservation failure because the global block_rsv hasn't
been updated accordingly.
Li Zefan [Thu, 29 Dec 2011 06:47:27 +0000 (14:47 +0800)]
Btrfs: rewrite btrfs_trim_block_group()
There are various bugs in block group trimming:
- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (possibly
none at all). Even for those trimmed, not all the free space
in the bitmaps will be trimmed.
I rewrote btrfs_trim_block_group() and split it into two functions.
One trims extents only, and the other trims bitmaps only.
Before patching:
# fstrim -v /mnt/
/mnt/: 1496465408 bytes were trimmed
After patching:
# fstrim -v /mnt/
/mnt/: 2193768448 bytes were trimmed
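The first three bugs in the list come down to clipping each free extent to the user's range and honoring minlen. A minimal sketch under assumptions: `clip_trim_range` is hypothetical, ranges are half-open byte offsets, and an extent that is rejected is assumed to be left in the free-space cache by the caller rather than removed, so no space leaks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clip the free extent [start, start + len) to the user's range
 * [range_start, range_end). Returns false if less than minlen remains,
 * in which case the caller should leave the extent untouched. */
bool clip_trim_range(uint64_t start, uint64_t len,
                     uint64_t range_start, uint64_t range_end,
                     uint64_t minlen,
                     uint64_t *out_start, uint64_t *out_len)
{
    uint64_t end = start + len;
    uint64_t s = start > range_start ? start : range_start;
    uint64_t e = end < range_end ? end : range_end;

    if (e <= s || e - s < minlen)
        return false;
    *out_start = s;          /* never below the user-specified offset */
    *out_len = e - s;        /* never beyond the user-specified range */
    return true;
}
```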
Li Zefan [Thu, 1 Dec 2011 06:06:42 +0000 (14:06 +0800)]
Btrfs: simplfy calculation of stripe length for discard operation
For btrfs raid, while discarding a range of space, we'll need to know
the start offset and length to discard for each device, and it's done
in btrfs_map_block().
However the calculation is a bit complex for raid0 and raid10, so I
reimplement it based on a fact that:
Li Zefan [Thu, 1 Dec 2011 04:55:47 +0000 (12:55 +0800)]
Btrfs: don't pre-allocate btrfs bio
We pre-allocate a btrfs bio with fixed size, and then may re-allocate
memory if we find stripes are bigger than the fixed size. But this
pre-allocation is not necessary.
Also, we don't have to calculate the stripe number twice.
Alexandre Oliva [Fri, 14 Oct 2011 15:10:36 +0000 (12:10 -0300)]
Btrfs: revamp clustered allocation logic
Parameterize clusters on minimum total size, minimum chunk size and
minimum contiguous size for at least one chunk, without limits on
cluster, window or gap sizes. Don't tolerate any fragmentation for
SSD_SPREAD; accept it for metadata, but try to keep data dense.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 1bb91902dc90e25449893e693ad45605cb08fbe5)
Alexandre Oliva [Mon, 28 Nov 2011 14:36:17 +0000 (12:36 -0200)]
Btrfs: don't set up allocation result twice
We store the allocation start and length twice in ins, once right
after the other, but with intervening calls that may prevent the
duplicate from being optimized out by the compiler. Remove one of the
assignments.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit fc7c1077ceb99c35e5f9d0ce03dc7740565bb2bf)
Alexandre Oliva [Mon, 12 Dec 2011 06:48:19 +0000 (04:48 -0200)]
Btrfs: test free space only for unclustered allocation
Since the clustered allocation may be taking extents from a different
block group, there's no point in spin-locking and testing the current
block group free space before attempting to allocate space from a
cluster, even more so when we might refrain from even trying the
cluster in the current block group because, after the cluster was set
up, not enough free space remained. Furthermore, cluster creation
attempts fail fast when the block group doesn't have enough free
space, so the test was completely superfluous.
I've moved the free space test past the cluster allocation attempt,
where it is more useful, and arranged for a cluster in the current
block group to be released before trying an unclustered allocation,
when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in
the cluster stands a chance of being combined with additional free
space in the block group so as to succeed in the allocation attempt.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit a5f6f719a5cd7caeee8ed8137cf3f94c3bbebc65)
Chris Mason [Fri, 6 Jan 2012 20:41:34 +0000 (15:41 -0500)]
Btrfs: lower the bar for chunk allocation
The chunk allocation code has tried to keep a pretty tight lid on creating new
metadata chunks. This is partially because in the past the reservation
code didn't give us an accurate idea of how much space was being used.
The new code is much more accurate, so we're able to get rid of some of these
checks.
Chris Mason [Fri, 6 Jan 2012 20:23:57 +0000 (15:23 -0500)]
Btrfs: run chunk allocations while we do delayed refs
Btrfs tries to batch extent allocation tree changes to improve performance
and reduce metadata thrashing. But it doesn't allocate new metadata chunks
while it is doing allocations for the extent allocation tree.
This commit changes the delayed reference code to do chunk allocations if we're
getting low on room. It prevents crashes and improves performance.
Al Viro [Fri, 23 Dec 2011 12:58:13 +0000 (07:58 -0500)]
Btrfs: call d_instantiate after all ops are setup
This closes races where btrfs is calling d_instantiate too soon during
inode creation. All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully setup in memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 08c422c27f855d27b0b3d9fa30ebd938d4ae6f1f)
Chris Mason [Fri, 23 Dec 2011 12:53:00 +0000 (07:53 -0500)]
Btrfs: fix worker lock misuse in find_worker
Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.
This fixes both errors.
Signed-off-by: Chris Mason <chris.mason@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
(cherry picked from commit 8d532b2afb2eacc84588db709ec280a3d1219be3)
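The corrected discipline, where the worker is chosen while the lock is held and the lock is released exactly once on every path, looks roughly like this. It is a hypothetical user-space model with a pthread mutex, not the btrfs async-thread code; `struct pool` and `find_worker` here are illustrative.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model; not the btrfs worker-pool code. */
struct worker { int busy; };

struct pool {
    pthread_mutex_t lock;
    struct worker workers[4];
    int nr;
};

/* Pick an idle worker. The choice is made under the lock, and the single
 * unlock at the end covers both the found and not-found paths. */
struct worker *find_worker(struct pool *p)
{
    struct worker *w = NULL;

    pthread_mutex_lock(&p->lock);
    for (int i = 0; i < p->nr; i++) {
        if (!p->workers[i].busy) {
            w = &p->workers[i];
            w->busy = 1;
            break;
        }
    }
    pthread_mutex_unlock(&p->lock);
    return w;
}
```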
Konrad Rzeszutek Wilk [Tue, 24 Jan 2012 21:55:29 +0000 (16:55 -0500)]
xen/config: turn CONFIG_XEN_DEBUG_FS off.
That option makes the Xen spinlock code (xen/spinlock.c) accumulate
statistics such as how many locks are taken, time spent in the slowpath, etc.
This is useful information during debugging, but not in production.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Maxim Uvarov [Mon, 23 Jan 2012 20:08:00 +0000 (12:08 -0800)]
proc: clean up and fix /proc/<pid>/mem handling
Orabug: 13618927
CVE-2012-0056
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.
This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open. That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.
That is different from our previous behavior, but much simpler. If
somebody actually finds a load where this matters, we'll need to revert
this commit.
I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve. So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.
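The new semantics (check permissions once at open, then pin the VM that existed at that moment) can be modelled with a reference-counted address space. This is a hypothetical user-space sketch; `struct mm`, `struct task`, `struct mem_file` and the functions are illustrative and do not reproduce fs/proc code. It shows why a descriptor held across execve() keeps reading the old VM.

```c
#include <stdlib.h>

/* Hypothetical refcounted address space and task; not kernel code. */
struct mm { int refs; int id; };
struct task { struct mm *mm; };
struct mem_file { struct mm *mm; };    /* VM pinned at open time */

static struct mm *mm_get(struct mm *mm) { mm->refs++; return mm; }
static void mm_put(struct mm *mm) { if (--mm->refs == 0) free(mm); }

struct mm *mm_new(int id)
{
    struct mm *mm = malloc(sizeof(*mm));
    if (mm) {
        mm->refs = 1;
        mm->id = id;
    }
    return mm;
}

/* Permission checks would happen here, once; then the VM is captured. */
void mem_open(struct mem_file *f, struct task *t) { f->mm = mm_get(t->mm); }
void mem_release(struct mem_file *f) { mm_put(f->mm); f->mm = NULL; }

/* execve() installs a new VM; open files keep the old one alive. */
void task_exec(struct task *t, struct mm *new_mm)
{
    struct mm *old = t->mm;
    t->mm = new_mm;
    mm_put(old);
}
```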
Reported-by: Jüri Aedla <asd@ut.ee> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
Maxim Uvarov [Sat, 21 Jan 2012 01:45:24 +0000 (17:45 -0800)]
add __init arguments to init functions
Fix following issues:
WARNING: vmlinux.o(.text+0x3aba): Section mismatch in reference from the function xen_align_and_add_e820_region() to the function .init.text:e820_add_region()
The function xen_align_and_add_e820_region() references
the function __init e820_add_region().
This is often because xen_align_and_add_e820_region lacks a __init
annotation or the annotation of e820_add_region is wrong.
WARNING: vmlinux.o(.text+0x2e9ec): Section mismatch in reference from the function acpi_map_cpu2node() to the variable .cpuinit.data:__apicid_to_node
The function acpi_map_cpu2node() references
the variable __cpuinitdata __apicid_to_node.
This is often because acpi_map_cpu2node lacks a __cpuinitdata
annotation or the annotation of __apicid_to_node is wrong.
WARNING: vmlinux.o(.text+0x2e9f1): Section mismatch in reference from the function acpi_map_cpu2node() to the function .cpuinit.text:numa_set_node()
The function acpi_map_cpu2node() references
the function __cpuinit numa_set_node().
This is often because acpi_map_cpu2node lacks a __cpuinit
annotation or the annotation of numa_set_node is wrong.
WARNING: vmlinux.o(.text+0x3f9b4): Section mismatch in reference from the function enable_iommus() to the function .init.text:iommu_set_device_table()
The function enable_iommus() references
the function __init iommu_set_device_table().
This is often because enable_iommus lacks a __init
annotation or the annotation of iommu_set_device_table is wrong.
Maxim Uvarov [Sun, 15 Jan 2012 20:08:20 +0000 (12:08 -0800)]
hpwdt: clean up set_memory_x call for 32 bit
1. The address has to be page aligned.
2. set_memory_x takes a number of pages as its argument, not a size in bytes.
Bug caused by the following commit:
commit da28179b4e90dda56912ee825c7eaa62fc103797
Author: Mingarelli, Thomas <Thomas.Mingarelli@hp.com>
Date: Mon Nov 7 10:59:00 2011 +0100
watchdog: hpwdt: Changes to handle NX secure bit in 32bit path
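Converting a byte range into the (page-aligned address, page count) pair that an addr/numpages interface such as set_memory_x() expects can be sketched as follows. Assumptions: 4 KiB pages, and `range_to_pages` is a hypothetical helper; the exact computation in the hpwdt fix is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED ((uintptr_t)4096)   /* assumption: 4 KiB pages */

/* Round the start down and the end up to page boundaries, then count the
 * pages covered by [addr, addr + size). Hypothetical helper. */
void range_to_pages(uintptr_t addr, size_t size,
                    uintptr_t *page_addr, int *numpages)
{
    const uintptr_t mask = ~(PAGE_SIZE_ASSUMED - 1);
    uintptr_t start = addr & mask;
    uintptr_t end = (addr + size + PAGE_SIZE_ASSUMED - 1) & mask;

    *page_addr = start;
    *numpages = (int)((end - start) / PAGE_SIZE_ASSUMED);
}
```

Note that a small range straddling a page boundary needs two pages, which passing the byte size directly would get wrong.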