David S. Miller [Wed, 15 Apr 2015 02:29:27 +0000 (22:29 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-04-14
This series contains updates to fm10k only.
Fixed transmit statistics which was actually using values from the
receive ring, instead of the transmit ring. Fixed up spelling mistakes
in code comments and resolved unused argument warnings. Added support
for netconsole. Fixed up statistic reporting so that we are only
reporting from actual queues as well as display PF only stats for
just the PF and not the VF. Also fixed an issue that when returning
virtualization queues from the VF back to the PF, we were retaining
the VF rate limiter.
Fixed up the driver to use a separate workqueue, which helps reduce
and stabilize latency between scheduling the work in our interrupt and
actually performing the work.
Fixed a bug where the VF tried to set a multicast address before
requesting the required xcast mode.
Fix VF multicast update since VFs were being improperly added to the
switch's mutlicast group. The error stems from the fact that incorrect
arguments were passed to the update_mc_addr().
Thanks to Alex Duyck for the extensive review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:15 +0000 (13:27 -0700)]
fm10k: corrected VF multicast update
VFs were being improperly added to the switch's multicast group. The
error stems from the fact that incorrect arguments were passed to the
"update_mc_addr" function. It would seem to be a copy paste error since
the parameters are similar to the "update_uc_addr" function.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:14 +0000 (13:27 -0700)]
fm10k: mbx_update_max_size does not drop all oversized messages
When we call update_max_size it does not drop all oversized messages.
This is due to the difficulty in performing this operation, since it is
a FIFO which makes updating anything other than head or tail very
difficult. To fix this, modify validate_msg_size to ensure that we error
out later when trying to transmit the message that could be oversized.
This will generally be a rare condition, as it requires the FIFO to
include a message larger than the max_size negotiated during mailbox
connect. Note that max_size is always smaller than rx.size so it should
be safe to use here.
Also, update the update_max_size function header comment to clearly
indicate that it does not drop all oversized messages, but only those at
the head of the FIFO.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:13 +0000 (13:27 -0700)]
fm10k: reset head instead of calling update_max_size
When we forcefully shutdown the mailbox, we then go about resetting max
size to 0, and clearing all messages in the FIFO. Instead, we should
just reset the head pointer so that the FIFO becomes empty, rather than
changing the max size to 0. This helps prevent increment in tx_dropped
counter during mailbox negotiation, which is confusing to viewers of
Linux ethtool statistics output.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:12 +0000 (13:27 -0700)]
fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
The use of dropped doesn't really mean dropped mailbox messages, but
rather specifically messages which were too large to fit in the remote
Rx FIFO. Rename the stat to more clearly indicate what it means.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
====================
Netfilter updates for net-next
A final pull request, I know it's very late but this time I think it's worth a
bit of rush.
The following patchset contains Netfilter/nf_tables updates for net-next, more
specifically concatenation support and dynamic stateful expression
instantiation.
This also comes with a couple of small patches. One to fix the ebtables.h
userspace header and another to get rid of an obsolete example file in tree
that describes a nf_tables expression.
This time, I decided to paste the original descriptions. This will result in a
rather large commit description, but I think these bytes to keep.
Patrick McHardy says:
====================
netfilter: nf_tables: concatenation support
The following patches add support for concatenations, which allow multi
dimensional exact matches in O(1).
The basic idea is to split the data registers, currently consisting of
4 registers of 16 bytes each, into smaller units, 16 registers of 4
bytes each, and making sure each register store always leaves the
full 32 bit in a well defined state, meaning smaller stores will
zero the remaining bits.
Based on that, we can load multiple adjacent registers with different
values, thereby building a concatenated bigger value, and use that
value for set lookups.
Sets are changed to use variable sized extensions for their key and
data values, removing the fixed limit of 16 bytes while saving memory
if less space is needed.
As a side effect, these patches will allow some nice optimizations in
the future, like using jhash2 in nft_hash, removing the masking in
nft_cmp_fast, optimized data comparison using 32 bit word size etc.
These are not done so far however.
The patches are split up as follows:
* the first five patches add length validation to register loads and
stores to make sure we stay within bounds and prepare the validation
functions for the new addressing mode
* the next patches prepare for changing to 32 bit addressing by
introducing a struct nft_regs, which holds the verdict register as
well as the data registers. The verdict members are moved to a new
struct nft_verdict to allow to pull struct nft_data out of the stack.
* the next patches contain preparatory conversions of expressions and
sets to use 32 bit addressing
* the next patch introduces so far unused register conversion helpers
for parsing and dumping register numbers over netlink
* following is the real conversion to 32 bit addressing, consisting of
replacing struct nft_data in struct nft_regs by an array of u32s and
actually translating and validating the new register numbers.
* the final two patches add support for variable sized data items and
variable sized keys / data in set elements
The patches have been verified to work correctly with nft binaries using
both old and new addressing.
====================
The following patches are the grand finale of my nf_tables set work,
using all the building blocks put in place by the previous patches
to support something like iptables hashlimit, but a lot more powerful.
Sets are extended to allow attaching expressions to set elements.
The dynset expression dynamically instantiates these expressions
based on a template when creating new set elements and evaluates
them for all new or updated set members.
In combination with concatenations this effectively creates state
tables for arbitrary combinations of keys, using the existing
expression types to maintain that state. Regular set GC takes care
of purging expired states.
We currently support two different stateful expressions, counter
and limit. Using limit as a template we can express the functionality
of hashlimit, but completely unrestricted in the combination of keys.
Using counter we can perform accounting for arbitrary flows.
The following examples from patch 5/5 show some possibilities.
Userspace syntax is still WIP, especially the listing of state
tables will most likely be seperated from normal set listings
and use a more structured format:
1. Limit the rate of new SSH connections per host, similar to iptables
hashlimit:
flow ip saddr timeout 60s \
limit 10/second \
accept
2. Account network traffic between each set of /24 networks:
flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
counter
3. Account traffic to each host per user:
flow skuid . ip daddr \
counter
4. Account traffic for each combination of source address and TCP flags:
flow ip saddr . tcp flags \
counter
The resulting set content after a Xmas-scan look like this:
In the future the "expressions attached to elements" will be extended
to also support user created non-stateful expressions to allow to
efficiently select beween a set of parameter sets, f.i. a set of log
statements with different prefixes based on the interface, which currently
require one rule each. This will most likely have to wait until the next
kernel version though.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:11 +0000 (13:27 -0700)]
fm10k: update xcast mode before synchronizing multicast addresses
When the PF receives a request to update a multicast address for the VF,
it checks the enabled multicast mode first. Fix a bug where the VF tried
to set a multicast address before requesting the required xcast mode.
This ensures the multicast addresses are honored as long as the xcast
mode was allowed.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:09 +0000 (13:27 -0700)]
fm10k: start service timer on probe
Since the service task handles varying work that doesn't all require the
interface to be up, launch the service timer immediately. This ensures
that we continually check the mailbox, as well as handle other tasks
while the device is down.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:08 +0000 (13:27 -0700)]
fm10k: fix function header comment
The header comment included a miscopy of a C-code line, and also
mis-used Rx FIFO when it clearly meant Tx FIFO
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:07 +0000 (13:27 -0700)]
fm10k: comment next_vf_mbx flow
Add a header comment explaining why we have the somewhat crazy mailbox
flow. This flow is necessary as it prevents the PF<->SM mailbox from
being flooded by the VF messages, which normally trigger a message to
the PF. This helps prevent the case where we see a PF mailbox timeout.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Sat, 11 Apr 2015 00:20:17 +0000 (17:20 -0700)]
fm10k: don't handle mailbox events in iov_event path and always process mailbox
Since we already schedule the service task, we can just wait for this
task to handle the mailbox events from the VF. This reduces some complex
code flow, and makes it so we have a single path for handling the VF
messages. There is a possibility that we have a slight delay in handling
VF messages, but it should be minimal.
The result of tx_complete and !rx_ready is insufficient to determine
whether we need to process the mailbox. There is a possible race
condition whereby the VF fills up the mbmem for us, but we have already
recently processed the mailboxes in the interrupt. During this time,
the interrupt is disabled. Thus, our Rx FIFO is empty, but the mbmem now
has data in it. Since we continually check whether Rx FIFO is empty, we
then never call process. This results in the possibility to prevent PF
from handling the VF mailbox messages.
Instead, just call process every time, despite the fact that we may or
may not have anything to process for the VF. There should be minimal
overhead for doing this, and it resolves an issue where the VF never
comes up due to never getting response for its SET_LPORT_STATE message.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:05 +0000 (13:27 -0700)]
fm10k: use separate workqueue for fm10k driver
Since we run the watchdog periodically, which might take a while and
potentially monopolize the system default workqueue, create our own
separate work queue. This also helps reduce and stabilize latency
between scheduling the work in our interrupt and actually performing
the work. Still use a timer for the regular scheduled interval but
queue the work onto its own work queue.
It seemed overkill to create a single workqueue per interface, so we
just spawn a single work queue for all interfaces upon driver load. For
this reason, use a multi-threaded workqueue with one thread per
processor, rather than single threaded queue.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:04 +0000 (13:27 -0700)]
fm10k: Set PF queues to unlimited bandwidth during virtualization
When returning virtualization queues from the VF back to the PF, do not
retain the VF rate limiter.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Todd Russell <todd.a.russell@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:03 +0000 (13:27 -0700)]
fm10k: expose tx_timeout_count as an ethtool stat
Named it tx_hang_count to differentiate it from tx_hwtstamp_timeout.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:02 +0000 (13:27 -0700)]
fm10k: only increment tx_timeout_count in Tx hang path
We were incrementing the tx_timeout_count for both the Tx hang
and then for all reset flows. Instead, we should only increment
tx_timeout_count in the Tx hang path, so that our Tx hang counter
does not increment when it was not caused by a Tx hang.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since we already print this message when a reset is requested via the
RESET_REQUESTED flag, we do not need to print it before setting the
flag.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:27:00 +0000 (13:27 -0700)]
fm10k: separate PF only stats so that VF does not display them
This patch resolves an issue with ethtool stats displaying useless
values on the VF, because some stats simply have no meaning to the VF.
Resolve this by splitting these out into PF_STATS and only showing them
if we aren't the VF.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:26:59 +0000 (13:26 -0700)]
fm10k: use hw->mac.max_queues for stats
Even though it shouldn't strictly matter, don't count queue stats higher
than the max_queues value stored for this mac. This ensures that we
don't attempt to check queues which don't belong to use in VFs. This
shouldn't be a visible change, as the VFs should see zero for queues
which don't belong to them.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:26:58 +0000 (13:26 -0700)]
fm10k: only show actual queues, not the maximum in hardware
Currently, we show statistics for all 128 queues, even though we don't
necessarily have that many queues available especially in the VF case.
Instead, use the hw->mac.max_queues value, which tells us how many
queues we actually have, rather than the space for the rings we
allocated. In this way, we prevent dumping statistics that are useless
on the VF.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 10 Apr 2015 23:48:19 +0000 (16:48 -0700)]
fm10k: allow creation of VLAN on default vid
Previously, the user was not allowed to create a VLAN interface on top
of the switch default vid.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:26:56 +0000 (13:26 -0700)]
fm10k: fix unused warnings
The were several functions which had parameters which were never or
sometimes used in functions. To resolve possible compiler warnings,
use __always_unused or __maybe_unused kernel macros to resolve.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Sat, 11 Apr 2015 02:14:31 +0000 (19:14 -0700)]
fm10k: Add netconsole support
This change adds a function called "fm10k_netpoll" that's used to define
"ndo_poll_controller" in "fm10k_netdev_ops". This is required to enable
support for "netconsole" in fm10k.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Fri, 3 Apr 2015 20:26:51 +0000 (13:26 -0700)]
fm10k: Corrected an error in Tx statistics
The function collecting Tx statistics was actually using values from the RX
ring. Thus, Tx and Rx statistics values reported by "ifconfig" will
return identical values. This change corrects this error and the Tx
statistics is now reading from the Tx ring.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
David S. Miller [Tue, 14 Apr 2015 19:08:52 +0000 (15:08 -0400)]
Merge branch 'cxgb4-next'
Hariprasad Shenai says:
====================
cxgb4: Misc. fixes for sge
Increases value of MAX_IMM_TX_PKT_LEN to improve latency, fill freelist
starving threshold based on adapter type, add comments for tx flits and sge
length code and don't call t4_slow_intr_handler when we are not master PF.
This patch series has been created against net-next tree and includes patches on
cxgb4 driver
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Tue, 14 Apr 2015 10:08:01 +0000 (12:08 +0200)]
bgmac: fix DMA rx corruption
The driver needs to inform the hardware about the first invalid (not yet
filled) rx slot, by writing its DMA descriptor pointer offset to the
BGMAC_DMA_RX_INDEX register.
This register was set to a value exceeding the rx ring size, effectively
allowing the hardware constant access to the full ring, regardless of
which slots are initialized.
To fix this issue, always mark the last filled rx slot as invalid.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Tue, 14 Apr 2015 10:08:00 +0000 (12:08 +0200)]
bgmac: simplify dma init/cleanup
Instead of allocating buffers at device init time and initializing
descriptors at device open, do both at the same time (during open).
Free all buffers when closing the device.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Tue, 14 Apr 2015 10:07:59 +0000 (12:07 +0200)]
bgmac: increase rx ring size from 511 to 512
Limiting it to 511 looks like a failed attempt at leaving one descriptor
empty to allow the hardware to stop processing a buffer that has not
been prepared yet. However, this doesn't work because this affects the
total ring size as well
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Tue, 14 Apr 2015 10:07:58 +0000 (12:07 +0200)]
bgmac: add check for oversized packets
In very rare cases, the MAC can catch an internal buffer that is bigger
than it's supposed to be. Instead of crashing the kernel, simply pass
the buffer back to the hardware
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Tue, 14 Apr 2015 10:07:55 +0000 (12:07 +0200)]
bgmac: leave interrupts disabled as long as there is work to do
Always poll rx and tx during NAPI poll instead of relying on the status
of the first interrupt. This prevents bgmac_poll from leaving unfinished
work around until the next IRQ.
In my tests this makes bridging/routing throughput under heavy load more
stable and ensures that no new IRQs arrive as long as bgmac_poll uses up
the entire budget.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Tue, 14 Apr 2015 10:07:54 +0000 (12:07 +0200)]
bgmac: simplify tx ring index handling
Keep incrementing ring->start and ring->end instead of pointing it to
the actual ring slot entry. This simplifies the calculation of the
number of free slots.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Axtens [Tue, 14 Apr 2015 05:28:44 +0000 (15:28 +1000)]
toshiba: Remove celleb from Kconfig options
The toshiba drivers had celleb as an optional dependency.
celleb has been dropped [1], so clean that out of Kconfig.
[1] http://patchwork.ozlabs.org/patch/451730/
CC: netdev@vger.kernel.org CC: Valentin Rothberg <valentinrothberg@gmail.com> CC: mpe@ellerman.id.au CC: linuxppc-dev@lists.ozlabs.org Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: Implement partial copy into send buffer
If remaining space in a send buffer slot is too small for the whole message,
we only copy the RNDIS header and PPI data into send buffer, so we can batch
one more packet each time. It reduces the vmbus per-message overhead.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 22:18:05 +0000 (18:18 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Al Viro says:
====================
netdev-related stuff in vfs.git
There are several commits sitting in vfs.git that probably ought to go in
via net-next.git. First of all, there's merge with vfs.git#iocb - that's
Christoph's aio rework, which has triggered conflicts with the ->sendmsg()
and ->recvmsg() patches a while ago. It's not so much Christoph's stuff
that ought to be in net-next, as (pretty simple) conflict resolution on merge.
The next chunk is switch to {compat_,}import_iovec/import_single_range - new
safer primitives for initializing iov_iter. The primitives themselves come
from vfs/git#iov_iter (and they are used quite a lot in vfs part of queue),
conversion of net/socket.c syscalls belongs in net-next, IMO. Next there's
afs and rxrpc stuff from dhowells. And then there's sanitizing kernel_sendmsg
et.al. + missing inlined helper for "how much data is left in msg->msg_iter" -
this stuff is used in e.g. cifs stuff, but it belongs in net-next.
That pile is pullable from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-davem
I'll post the individual patches in there in followups; could you take a look
and tell if everything in there is OK with you?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Weinberger [Sun, 12 Apr 2015 22:52:39 +0000 (00:52 +0200)]
netfilter: Fix format string of nfnetlink_log proc file
The printed values are all of type unsigned integer, therefore use
%u instead of %d. Otherwise an user can face negative values.
Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Weinberger [Sun, 12 Apr 2015 22:52:37 +0000 (00:52 +0200)]
netfilter: Fix portid types
The netlink portid is an unsigned integer, use this type
also in netfilter.
Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Support instantiating stateful expressions based on a template that
are associated with dynamically created set entries. The expressions
are evaluated when adding or updating the set element.
This allows to maintain per flow state using the existing set
infrastructure and expression types, with arbitrary definitions of
a flow.
Usage is currently restricted to anonymous sets, meaning only a single
binding can exist, since the desired semantics of multiple independant
bindings haven't been defined so far.
Examples (userspace syntax is still WIP):
1. Limit the rate of new SSH connections per host, similar to iptables
hashlimit:
flow ip saddr timeout 60s \
limit 10/second \
accept
2. Account network traffic between each set of /24 networks:
flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
counter
3. Account traffic to each host per user:
flow skuid . ip daddr \
counter
4. Account traffic for each combination of source address and TCP flags:
flow ip saddr . tcp flags \
counter
The resulting set content after a Xmas-scan look like this:
Patrick McHardy [Sat, 11 Apr 2015 09:46:41 +0000 (10:46 +0100)]
netfilter: nf_tables: add flag to indicate set contains expressions
Add a set flag to indicate that the set is used as a state table and
contains expressions for evaluation. This operation is mutually
exclusive with the mapping operation, so sets specifying both are
rejected. The lookup expression also rejects binding to state tables
since it only deals with loopup and map operations.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 11 Apr 2015 09:46:40 +0000 (10:46 +0100)]
netfilter: nf_tables: mark stateful expressions
Add a flag to mark stateful expressions.
This is used for dynamic expression instanstiation to limit the usable
expressions. Strictly speaking only the dynset expression can not be
used in order to avoid recursion, but since dynamically instantiating
non-stateful expressions will simply create an identical copy, which
behaves no differently than the original, this limits to expressions
where it actually makes sense to dynamically instantiate them.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 11 Apr 2015 09:46:39 +0000 (10:46 +0100)]
netfilter: nf_tables: prepare for expressions associated to set elements
Preparation to attach expressions to set elements: add a set extension
type to hold an expression and dump the expression information with the
set element.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tcp: fix bogus RTT for CC when retransmissions are acked
Since retransmitted segments are not used for RTT estimation, previously
SACKed segments present in the rtx queue are used. This estimation can be
several times larger than the actual RTT. When a cumulative ack covers both
previously SACKed and retransmitted segments, CC may thus get a bogus RTT.
Such segments previously had an RTT estimation in tcp_sacktag_one(), so it
seems reasonable to not reuse them in tcp_clean_rtx_queue() at all.
Afaik, this has had no effect on SRTT/RTO because of Karn's check.
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 10 Apr 2015 21:07:54 +0000 (23:07 +0200)]
net: use jump label patching for ingress qdisc in __netif_receive_skb_core
Even if we make use of classifier and actions from the egress
path, we're going into handle_ing() executing additional code
on a per-packet cost for ingress qdisc, just to realize that
nothing is attached on ingress.
Instead, this can just be blinded out as a no-op entirely with
the use of a static key. On input fast-path, we already make
use of static keys in various places, e.g. skb time stamping,
in RPS, etc. It makes sense to not waste time when we're assured
that no ingress qdisc is attached anywhere.
Enabling/disabling of that code path is being done via two
helpers, namely net_{inc,dec}_ingress_queue(), that are being
invoked under RTNL mutex when a ingress qdisc is being either
initialized or destructed.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 17:15:14 +0000 (13:15 -0400)]
Merge branch 'netdev_diet'
Thomas Graf says:
====================
Bring sizeof(net_device) down to < 2K bytes
The size of struct net_device crossed the 2K boundary a while ago which
is a waste in combination with many net namespaces. This series brings
the size of struct net_device down to well below 2K in total size with
a typical configuration. Some reserves a several holes leave room for
further expansion.
Patrick McHardy [Sat, 11 Apr 2015 01:27:39 +0000 (02:27 +0100)]
netfilter: nf_tables: variable sized set element keys / data
This patch changes sets to support variable sized set element keys / data
up to 64 bytes each by using variable sized set extensions. This allows
to use concatenations with bigger data items suchs as IPv6 addresses.
As a side effect, small keys/data now don't require the full 16 bytes
of struct nft_data anymore but just the space they need.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 11 Apr 2015 01:27:38 +0000 (02:27 +0100)]
netfilter: nf_tables: support variable sized data in nft_data_init()
Add a size argument to nft_data_init() and pass in the available space.
This will be used by the following patches to support variable sized
set element data.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 11 Apr 2015 01:27:37 +0000 (02:27 +0100)]
netfilter: nf_tables: switch registers to 32 bit addressing
Switch the nf_tables registers from 128 bit addressing to 32 bit
addressing to support so called concatenations, where multiple values
can be concatenated over multiple registers for O(1) exact matches of
multiple dimensions using sets.
The old register values are mapped to areas of 128 bits for compatibility.
When dumping register numbers, values are expressed using the old values
if they refer to the beginning of a 128 bit area for compatibility.
To support concatenations, register loads of less than a full 32 bit
value need to be padded. This mainly affects the payload and exthdr
expressions, which both unconditionally zero the last word before
copying the data.
Userspace fully passes the testsuite using both old and new register
addressing.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add helper functions to parse and dump register values in netlink attributes.
These helpers will later be changed to take care of translation between the
old 128 bit and the new 32 bit register numbers.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 11 Apr 2015 01:27:31 +0000 (02:27 +0100)]
netfilter: nf_tables: get rid of NFT_REG_VERDICT usage
Replace the array of registers passed to expressions by a struct nft_regs,
containing the verdict as a seperate member, which aliases to the
NFT_REG_VERDICT register.
This is needed to seperate the verdict from the data registers completely,
so their size can be changed.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change nft_validate_input_register() to not only validate the input
register number, but also the length of the load, and rename it to
nft_validate_register_load() to reflect that change.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
All users of nft_validate_register_store() first invoke
nft_validate_output_register(). There is in fact no use for using it
on its own, so simplify the code by folding the functionality into
nft_validate_register_store() and kill it.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The existing name is ambiguous, data is loaded as well when we read from
a register. Rename to nft_validate_register_store() for clarity and
consistency with the upcoming patch to introduce its counterpart,
nft_validate_register_load().
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 11 Apr 2015 01:27:26 +0000 (02:27 +0100)]
netfilter: nf_tables: validate len in nft_validate_data_load()
For values spanning multiple registers, we need to validate that enough
space is available from the destination register onwards. Add a len
argument to nft_validate_data_load() and consolidate the existing length
validations in preparation of that.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Mon, 13 Apr 2015 01:36:57 +0000 (21:36 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-04-11
This series contains updates to iflink, ixgbe and ixgbevf.
The entire set of changes come from Vlad Zolotarov to ultimately add
the ethtool ops to VF driver to allow querying the RSS indirection table
and RSS random key.
Currently we support only 82599 and x540 devices. On those devices, VFs
share the RSS redirection table and hash key with a PF. Letting the VF
query this information may introduce some security risks, therefore this
feature will be disabled by default.
The new netdev op allows a system administrator to change the default
behaviour with "ip link set" command. The relevant iproute2 patch has
already been sent and awaits for this series upstream.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 01:25:14 +0000 (21:25 -0400)]
Merge branch 'fou-next'
Cong Wang says:
====================
fou: some fixes and updates
Patch 1~3 fix some minor bugs in net/ipv4/fou.c, the only
thing I am not sure is if it's too late to change the
byte order of FOU_ATTR_PORT, if so we have to fix iproute2
instead of kernel.
Patch 4~5 add some new features to make it complete.
v2: make fou->port be16 too
====================
Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 10 Apr 2015 14:24:28 +0000 (16:24 +0200)]
selinux/nlmsg: add XFRM_MSG_MAPPING
This command is missing.
Fixes: 3a2dfbe8acb1 ("xfrm: Notify changes in UDP encapsulation via netlink") CC: Martin Willi <martin@strongswan.org> Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 10 Apr 2015 14:24:27 +0000 (16:24 +0200)]
selinux/nlmsg: add XFRM_MSG_MIGRATE
This command is missing.
Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE") Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 10 Apr 2015 14:24:26 +0000 (16:24 +0200)]
selinux/nlmsg: add XFRM_MSG_REPORT
This command is missing.
Fixes: 97a64b4577ae ("[XFRM]: Introduce XFRM_MSG_REPORT.") Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Apr 2015 13:07:18 +0000 (06:07 -0700)]
tcp: do not cache align timewait sockets
With recent adoption of skc_cookie in struct sock_common,
struct tcp_timewait_sock size increased from 192 to 200 bytes
on 64bit arches. SLAB rounds then to 256 bytes.
It is time to drop SLAB_HWCACHE_ALIGN constraint for twsk_slab.
This saves about 12 MB of memory on typical configuration reaching
262144 timewait sockets, and has no noticeable impact on performance.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 00:43:46 +0000 (20:43 -0400)]
Merge tag 'mac80211-next-for-davem-2015-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
There isn't much left, but we have
* new mac80211 internal software queue to allow drivers to have
shorter hardware queues and pull on-demand
* use rhashtable for mac80211 station table
* minstrel rate control debug improvements and some refactoring
* fix noisy message about TX power reduction
* fix continuous message printing and activity if CRDA doesn't respond
* fix VHT-related capabilities with "iw connect" or "iwconfig ..."
* fix Kconfig for cfg80211 wireless extensions compatibility
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wolfgang Steinwender [Fri, 10 Apr 2015 09:42:56 +0000 (11:42 +0200)]
net/macb: sqe_test_errors are TX errors, not RX errors
The statistics are grouped by TX and RX errors.
The SQE Test Errors Register indicates problems with TX.
Signed-off-by: Wolfgang Steinwender <wsteinwender@pcs.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>