David Woodhouse [Tue, 29 Jun 2021 12:20:34 +0000 (13:20 +0100)]
vhost: Avoid TX queue when writing directly is faster
Using vhost makes high-volume transfers go nice and fast, especially
we are using 100% of a CPU in the single-threaded OpenConnect process
and just offloading the kernel←→user copies for the tun packets to
the vhost thread instead of having to do them from our single thread
too.
However, for a lightly used link with *occasional* packets, which is
fairly much the definition of a VPN being used for VoIP, it adds a lot
of unwanted latency. If our userspace thread is otherwise going to be
*idle*, and fall back into select() to wait for something else to do,
then we might as well just write the packet *directly* to the tun
device.
So... when the queue is stopped and would need to be kicked, and if
there are only a *few* (heuristic: half max_qlen) packets on the
queue to be sent, just send them directly.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Wed, 16 Jun 2021 23:05:14 +0000 (00:05 +0100)]
Initial vhost-net support
We spend a lot of CPU time copying packets between kernel and userspace.
Eventually we want to implement a completely in-kernel data path. It
isn't even that hard, now that most of the functionality we need from
the kernel is there and it's mostly just a case of fitting it together.
In the meantime, though, there are a few things we can do even on today's
released kernels. For a start, we can use vhost-net to avoid having to
do the read()/write() on the tun device in our mainloop.
Ultimately, it ends up being done by a kernel thread instead; it doesn't
really go away. But that should at least give us a performance win which
would compare with a decent threading model, while allowing OpenConnect
to remain naïvely single-threaded and lock-free.
We have to carefully pick a configuration for vhost-net which actually
works, since it's fairly hosed for IFF_TUN support:
https://lore.kernel.org/netdev/2433592d2b26deec33336dd3e83acfd273b0cf30.camel@infradead.org/T/
But by limiting the sndbuf (which disables XDP, sadly) and by requesting
a virtio header that we don't actually want, we *can* make it work even
with today's production kernels.
Thanks to Eugenio Pérez Martín >eperezma@redhat.com> for his blog at
https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
and for lots more help and guidance as I floundered around trying to make
this work.
Although this gives a 10% improvement on the bandwidth we can manage in
my testing (up to 2.75Gb/s with other tricks, on a c5.8xlarge Skylake VM)
it also introduces a small amount of extra latency, so disable it by
default unless the user has bumped the queue length to 16 or more, which
presumably means they choose bandwidth over latency.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Thu, 1 Jul 2021 16:03:13 +0000 (17:03 +0100)]
Clear epoll_fd after forking to background self
Otherwise we remove the events from the epoll_fd before we exit in
the parent process.
This would be a bit awful if it were something we require all users of
libopenconnect to know about, but it isn't. We make everything O_CLOEXEC
and we don't expect users to be calling openconnect_vpninfo_free() from
another process after forking, like the background code does. We only
do it there so that we can check for memory leaks (I think).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Thu, 1 Jul 2021 15:30:27 +0000 (16:30 +0100)]
Fix epoll support for connection pause/restart
We need to actually remove the file descriptors from the epoll set.
Otherwise we get -EEXIST when adding them again (in the case of the
cmd_fd as we re-enter the main loop).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Dimitri Papadopoulos [Tue, 29 Jun 2021 09:50:16 +0000 (11:50 +0200)]
Reorganize #include
- Reorder header files as suggested here:
https://stackoverflow.com/questions/2762568/c-c-include-header-file-order
https://softwareengineering.stackexchange.com/questions/325549/c-header-file-order
- Remove duplicates
- Remove unused headers files
- Change "config.h" to <config.h>
- Include <winsock2.h> before openconnect.h, which is not entirely self-contained.
Daniel Lenski [Fri, 25 Jun 2021 16:42:27 +0000 (09:42 -0700)]
With --user, enter username in all forms, not just the first
Until now, the -u/--user=USERNAME option has caused the OpenConnect CLI to
automatically fill the username only in the *first* form where there is a
match field. This patch causes it to fill the username repeatedly
(including when a form is repeated due to an authentication failure).
As described by @DimitriPapadopoulos in #267:
> In many cases, I expect the authentication failure to be caused by an
> incorrect password, not an incorrect username
>
> Having to [re]enter the username, when it has already been specified from
> the command line or worse in a config file, is an annoyance. I suggest
> openconnect [re]prompts for the username only when it has been entered
> interactively from the start.
Daniel Lenski [Thu, 24 Jun 2021 06:19:12 +0000 (23:19 -0700)]
Update documentation for the --authenticate option
Mention the CONNECT_URL and RESOLVE options, and how to use them to invoke
the connection phase in the maximally-robust way (which should work for all
protocols, and all possible proxies).
David Woodhouse [Mon, 28 Jun 2021 13:44:51 +0000 (14:44 +0100)]
Reuse packets
I see malloc/free showing up at ~5% of perf traces, and it's entirely
pointless when we could be reusing packets.
This trick isn't *perfect* and there's potential for a pathological
case where all the packets on the free_queue are too small to be
reused but we never get rid of them anyway — but rounding up to 2KiB
should mean that never happens in practice, and the alignment we get
from that rounding probably doesn't hurt performance anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Thu, 24 Jun 2021 15:54:00 +0000 (16:54 +0100)]
Stop polling cmd_fd while busy
We have an explicit select() call on the cmd_fd even when we're busy
shovelling packets and never hit the main select() in the mainloop.
This is *just* to ensure that we react to a cancel command quickly.
In the *common* case that we're running in openconnect(8), there's no
need for that since the *only* thing that will write to the cmd_fd is
openconnect itself, and *that* can set a flag in memory to tell us to
look.
So implement that optimisation — don't check it each time around the
mainloop unless the vpninfo->need_poll_cmd_fd flag is set. That flag
is set whenever we have written to cmd_fd and there's something to be
read. And cleared by poll_cmd_fd() when it runs and finds nothing there.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Daniel Lenski [Fri, 28 May 2021 19:42:57 +0000 (12:42 -0700)]
Assume that a 'portal-*cookie' will allow us to bypass gateway SAML
For many GlobalProtect VPNs with SAML, the 'portal-userauthcookie' appears
to be *the* mechanism by which gateway authentication can be bypassed once
portal authentication is complete.
Unfortunately, there are exceptions which will require a more complex
resolution involved a re-entrant SAML flow
(https://gitlab.com/openconnect/openconnect/-/issues/147#note_587163143),
but this patch will at least not make them worse.
This can work in many cases…
- When the user's password is only usable one time (already working as of 008aefd7),
- When the portal requires SAML but the gateway doesn't (already working in 008aefd7),
- When the gateway requires SAML even though the portal doesn't (fixed here)
Additionally, this patch adds tests (tests/{fake-gp-server.py,gp-auth-and-config}) of
OpenConnect's ability to complete the following SAML flows:
- (SAML to portal after acquiring prelogin-cookie externally) → (complete gateway login
using portal-userauthcookie)
- (SAML to gateway after acquiring prelogin-cookie externally)
Daniel Lenski [Thu, 24 Jun 2021 06:16:16 +0000 (23:16 -0700)]
More complete comment about issues with proxies in connection phase
The comment for openconnect_get_connect_url (added in
https://gitlab.com/openconnect/openconnect/-/commit/ec6c0caed28ebf4f60984a49ce3122196f9c87fa)
should mention the possibility that a proxy requires the correct hostname at
the TLS layer (via Server Name Indication, SNI) as well at the HTTP layer
(via 'Host' header), in order to correctly forward it to the VPN server.
See https://gitlab.gnome.org/GNOME/NetworkManager-openconnect/-/issues/46
for a case where the 'Host' header was apparently required.
Daniel Lenski [Thu, 17 Jun 2021 20:23:18 +0000 (13:23 -0700)]
Fortinet requires us to check for an HTTP error response only over TLS
If the Fortinet PPP connection request *succeeds* over TLS, there is no HTTP
response before we start exchanging PPP packets. If it *fails*, there is an
HTTP response.
If the Fortinet PPP connection request is over DTLS, a 'svrhello' response
is expected regardless of whether it succeeded or failed. This is handled
by fortinet_dtls_catch_svrhello()
Let's only check for that HTTP response in Fortinet if we're definitely
connecting over TLS. The "proceeding to tunnel stage" test in
'fortinet-auth-config-tests' verifies the correctness of the HTTP response
parsing behavior.
Daniel Lenski [Thu, 17 Jun 2021 20:14:32 +0000 (13:14 -0700)]
PPP: Replace no_terminate_on_pause flag with terminate_on_pause flag
We know of two real-world PPP-based VPNs that require us *not* to TERMINATE
at the PPP layer if we will want to subsequently resume a connection. We
don't know of any real-world PPP-based VPNs that *do* want us to TERMINATE
at the PPP layer in this case; in fact, any server that *requires* this
would be unable to resume inadvertently dropped connections.
Replace the no_terminate_on_pause flag with a terminate_on_pause flag, in
order to reduce boilerplate and use the more plausible behavior as the
default.
Daniel Lenski [Thu, 17 Jun 2021 17:25:15 +0000 (10:25 -0700)]
Follow disable_ipv6 for Pulse and Fortinet
As with other protocols (AnyConnect, F5, GP), the behavior of 'disable_ipv6'
for these protocols is relatively "shallow": if set, it will cause
OpenConnect to ignore any IPv6 address or netmask sent by the server, but
will *not* ignore IPv6 split-{in,ex}cludes or IPv6 addresses of DNS servers.
More thorough IPv6-ignoring could be handled by the vpnc-script, or cleaned
up as part of a future change to simplify IP configuration and routing
across protocols.
(The lack of support for --disable-ipv6 in Pulse was noted in
https://gitlab.com/openconnect/openconnect/-/issues/254.)