David Woodhouse [Tue, 29 Jun 2021 12:20:34 +0000 (13:20 +0100)]
vhost: Avoid TX queue when writing directly is faster
Using vhost makes high-volume transfers go nice and fast, especially
we are using 100% of a CPU in the single-threaded OpenConnect process
and just offloading the kernel←→user copies for the tun packets to
the vhost thread instead of having to do them from our single thread
too.
However, for a lightly used link with *occasional* packets, which is
fairly much the definition of a VPN being used for VoIP, it adds a lot
of unwanted latency. If our userspace thread is otherwise going to be
*idle*, and fall back into select() to wait for something else to do,
then we might as well just write the packet *directly* to the tun
device.
So... when the queue is stopped and would need to be kicked, and if
there are only a *few* (heuristic: half max_qlen) packets on the
queue to be sent, just send them directly.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Wed, 16 Jun 2021 23:05:14 +0000 (00:05 +0100)]
Initial vhost-net support
We spend a lot of CPU time copying packets between kernel and userspace.
Eventually we want to implement a completely in-kernel data path. It
isn't even that hard, now that most of the functionality we need from
the kernel is there and it's mostly just a case of fitting it together.
In the meantime, though, there are a few things we can do even on today's
released kernels. For a start, we can use vhost-net to avoid having to
do the read()/write() on the tun device in our mainloop.
Ultimately, it ends up being done by a kernel thread instead; it doesn't
really go away. But that should at least give us a performance win which
would compare with a decent threading model, while allowing OpenConnect
to remain naïvely single-threaded and lock-free.
We have to carefully pick a configuration for vhost-net which actually
works, since it's fairly hosed for IFF_TUN support:
https://lore.kernel.org/netdev/2433592d2b26deec33336dd3e83acfd273b0cf30.camel@infradead.org/T/
But by limiting the sndbuf (which disables XDP, sadly) and by requesting
a virtio header that we don't actually want, we *can* make it work even
with today's production kernels.
Thanks to Eugenio Pérez Martín >eperezma@redhat.com> for his blog at
https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
and for lots more help and guidance as I floundered around trying to make
this work.
Although this gives a 10% improvement on the bandwidth we can manage in
my testing (up to 2.75Gb/s with other tricks, on a c5.8xlarge Skylake VM)
it also introduces a small amount of extra latency, so disable it by
default unless the user has bumped the queue length to 16 or more, which
presumably means they choose bandwidth over latency.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Thu, 1 Jul 2021 16:03:13 +0000 (17:03 +0100)]
Clear epoll_fd after forking to background self
Otherwise we remove the events from the epoll_fd before we exit in
the parent process.
This would be a bit awful if it were something we require all users of
libopenconnect to know about, but it isn't. We make everything O_CLOEXEC
and we don't expect users to be calling openconnect_vpninfo_free() from
another process after forking, like the background code does. We only
do it there so that we can check for memory leaks (I think).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Thu, 1 Jul 2021 15:30:27 +0000 (16:30 +0100)]
Fix epoll support for connection pause/restart
We need to actually remove the file descriptors from the epoll set.
Otherwise we get -EEXIST when adding them again (in the case of the
cmd_fd as we re-enter the main loop).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Dimitri Papadopoulos [Tue, 29 Jun 2021 09:50:16 +0000 (11:50 +0200)]
Reorganize #include
- Reorder header files as suggested here:
https://stackoverflow.com/questions/2762568/c-c-include-header-file-order
https://softwareengineering.stackexchange.com/questions/325549/c-header-file-order
- Remove duplicates
- Remove unused headers files
- Change "config.h" to <config.h>
- Include <winsock2.h> before openconnect.h, which is not entirely self-contained.
Daniel Lenski [Fri, 25 Jun 2021 16:42:27 +0000 (09:42 -0700)]
With --user, enter username in all forms, not just the first
Until now, the -u/--user=USERNAME option has caused the OpenConnect CLI to
automatically fill the username only in the *first* form where there is a
match field. This patch causes it to fill the username repeatedly
(including when a form is repeated due to an authentication failure).
As described by @DimitriPapadopoulos in #267:
> In many cases, I expect the authentication failure to be caused by an
> incorrect password, not an incorrect username
>
> Having to [re]enter the username, when it has already been specified from
> the command line or worse in a config file, is an annoyance. I suggest
> openconnect [re]prompts for the username only when it has been entered
> interactively from the start.
Daniel Lenski [Thu, 24 Jun 2021 06:19:12 +0000 (23:19 -0700)]
Update documentation for the --authenticate option
Mention the CONNECT_URL and RESOLVE options, and how to use them to invoke
the connection phase in the maximally-robust way (which should work for all
protocols, and all possible proxies).
David Woodhouse [Mon, 28 Jun 2021 13:44:51 +0000 (14:44 +0100)]
Reuse packets
I see malloc/free showing up at ~5% of perf traces, and it's entirely
pointless when we could be reusing packets.
This trick isn't *perfect* and there's potential for a pathological
case where all the packets on the free_queue are too small to be
reused but we never get rid of them anyway — but rounding up to 2KiB
should mean that never happens in practice, and the alignment we get
from that rounding probably doesn't hurt performance anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Thu, 24 Jun 2021 15:54:00 +0000 (16:54 +0100)]
Stop polling cmd_fd while busy
We have an explicit select() call on the cmd_fd even when we're busy
shovelling packets and never hit the main select() in the mainloop.
This is *just* to ensure that we react to a cancel command quickly.
In the *common* case that we're running in openconnect(8), there's no
need for that since the *only* thing that will write to the cmd_fd is
openconnect itself, and *that* can set a flag in memory to tell us to
look.
So implement that optimisation — don't check it each time around the
mainloop unless the vpninfo->need_poll_cmd_fd flag is set. That flag
is set whenever we have written to cmd_fd and there's something to be
read. And cleared by poll_cmd_fd() when it runs and finds nothing there.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Daniel Lenski [Fri, 28 May 2021 19:42:57 +0000 (12:42 -0700)]
Assume that a 'portal-*cookie' will allow us to bypass gateway SAML
For many GlobalProtect VPNs with SAML, the 'portal-userauthcookie' appears
to be *the* mechanism by which gateway authentication can be bypassed once
portal authentication is complete.
Unfortunately, there are exceptions which will require a more complex
resolution involved a re-entrant SAML flow
(https://gitlab.com/openconnect/openconnect/-/issues/147#note_587163143),
but this patch will at least not make them worse.
This can work in many cases…
- When the user's password is only usable one time (already working as of 008aefd7),
- When the portal requires SAML but the gateway doesn't (already working in 008aefd7),
- When the gateway requires SAML even though the portal doesn't (fixed here)
Additionally, this patch adds tests (tests/{fake-gp-server.py,gp-auth-and-config}) of
OpenConnect's ability to complete the following SAML flows:
- (SAML to portal after acquiring prelogin-cookie externally) → (complete gateway login
using portal-userauthcookie)
- (SAML to gateway after acquiring prelogin-cookie externally)
Daniel Lenski [Thu, 24 Jun 2021 06:16:16 +0000 (23:16 -0700)]
More complete comment about issues with proxies in connection phase
The comment for openconnect_get_connect_url (added in
https://gitlab.com/openconnect/openconnect/-/commit/ec6c0caed28ebf4f60984a49ce3122196f9c87fa)
should mention the possibility that a proxy requires the correct hostname at
the TLS layer (via Server Name Indication, SNI) as well at the HTTP layer
(via 'Host' header), in order to correctly forward it to the VPN server.
See https://gitlab.gnome.org/GNOME/NetworkManager-openconnect/-/issues/46
for a case where the 'Host' header was apparently required.
Daniel Lenski [Thu, 17 Jun 2021 20:23:18 +0000 (13:23 -0700)]
Fortinet requires us to check for an HTTP error response only over TLS
If the Fortinet PPP connection request *succeeds* over TLS, there is no HTTP
response before we start exchanging PPP packets. If it *fails*, there is an
HTTP response.
If the Fortinet PPP connection request is over DTLS, a 'svrhello' response
is expected regardless of whether it succeeded or failed. This is handled
by fortinet_dtls_catch_svrhello()
Let's only check for that HTTP response in Fortinet if we're definitely
connecting over TLS. The "proceeding to tunnel stage" test in
'fortinet-auth-config-tests' verifies the correctness of the HTTP response
parsing behavior.
Daniel Lenski [Thu, 17 Jun 2021 20:14:32 +0000 (13:14 -0700)]
PPP: Replace no_terminate_on_pause flag with terminate_on_pause flag
We know of two real-world PPP-based VPNs that require us *not* to TERMINATE
at the PPP layer if we will want to subsequently resume a connection. We
don't know of any real-world PPP-based VPNs that *do* want us to TERMINATE
at the PPP layer in this case; in fact, any server that *requires* this
would be unable to resume inadvertently dropped connections.
Replace the no_terminate_on_pause flag with a terminate_on_pause flag, in
order to reduce boilerplate and use the more plausible behavior as the
default.
Daniel Lenski [Thu, 17 Jun 2021 17:25:15 +0000 (10:25 -0700)]
Follow disable_ipv6 for Pulse and Fortinet
As with other protocols (AnyConnect, F5, GP), the behavior of 'disable_ipv6'
for these protocols is relatively "shallow": if set, it will cause
OpenConnect to ignore any IPv6 address or netmask sent by the server, but
will *not* ignore IPv6 split-{in,ex}cludes or IPv6 addresses of DNS servers.
More thorough IPv6-ignoring could be handled by the vpnc-script, or cleaned
up as part of a future change to simplify IP configuration and routing
across protocols.
(The lack of support for --disable-ipv6 in Pulse was noted in
https://gitlab.com/openconnect/openconnect/-/issues/254.)
Daniel Lenski [Thu, 20 May 2021 01:35:55 +0000 (18:35 -0700)]
Add tests of using portal-userauthcookie to continue through gateway
This test sets up fake-gp-server.py to require a one-time password on the
portal *and* on the gateway, and to set 'portal-userauthcookie' after
successful login to the gateway. For OpenConnect to successfully login
within prompting for a second password, the 'portal-userauthcookie' value
from the portal must be forwarded to the gateway login request.
Kevin Yue [Wed, 13 May 2020 06:35:42 +0000 (14:35 +0800)]
Pass the `portal-*cookie` values received in the portal config to the gateway login
These "cookies" appear to be the mechanism by which GlobalProtect clients
can login to the portal and then automatically login to gateway *even if*
the credentials used on the portal are not reusable:
1. Because the credentials used on the portal include a one-time password.
2. Because the credentials used on the portal resulted from SAML login.
(ctx->alt_secret, which leads to a SAML nonce value that can only be
used once).
The logs provided by users (see
https://gitlab.com/openconnect/openconnect/-/issues/147#note_578888250 and
https://gitlab.com/openconnect/openconnect/-/issues/147#note_580406042)
allowed me to answer one of the key unanswered questions (see
https://gitlab.com/openconnect/openconnect/-/merge_requests/109#note_341959833):
> If we do have a `portal_userauthcookie` and/or
> `portal_prelogonuserauthcookie`, should we omit the password from form
> submitted to the gateway? Or do we have to leave it in?
The answer is that it doesn't appear to matter: real servers appear to
ignore the `passwd` field if the `portal-*cookie` field is correctly set.
Signed-off-by: Kevin Yue <yuezk001@gmail.com> Signed-off-by: Daniel Lenski <dlenski@gmail.com>
David Woodhouse [Tue, 15 Jun 2021 13:33:35 +0000 (14:33 +0100)]
Add +SIGN-ALL to GnuTLS DTLS ciphersuite configs
At least for AES256-SHA et al in DTLSv1.2, we needed to explicitly add
+SIGN-RSA-SHA1. Half the ciphersuites already had +SIGN-ALL anyway, so
make them consistent.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Tue, 15 Jun 2021 12:02:49 +0000 (13:02 +0100)]
Support non-AEAD ciphersuites in DTLSv1.2 with GnuTLS
We have encountered a Cisco server in the wild which appears only to
support the legacy ciphersuites. And since we offer a set of DTLSv1.2
ciphers it doesn't fall back to accepting the DTLSv1.0 offer; we end
up with no DTLS at all.
This should fix #249.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Daniel Lenski [Sat, 29 May 2021 17:39:57 +0000 (10:39 -0700)]
Mark obsolete-server-crypto test as XFAIL in Fedora/GnuTLS/* CI
The system-wide minimum crypto policy on Fedora prevents us from enabling
3DES and RC4 ciphers via GnuTLS priority strings. We have unconditionally
disabled it in OpenConnect for now in commit 7e862f2f03 but the
obsolete-server-crypto test is *still* failing, with ocserv reporting
'GnuTLS error (at worker-vpn.c:861): No supported cipher suites have
been found.'
Just mark obsolete-server-crypto test as XFAIL for these builds. It's
the most accurate description of the state of those tests: these
environments do not provide OpenConnect with the capabilities to
reliably enable obsolete/insecure crypto algorithms in a self-contained
way.
See https://bugzilla.redhat.com/show_bug.cgi?id=1960763 for ongoing
discussions about how to come up with a more reliable, testable, and
maintainable mechanism for OpenConnect to enable these algorithms without
compromising the system-wide minimum crypto policy.
Signed-off-by: Daniel Lenski <dlenski@gmail.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
David Woodhouse [Sat, 12 Jun 2021 07:50:09 +0000 (08:50 +0100)]
Unconditionally bypass system crypto policy
This makes me extremely sad, but they rolled it out with *no* way to
selectively allow the user to say "connect anyway", as we've always had
for "invalid" certificates, etc.
It's just unworkable and incomplete as currently implemented in the
distributions, so we have no choice except to bypass it and wait for
it to be fixed.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>