Fix a subtle bug causing 100% CPU utilization after ESP tunnel failure and subsequent reconnect
(This was reported at https://github.com/dlenski/openconnect/issues/76.)
Here's what was happening:
1. GlobalProtect connects and starts ESP
-> dtls_state = DTLS_CONNECTED, dtls_fd is read-monitored
2. ESP tunnel fails (due to network outage, dead peer?) and GP switches to HTTPS
-> dtls_state = DTLS_NOSECRET, dtls_fd is still read-monitored (!!!)
3. Tunnel restarts (due to rekey or pause-and-reconnect signal, USR2) and
/ssl-vpn/getconfig.esp is repulled, including new ESP keys.
-> dtls_state = DTLS_SECRET, dtls_fd is still read-monitored (!!!)
4. ESP probes are sent out *once* in esp_setup(), but dtls_fd != -1, so the
dtls_state is *not* upgraded to DTLS_SLEEPING.
-> dtls_state = DTLS_SECRET, dtls_fd is still read-monitored (!!!)
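As a result of the probes being sent out, ESP packets will subsequently arrive
and the select() call in openconnect_mainloop() will wake up, but
udp_mainloop() will never be called to service dtls_fd, because DTLS_SECRET
does not satisfy the condition guarding that call:
    if (vpninfo->dtls_state > DTLS_DISABLED) {
        ...
        ret = vpninfo->proto->udp_mainloop(vpninfo, &timeout);
    }
The unread packets leave dtls_fd permanently readable, so select() returns
immediately on every iteration, which is what pins the CPU at 100%.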
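
The effect of that gate can be reproduced in isolation. The following is a
minimal sketch, not OpenConnect code: a pipe stands in for dtls_fd, and
count_wakeups() is a hypothetical helper. The numeric ordering of the state
constants is an assumption, chosen so that DTLS_SECRET compares below
DTLS_DISABLED as the description above implies.

```c
#include <sys/select.h>
#include <unistd.h>

/* Assumed ordering for this sketch: only states above DTLS_DISABLED
 * get their descriptor serviced. */
enum { DTLS_NOSECRET, DTLS_SECRET, DTLS_DISABLED,
       DTLS_SLEEPING, DTLS_CONNECTING, DTLS_CONNECTED };

/* Hypothetical helper: run `iterations` rounds of a select() loop over a
 * descriptor holding one unread byte, and return how many rounds woke up
 * because it was readable (or -1 on error). */
int count_wakeups(int dtls_state, int iterations)
{
    int fds[2];
    int wakeups = 0;

    if (pipe(fds) < 0)
        return -1;

    /* One pending "ESP packet" on the monitored descriptor. */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (int i = 0; i < iterations; i++) {
        fd_set rfds;
        struct timeval tv = { 0, 10000 };   /* 10 ms timeout */

        FD_ZERO(&rfds);
        FD_SET(fds[0], &rfds);
        if (select(fds[0] + 1, &rfds, NULL, NULL, &tv) < 0) {
            wakeups = -1;
            break;
        }
        if (!FD_ISSET(fds[0], &rfds))
            continue;                       /* timed out: the loop slept */
        wakeups++;

        /* Same shape of gate as in the mainloop: the descriptor is only
         * drained when the state is above DTLS_DISABLED. */
        if (dtls_state > DTLS_DISABLED) {
            char buf[16];
            if (read(fds[0], buf, sizeof(buf)) < 0) {
                wakeups = -1;
                break;
            }
        }
    }

    close(fds[0]);
    close(fds[1]);
    return wakeups;
}
```

With the state at DTLS_SECRET the byte is never read, so every round wakes up
without sleeping; with DTLS_CONNECTED the byte is drained on the first wakeup
and the remaining rounds wait out the timeout.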
This patch fixes that by not just setting dtls_state = DTLS_SECRET when the
HTTPS tunnel connects, but actually calling esp_close_secret() (which closes
dtls_fd, unmonitors it, and sets it to -1).
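
Why the close matters can be sketched as follows. This is an illustration
under stated assumptions, not the actual patch: struct tunnel_sketch,
close_fd_sketch() and try_upgrade_sketch() are hypothetical stand-ins, the
state ordering is assumed, and the only behaviour modelled is what is
described above (a stale dtls_fd blocks the upgrade to DTLS_SLEEPING).

```c
#include <sys/select.h>
#include <unistd.h>

/* Assumed ordering for this sketch. */
enum { DTLS_NOSECRET, DTLS_SECRET, DTLS_DISABLED,
       DTLS_SLEEPING, DTLS_CONNECTING, DTLS_CONNECTED };

/* Hypothetical stand-in for the relevant connection state. */
struct tunnel_sketch {
    int dtls_state;
    int dtls_fd;            /* -1 when no UDP socket is open */
    fd_set select_rfds;     /* descriptors the main loop read-monitors */
};

/* Models the three effects attributed to the close above:
 * unmonitor the descriptor, close it, and reset it to -1. */
void close_fd_sketch(struct tunnel_sketch *t)
{
    if (t->dtls_fd != -1) {
        FD_CLR(t->dtls_fd, &t->select_rfds);
        close(t->dtls_fd);
        t->dtls_fd = -1;
    }
}

/* Models step 4: the upgrade to DTLS_SLEEPING only happens when no
 * stale descriptor is left over. Returns 0 on upgrade, -1 otherwise. */
int try_upgrade_sketch(struct tunnel_sketch *t, int new_fd)
{
    if (t->dtls_state != DTLS_SECRET || t->dtls_fd != -1)
        return -1;          /* stale fd: state stays where it is */

    t->dtls_fd = new_fd;
    FD_SET(new_fd, &t->select_rfds);
    t->dtls_state = DTLS_SLEEPING;
    return 0;
}
```

In this model, calling try_upgrade_sketch() with a leftover descriptor leaves
the state at DTLS_SECRET, while calling close_fd_sketch() first lets the same
call succeed and puts the new descriptor under monitoring.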
Signed-off-by: Daniel Lenski <dlenski@gmail.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>