Hi,
we're currently seeing an issue where a MultiWAN failover doesn't affect Wireguard VPN tunnels in several scenarios. Although we encountered this issue on a RUT950, we believe it could also exist on other devices.
After some investigation, we've determined that the cause of this behavior is a static route to the VPN host. This route is injected by the Wireguard protocol management script (`/lib/netifd/proto/wireguard.sh`).
Before the failover, the routing table looks like this:
default via $WAN_GATEWAY dev eth1
$VPN_HOST via $WAN_GATEWAY dev eth1 proto static
Whether the route becomes an issue during failover depends on how exactly the failover is handled by the router.
Initially, we tried to force a failover from main (wired) to backup (mobile) by severing the Internet connection while leaving the main WAN link itself up. In this case, the route for the VPN host becomes an issue:
default via $WWAN_GATEWAY dev wwan0
default via $WAN_GATEWAY dev eth1 metric 10
$VPN_HOST via $WAN_GATEWAY dev eth1 proto static
Although the VPN host is reachable via the new default gateway on the mobile WAN, the route still forces the traffic over the "failed" connection.
However, if we force a failover by simply disconnecting the main WAN network cable, the main WAN interface (i.e. eth1) goes down, which leads to a correct adjustment of the routing table:
default via $WWAN_GATEWAY dev wwan0
$VPN_HOST via $WWAN_GATEWAY dev wwan0 proto static metric 10
Similarly, the route can also become an issue when the network connection on the main WAN is restored, e.g. when we simply replug the main WAN network cable. This leads to the following change in the routing table:
default via $WAN_GATEWAY dev eth1
default via $WWAN_GATEWAY dev wwan0 metric 10
$VPN_HOST via $WWAN_GATEWAY dev wwan0 proto static metric 10
Unfortunately, the VPN traffic is still routed over the backup WAN even after the main WAN is restored. In our case, this leads to quite annoying behavior, since the VPN connection silently continues to consume mobile data.
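Both problematic states above share one symptom: the `proto static` host route points at a gateway other than the one used by the preferred default route. The following is only a rough sketch for spotting this in `ip route show` output. The function names are hypothetical, and it assumes that a default route without an explicit metric has metric 0, as in the tables above.

```shell
#!/bin/sh
# Sketch only: both functions read `ip route show` style output on stdin.

# Print "<gateway> <device>" of the default route with the lowest metric
# (assumption: a missing metric counts as 0).
preferred_default() {
    awk '
        $1 == "default" {
            gw = ""; dev = ""; metric = 0
            for (i = 2; i < NF; i++) {
                if ($i == "via") gw = $(i + 1)
                else if ($i == "dev") dev = $(i + 1)
                else if ($i == "metric") metric = $(i + 1)
            }
            if (best == "" || metric + 0 < best_metric) {
                best = gw " " dev
                best_metric = metric + 0
            }
        }
        END { if (best != "") print best }
    '
}

# Print every non-default "proto static" route whose gateway differs
# from the gateway passed as $1.
stale_host_routes() {
    awk -v gw="$1" '
        $1 != "default" && / proto static/ {
            for (i = 2; i < NF; i++)
                if ($i == "via" && $(i + 1) != gw) print
        }
    '
}
```

On the router this could be used roughly as `routes="$(ip route show)"`, then `set -- $(printf '%s\n' "$routes" | preferred_default)`, then `printf '%s\n' "$routes" | stale_host_routes "$1"`. Any line printed is a host route that still points at a non-preferred gateway.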
There are a couple of possible ways to fix or mitigate this issue.
In general, it seems unnecessary to inject a static route to the VPN host regardless of whether the host is actually included in the VPN address range. To the best of our knowledge, the Wireguard desktop clients don't inject such a route by default. We've also seen that more recent versions of OpenWrt add a "nohostroute" option (see [1]) that allows controlling this behavior.
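For illustration, on an OpenWrt version that ships this option, the setting would go into the Wireguard interface section of `/etc/config/network` roughly as follows. The interface name `wg0` is a placeholder, the other options of the section are omitted, and we have not verified whether the firmware on the RUT950 honors this option.

```
config interface 'wg0'
        option proto 'wireguard'
        option nohostroute '1'
```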
In our case, the VPN address range doesn't include the host. We've therefore mitigated this issue quite straightforwardly by adding a script to the router firmware that is triggered upon a MultiWAN event. This script reuses the code snippet from the original protocol management script to remove all routes to the VPN host. Its contents are shown below:
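#!/bin/sh
# Remove the static host routes that wireguard.sh injected for the peers
# of the given Wireguard interface.
clear_wg_hostroute_for_config() {
    local config="$1"
    # "wg show <iface> endpoints" prints one "<public key> <address>:<port>" line per peer.
    wg show "${config}" endpoints | \
        sed -E 's/\[?([0-9.:a-f]+)\]?:([0-9]+)/\1 \2/' | \
        while IFS=$'\t ' read -r key address port; do
            # Skip peers that have no endpoint.
            [ -n "${port}" ] || continue
            ip route del "${address}" || true
        done
}

# $1: name of the Wireguard interface
clear_wg_hostroute_for_config "$1"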
However, this mitigation isn't sufficient in cases where the static route is actually required (although it could likely be adapted). Either way, a fix in the upstream firmware seems like a better solution.
Additionally, we've come across another (potential) issue while experimenting with the Wireguard protocol management script. If the management script is used to set up a Wireguard tunnel (using proto_wireguard_setup()) and the routing table looks like this:
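As a rough idea of what such an adaptation might look like, the sketch below re-points the host route at a given gateway instead of deleting it. It reuses the endpoint parsing from the script above. The function name is hypothetical, the gateway and device have to be supplied by the caller (how the MultiWAN event exposes them is not covered here), and only IPv4 endpoints are assumed. It prints the commands rather than running them, so they can be inspected first.

```shell
#!/bin/sh
# Sketch only: read `wg show <iface> endpoints` output on stdin and print
# one `ip route replace` command per peer endpoint.
# $1: gateway of the currently active WAN, $2: its device
hostroute_cmds() {
    gw="$1"
    dev="$2"
    sed -E 's/\[?([0-9.:a-f]+)\]?:([0-9]+)/\1 \2/' |
    while read -r key address port; do
        # Skip peers that have no endpoint.
        [ -n "${port}" ] || continue
        echo "ip route replace ${address} via ${gw} dev ${dev} proto static"
    done
}
```

To apply the result, the output could be piped into `sh`, e.g. `wg show "$1" endpoints | hostroute_cmds "$GW" "$DEV" | sh`, where `$GW` and `$DEV` are placeholders for the active WAN's gateway and device.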
default via $WAN_GATEWAY dev eth1
default via $WWAN_GATEWAY dev wwan0 metric 10
a route is injected that incorrectly references the gateway with the lower priority, e.g. like this:
$VPN_HOST via $WWAN_GATEWAY dev wwan0 proto static metric 10
We believe that this could be an instance of an already known OpenWrt bug that is still present in the Teltonika firmware fork. More details can be found in [2].
Best,
Curd
[1] https://github.com/openwrt/openwrt/blob/6f96a4d043a9367c6c0d166299d808df764e88e6/package/network/utils/wireguard-tools/files/wireguard.sh#L172
[2] https://bugs.openwrt.org/index.php?do=details&task_id=1358