+4 votes
2,214 views 15 comments
by anonymous

Hi,

we're currently seeing that a MultiWAN failover doesn't affect Wireguard VPN tunnels in several scenarios. Although we encountered this issue on a RUT950, we believe it could also exist on other devices.

After some investigation, we've determined that the reason for this behavior is a static route to the VPN host. This route is injected by the Wireguard protocol management script (`/lib/netifd/proto/wireguard.sh`).

Before the failover, the routing table looks like this:

default via $WAN_GATEWAY dev eth1
$VPN_HOST via $WAN_GATEWAY dev eth1  proto static

Whether the route becomes an issue during failover depends on how exactly the failover is handled by the router. 

Initially, we've tried to force a failover from main (wired) to backup (mobile) by severing the Internet connection. In this case, the route for the VPN host becomes an issue:

default via $WWAN_GATEWAY dev wwan0
default via $WAN_GATEWAY dev eth1  metric 10
$VPN_HOST via $WAN_GATEWAY dev eth1  proto static

Although the VPN host is reachable via the new default gateway on the mobile WAN, the route still forces the traffic over the "failed" connection.

However, if we force a failover by simply disconnecting the main WAN network cable, the main WAN interface (i.e. eth1) goes down, which leads to a correct adjustment of the routing table:

default via $WWAN_GATEWAY dev wwan0
$VPN_HOST via $WWAN_GATEWAY dev wwan0  proto static  metric 10

Quite similarly, the route can also become an issue when the network connection on the main WAN is restored, e.g. when we've simply replugged the main WAN network cable. This leads to the following change in the routing table:

default via $WAN_GATEWAY dev eth1
default via $WWAN_GATEWAY dev wwan0  metric 10
$VPN_HOST via $WWAN_GATEWAY dev wwan0  proto static  metric 10

Unfortunately, the VPN traffic is still routed using the backup WAN even after the main WAN is restored. In our case, this can lead to quite annoying behavior, since the VPN connection silently continues to consume mobile traffic.
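
To check for such a stale host route without reading the whole table by eye, a small helper can print the device that the host route points to. This is only a sketch: the function name host_route_dev is made up for illustration, and 203.0.113.5 is a documentation address standing in for $VPN_HOST.

```shell
#!/bin/sh
# Print the device that the host route for "$1" points to.
# Reads `ip route show` output on stdin and prints nothing
# if the table contains no route for that host.
host_route_dev() {
    awk -v host="$1" '
        $1 == host {
            # Find the "dev" keyword and print the word that follows it
            for (i = 2; i < NF; i++) {
                if ($i == "dev") { print $(i + 1); exit }
            }
        }'
}

# Usage on the router (203.0.113.5 stands in for $VPN_HOST):
#   ip route show | host_route_dev 203.0.113.5
```

If this still prints the backup device (e.g. wwan0) after the main WAN is back, the VPN traffic is still pinned to the mobile connection.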

There are a couple of possible alternatives to fix or mitigate this issue. 

In general, it seems unnecessary to inject a static route to the VPN host regardless of whether the VPN host is actually included in the VPN address range. To the best of our knowledge, the Wireguard desktop clients don't inject such a route by default. We've also seen that more recent versions of OpenWRT add an option "nohostroute" (see [1]) which allows controlling this behavior.
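
On a firmware whose wireguard.sh already implements "nohostroute", enabling the option could look roughly like the sketch below. The helper name nohostroute_batch and the interface name wg0 are assumptions for illustration, and the option has no effect on firmware versions that lack it.

```shell
#!/bin/sh
# Emit the commands for `uci batch` that enable nohostroute on one interface.
# $1 = name of the Wireguard interface section in /etc/config/network
nohostroute_batch() {
    printf "set network.%s.nohostroute='1'\n" "$1"
    printf 'commit network\n'
}

# Usage on the router ("wg0" is a hypothetical interface name):
#   nohostroute_batch wg0 | uci batch
#   ifdown wg0 && ifup wg0
```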

In our case, the VPN address range doesn't include the host. Therefore, we've mitigated this issue quite straightforwardly by adding a script to the router firmware which is triggered upon a MultiWAN event. This script reuses the code snippet from the original protocol management script to remove all routes regarding the VPN host. Its contents are shown below:

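#!/bin/sh

# Remove the host routes that wireguard.sh injected for every peer endpoint
# of the given Wireguard interface.
clear_wg_hostroute_for_config() {
  local config="$1"

  # "wg show <interface> endpoints" prints one "<public key> <address>:<port>" line per peer
  wg show "${config}" endpoints | \
    sed -E 's/\[?([0-9.:a-f]+)\]?:([0-9]+)/\1 \2/' | \
    while IFS=$'\t ' read -r key address port; do
      # Skip peers that have no endpoint set
      [ -n "${port}" ] || continue
      ip route del "${address}" || true
    done
}

# First script argument: name of the Wireguard interface
clear_wg_hostroute_for_config "$1"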
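
To see what the sed expression and the read loop extract, the parsing step can be tried in isolation on sample input. This is a sketch only: the function name parse_endpoints, the keys, and the addresses are made up, and it prints the addresses instead of deleting routes.

```shell
#!/bin/sh
# Read `wg show <interface> endpoints` style lines on stdin
# ("<public key><TAB><address>:<port>") and print one endpoint address per line.
parse_endpoints() {
    # Split "<address>:<port>" (or "[<ipv6>]:<port>") into two fields
    sed -E 's/\[?([0-9.:a-f]+)\]?:([0-9]+)/\1 \2/' |
    while read -r key address port; do
        # Peers without an endpoint show "(none)" and yield no port field
        [ -n "$port" ] || continue
        printf '%s\n' "$address"
    done
}

# Example with made-up keys:
#   printf 'KEYA=\t203.0.113.5:51820\nKEYB=\t(none)\n' | parse_endpoints
```

The brackets around IPv6 endpoints are stripped by the sed expression, so the printed address can be passed to ip route del directly.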

However, this mitigation isn't sufficient in cases where the static route is actually required (although it could likely be adapted). Still, a fix in the upstream firmware seems like a better solution.

Additionally, we've come across another (potential) issue while experimenting with the Wireguard protocol management script. If the management script is used to set up a Wireguard tunnel (using proto_wireguard_setup()) and the routing table looks like this:

default via $WAN_GATEWAY dev eth1
default via $WWAN_GATEWAY dev wwan0  metric 10

a route is injected that incorrectly references the gateway with the lower priority, e.g. like this:

$VPN_HOST via $WWAN_GATEWAY dev wwan0  proto static  metric 10

We believe that this could be an instance of an already known OpenWRT bug that is still present in the Teltonika firmware fork. More details can be found in [2].
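
For comparison, the default route that should win here is the one with the lowest metric, where a missing metric counts as 0. A minimal sketch of that selection follows; the function name preferred_default_dev is hypothetical and this is not the logic used in wireguard.sh.

```shell
#!/bin/sh
# Read `ip route show` output on stdin and print the device of the
# default route with the lowest metric (a missing metric counts as 0).
preferred_default_dev() {
    awk '
        $1 == "default" {
            metric = 0; dev = ""
            for (i = 2; i <= NF; i++) {
                if ($i == "metric") metric = $(i + 1)
                if ($i == "dev") dev = $(i + 1)
            }
            # Keep the first default route seen, or any later one with a lower metric
            if (best == "" || metric + 0 < best_metric) {
                best = dev
                best_metric = metric + 0
            }
        }
        END { if (best != "") print best }'
}

# Usage on the router:
#   ip route show | preferred_default_dev
```

For the table above this yields eth1, whereas the injected host route points at wwan0.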

Best,

Curd

[1] https://github.com/openwrt/openwrt/blob/6f96a4d043a9367c6c0d166299d808df764e88e6/package/network/utils/wireguard-tools/files/wireguard.sh#L172

[2] https://bugs.openwrt.org/index.php?do=details&task_id=1358

by anonymous
RUT955 with FW ver.: RUT9XX_R_00.06.07.7 here, also suffering from the same failover issue.
Wireguard package version is 0.0.20191219.
by anonymous
Same here with a RUTX11. The Wireguard tunnel sticks to one WAN and does not flush the connection when that WAN connection is lost, even though failover rules are active.

We really need a fix on that.
by anonymous
Try the workaround that I’ve shared in my question. It has been working pretty well with our routers in production so far.

2 Answers

0 votes
by anonymous
Hello,

Which device did you use for these reproduction steps, and which firmware was installed on it?

EB.
by anonymous
Hi,

thanks for your quick reply.

The device is a RUT950. The firmware is RUT9XX_R_00.06.07.5 and the version of the Wireguard package is 0.0.20191219. Both should be the latest versions available.

Best,

Curd
by anonymous
Thank you, I've reported this issue to RnD and will come back to you as soon as I get any information about this.

EB.
0 votes
by anonymous
Can you tell me where you placed the script and how you triggered it on the multiwan event?
by anonymous
Sure :)

We've simply placed the script in /etc/clear_wg_hostroute_for_config.sh and made it executable. The first parameter to the script is the name of the Wireguard interface. The script is intended to be triggered as a custom command for all WAN interfaces upon a MultiWAN failover.

You can set the script in the WebGUI as follows: Network -> WAN -> Edit -> Advanced Settings -> Check "Execute Command" -> Enter the full path to the script including the interface name as first parameter -> Save. Repeat this for all WAN interfaces which could be relevant for failover.
by anonymous

Hi,

much appreciated! :)

But it seems the RUTX11 with FW RUTX_R_00.02.06.1 has different settings. Under the interfaces -> Advanced Settings, I don't have an "Execute Command" checkbox, even in the Advanced menu view. I also can't find it anywhere else :(

by anonymous

Oh, right. Yep, I forgot that we’re not using the same model :D The web interface really looks completely different.

But both firmwares seem to be based on OpenWRT, so you should be able to configure this from the terminal, provided this firmware also supports command execution on failover. See https://openwrt.org/docs/guide-user/network/wan/multiwan/multiwan_package for reference. Unfortunately, that page doesn't list the command option that we're going to use either, but it's definitely there for the RUT950, so I guess it's worth a try.

You can execute the following commands on the router's SSH terminal:

uci show multiwan # show all multiwan settings
uci set multiwan.wan.command='/etc/clear_wg_hostroute_for_config.sh "YOUR_VPN_INTERFACE"' # set command
uci set multiwan.wan2.command='/etc/clear_wg_hostroute_for_config.sh "YOUR_VPN_INTERFACE"' # set command
# possibly also for wan3 if it's there and you're using it (check also the first command)
uci changes # show and verify changes
uci commit # commit changes
uci show multiwan # just to verify

Afterwards, you can try to create a failover scenario and check whether the route disappears as described in my first post. You can also monitor the router's syslog using logread -f (assuming that you can maintain the SSH session during your test). The script's execution should show up as a message in the log as a result of the multiwan failover.

I hope it works for you :) 

by anonymous
Hi, many thanks for your very detailed information.

The RUTX11 OS is very different from that of the RUT950. In my case, mwan3 (https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3) is used for multiwan, and the OpenWRT version is 19.07.04; I assume yours is much older. The multiwan package is obsolete and has been unmaintained for years.

So, as far as I can see for now, I can't adapt your workaround :(
by anonymous
Okay, then it's based on a newer version of OpenWRT. That's unfortunate in this case (and for us RUT9xx users xD), but in general quite cool :)

Hm, maybe you can try to adapt a similar approach using mwan3 and hotplug-events. mwan3 seems to send events that execute scripts from hotplug.d in case of a failover (see [1] and [2]). Maybe you can try to adapt the solution by placing the script in hotplug.d or calling it from a script in hotplug.d (see [3]) such that it gets triggered in case of a WAN ifdown/ifup event?

Unfortunately, I can't test that, because I don't have any OpenWRT device that's capable of running a newer version :D

[1] https://github.com/openwrt/packages/issues/4882#issuecomment-334008422
[2] https://github.com/openwrt/packages/blob/master/net/mwan3/files/lib/mwan3/mwan3.sh#L934
[3] https://openwrt.org/docs/guide-user/base-system/hotplug
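
An untested sketch of such a hotplug script is shown below. The file name 99-clear-wg-hostroute and the interface name wg0 are assumptions for illustration; the script only relies on the ACTION and INTERFACE variables that hotplug passes for iface events (see [3]).

```shell
#!/bin/sh
# Hypothetical location: /etc/hotplug.d/iface/99-clear-wg-hostroute
WG_IFACE="wg0"  # assumption: name of the Wireguard interface
CLEAR_SCRIPT="/etc/clear_wg_hostroute_for_config.sh"

# Succeed only for ifup/ifdown events on interfaces other than loopback.
# $1 = action, $2 = interface
should_clear_hostroute() {
    case "$1" in
        ifup|ifdown) ;;
        *) return 1 ;;
    esac
    case "$2" in
        ""|loopback) return 1 ;;
    esac
    return 0
}

# Call the route clean-up script when a matching event arrives
if should_clear_hostroute "${ACTION:-}" "${INTERFACE:-}" && [ -x "$CLEAR_SCRIPT" ]; then
    "$CLEAR_SCRIPT" "$WG_IFACE"
fi
```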
by anonymous

OK, it seems I found an easy solution.

There is the file /etc/mwan3.user, where you can run your own commands on mwan3 events:

https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3#environment_variables

So I used your script:

if [ "${ACTION}" = "ifdown" ] || [ "${ACTION}" = "ifup" ] ; then
  # Only on either an ifdown or ifup event for any interface
  if [ "${INTERFACE}" != "loopback" ] && [ "${INTERFACE}" != "self" ] ; then
    # Exclude events for interfaces loopback and self

    clear_wg_hostroute_for_config() {
      local config="$1"
      wg show "${config}" endpoints | \
        sed -E 's/\[?([0-9.:a-f]+)\]?:([0-9]+)/\1 \2/' | \
        while IFS=$'\t ' read -r key address port; do
          [ -n "${port}" ] || continue
          ip route del "${address}" || true
        done
    }

    # NOTE: "$1" must expand to the Wireguard interface name here. If mwan3
    # runs this file without arguments, put the interface name in its place.
    clear_wg_hostroute_for_config "$1"
  fi
fi

And it seems to work! It was much easier than expected.

Thanks for your input in this :) 

by anonymous
The only thing I've noticed now is that, on an ifup event, the wg tunnel still sticks to the previous interface. Only the ifdown part works well. So it can now happen that the wg tunnel stays active on a mobile connection while a wwan or wan connection is available.
by anonymous
Okay, cool that you've also found a solution for your use case :) We're also going to use the TRB140 for some setups soon, so maybe I can then also take a look at the newer OpenWRT base firmware :D

Hm, hm. What's the exact issue with the ifup-part then? The host route should be gone in any case, right? Is the traffic then still routed over the mobile connection as a default route? I don't see why that should happen otherwise.
by anonymous
I just included it in the /etc/mwan3.user file as pasted. The file is part of the mwan3 package and will be triggered. Just follow the link to see some examples.

The file also needs to get the execute bit with chmod +x.

The ifup part of the script placed in /etc/mwan3.user does not seem to be triggered. But I'm not sure yet; it needs more investigation, I think.
by anonymous

That's the part which is not working.

It is triggered with ifdown but not with ifup... :( 

if [ "${ACTION}" = "ifdown" ] || [ "${ACTION}" = "ifup" ] ; then
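
One way to investigate whether the ifup event reaches /etc/mwan3.user at all is to log every event before any filtering. The lines below are a sketch for the top of that file, assuming mwan3 passes ACTION, INTERFACE and DEVICE as described on the linked wiki page. The helper name format_mwan3_event and the log tag mwan3-user are made up.

```shell
#!/bin/sh
# Build one log line describing the current mwan3 event.
# $1 = action, $2 = interface, $3 = device
format_mwan3_event() {
    printf 'mwan3.user: action=%s interface=%s device=%s' \
        "${1:-unset}" "${2:-unset}" "${3:-unset}"
}

# Log only when the file is really run for an event (ACTION is set)
if [ -n "${ACTION:-}" ]; then
    logger -t mwan3-user "$(format_mwan3_event "${ACTION}" "${INTERFACE:-}" "${DEVICE:-}")"
fi
```

Watching logread -f while replugging the main WAN then shows whether a line with action=ifup appears for that interface.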