Hello friends!

I have trouble with failover performance on my RUT955.

I have Starlink on WAN set to failover to Mobile WAN. (I will show screenshots of failover settings).

Starlink seems to have a VERY brief interruption every two hours - perhaps to do with satellite movements.

According to the logread output, WAN connectivity is detected as lost at 9:58:06 and restored within 2 s, at 9:58:08. However, the ifdown event starts at 9:58:07, 1 s after the loss.

At 9:58:10 it says WAN is now up.

Then at 9:58:14 it says "Execute disconnected event on interface wan (eth1)".

And at 9:58:16 the kern.info line reads: WAN (wan) is down, switching to backup WAN (mob1s1a1)
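
To make the timing easier to follow, the sequence above can be laid out as offsets from the first event. This is only a sketch: the timestamps are the ones quoted in this post, while the labels are paraphrases and the helper name `offsets` is made up for illustration.

```python
from datetime import datetime

# Event times as quoted in the post; labels are paraphrases, not raw log lines.
events = [
    ("9:58:06", "WAN connectivity detected as lost"),
    ("9:58:07", "ifdown event starts"),
    ("9:58:08", "WAN connectivity restored"),
    ("9:58:10", "WAN reported as up"),
    ("9:58:14", "disconnected event executed on wan (eth1)"),
    ("9:58:16", "switch to backup WAN (mob1s1a1)"),
]


def offsets(events):
    """Return (seconds since the first event, label) pairs."""
    t0 = datetime.strptime(events[0][0], "%H:%M:%S")
    return [
        (int((datetime.strptime(t, "%H:%M:%S") - t0).total_seconds()), label)
        for t, label in events
    ]


for secs, label in offsets(events):
    print(f"+{secs:2d}s  {label}")
```

Laid out this way, the switch to the backup WAN happens 10 s after the loss was first detected and 8 s after connectivity had already returned.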

What I'm seeing is a very brief interruption, but the failover mechanism is initiated and doesn't stop, even though WAN connectivity comes straight back. To fix this I have tried increasing the interval between failover ping tests (from 10s to 30s) and changing the thresholds for deciding when the WAN is up or down (this was set at 50 and didn't help), but I still get failovers regularly, and they interrupt my connections.

Any ideas?

(I am also aware that I don't understand the meaning or application of flush connections, or of failover policies, so perhaps my answer lies there, but I will need help to understand this.)

I should also say that before the RUT955, I used a RUT240 with very old firmware (6.2 ish?) and it didn't have this problem - the connections were maintained perfectly.

Logread output attached.

EDIT: sorry, added the logread file.

1 Answer

by
Hello Samgs,

From the failover configuration, the router is sending pings to 3 IPs: 1.1.1.1, 1.0.0.1 and 8.8.8.8.

Count: 1, meaning at least 1 ping should be sent to the above IPs.

UP: 1, meaning at least 1 ping should be successful for the device to be considered online.

I suspect that, because of latency, the initial ping cannot complete, so the device is considered offline. Again, this is an educated guess; there could be many factors affecting this, so we can try testing with UP set to 3 or 5. By default, the value is 3.

Try the above-mentioned settings. If it is still not working, share the troubleshoot file from WebUI > System > Administration > Troubleshoot.

For more information on the failover configuration parameters, follow the link:

https://wiki.teltonika-networks.com/view/RUT955_Failover

Regards,

Shivang
by

Shivang

Thanks for your reply.

Since UP is defined as "Number of successful tests to considered link as alive", doesn't this mean increasing UP to 3 makes it harder to decide WAN is up?

The problem I have is it is deciding WAN is down too quickly. I want it to ignore short (1s) WAN interruptions. I tried to increase INTERVAL and DOWN, but it did not help (which is very confusing).
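
The reasoning about UP and DOWN can be checked against a toy model. The sketch below is a simplified assumption, not the RutOS implementation: the function `track` is hypothetical, and it simply marks the link offline after `down` consecutive failed test rounds and online again after `up` consecutive successful ones.

```python
def track(results, up, down):
    """Simplified link tracker (an assumption, not the RutOS implementation).

    results: iterable of booleans, one per test round (True = round passed).
    The link starts online, goes offline after `down` consecutive failed
    rounds, and comes back online after `up` consecutive passed rounds.
    Returns the online/offline state after each round.
    """
    online = True
    fails = passes = 0
    states = []
    for ok in results:
        if ok:
            passes += 1
            fails = 0
            if not online and passes >= up:
                online = True
        else:
            fails += 1
            passes = 0
            if online and fails >= down:
                online = False
        states.append(online)
    return states


# One failed round surrounded by successes, as a 1-2 s blip might produce.
blip = [True, True, False, True, True, True, True]

print(track(blip, up=1, down=1))  # goes offline for one round
print(track(blip, up=3, down=1))  # goes offline and stays offline longer
print(track(blip, up=1, down=3))  # never goes offline
```

In this model, raising UP only lengthens the time spent on the backup link, and only DOWN makes the tracker tolerate a short blip. Note that the router in this thread still failed over after DOWN was raised, so the model evidently does not capture everything the device does.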

I will send you the troubleshooting data.

See also the logread output:

Thank you!

by
No, it is Count that you should increase, to a number like 5 if the outages are shorter than 5 seconds.
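
The intuition behind raising Count can be sketched as follows. Everything here is an assumption for illustration: the function `round_passes` is hypothetical, the pings are assumed to go out one second apart, and a ping is assumed to fail only if it is sent while the link is down. The real ping timing on the router may differ.

```python
def round_passes(outage_start, outage_end, count, spacing=1.0, start=0.0):
    """Return True if at least one of `count` pings avoids the outage.

    Assumptions (hypothetical, not taken from the RutOS documentation):
    pings are sent `spacing` seconds apart beginning at `start`, and a ping
    fails only if it is sent while the link is down
    (outage_start <= t < outage_end).
    """
    send_times = [start + i * spacing for i in range(count)]
    return any(not (outage_start <= t < outage_end) for t in send_times)


# A 2 s outage that begins exactly when the test round starts.
print(round_passes(0.0, 2.0, count=1))  # the single ping lands in the outage
print(round_passes(0.0, 2.0, count=5))  # later pings are sent after recovery
```

Under these assumptions, a test round with Count set to 5 spans about 5 seconds, so an outage shorter than that leaves at least one ping that can succeed.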
by
Hi Shivang,

Just to advise that this issue continues, despite trying your suggestion of UP=3 and UP=5 (and also increasing COUNT to 5).

I have sent you the troubleshooting files by private message; please let me know what you think. Also see the logread screenshot in my last comment, which I think shows the problem really well.

Thanks

Sam