
by

Hi all

I have an odd issue with a RUT950 which is deployed in an installation. The router is connected to 4G/LTE, and it works as an L2TP/OVPN server and it forwards some ports for a couple of web interfaces for other devices on the network. It's accessed remotely using a DDNS domain. This has worked well until recently, but I've been getting complaints that the web interfaces are sporadically down. I don't know exactly when this started as these services are rarely used, they're only for maintenance purposes.

I have confirmed that the issue exists – the router is sometimes inaccessible, and VPN, web IF, etc. are unusable, and it doesn't respond to ping either. However, it does respond to SMS commands. It tends to work again after waiting it out.

At the moment I cannot ping the router remotely, my requests just time out – but when I send an SMS message to get the router status I find that:

1) The router is connected to LTE

2) The WAN IP is the same that the DDNS domain resolves to

so I take it this is neither a mobile data issue nor a DDNS problem.
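The second check above (the DDNS name resolving to the WAN IP reported by SMS) could be scripted roughly as follows. This is only a sketch: the function name `ddns_matches_wan` and the injectable `resolver` parameter are made up for illustration, and the hostname and IP in the usage comment are placeholders, not values from this installation.

```python
import socket


def ddns_matches_wan(hostname, reported_wan_ip, resolver=socket.gethostbyname):
    """Return True if `hostname` resolves to the WAN IP the router reported.

    `resolver` defaults to the system resolver (IPv4 only); it is a
    parameter only so the comparison can be exercised without DNS access.
    """
    try:
        resolved = resolver(hostname)
    except socket.gaierror:
        # The name did not resolve at all, so it cannot match.
        return False
    return resolved == reported_wan_ip


# Hypothetical usage with placeholder values:
# ddns_matches_wan("router.example.net", "198.51.100.7")
```

A match only shows that DNS and the mobile address agree; it says nothing about whether inbound traffic actually reaches the router.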

I just rebooted the router by SMS, and it rebooted successfully and was assigned a new WAN IP. When I try to ping the DDNS host name I can see that it now points to the new IP, so obviously that bit works – but it still does not respond to ping and is just as inaccessible as it was before.

I'm really puzzled, what could be causing this?

-G 

1 Answer

by

Hello,

Would it be possible to receive a troubleshoot file from the device once it becomes available, without restarting it? The logs in the file may help to identify and solve the issue.

To generate the file, access the router's WebUI, go to the System -> Administration -> Troubleshoot section and download the troubleshoot file from there. Attach it by editing your question.

Best regards,

by
Troubleshoot file added above. I looked at the event logs yesterday and didn't see anything out of the ordinary. I then upgraded to the latest legacy FW. Since 01:00 yesterday night, there's been an insane amount of "Joined 4G LTE" events logged – literally thousands, at intervals of approx 10 seconds.
by

The "Joined 4G LTE" spam seems to be some stuck loop. However, it is simply a message and has no impact on the connection. It might be due to the firmware update, if the settings from the previous version were kept.

The mobile connection seems stable, signal quality is good, pinging the device for a while shows no lost packets.

However, regarding the OpenVPN configuration, there was the following error:

  • Authenticate/Decrypt packet error: packet HMAC authentication failed

This would indicate some mismatch between the server and client devices. However, HMAC authentication is not configured on your device at all, so I would suggest checking the client's configuration.

If the issue keeps occurring, I would suggest upgrading to the latest RutOS, since at this point legacy firmware receives only critical updates related to device hardware changes and security. However, in this case, you would need to reset and reconfigure the router.

by

Thanks for looking into this!

I did indeed keep the settings through the last firmware upgrade, so if that's a possible explanation to the log message flooding then that's OK.

I agree the signal quality seems decent, but I noticed that RSRP is somewhat bad at -98 dBm. Borderline case?

It's a bit early to say for sure, but it's starting to look like a bit of a pattern: it seems the problem often occurs in the morning. For the last couple of days I haven't been able to reach the RUT until around 11 (local time) and then it's been perfectly stable for the rest of the day. As of now, 08:20 local time, it's unreachable again. I'll wait and see if I can reach it again at noon. 
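To confirm a time-of-day pattern like this, one option is to probe the forwarded port at regular intervals and count the failures per hour. The sketch below assumes a TCP connect to the remote WebUI port is a fair stand-in for "reachable"; the function names, the host and the port in the usage comment are all hypothetical.

```python
import socket
from collections import Counter
from datetime import datetime


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures alike.
        return False


def failures_by_hour(samples):
    """Count failed probes per hour of day.

    `samples` is an iterable of (datetime, bool) pairs, where the bool is
    the probe result recorded at that time.
    """
    counts = Counter()
    for when, ok in samples:
        if not ok:
            counts[when.hour] += 1
    return dict(sorted(counts.items()))


# Hypothetical usage: run from cron every few minutes and append the result.
# sample = (datetime.now(), tcp_reachable("router.example.net", 40000))
```

Running this from a machine outside the mobile network for a few days would show whether the outages really cluster before 11:00 local time.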

I realise it's probably best to upgrade to the latest OS, but I'm a bit reluctant to do so. I vaguely remember that the OVPN configuration was a bit of a pain when setting this up initially, and it was easier to get it to work on the legacy OS than on the (at that point) very fresh new-generation OS. I'd really rather not go through all that again, especially now that the devices are deployed in the field (on the other side of the country).

As for that OVPN thing: it's just a simple site-to-site bridge with a RUT240 as the client. On the RUT950 site there are a couple of PLS-type devices, and on the RUT240 site (client's office) there is a touch panel controller for these devices. They use multicast to communicate, hence the need for a bridge VPN. That's pretty much it, I vaguely remember that I tried to follow some guides but it was a bit more involved than that – can't remember why exactly. HMAC doesn't ring any bells. I'd prefer not to touch it – and changing the client configuration will be difficult, as for some reason I can't reach the RUT240 remotely even though I can reach the devices behind it! Could never figure that one out.

by

The issue seems to be firewall related. 

Your device does respond to pings on the IP assigned to the mobile interface, and the configured DDNS domain resolves, but connection attempts seem to be refused by the router's firewall.

In this case, if possible, I would suggest sending an SMS to restart the device to see if the connection is restored.

The next step is to change the HTTP port for remote access in System -> Administration -> Remote access, as the current configuration is a serious security flaw. Set the HTTP(S) port to a random number in the range 32768-65535. Changing the login password would be a good addition too. To access the WebUI locally, you would then need to add the new port number as well: IP:port.

For monitoring, it would be useful if you enabled TCPdump for the mobile interface on your specified HTTP port and saved the logs to flash. It should be available in the System -> Administration -> Troubleshoot section. Then wait until the device becomes inaccessible and make several attempts to connect. Download the dump file when the device can be accessed again. This would provide more detail on what is happening during the refusal periods.
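If it helps, a port in the suggested range can be picked with a few lines of Python. This is just a sketch of the "random number in 32768-65535" advice; the function name is made up, and the chosen port still has to be entered in the WebUI by hand.

```python
import secrets


def pick_webui_port(low=32768, high=65535):
    """Return a random port number in the inclusive range [low, high]."""
    # secrets.randbelow(n) returns an int in [0, n), so add 1 to include `high`.
    return low + secrets.randbelow(high - low + 1)


# Hypothetical usage:
# print(pick_webui_port())
```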

Best regards,

 

by
Ah, I forgot to mention that I have already changed the web interface port, so that would be why the firewall is now rejecting connection attempts to the previously assigned default port. (Bad idea, I agree!)

At around 10:00 today it started working again, which is why it's currently responding to ping. When the problem occurs it stops responding to ping. I've enabled TCP dump nonetheless.