Today I had a global connectivity outage on more than 25 RUT240 devices. Most of them recovered within 15-30 minutes, except for 3.
I'm currently investigating the cause with the network provider (M2M, data-only, freeeway). My guess is that some issue on the mobile network caused the outage. What bothers me is that 3 devices didn't recover.
One device recovered after a power cycle. The other 2 weren't power cycled and are still offline. I downloaded a troubleshoot log from that device to inspect the reason for this behaviour.
This puzzles me for the following reasons:
1) I enabled the ping reboot to 188.8.131.52 on a 5 min interval.
As I understand it, a power cycle and a ping-triggered reboot do the same thing to the router: reboot it. Why did the power cycle help when the ping reboot did not? Is there any possibility that 184.108.40.206 is pingable while the router can't connect to "the rest of the internet"?
2) This setup is replicated multiple times
In total I have more than 30 RUT240 devices in the field. All are updated to at least firmware RUT2_R_00.07.02.4, and all have the same configuration as the attached troubleshoot log. Some have the Quectel EC25 modem, some the MeiG SLM750 modem. The configuration is not heavily modified from the factory configuration; I just added my Wi-Fi, an application script, ping reboot and a 24-hour reboot.
There are devices with exactly the same setup (firmware version + modem) that recovered. What could be the reason for these differences?
3) Provider support advice
The support on the provider's side suggested "it might be necessary to restart the device to force a full new network session". In my understanding a reboot will do that, even a software-triggered reboot from the ping monitoring. Is there any way to make this more reliable? Or to open a new network session automatically if connectivity is bad?
It's always unpleasant to call customers and ask them to pull the plug and restart their device to fix connectivity issues, and it isn't a solution at scale either.
I really need to fix these issues and harden the routers against any mobile-connectivity hiccups. I have a really hard time tracking these issues down, since they occur very rarely and I can't find a good way to debug them, so I would also appreciate any advice in this direction. For now, looking at troubleshoot logs after such events is the only way for me to get an idea of what's going wrong.
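To make points 1) and 3) more concrete, here is a rough sketch of the kind of watchdog logic I have in mind. It is not something that runs on my routers today. It is written in Python only for illustration, and I am assuming that a script like this could run on the device or be ported to shell. The names `ping_once`, `connectivity_ok` and `RecoveryLadder`, the thresholds, and the action labels are all hypothetical; the actual commands to restart the mobile interface or the modem are deliberately left out because I don't know the right ones. The idea is to check several hosts instead of one, and to escalate step by step instead of going straight to a reboot.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


def ping_once(host: str, timeout_s: int = 3) -> bool:
    """Return True if a single ICMP echo to `host` succeeds (system `ping`)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 2,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def connectivity_ok(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
    min_reachable: int = 2,
) -> bool:
    """Treat the link as up only if at least `min_reachable` hosts answer.

    This guards against the case where one target is pingable while the
    rest of the internet is not.
    """
    reachable = sum(1 for host in hosts if probe(host))
    return reachable >= min_reachable


@dataclass
class RecoveryLadder:
    """Escalate recovery actions as consecutive failed checks pile up.

    The action names are placeholders (assumptions), not real commands.
    """

    steps: Tuple[Tuple[int, str], ...] = (
        (3, "restart_mobile_interface"),
        (6, "restart_modem"),
        (9, "reboot"),
    )
    failures: int = 0

    def record(self, online: bool) -> str:
        """Record one check result and return the action to take now."""
        if online:
            self.failures = 0
            return "none"
        self.failures += 1
        for threshold, action in self.steps:
            if self.failures == threshold:
                if threshold == self.steps[-1][0]:
                    # Last rung reached: start over after the reboot.
                    self.failures = 0
                return action
        return "none"
```

With a 5 minute check interval and these example thresholds, the mobile interface would be restarted after 15 minutes offline, the modem after 30 minutes, and the whole router after 45 minutes. Whether a softer step than a reboot actually forces a new network session is exactly what I would like to find out.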
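One thing I am considering for debugging is to have each router keep a small connectivity history of its own, so that after a rare event I have more to look at than a single troubleshoot log. Below is a minimal sketch, again in Python for illustration only. The function names, the record layout and the log path are my own assumptions, not anything the firmware provides.

```python
import json
import time
from typing import Dict, Optional


def format_sample(results: Dict[str, bool], now: Optional[float] = None) -> str:
    """Build one JSON line describing the outcome of a connectivity check.

    `results` maps each probed host to True (reachable) or False.
    """
    ts = time.time() if now is None else now
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(ts)),
        "reachable": sum(1 for ok in results.values() if ok),
        "results": results,
    }
    return json.dumps(record, sort_keys=True)


def append_sample(path: str, results: Dict[str, bool]) -> None:
    """Append one sample to a local log file (the path is a placeholder)."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(format_sample(results) + "\n")
```

Calling `append_sample` after every check would give a timeline showing when each target stopped answering and whether any recovery attempt changed anything.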