We run a production environment with around 100 RUT955 devices. A couple of months ago we updated several devices to firmware version RUT9_R_00.07.01.4.
After about 50 days of uptime, those devices start showing high ping response times. Remote SSH is unavailable (connection refused or timeout), and WebGUI login fails with "device busy".
Device logs show out-of-memory problems with the process "port_eventsd" as the source:
[4268791.398774] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=port_eventsd,pid=2632,uid=0
[4268791.407894] Out of memory: Killed process 2632 (port_eventsd) total-vm:67244kB, anon-rss:65532kB, file-rss:4kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[4268792.032643] oom_reaper: reaped process 2632 (port_eventsd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[4586670.162124] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=port_eventsd,pid=2632,uid=0
[4586670.171269] Out of memory: Killed process 2632 (port_eventsd) total-vm:71784kB, anon-rss:70076kB, file-rss:4kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
[4586670.965407] oom_reaper: reaped process 2632 (port_eventsd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
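For reference, the bracketed kernel timestamps are seconds since boot, so they can be converted to uptime. Below is a minimal sketch for summarizing such lines, assuming a POSIX shell with awk wherever the log is read. The function name oom_summary is only chosen here for illustration and is not a tool shipped with the firmware.

```shell
# Summarize OOM kills from kernel log lines read on stdin.
# Prints: uptime in days, killed process name, anon-rss in kB.
# Only "Out of memory: Killed process" lines are considered.
oom_summary() {
    awk '/Out of memory: Killed process/ {
        ts = $0
        sub(/^\[ */, "", ts)       # drop the leading "[" and any padding
        sub(/\].*$/, "", ts)       # keep only the seconds-since-boot value
        name = $0
        sub(/^.*Killed process [0-9]+ \(/, "", name)
        sub(/\).*$/, "", name)     # process name between the parentheses
        rss = $0
        sub(/^.*anon-rss:/, "", rss)
        sub(/kB.*$/, "", rss)      # numeric anon-rss value in kB
        printf "%.1f days  %s  anon-rss=%d kB\n", ts / 86400, name, rss
    }'
}
```

Run as `dmesg | oom_summary`. For the two kills above, this gives 49.4 days with anon-rss=65532 kB and 53.1 days with anon-rss=70076 kB.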
After the OOM kill, device response time returns to normal. Before the kill, however, we see 24-72 hours of reduced performance and lost connectivity.
Is this a firmware-related bug? Is there any workaround?