nblock's ~

Automatically recover a failing USB LTE modem

My home network consists of a wireless router running LEDE 17.01 and a ZTE MF831 LTE USB modem for Internet connectivity. From time to time the Internet connection fails, and the only way to recover was to physically reconnect the USB modem. Each time it failed, I had to get to the wireless router, pull the USB modem and plug it back in. This post describes the steps I took to work around this issue.

The connection loss doesn't seem to follow a pattern. Sometimes it happens every day, and sometimes the connection works for weeks without any issues. Still, the problem exists, and when it kicks in, the system is unable to recover on its own. When the connection dies, logread contains the following log entries:

[snipped]
daemon.info pppd[9626]: No response to 5 echo-requests
daemon.notice pppd[9626]: Serial link appears to be disconnected.
daemon.info pppd[9626]: Connect time 39.7 minutes.
daemon.info pppd[9626]: Sent 90740877 bytes, received 878872777 bytes.
daemon.notice netifd: Network device '3g-provider' link is down
daemon.notice netifd: Interface 'provider' has lost the connection
daemon.warn dnsmasq[1466]: no servers found in /tmp/resolv.conf.auto, will retry
daemon.info odhcpd[954]: Using a RA lifetime of 0 seconds on br-lan
daemon.notice pppd[9626]: Connection terminated.
daemon.notice pppd[9626]: Modem hangup
daemon.info pppd[9626]: Exit.
daemon.notice netifd: Interface 'provider' is now down
daemon.notice netifd: Interface 'provider' is setting up now
daemon.notice netifd: provider (9858): comgt 12:02:15 -> -- Error Report --
daemon.notice netifd: provider (9858): comgt 12:02:15 -> ---->                  ^
daemon.notice netifd: provider (9858): comgt 12:02:15 -> Error @118, line 9, Could not \
  write to COM device. (1)
daemon.notice netifd: provider (9858):
daemon.notice pppd[9873]: pppd 2.4.7 started by root, uid 0
local2.info chat[9875]: abort on (BUSY)
local2.info chat[9875]: abort on (NO CARRIER)
local2.info chat[9875]: abort on (ERROR)
local2.info chat[9875]: report (CONNECT)
local2.info chat[9875]: timeout set to 10 seconds
local2.info chat[9875]: send (AT&F^M)
local2.info chat[9875]: alarm
local2.info chat[9875]:  -- write timed out
local2.err chat[9875]: Failed
daemon.err pppd[9873]: Connect script failed
[snipped]

Interestingly, the devices in /dev/ttyUSB* are still there and the logs don't contain anything USB related.

PPPD notices that the serial connection with the modem is broken and shuts down. Simply restarting the interface afterwards (ifdown/ifup, web interface) does not work. The first step of the workaround is to restart the USB modem via software. Fortunately, this Stack Exchange post pointed me in the right direction. A simple unbind followed by a bind on the correct USB port works fine. On unbind, the modem disappears and all /dev/ttyUSB* devices are removed by the kernel. On bind, the kernel re-initializes the modem, does some mode switching, and a few seconds later the /dev/ttyUSB* devices reappear. After this unbind/bind cycle, PPPD is started automatically and Internet connectivity is restored. A list of USB ports may be obtained via:

# find /sys/bus/usb/devices/
/sys/bus/usb/devices/
/sys/bus/usb/devices/1-1
/sys/bus/usb/devices/usb1
/sys/bus/usb/devices/usb2
/sys/bus/usb/devices/1-0:1.0
/sys/bus/usb/devices/1-1:1.0
/sys/bus/usb/devices/1-1:1.1
/sys/bus/usb/devices/1-1:1.2
/sys/bus/usb/devices/2-0:1.0
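
The find output alone does not say which entry is the modem. The following is a minimal sketch, not part of the original setup, that prints the vendor/product IDs and the product string for each USB device so the right address can be picked. The function name list_usb_devices and its optional directory argument are made up for illustration; idVendor, idProduct and product are standard sysfs attributes of USB devices.

```shell
#!/bin/sh
# list_usb_devices [SYSFS_DIR]
# Print "ADDRESS VENDOR:PRODUCT NAME" for every USB device.
# SYSFS_DIR defaults to /sys/bus/usb/devices; the argument exists only so
# the function can be tried against a copy or a mock directory.
list_usb_devices() {
  sysfs_dir="${1:-/sys/bus/usb/devices}"
  for dev in "$sysfs_dir"/*; do
    # Interface entries such as 1-1:1.0 have no idVendor file; skip them.
    [ -f "$dev/idVendor" ] || continue
    product=""
    # The product string is optional; not every device provides one.
    [ -f "$dev/product" ] && product=$(cat "$dev/product")
    echo "${dev##*/} $(cat "$dev/idVendor"):$(cat "$dev/idProduct") $product"
  done
}
```

Calling list_usb_devices without arguments on the router lists the real devices. The address in the first column is the value that gets written to the unbind and bind files.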

One problem remains: I want the reconnection steps to trigger automatically when PPPD detects that the serial link has stopped working. Fortunately, PPPD offers various hooks that one can leverage. In my case, the ip-down hook is the correct one. It is called with various arguments and some environment variables. To enable an ip-down hook on OpenWRT/LEDE, create the directory /etc/ppp/ip-down.d and place your executable ip-down script in it. All ip-down scripts in /etc/ppp/ip-down.d are executed each time PPPD had working IP connectivity and is in the process of shutting down. The last part of the puzzle is to trigger the reconnection only when the serial link is faulty. In particular, it must not trigger when:

  • The user requested to shut down the interface (ifdown, web interface).
  • The USB modem is physically disconnected.
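
Installing a hook as described above can be sketched as follows. The function name install_ip_down_hook, its optional second argument and the script filename in the usage note are assumptions for illustration only; the target directory /etc/ppp/ip-down.d is the one named in the text.

```shell
#!/bin/sh
# install_ip_down_hook SCRIPT [HOOK_DIR]
# Copy SCRIPT into the ip-down hook directory and make it executable.
# HOOK_DIR defaults to /etc/ppp/ip-down.d; it is a parameter only so the
# function can be tried against a scratch directory first.
install_ip_down_hook() {
  src="$1"
  hook_dir="${2:-/etc/ppp/ip-down.d}"
  mkdir -p "$hook_dir" || return 1
  cp "$src" "$hook_dir/" || return 1
  # The hook has to be executable, as stated in the text.
  chmod +x "$hook_dir/${src##*/}"
}
```

For example, install_ip_down_hook ./usb-modem-reset.sh would place a script with that (hypothetical) name in /etc/ppp/ip-down.d.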

The complete solution is the shell script listed below. It leverages the OpenWRT/LEDE logging system and the fact that PPPD sets the environment variable PPPD_PID. I only need to inspect log entries produced by the currently running PPPD and find log entries that indicate a faulty serial link.

#!/bin/sh
# pppd ip-down script to reset a USB LTE Modem when the serial link is faulty.

# The USB port where the USB LTE modem is connected.
USB_DEVICE_ADDRESS="1-1"

# Try to find out why pppd is shutting down and only reset the device when the
# serial link is faulty. Exit early otherwise. Luckily, pppd provides us with
# some environment variables/arguments that we can leverage:
# - PPPD_PID -> The PID of the *calling, currently running* pppd.

# Exit if we are not called by pppd.
[ -z "$PPPD_PID" ] && exit 0

# pppd logs that a certain amount of echo-requests sent to the device failed.
if ! logread | grep -q "pppd\[$PPPD_PID\]: No response to .\+ echo-requests"; then
  exit 0
fi

# pppd also logs that the serial link appears to be disconnected.
if ! logread | grep -q "pppd\[$PPPD_PID\]: Serial link appears to be disconnected"; then
  exit 0
fi

# Reset the device
logger "Reset USB device at address $USB_DEVICE_ADDRESS"
echo "$USB_DEVICE_ADDRESS" > /sys/bus/usb/drivers/usb/unbind
sleep 1
echo "$USB_DEVICE_ADDRESS" > /sys/bus/usb/drivers/usb/bind
logger "Reset complete"
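
Since the failure is rare, it may help to check the detection logic against a captured log instead of waiting for the next outage. The sketch below wraps the two grep checks in a function that reads log text from stdin. The name link_is_faulty is made up for illustration, and the pattern uses [0-9][0-9]* instead of .\+ purely as a portability choice; it is not the original script.

```shell
#!/bin/sh
# link_is_faulty PID
# Reads log text on stdin and succeeds only if the pppd with the given PID
# logged both the echo-request failure and the serial-link message.
link_is_faulty() {
  pid="$1"
  # Buffer the log so it can be searched twice.
  log=$(cat)
  printf '%s\n' "$log" \
    | grep -q "pppd\[$pid\]: No response to [0-9][0-9]* echo-requests" || return 1
  printf '%s\n' "$log" \
    | grep -q "pppd\[$pid\]: Serial link appears to be disconnected"
}
```

With a log saved to a file, link_is_faulty 9626 < saved.log tests the match for the PID from the excerpt above; on the router, logread | link_is_faulty "$PPPD_PID" performs the same check as the script.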


tagged lede, lte, modem, openwrt and usb