What pushed me to look for an alternative to SSH for managing my home server is a tunnel agent that runs in a container. The agent is responsible for keeping a network tunnel established, but from time to time it disconnects and is unable to reconnect to the tunnel server. The tunnel exposes a public entry point on the internet, a subdomain of the tunnel provider, through which the home server behind NAT can be reached.
The tunnel is a vital part of the network path I use to manage the server, so when the agent can't reach the tunnel server for whatever reason, SSH in general and the Docker daemon in particular become unreachable too. In other words, I lose remote control completely.
The issue isn't that big when I'm within the local network: I connect to the server over the LAN and restart the tunnel agent to re-expose the entry point. But how do I stay connected to the server when the issue occurs while I'm away?
The most straightforward solution is to replace the tunnel provider with a more stable one. Plenty of them can be found, for instance Cloudflare Zero Trust or Loophole.cloud. However, some of them are difficult to set up, some have restrictions that don't allow me to implement a custom infrastructure, and after all, I could run into the same issue with any of them.
I investigated the options looking for a good, full-fledged replacement and ended up deciding to stay with my current provider, Playit.gg. Its functionality is exactly what I need:
- It's free.
- Easy to set up.
- TCP/UDP layer, not limited to HTTP or SSH.
- Not a platform. No confusing settings and complexities.
- Playit's subdomains can be used as CNAME targets.
- Custom port range.
- Anycast network. Automatic selection of a faster network route from a client to an agent.
All right, but if I've decided to stay with Playit, the problem hasn't gone away. What then?
In principle, since the agent runs in a container, the Docker daemon could restart it automatically when it detects that the agent process has exited. That doesn't happen, because the process in the container keeps running; it just logs that it can't reconnect.
Such a state has to be detected some other way, and the container restarted. I could do it manually: the status page service runs health checks and notifies me when the problem occurs. Unfortunately, with remote control completely lost, all I can do is sadly watch every service on my home server stay unavailable from the internet.
Breathe out. If the server isn't accessible from the internet, can the agent be restarted from within the local network? Yes: the other way is to run a bot there that polls the highly available Telegram servers for commands. The bot proactively asks Telegram for new commands every few seconds, so inbound access from the internet is unnecessary. The good thing is that coding and running a bot to restart the agent isn't difficult.
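To show why polling needs no inbound access, here is a minimal sketch of such a loop using only the standard library and the Bot API's `getUpdates` method. The helper names, the `handle` callback, and the `fetch` parameter are assumptions made for illustration; a library such as python-telegram-bot takes care of all this for you.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.telegram.org/bot{token}/getUpdates"


def fetch_updates(token, offset, timeout=30):
    """Make one outbound long-poll request to Telegram and return the updates."""
    query = urllib.parse.urlencode({"offset": offset, "timeout": timeout})
    url = API_URL.format(token=token) + "?" + query
    with urllib.request.urlopen(url, timeout=timeout + 10) as response:
        return json.load(response)["result"]


def extract_commands(updates):
    """Return (chat_id, text) for every message that looks like a /command."""
    commands = []
    for update in updates:
        message = update.get("message") or {}
        text = message.get("text", "")
        if text.startswith("/"):
            commands.append((message["chat"]["id"], text))
    return commands


def next_offset(updates, current):
    """Acknowledge processed updates by moving past the highest update_id."""
    return max((u["update_id"] + 1 for u in updates), default=current)


def poll_forever(token, handle, fetch=fetch_updates):
    """Ask for new commands in a loop; every connection is opened from home."""
    offset = 0
    while True:
        updates = fetch(token, offset)
        for chat_id, text in extract_commands(updates):
            handle(chat_id, text)  # hypothetical callback that runs the command
        offset = next_offset(updates, offset)
```

Every request here goes out from the home server to Telegram, so nothing has to be exposed through the tunnel or the NAT for the bot to receive commands.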
I chose Telegram as the bot platform for its high availability, well-developed bot ecosystem, and the nice, mature Python library python-telegram-bot.
Briefly, the steps I took to run and command it:
- Bot registration.
- Implementing a few command handlers. Each command handler calls the corresponding Docker API method.
- Running the bot on the home server.
- Creating a chat with the bot, which serves as the public command interface.
- Giving the bot access to the Docker daemon's Unix socket by mounting it into the bot container.
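Roughly, the steps above could look like the sketch below. It assumes python-telegram-bot version 20 or later and the Docker SDK for Python (the `docker` package). The command names, the `BOT_TOKEN` environment variable, and the helper functions are placeholders for illustration. `docker.from_env()` talks to the daemon through `/var/run/docker.sock` by default, which is why the socket has to be mounted into the bot container.

```python
import os


def list_containers(client):
    """One line per container: name and current status."""
    return "\n".join(f"{c.name}: {c.status}" for c in client.containers.list(all=True))


def tail_logs(client, name, lines=20):
    """Last `lines` log lines of a container, decoded to text."""
    raw = client.containers.get(name).logs(tail=lines)
    return raw.decode("utf-8", errors="replace")


def restart_container(client, name):
    """Restart a container and report back."""
    client.containers.get(name).restart()
    return f"{name} restarted"


def main():
    # Third-party imports live here so the helpers above work without them.
    import docker
    from telegram.ext import ApplicationBuilder, CommandHandler

    client = docker.from_env()

    async def list_cmd(update, context):
        await update.message.reply_text(list_containers(client) or "no containers")

    async def logs_cmd(update, context):
        # context.args holds the words after the command, e.g. "/logs agent 50".
        # Argument validation is omitted to keep the sketch short.
        name = context.args[0]
        lines = int(context.args[1]) if len(context.args) > 1 else 20
        await update.message.reply_text(tail_logs(client, name, lines) or "no logs")

    async def restart_cmd(update, context):
        await update.message.reply_text(restart_container(client, context.args[0]))

    app = ApplicationBuilder().token(os.environ["BOT_TOKEN"]).build()
    app.add_handler(CommandHandler("list", list_cmd))
    app.add_handler(CommandHandler("logs", logs_cmd))
    app.add_handler(CommandHandler("restart", restart_cmd))
    app.run_polling()  # blocks and keeps polling Telegram for new commands
```

Calling `main()` starts the bot. Keeping the Docker calls in plain functions that take the client as a parameter makes them easy to try out without Telegram in the loop.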
I figured out that three commands with a few parameters are my bare minimum:
- List containers, to see container names and current statuses.
- Output a container's logs, to detect the tunnel agent's connection problem.
- Restart a container, the final command that gets the network state back to normal.
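Detecting the problem from the logs could also be automated. The sketch below scans the tail of the agent's log and decides whether a restart looks warranted. The marker strings and the threshold are made up for illustration and are not real Playit agent output, so they would need to be replaced with whatever the agent actually logs.

```python
# Hypothetical substrings; replace them with what the agent really logs.
FAILURE_MARKERS = ("failed to connect", "reconnect failed")
SUCCESS_MARKERS = ("tunnel established",)


def needs_restart(log_lines, threshold=3):
    """True if at least `threshold` failures were logged since the last success.

    Lines matching neither marker set are ignored, and a success line
    resets the count, so an old outage followed by recovery is not flagged.
    """
    failures = 0
    for line in log_lines:
        lowered = line.lower()
        if any(marker in lowered for marker in SUCCESS_MARKERS):
            failures = 0
        elif any(marker in lowered for marker in FAILURE_MARKERS):
            failures += 1
    return failures >= threshold
```

The input would be the lines returned by the Docker API's log call for the agent container. Requiring several failures in a row avoids restarting the agent over a single transient hiccup.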
It's also good to have a description of the commands and their parameters for later reference.
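One way to keep such a description close to the code is to hold the commands in a small table and generate the help message from it. The command names and parameters below are placeholders for the three commands described above, not a fixed interface.

```python
# (command, parameters, description); all names are illustrative.
COMMANDS = [
    ("list", "", "show container names and statuses"),
    ("logs", "<container> [lines]", "print the last lines of a container's log"),
    ("restart", "<container>", "restart a container"),
]


def help_text(commands=COMMANDS):
    """Render one '/command params - description' line per command."""
    lines = []
    for name, params, description in commands:
        usage = f"/{name} {params}".rstrip()  # no trailing space when params is empty
        lines.append(f"{usage} - {description}")
    return "\n".join(lines)


print(help_text())
```

The same text can be sent back by a help command, so the description is always at hand in the chat itself.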