Year: 2019

  • Networking nonsense

    I’ve recently been working on setting up Drone CI on the tilde.team machine. However, something strange has been going on with the networking there.

    Starting Drone with docker-compose didn’t seem to be working: netstat -tulpn showed the port bound properly to 127.0.0.1:8888, but I was completely unable to get anything out of it, whether with curl directly or through the nginx proxy that was to come.
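
    Concretely, the checks were along these lines (8888 being the port Drone publishes on loopback; the compose file itself isn’t shown here):

    # confirm the port is actually bound on loopback
    netstat -tulpn | grep 8888

    # then try to fetch it directly, bypassing nginx entirely
    curl -v http://127.0.0.1:8888/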

    I ended up scrapping docker on the ~team box itself and moving it into a LXD container (pronounced “lex-dee”) with nesting enabled.
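
    For the record, spinning up a container that can run docker inside it comes down to flipping security.nesting on; something like this (the container name and image here are arbitrary):

    lxc launch ubuntu:18.04 drone
    lxc config set drone security.nesting true
    lxc restart drone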

    This got us into another problem that had been seen before when using nginx to proxy to apps running in other containers: requests were dropped intermittently, sometimes hanging for upwards of 30 seconds.

    Getting frustrated with this error, I tried to reproduce it on another host. Both the docker-proxy and nginx->LXD proxies worked on the first try, yielding no clues as to where things were going wrong.

    In a half-awake stupor last Saturday evening, I decided to try to rule out IPv6 by disabling it system-wide. As is expected for sleepy work, it didn’t fix the problem and created more in the process.
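
    Roughly speaking, disabling it system-wide comes down to a pair of sysctls (and setting them back to 0 to undo the damage):

    # turn IPv6 off everywhere
    sysctl -w net.ipv6.conf.all.disable_ipv6=1
    sysctl -w net.ipv6.conf.default.disable_ipv6=1

    # and back on again afterwards
    sysctl -w net.ipv6.conf.all.disable_ipv6=0
    sysctl -w net.ipv6.conf.default.disable_ipv6=0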

    Feeling satisfied that the problem didn’t lie with IPv6, I re-enabled it, only to find that I was unable to bind nginx to my allocated /64. I may or may not have ranted a bit about this on IRC, but I was able to get it back up and running by restarting systemd-networkd.
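
    For reference, the recovery was just a restart of networkd, plus a quick check that the addresses had actually reappeared before poking nginx again:

    # bring the v6 addresses back
    systemctl restart systemd-networkd

    # confirm the /64 is assigned again
    ip -6 addr show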

    So one step forward broke something else, and now we’re back where we started: the original problem of intermittent hangups to the LXD container.

    Seeing my troubles on IRC, jchelpau offered to help dig into the problem with a fresh set of eyes. He noted right away that pings to the containers over IPv6 worked fine, but IPv4 did not.
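
    That check is about as simple as it gets; something along these lines (the addresses here are placeholders, not the real ones):

    # v6 to the container: replies came back fine
    ping6 -c 3 fd42:dead:beef::10

    # v4 to the same container: nothing
    ping -c 3 10.148.205.10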

    We ended up looking at the firewall configuration, only to find that one of the subnets I had blocked after November’s nmap incident included lxdbr0’s subnet (lxdbr0 being the bridge device used by LXD).
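
    The exception itself amounts to accepting traffic on the bridge ahead of the blanket subnet blocks. With plain iptables it would look something like this (the actual firewall on the box may be managed with a different frontend):

    # let the host and containers talk over the LXD bridge,
    # inserted above the broad subnet DROP rules
    iptables -I INPUT   -i lxdbr0 -j ACCEPT
    iptables -I FORWARD -i lxdbr0 -j ACCEPT
    iptables -I FORWARD -o lxdbr0 -j ACCEPT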

    Now that I’ve made an exception for lxdbr0, everything is working as expected!

    Thanks to fosslinux and jchelpau for their debugging help!

  • RAID nonsense

    Last week, I did some maintenance on the tilde.team box. I probably should have written about it sooner, but I didn’t make time for it until now.

    The gist of the problem was that the stock images provided by Hetzner default to RAID1 across the available disks. Our box has two 240GB SSDs, which resulted in only about 200GB of usable space for /. The installer also defaulted to a huge swap partition, which I deem unnecessary for a box with 64GB of RAM.

    The only feasible solution I found involved booting into the rescue system and using the installimage tool to reconfigure the disk partitions.
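
    The RAID settings live in the config file that installimage drops you into; roughly, the parts that matter look like this (partition sizes here are illustrative rather than the exact values used):

    # from the Hetzner rescue system
    installimage

    # the lines that matter in the generated config:
    #   SWRAID 1          # keep software raid enabled
    #   SWRAIDLEVEL 0     # ...but as RAID0 instead of the default RAID1
    #   PART swap  swap 4G
    #   PART /boot ext3 512M
    #   PART /     ext4 all

    Once the box reboots into the new layout, cat /proc/mdstat shows the md arrays and their level.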

    deepend recently upgraded to a beefier dedi (more threads and more disk space) and had a bit of contract time left on the old one. He offered to let me use it as a staging box in the meantime while I reinstalled and reconfigured the RAID.

    I’ve migrated tilde.team twice before (Linode > Woothosting > Hetzner, and now back to Hetzner on the same box) using a slick little rsync command that I’ve put together:

    # -a: archive mode, -u: skip files that are newer on the destination,
    # -H: preserve hard links, -x: don't cross filesystem boundaries,
    # -v: verbose, --numeric-ids: keep raw uid/gid numbers
    rsync -auHxv --numeric-ids \
        --exclude=/etc/fstab \
        --exclude=/etc/network/* \
        --exclude=/proc/* \
        --exclude=/tmp/* \
        --exclude=/sys/* \
        --exclude=/dev/* \
        --exclude=/mnt/* \
        --exclude=/boot/* \
        --exclude=/root/* \
        root@oldbox:/* /

    As long as the destination and source boxen are running the same distro/version, you should be good to go after rebooting the destination box!

    The only thing to watch out for is running databases. This time it bit me with mysql: three transactions were left pending during the rsync, and mysqld kept failing to start after I got the box back up, along with all the other services that depend on it.

    Eventually I was able to get mysqld back up and running in recovery mode (basically read-only) and got a mysqldump of all databases. I then purged all existing mysql data, reinstalled mariadb-server, and restored the mysqldump. Everything came up as expected and we were good to go!
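
    For future reference, the dance went roughly like this (the innodb_force_recovery level needed varies, so start at 1 and work upwards; paths and package names are the Debian/Ubuntu defaults):

    # let mysqld start despite the broken transactions by adding, under
    # [mysqld] in the server config:
    #   innodb_force_recovery = 1

    systemctl restart mariadb

    # dump everything while it's limping along
    mysqldump --all-databases > /root/all-databases.sql

    # purge the damaged data directory and reinstall
    systemctl stop mariadb
    apt purge -y mariadb-server
    rm -rf /var/lib/mysql
    apt install -y mariadb-server

    # restore and let the dependent services come back up
    mysql < /root/all-databases.sql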

    The RAID is now in a RAID0 config, leaving us with 468GB (not GiB) of available space. Thanks for tuning in to this episode of sysadmin adventures!