
tilde.team subdomains as bluesky handles

This is a quick tutorial on using your tilde.team subdomain as a handle on bluesky. Domain verification on bluesky can be done with a DNS challenge or by serving a text file from .well-known in your webroot.

Since adjusting the Content-Type for plaintext files with no extension isn’t possible without changing global nginx configs, the quickest way is to use a tiny php script.

<?php
// serve the DID as plain text for bluesky's handle verification
header("Content-Type: text/plain");
echo "did:plc:v7tbr6qxk6xanxzn6hjmbk7o";

Make sure the directory ~/public_html/.well-known/atproto-did exists, then save the script above in it as index.php, replacing the DID with your own.
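From the shell, the setup and a quick sanity check look roughly like this (the subdomain is a placeholder; use your own):

    mkdir -p ~/public_html/.well-known/atproto-did
    nano ~/public_html/.well-known/atproto-did/index.php     # paste the php script above
    curl https://example.tilde.team/.well-known/atproto-did  # should print your did:plc:...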

Then go to Settings > Change Handle on bsky.app, choose “I have my own domain”, and select the No DNS Panel tab. Enter your subdomain and hit Verify Text File.

You can use any of the domains that are hooked up to your ~/public_html. See the list on the tilde.team wiki.

Here’s the source for mine on tildegit.


Update Adventures

tl;dr I got bit by a Linux bridge MAC address change (bug?): https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Linux_Bridge_MAC-Address_Change. The network didn’t come back up after the reboot and I spent a long time figuring it out.


Here’s the longer version about the outage on August 24, 2021:

After finishing the package upgrades on my Proxmox hosts for the new release (Proxmox 7.0, corresponding to Debian 11/bullseye), I typed reboot and pressed enter, crossing my fingers that it would come back up as expected.

It didn’t.

Luckily I had done one last round of VM-level backups before starting the upgrade! I started restoring the backups to one of my other servers, but my authoritative DNS is hosted on the same server as tilde.team, so that needed to happen first.

I got ns1 set up on my Proxmox node at Hetzner, but my ns2 secondary zones had been hosted at OVH. Time to move those to he.net to get it going again (and move away from a provider-dependent solution).

While shuffling VMs around, I ended up starting a restore of the tilde.team VM on my infra-2 server at OVH. It’s a large VM with two 300gb disks so it would take a while.

I started working to update the DNS records for tilde.team to live on OVH instead of my soyoustart box, but shortly after, I received a mail (in my non-tilde inbox, luckily) from the OVH monitoring team saying that my server had been rebooted into rescue mode after being unpingable for so long.

I was able to log in with the temporary ssh password and update /etc/network/interfaces to use the currently working MAC address that the rescue system was using.
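The fix boils down to pinning the bridge’s MAC in /etc/network/interfaces; a sketch of what that looks like, with placeholder interface names, addresses, and MAC rather than my real values:

    auto vmbr0
    iface vmbr0 inet static
        address 203.0.113.10/24
        gateway 203.0.113.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0
        # pin the MAC the network expects instead of the one Proxmox 7 now derives
        hwaddress aa:bb:cc:dd:ee:ff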

Once I figured out how to disable the netboot rescue mode in the control panel, I hit reboot once more. We’re back up and running on the same server we started the day on!

ejabberd wasn’t happy with mysql for some reason but everything else seems to have come back up now.

Like usual, holler if you see anything amiss!

Cheers, ~ben


Mastodon PostgreSQL upgrade fun

Howdy friends!

If you’re a mastodon user on tilde.zone (the tildeverse mastodon instance), you might’ve noticed some downtime recently.

Here’s a quick recap of what went down during the upgrade process.

We run the current stable version of PostgreSQL from the postgres apt repos. PostgreSQL 13 was released recently and the apt upgrades automatically created a new cluster running 13.

The database for mastodon has gotten quite large (about 16gb), which complicates this upgrade a bit. This was my initial plan:

  1. drop the 13 cluster created by the apt package upgrades
  2. upgrade the 12-main cluster to 13
  3. drop the 12 cluster
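On Debian, those steps map onto the postgresql-common cluster tools, roughly like this (a sketch, not a transcript of what I ran):

    pg_dropcluster 13 main --stop    # drop the empty cluster the apt upgrade created
    pg_upgradecluster 12 main        # upgrade 12/main to the newest installed version (13)
    pg_dropcluster 12 main --stop    # drop the old cluster once 13 checks out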

These steps appeared to work fine, but closer inspection afterwards led me to discover that the new cluster had ended up with SQL_ASCII encoding somehow. This is not a situation we want to be in. Time to fix it.
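(For the record, a quick way to spot this: the encoding shows up in the cluster listing.)

    sudo -u postgres psql -l    # the Encoding column should say UTF8, not SQL_ASCII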

Here’s the new plan:

  1. stop mastodon:
     for i in streaming sidekiq web; do systemctl stop mastodon-$i; done
  2. dump the current database state:
     pg_dump mastodon_production > db.dump
  3. drop and recreate the cluster with utf8 encoding:
     pg_dropcluster 13 main --stop
     pg_createcluster --locale=en_US.UTF8 13 main --start
  4. restore the backup:
     sudo -u postgres psql -c "create user mastodon createdb;"
     sudo -u mastodon createdb -E utf8 mastodon_production
     sudo -u mastodon psql mastodon_production < db.dump

I’m still not 100% sure how the encoding ended up as SQL_ASCII, but it seems the locale was not set correctly when the apt upgrade created the new cluster…

If this happens to you, hopefully this helps you wade out while keeping all your data 🙂


Networking nonsense

I’ve recently been working on setting up Drone CI on the tilde.team machine. However, there’s been something strange going on with the networking on there.

Starting up drone with docker-compose didn’t seem to be working: netstat -tulpn showed the port bound properly on 127.0.0.1:8888, but I was completely unable to get anything from it with curl, let alone through the nginx proxy that was to come.

I ended up scrapping docker on the ~team box itself and moving it into an LXD container (pronounced “lex-dee”) with nesting enabled.
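Nesting is a per-container switch in LXD; assuming a container named drone (the name here is just for illustration), it amounts to:

    lxc config set drone security.nesting true
    lxc restart drone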

This got us into another problem I’d seen before when using nginx to proxy to apps running in other containers: requests were dropped intermittently, sometimes hanging for upwards of 30 seconds.

Getting frustrated with this error, I tried to reproduce it on another host. Both the docker-proxy and nginx->LXD proxies worked on the first try, yielding no clues as to where things were going wrong.

In a half-awake stupor last Saturday evening, I decided to try to rule out IPv6 by disabling it system-wide. As is expected for sleepy work, it didn’t fix the problem and created more in the process.

Feeling satisfied that the problem didn’t lie with IPv6, I re-enabled it, only to find that I was unable to bind nginx to my allocated /64. I may or may not have ranted a bit about this on IRC but I was able to get it back up and running by restarting systemd-networkd.

One step forward broke something else, and we were back where we started: the original problem of intermittent hangups to the LXD container.

Seeing my troubles on IRC, jchelpau offered to help dig into the problem with a fresh set of eyes. He noted right away that pings to the containers worked fine over ipv6, but not over ipv4.

We ended up looking at the firewall configuration, only to find that one of the subnets I’d blocked after November’s nmap incident included lxdbr0’s subnet (the bridge device used by LXD).

Now that I’ve made an exception for lxdbr0, everything is working as expected!
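The exception itself is nothing fancy; in plain iptables terms it’s something like this (a sketch of the idea, not my exact ruleset):

    # accept traffic on the LXD bridge before the private-subnet drop rules kick in
    iptables -I INPUT   -i lxdbr0 -j ACCEPT
    iptables -I OUTPUT  -o lxdbr0 -j ACCEPT
    iptables -I FORWARD -i lxdbr0 -j ACCEPT
    iptables -I FORWARD -o lxdbr0 -j ACCEPT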

Thanks to fosslinux and jchelpau for their debugging help!


RAID nonsense

Last week, I did some maintenance on the tilde.team box. Probably should have written about it sooner but I didn’t make time for it until now.

The gist of the problem was that the stock images provided by Hetzner default to RAID1 across the available disks. Our box has two 240gb SSDs, which resulted in about 200gb of usable space for /. It also defaulted to a huge swap partition, which I deem unnecessary for a box with 64gb of RAM.

The only feasible solution I found involved using the rescue system and the installimage tool to reconfigure the disk partitions.

deepend recently upgraded to a beefier dedi (more threads and more disk space) and had a bit of contract time on the old one. He offered to let me use it as a staging box for the meantime while I reinstalled and reconfigured my raid settings.

I’ve migrated tilde.team twice before (from Linode > Woothosting > Hetzner, and now back to Hetzner on the same box) using a slick little rsync invocation that I’ve put together:

rsync -auHxv --numeric-ids \
    --exclude=/etc/fstab \
    --exclude=/etc/network/* \
    --exclude=/proc/* \
    --exclude=/tmp/* \
    --exclude=/sys/* \
    --exclude=/dev/* \
    --exclude=/mnt/* \
    --exclude=/boot/* \
    --exclude=/root/* \
    root@oldbox:/* /

As long as the destination and source boxen are running the same distro/version, you should be good to go after rebooting the destination box!

The only thing to watch out for is running databases. It bit me this time with mysql: three transactions were left open during the rsync, and mysqld kept failing to start after I got the box back up, along with all the other services that depend on it.

Eventually I was able to get mysqld back up and running in recovery mode (basically read-only) and got a mysqldump of all databases. I then purged all existing mysql data, reinstalled mariadb-server, and restored the mysqldump. Everything came up as expected and we were good to go!
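For anyone in the same spot, the recovery dance went roughly like this. The innodb_force_recovery level is from memory, so treat the whole thing as a sketch rather than an exact transcript:

    # force InnoDB into (mostly read-only) recovery mode
    printf '[mysqld]\ninnodb_force_recovery = 4\n' > /etc/mysql/conf.d/recovery.cnf
    systemctl start mysql
    mysqldump --all-databases > all-databases.sql
    systemctl stop mysql
    rm /etc/mysql/conf.d/recovery.cnf
    apt purge mariadb-server && rm -rf /var/lib/mysql   # wipe the broken data dir
    apt install mariadb-server
    mysql < all-databases.sql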

The array is now in a RAID0 config, leaving us with 468gb (not GiB) of available space. Thanks for tuning in to this episode of sysadmin adventures!


WeeChat Setup

So you decided to try WeeChat?

What options do you need to set? What plugins? What scripts?

I’ll go over some of the most essential of these, and share my full configs.

Options

  • logger.level.irc
    The default is 9, which includes joins and parts. In most cases you can set this to 3, which only includes messages.
  • weechat.look.buffer_notify_default
    The default here is all, which will add joins and parts to your hotlist. Set it to message.
  • weechat.look.confirm_quit
    Set this to on. You’ll thank me when you type /quit and mean /close.
  • weechat.look.highlight
    Add a comma-separated list of names/terms that should trigger a highlight.
  • weechat.look.prefix_align_max
    Set this to something between 10 and 20. Otherwise, long nicks will crush your available screen real estate.
  • buflist.format.indent
    Adjusts the display of your channel list. I use ${color:237}${if:${buffer.next_buffer.local_variables.type}=~^(channel|private)$?├:└} to print some nice unicode characters to draw nice guides.
  • buflist.format.number
    If you want to skip the . or space after the number, set it to ${color:green}${number}
  • irc.look.color_nicks_in_names
    Set this to on.
  • irc.look.color_nicks_in_nicklist
    Set this to on.
  • irc.look.server_buffer
    Set this to independent to prevent automatic merges with the core WeeChat buffer. Especially useful if you plan on using autosort.py.
  • irc.server_default.autoconnect
    Set this to on so you don’t have to set it for every new network you add.
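If you’d rather paste than type each of these out, the settings above can be applied straight from the WeeChat input line (use whatever values you prefer):

    /set logger.level.irc 3
    /set weechat.look.buffer_notify_default message
    /set weechat.look.confirm_quit on
    /set weechat.look.prefix_align_max 15
    /set irc.look.color_nicks_in_names on
    /set irc.look.color_nicks_in_nicklist on
    /set irc.look.server_buffer independent
    /set irc.server_default.autoconnect on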

Scripts

These scripts can be managed with the built-in /script tool. Press i, then Enter, on the selected script to install it.

  • highmon.pl
    Set aside a buffer to list the places your nick has been mentioned.
  • colorize_nicks.py
    Show nicks in chat with colors.
  • go.py
    Fuzzy quick jump by buffer number or channel name.
  • autojoin.py
    Use /autojoin --run to save all the channels you’re currently in to be autojoined the next time you start WeeChat.
  • autosort.py
    Use this script in tandem with irc.look.server_buffer=independent to keep your channel and server list in order.
  • colorize_lines.pl
    I use this script to highlight the entire line of messages I’ve been mentioned in. Check the options in the source or with /help colorize_lines.
  • grep.py
    Quickly search history and buffers with /grep.

The rest of my configs

You can find the rest of my configs here. Be sure to keep the WeeChat Documentation handy as well.

If you have any questions, feel free to ping me on IRC. I’m ben on libera and tilde.chat.

Screenshot

Here’s a screenshot of my current configs.

Bonus

If you have an existing setup, you can check the config changes you’ve made with /set diff.

Additionally, feel free to use my .gitignore, add your ~/.weechat to source control, and compare.

Hope you’ve enjoyed customizing your WeeChat!

EDIT: s/freenode/libera/g


Proactive (reactive?) redundancy

After the fiasco earlier this week, I’ve been taking steps to minimize the impact if tilde.team were to go down. It’s still a large SPOF (single-point-of-failure), but I’m reasonably certain that at least the IRC net will remain up and functional in the event of another outage.

The first thing that I set up was a handful of additional IRCd nodes: see the wiki for a full list. slash.tilde.chat is on my personal vps, and bsd.tilde.chat is hosted on the bsd vps that I set up for tilde.team.

I added the IPv4 addresses for these machines, along with the IP for yourtilde.com, as A records for tilde.chat, creating a DNS round-robin. host tilde.chat will return all four, and any given lookup will hand back one of them in a semi-random rotation. This means that when connecting to tilde.chat on 6697 for IRC, you might end up on any of {your,team,bsd,slash}.tilde.chat.
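You can watch the rotation for yourself:

    host tilde.chat        # lists all four A records
    dig +short tilde.chat  # same records; the order varies between queries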

This creates the additional problem that visiting the tilde.chat site will end up at any of those four machines in much the same way. For the moment, the site is deployed on all of the boxes, making site setup issues hard to debug. The solution to this problem is to use a subdomain as the round-robin host, as other networks like Libera.Chat do (see host irc.libera.chat for the list of servers).

I’m not sure how to make any of the other services more resilient. It’s something that I have been and will continue to research moving forward.

The other main step that I took to prevent the same issue from happening again was to configure the firewall to drop outgoing requests to the private subnets defined in RFC 1918.
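In iptables terms that’s something along these lines (a sketch of the idea, not my exact ruleset):

    iptables -A OUTPUT -d 10.0.0.0/8     -j DROP
    iptables -A OUTPUT -d 172.16.0.0/12  -j DROP
    iptables -A OUTPUT -d 192.168.0.0/16 -j DROP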

I’d like to consider at least this risk to be mitigated.

Thanks for reading,

~ben

Update: the round robin host is now irc.tilde.chat, which resolves the site issues that we were having, due to the duplicated deployments.


November 13 post mortem

We had something of an outage on November 13, 2018 on tilde.team.

I awoke, not suspecting anything to be amiss. As soon as I logged in to check my email and IRC mentions, it became clear.

tilde.team was at best inaccessible and at worst down completely. According to the message in my inbox, there had been an attempted “attack” from my IP.

We have indications that there was an attack from your server. Please take all necessary measures to avoid this in the future and to solve the issue.

At this point, I had no idea what could have happened overnight while I was sleeping. The timestamp showed that the mail had arrived only 30 minutes after I’d turned in for the night.

When I finally logged on in the morning to check mails and IRC mentions, I found that I was unable to connect to tilde.team… strange, but ok; time to troubleshoot. I refreshed the webmail to see what I was missing, and it failed to find the server too. Even stranger! I’d better get the mails off my phone if they’re on my @tilde.team mail!

Here, I launched into full debugging mode: what command was it? Who ran it?

Searching each user’s ~/.bash_history was not very successful: nothing I could find was related to net or map. I had tried sudo grep nmap /home/*/.bash_history and many other commands.

At this point, I had connected with other ~teammates across other IRC nets (#!, ~town, etc). Among suggestions to check /var/log/syslog, /var/log/kern.log, and dmesg, I finally decided to check ps. ps -ef | grep nmap yielded nmap running under an obscured uid and gid, which I shortly established to belong to an LXD container I had provisioned for ~fosslinux.
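(Uids like that look obscured because unprivileged LXD containers run with shifted uid ranges; a quick way to confirm a process belongs to a container is to compare its uid against the ranges LXD hands out:)

    ps -ef | grep nmap                 # note the numeric uid, e.g. 1000000 or higher
    grep lxd /etc/subuid /etc/subgid   # the shifted uid/gid ranges used for containers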

I’m now considering methods of policing access to any site over port 80 and port 443. This is crazy. How do you police nmap when it isn’t scanning on every port?

After a bit of shit-talking and reassurance from other sysadmins, I reexamined things and realized that ~fosslinux had only run nmap against addresses in the 10.0.0.0/8 space. The 10/8 range is private address space that isn’t supposed to be routable outside the local network. How could Hetzner have found out about a probe of private address space!?

Finally, after speaking with more people than I expected to speak with in one day, I ended up sending three different support emails to Hetzner support, which finally resulted in them unlocking the IP.

It’s definitely time to research redundancy options!


DNS shenanigans post-mortem

Let’s start by saying I probably should have done a bit more research before diving head-first into this endeavor.

I’ve been thinking about transferring my domains off google domains for some time now, as part of my personal goal to self-host and limit my dependence on google and other large third-party monstrosities. Along that line, I asked for registrar recommendations, and ~tomasino responded with namesilo. I found that they had $3.99 registrations for .team and .zone domains, a tenth of the cost of the $40 registration on google domains.

I started out by getting the list of domains from the google console. 2 or 3 of them had been registered within the last 60 days, so I wasn’t able to transfer those just yet. I grabbed all the domain unlock codes and dropped them into namesilo. I failed to realize that the DNS panel on google domains would disappear as soon as each transfer went through, and more importantly that the nameservers would be left pointing at the old, now-defunct google domains ones.

As soon as I realized this mistake, I updated the nameservers from the namesilo panel. Some of the domains propagated quickly; others, not so much. tilde.team was still in a state of flux between the old and new nameservers.

In a rush to get the DNS problem fixed, and under recommendation from several people on IRC, I decided to switch the nameservers for tilde.team and tilde.zone to cloudflare, leaving another layer of flux for the DNS to be stuck in…

Of the five domains that I moved to cloudflare, three came back with a DNSSEC error, claiming that I needed to remove the DS record from the zone. D’oh!

I removed DNSSEC from the affected domains, so we should be good to go as soon as it all propagates through the fickle beast that is DNS.
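If you hit the same thing, you can check whether a stale DS record is still published for a zone:

    dig DS tilde.team +short    # any output means the parent zone still has a DS record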


LXD networking and additional IPs

Now that tilde.team is on a fancy-shmancy new dedi server, I’ve been trying to get a secondary IP address assigned to an LXD container (which I plan to use for my personal stuff). LXD shows the secondary IP being picked up by that container, but I’m still seeing the host machine’s IP as the external address.

I’m not sure how I’ll need to configure the network settings on the host machine, now that we’re running ubuntu 18.04, which uses netplan for configs instead of /etc/network/interfaces. Another confusing thing is that the main config in /etc/netplan says that the network config is handled by systemd-networkd.
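For context, that file looks roughly like the sketch below; the interface name and addresses are placeholders, not my actual values:

    # /etc/netplan/01-netcfg.yaml
    network:
      version: 2
      renderer: networkd        # the "handled by systemd-networkd" part
      ethernets:
        enp0s31f6:
          addresses:
            - 203.0.113.10/26
          gateway4: 203.0.113.1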

At least I have through the end of the year, when my current VPS runs out, to get this up and running.

Ping me on IRC or email if you have experience with this.