
Network Revamp 2023

There is a distinct chance this rack will change from the time I took this picture until the publishing of this blog.

This is a writeup of some adjustments, equipment shuffling, and general changes to the network. This revamp was needed. Revamping happens a lot, but usually in small chunks. However, I recently made some fairly major changes all at once, so I thought I’d cover them.

A Bit of Background

First off, let me state this was mainly driven by the fact that I used to sit in the tiny server room with the equipment and do my thing - for work and non-work activities alike. However, I no longer had that option: more and more equipment had been acquired and I had simply run out of room. It was rather inconvenient to set up detailed experiments, so in essence I was using my lab “remotely” much more than I wanted. As things expanded, more power was being used as well. So I wanted to solve two problems - the lack of space and the overall power consumption.

Yes, one can call this activity a “home lab,” but since there are external users from different parts of the planet accessing it, well, this is different. Besides, I’ve been doing this since the late 90s, and I can state this is not a typical setup (static IPs, publicly accessible, used as a service to others for email, etc.), so it doesn’t fall under the home lab label. Yes, on the internal network I do some decidedly home-lab-type things, but the public stuff is definitely the priority.

Instead of running multiple services in VMs or containers, I typically try to run bare metal. There are a few reasons for this, but the three main factors are performance, hardware compatibility, and security. We’ll discuss each of these, but I should preface it with an important bit of detail: most of these bare metal servers do run more than one service, and a few run Docker containers. So the approach is “bare metalish” I guess, but the overall philosophy is isolation of major services to specific physical devices.

Performance

With several services on one system, if one particular service is having a problem that is eating up CPU, I don’t want it impacting other processes. I’ve experienced this in the past, especially in this environment where I think nothing of trying out the latest hack or attack on one of the services. This also carries over when someone decides to try and attack one of the public systems - services have been impacted, but isolation between systems does help contain the impact.

Hardware compatibility

Some of the services have odd hardware associated with them, such as weird USB devices, Wi-Fi antennas used for sniffing, or some other specialized hardware. I’ve experienced incompatibilities where one service simply will not run because it needs a particular version of a library or a language (such as Python) and the driver for some added hardware needs something completely different. Yes using containers sometimes helps, and even full VMs can solve some of this, but it had happened enough that it was easier to try and physically separate things, and that is still my default.

The move from the old systems to the new marked a move from AMD to Intel for the processor. Truthfully, I’ve had slightly fewer problems with Intel-based systems than with AMD-based systems when it comes to hardware compatibility. Most of the time it doesn’t matter at all, but it has cropped up on occasion. This wasn’t enough of an issue to really matter in this migration, but as far as I’m concerned it was a step up.

I should note, since some will want to know, that the specs for these new servers - System76 Meerkats - can be found here. My standard with this last round of purchases was 64GB RAM and a 2TB drive, which is overkill for the current services at this point. I expect to have these Meerkats for quite a while and I have no idea what future demands on the hardware might be, so overkill it is.

Security

This is a really big one. The various NMRC public-facing servers are under constant attack and have been for more than two decades. Each server has one major function, and in some cases a minor function that is typically either not public facing or is configured extremely conservatively. Each major service - DNS, email, web-based services - is therefore on its own server. For example, the main NMRC mail server does not take up a lot of resources, however it is a system with services exposed to the wild Internet. An attacker who compromises a service inside of a container is “contained” to a degree, but if they were able to use additional flaws they could potentially escape the container and compromise other important services. This isn’t a theoretical flaw, and while one could apply various levels of security to mitigate a lot of it, there might be a container that needs some level of access to function, which requires a relaxing of certain settings. Therefore I personally prefer to not run multiple public services on the same system without better separation, to prevent some level of escalation in the case of one service being compromised.

This isn’t a “clean” separation, but the main services center around NMRC. There is DNS, web, mail, and coding - so there are four separate systems. The coding server - blackhole.nmrc.org - runs a GitLab instance and that’s it. It was the first one set up on a System76 Meerkat, and I’ve been quite happy with the performance. However, the three remaining servers ran on larger systems.

What has changed

Those three remaining servers live in large and medium tower cases, which take up a lot of room. Additionally, those servers were purchased kind of on the cheap - chosen because they were somewhat inexpensive (and available) - which meant I really wasn’t paying attention to the size of the case or the specs on the power supply. I simply got something that worked. And they have worked just fine.

Now that I needed that space back, I sat down and looked at what I was using, noting what worked and what didn’t in relation to my needs. I had grown rather fond of the System76 Meerkat since the first one I purchased in 2020 for Blackhole. I had acquired two more since then and mounted them in a 1.5U rack, and they took up very little space at all. So I decided to replace Daemon, Talon, and Rigor-mortis - all tower systems - with System76 Meerkats. I had considered changing the server software itself, but keeping in mind that I wanted new features with a distribution that is well supported and gets security patches quickly, I opted to stay with Ubuntu. In fact I not only stuck with 22.04 LTS, but I am using Ubuntu Pro (which is currently free for up to 5 machines), mainly for the security patches.
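
For anyone curious, attaching a machine to Ubuntu Pro is a quick exercise with the pro client. A minimal sketch - the token comes from your Ubuntu One account, and which services you enable is up to you, so treat the service name here as an example:

  # Attach this machine to the Ubuntu Pro subscription (token is per-account)
  sudo pro attach YOUR_TOKEN_HERE

  # Confirm which services (esm-infra, esm-apps, livepatch, etc.) are enabled
  pro status

  # Example: enable expanded security maintenance for universe packages
  sudo pro enable esm-apps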

I’ve done hardware migrations over the years, and while it can be stressful, doing it in a controlled fashion is not that bad. The needed files were backed up, as I prefer to do a fresh load onto the new hardware from scratch, install all of the required apps, and then restore only the data and configuration files.
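
As a rough illustration of what “data and configuration files only” means in practice, here is the sort of thing that gets run before a cutover. The hostname and paths below are placeholders, not an exact list of what each server needs:

  # Stage copies of config and data from the old box into a backup area.
  # Each server gets its own list (service configs, mail spools, home dirs, etc.).
  rsync -avR root@old-server:/etc/dovecot/ /backup/old-server/
  rsync -avR root@old-server:/var/mail/    /backup/old-server/
  rsync -avR root@old-server:/home/        /backup/old-server/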

Order of operations

I took the approach that seemed to make the most sense. It was as follows.

  • Daemon is the recursive DNS server running Pi-hole, functioning as the DNS server for both the other public servers and the internal systems. Setting up Pi-hole in recursive mode was one of my more recent projects, but I had dealt with Pi-hole before, so I decided to tackle that system first. Besides, Daemon functions as my logging server for the other public servers, and having the logging available during the migration of Talon and Rigor-mortis made the most sense.

  • I followed with Talon, which has been migrated the most over its 26-year existence, so I kind of knew what to expect. As usual, I forgot some of the Dovecot settings needed for outbound email to reach its destination without errors, but that was corrected quickly. I even had it listed in my notes as one of the steps, and simply skipped it. This happened the last time I migrated Talon. Oh well…

  • Rigor-mortis had basic static web services but also functioned as the Mastodon server, and it would require the weirdest steps for moving over the data - mainly the database, as well as plenty of media files to back up and restore. As Mastodon was the newest (to me) technology to admin and I’d never migrated it, I was going to do that one last. That experience was weird enough that I covered it in a separate blog post.

  • I would load the base operating system onto a new Meerkat, and based upon the main and secondary services each server required, I would install additional packages as needed. Some services, such as SSH and Duo Security’s Duo Unix (most of the steps in that older blog post haven’t changed; follow the link to the Duo instructions for clarity if needed), were loaded up and tested before any data was moved over. Other things, such as the main NMRC mail server’s data, were copied over early to ensure everything was running and configured properly, and at the moment of actual migration the mail spools were backed up one final time and migrated over (see the sketch after this list).

  • One server was completed at a time on a weekend; the old and new systems simply swapped IP addresses (and cabling) and were given a few days for any issues to crop up.
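
That final mail spool pass was nothing fancy. Roughly something like the following, assuming a traditional spool in /var/mail plus mail and user data under /home - both assumptions, and the hostname is made up:

  # Stop mail services on the old server so nothing lands mid-copy
  sudo systemctl stop dovecot

  # Final sync of spools and home directories to the new box;
  # --delete keeps the copies identical to the source
  rsync -avz --delete /var/mail/ root@new-talon:/var/mail/
  rsync -avz --delete /home/     root@new-talon:/home/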

Once complete, the old servers were shut down and the new servers were more permanently set up physically in the rack.

Odd Tips

One tip: when you install Ubuntu Server and ask for the entire drive to be used, it goes ahead and sets up the volume. However, if the drive is larger than 200GB, it will only configure the volume to 200GB unless you adjust it on the second installation page where you confirm the drive settings. One can easily miss this and be left with the bulk of a 2TB drive unused. If you do miss it, there’s a fairly easy way to correct it - and yes, I did miss this on Daemon - so here are the post-migration steps one can run while the system is up that will correct the problem:

  • Confirm the volume info: df -hT /dev/mapper/ubuntu--vg-ubuntu--lv

  • Resize the volume: sudo lvresize -vl +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv

  • Complete the resizing: sudo resize2fs -p /dev/mapper/ubuntu--vg-ubuntu--lv

  • Check your work: df -hT /dev/mapper/ubuntu--vg-ubuntu--lv

Another tip is to verify all settings if you’re doing a fresh load of the operating system. If you started with an initial load of Ubuntu Server 18.04 and simply upgraded your way to 22.04, starting over from a fresh load might result in different default settings. Check everything. To illustrate this, one of my minor hiccups involved time settings, which were coming up as UTC on the newer systems even though I thought I’d adjusted the proper config files to reflect my local timezone. It seems fresh installs come up as UTC, and the relevant configuration files are stored in different locations than before; migrating from 20.04 to 22.04 had carried the old config files over as part of that migration, which masked the change. Nothing major, but it did result in some oddly-dated logging entries.
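
On a current Ubuntu install the timezone fix is a one-liner via systemd; the timezone value below is just an example:

  # Show current time, timezone, and whether NTP sync is active
  timedatectl

  # Set the local timezone (use `timedatectl list-timezones` to find yours)
  sudo timedatectl set-timezone America/Chicago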

A nice trick was to have both the old server and the new server up and running. I simply swapped the IP addresses between the old and new servers, ran netplan apply && reboot, and followed with a swapping of cables. I could then access the new server and see what was working and what wasn’t, and I still had the old server available to look at exactly what was needed to fix any problems. Before wiping the old servers I gave each of the new ones a few days, in case there was something obscure that I had overlooked.
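
For anyone who hasn’t touched netplan, the swap amounts to editing the static address in the YAML under /etc/netplan/ on each box and re-applying it. A minimal sketch - the filename, interface name, and addresses are all made-up example values:

  # /etc/netplan/00-installer-config.yaml
  network:
    version: 2
    ethernets:
      enp1s0:
        addresses: [203.0.113.25/24]
        routes:
          - to: default
            via: 203.0.113.1
        nameservers:
          addresses: [203.0.113.53]   # Daemon, the recursive Pi-hole resolver

  # Then apply it (I followed with a reboot as noted above)
  sudo netplan apply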

Before the migration I was not using Ansible to manage most of the public servers. I had set up Ansible internally on various systems and had just started exploring what to include on the public systems. Talon functions as both a mail server and a shell server for NMRC members, so there are a fair number of user accounts defined on that system. As data and services were set up on the new server, the numbering of the user accounts was substantially different with the addition of the Ansible user, so data had to be painstakingly hand-modified. This wasn’t too much of a problem with regular user accounts, but some of the accounts associated with mail services created odd errors that took a bit to track down.
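
If you hit the same UID/GID mismatch, the cleanup is mostly a matter of re-owning files once the accounts exist on the new box. A minimal sketch with made-up names and IDs:

  # Re-own a user's home directory by name so files pick up the new UID/GID
  sudo chown -R alice:alice /home/alice

  # Or hunt down anything still owned by an orphaned numeric UID from the old box
  sudo find /home /var/mail -uid 1007 -exec chown alice:alice {} +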

Every migration had at least one issue that triggered an “oh shit” reaction, but truthfully nothing really major happened with Daemon and Talon. Rigor-mortis had issues, but as I mentioned before, I covered those in that other blog post.

Other Minor Adjustments

I did address some cable management that needed attention, as one might imagine, but I also took the opportunity to retire the UPS battery backups. This might raise a few eyebrows, but let me state that with the advent of the battery system attached to the solar array and the feed from the grid running through the Sol-Ark 15k, the UPS batteries were simply not needed. I’ve talked about EMPs in the past, and I should note that the Sol-Ark 15k is capable of withstanding an EMP, so I have very little concern about removing the UPS systems. Note that the reaction time of a UPS is measured in milliseconds while the Sol-Ark’s is measured in nanoseconds, and since a UPS can’t react fast enough to help with an EMP anyway, I’m fully protected with the Sol-Ark. Besides, I’ve had a few outages that lasted a few hours, and the Sol-Ark with the batteries has handled them fine. It completely negates the need for the UPS systems, so they are officially retired.

Migration complete

The new servers use roughly 75% less power, and since I already had room for them in the rack, it is also a massive savings as far as real estate goes in the server room. What did I do with the old towers? The drives were securely wiped to government standards, as is my habit with all old drives, and the systems were loaded up with a fresh Linux operating system (fairly locked down, BTW) and provided to family and friends that needed a computer system. So they received new homes.
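
For the curious, a multi-pass overwrite along these lines is the usual approach - shred is just one option, and the device name below is a placeholder, so double check you have the right drive before running anything like this:

  # Confirm which device is the old drive before touching it
  lsblk

  # Three random passes plus a final pass of zeros over the whole drive
  sudo shred -v -n 3 -z /dev/sdX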

I’m very pleased with the new Meerkats as well as all of the other improvements, but if something weird happens or changes, well, I will let you know!
