My server has failed me.

Well, this was a fun week on the server front.

On Wednesday morning, after I had my breakfast, I logged into my main PC to check email and morning news (as I usually do before going to Cypress or Creekmont). Unfortunately, I found a major problem on my main server involving the hard drive holding the operating system. The system had registered enough errors that it put the filesystem in read-only mode. That’s usually a very good sign that the drive needs to be replaced. I sighed, and rebooted the machine to clear out the problem. I resolved to see what spare hard drives I had at the house when I got home from work, and do the replacement on Saturday.
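For the curious, here is a rough sketch of how a read-only remount like that could be spotted automatically. It is not what I actually run; it assumes a Linux box where `/proc/mounts` lists each mount's options in the fourth field, and the function name `read_only_mounts` and its `watch` parameter are made up for illustration.

```python
import os


def read_only_mounts(mounts_text, watch=("/",)):
    """Return the watched mount points whose option list contains 'ro'.

    mounts_text is expected to look like /proc/mounts:
        device mount_point fs_type options dump pass
    `watch` is a hypothetical parameter naming the mount points to check.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        # Match the exact option 'ro', not things like 'errors=remount-ro'.
        if mount_point in watch and "ro" in options:
            found.append(mount_point)
    return found


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mount_point in read_only_mounts(f.read()):
            print("read-only:", mount_point)
```

Run from cron with the output mailed somewhere off the box, something like this might give a little warning. Of course, if the drive holding the OS is already read-only, the machine may not be in much shape to send anything.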

Alas, that was not meant to be. Later that morning, while at the office, I got a notification that the main drive had again gone into read-only mode. I sighed, logged into the machine, and shut it down. At that point I was resigned to reloading the server that night. I went ahead and downloaded the most recent version of Slackware, and otherwise did work as normal.

I’m sure a couple of my Linux-using friends are raising their eyebrows over the choice of Slackware, considering that at work (both main job and MK Online) I use CentOS, which is a free reimplementation of Red Hat Enterprise Linux. The truth of the matter, though, is that when I started with Linux, I started on Slackware. Considering Slackware is the oldest of the active Linux distributions, you really can’t get much more old-school than that. :-) I had briefly considered using CentOS for the new server build, but decided that I really had no reason to. Slackware was what I started with. I’ll keep using Slackware for my personal servers until such time as it is no longer actively developed… if that ever happens.

When I got home, my first order of business was to switch the server back on and copy off the AMANDA databases and indexes, copy off the entire configuration directory, and make a full dump of the MySQL database. From there I shut down the server, disconnected it, put it on a table, and got to work opening the case and removing the OS hard drive. I’m actually not too surprised the drive finally failed, to be honest. It was a 9 GB SCSI hard drive that I got with the original server hardware back in 2002, and even back then the drive was probably old. Fortunately I found a 13 GB Maxtor hard drive in my pile of spares, and ended up using that. I burned the Slackware 12.2 CD images to discs, reconnected the server (with the case still open in case there was a problem with the “new” drive), and tested. Once I was sure it had come up fine, I put the case back together, hooked everything up, and installed Slackware on the machine.

The rest of the night was spent getting the mail system operational. While it wasn’t that important that my email, my mom’s email, or even my friends’ email be up ASAP (sorry, guys!), my dad’s business uses my server for email hosting, so it was kind of important I get it back up for his sake. I finally had email back up and running by midnight… and it was NOT an easy process. One thing I always forget about these reinstalls is just how many services need to be brought back into operation. In this case, it was the SMTP service, POP/IMAP service, spam filter, virus filter, and greylisting service. That doesn’t even include the webmail, which requires the web server to be operational. By the time I had all that operational, I was ready for bed.

The next day, I worked on getting everything else operational (in between doing work tickets). The biggest pain in the ass ended up being the web server; I encountered more than a couple of problems getting it up and running alongside mod_security. On the other hand, fixing those problems showed me why an upgraded Apache wouldn’t work on my old install. The thing that took the longest to fix was a problem in the webmail, and that ended up being an extremely easy fix once someone in Freenode’s #slackware channel pointed me the right way. By the end of the day Thursday, things were more or less completely operational.

So, in effect, if you read this site and noticed it was down, or I host your email and you lost access temporarily, then I apologize. I hadn’t expected to need to replace the drive that quickly. Fortunately things appear to be running smoothly, with the possible exception of a couple of headaches that I’ve so far been able to work around. Hopefully I won’t have to do this all over again soon, but in case I do, I hope to have early warning. We’ll see what happens.