Well, Mortal Kombat Online is facing extended downtime. Apparently we’re suffering from a hardware failure catastrophic enough, so to speak, that it’s going to take a few days and a LOT of work to get everything back up and running the way it was. I’ve left the standard disclaimer on the site itself, but I figured I would use this post on my own site to give an in-depth explanation of what happened and allow others to give feedback if necessary.
In any event, here’s what’s happening.
Back at Christmas of 2005, the hard drive on the main MK Online server crashed. We were forced to have our hosting provider replace the drive; when they did, we demanded that they put in mirrored drives like we had when we first got our dedicated server with them. (At the time, we were actually on a new dedicated server.) They did so, but instead of using two identical drives they used a single SCSI (heavy-duty) drive and a much larger IDE (consumer-level) drive. We had meant to get with our hosting provider to fix that, as such a configuration is far from ideal. However, we never got around to it as bigger issues kept coming up.
Well, recently we’d been noticing system crashes caused by memory errors. We soon realized that the culprit was virtual memory (which is on the IDE drive), and it appears that the IDE drive is close to failure. The problem has gotten much worse over the past week, to the point where the filesystems are being corrupted. The last straw came an hour ago, when the forum database table that held the actual posts was lost. Obviously, we can’t continue like this.
So what are we doing to correct this? I’m working on restoring a backup copy of the database from this morning. With luck, it will be in a decent state, but there is a very real possibility of data loss. In the meantime, our hosting provider is building a newer, faster server for us. Once it’s ready, we’ll rush to copy the data from the old server onto the new one and try to get back online as quickly as possible. Still, we expect to be offline until the weekend at least.
We apologize for the delays and inconvenience this has caused. Trust me, I’m as frustrated as anyone else about this, if not more so. (After all, I’m the one trying to pick up the pieces here….) Hopefully once we have everything corrected it’ll be smooth sailing. :-)
EDIT – 7:00 PM CDT – It looks like the latest backup of the database is intact. Everything up to this morning at around 9 AM EDT appears to have been saved. What a relief… heh. Now we just need to wait for the new server to come online, and we’ll work from there. I’ll provide more information as it becomes available.
EDIT – 7:50 PM CDT – By the way, the Java chat is still available for those of you who come onto #mortalkombat (aka, the #MortalKombat Online IRC Network). You can reach it by clicking here.
EDIT – 7:20 AM CDT 10 May – Thanks for the input, everyone. I just wanted to point out a few things based on comments I am seeing…
- For those asking, when the site comes back up it will come back up with no changes in layout, software, etc. CCShadow hasn’t had time to work on version 8 of CDS (the backend of the site), and expecting us to change software packages at the drop of a hat is unrealistic at best.
- Jonin01 is right in that this will take time. I’ve done everything I can do on this end; now it’s just up to our hosting provider to get a machine ready for us to migrate to. Once I hear the new machine is up and running, I can copy all of our old data to it and bring it online. Until then, it’s simply a matter of waiting.
- Finally, for those offering assistance, it is much appreciated. Truth be told, we have everything under control; I suppose it helps a great deal that I do this stuff (system administration and support) as my paying job as well as for MK Online. It’s really not as catastrophic as it sounds; even if we were to lose the main server filesystem today, I have a site backup elsewhere that can be used to get MK Online back up and running. We’re in good shape, all things considered; it’s more of an annoyance than anything else right now.
As before, once I get more information I’ll be sure to post it here. Thanks for your support and patience while we work to get these problems taken care of. :-)
EDIT – 2:20 PM CDT 11 May – The new server is ready for us to start working on it. However, being out of town and without a direct path to the internet, I can’t access the server properly yet. I should be home by 10 PM CDT, at which point I’ll get started. If all goes well, we should be back up and running tomorrow evening. :-)
EDIT – 10:50 PM CDT 11 May – There’s a problem with the operating system’s configuration on the new server that’s preventing us from using it. Rather than going through hours of trying to fix it manually, I’ve had CCShadow contact our hosting provider with precise instructions for an OS reinstall. Hopefully sometime tomorrow it’ll be ready for the actual site reimplementation.
EDIT – 9:40 PM CDT 12 May – No, our hosting provider hasn’t rebuilt the server yet. I’m going to get with CCShadow as soon as he gets online about contacting them directly and getting them to hurry it up. I know everyone’s getting annoyed; I’m probably more annoyed than anyone at this point. Thanks for your patience.
EDIT – 3:35 PM CDT 13 May – Our hosting provider went ahead and made the needed changes without requiring a reinstallation of the operating system. I’m in the process of configuring the server and restoring the data. This will take a while, so please be patient.
EDIT – 7:50 PM CDT 13 May – All of the data is restored. We’re working now to get the new server onto the old server’s IP address and do some final testing. Once that’s done we’ll reopen the site.
EDIT – 8:50 PM CDT 13 May – It’s going to be a while, as our hosting provider’s automated phone system is backed up and I can’t get a call through. CCShadow will call them when he gets home and get the server online. I don’t know if we’ll be open tonight, but we’ll do our very best. If not, we’re hoping for tomorrow at the very latest.
EDIT – 8:25 AM CDT 14 May – The good news is that the new server is on the necessary IP address. The bad news is that I’m encountering some serious issues getting the site to work properly on the new server. I’ll keep working on it throughout the day to see if I can get the problems fixed.
EDIT – 11:30 AM CDT 14 May – CCShadow and I have taken a preliminary look at the server. Right now we think there’s some incompatibility or software conflict that’s causing CDS not to work properly. I’ll continue looking at it when I have time (which isn’t often as there are upgrade issues here at work that need resolving; as that’s my paying job it has to take priority), and CCShadow will do heavy-duty debugging when he gets home tonight.
EDIT – 3:10 PM CDT 14 May – If the testing we’ve been doing is anything to go by, we’ve knocked out all but one minor, annoying bug. Right now we’re evaluating whether it’s safe to reopen the site. If CCShadow gives the go-ahead, we’ll reopen.
EDIT – 3:55 PM CDT 14 May – Never mind. It turns out that one bug is affecting more site functions than we thought. Reopening will have to wait until CCShadow’s had a chance to stamp out the bug. Sorry, folks… this is frustrating us as much as it is you. Hopefully we can be back up and running tonight…
EDIT – 10:35 PM CDT 14 May – We’re back up!