Mysterious Uptime
Or: Why RAID isn’t foolproof.
First, a little bit of background. At home, I have a MythTV installation. And as part of that installation, I have a MythTV Backend, which is basically a glorified fileserver that sports a couple video capture cards, the MythTV scheduling and recording software, a MySQL database, and a few other odds and ends (not the least of which is this web server). Now, being a fileserver, one of the jobs that machine fulfills is to provide large amounts of storage, primarily for MythTV recordings. Since I don’t want to lose those recordings, I have my storage set up in a RAID-1 mirror, which basically takes two drives and makes them look like a single drive, while underneath, anything written to the logical drive is actually written out to both physical disks. That way, if something bad happens, I have what amounts to a live backup that I can quickly switch to (in addition to my regular, nightly incremental and weekly checkpoint backups).
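For the curious, the mirroring idea can be sketched in a few lines of Python. This is only a toy model under my own assumptions (the `Mirror` class and its methods are made up for illustration, and the "disks" are just in-memory byte arrays); it is not how the Linux RAID driver is actually implemented.

```python
class Mirror:
    """Toy RAID-1: one logical drive backed by two in-memory 'disks'."""

    def __init__(self, size):
        self.disks = [bytearray(size), bytearray(size)]
        self.failed = [False, False]  # True once a half has dropped out

    def write(self, offset, data):
        # Every logical write lands on each half that is still healthy.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Any healthy half can serve a read.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return bytes(disk[offset:offset + length])
        raise IOError("both halves of the mirror have failed")

    def resync(self, bad):
        # Re-adding a half: copy the surviving half over it wholesale.
        good = 1 - bad
        self.disks[bad][:] = self.disks[good]
        self.failed[bad] = False
```

If one half is marked failed, writes keep going to the other, and `resync` brings the stale half back in line by treating the good half as the source of truth.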
So I came home on Wednesday night to discover something rather annoying: Some sort of write error had occurred on one of those physical disks, and so the mirror was degraded and deactivated. Now, this has happened in the past (I think it’s related to a buggy DMA implementation on my SATA controller), but usually recovery is pretty easy: remove the bad disk from the mirror, then re-add it, which causes Linux to synchronize the two disks, using the good disk as the primary. But for some reason, this time, it wasn’t so easy.
See, when I ran a command to view the status of the mirror, I found both drives marked as “removed” (ie, taken out of the mirror), and one marked as a “spare”. That itself is kinda weird, as usually it’s one active, and one failed. “Whatever”, I told myself, “I’ll just take the spare out of the mirror, re-add it, and then add the other drive, and voila, that should be it”. But when I attempted to re-add the spare, I got the weirdest error message:
cannot find valid superblock in this array - HELP
I can tell you right now, when your computer is imploring you for help, it’s probably a bad thing. Now, for those not in the know, a superblock is kinda like a special marker on the disk, and in this case, it tells Linux which mirror the disk belongs to, along with a bunch of other metadata. This error indicates that this decidedly important piece of bookkeeping information was, supposedly, absent. That’s bad. Unfortunately, googling around led me nowhere. Even more confusing, when I attempted to mount (ie, attach, connect, etc) one of the halves of the mirror, the OS detected the filesystem, and the contents of the mirror looked to be intact. And running a tool to examine the RAID mirror components returned what looked like perfectly normal data.
In the end, I gave up for the day, figuring I would come up with some strategy for moving forward the next day. Eventually, I settled on breaking the mirror up, mounting both drives separately, and then using a tool like rsync to manually back up the primary disk to the secondary… not an ideal solution, as a disk failure means you lose everything since the last snapshot, but it’d do the job, and I wouldn’t have to deal with RAID headaches anymore.
So this evening, I fire up zaphod (that’s the fileserver name) into single user mode, and as I watch the kernel messages scroll by, I see the RAID mirror… start up perfectly normally. Examining the mirror showed one active disk, and one re-syncing, suggesting that the kernel was rebuilding the RAID successfully. What. The. Heck. And as of this writing, I still have absolutely no idea what on earth went wrong, or how it magically got fixed.
Lucky.
Again with the NetBSD
Well, it’s been a couple days now, and I continue to fiddle around with NetBSD… it’s definitely not going to be displacing Ubuntu any time soon, but it’s definitely an amusing project to play around with.
Most recently, as I was testing out Evolution (my email client) compiled from pkgsrc, I discovered that it started up incredibly slowly. Like, 5 minutes from invocation to a window popping up on my desktop. So, a little Google-fu, and I found myself here. It turns out that one of the things Evolution does a lot is attempt to open shared libraries that don’t exist. Unfortunately, those failures are very expensive, and as of 5.0.2, NBSD’s linker doesn’t cache the failures.
And this is where that blog post comes in. The author of that post wrote up a negative lookup cache and incorporated it into the NBSD dynamic linker. By itself, that’d be interesting, but what’s deeply cool about this is that I was able to get a patch representing his change, tweak it, apply it to my local copy of the NBSD source, and then build out and install a new version of the dynamic linker. Result: startup times went from minutes to seconds. I’d call that a huge win.
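To give a feel for what a negative lookup cache does, here is a minimal Python sketch of the general idea. To be clear, this is my own illustration and not the actual patch (the real thing lives in the dynamic linker's C source); `LibraryFinder`, `search_dirs`, and the `probes` counter are all hypothetical names.

```python
import os


class LibraryFinder:
    """Search a list of directories for a library, remembering misses."""

    def __init__(self, search_dirs):
        self.search_dirs = list(search_dirs)
        self.missing = set()  # negative cache: names already known to be absent
        self.probes = 0       # number of times we actually touched the filesystem

    def find(self, name):
        # A remembered miss short-circuits the expensive directory walk.
        if name in self.missing:
            return None
        for directory in self.search_dirs:
            self.probes += 1
            path = os.path.join(directory, name)
            if os.path.exists(path):
                return path
        # Nothing found anywhere: record the failure so we never walk again.
        self.missing.add(name)
        return None
```

The first failed lookup still pays full price, but every repeat of the same missing name returns immediately. One obvious trade-off of a sketch this simple is that it never notices a library that gets installed after the miss was cached.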
What this fundamentally speaks to is just how open and easy it is to fiddle around with the internals of NetBSD. The entire system is designed to make it trivial to alter the base and rebuild it out from scratch, which makes it possible to do the kinds of things I just did. Very cool!
Next up: Attempt to hack nouveau DRI support into the kernel so I can get reasonable video performance.
BSD-Curious
So for no particular reason at all, I recently got the urge to try out a BSD variant on my laptop. Now, historically I’ve been a die-hard Linux user, having cut my teeth on Slackware back when you needed dozens of floppies to install the thing (as a quick aside, I didn’t have internet access at home at the time, and so I used a PC at school to download Slack from a local BBS, which meant trucking dozens of floppies there and back… which was really fun when, say, disk 12 of 20-something had a bad sector, requiring me to return to school the next day (leaving the install process up and in limbo in the meantime) to write out a new disk). Since then, I’ve worked with Red Hat, Debian, Fedora, and Ubuntu, but have never strayed outside the realm of Linux, and so, in a fit of boredom, I decided to address that little shortcoming in my technical upbringing.
Of course, there are multiple BSDs out there, each with their own focus and vision, and choosing one is often a matter of taste. My initial choice was FreeBSD, which I threw on a 10GB partition on my laptop, after which I found myself facing the familiar command prompt (well, not quite familiar… it was straight sh instead of bash, which was… annoying), and a fairly barebones system. At this point I discovered an important difference between the BSDs and, say, Ubuntu: out of the box, they tend to provide a very bare-bones system, enough to get you bootstrapped so you can build the system you need. But you have to build it. Not that I mind, I’m a tinkerer at heart.
I then spent the next couple days fiddling around with the system and configuring it as necessary, which was a very different experience from what you see in Linux. You see, in FreeBSD (and NetBSD, which I’ll get to later), the primary system configuration, which includes network configuration, system daemon selection, and so forth, is all stored in a single file in /etc called ‘rc.conf’. In contrast, Linux distros tend to manage things in varying ways, which means you need to learn individual platform quirks and tools, something which is always a bit tedious. And so, by playing with the rc.conf, I was easily able to get networking up and running, including my wireless card, various system daemons, and so forth. And after that, it was off to install some interesting programs.
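Part of the appeal is that rc.conf is just a flat list of shell-style variable assignments, so it is easy to read by eye or by script. As a rough illustration, here is a small Python sketch that pulls the settings out of such a file. The `parse_rc_conf` helper is my own invention, and the sample values (hostname, interface name) are purely illustrative, not my actual configuration; the sketch also ignores anything fancier than plain `name=value` lines.

```python
import shlex


def parse_rc_conf(text):
    """Collect simple name=value assignments from rc.conf-style text."""
    settings = {}
    for line in text.splitlines():
        # shlex handles shell quoting and strips '#' comments for us.
        for token in shlex.split(line, comments=True):
            name, sep, value = token.partition("=")
            if sep and name.isidentifier():
                settings[name] = value
    return settings


# Illustrative sample only; every value here is made up.
SAMPLE = '''
# networking
hostname="laptop.example.org"
ifconfig_em0="DHCP"

# daemons
sshd_enable="YES"   # start sshd at boot
'''

if __name__ == "__main__":
    for name, value in parse_rc_conf(SAMPLE).items():
        print(f"{name} = {value}")
```

Having everything in one flat file like this is what makes it so quick to see, at a glance, how a machine is set up.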
And this was where I discovered my next surprise. In the Linux world, package managers are really king, with two main contenders vying for the top spot: deb and rpm. Of course, there are a few outliers (Slackware’s tgz’s, Gentoo’s portage system, etc), but for the most part, modern distros are based on one of these two package management systems. Not so with FreeBSD. FBSD uses a system called ‘ports’, which should be familiar to a Gentoo user, as portage is really a rip-off of ports. In essence, ports is a gigantic set of scripts, where each supported application is represented by a directory containing Makefiles, patches, and so forth, which can be used to install the application. A simple ‘make install’ in the directory results in the source for the package being downloaded, patched, configured, built, and installed. It’s really quite slick, if you’re interested in building everything from source (which can take quite a while). Of course, FBSD also has binary package support, but building from ports is the most common way people install software in the FBSD world.
Unfortunately, I finally hit a brick wall with FBSD on my laptop when I attempted to suspend it. Big mistake. You see, it turns out that, even now, with FreeBSD 8.0, support for suspend/resume is incredibly weak. So while Linux has stumbled along and finally reached a point where things kinda sorta work most of the time, FBSD is, I’d wager, at least 5 years behind. Which is a real shame, as I use suspend all the time with my laptop. And thus it was that FBSD as a possible OS alternative was nixed.
So, what next? Well, in my mind, the most obvious alternative contender was NetBSD (I eventually chose the 32-bit version for reasons I won’t get into here). Like FreeBSD, NetBSD installs to a very barebones system, though even more barebones than FBSD, if that can be believed. In fact, the ISO for the installation media is a mere 250MB, give or take, which is pretty diminutive beside FBSD’s 2GB DVD image (though, to be fair, FBSD’s DVD ships with a ton of pre-compiled packages, while NetBSD leaves you having to download all that software from the intertubes). Similar to FBSD, the entire system is configured through /etc/rc.conf, and basic configuration was equally easy. Once that was done, again my thoughts turned to software.
The NetBSD package system shares a lot of commonalities with the FreeBSD system. Which shouldn’t be surprising because NetBSD’s system, pkgsrc, was forked from ports back in 1997. As such, they share an underlying philosophy, and so the two systems operate very similarly. I will say, though, that ports does have one significant advantage over pkgsrc: Much better OS integration. See, pkgsrc is really a sister project to NetBSD. As such, it can actually be run on myriad operating systems, including Linux, among many others. But that means that the system doesn’t tie into the OS all that well. So while a ports package, once built, will populate /etc/rc.conf with configuration values, throw itself into /usr/local/etc/rc.d, and so forth, a pkgsrc package requires the user to perform extra work to integrate the software into the OS. Additionally, I do prefer the way ports actively prompts the user for configuration directives for packages that provide them, but that’s probably just a matter of taste.
Of course, I once again made the mistake of investing a fair bit of time into installing packages before I decided to test out suspend, and once again I was disappointed, though somewhat less so (which is why NetBSD is still on my laptop). Suspending the laptop worked flawlessly, and was incredibly fast. Honestly, I’ve never seen a laptop go to sleep that quickly. But on resume, oddly enough, my videocard doesn’t get initialized properly (this is a known problem with nVidia graphics chips in general, and on my laptop model in particular). On the other hand, everything else works perfectly (the OS is actually fully responsive under the hood, the display simply doesn’t come on). Some hacking got things sorta working, but not reliably, so for now suspend on NetBSD will have to wait. But at least there appears to be a chance.
So for now I’ve decided to stick with NetBSD. Naturally I expect there to be more problems and limitations (at minimum, I’ll be stuck with nv as my X driver, as nVidia’s binary blob isn’t supported on NetBSD), and I doubt it’ll displace my Ubuntu install, but it should be fun seeing if it can!
And quick aside: I was very impressed to discover that both Free and NetBSD supported essentially all the hardware on my laptop, without exception (well, save for ACPI suspend, of course), straight out of the box. Very nice!
Transition Complete
Welcome to the new domain! As per my previous post, I’ve made the migration to my new domain, “b-ark.ca”. Additionally, this website is now IPv6 accessible, so anyone with IPv6 access (either through a tunnel broker, 6to4, or teredo) will be able to reach this place over v6 instead of v4.
As an aside, Hurricane Electric and Afraid.org are awesome services. Tunnel performance is spectacular (I see maybe 20ms extra latency over IPv6 versus IPv4), and they provide a routed /64, a full routed /48 if you want it, and support for reverse DNS delegation (so my IPv6 addresses will reverse resolve to my host names).
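For anyone wondering what reverse DNS delegation looks like for IPv6: the address is written out nibble by nibble, in reverse, under `ip6.arpa`. Here is a small Python sketch using the standard `ipaddress` module. The addresses come from the 2001:db8::/32 documentation range (not my real prefix), and `delegated_zone` is a helper I made up for illustration.

```python
import ipaddress


def delegated_zone(network):
    """Return the ip6.arpa zone name covering an IPv6 network.

    Only handles prefixes that fall on a nibble (4-bit) boundary,
    such as /48 or /64.
    """
    net = ipaddress.ip_network(network)
    if net.version != 6 or net.prefixlen % 4 != 0:
        raise ValueError("expected an IPv6 network on a nibble boundary")
    # Fully expanded hex digits of the network part, without the colons.
    nibbles = net.network_address.exploded.replace(":", "")[: net.prefixlen // 4]
    return ".".join(reversed(nibbles)) + ".ip6.arpa"


def ptr_name(address):
    """Return the full PTR record name for a single IPv6 address."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(delegated_zone("2001:db8:1234:5678::/64"))
    print(ptr_name("2001:db8:1234:5678::1"))
```

The zone name is what gets delegated to your own name servers; each host's PTR record then lives underneath it.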
Meanwhile, Afraid.org has excellent support for IPv4 and dynamic DNS, and IPv6, both forward and reverse. Now maybe I’ll go apply for an “IPv6 Enabled” badge to stick on the website…