However, the fallout from the other five hard drive failures is starting to well and truly suck. I'm on the hook to estimate the cost of preventing future failures of this sort, which means either quoting a huge cost for the current solution, or at least having a design for a less-expensive solution roughed out in my head. By Friday.
On the bright side, it looks like the data is actually recoverable: some of the failed drives came back on a subsequent power cycle, so there were enough to run the RAID in degraded mode. Of course, moving the recovered data somewhere will require time I don't exactly have. So, ugh....
Last Friday, I got handed the on-call pager by surprise. I thought I was off the on-call rotation, but I hadn't been taken off the list yet. Oh well. It was either take the on-call pager, or call abiku into the office for a surprise on-call week. I didn't want to do the latter, and on-call's usually pretty quiet, so I took the pager.
Unfortunately, the on-call gods also thought it was abiku's week, so I got what they had prepared for him. As in 5 drives failing within 8 hours on a RAID device. Everybody says a double drive failure is a rare occurrence. Maybe if they said it some more, the disks would agree...
So after spending all night in the office, on and off the phone to the RAID vendor's tech support to see if they had any tricks to get the data back, writing off the data (at least temporarily), getting back home at about 7:00am, getting some sleep, and then going about a rather pleasant (if delayed) Labor Day, I blew my keg of barleywine. All those barleywine nights added up to about 5 gallons, apparently. Now I really need to brew more, since I'm down to a half-case of my 2003 vintage. My boss encouraged me to take an unofficial "comp day" for the work I did during the outage, so I may take that as an opportunity to brew a replacement. For now, though, I'll make do with IPA.
But of course, this wouldn't all be worthy of a rant without noting that my cursed laptop also got in on the act, providing me with a sixth disk failure in 72 hours. At least I won't have to explain this one to management, but I may need to ship it to Taiwan again. Of course, it could just be a loose hard drive cable. I have my fingers crossed, but with my recent tech luck, I'm not too optimistic.
If this keeps up, I may have to take some time away from any technology not fixable with a length of 2x4 and a mouthful of 10-penny nails....
To be fair I (or rather, a co-worker) did take it apart to reseat a loosened cable. I'm still not sure what the vendor was thinking -- that we hadn't put it back together? In any case, it wasn't too much of an issue. Instead of replacing the whole laptop as was originally arranged, they would "only" replace the screen and motherboard. Fine.
And then I heard nothing. A couple weeks later, I pinged sub300 about it. Then a couple days later. And finally, this Monday. At 3:00-ish AM Tuesday, I heard they had just shipped it. It arrived Wednesday morning, no FCC hassles, etc. Somebody must have pulled out all the stops.
Now all is well. The new LCD has a single red pixel, but I can live with that. The hard drive has not been wiped, so I'm still running FreeBSD. The only thing I've changed is setting the magic sysctl (hw.acpi.lid_switch_state=S3) to put the laptop to sleep when the lid is closed. I'm hoping that heat had something to do with the LCD woes, and that this will hold them off for good (or at least, until I'm ready to buy another laptop).
With the sole exception of this last delay, I've been very impressed with the customer service of both Fox and Sub300, as well as the laptop itself. I'll be charitable and assume that the delay was for a burn-in period to see if the problem re-appeared :-)
I found the little firewire chassis oddly cool... Now I want more of them. And a couple 200+G hard drives. Hook them to mikk.net, mirror them, and have more space than I know what to do with, literally. I'm not a huge music downloader, much less movies. As a matter of fact, every computer I've been on has fit "my stuff" in under 10G. But hey, this is a "build it, they will come" sort of thing.
For example, I could now set up an open PVR and use the space to archive shows. Oops, I'd probably need a few more disks for that to be worthwhile. And the PVR. And I'd need to run some more ethernet. Or plunk down on a faster wireless gateway, which I almost did while picking up the firewire enclosure. This is getting expensive.
And it all started by marveling at a little toy. I thought I had recovered from this addiction. Maybe I should cool off by cleaning out some of the dead/obsolete hardware in our computer room. Or finish a woodworking project. Or brew, or at least rack something....
While working on this, my laptop's LCD developed another fucking line. This time, a vertical green line toward the left side of the screen. What luck.... I'll probably have to ship this bugger halfway across the world and back AGAIN. At least I'll have the benefit of experience this time :-/
Operating system not found
At first I thought the manufacturer had just wiped the disk, but after a couple unsuccessful attempts at installing a new OS, I finally noticed that the hard drive was MIA.
Oh, bother. Did they remove the hard drive? Nah, it could have died on its own, or more likely, a connection shook loose. I hate taking apart laptops. The last one I took apart never really recovered.... Luckily, there's a guy at work who lives for taking apart small electronic devices and, the important part, can actually get them back together again. I brought the laptop over to his cube, he took it mostly apart, explored a bit, re-seated the hard drive cable, put it back together, and it worked. He thanked me for the disassembly opportunity, I thanked him for fixing my laptop, and I went away happy.
As an added bonus, the manufacturer did not wipe the hard disk, so my existing FreeBSD installation was still intact.
I just finished mixing up a keg of root beer for Jess's cousin's wedding this weekend. The guests are staying in a state park, so no real beer. That's OK, because I'm starting to get "low", with only 10 gallons on deck and probably 10-15 gallons on tap. I'm starting to ponder when and what I'll brew next.
And then it hit Anchorage, AK, and got held up waiting for an FCC form. Form 740, specifically. Pretty straightforward, except for a few of the blanks. After a few phone calls to Sub300 and FedEx, I was pointed to part II, option #2: basically a box you check to say "I don't need to fill out this form". I wish all forms had that. It would make life so much easier.
So I pulled the trigger on the 10th, and it arrived on the 15th. It was good as advertised, just a little noisier and warmer than I expected, although not as bad as my older laptop which both sounded and felt like a jet engine when it got going. The fact that the manual was printed in Chinese with no English (or even Engrish) translation was a minor annoyance, but not a show stopper (who reads manuals anyway?)
In fact, the only thing I didn't like was the OS, the Linspire distribution of Linux. It's a good, slick, integrated desktop based on KDE: usable, but not what I wanted. There are no alternative desktop environments/window managers shipped with Linspire, not even twm. Worse, there's no development system or X libraries installed to let you download and compile your own software. I managed to figure out enough about apt-get to get a C compiler on the system, but ran into some package conflicts getting the X development stuff installed.
And then I killed it on the 17th. It (either Linux or KDE, don't know which, don't care) became unresponsive when I was trying to manually edit some network settings, and I power cycled the computer to get control of it. This corrupted the root (and only) filesystem to the point it could not mount it or find fsck. I ordered a USB CD-ROM drive later that day. ($39.99 CD-ROM only, no RW/DVD, from isellsurplus.com).
Over a week later, I received a notice that the drive had just shipped. Grrr.... To work off my anger, I set up a quick-and-dirty netboot set-up and got FreeBSD installed on the laptop. Woo-hoo!
FreeBSD 5.3 worked just fine on the laptop, except:
Then yesterday, I was getting ready to move in to the new laptop when I noticed a vertical blue line down the middle of the screen. Argh, LCD defect. Best-case, I get to ship this back to Sub300, they send a replacement laptop, and I get to do the install and move-in process all over again (hey, at least I'll have notes this time ;-). I have the best luck with technology...
At least the new mikk.net (an Asus Pundit, also from Sub300) went smoothly. It's well-supported by FreeBSD 5.x. It's also very quiet, which was the main reason I bought it.
From: "mail15.com" <...> Subject: %HEY%PLUSHEY %CHILL %DICK %CONTACT http://www.%URL/d/1.php %BYE %ASSHOLE g
Yow! Is this sexual intercourse yet?
I knew it was definitely time to go home...
I spent some time today racking my brain over possible shared-disk clustering with FreeBSD. Linux has had this for a while, but, er... I like FreeBSD :-) In particular, Red Hat has been distributing the U of MN/Sistina Global Filesystem (GFS), a SAN shared filesystem supporting multiple read/write clients, with basically no bottlenecks or single points of failure. Very cool stuff. If it ran on FreeBSD, I'd be using it in a heartbeat.
I was looking at something more primitive though, namely having a single filesystem with one host mounting it read/write, the other mounting it read-only (if at all), with the possibility of "taking over" that filesystem if the other host fails. This is mostly trivial stuff: just make sure that the SCSI chain is properly terminated, and re-mount the volume read/write if the other host fails. The hard problem is reliably determining that the other host has failed, and distinguishing host failures from network partitioning. Both boxes mounting the filesystem read/write would be really, really bad.
The last time I took a serious look at GFS (back in 2000-2001, when it was going to be ported to FreeBSD), it relied on a SCSI locking primitive to synchronize access. Theoretically, I could use this mechanism to lock the other host out from the shared storage device. This mechanism was unfortunately never standardized or widely implemented, so the modern GFS relies on distributed lock manager software on the participating hosts.
This has the same problem: If a host's network fails while it's holding a lock, it's not safe for other hosts to "steal" that lock because the supposedly-failed host may still access the shared disk thinking it still holds the lock. There needs to be some mechanism to forcibly prevent the failed host from accessing the disk.
One possibility, I thought, would be to somehow physically switch off the SCSI connection to the other host. In a SAN environment this could be done by logging into a switch and turning off the failed host's port. Or, it hit me, we have most of our servers on remotely controllable power switches, and the failover system could run a script that logs into the power switch and turns off its peer. Yuck, what a kludge.
So, I looked back through the GFS sources and docs, and noted some references to "fencing off" a failed node from the storage. This is provided by a cluster service called "fenced", which I promptly downloaded and checked out. Fenced supports multiple mechanisms for separating a failed host from storage, implemented by separate scripts. For example, the fence_brocade script works with Brocade SAN switches by, well, logging in to the switch and shutting down the failed host's port.
And then I noticed one of the scripts was named fence_apc. Our power switches are APCs, hmm... Yep, they're using the mechanism I'd just thought of and nearly rejected as too ugly...
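For my own notes, here's a minimal Python sketch of the ordering that matters in all of the above: fence first, take over second. Everything in it is hypothetical. The three callables (peer_alive, fence_peer, remount_read_write) are stand-ins I made up for "ping the peer", "log into the power switch and turn the peer off", and "re-mount the volume read/write"; none of this is how fenced itself is written.

```python
from typing import Callable


def take_over_if_peer_dead(
    peer_alive: Callable[[], bool],
    fence_peer: Callable[[], bool],
    remount_read_write: Callable[[], None],
    missed_heartbeats_allowed: int = 3,
) -> str:
    """Return 'standby', 'fence-failed', or 'took-over'.

    All three callables are hypothetical hooks supplied by the caller.
    """
    # Require several consecutive missed heartbeats so that one dropped
    # packet does not trigger a failover.
    for _ in range(missed_heartbeats_allowed):
        if peer_alive():
            return "standby"

    # A silent peer may only be cut off from the network and still be
    # writing to the shared disk, so fence it before touching the volume.
    if not fence_peer():
        # Never mount read/write unless the fence is confirmed.
        return "fence-failed"

    remount_read_write()
    return "took-over"
```

The point of the sketch is the early return on a failed fence: if I can't prove the other box is off the disk, I stay read-only, since both boxes mounting read/write is the really, really bad case.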
That's what happens with a lot of my ideas. They go through the following stages:
This hit me the other day at work, when I met one of the more established engineers in the company. We dropped by his cube, and I noticed three "Patent awarded to ..." plaques on his desk. Now there's a trophy. I was quite impressed at that point [1], and the actual meeting confirmed my first impression. The conversation was enlightening and fast-paced enough to make my brain hurt, something that doesn't happen nearly often enough.
[1] As a kid, I always wanted to have patents when I grew up. I need to get cracking :-)
fix to toshiba S127 problem of free BSD
how to boot freebsd 5.3 in toshiba S127
Hey! My site's actually useful: it at least got this fix indexed on Google, currently the first hit for the first, and the only hit for the second.
Historical IPA
Enjoying this one with ice cream right now (yes, it's an odd pairing; I'm not sure I'd recommend it, although it's not half as bad as it sounds, and a fair bit better than a lot of wine and food pairings I've been sold). This one's also the first hit on Google as of a few minutes ago, ahead of the original recipe I used as a base.
calorie counts beer Rasputin
Funny. There's a great beer named (Old) Rasputin, but I was talking about the Boney M/Boiled in Lead song. And, of course, beer.
brew beer broken glass thermometer
Not a recommended brewing adjunct. May we kindly suggest searching for "perforated bowel" in addition?
for what makes yeast foam
It foams for thee.
hallucination in sport
her sweet hand
???... Apparently the second one's a porn site, and the search eventually hits "a sweet hand-turned chalice" from my homebrew contest post a couple weeks ago, but the first I can't figure out. Band name?
The story involves a Reformation-era heresy called antinomianism. Antinomians hold that it's already been decided whether we're going to heaven or hell, so it does not matter what we do on earth. It's sort of the Protestant/Lutheran "not saved by good works" idea reductio ad absurdum. At the beginning of the story, the progenitor of this heresy has been brought back to life by a modern scientist, except 50 feet tall.
If nothing else, it's a novel (well, "odd," at least) premise for a short story, and I'm all about the weird premises...
Of course, I never bookmarked the link or anything. All I remembered was that the revived heretic was antinomian, and 50 feet tall. Googling "antinomian" turned up a "Catholic Encyclopedia" entry on antinomianism. Aha! That got me the name of the heretic: Johannes Agricola. Next query:
50 foot Johannes Agricola
and the first link was the story: Night of the Antinomian.
Ah, success, and a nice silly ending to a crappy day.