An Article from Aaron's Article Archive
When Good Upgrades Go Bad
Thursday, 22 January 2004 7:08 PM MST
Web Site News
Monday night, I shut down my personal web site and the other services running on the server at my house in preparation to upgrade from FreeBSD 5.1 to FreeBSD 5.2. On Tuesday my server spent its CPU cycles building version 5.2.
Sometime later, I rebooted to 5.2 and things looked good, or nearly so. My mail system refused to start, so there must have been something between FreeBSD 5.1 and 5.2 that freaked Postfix out.
Once running version 5.2, I then started rebuilding various software packages, including Postfix, so that I'd have the latest stable versions and so that I wouldn't run into any more unexpected weirdness like I had with Postfix.
Wednesday morning after I got up, I restarted the server again. Trouble! My software RAID-5 array failed to come up, with the vinum system crashing my kernel. Eeek!
I went into panic mode and booted to single-user (my root partition is NOT on a vinum partition, and so was accessible). I typed
Next reboot, instead of doing
I went through many contortions trying to get my RAID-5 volume back up. It holds lots of personal data I've accumulated through the years, including many of my digital photographs (and this web site too), and I really didn't want to lose it. I even stooped to removing the old vinum configuration and carefully reconfiguring the volume, plex, and subdisks by hand, forcing them into the down state, then forcing them up when I was ready.
Through it all, something in either the 5.2 kernel or the 5.2 vinum command-line tool just kept freaking the set-up out. Sometimes I'd get errors that wouldn't let me write the configuration back to disk. *Grrrrrrr!* I was frustrated—very!
At one point, I decided to double-check my FreeBSD slice partition information with
All this time, I was using
After resetting my vinum partitions (using something like
This morning, I decided to try the only thing I could think of that I hadn't yet tried. I would attempt to boot from my backup root partition, which had my old 5.1 kernel and root-partition binaries on it, and try one last time to bring my RAID-5 array back online.
If this failed, I would either lose my data forever, or have to spend some big $$$ to build myself a new hardware-based RAID-5 array large enough to store disk images from my four 120 gigabyte drives, then hope that over the course of weeks or months I could decipher how vinum works well enough to reconstruct my data by hand.
Hallelujah! It appeared to work! With FreeBSD 5.1's vinum, when I configured things by hand and then brought my volume, plex and subdisks to the up state, I got no errors. Then, when I saved the configuration back to disk, vinum didn't crash like it was wont to do with my 5.2 kernel.
But the true test was to run vinum's parity checker,
I left the machine scanning away while I did other things, then later in the day returned to see that it had completed without finding any parity errors. What a sweet and hopeful sight that was to see! The filesystem check went more quickly and found no errors either.
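For anyone curious what a RAID-5 parity check amounts to, here is a minimal Python sketch of the general idea. It is not vinum's code, and it leaves out real details such as stripe sizes and parity rotating across drives. The function names are made up for illustration. It assumes each stripe holds three data blocks plus one parity block (as on a four-drive array), with the parity being the XOR of the data blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """A stripe passes the check when the XOR of its data equals its parity."""
    return xor_blocks(data_blocks) == parity_block


def rebuild_missing(surviving_blocks, parity_block):
    """Recover one lost data block by XORing the survivors with the parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


# Hypothetical two-byte blocks standing in for one stripe on three data drives.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

print(stripe_is_consistent(data, parity))                 # healthy stripe
print(rebuild_missing([data[0], data[2]], parity) == data[1])  # lost block recovered
```

A parity scan walks every stripe on the array doing the equivalent of `stripe_is_consistent`, which is why it takes hours on drives this size. A clean result means the data and parity still agree with each other.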
You don't know how happy I was. I was elated as I ran
This afternoon, I set my machine busy downloading sources for FreeBSD 5.1 so I could reinstall 5.1 cleanly from sources, then rebuild the various pieces of server software (like Apache for running this web site) and other utilities I like to use.
Tonight, things are at last mostly back to normal, with web, database, and e-mail reinstalled. I've still got some final clean-up to do, and I need to reinstall Samba so I can share my files with my Windows boxes.
After this incident, I don't know if I encountered a bug in vinum, or FreeBSD 5.2, or just a problem more specific to my compiler settings in
Some of you might wonder why in the world I'm running 5.x in the first place, since it's the 4.x line (4.9 RELEASE currently) that FreeBSD recommends for production use. Well, I only run 5.1 because I was crazy enough last year to try out 5.x. And once 5.1 seemed stable, it just didn't seem worth the trouble to try to backtrack to 4.x.
But I have learned my lesson with my personal data. I'll definitely be far more conservative in all future upgrades of my personal system. I mean, I wouldn't try things like this on my employer's production boxes, so why on earth would I experiment with my own system when I don't want to lose my personal data stored there?
Now others of you might wonder why I don't have something in place for doing backups. Hey, I'd love to be able to back things up, but after having spent so much on this RAID-5 array, and having collected more than can fit on my external USB/Firewire 120 GB drive, I don't really have a good way to back this stuff up. Someday, I hope to, though...
And that is how my good upgrade went bad.