Adventures in Hard Drives

“I don’t have time for backups.”

Most people feel like their data is a kitten asleep in its mommy’s paws.

These are the words spoken by those who have not yet lost data that they cared about. Most people undervalue their data, but when it is gone they realize how much work it’ll take to replace it, if it’s replaceable at all.

Having seen several people go through this, I made it my goal today to finally get around to adding a backup disk to my home server. It runs Debian Linux and has been rock solid for about 3 years. In fact, it was the subject of This Blog Post over at my other blog, TidbitsForTechs.com. It’s an older HP Pavilion, Athlon 64 X2 with 3GB memory and a trusty 160GB Western Digital hard drive.

Enter two hard disks: A Seagate 1TB disk salvaged from a dead all-in-one computer, and a Western Digital 1TB disk salvaged from a failed USB enclosure.

The Beginning of The End

The plan was to add the two 1TB disks as a RAID 1 mirror, use them as the /home/ partition on the server, and move all data to it for safekeeping. I started by running “smartctl -t short /dev/sda” and making sure my current disk passed the SMART tests. It did, so I powered down the machine, gave it a much-needed cleaning, and installed the two extra 1TB disks as planned.
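If you want to script that kind of check, here is a minimal sketch in Python. It is not what I ran that day. It assumes smartctl is installed, that you run it as root, and that the drive is an ATA disk whose “smartctl -H” output includes the usual “overall-health self-assessment test result” line. The function names are made up for illustration. Keep in mind that a pass only means the drive has not reported a problem yet.

```python
import subprocess


def parse_health(smartctl_output):
    """Return True/False for a PASSED/other verdict, or None if no verdict is found.

    Only the ATA wording is handled here; SCSI/SAS drives word it differently.
    """
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result" in line:
            verdict = line.split(":", 1)[1].strip()
            return verdict == "PASSED"
    return None


def drive_health(device):
    """Run `smartctl -H` on a device such as /dev/sda and parse the verdict."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,  # needs Python 3.7 or newer
        text=True,
    )
    return parse_health(result.stdout)
```

Calling drive_health("/dev/sda") returns True, False, or None when no verdict line was found (for example, when smartctl could not open the device).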

Things were going swimmingly until I powered the machine back on. You can listen to what this drive sounded like on my SoundCloud account here: WD Click of Death

Yes, you guessed it. In the 30 minutes between finishing the SMART test and powering it back up, the drive failed, and I had no backups! I had the original source data for much of it, but such things as ssh keys, custom scripts and configuration files were all gone. Or were they?

The Freezer Trick

You’ve probably heard of The Freezer Trick. If not, here’s the rundown. Put the failed hard drive into a freezer-rated zip lock bag. Next, put the bag in the coldest spot in the freezer. Then wait several hours for the hard drive to get really, really cold, power it up, and grab the data you need before the drive warms up.

So, I did exactly that. I have a small chest freezer, and I stuck the drive in the bottom of it inside a freezer-rated zip lock bag. I waited about 12 hours, then connected the drive to my Linux laptop with my SATA/IDE to USB 2.0 adapter and hoped that I’d be able to mount it.

If you haven’t used one of these SATA/IDE to USB 2.0 adapters for doing hard drive work, I highly recommend it. The thing is just great for quickly connecting to a drive even if it’s in a PC or enclosure. I’ve used mine extensively in my computer repair business and this job was a lot easier with it.

I powered the drive up and down a few times because “fdisk -l” didn’t show the drive. But after about 4 tries, it spun up and was recognized: no click of death! I acted quickly, grabbed the most important files first and saved a copy on my Linux laptop, and then started grabbing less important things. About 10 minutes into the procedure, the drive started giving I/O errors, and that was that. I’d already copied what I needed.
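I did the copying by hand, but the idea of grabbing the most important things first and not letting one bad file stop everything else can be sketched in a few lines of Python. Treat this as an illustration under assumptions, not the procedure I used: the function name and the priority list are hypothetical, and it needs Python 3.8 or newer for the dirs_exist_ok option.

```python
import shutil
from pathlib import Path


def rescue_in_priority_order(source_root, dest_root, priorities):
    """Copy paths off a failing drive, most important first.

    `priorities` is an ordered list of paths relative to `source_root`.
    Returns (copied, failed) so you can see what made it across.
    """
    source_root = Path(source_root)
    dest_root = Path(dest_root)
    copied, failed = [], []

    for rel in priorities:
        src = source_root / rel
        dst = dest_root / rel
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            if src.is_dir():
                # Merge into the destination if a partial copy already exists.
                shutil.copytree(src, dst, dirs_exist_ok=True)
            else:
                shutil.copy2(src, dst)
            copied.append(rel)
        except OSError as err:
            # An I/O error on one item should not stop the rest of the run.
            failed.append((rel, str(err)))

    return copied, failed
```

A call might look like rescue_in_priority_order("/mnt/rescue/home/me", "/home/me/rescued", [".ssh", "bin", "Documents"]), where the mount point, destination, and list are all made up for the example. Listing small, irreplaceable items first matters because you may only get a few minutes before the drive gives out.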

One reason this worked has to do with the type of failure this drive had. The heads, platters, and all the internals were fine. The controller PCB was what failed. With the drive powered up, I could smell the distinct aroma of overheating electronics. Feeling the board with my finger revealed the hot spot: a failed chip on the board. Freezing it kept it cool enough to function for a little while.

Another One Bites The Dust

Since my bootable drive was now swimming with the fishes, I needed to rebuild my server. I grabbed a motherboard and a CPU I’ve been saving for some time and upgraded the server to an Athlon II X4 2.4GHz (the 610e chip). I took the DDR2 memory from the previous motherboard, and so the server still has 3GB, which is enough for my use.

I put in the two 1TB disks mentioned above: the Seagate salvaged from a dead all-in-one computer, and the Western Digital salvaged from a failed USB enclosure.

Then I installed CentOS 7 with a Software RAID 1, which mirrors the disks in case of failure. Sure, I only get 1TB capacity, but that’s okay with me. I need reliability more than I need the storage.
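A mirror only protects you if you notice when one half of it drops out. Linux software RAID reports its state in /proc/mdstat, where a status like [UU] means both members are up and an underscore, as in [U_], marks a missing one. Here is a rough Python sketch that flags degraded arrays. The function name is hypothetical and the parsing assumes the usual mdstat layout, so treat it as a starting point rather than a monitoring tool.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status field shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like "md0 : active raid1 sdb1[1] sda1[0]".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status field looks like "[2/2] [UU]"; "_" marks a missing disk.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded
```

On the server itself you would feed it the real file, for example degraded_arrays(open("/proc/mdstat").read()), and an empty list means every array it recognized has all of its members.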

Signs of Trouble

One thing I noticed immediately was that there was a fairly pronounced vibration coming from the machine. I attributed this to HP’s bad engineering. I’m using an old HP Pavilion case (circa 2006) and it mounts hard drives in the case with special rounded screws, which slide along a rail from the front of the case and then a plastic clip holds the hard drive in. It works, if you have the screws that match the case. I didn’t, and I attributed the vibration to said lack of correct screws. I installed Vibration Isolation Devices (some electrical tape) on each screw, and it cut the vibration down significantly.

Have you tried turning it off and on again?

The next morning, I logged into the server from the desktop PC in my living room (my office is in a separate building on the property) and ran some installs and updates and gave it a reboot. The machine isn’t anything fast and so I waited patiently for a reboot, which can take several minutes. A watched pot never boils, right? After several minutes though it was evident that the machine wasn’t coming back from the reboot.

I went into my office and plugged in a keyboard and monitor, and saw nothing horribly amiss other than it just didn’t boot completely. I turned it off and on again. When it powered up, it sounded like somebody had just started a Harley-Davidson, or maybe a Cessna 172 was revving up for takeoff inside the case.

At first I thought such an atrocious sound could only come from a cable being hit by a fan blade, or a failing fan bearing. The trick I use to diagnose fan issues is to pull the case cover while the computer is running, and just put a finger onto a fan to slow it down and see if the noise changes. Neither the CPU fan nor the case fan was the culprit, and a careful listen to the PSU fan indicated no problem. So, I powered down the computer and pulled the power from both hard drives. When I powered it up, there was no bad sound. I plugged in each drive separately, and found that the Western Digital 1TB Green had failed.

I’ve recorded the sound of both failed hard disks and put them on SoundCloud here for your listening pleasure:

Why So Much Death?

As mentioned previously, the 1TB Western Digital Green drive that failed had spent its life in a USB enclosure, and the enclosure had failed as they often do. Being mounted vertically was apparently not very good for its bearings, which did not react well to being mounted horizontally after so long. I’ve seen such a failure in the past, but it’s been about 15 or 20 years since I’ve seen a change in orientation kill a drive like this. The bearings in modern hard drives are very good and should not have had this problem. It’s also possible that the drive itself was bad all along, rather than the USB enclosure I pulled it from. It’s very hard to say why the bearings failed, but they did, and the drive is dead.

All in all, I need some new hard drives. I’m going to wait until I can get a couple of hard drives on sale on Amazon. Most likely I’ll go for a couple of Western Digital Blue 1TB disks.

That choice might seem odd, considering that I lost two Western Digital hard drives inside of 24 hours, but consider the circumstances. Both drives were well used. The Western Digital 160GB disk was manufactured in September 2006 (almost 10 years ago!), and the 1TB disk in June 2009. Both had seen extensive use, and I got them for free. I’d say I got my money’s worth out of them, and then some.

My other Western Digital 500GB disk which is in my PC is still running great, although it’s due for replace