S.M.A.R.T. - Outsmarted by hard disks

Sven Krumrey

I break out in a sweat whenever one of those warning lights comes on in my car. How about you? Even if it were only signaling a loose ashtray, it would still drive me mad. But these lights are supposed to warn us of imminent danger. And it worked: I knew with absolute certainty that my ancient BMW was done. But that's another story; this one's about computers.

For years I wondered why PCs didn't come with a similar warning system. Sometimes you could hear when a hard disk was about to break down (it would become louder or emit scratchy noises), or smell a power supply unit that began to stink of molten plastic just before it gave up the ghost. But this was not what usually happened. Usually, you'd just sit there panicking and wondering which data was now lost forever. The clever, disciplined folks among us create regular backups. This may get me kicked out of the holy league of computer scientists, but I'm not one of them. Let's hope none of my colleagues is reading this.

Danger ahead!

So I have a vested interest in the health of my drives. What I didn't know for a long time: most drives do sound an alarm, we just don't take notice. I was surprised to find out that most drive manufacturers had done their part back in the 90s. So-called S.M.A.R.T. technology (Self-Monitoring, Analysis and Reporting Technology) has been part of nearly every modern disk drive since 1996. What may sound like a secret spy organization simply has your drive report a handful of parameter values. These include temperature, running time, possible errors, performance metrics and a few others. Sounds interesting, doesn't it? Since we're talking about computers, things naturally aren't that simple, because these values don't just appear on your screen.

These values have to be retrieved by special applications that may display something like "ErrorRate: 84 60 30 WOPE". Doesn't ring a bell? Same here. That's because disk drives only report raw values without any evaluation. And since so many different parameters are meticulously monitored, there's plenty of data on errors, defective sectors and other read/write anomalies that means absolutely nothing to the untrained eye. To make sense of it all, companies like us have developed applications such as Ashampoo HDD Control 3 and WinOptimizer 12 (module HDD Inspector) that not only analyze the data but also present the results in a meaningful fashion (WinOptimizer 12) and offer long-term, permanent disk monitoring (HDD Control 3). Since I'm a security fanatic, I'll go with permanent disk monitoring.
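To show what "evaluation" could mean for a line like the one above, here is a minimal Python sketch. It rests on assumptions the article does not confirm: that the three numbers are the current value, the worst value seen so far and the failure threshold (in that order), and that the trailing letters are status flags. The names `SmartAttribute` and `parse_attribute` are made up for illustration and are not part of any Ashampoo product.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    """One reported attribute. The meaning of each field is an assumption."""
    name: str
    value: int      # assumed: current normalized value
    worst: int      # assumed: worst value recorded so far
    threshold: int  # assumed: failure threshold
    flags: str      # assumed: status flags, kept as opaque text


def parse_attribute(line: str) -> SmartAttribute:
    """Split a line such as 'ErrorRate: 84 60 30 WOPE' into named fields."""
    name, rest = line.split(":", 1)
    value, worst, threshold, flags = rest.split()
    return SmartAttribute(name.strip(), int(value), int(worst), int(threshold), flags)


attr = parse_attribute("ErrorRate: 84 60 30 WOPE")
print(attr.name, attr.value, attr.worst, attr.threshold, attr.flags)
```

Once the numbers have names, a tool can start comparing them instead of leaving that job to the untrained eye.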

Precious and fragile – a hard disk drive.

I've learned that my system has non-critical errors, that 37° Celsius is okay and that my machine has been booted up 775 times with a total running time of 6300 hours. There were no faulty boots, 50 sectors have been marked unusable and the overall health of my drive is good. Good to know! I'll admit you might get lost in details and speculation, since not all drive manufacturers adhere to the same standards. And external drives, ironically often the first choice for data backups, tend not to report any values at all, robbing users of a valuable tool to foresee and prevent data loss. Why save a few pennies on a tiny component that could help foresee a crash and the data loss that comes with it?

So everything's fine now? In all honesty - no, there's always a remaining risk. Just as even the safest car isn't 100% safe from damage, this is all about minimizing risk. Studies on this topic found that the majority of disk failures were preceded by more or less detectable signals that were covered by SMART technology. These signals may provide valuable evidence that a drive is nearing failure, but even drive manufacturers are unable to forecast this event with absolute certainty. Case in point: go kick your PC as hard as you can (please don't). No parameter can predict that event, and the same goes for voltage fluctuations and high outside temperatures. Most drive crashes, however, are preceded by drops in performance, delayed response times and increases in error rates. I do have one recommendation though: as soon as one of the SMART values approaches a critical threshold, please create a backup to avoid data loss. You won't regret it.
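For the curious, the "approaching a critical threshold" rule can be sketched in a few lines of Python. It assumes the common SMART convention that a normalized value sinks toward its threshold as the drive wears and counts as failed once it reaches it. The function name `health_status` and the safety margin of 10 points are purely illustrative choices, not values taken from any standard or from our products.

```python
def health_status(value: int, threshold: int, margin: int = 10) -> str:
    """Classify one attribute; 'margin' is an arbitrary, illustrative buffer."""
    if value <= threshold:
        return "failing"   # threshold reached: the drive itself reports failure
    if value - threshold <= margin:
        return "warning"   # getting close: time to create that backup
    return "ok"


print(health_status(84, 30))  # comfortably above the threshold
print(health_status(35, 30))  # within the margin
print(health_status(30, 30))  # threshold reached
```

A wider margin warns you earlier at the price of more false alarms, and the right balance depends on how much you trust your backups.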
