Western Digital WD RE4 2TB HDD troubles

A story about how my data got screwed when I tried to use the new 2 TB WD RE4 drives.

Here is the story:
We have one NAS storage box, a 24-bay SuperMicro chassis with an expander on the backplane:
http://supermicro.com/products/chassis/4U/846/SC846E1-R900.cfm

Raid Controller: SuperMicro H4iR
http://supermicro.com/products/accessories/addon/AOC-USAS-H4iR.cfm

3 RAID5 arrays, each with 6 x 1 TB Western Digital RE3 HDDs

The machine worked just fine for more than 7 months.
In July I got 6 x 2 TB WD RE4 GreenPower drives and decided to fill the last six bays with them.
All was fine: the RAID array was built with no problems, then I copied approx. 5 TB of data onto the new array, still with no problems, and then I left it idle for the night.

The next morning the backup system started to use the new array, which after some time of work just went down because 2 of the 2 TB HDDs failed. One of the 1 TB arrays went down as well, because 2 of its HDDs were also reported dead by the controller.

At this point I was almost sure that all the data was lost, because all the arrays are joined into a single filesystem with Linux LVM. And I was right. I marked one failed disk from each array as "online", which brought the arrays back online, but I was unable to recover from this state, because at some point the filesystem basically refused to repair.
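To see why two dead disks in a single RAID5 array were enough to take everything down, here is a rough Python sketch of the layout. The array names are made up, and the model is deliberately simple: a RAID5 array survives at most one failed member, and a volume spanning several arrays needs every one of them.

```python
from dataclasses import dataclass


@dataclass
class Raid5Array:
    name: str          # hypothetical label, not a real device name
    disks: int
    disk_tb: int
    failed: int = 0

    def usable_tb(self) -> int:
        # RAID5 spends one disk's worth of space on parity.
        return (self.disks - 1) * self.disk_tb

    def is_online(self) -> bool:
        # RAID5 tolerates at most one failed member.
        return self.failed <= 1


def volume_is_intact(arrays) -> bool:
    # A volume spanning several arrays is only as healthy as its worst array.
    return all(a.is_online() for a in arrays)


# The layout from this story: three 6 x 1 TB arrays plus the new 6 x 2 TB one,
# with two failed disks in one old array and two in the new array.
arrays = [
    Raid5Array("re3-a", disks=6, disk_tb=1),
    Raid5Array("re3-b", disks=6, disk_tb=1),
    Raid5Array("re3-c", disks=6, disk_tb=1, failed=2),
    Raid5Array("re4", disks=6, disk_tb=2, failed=2),
]

total_usable_tb = sum(a.usable_tb() for a in arrays)
print("usable capacity (TB):", total_usable_tb)
print("volume intact:", volume_is_intact(arrays))
```

Forcing one failed disk per array back online gets each array under the one-failure limit again, but it says nothing about whether the data on those disks is still consistent.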

So I lost 14 TB of data because of some weird interaction between the 2 TB drives, the backplane, and the controller.

I returned the disks to my vendor, who started testing with the same type of chassis and the same type of controller, and guess what: the same crap, sudden OS reboots and other peculiar things.

After they spent a lot of time testing with different chassis (with and without an expander), they called me today to say that they have probably found the reason for this shitty behavior.

They disabled the "IntelliPark" feature of the HDDs, and according to their tests this miraculously solved the problem. I am still not terribly convinced, but it really looks like this is the cause.
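For anyone who wants to check whether head parking is running wild on their own drives, the SMART attribute to watch is Load_Cycle_Count (ID 193), which `smartctl -A` prints. Below is a small Python sketch that pulls the raw value out of that output. The sample text is made up for illustration and is not taken from the drives in this story. Behind a RAID controller, smartctl may also need a `-d` device-type option to reach the disks.

```python
from typing import Optional

# Illustrative sample of `smartctl -A` output (made-up values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       240
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1234
"""


def load_cycle_count(smartctl_output: str) -> Optional[int]:
    """Return the raw Load_Cycle_Count (attribute 193), or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "193":
            return int(fields[9])
    return None


print(load_cycle_count(SAMPLE))
```

Sampling this value a few hours apart gives a feel for how often the heads are being parked. A count that climbs quickly on an otherwise idle drive suggests the idle timer is firing constantly.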

If you take a look at this thread here, you will see that this stupid GREEN thing can do more harm than good:

http://chbits.blogspot.com/2009/07/fixing-wd-gps-drives-with-wdtler-and.html

Conclusion: for servers, DO NOT BUY GREEN HDDs, they suck. All was just fine with the non-green RE1, RE2, and RE3, and it looks like it will be fine with the RE4 too once head parking is disabled, which makes it so evil and non-green.


3 comments:

  1. We're experiencing the same issue with the WD 2TB RE4-GP drives in the SuperMicro SAS expander chassis with an LSI controller. How can we disable the IntelliPark feature? Can you point us to a place to get the utility to do it? I assume you used wdidle3.exe, but we can't locate it. Or, if your reseller knows how to do it, can you post their contact information?

  2. I was able to get a copy of wdidle3.exe from WD to disable IntelliPark. We're going to test it tomorrow. Thanks for your post on this subject.

  3. Update: I was able to run wdidle3 /d to set the IntelliPark timer to 63 minutes (the maximum allowed on this model) on 96 WD 2TB RE4-GP drives. It was a grueling task, since the utility only works through the motherboard SATA controller; it doesn't work on SAS cards or SAS expanders. But the effort paid off. I've so far completed 12 continuous hours of random writes. Before, that workload was reporting bad sectors, taking Linux's XFS file system offline, and causing controller resets and sense errors. Now, 12 hours in without IntelliPark kicking in, I haven't had a single media or sense error. I've only seen four LSI controller aborts with subcode 0x3000; I'll have to look that one up. Before, I would have had a few hundred errors by now. A few more days of I/O like this and I'll consider the problem solved. Note that I am also using other workarounds: WD jumper set to 1.5 Gb/s, Linux kernel lowered to 2.6.24.4, LSI firmware lowered by one step, and NCQ disabled on all targets. But at least the system is now stable, and in a few days it might finally be ready to put into production. Thanks for your post.
