Why RAID Cannot Cope With Large Disks: Rebuild Times
There is a fundamental shift going on in enterprise storage. At its core is the fact that hard disks have grown much larger without getting much faster. One part of the shift is the move to solid state storage, which has very different characteristics from spinning disks. But there is another shift associated with drives of ever-increasing capacity: the move away from RAID. For a long time we have used RAID to protect against the failure of hard disks, storing data redundantly across a small group of disks. When one disk fails we use a spare disk and rebuild the redundant data, either from a mirror copy in RAID1 or from the remaining data and parity in RAID5 or RAID6. A RAID1 or RAID5 array cannot cope with another disk failing during the rebuild; a RAID6 array, thanks to its dual parity, can cope with two concurrent disk failures.
Time is of the Essence
The fundamental problem is that it takes too long to fill the large disks we are now using just to regain redundancy. All of these RAID levels use a single disk to replace the failed disk, and all need to fill that one disk with data. The time it takes to recover from the failure cannot be less than the size of the disk divided by its sequential write speed. For a 72GB disk with an 80MBps write rate we get 72,000MB / 80MBps = 900 seconds, about 15 minutes. That was an acceptable rebuild time, and ten years ago, when we used 72GB disks, RAID was good. Today we use at least 1TB disks, and their sequential write rate has only gone up to around 115MBps. The same math gives 1,000,000MB / 115MBps ≈ 8,700 seconds, nearly two and a half hours. If you are using 4TB disks, your rebuild time will be at least ten hours.
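The lower-bound calculation above is simple enough to sketch as a few lines of Python, using the same figures the article uses (capacity in MB, sequential write speed in MBps):

```python
# Lower bound on RAID rebuild time: the spare disk must be filled end to end,
# so the rebuild can never finish faster than capacity / sequential write speed.
# The disk sizes and write speeds below are the article's example figures.

def min_rebuild_seconds(capacity_mb: float, write_mbps: float) -> float:
    """Best-case rebuild time in seconds, ignoring parity work and live IO."""
    return capacity_mb / write_mbps

for label, capacity_mb, write_mbps in [
    ("72GB disk @ 80MBps ", 72_000, 80),
    ("1TB disk  @ 115MBps", 1_000_000, 115),
    ("4TB disk  @ 115MBps", 4_000_000, 115),
]:
    secs = min_rebuild_seconds(capacity_mb, write_mbps)
    print(f"{label}: {secs:,.0f}s (~{secs / 3600:.1f} hours)")
```

Running this reproduces the article's numbers: 900 seconds for the 72GB disk, roughly 8,700 seconds for 1TB, and close to ten hours for 4TB. Remember these are floors, not estimates; the next section covers why real rebuilds take longer.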
But Wait… There’s More!
These are best-case rebuild times, where the limiting factor is the hard disk throughput and nothing else is using the disks. If you use RAID5 or RAID6, a parity calculation is required to rebuild redundancy; this can be hard work for the array if it isn't done in dedicated hardware, leading to a longer rebuild time, possibly much longer. If the array is still in use (it usually is), then the rebuild must compete with the normal IO operations placed on the array. If the array was heavily loaded before the disk failed, the rebuild will be very slow. Worst of all is the heavily loaded RAID5 or RAID6 array, where the parity calculation is needed both to rebuild the array and to serve real data to the servers that use it.
The worrying part of the rebuild is that a whole disk's worth of data must be read off the remaining disks in the array. This is rather more work than most arrays do in a day, so the chance of a second disk failure is much higher during the rebuild than in normal operation. A second disk failure on RAID1 or RAID5 will cause data loss; a second disk failure on RAID6 will remove redundancy and cause a second spare disk to be used for the rebuild.
Since the sequential write speeds of hard disks have not increased as fast as their capacity, the time to recover from a hard disk failure on a RAID array has become huge. To avoid a large risk to data we need to stop using RAID and move to new data layouts that better protect us from the failure of these large disks.