
Why RAID Cannot Cope With Large Disks: Rebuild Times
Alastair Cooke, vExpert

There is a fundamental shift going on in enterprise storage. At its core is the fact that hard disks haven’t gotten much faster as they have gotten a lot larger. One part of the shift is the move to solid state storage, which has very different characteristics to spinning disks. But there is another shift associated with drives of ever-increasing capacity: the move away from RAID. For a long time we have used RAID to protect against the failure of hard disks, storing data redundantly across a small group of disks. When one disk fails we use a spare disk and rebuild the redundant data, either from a mirror copy in RAID1 or from the remaining data and parity in RAID5 or RAID6. A RAID1 or RAID5 array cannot cope with another disk failing during the rebuild; a RAID6 array, thanks to its dual parity, can cope with two concurrent disk failures.

Time is of the Essence

The fundamental problem is that it takes too long to fill the large disks we are now using just to regain redundancy. All of these RAID levels use a single disk to replace a failed disk, and all need to fill that one disk with data. The time it takes to recover from the failed disk cannot be less than the size of the disk divided by its sequential write speed. For a 72GB disk with an 80MBps write rate we get 72,000MB / 80MBps = 900 seconds, about 15 minutes. That is an acceptable rebuild time, and ten years ago, when we used 72GB disks, RAID was good. Today we use disks of at least 1TB, and their sequential write rate has only risen to around 115MBps; the same math gives 1,000,000MB / 115MBps ≈ 8,700 seconds, which is nearly two and a half hours. If you are using 4TB disks then your rebuild time will be at least ten hours.
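To make the trend concrete, here is a minimal Python sketch of the same back-of-the-envelope calculation (the capacities and write speeds are the illustrative figures used above, not measurements):

```python
# Best-case rebuild time: disk capacity divided by sequential write speed.
# These figures mirror the examples above; real rebuilds are slower.

def min_rebuild_hours(capacity_gb, write_mbps):
    """Lower bound on the time to refill one replacement disk, in hours."""
    return capacity_gb * 1000 / write_mbps / 3600

for capacity_gb, write_mbps in ((72, 80), (1000, 115), (4000, 115)):
    hours = min_rebuild_hours(capacity_gb, write_mbps)
    print(f"{capacity_gb:>5} GB at {write_mbps} MBps: at least {hours:.2f} hours")
```

Even this lower bound, which ignores parity calculations and competing IO, grows linearly with capacity while write speeds barely move.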

But Wait… There’s More!

These are best case rebuild times, where the limiting factor is the hard disk throughput and nothing else is using the disks. If you use RAID5 or RAID6, a parity calculation is required to rebuild redundancy; this can be hard work for the array if it isn’t done in dedicated hardware, leading to a longer rebuild time, possibly much longer. If the array is still in use (it usually is) then the rebuild must compete with the normal IO operations placed on the array. If the array was heavily loaded before the disk failed, the rebuild will be very slow. Worst of all is the heavily loaded RAID5 or RAID6 array, where the parity calculation is required both to rebuild the array and to provide real data to the servers that use it.

The worrying part of the rebuild is that a whole disk’s worth of data needs to be read off the remaining disks in the array. This is rather more work than most arrays do in a day, so the chance of a second disk failure is much higher during the rebuild than in normal operation. A second disk failure on RAID1 or RAID5 will cause data loss; a second disk failure on RAID6 will remove redundancy and cause a second spare disk to be used for the rebuild.
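To give a feel for how the longer rebuild window alone widens that exposure, here is a hedged Python sketch. The 3% annualized failure rate and the assumption of independent, constant-rate failures are illustrative, and the extra read stress of a rebuild, which is the bigger factor described above, is not modelled at all, so the real-world risk is higher:

```python
import math

# Probability that at least one surviving disk fails during the rebuild
# window, assuming independent failures at a constant annualized failure
# rate (AFR). Illustrative only: correlated failures, unrecoverable read
# errors and rebuild stress are not modelled and all push the risk higher.

def p_second_failure(surviving_disks, rebuild_hours, afr=0.03):
    per_hour = -math.log(1 - afr) / (365 * 24)  # convert AFR to a per-hour rate
    return 1 - math.exp(-per_hour * surviving_disks * rebuild_hours)

# A 15-minute rebuild (72GB era) versus a ten-hour rebuild (4TB disks),
# each with four surviving disks in the group.
print(p_second_failure(4, 0.25))  # roughly 40x smaller than the 10-hour window
print(p_second_failure(4, 10))
```

The absolute numbers depend entirely on the assumed failure rate; the point is that the exposure window has grown by an order of magnitude or more.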

Summary

Since the sequential write speeds of hard disks have not increased as fast as their capacity, the time to recover from a hard disk failure on RAID arrays has become huge. To avoid a large risk to data we need to stop using RAID and move to new data layouts that better protect us from the failure of these large disks.

Comments

  1. Are there any new disk layouts you would recommend for redundancy which do not have these limitations?

    -ASB: http://XeeMe.com/AndrewBaker

    • My follow-up article on wide striped and chunked data layouts doesn’t appear to have been published yet.

  2. David Bandel

    Yeah.. would be nice if you could offer an alternative.

    And don’t say ZFS. Because ZFS is not better than RAID.

    I personally think RAID logic at the platter-level would alleviate a lot of these problems. This would require a re-engineering of the standard HDD to allow for removal and insertion of fresh platters.

    Or a switch back to single platter drives with vastly reduced thickness.

    Other than stuff like this I don’t see a solution to the problem of RAID rebuild times.

    Other than HDDs being held to a higher manufacturing standard, a breakthrough in write speeds, an agreement by users to halt use during rebuilds, and a minimum of dual parity (RAID6) with some implementations of triple and quadruple parity hardware RAID options.

    (Yes, n-parity for n>2 is perfectly feasible: the Galois field operations that provide the second parity layer are arbitrarily extensible, and in theory each successive parity tier should add no more calculation complexity than RAID6 adds over RAID5. Not to mention that the high degree of mutually dependent polynomials suggests there will be heavy optimizations for each successive degree of redundancy.)

  3. Yes, drop RAID altogether for a system with a dispersal algorithm and either replication or erasure coding. I will go even further and say drop the file system while you’re at it.

    Disk drives need to be used raw! No file systems to corrupt, limit performance or compromise storage efficiency. Inodes and b-trees waste space and lose efficiency as drives grow. Protection against data loss is done by replication or erasure coding, not RAID. Disks rebuild at the scale of the cluster, not the RAID set. The storage of the future is already here and it’s called object storage. Do you think Amazon and Google worry about RAID rebuild times? No, because they left RAID and filesystems behind a decade ago. Enterprises need to start thinking like cloud providers; storage as usual is barely cutting it today, and tomorrow it’s officially dead.

  4. H in OH

    Interesting article. Proposed solutions have generally dealt with large pools of storage flowing across collections of servers, both physical and virtual. What do you do about a small isolated system that needs a few protected TB of storage?

  5. Check out the RAID 2.0+ technology developed by Huawei.

    http://support.huawei.com/ecommunity/bbs/10241110.html

    They post a rebuild time of 30 minutes for 1TB.

  6. Alec Weder

    The biggest issue with RAID is unrecoverable read errors.
    If you lose a drive, the RAID has to read 100% of the remaining drives even if portions of them hold no data. If you get a read error during the rebuild, the entire array will die.

    http://www.enterprisestorageforum.com/storage-management/making-raid-work-into-the-future-1.html

    A UER on SATA of 1 in 10^14 bits read means a read failure every 12.5 terabytes. A 500GB drive has 0.04E14 bits, so in the worst case rebuilding that drive in a five-drive RAID-5 group means transferring 0.20E14 bits. This means there is a 20% probability of an unrecoverable error during the rebuild. Enterprise class disks are less prone to this problem:

    http://www.lucidti.com/zfs-checksums-add-reliability-to-nas-storage

    • Brian

      This is the point of maintenance. All major RAID card vendors recommend that you run weekly volume checks, which compare all data to all parity data and ensure that if a rebuild is needed your data is 100% sound.

      Obviously 2 disks could fail within that week, but with RAID 6 even that is okay.

      Some places run daily volume checks in off hours. I personally use weekly volume checks and daily SMART tests.

      I have expanded and rebuilt several RAID arrays, from various vendors, and as long as you keep an eye on your volume checks, you are fine.

  7. Bill Denison

    Your observation (quoted below) needs data to support it. While you think it is true, you have no numbers. I suspect it is a very, very low chance of happening. We have had it happen here one time in the last 10 years, and we have LOTS of good disk.

    “The worrying part of the rebuild is that a whole disk’s worth of data needs to be read off the remaining disks in the array. This is rather more work than most arrays do in a day, so the chance of a second disk failure is much higher during the rebuild than in normal operation.”
