Selected Readings in Computer Science: Bulletproof Storage

Author: 100test  Published: 2007-03-10 17:53:54
Source: 100Test.Com (百考试题网)


Bulletproof Storage

  Disk systems will repair themselves or can be left unrepaired for years.

  You can fly a two-engine plane with one engine, but how many passengers would want to be on it?

  That’s the idea behind “bulletproof storage,” a concept that IBM has been developing for two years and plans to begin unveiling incrementally over the next one to three years.

  IBM’s technology initiative deals with fault tolerance in every part of a storage system: disk, controller, network cards, power supplies and software. By building more-robust storage systems that can defer replacement of failed parts for up to three years because of redundant components, IBM believes it can also eliminate many human errors that happen when failing components are replaced.

  According to Stanley Zaffos, an analyst at Gartner Inc., the bulletproof storage concept still has another five to 10 years before it’s broadly embraced by users. But once it is, storage systems will require less maintenance and, therefore, cost less to maintain.

  “We know how to build very reliable code. We use appliances every day that have software built into them that work forever: your automobile, your calculator, the disk drive in your PC, your telephone,” Zaffos says.

  But IBM is looking to attack far more complex systems than telephones or calculators.

  Under its bulletproof initiative, IBM is addressing disk-sector failures, which become more likely as disk capacity grows. While disk capacities double every 12 to 18 months, the rate of uncorrectable read/write errors hasn’t improved, so the probability of hitting an uncorrectable error on any given read hasn’t decreased. There are more sectors on today’s disks and, therefore, a greater chance of an uncorrectable error somewhere on the disk.
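  The capacity argument can be made concrete with a rough calculation that is not from IBM: if every bit read fails independently with a fixed probability, the chance of at least one uncorrectable error during a full-disk read grows with the number of bits. The per-bit rate of 1 in 10^14 and the function name below are illustrative assumptions, not figures from the article.

```python
import math

# Illustrative assumption: one uncorrectable error per 10^14 bits read.
ASSUMED_BIT_ERROR_RATE = 1e-14


def full_read_error_probability(capacity_bytes: int,
                                bit_error_rate: float = ASSUMED_BIT_ERROR_RATE) -> float:
    """Chance of at least one uncorrectable error when reading every bit once.

    Computes 1 - (1 - p)^n, using log1p/expm1 so that a tiny p and a
    huge n do not lose precision.
    """
    bits = capacity_bytes * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for terabytes in (0.5, 1, 2, 4):
        p = full_read_error_probability(int(terabytes * 10**12))
        print(f"{terabytes:>4} TB: {p:.1%}")
```

  Under these assumptions, doubling the capacity roughly doubles the chance of hitting an error during a full read, as long as that chance stays small.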

  The answer is to create self-healing capabilities for storage management software and more-robust RAID configurations.

  IBM says that in about a year it will release storage systems that can survive three simultaneous disk-drive failures in a single array by adding parity disks to RAID configurations, offering many times the resiliency of a RAID configuration with two parity disks. Today, standard systems tolerate at most two disk failures.
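  For readers unfamiliar with parity, the simplest case can be sketched as follows. This is single XOR parity, which recovers one lost block; it is not IBM’s three-failure scheme, which would need a stronger erasure code that is not shown here. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(blocks: list, parity: bytes) -> bytes:
    """Rebuild the one block marked None from the survivors and the parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    # XOR of all survivors plus parity cancels everything except the lost block.
    return xor_blocks(survivors + [parity])


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_blocks(data)
    print(rebuild_missing([b"abcd", None, b"ijkl"], parity))
```

  With only one parity block, a second simultaneous failure leaves too little information to rebuild, which is why each extra tolerated failure needs an additional, differently computed parity block.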

  But Zaffos argues that 80% of downtime today is caused by user error and software failures, not hardware failures. He says that the failures resulting from software are created by complexity and that there is an almost infinite number of failures that can occur in a complex system.

  IBM is addressing those code failures with a software project called N-Version Programming, where two pieces of code in the same application save data and then compare the data to ensure that there are no errors.

  In N-Version Programming, two copies of data are protected using different means. One copy might be protected by standard RAID-5 programming coded by Programmer A.

  The second copy is protected by a different algorithm coded by Programmer B. That way, if the first copy gets corrupted due to a particular bug in the program written by Programmer A, then the second copy can be used.

  The second copy may have its own bugs, but they will manifest in different ways at different times, and when they do, the first copy will be the good one that you can use. It’s kind of like having a second person check the first person’s work and fix it whenever mistakes are found.
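  A toy sketch of this two-copy arrangement, assuming two independently written protection routines. CRC-32 and SHA-256 stand in here for the RAID-5 code and the alternative algorithm described above; all function names are hypothetical and none of this is IBM’s implementation.

```python
import hashlib
import zlib


def protect_a(data: bytes) -> tuple[bytes, int]:
    """Version A ("Programmer A"): guard the copy with a CRC-32 checksum."""
    return data, zlib.crc32(data)


def protect_b(data: bytes) -> tuple[bytes, str]:
    """Version B ("Programmer B"): guard the copy with a SHA-256 digest."""
    return data, hashlib.sha256(data).hexdigest()


def is_valid_a(copy: tuple[bytes, int]) -> bool:
    payload, checksum = copy
    return zlib.crc32(payload) == checksum


def is_valid_b(copy: tuple[bytes, str]) -> bool:
    payload, digest = copy
    return hashlib.sha256(payload).hexdigest() == digest


def read_with_repair(copy_a, copy_b):
    """Return the good payload plus freshly protected copies A and B."""
    a_ok, b_ok = is_valid_a(copy_a), is_valid_b(copy_b)
    if a_ok and b_ok and copy_a[0] != copy_b[0]:
        raise IOError("both copies verify but disagree")
    if a_ok:
        good = copy_a[0]
    elif b_ok:
        good = copy_b[0]
    else:
        raise IOError("both copies are corrupted")
    # Rewrite both copies from the good payload so the bad one is repaired.
    return good, protect_a(good), protect_b(good)
```

  The value of the scheme rests on the two versions failing independently; if both routines shared the same bug, comparing them would catch nothing.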

  One way IBM plans to detect and correct corrupted data is to create more-resilient storage software with repairable data structures. The code checks that certain conditions, which are described in rules, are met. For example, in a file system with multiple files, the sum of the space taken by the files plus the free space in the system must be equal to the total available space. The code will check this property automatically at various times and run a repair procedure if the property isn’t met.

  In this case, the software isn’t checking that the code is functioning properly, and it isn’t checking the contents of the data. It checks only properties of the data structures, and if those properties aren’t met, it knows how to fix the structures.
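  The file-system rule above can be sketched as an invariant check with a repair step. The `Volume` class and the repair policy (trust the per-file sizes and recompute the free-space counter) are assumptions made for illustration; the article does not say how IBM’s repair procedures decide which value to trust.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical, minimal model of a file system's space accounting."""
    total_space: int
    free_space: int
    files: dict[str, int] = field(default_factory=dict)  # file name -> size


def invariant_holds(vol: Volume) -> bool:
    """Rule: space used by all files plus free space equals total space."""
    return sum(vol.files.values()) + vol.free_space == vol.total_space


def check_and_repair(vol: Volume) -> bool:
    """Check the rule; return True if a repair was performed."""
    if invariant_holds(vol):
        return False
    used = sum(vol.files.values())
    if used > vol.total_space:
        # Assumed policy: this sketch cannot repair corrupt file sizes.
        raise ValueError("file sizes exceed total space; cannot repair")
    # Assumed policy: file sizes are trusted, the free-space counter is derived.
    vol.free_space = vol.total_space - used
    return True
```

  Note that the check never reads file contents or inspects the code that produced the bad counter; it looks only at the structural property, as the article describes.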

  But don’t expect to see fruit from N-Version Programming or checkable data structures for another two to three years.

