RAID in the Digital Darkroom
No, the FBI isn't coming to make sure you activated your copy of Photoshop. At least that's not what this article is about if they are. Instead, "RAID" stands for "Redundant Array of Inexpensive Disks". And it's something you should know about if you want to make sure your image files are safe and your system performs at its peak.
There once was a time when the only hard disks that existed were huge, washing-machine-sized contraptions attached to expensive mainframe computers. By today's standards, not only were they big, they didn't really hold very much. The early IBM 3380 models back in the 1980s held just 630 MB each. But they were reliable. They were also quite expensive.
Shortly after home computers started becoming available, home hard drives did too. My first hard drive held just twenty megabytes and was housed in a box the size of a large toaster. It cost a thousand dollars to boot, but at least that meant I was booting from a hard drive rather than a stack of floppies.
But as home hard drives became commodities and their costs started to drop, mainframe hard drives remained very much a niche market and were priced as such. By the late 80s, researchers at the University of California, Berkeley and IBM started looking at ways to tie together large numbers of commodity hard drives to form storage arrays that were capable of holding the quantities of information needed for business processing but were cheaper to construct than traditional mainframe disks. To compensate for the lower reliability of these drives (especially since they were using a lot of them), they came up with various means of incorporating redundant data that would allow for recovery if any one component failed. And thus RAID was born.
In addition to cost, RAID arrays had other inherent advantages. To increase their capacity, mainframe drives had become quite large. Even with platters spinning as fast as they did, it could take far longer for the read/write head to reach its intended target on such a large platter than was the case for smaller commodity hard drives. Simply because there was so much more of the darned thing to get around, it was harder for mainframe disk designers to compete with the access times achievable by commodity hardware. Good throughput rates were also problematic when a single head serviced a larger disk. The narrow channel that carried data to and from the read/write head often became a bottleneck for I/O-intensive operations. By using large numbers of independent drives, RAID could avoid this pitfall since each drive had its own read/write head.
Of all the systems they originally detailed, only a few have proven generally useful. These are designated by the terms RAID 0, RAID 1 and RAID 5.
RAID 0 isn't really RAID at all in the sense that it is not redundant. It does, however, address the throughput problem through the sheer number of drive heads involved. Data is broken up into chunks, with consecutive chunks being written to consecutive drives rather than one right after another on the same drive as would normally be the case. Because the data is "striped" across multiple drives, reading a single file inherently involves multiple read/write heads, each doing its part in a coordinated fashion to satisfy the request. But with no fault tolerance built in, the chance of catastrophic loss grows with the number of drives in the array, since the failure of any one drive takes the whole array down with it. RAID 0 is very useful for temporary data such as page files and Photoshop scratch disk placement. It is not recommended for files you need to keep. Certainly not for your best images.
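To make the striping idea concrete, here is a minimal Python sketch of how chunks might be dealt out across drives. It is an illustration only, not how any particular controller works: the function names, the tiny chunk size, and the use of in-memory lists to stand in for drives are all assumptions made for the example.

```python
def locate_chunk(chunk_index, num_drives):
    """Return (drive, row) for a logical chunk in a simple striped layout."""
    drive = chunk_index % num_drives   # consecutive chunks go to consecutive drives
    row = chunk_index // num_drives    # position of the chunk on that drive
    return drive, row


def stripe(data, num_drives, chunk_size):
    """Split data into chunks and deal them out across the drives."""
    drives = [[] for _ in range(num_drives)]  # each list stands in for one drive
    for offset in range(0, len(data), chunk_size):
        drive, _ = locate_chunk(offset // chunk_size, num_drives)
        drives[drive].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives):
    """Reassemble the original data by reading one row at a time from every drive."""
    chunks = []
    rows = max(len(drive) for drive in drives)
    for row in range(rows):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


striped = stripe(b"ABCDEFGHIJ", num_drives=3, chunk_size=2)
restored = unstripe(striped)
```

Note that every drive holds a piece of the file, so every drive has to be present to put it back together. That is exactly why losing one drive in a striped array loses everything.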
RAID 1 is generally referred to as "mirroring." It consists of two drives that each store a complete copy of the data that would normally sit on a single drive. Since both drives contain the same thing, the loss of either one won't take your files down the drain with it. Each write operation is done in parallel to both drives to maintain consistency. As such, there is no speed benefit for write access, but since reads can be satisfied from either drive, read access throughput can increase dramatically. RAID 1 is an excellent option for data you care about keeping, but it comes with the added cost of requiring twice as many drives as a non-RAID system would.
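A rough back-of-the-envelope calculation shows why striping and mirroring sit at opposite ends of the safety scale. The sketch below assumes each drive fails independently with the same probability over some period, and it ignores the time it takes to replace a failed mirror drive. Real drives don't behave this neatly, and the 5% figure is a made-up number chosen purely for illustration.

```python
def raid0_loss_probability(p, num_drives):
    """A striped array loses its data if ANY one drive fails."""
    return 1 - (1 - p) ** num_drives


def raid1_loss_probability(p, num_copies=2):
    """A mirror loses its data only if EVERY copy fails."""
    return p ** num_copies


p = 0.05  # hypothetical chance that a single drive fails during the period
for n in (1, 2, 4, 8):
    print(f"RAID 0 with {n} drive(s): {raid0_loss_probability(p, n):.4f}")
print(f"RAID 1 with 2 drives:   {raid1_loss_probability(p):.4f}")
```

Under these assumptions, every drive added to a stripe set pushes the risk up, while a mirror pushes it well below that of a single drive.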
RAID 5 involves striping data across multiple drives as is done in RAID 0, but adds one additional drive to hold what is termed "parity" data. Essentially, if you add up the value in a given bit across all the data drives in a RAID 5 array, you will end up with either an even or an odd number. This even/odd fact gives you the parity bit, which gets stored in the logically equivalent location on the parity drive. Obviously, if the parity drive were to fail you would still have all your original data, but if one of the data drives were to fail, its contents could be recovered by calculating each bit based on what would be needed to get the known parity bit for that stripe. In most RAID 5 configurations things get a bit more complicated, though, in that the parity information gets placed on different drives, rotating which drive holds parity for any given stripe in a "round robin" fashion. This keeps any single drive from becoming a bottleneck for parity updates and lets the system spread read operations across all drives in order to improve throughput. Write operations in RAID 5, though, are inherently expensive since every write also has to update the parity for its stripe, which typically means reading the old data and old parity before the new data and new parity can be written. RAID 5 used to be a good option since it provided fault tolerance while requiring only one additional drive rather than double the number of drives as in RAID 1. It's still very useful for long-term storage, but as drive prices have come down, RAID 1 is recommended for your active image files.
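The even/odd bookkeeping described above is the same thing as a bitwise exclusive-or (XOR), which makes it easy to try out. The following Python sketch builds a parity block for one stripe and then rebuilds a "failed" drive's block from the survivors. The block contents and function names are invented for the example, and the round-robin rule shown is just one simple possibility, since actual controllers use various layouts.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_drive_for_stripe(stripe_index, num_drives):
    """One simple round-robin rule for rotating parity (an assumption, not a standard)."""
    return stripe_index % num_drives


# One stripe spread over three data drives (made-up contents).
data_blocks = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data_blocks)

# Pretend the second drive died: rebuild its block from the survivors plus parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR-ing a value in twice cancels it out, combining the surviving blocks with the parity block leaves exactly the missing block. The same trick only works for one missing block per stripe, which is why RAID 5 tolerates a single drive failure and no more.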
There are also configurations with names derived from the basic RAID 0, RAID 1 and RAID 5 designations, including RAID 1+0 (also known as RAID 10), where two or more mirror sets are striped together, and RAID 6, which adds a second, independently calculated set of parity information to a RAID 5 style array so that it can survive the loss of any two drives. Some manufacturers have also come up with their own extensions to RAID, such as BeyondRAID from Data Robotics (Drobo), which, while proprietary in nature, seems quite promising in terms of performance.
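When comparing the standard levels, it helps to see how much usable space each one leaves you. The Python sketch below applies the usual rules of thumb for arrays built from identical drives. The function name and the way levels are labeled are choices made for this example, and proprietary schemes such as BeyondRAID are left out because their internals aren't described here.

```python
def usable_capacity(level, num_drives, drive_size_tb):
    """Approximate usable space, in TB, for an array of identical drives."""
    if level == "0" and num_drives >= 2:
        return num_drives * drive_size_tb            # no redundancy at all
    if level == "1" and num_drives == 2:
        return drive_size_tb                         # second drive is a full copy
    if level == "5" and num_drives >= 3:
        return (num_drives - 1) * drive_size_tb      # one drive's worth of parity
    if level == "6" and num_drives >= 4:
        return (num_drives - 2) * drive_size_tb      # two drives' worth of parity
    if level == "10" and num_drives >= 4 and num_drives % 2 == 0:
        return (num_drives // 2) * drive_size_tb     # half the drives are mirrors
    raise ValueError(f"unsupported combination: RAID {level} with {num_drives} drives")


# Example: what four 2 TB drives give you under each multi-drive level.
for level in ("0", "5", "6", "10"):
    print(f"RAID {level}: {usable_capacity(level, 4, 2)} TB usable")
```

The fewer drives you give up to redundancy, the more space you keep, which is the trade-off behind choosing between mirroring and parity in the first place.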
In the end, although the individual disks used in any of these may have been relatively inexpensive, RAID systems as a whole really weren't cheap for a long time, since you needed at least several drives plus the controller card, case, and other supporting components. Nonetheless, the acronym stuck even as the RAID industry association quietly tried to substitute the word "Independent" for "Inexpensive" in what the letters stood for. At this point, though, drive prices have plummeted to the point where RAID is finally living up to the inexpensive label. So if you are shopping for a new computer or a storage solution for your images, RAID can be well worth considering.