OS News

27 April 1998


Linux and RAID: The Present and the Future

By Donald Barnes


 
   

RAID (Redundant Array of Inexpensive Disks) is a scheme in which data is stored across several hard disks in one of a number of arrangements to allow for more speed, more protection against disk failure, or both. The advantages of RAID make it an important part of server hardware, but its application on Linux systems presents a few problems.

Intro to RAID

RAID levels vary, but the most common are 0, 1, and 5. RAID level 0 offers no protection of data, but does increase performance as well as provide an increase in device size. In RAID 0 you use two or more disks and stripe data across them. This yields a virtual device available to the OS whose size is the sum of the sizes of the actual devices being used (provided they are the same size; otherwise the yield is the number of disks times the size of the smallest one). For example, using RAID 0 on two 2G drives will yield an available file system of 4G, with that file system striped across both disks. That means the data actually alternates between disks in chunks, so any reads or writes of data on the disks should alternate between disks regularly and thus provide a noticeable performance boost. This is because the most time-consuming part of a read operation is seek time and rotational latency. When you stripe data between two or more disks you can be reading data from one disk while the other is seeking/spinning into position.
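
To make the chunk arithmetic concrete, here is a small Python sketch (purely illustrative; the 64K chunk size and the two-disk layout are assumptions, not taken from any real driver) showing which disk a given logical chunk lands on and how the usable size of a two-disk stripe works out:

    # Toy RAID 0 illustration: consecutive chunks alternate between disks,
    # so sequential I/O keeps both drives busy at once.
    CHUNK_KB = 64                      # hypothetical chunk size
    disks = ["disk0", "disk1"]

    def locate(logical_chunk):
        """Return (disk, offset in KB on that disk) for a logical chunk."""
        disk = disks[logical_chunk % len(disks)]
        offset = (logical_chunk // len(disks)) * CHUNK_KB
        return disk, offset

    for chunk in range(4):
        print(chunk, locate(chunk))
    # 0 ('disk0', 0)
    # 1 ('disk1', 0)
    # 2 ('disk0', 64)
    # 3 ('disk1', 64)

    # Usable capacity is the number of disks times the smaller disk,
    # so two 2G drives give a 4G striped device.
    sizes_gb = [2, 2]
    print("usable:", len(sizes_gb) * min(sizes_gb), "G")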

RAID level 1 is the first level that offers protection. This level allows you to use two or more disks to "mirror" each other and thus data is reproduced in whole across more than one disk. If one disk fails, then the RAID system being used will simply ignore the failed disk and use only operable disks. RAID 1 space usage for two disks would be the same as the size of one of the disks (if they were the same size; otherwise the available space is exactly the same amount as the smaller of the two disks). RAID 1 performance for reads should be good because the RAID system can balance reading between all disks in the array. Write performance is poor, however, since all writes must occur on all disks. That's fine for many systems (especially file server type systems) since they are heavily read oriented (often as high as 95% of the time, in fact).
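
A similarly rough Python sketch of the mirroring idea (again just an illustration, not the actual kernel code): every write has to hit every working disk, but a read only needs one surviving copy.

    # Toy RAID 1 illustration: two mirrored "disks" stored as dictionaries.
    mirrors = [{}, {}]

    def write(block, data):
        for disk in mirrors:           # every write goes to every working disk
            if disk is not None:
                disk[block] = data

    def read(block):
        for disk in mirrors:           # any working disk can satisfy the read
            if disk is not None:
                return disk[block]
        raise IOError("all mirrors have failed")

    write(0, "important data")
    mirrors[0] = None                  # simulate losing the first disk
    print(read(0))                     # still prints 'important data'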

RAID level 5 is the best compromise in terms of both data availability and performance. RAID 5 allows you to create an array of at least three disks. The amount of usable space is roughly the size of one disk times the number of disks minus one. For example, the amount of usable space when using four 9G disks would be three times 9G, or 27G. There is a little extra overhead as well, but this is close. With RAID 5, data is stored in a striped format across all disks. What is a little different is that parity information is also striped along with the data. So, if any disk has lost its contents, it can be re-created using the data that is still available plus the parity. Since striping is used, performance is generally good. Read performance is good since reads are balanced across disks. Write performance is a bit slower than normal due to the parity generation and the extra data to write. Performance does drop considerably when the array loses a disk, though, as some data will have to be re-created by reading the remaining disks and regenerating it from the surviving data and parity information. The payoff is that all the data is still available, even though an entire disk may have died.
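
The parity trick is essentially an exclusive-or across the data chunks, which a short, purely illustrative Python sketch can show (the chunk contents and the four-disk layout are made up for the example):

    # Toy RAID 5 illustration: parity is the XOR of the data chunks, so any
    # single missing chunk can be rebuilt from the survivors plus the parity.
    def xor(chunks):
        out = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, byte in enumerate(chunk):
                out[i] ^= byte
        return bytes(out)

    d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"   # data chunks on three disks
    parity = xor([d0, d1, d2])               # parity chunk on the fourth disk

    rebuilt = xor([d0, d2, parity])          # pretend the disk holding d1 died
    assert rebuilt == d1                     # the lost chunk comes back intact

    # Capacity check: four 9G disks leave (4 - 1) * 9 = 27G of usable space.
    print((4 - 1) * 9, "G usable")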

RAID is an important and hot topic among those who run servers that are critical for some reason. One of the more common reasons that servers go down is hard disk failure. This can be one of the worst failures to fix since both hardware and data must be replaced. At best this is going to involve hours of down time while backups are restored. Even then, any data written between the last backup and the disk failure is lost. The best way to avoid this scenario is to not lose the data at all. Enter RAID. RAID allows you to use multiple disks in such a way that if one dies the rest can keep working without loss of data. An administrator is notified of the failure, and if hot swappable drives are used the administrator can actually repair the machine without ever having to take it off line. Even if hot swappable devices are not used, the down time is minimal since only the hardware has to be replaced and the system can be brought back on line immediately. RAID will then rebuild the data on the new drive in the background and the server will keep working fine.

There are three ways to do RAID on Intel PC hardware. The most common is the PCI SCSI RAID controller. The problem with these under Linux is that many are high end and require an NDA to get the programming information. These NDAs are prohibitive to free software because the source code cannot be released.

The next most common way to do RAID under Linux is through the use of a SCSI to SCSI RAID controller. This requires a supported SCSI controller (of which there are many). On that controller's bus is the RAID controller, which simply looks like one or more drives (depending on how you set up the array in the controller). The RAID controller then has its own SCSI bus(es) that are connected to the physical drives that comprise the array(s).

The least used but most promising way to do RAID is in software via the Linux kernel. You can use any supported block device (IDE disks, supported SCSI, etc.) to set up a RAID array. All RAID operations are handled by kernel threads. This will be available in full form in the 2.2 Linux kernel.

PCI Controllers

There are two PCI SCSI RAID controllers that are now available and supported under Linux, the DPT and the ICP-Vortex. Linux developers are working on support for the Mylex cards as well, but it hasn't been completed yet.

The DPT has been supported for a few years now, but the problem with it is that it requires DOS or SCO to run its configuration utility. There have been promises to port it to Linux, but it has yet to happen. DPT does support Linux by providing the information necessary, but the actual work is being done mostly by Michael Neuffer. Unfortunately he hasn't yet had time to port the StorageManager to Linux, but he does seem to think he will have time to do it in the not-so-distant future. The DPT does have some nice features including multiple channel controllers and caching. There is an audible alarm on the card as well as driver notification available to an external process. There is no monitoring software, but a simple program to monitor the RAID array and email an administrator would be easy to write.

The ICP-Vortex is fully supported by Linux and doesn't require any other OS for its configuration. All ICP-Vortex configuration happens at the BIOS level of the card. There is a Linux daemon that will alert the system administrator if there are problems with any of the disks in the array. ICP has several models available ranging from cards that can do only RAID 0 and 1 to multi-channel RAID 5 cards (and even Fiber Channel). The RAID 0/1 card can be upgraded via software to full RAID 5 capability as well if you later find the need for RAID 5. All cards have the ability to add cache via a 72 pin SIMM socket (both EDO and standard 50ns SIMMs work fine).

The ICP-Vortex is only supported as of 2.0.33, but the company appears very dedicated to supporting Linux. They wrote and maintain their own Linux driver and are becoming very active in the Linux community (they will be exhibiting at LinuxExpo this year).

SCSI to SCSI Controllers

The SCSI to SCSI solutions vary wildly. There are solutions from many companies including Mylex, CMD, and others. These solutions are usually external or require a full height slot in the case for the controller. The actual administration is done via an LCD panel with buttons on the front and/or via a terminal emulator and a serial port; some models have one or the other and some have both. You set up your array on the controller itself and then the controller presents the array to the OS as one logical drive (or more if that's how you set it up). The biggest drawback to these types of controllers is that they are usually on the expensive side.

The CMD controllers are the only ones I know of that offer Linux software to do the administration. They have several models available including one that is a dual redundant hot swappable controller. If one of your controllers fails you can replace it without bringing your server down at all! CMD controllers can also be upgraded to multiple channels with expansion cards.

Many companies produce SCSI to SCSI solutions. My own personal experience is only with the Mylex. The Food for the Hungry International server runs Linux on a Mylex DAC-960 SUI RAID controller and has performed extremely well (their server was built by Linux Hardware Solutions). They have been very happy with the performance and ease of use of their RAID solution.

Software RAID

The final solution is kernel software RAID. RAID 0 and 1 were introduced quite some time ago in the kernel, but now patches are available for the 2.0.x kernel that allow RAID 0-5. This will be a standard option in the 2.2 kernel.

Software RAID has proven to have a speed advantage over all of the hardware options that have been tested. It also has the advantage of allowing the use of any supported block device. That means that you can mix things like IDE and SCSI in one array, which is completely impossible with existing hardware solutions (I doubt you would want to, but it may at least be helpful in situations where you have lost a drive and the only available replacement is an IDE one). The main disadvantage appears to be that a server may need those CPU cycles for things other than calculating RAID parity.

The biggest advantage of software RAID is price. Hardware RAID controllers seem to range in price from about $500 to $5,000. For many systems it will likely be sufficient to use a single $200 supported SCSI controller (non RAID) and simply use the kernel's software RAID across multiple disks on that controller. Folks on a really tight budget might even want to go with RAID 5 across four similar IDE disks using the IDE controller built into today's motherboards. With 9G UDMA IDE drive prices at about $400, this is becoming a very interesting option. You can get about 27G of RAID 5 protected space for a grand total of $1600 if you already have a machine available with two free IDE controllers.
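
As a rough illustration of how an array like that would be described to the current raidtools, an /etc/raidtab along the following lines covers a four-disk RAID 5 set; the device names and chunk size are assumptions for the sake of the example rather than a tested configuration:

    # /etc/raidtab -- illustrative sketch only; adjust devices and chunk size
    raiddev /dev/md0
            raid-level              5
            nr-raid-disks           4
            persistent-superblock   1
            chunk-size              32
            device                  /dev/hda1
            raid-disk               0
            device                  /dev/hdb1
            raid-disk               1
            device                  /dev/hdc1
            raid-disk               2
            device                  /dev/hdd1
            raid-disk               3

Running mkraid on /dev/md0 then initializes the array, after which /dev/md0 can be given a file system with mke2fs and mounted like any single disk.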

Software RAID does have one current problem, though, and that is the ability to do full RAID for the root filesystem. This can be overcome by using a boot floppy or a small non-RAID boot partition to boot your system from. I don't consider this a real obstacle, though. Any server should have a floppy drive available anyway, and given that floppies are cheap and easy to replace should one of *those* die, they make perfect boot media for this. You can even keep extra copies of your boot floppy around if you are worried about them. Once the 2.2 kernel is available (and stable...), we hope to have RAID support at install time that allows you to set up a system using a boot floppy, small boot partition, or perhaps some other form of removable media (Zip drive, CD-ROM, etc).

Conclusion

To summarize, RAID is very much alive and well on Intel PC hardware. There are supported PCI RAID controllers that work well and more to come (Mylex is working with Linux developers to get their card supported). All SCSI to SCSI solutions should work very well, because they are OS independent. Software RAID is coming along very nicely and could be the best choice down the road. The performance is excellent and the low cost is a huge bonus.

I do have benchmark reports from various sources on many of these controller options, but I am not going to print them. No single source was able to compare all three of the major types of RAID controllers (PCI, SCSI to SCSI, software). That means that we have no real comparison of all types on similar hardware. It would be unfair to print numbers given that, so I won't.

I will discuss performance, though. Given my experience, I doubt you would see too much difference between a single channel PCI controller and a SCSI to SCSI solution. If you have performance problems on either one, you need to move to software RAID with multiple controllers or a multi-channel PCI controller. The best news might be that software RAID seems to be performing much better than any of the hardware solutions available. It is still not completely mature, but there are several folks using it in some mission critical places and it is working very well. Given the price and the fact that the performance is awesome, I look for many sites to be moving from hardware RAID to software RAID when the 2.2 kernel is released.

If you're looking for a vendor to supply Linux RAID solutions, look at http://www.redhat.com/redhat/hardware-list.phtml. Many of the vendors listed there can supply RAID ready Linux machines, including Linux Hardware Solutions and VA Research. Thanks to both of those companies for contributing data for this article.

If you know of other RAID solutions that work under Linux, please feel free to mail me and hopefully I'll be able to re-visit this topic in a future article.


Copyright © 1998 OS News