posted by Robert Escue on Wed 20th Sep 2006 17:45 UTC

"Solaris, Page 2/3"
Zones + ZFS + ZFS snapshots = Heaven

One of the first things I thought of for ZFS was creating a raidz filesystem and installing Non-Global Zones on it. So on a Blade 100 with 2 GB of memory, two 80 GB IDE disks, a dual-channel PCI SCSI card and a Sun StorEdge MultiPack with six 36 GB SCSI disks, I created a 102 GB ZFS raidz filesystem called Zones. I then created three Non-Global whole root Zones called sol10script, adllms and oracle-test. The sol10script Zone is used to test a shell script we are writing to lock down Solaris 10 machines after they are built. The adllms Zone is being used to test the feasibility of installing and running a particular Open Source Learning Management System on Solaris.
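The setup above amounts to a couple of commands; a minimal sketch follows, assuming placeholder disk names (the c1t*d0 devices are illustrative, and `create -b` gives the blank, whole-root zone configuration):

```shell
# Create a raidz pool named Zones from the MultiPack SCSI disks
# (device names are placeholders for this particular system).
zpool create Zones raidz c1t0d0 c1t1d0 c1t2d0

# Configure a whole root Non-Global Zone with its root on the pool.
zonecfg -z sol10script <<'EOF'
create -b
set zonepath=/Zones/sol10script
set autoboot=false
EOF
zoneadm -z sol10script install
zoneadm -z sol10script boot
```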

The oracle-test Zone is a little more ambitious: not only am I creating a Non-Global Zone, I am also mounting a second ZFS volume inside the Zone to install Oracle on. I first created a second raidz volume called u01 using three 36 GB disks and unmounted the filesystem. I then set the filesystem's mountpoint to legacy (a requirement for mounting a ZFS volume in a Non-Global Zone). In order to mount the Oracle DVD I created a loopback filesystem so the Non-Global Zone could use the Blade 100's DVD drive. I then made all of the standard modifications necessary to install Oracle 10g, with one exception: I made all of the changes to /etc/system in the Global Zone (whose settings are inherited by all Non-Global Zones). During the installation Oracle complained about not being able to read /etc/system, which I ignored. Once the installation was complete I started the Database Assistant, created a database and started it with no issues. I could also have used various Resource Controls to create a Container for Oracle, but considering the system I was running it on I chose not to.
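The legacy-mountpoint and loopback steps above can be sketched roughly as follows (disk names and the /cdrom path are assumptions, not taken from the article):

```shell
# Second raidz volume for Oracle; switch it to a legacy mountpoint
# so it can be handed to the zone (disk names are placeholders).
zpool create u01 raidz c2t0d0 c2t1d0 c2t2d0
zfs set mountpoint=legacy u01

# Add the ZFS filesystem and a loopback (lofs) mount of the
# DVD drive to the oracle-test zone's configuration.
zonecfg -z oracle-test <<'EOF'
add fs
set dir=/u01
set special=u01
set type=zfs
end
add fs
set dir=/cdrom
set special=/cdrom/cdrom0
set type=lofs
add options [ro,nodevices]
end
EOF
```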

To test ZFS snapshots I first created a snapshot of the sol10script Zone's filesystem, ran the lockdown script, and noted the changes it made to the Zone and any complaints while the script ran. Once I knew what needed to be fixed in the script I rolled back the snapshot and rebooted the Zone. The end result was that the Zone was restored to the state it was in before I ran the script. I was able to accomplish the same thing with the u01 filesystem for the oracle-test Zone: I deleted it, restored it from a snapshot and started my Oracle database as if nothing had happened.
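The snapshot-and-rollback cycle is two commands, as the article says. A sketch, assuming the zone root sits on its own dataset under the Zones pool (the snapshot label is arbitrary):

```shell
# Snapshot the zone's filesystem before running the lockdown script.
zfs snapshot Zones/sol10script@pre-lockdown

# ... run the lockdown script inside the zone, note what it breaks ...

# Roll the filesystem back and reboot the zone to pick up the old state.
zfs rollback Zones/sol10script@pre-lockdown
zoneadm -z sol10script reboot
```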

ZFS and JumpStart

Another use I had in mind for ZFS was creating a volume to store JumpStart configurations and Solaris Flash images of systems as part of a disaster recovery plan. In my initial testing, installing Solaris 10 6/06 went off without any issues; however, this was not the case with any other release of Solaris. The setup_install_server script for 6/06 recognizes ZFS, while the scripts for all other releases do not. So you have the choice of either using the setup_install_server script from the 6/06 Release or setting your ZFS volume's mountpoint to legacy so that previous Solaris Releases can see the mountpoint.
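The legacy-mountpoint workaround looks roughly like this (pool, dataset and target path are illustrative, not from the article):

```shell
# Create a dataset for the install images, then give it a legacy
# mountpoint so the pre-6/06 setup_install_server scripts, which
# only understand traditionally mounted filesystems, can use it.
zfs create Zones/jumpstart
zfs set mountpoint=legacy Zones/jumpstart
mount -F zfs Zones/jumpstart /export/install

# Populate the image from the older release's media as usual.
cd /cdrom/cdrom0/Solaris_10/Tools
./setup_install_server /export/install
```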

The potential for this is huge: imagine a single machine running an entire environment on a ZFS filesystem. Or, if you are more than a little paranoid, break it up into several systems with shared storage. The ability to restore a filesystem (or a "complete" system) with two commands cannot be ignored, especially for development environments or patching (all of the DBAs I know copy the oracle directory in case the patching goes south). With almost instantaneous recovery and the ability to clone Zones almost at will, ZFS makes an extremely welcome addition to Solaris 10.

SATA framework, a possible cure for the Solaris IDE woes

SPARC or x86 systems that use IDE disks have always been at a disadvantage, due either to the chipset support on most SPARC systems (prior to the Blade 100) or to the abysmal I/O of a Solaris machine with IDE disks, where maxphys is 57 kb on x86 or 131 kb on SPARC. The only tunables for IDE drives are the dma-enabled property and the blocking factor in the ata.conf file, and those are limited to x86 only. To further frustrate users, you cannot tune maxphys on an IDE system at all, since the ATA driver does not map to the sd (SCSI) driver. Although the dma-enabled property improves disk performance considerably, the system is still hampered by the maxphys of either platform. That all changes with the SATA framework included as part of Solaris 10 6/06. From the /boot/solaris/devicedb/master file, here are the SATA controllers supported by Solaris 10 6/06:

pci1095,3112 pci-ide msd pci ata.bef "Silicon Image 3112 SATA Controller"
pci1095,3114 pci-ide msd pci ata.bef "Silicon Image 3114 SATA Controller"
pci1095,3512 pci-ide msd pci ata.bef "Silicon Image 3512 SATA Controller"
pci1000,50 pci1000,50 msd pci none "LSI Logic 1064 SAS/SATA HBA"
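For comparison, the legacy x86 tuning mentioned above lived in the ata driver's configuration file; a rough sketch follows (property names as commonly documented for the Solaris x86 ata driver, values purely illustrative):

```
# /kernel/drv/ata.conf (x86 only) -- the pre-SATA-framework tunables:
# enable DMA and set the blocking factor (sectors per transfer).
ata-dma-enabled=1;
drive0_block_factor=0x10;
drive1_block_factor=0x10;
```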

The problem I have with the Silicon Image controllers supported by Solaris 10 x86 is that none of them support 3.0 Gbit/sec drives, while the LSI Logic 1064 does. Your average enthusiast is far more likely to buy a motherboard or controller card with a Silicon Image chip than an LSI Logic card that costs considerably more than most motherboards. My Gateway system has an onboard nVidia MCP51 SATA controller, which is recognized but not used; the output of /usr/X11/bin/scanpci -v is below:

 pci bus 0x0000 cardnum 0x0e function 0x00: vendor 0x10de device 0x0266
nVidia Corporation MCP51 Serial ATA Controller
CardVendor 0x105b card 0x0ca8 (Foxconn International, Inc., Card unknown)
STATUS 0x00b0 COMMAND 0x0007
CLASS 0x01 0x01 0x85 REVISION 0xa1
BIST 0x00 HEADER 0x00 LATENCY 0x00 CACHE 0x00
BASE0 0x000009f1 addr 0x000009f0 I/O
BASE1 0x00000bf1 addr 0x00000bf0 I/O
BASE2 0x00000971 addr 0x00000970 I/O
BASE3 0x00000b71 addr 0x00000b70 I/O
BASE4 0x0000e001 addr 0x0000e000 I/O
BASE5 0xfebfd000 addr 0xfebfd000 MEM
MAX_LAT 0x01 MIN_GNT 0x03 INT_PIN 0x01 INT_LINE 0x0b
BYTE_0 0x5b BYTE_1 0x10 BYTE_2 0xa8 BYTE_3 0x0c

Despite the fact that my system is not supported per se, the performance of the root disk with no tuning is quite good. I used the test described in Richard McDougall's weblog entry titled "Tuning for Maximum Sequential I/O Bandwidth"; an example of the output is below:

                       extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
  239.0    0.0 61184.1     0.0  0.0  1.0    0.0    4.1   0  99 c2d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0t0d1
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0t0d2
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0t0d3
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c1t0d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 amanet:vold(pid932)
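McDougall's test is essentially a large sequential read from the raw device while watching iostat; something along these lines (the device path and sizes are placeholders for this system, not the exact commands from his entry):

```shell
# Sequential read from the raw (character) device of the root disk;
# 1 MB reads, 1000 of them. The device path is a placeholder.
dd if=/dev/rdsk/c2d0s0 of=/dev/null bs=1024k count=1000 &

# In another terminal, watch per-device throughput (the kr/s column).
iostat -xn 5
```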
