Linked by moondevil on Wed 11th Jan 2012 00:10 UTC
Windows The latest blog entry from Steven Sinofsky about Windows 8 describes the Storage Spaces functionality. From the blog entry it seems Windows 8 is getting something ZFS-like. Storage Spaces can be created on the command line via PowerShell, or in the Control Panel for those who prefer a more mouse-friendly interface.
Thread beginning with comment 503374
RE[4]: OSNews RTFA and comments
by hechacker1 on Sat 14th Jan 2012 02:22 UTC in reply to "RE[3]: OSNews RTFA and comments"

Without over-provisioning at the beginning, you are forced to run utilities (even in Linux) to:

1. Expand the volume onto a new disk.

2. Allow the underlying storage pool to recalculate and distribute parity for the new disk (which affects the entire pool; see the sketch after this list).

3. Resize the file system on top with a tool. This can be fast, but it can also be slow and risky depending on the utility. NTFS generally resizes up well. Still, it's an operation where you probably want a backup before doing it.
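To give a rough feel for why step 2 touches the entire pool, here's a toy Python sketch (my own simplification, not md or Storage Spaces code) of a rotating-parity layout. Note how adding a data disk changes where existing blocks live, so parity has to be redistributed as everything gets reshuffled:

from functools import reduce

def slot(block, data_disks):
    # Toy rotating-parity layout: map a logical block to (stripe, disk),
    # with the parity chunk rotating across data_disks + 1 physical disks.
    stripe, offset = divmod(block, data_disks)
    parity_disk = stripe % (data_disks + 1)
    disk = offset if offset < parity_disk else offset + 1
    return stripe, disk

def stripe_parity(chunks):
    # Parity is simply the XOR of every data chunk in the stripe.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

# Growing from 3 to 4 data disks moves almost every existing block, so the
# parity of every stripe has to be recomputed as the data is reshuffled.
for block in range(6):
    print(block, slot(block, 3), "->", slot(block, 4))

# e.g. recomputing parity for one reshuffled stripe of 4-byte chunks:
print(stripe_parity([b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"]))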

In contrast, with over-provisioning you don't have to do the above steps. It's handled automatically and from the start.

With regard to having an SSD as a backing device, it allows you to speed up parity in situations like RAID 5. The read-modify-write cycle is slow to perform on an HDD but fast on an SSD, especially in cases where small writes dominate the workload. An SSD allows for fast random r/w, and writes to the HDD pool can then happen as a serialized operation.
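For the curious, the read-modify-write cycle for a small RAID 5 write boils down to this (toy Python with made-up byte values, just to show the XOR math):

def raid5_small_write(old_data, old_parity, new_data):
    # Partial-stripe update: read the old data chunk and old parity (2 reads),
    # compute new_parity = old_parity XOR old_data XOR new_data,
    # then write the new data chunk and the new parity (2 writes).
    new_parity = bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))
    return new_data, new_parity

# On an HDD each of those four I/Os pays a seek plus rotational latency;
# on an SSD the same random accesses complete in well under a millisecond.
data, parity = raid5_small_write(b"\x0a\x0b", b"\x5c\x5d", b"\x0c\x0d")
print(data.hex(), parity.hex())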

Some RAIDs get around this problem with a volatile cache, but then you are risking data to minimize the parity performance hit. Using an SSD means the cache is non-volatile, and the journal can play back to finish the operation. I guess you could do it on a regular HDD, but you would still be measuring latency in tens of milliseconds instead of <1 ms. It's an order of magnitude difference.
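Roughly what I mean by journal playback, as a hypothetical sketch (the file path and record format are made up; this is not how Storage Spaces is documented to work):

import json, os

JOURNAL = "parity-journal.log"   # imagine this file living on the SSD

def log_intent(stripe, new_data, new_parity):
    # Append the intended stripe update to the journal and force it to stable
    # storage before touching the HDDs, so a crash mid-update is recoverable.
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"stripe": stripe,
                            "data": new_data.hex(),
                            "parity": new_parity.hex()}) + "\n")
        j.flush()
        os.fsync(j.fileno())

def replay(apply_to_hdds):
    # After a crash, re-apply every journaled update; replaying a record that
    # already completed is harmless because the writes are idempotent.
    if not os.path.exists(JOURNAL):
        return
    with open(JOURNAL) as j:
        for line in j:
            rec = json.loads(line)
            apply_to_hdds(rec["stripe"], bytes.fromhex(rec["data"]),
                          bytes.fromhex(rec["parity"]))
    os.remove(JOURNAL)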

It's all theoretical at this point, but Microsoft briefly mentioned having an SSD would be faster. We'll have to wait for more details.

Edited 2012-01-14 02:26 UTC


Alfman Member since:
2011-01-28

hechacker1,

"1. Expand the volume onto a new disk."

No different than with over-provisioning.

"2. Allow the underlying storage pool to recalculate and distribute parity for the new disk (which affects the entire pool)."

First, empty clusters don't technically need to be initialized at all, since they're going to be added to the free chains anyway.
Second, both the new data and parity tracks can be trivially initialized to zeros without computing any parity (quick sketch below).
Third, whether or not the initial pool was over-provisioned doesn't determine the need (or not) to compute parity for the new disk.
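To make the second point concrete, here's a quick toy Python check of the XOR property (nothing vendor-specific):

from functools import reduce

def xor_parity(chunks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

old_chunks = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]
zero_chunk = b"\x00\x00"

# Extending an existing stripe with a zeroed chunk leaves its parity untouched...
assert xor_parity(old_chunks + [zero_chunk]) == xor_parity(old_chunks)

# ...and a stripe made entirely of zeroed chunks has all-zero parity, so the new
# disk's data and parity tracks can simply be written as zeros with no math at all.
assert xor_parity([zero_chunk] * 4) == zero_chunk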

"3. Resize the file system on top with a tool. This can be fast, but it can also be slow and risky depending on the utility. NTFS generally can resize up well. Still, it's an operation were you probably want a backup before doing."

Here we agree: the ability to resize safely and transparently depends totally on the tools and fs used. However, if these are the design goals around which we build our fs/tools, I don't believe initial over-provisioning is implicitly required to achieve them.

As far as safety is concerned, the only preparation absolutely necessary to complete a resize is the generation of a free cluster list for the new disk (which is trivial because it's initially empty) and maybe the creation of new inodes on the new disk. Then, in one instant, the newly prepared disk can be added to the disk mapping, and this change can even be managed by an existing journal for safety.
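In code terms, the kind of thing I'm picturing looks roughly like this (purely hypothetical Python, obviously not NTFS or Storage Spaces internals):

class Journal:
    # Stand-in for an existing metadata journal: log a record, then commit.
    def __init__(self):
        self.records = []
    def log(self, *record):
        self.records.append(record)
    def commit(self):
        pass  # a real implementation would fsync the journal device here

class Pool:
    def __init__(self, disks, journal):
        self.disks = list(disks)   # the current disk mapping
        self.free = {}             # per-disk free cluster lists
        self.journal = journal

    def add_disk(self, disk, cluster_count):
        # All the preparation happens off to the side: the new disk is empty,
        # so its free cluster list is simply "every cluster".
        new_free = list(range(cluster_count))
        # The only critical moment is publishing the new mapping, and that one
        # step can ride on the existing journal for crash safety.
        self.journal.log("add-disk", disk, cluster_count)
        self.disks.append(disk)
        self.free[disk] = new_free
        self.journal.commit()

pool = Pool(["disk0", "disk1"], Journal())
pool.add_disk("disk2", cluster_count=1_000_000)
print(pool.disks, len(pool.free["disk2"]))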

None of those tasks is inherently dependent upon initial over-provisioning. So I'm still wondering why MS would opt for an over-provisioned implementation. It occurs to me that they might be designing around software patents, in which case it makes more sense.


Edit: If you still think I'm missing something, well, that may be; let me know what it is.


"In contrast with over provisioning, you don't have to do the above steps. It's handled automatically and from the start."

Well, that's the thing I'm worried about. If Microsoft's implementation is dependent upon its initial over-provisioning, then that means a Windows disk pool will need to be rebuilt from scratch once its initial over-provision limit is reached. This is worse than an implementation which can be dynamically expanded without a static limit.


"With regards to having a SSD as a backing device, it allows you to speed up parity in situations like RAID 5..."

I agree with that, but we were talking about an external journal, in which case the performance of the journal device is almost certainly going to be faster (or at least no slower) than the primary disk, because all of its writes are linear. However, I didn't mean to get sidetracked by this.

Edited 2012-01-14 05:06 UTC


phoenix Member since:
2005-07-11

"Well, that's the thing I'm worried about. If Microsoft's implementation is dependent upon its initial over-provisioning, then that means a Windows disk pool will need to be rebuilt from scratch once its initial over-provision limit is reached. This is worse than an implementation which can be dynamically expanded without a static limit."


You are confusing two separate things: expanding a filesystem and expanding a storage pool. Probably because you are thinking in terms of a single system where all the storage is local.

Think bigger, like in terms of virtual machines running on a server, getting their storage from a storage pool.

The options are:
* create a storage pool of size X GB using all of the physical storage available; split that up into X/10 GB virtual disks to support 10 VMs.
* create a storage pool of size Y GB, 10x the size of the physical storage available; split that up into Y/10 GB virtual disks to support 10 VMs (meaning each VM in this setup has 10x the storage space of the VMs in the previous setup).

If you go with the first option, and stuff your VMs full of X/10 GB of data, then you run into a sticky situation. Now you have to add storage to the pool, expand the pool to use the new storage, expand the size of the virtual disks (usually done while the VM is off), then expand the disk partitions inside the VM, then expand the filesystem inside the VM. This leads to lots of downtime, and many places for this to go sideways.

If you go with the second option, your VMs already have disks 10x the size of the first option, even though they aren't using that much data, and you don't expect them to for a while. Now you stuff your VMs with X/10 GB of data, meaning the pool has run out of physical storage space. All you do is add physical storage, expand the pool to use the new storage, and carry on. That's it. The VMs never need to know how much actual storage space is in the pool, as they just see huge virtual disks. They still have free space in their virtual disks and filesystems. That saves you a lot of time, effort, and potential crashes.

Eventually, the VMs will get stuffed full of data to the point that they run out of disk space, and you have to resort to option 1 (expand pool, expand virtual disks, expand filesystems). But option 2 lets you push that way out into the future.
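A back-of-the-envelope sketch of the difference, with made-up numbers (plain Python, nothing vendor-specific):

PHYSICAL_GB = 1000     # what's actually installed in the server
VMS = 10

# Option 1: carve up only what physically exists.
thick_disk_gb = PHYSICAL_GB / VMS          # each VM sees a 100 GB disk

# Option 2: promise 10x the physical capacity up front (over-provisioned).
thin_disk_gb = (PHYSICAL_GB * 10) / VMS    # each VM sees a 1000 GB disk

def pool_action(used_per_vm_gb):
    # Only real usage matters to the pool; the VMs never see this number.
    used = used_per_vm_gb * VMS
    if used >= PHYSICAL_GB:
        return "add physical disks and grow the pool (VMs untouched)"
    return "nothing to do"

# Each VM eventually holds the original 100 GB worth of data:
print(pool_action(100))   # option 2 response: grow the pool, no resizing inside VMs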

Edited 2012-01-16 18:47 UTC
