Linked by theosib on Sun 14th Feb 2010 10:45 UTC
Linux

Recently, I bought a pair of those new Western Digital Caviar Green drives. These new drives represent a transitional point from 512-byte sectors to 4096-byte sectors. A number of articles have been published recently about this, explaining the benefits and some of the challenges that we'll be facing during this transition. Reportedly, Linux should unaffected by some of the pitfalls of this transition, but my own experimentation has shown that Linux is just as vulnerable to the potential performance impact as Windows XP. Despite this issue being known about for a long time, basic Linux tools for partitioning and formatting drives have not caught up.

Comment by Kroc
by Kroc on Sun 14th Feb 2010 10:48 UTC
Kroc
Member since:
2005-11-10

Thanks for the excellently prepared article, I didn’t have to do anything to it.

I really didn’t know about this issue, thanks for investigating and presenting it so clearly. Does this affect disks of a certain size (and above), or disks of any size that are manufactured to this setting? I certainly want to avoid this problem when replacing HDDs for WinXP machines.

Reply Score: 2

RE: Comment by Kroc
by aaronb on Sun 14th Feb 2010 12:16 UTC in reply to "Comment by Kroc"
aaronb Member since:
2005-07-06

Indeed an excellent article!

I think it would happen on all 4K-sector drives regardless of size. If you started at LBA 63, the drive would be forced to update two 4K sectors because the virtual sectors would lie across two physical sectors.

P=======P=======P=======
=V=======V=======V======
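To put numbers on the diagram, here is a minimal sketch in C that counts how many physical sectors a write touches. It assumes 512-byte logical sectors on top of 4096-byte physical sectors with no internal offset in the drive; the function name is made up for illustration.

```c
#include <stdint.h>

/* Logical (emulated) and physical sector sizes discussed in the thread. */
#define LOGICAL_SECTOR   512u
#define PHYSICAL_SECTOR 4096u

/* Number of physical sectors touched by a write of len_bytes (> 0)
 * that starts at logical sector lba. Hypothetical helper. */
static uint64_t physical_sectors_touched(uint64_t lba, uint64_t len_bytes)
{
    uint64_t first_byte = lba * LOGICAL_SECTOR;
    uint64_t last_byte  = first_byte + len_bytes - 1;
    return last_byte / PHYSICAL_SECTOR - first_byte / PHYSICAL_SECTOR + 1;
}
```

A 4096-byte write starting at LBA 63 touches two physical sectors, while the same write starting at LBA 64 touches one.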

Edited 2010-02-14 12:20 UTC

Reply Score: 2

RE[2]: Comment by Kroc
by Kroc on Sun 14th Feb 2010 12:27 UTC in reply to "RE: Comment by Kroc"
Kroc Member since:
2005-11-10

Reading the AnandTech article, it appears intended for 1TB+ drives, but as you say, it could apply to drives of any size if WD decided to adopt it across the board.

Reply Score: 1

RE[2]: Comment by Kroc
by f0dder on Sun 14th Feb 2010 14:09 UTC in reply to "RE: Comment by Kroc"
f0dder Member since:
2009-08-05

Hm, how are LBA addresses specified exactly?

I thought it was specified as multiples of drive sector size, in which case addressing via LBA should always be aligned - but is it in reality always a multiple of 512?

Is there a difference between how LBAs in the MBR are interpreted, and how ATAPI interprets LBAs?

Reply Score: 1

RE[3]: Comment by Kroc
by f0dder on Sun 14th Feb 2010 14:21 UTC in reply to "RE[2]: Comment by Kroc"
f0dder Member since:
2009-08-05

...but perhaps the 4k-sector WD drives are currently running in 512b-sector emulation mode? That would at least explain the "LBA weirdness" ;)

Reply Score: 1

RE: Comment by Kroc
by theosib on Sun 14th Feb 2010 15:06 UTC in reply to "Comment by Kroc"
theosib Member since:
2006-03-02

My drives are 1TB, but I'm sure it'll affect any 4K-sector drive the same.

Reply Score: 2

v RE: Comment by Kroc
by chris_l on Sun 14th Feb 2010 15:48 UTC in reply to "Comment by Kroc"
RE[2]: Comment by Kroc
by r00kie on Sun 14th Feb 2010 17:14 UTC in reply to "RE: Comment by Kroc"
r00kie Member since:
2009-12-10

As if gparted (which is just a pretty GUI + parted) can't screw things up too. fdisk and the like are for people who know what they are doing.

In case you don't know there are distros that have text mode installs and use utilities like fdisk/cfdisk. Just because you can use gparted it doesn't mean you have to.

Reply Score: 1

RE[2]: Comment by Kroc
by FishB8 on Sun 14th Feb 2010 18:02 UTC in reply to "RE: Comment by Kroc"
FishB8 Member since:
2006-01-16

I use it all the time. Furthermore I actually use gdisk so that I can create hybrid MBR-GPT partitions. Parted (and gparted) totally trashes hybrid MBR-GPT setups.

Reply Score: 2

RE[2]: Comment by Kroc
by Quag7 on Mon 15th Feb 2010 15:29 UTC in reply to "RE: Comment by Kroc"
Quag7 Member since:
2005-07-28

I've never used anything but fdisk, and never had any bugs or problems with it. But then I run Gentoo and don't use the installer.

Reply Score: 2

fdisk and DOS-compatible partitions
by Lennie on Sun 14th Feb 2010 11:06 UTC
Lennie
Member since:
2007-09-22

It's not so much Linux, it's the DOS-compatible partition that fdisk creates.

If you don't need DOS compatibility, you wouldn't have a problem.

It's a DOS/Windows-compatibility thing you are trying to attribute to Linux. DOS/Windows has a problem, Linux just tries to be compatible.

As you said, parted does it just fine.

Edited 2010-02-14 11:09 UTC

Reply Score: 11

bralkein Member since:
2006-12-20

The fdisk man page (on my machine at least) contains a lengthy warning about how fdisk will quite happily create some pretty dodgy partition layouts and it recommends parted for doing anything even remotely unusual. I guess this falls into that category.

All the major noob-friendly distros use gparted for doing the partition editing, don't they? Will that protect users from these kinds of problem, then?

Reply Score: 4

WereCatf Member since:
2006-02-15

All the major noob-friendly distros use gparted for doing the partition editing, don't they? Will that protect users from these kinds of problem, then?

I do consider Mandriva to be pretty newbie-friendly all in all; it's clear, consistent, and provides an extensive selection of documentation and loads of online help if needed. Also, it's really stable and has an excellent control center utility.

But alas, Mandriva doesn't actually use gparted. They use some sort of a tool of their own which apparently uses libparted as its backend. As far as I know quite a few distros actually do it that way. But as the article states, you seemingly have to use "--align optimal" option which does the right thing. It doesn't automatically align the partitions properly without that. And I have no idea if those custom partitioning tools employed by various distros pass such an option to libparted. If they don't then that'll be a very important issue to fix immediately.

I'd actually prefer if distros rolled out an update of some sort which will check the currently installed system and its partitioning scheme and warn if they are misaligned and would provide a way of fixing it; not everyone re-installs their system all the time and as such could be using misaligned partitions for years before next re-install.

Reply Score: 3

Lennie Member since:
2007-09-22

I think it will be mostly OK if they use it on new installs from now on; these WDs are not widely available on the market yet (and when you do buy them, there is a BIG warning on the front).

Reply Score: 2

Lennie Member since:
2007-09-22

gparted gets it right:

"When enabled, Round to cylinders aligns partition boundaries on the cylinder boundaries for the disk device. Enabled is the default setting."

The Ubuntu installer uses Partman from Debian, which uses parted in the background.

So that's a start.

Whether parted will do it right when run from partman, I don't know yet.

Edited 2010-02-14 13:19 UTC

Reply Score: 2

Lennie Member since:
2007-09-22

I just had a look at what partman does and what parted does. Parted does align by default as well, just like gparted. And partman just passes the MB's to parted, if I looked in the right places. That means it will do the right thing by default I think.

Reply Score: 2

theosib Member since:
2006-03-02

Aligning partitions to "cylinder" boundaries is BAD. The cylinders are fake and they're in units of 63 sectors.
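A quick way to see why cylinder alignment hurts is to check where cylinder boundaries land relative to 4 KiB boundaries. This is only a sketch: it assumes the common fake geometry of 255 heads and 63 sectors per track (the head count is my assumption, not something stated above), 512-byte logical sectors, and a drive with no internal offset. The helper names are illustrative.

```c
#include <stdint.h>

#define SECTORS_PER_TRACK 63u   /* legacy BIOS limit */
#define HEADS            255u   /* assumption: common fake geometry */
#define SECTORS_PER_4K     8u   /* 4096 / 512 */

/* First LBA of a given "cylinder" under the fake geometry. */
static uint64_t cylinder_start_lba(uint64_t cylinder)
{
    return cylinder * HEADS * SECTORS_PER_TRACK;
}

/* Nonzero if the LBA falls on a 4 KiB physical-sector boundary. */
static int is_4k_aligned(uint64_t lba)
{
    return lba % SECTORS_PER_4K == 0;
}
```

Under these assumptions a cylinder is 16065 sectors, which is one more than a multiple of eight, so only every eighth cylinder boundary happens to be 4 KiB aligned.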

Reply Score: 2

Lennie Member since:
2007-09-22

You could be right.

I guess my brain is off because it's the weekend.

Reply Score: 2

bralkein Member since:
2006-12-20

Yeah, Mandriva would certainly fall within the category of distros which should sort all of this stuff out automatically without the user having to worry. Distros like Arch, Gentoo & Slackware all generally expect their users to be aware of the technical issues. But for the mainstream distros this does need to be fixed.

Maybe we could take a look at our respective distros and file a bug report if there could be an issue.

Reply Score: 2

modmans2ndcoming Member since:
2005-11-09

Mandriva was the first distro to make partitioning the hard drive mortal friendly.

Mandrake Linux 7.0 was when they first released it and I recall thinking "Holy crap, this thing needs to be sold separately"

I went so far as to use the install disk up to the partition step to repartition my hard drives for a while.

Reply Score: 2

akajeff Member since:
2010-02-14

So, since you're using Linux, wouldn't it behoove you to use the GPT (GUID Partition Table) scheme which handles, by design, the new block size?

ref: http://en.wikipedia.org/wiki/GUID_Partition_Table

Reply Score: 2

bralkein Member since:
2006-12-20

Well apart from anything, fdisk doesn't support GPT. Parted does, but as he said, you can get parted to automatically solve the problem anyway.

The problem as TFA describes it is that a bunch of the tools & tutorials out there today will give you bad results if you use them with the new block size. They will probably give you bad results if you use them with GPT, too.

Edited 2010-02-14 17:57 UTC

Reply Score: 2

Brendan Member since:
2005-11-16

Hi,

It's a DOS/Windows-compatibility thing you are trying to attribute to Linux. DOS/Windows has a problem, Linux just tries to be compatible.


No.

Very old disk drives used "CHS" (Cylinders, Heads, Sectors) instead of LBA. Due to limitations this didn't work for drives larger than about 500 MiB, so the industry shifted to LBA; and created a CHS->LBA translation scheme.

Due to BIOS limits, this CHS->LBA translation scheme usually uses "63 sectors per track", which is the highest number of sectors per track that the old BIOS disk interface can handle.

For performance reasons OSs make partitions that start/end on track boundaries (having a few sectors at the start or end of a partition that are on a track by themselves causes more disk head movement).
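For anyone wondering where the magic number 63 comes from, here is the classic CHS to LBA formula as a small C sketch. The 255-head value used in the example is an assumption for illustration (a commonly used translated geometry), and the function name is made up.

```c
#include <stdint.h>

/* Classic CHS -> LBA translation. Cylinders and heads are 0-based,
 * sector numbers are 1-based. Hypothetical helper. */
static uint64_t chs_to_lba(uint32_t c, uint32_t h, uint32_t s,
                           uint32_t heads, uint32_t sectors_per_track)
{
    return ((uint64_t)c * heads + h) * sectors_per_track + (s - 1);
}
```

With 63 sectors per track, the start of the second track (cylinder 0, head 1, sector 1) is LBA 63, which is where a track-aligned first partition ends up.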

Basically what I'm saying is that the problem wasn't caused by *any* OS. The problem is caused by 30 years of backward compatibility (and the lack of foresight, from BIOS, disk and OS designers).

The ironic part is that the original IBM design supported floppy disks and hard drives with different sector sizes. It's unfortunate that this aspect of the original design was lost, and unfortunate that these new hard drives need to emulate 512-byte sectors to begin with.

-Brendan

Reply Score: 11

Sorry to nitpick
by s-peter on Sun 14th Feb 2010 11:06 UTC
s-peter
Member since:
2006-01-29

Thanks for the interesting and informative article! Should be careful when installing my next pair of drives...

Just one minor observation: can you actually have a "230% performance loss?" That sounds as if the performance of the system turned negative... I think it would be clearer to say 230% overhead (in operation time) or 70% performance loss (in average throughput).
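The conversion between the two figures is simple enough to sketch in C. The function name is mine, and the overhead is expressed as a fraction (2.3 for 230%).

```c
/* Fraction of throughput lost when an operation takes
 * (1 + overhead) times as long as the baseline. Hypothetical helper. */
static double throughput_loss(double overhead)
{
    return 1.0 - 1.0 / (1.0 + overhead);
}
```

An overhead of 2.3 means the operation takes 3.3 times as long, so throughput drops to about 30% of the baseline, a loss of roughly 70%.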

Reply Score: 2

Discussion on util-linux-ng mailing-list
by Idefix on Sun 14th Feb 2010 11:10 UTC
Idefix
Member since:
2010-02-14

There recently was a discussion about this on the util-linux-ng mailing list:
http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/2926

Reply Score: 2

Try with .32 ?
by jaymzh on Sun 14th Feb 2010 13:13 UTC
jaymzh
Member since:
2010-02-14

The posts on the fdisk list seem to imply that the version of fdisk you are using will do the right thing provided you're using a .32 kernel that can properly report the disk topology. Could you test with that?

Here's the post I'm referring to:

http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/2926

Reply Score: 1

RE: Try with .32 ?
by malxau on Sun 14th Feb 2010 14:17 UTC in reply to "Try with .32 ?"
malxau Member since:
2005-12-04

The posts on the fdisk list seem to imply that the version of fdisk you are using will do the right thing provided you're using a .32 kernel that can properly report the disk topology. Could you test with that?

Here's the post I'm referring to:

http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/2926


Reporting disk topology requires the hardware to also communicate that topology. AFAIK many of these drives do not. Christoph Zimmermann summarized the same observation in the thread you point to:

Alexander and I have come to the same conclusion on this topic:
- it is a must that the partitions are aligned correctly to 4KiB
boundaries. else the drive is unusably slow.

- the drive does NOT report its physical sector size. so doing proper
programming is not enough.


It looks like the discussion in that thread is about aligning partitions by default on a sufficiently large granularity to "get by" (as Vista and above do). Note that other technologies (e.g. SSDs, RAID) benefit from large alignment (larger than 512 bytes or 4 KB).

Reply Score: 1

RE: Try with .32 ?
by theosib on Sun 14th Feb 2010 15:02 UTC in reply to "Try with .32 ?"
theosib Member since:
2006-03-02

I was using a 2.6.32 kernel. It clearly did not "do the right thing."

Reply Score: 2

geez
by sj87 on Sun 14th Feb 2010 13:24 UTC
sj87
Member since:
2007-12-16

I did battle with this issue just this morning. Using parted, I had to manually configure the partitions so that both their start sectors and their lengths are multiples of eight.
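The rounding involved is trivial, but for reference here is a minimal C sketch. It assumes 512-byte logical sectors, so eight of them make up one 4 KiB physical sector; the helper names are illustrative.

```c
#include <stdint.h>

#define SECTORS_PER_4K 8u  /* eight 512-byte logical sectors per 4 KiB */

/* Round a start sector up to the next multiple of eight. */
static uint64_t align_up(uint64_t lba)
{
    return (lba + SECTORS_PER_4K - 1) / SECTORS_PER_4K * SECTORS_PER_4K;
}

/* Round a length in sectors down to a multiple of eight. */
static uint64_t align_down(uint64_t sectors)
{
    return sectors / SECTORS_PER_4K * SECTORS_PER_4K;
}
```

A requested start of 63 becomes 64, and a length of 100 sectors becomes 96.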

I didn't actually suffer from a major read/write performance loss; only some random reads made the drive go crazy and very slow. Write speed topped out at 70 MB/s, and after aligning the partitions it has gone up to 80 MB/s.

It has been suggested that WD might internally offset block addresses by 1 so that LBA 63 maps to LBA 64. -- I performed a test that demonstrates that WD has not done this


You're both wrong and wrong. It's a hack mode that can be enabled by connecting pins 7-8 on the HDD. But of course it's not enabled by default, because it would totally screw up every other OS.

Edited 2010-02-14 13:25 UTC

Reply Score: 3

RE: geez
by soulrebel123 on Sun 14th Feb 2010 17:28 UTC in reply to "geez"
soulrebel123 Member since:
2009-05-13

I think I might have the same problem. What model are you talking about?
mine is WD15EADS.

Perhaps it's not only Linux that isn't ready. Even WD support isn't ready.

Reply Score: 1

RE[2]: geez
by sj87 on Mon 15th Feb 2010 14:10 UTC in reply to "RE: geez"
sj87 Member since:
2007-12-16

I think I might have the same problem. What model are you talking about?
mine is WD15EADS.


Currently the only "Advanced Format" disks (as WD calls it) are the Green models ending in "EARS". WD15EADS is of the previous generation and therefore not affected.

The Advanced Format disks are called WD10EARS, WD15EARS and WD20EARS.

Hm, how are LBA addresses specified exactly?

I thought it was specified as multiples of drive sector size, in which case addressing via LBA should always be aligned - but is it in reality always a multiple of 512?


I read that every major HDD manufacturer has agreed to using a 512-byte-emulation mode until the end of 2014.

I also read that there is a way for software to ask the disk about its real physical layout but that Western Digital hasn't implemented such a feature in its current line of 4K disks. Therefore no software can detect them and take care to stay aligned.

Edited 2010-02-15 14:11 UTC

Reply Score: 1

RE[3]: geez
by soulrebel123 on Tue 16th Feb 2010 17:57 UTC in reply to "RE[2]: geez"
soulrebel123 Member since:
2009-05-13

I know it is not supposed to be an Advanced Format drive, but running the code at the end of the article does show performance differences with alignments other than 0 and 8. (I ran it on a partition starting at sector 64.)

Also the disk freezes a lot when doing I/O.

I am sending it back.

Reply Score: 1

RE[3]: geez
by kjmph on Wed 17th Feb 2010 18:12 UTC in reply to "RE[2]: geez"
kjmph Member since:
2009-07-17

Where did you get this info? If you read WD's site:

http://www.wdc.com/en/products/products.asp?driveid=773

Formatted Capacity 2,000,398 MB
Capacity 2 TB
Interface SATA 3 Gb/s
User Sectors Per Drive 3,907,029,168

That's 512 byte sectors, model # WD20EARS
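The arithmetic behind that reading of the spec sheet can be checked with a few lines of C. This is just a sketch with a made-up function name; the sector count is the one quoted above.

```c
#include <stdint.h>

/* Capacity in bytes for a given sector count and sector size.
 * Hypothetical helper. */
static uint64_t capacity_bytes(uint64_t sectors, uint32_t sector_size)
{
    return sectors * sector_size;
}
```

3,907,029,168 sectors of 512 bytes come to 2,000,398,934,016 bytes, which matches the listed 2,000,398 MB, whereas 4096-byte sectors would imply about 16 TB. So the published count is in 512-byte units, which is what one would expect from a native 512-byte drive and equally from a 4K drive that presents emulated 512-byte sectors.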

Reply Score: 1

'Parted Magic' uses gparted and parted
by slysir on Sun 14th Feb 2010 13:29 UTC
slysir
Member since:
2010-02-14

'Parted Magic' uses core programs of GParted and Parted to handle partitioning tasks with ease, while featuring other useful programs (e.g. Partimage, TestDisk, Truecrypt, G4L, SuperGrubDisk, ddrescue, etc...).

If you ever used PartitionMagic with windows, 'Parted Magic' is a superior linux partitioning tool that you can use from a cd, usb or load it from its own directory on the drive.

http://partedmagic.com/

Reply Score: 1

fdisk options
by r00kie on Sun 14th Feb 2010 13:46 UTC
r00kie
Member since:
2009-12-10

I don't know if you have tried it or not, but it would be nice to know if fdisk -b 4096 /dev/sd? makes any difference. Also, it would be good to know whether other similar utilities like cfdisk create unaligned partitions.

Reply Score: 1

v Please, oh please???
by arkeo on Sun 14th Feb 2010 14:52 UTC
RE: Please, oh please???
by theosib on Sun 14th Feb 2010 15:05 UTC in reply to "Please, oh please???"
theosib Member since:
2006-03-02

I made a mistake by leaving "be" out of my sentence. I think you're smart enough to make the appropriate inference. This was not intended to be a professional publication, so I didn't bother editing it 20 times and having it proof-read by lots of other people.

Reply Score: 6

RE: Please, oh please???
by siride on Sun 14th Feb 2010 15:18 UTC in reply to "Please, oh please???"
siride Member since:
2006-01-02

Aside from forgetting a word ("should /be/ unaffected"), I see no problems with what he wrote.

Reply Score: 3

RE: Please, oh please???
by rockwell on Sun 14th Feb 2010 15:55 UTC in reply to "Please, oh please???"
rockwell Member since:
2005-09-13

wow, he forgot one word, and therefore he's incapable of communicating in English?

Please, drink a fifth of scotch and calm the hell down.

Reply Score: 7

RE: Please, oh please???
by BluenoseJake on Sun 14th Feb 2010 17:29 UTC in reply to "Please, oh please???"
BluenoseJake Member since:
2005-08-11

That seemed pretty understandable to me, perhaps you need to bone up on your reading comprehension.

Reply Score: 5

RE: Please, oh please???
by arkeo on Mon 15th Feb 2010 16:23 UTC in reply to "Please, oh please???"
arkeo Member since:
2008-04-21

I would like to apologize to you all.

I was angry for other reasons.

I just lost my job.

I love OSNews, and I hate grammar mistakes on the front page. A little more care by the authors, that's all I'm asking for.

Cheers...

Reply Score: 1

Alter your output buffer size
by pagerc on Sun 14th Feb 2010 15:12 UTC
pagerc
Member since:
2009-10-27

So using cp is about as braindead as rm -rf /* for testing disk I/O. It's all about the block size that's read/written, which in the case of cp is 1 character at a time. Something like dd or tar would provide a better metric for streaming writes: tar -cpf - some_path/ | tar -xpf - -C /path/to/final/destination

Or you can use dd which allows you to slice and dice and adjust block sizes trivially, then you can write to a raw block device and see what it can do sans filesystem crap.

An interesting test would use variable block sizes of 512, 768, 1024, 2048, 4096, 8192, 16384, which will show an odd output block size at 768, and the performance of 1 and 2 bit shifts above and below the new block size. Just to show how brain dead a block size of 1 is, I am throwing that in here too.

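# WARNING: destructive; this overwrites the start of /dev/sdc.
for BS in 1 512 768 1024 2048 4096 8192 16384 ; do
    for SKIP in 0 1 2 4 8; do
        # seek= is counted in blocks of size bs, not in 512-byte sectors
        dd if=/dev/zero of=/dev/sdc bs=${BS} seek=${SKIP} count=1024k
    done
done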

Reply Score: 0

RE: Alter your output buffer size
by siride on Sun 14th Feb 2010 15:20 UTC in reply to "Alter your output buffer size"
siride Member since:
2006-01-02

Just make sure to do that on a disk whose data you don't care about. I only say this in case some naive user decides to test their primary hard-drive in this fashion and ends up destroying the first X megabytes of the drive (including bootloader and partition tables).

Reply Score: 2

RE: Alter your output buffer size
by smashIt on Sun 14th Feb 2010 15:22 UTC in reply to "Alter your output buffer size"
smashIt Member since:
2005-07-06

So using cp is about as braindead as rm -rf /* for testing disk I/O. It's all about the block size that's read/written, which in the case of cp is 1 character at a time.


it's the duty of the blockdevice-driver and/or the filesystem to collect several such manipulations before writing them to the disk

Reply Score: 2

RE: Alter your output buffer size
by legume42 on Mon 15th Feb 2010 07:41 UTC in reply to "Alter your output buffer size"
legume42 Member since:
2006-08-30

Well, no. cp doesn't do 1 char at a time, it tries to minimize io. It might not be clear from the code though. Even stdio keeps an internal io buffer you can't normally see.

A quick truss/strace on recent FreeBSD/CentOS/Solaris shows 64k buffers/4k buffers/mmap the whole darned file approaches.

Reply Score: 1

is this something partitioning fixes?
by graigsmith on Sun 14th Feb 2010 15:14 UTC
graigsmith
Member since:
2006-04-05

you could get around this type of issue by partitioning your drive right?

Reply Score: 2

Cyphase
Member since:
2010-02-14

I just got a WD15EARS (1.5 TB, SATA 3 Gb/s, 64 MB Cache) a couple of days ago. I formatted it as ext3 and have filled it ~90% at ~20MB/sec. Is there any way (please-oh-please-oh-please) that I can fix this in place, i.e. without having to reformat? It must be technically possible..

Reply Score: 1

GNU Parted
by lemur2 on Mon 15th Feb 2010 02:19 UTC
lemur2
Member since:
2007-02-17

I have two drives, /dev/sdc and /dev/sdd, both identical Green drives. I partitioned them as follows:

For /dev/sdd, I used fdisk to add a Linux (0x83) primary partition, taking up the whole disk, using fdisk defaults. By default, the partition starts at LBA 63.

For /dev/sdc, I used fdisk the same as with sdd, but after creating the partition, I realigned it. I did this by entering expert mode ("x"), then setting the start sector ("b") to 64.


http://www.gnu.org/software/parted/faq.shtml

Does GNU Parted support physical sector sizes not equal to 512?
Starting from 1.7, GNU Parted will automatically align partitions to the physical sector size reported by an ATAPI-compliant drive.


Surely you should be using parted to partition drives, and not fdisk?

Reply Score: 2

RE: GNU Parted
by theosib on Mon 15th Feb 2010 04:27 UTC in reply to "GNU Parted"
theosib Member since:
2006-03-02

This drive does not in any way report its physical sector size. It says its sectors are 512 and that's it.

Reply Score: 2

RE[2]: GNU Parted
by darknexus on Mon 15th Feb 2010 08:18 UTC in reply to "RE: GNU Parted"
darknexus Member since:
2008-07-15

Shouldn't WD correct that then? How is a disk tool supposed to partition a drive properly if the drive itself is reporting incorrect data? If the physical sectors are 4096, the drive should report 4096 shouldn't it?

Reply Score: 2

RE[3]: GNU Parted
by smitty on Mon 15th Feb 2010 09:28 UTC in reply to "RE[2]: GNU Parted"
smitty Member since:
2005-10-13

I believe the whole point is to emulate the 512b sectors so that legacy OS's will work. The drives can't query the OS and then modify how they report themselves depending on what is supported. They do provide a jumper so you can manually turn the legacy emulation on or off, but most people aren't going to mess with that.

Reply Score: 2

RE[3]: GNU Parted
by theosib on Mon 15th Feb 2010 15:41 UTC in reply to "RE[2]: GNU Parted"
theosib Member since:
2006-03-02

We have knowledge of the problem. We can deal with it, regardless of how the drive lies.

Reply Score: 2

RE[4]: GNU Parted
by darknexus on Tue 16th Feb 2010 00:13 UTC in reply to "RE[3]: GNU Parted"
darknexus Member since:
2008-07-15

Yes, it can be dealt with, yet at the same time it seems as though these drives should be able to report the correct geometry when queried properly. That would mean the partitioning tools would be aware of it from the start rather than users having to deal with the problem manually. Most people I know, even ones with good technical knowledge, wouldn't have known how to handle this one as they don't delve that deep into drive partitioning. For the sake of avoiding trouble whenever possible, the drive should report the geometry properly when queried by an OS that knows how to ask for the *real* geometry and not that ridiculous LBA compatibility hack we've had to live with for so long thanks to the BIOS and Windows.

Reply Score: 2

RE[2]: GNU Parted
by mkpetersen on Mon 15th Feb 2010 14:01 UTC in reply to "RE: GNU Parted"
mkpetersen Member since:
2010-02-15

First of all it's important to distinguish between logical block size which is used when sending commands to a device and the physical block size which is used by the device internally.

Linux has supported (SCSI) drives that present 4KB logical block sizes for a long time. For compatibility with legacy OSes, however, consumer-grade ATA drives with 4KB physical blocks continue to present a 512-byte logical block interface. The knob indicating that the drive has 4KB physical blocks is orthogonal to the logical block size reporting, allowing the information to be communicated without interfering with legacy OSes like XP that only know about 512-byte sectors.
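To see what a given drive actually reports, one option is to read the queue attributes the kernel exposes in sysfs. The following is a minimal C sketch, not a polished tool: the helper name is made up, and the device name in the paths below is only an example.

```c
#include <stdio.h>

/* Read one decimal value from a text file such as
 * /sys/block/<dev>/queue/physical_block_size.
 * Returns -1 if the file cannot be opened or parsed. Hypothetical helper. */
static long read_sysfs_value(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling it on /sys/block/sdc/queue/logical_block_size and /sys/block/sdc/queue/physical_block_size would show the two sizes side by side. A drive that misreports, like the one described in the article, would show 512 for both.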

We have worked closely with disk manufacturers for a long time to make sure we were ready. Western Digital have been instrumental in the ATA specification in terms of the alignment and physical block size parameters. The engineering sample drives I have received from WDC have all implemented the physical block size knobs correctly. Which makes it even more baffling that they end up shipping an advanced format drive that gets it wrong. I have no idea why they did that. The location of the block size information in IDENTIFY DEVICE is unlikely to be inspected by legacy systems, so I highly doubt it's a compatibility thing. Brown paper bag time for Western Digital...

It is true that the effects of this particular drive reporting incorrect information could have been mitigated by a 1MB default alignment. However, that would still have caused misalignment for other drives that come wired with 1-alignment to compensate for the legacy DOS sector 63 offset. So blindly aligning to 1MB won't cut it. Windows Vista/7 don't do that either. Like Linux, they compensate based upon what the drive reports.

Linux 2.6.31 and beyond will report device alignment and physical block size for all block devices. It is then up to the userland partitioning utilities etc. to adjust start offsets accordingly. You'll find that both parted and util-linux-ng have been updated to do this. And that modern fdisk will in fact align on a 1MB (+/- drive alignment) boundary by default.
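As a rough illustration of "1MB (+/- drive alignment)", the start sector can be derived from the granularity and the alignment offset the drive reports. This is a sketch under stated assumptions, not the actual fdisk or parted logic: the function name is made up, and the 3584-byte offset in the example is what I would assume a 1-aligned drive reports (seven logical sectors).

```c
#include <stdint.h>

/* Start LBA for a partition aligned to `granularity` bytes, shifted by the
 * drive's reported alignment offset in bytes. Hypothetical helper. */
static uint64_t aligned_start_lba(uint64_t granularity,
                                  uint64_t alignment_offset,
                                  uint32_t logical_block_size)
{
    return (granularity + alignment_offset) / logical_block_size;
}
```

With a 1 MiB granularity and no offset this gives LBA 2048. With the assumed 3584-byte offset it gives LBA 2055, which has the same remainder modulo eight as the legacy LBA 63 that such a drive is wired to favour.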

Caveat being that Fedora is the only community distribution I know of that's using the updated bits. I don't think all of them made it into Fedora 12 but I'm sure Fedora 13 will do the right thing.

So I encourage you to work with your distribution vendor to ensure they start shipping recent partition tooling.

Martin K. Petersen
Kernel Developer, Oracle Linux Engineering

Reply Score: 5

Fix <pre>/<code> please?
by coolvibe on Mon 15th Feb 2010 11:35 UTC
coolvibe
Member since:
2007-08-16

Because all the #include statements in the source code snippet are empty.

Reply Score: 1

TRU RND TEST
by pavlinux on Mon 15th Feb 2010 21:53 UTC
pavlinux
Member since:
2010-02-15

[code]
#define _LARGEFILE64_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LIMIT 1000

char buffer[4096];

int main(int argc, char *argv[]) {

    int fd, i, off;
    long bk[LIMIT];
    off64_t byte;

    /* optional argument: offset in 512-byte sectors */
    if (argc < 2) {
        off = 0;
    } else {
        off = atoi(argv[1]);
    }
    srandom(off);

    /* fill array of randoms */
    for (i = 0; i < LIMIT; i++) {
        bk[i] = random() % 2000000000;
    }

    bk[0] = 0;   /* first write goes to the beginning */
    off *= 512;  /* sectors -> bytes */
    off += 4096; /* stride: one 4 KiB block plus the offset */

    /* WARNING: destructive; overwrites data on the target device */
    fd = open("/dev/sds", O_RDWR | O_SYNC);
    printf("fd = %d\n", fd);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (i = 0; i < LIMIT; i++) {
        byte = (off64_t)bk[i] * off;
        lseek64(fd, byte, SEEK_SET);
        write(fd, buffer, 4096);
    }

    close(fd);

    return 0;

}
[/code]

Edited 2010-02-15 22:07 UTC

Reply Score: 1

proper sectors/track setting
by pietrek on Mon 15th Feb 2010 22:10 UTC
pietrek
Member since:
2010-02-15

Wouldn't simply setting proper sectors/tracks option do the job? Like it's described here: http://www.ocztechnologyforum.com/forum/showthread.php?48309-Partit...

Reply Score: 1

LVM does alignment?
by pauld on Tue 16th Feb 2010 13:46 UTC
pauld
Member since:
2006-02-24

I've heard that LVM does a bit of the alignment for you when creating a new logical volume. I'm not sure if that would help here, but if it does, it would argue for using logical volumes even in simple setups.

Reply Score: 1

the_olo
Member since:
2006-10-19

Hi!

In the util-linux-ng mailing thread that some commenters have already mentioned here (thread started by myself, BTW), I did a test similar to yours, only fully automated and using a ready-made benchmark named PostMark which is quite well suited for exposing this particular performance problem:

http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/2926/fo...

This benchmark script is able to automatically expose the optimal partition offset that offers the best performance.

In the same thread, you can read that the util-linux-ng guys have already committed a fix for this issue in fdisk:

http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/2926/fo...

What's left to be fixed now is parted:

http://parted.alioth.debian.org/cgi-bin/trac.cgi/ticket/251

Reply Score: 1

WHY didn't you use 4k blocksize?
by Tuxie on Tue 16th Feb 2010 16:12 UTC
Tuxie
Member since:
2009-04-22

Why didn't you use 4k blocksize? That should also make a big difference! Just supply "-b 4096" to mkfs.

Reply Score: 1

RE: WHY didn't you use 4k blocksize?
by sj87 on Tue 16th Feb 2010 19:53 UTC in reply to "WHY didn't you use 4k blocksize?"
sj87 Member since:
2007-12-16

Why didn't you use 4k blocksize? That should also make a big difference! Just supply "-b 4096" to mkfs.


4k blocksize is the default value, not something that needs to be configured manually.

Reply Score: 1

Great article.
by Tuishimi on Tue 16th Feb 2010 16:17 UTC
Tuishimi
Member since:
2005-07-06

Thank you!

Reply Score: 2

Is this true for GUID partition tables?
by twitterfire on Wed 17th Feb 2010 14:40 UTC
twitterfire
Member since:
2008-09-11

Does this issue affect both GPT and MBR partition tables?

Reply Score: 1