How to enable case sensitivity for NTFS support for folders

Thom Holwerda 2018-05-29 Windows 74 Comments

Although you can now run a number of Linux distros natively on Windows 10, this integration has been a little tricky when it comes to handling filename case, as Linux is case sensitive and Windows is not.
In order to overcome this limitation, starting with the Windows 10 April 2018 Update (version 1803), NTFS includes a new flag that you can enable on a per-folder basis allowing the file system to treat files and folders as case sensitive.
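For reference, the flag is toggled with the `fsutil.exe` tool that ships with Windows. Below is a minimal Python sketch that wraps the command. The helper names and the example folder are made up for illustration, and the call may need an elevated prompt.

```python
import subprocess
import sys


def case_sensitivity_command(folder, enable=True):
    """Build the fsutil command line that flips the per-folder flag."""
    state = "enable" if enable else "disable"
    return ["fsutil.exe", "file", "setCaseSensitiveInfo", folder, state]


def set_case_sensitive(folder, enable=True):
    """Run the command; only meaningful on Windows 10 version 1803 or later."""
    if sys.platform != "win32":
        raise OSError("fsutil.exe is only available on Windows")
    subprocess.run(case_sensitivity_command(folder, enable), check=True)


# Hypothetical folder, shown only to illustrate the resulting command line.
print(case_sensitivity_command(r"C:\projects\linux-src"))
```

The matching `fsutil.exe file queryCaseSensitiveInfo <path>` command reports the current state of a folder.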

I’m sure there are countless technical reasons as to why case sensitive is the preferred route to go, but is there a case to be made for case insensitivity being simpler and less confusing to use?

74 Comments

2018-05-29 11:53 pm
bartgrantham
If all you have are Western languages maybe, but “case” is not a universal language construct. It’s meaningless to do a case-insensitive compare of Arabic, for example.
Magical “do what I want, not what I say” behavior in a filesystem is already dangerous enough, having that behavior hinge on the character set is even worse.
Edit: I read too fast. You’re asking for good reasons for genuine case-insensitivity in file system operations. I can’t think of any.
Edited 2018-05-29 23:59 UTC

2018-05-30 12:29 am
Alfman verbose=1
bartgrantham,
If all you have are Western languages maybe, but “case” is not a universal language construct. It’s meaningless to do a case-insensitive compare of Arabic, for example.
I agree with your take on case sensitivity not being a universal construct. Also, case insensitive string comparison in Unicode (to say nothing of accents etc) is non-trivial whereas a binary comparison is absolutely trivial.
Edit: I read too fast. You’re asking for good reasons for genuine case-insensitivity in file system operations. I can’t think of any.
On the other hand, I generally feel that it is bad practice for multiple files to be in a directory differing only by letter case. Same thing with URLs for that matter. Prior to tab completion and GUIs, having to type file names in the right case (i.e. via command line or ftp, etc.) was a major annoyance of unix. “Do what I want and not what I say” would generally lean towards case insensitivity IMHO.
The overwhelming majority of the time I want my text searches to match both upper and lower case letters. I find it annoying that unix text editors use binary search by default. Maybe it’s just me?

2018-05-30 5:42 pm
acobar
I find it annoying that unix text editors use binary search by default.
One man’s trash is another man’s treasure.
I absolutely love the fact that I can search with or without case sensitivity on pretty much every editor I use under Linux as it allows me to more easily filter out noise results.
Anyway, on any decent editor in Linux, it is a click away and can be made default. By the way, the KDE file manager has an option to do the same.
Also, I have an old habit of naming files with some characteristics with case consideration. I love it because, on directories with many files, it helps sort them the way I like.
For example, all music tracks I have are always {artist}-{title}[-{differentiator}], with initials capitalized (and I move track number, album, genre and other information to ID3/metadata tags). For songs with lyrics files, I convert things to lower case. I do it because the name of the files never get too large and also because I can filter them very fast.
I do similar things in my projects.

2018-05-30 6:20 pm
Alfman verbose=1
acobar,
I absolutely love the fact that I can search with or without case sensitivity on pretty much every editor I use under Linux as it allows me to more easily filter out noise results.
Both searches have merit, but for me it’s just the default that sucks. I think case insensitivity is a better default for searching text: the risk of missing something that I meant to find is worse than the risk of matching something with the wrong case. If a search returns too much and “find next” is a hassle, only then would I switch to case sensitive search.
Anyway, on any decent editor in Linux, it is a click away and can be made default. By the way, the KDE file manager has an option to do the same.
Possibly. I mostly use command line tools from ssh server connections. I’ve already encountered other scenarios where changing the defaults helps (like fixing broken vim auto indentation), but it’s just a pain to have to do so over and over again across so many servers. I need a way to auto deploy my preferences when I connect, but that could confuse other users though, haha. “What the hell, I’ve done the same thing on these two servers and the results are different!”

2018-06-05 1:07 am
zima
I generally feel that it is bad practice for multiple files to be in a directory differing only by letter case. Same thing with URLs for that matter. Prior to tab completion and GUIs, having to type file names in the right case (i.e. via command line or ftp, etc.) was a major annoyance of unix. “Do what I want and not what I say” would generally lean towards case insensitivity IMHO.
The overwhelming majority of the time I want my text searches to match both upper and lower case letters. I find it annoying that unix text editors use binary search by default. Maybe it’s just me?
Not just you; I also think case insensitive is a better fit to most user interactions… But for example for some reason Rockbox (free software firmware/OS for mp3 players, rockbox.org ) is by default case sensitive when sorting files, possibly because devs are on Linux… luckily, in this case the behaviour can be toggled.

2018-05-30 5:07 am
l3v1
If all you have are Western languages maybe, but “case” is not a universal language construct. It’s meaningless to do a case-insensitive compare of Arabic, for example.
That’s no excuse for not having support for it in general. However, over the years, I’ve come to accept that complaining about Windows’ lack of support for lots of things doesn’t matter much. But sometimes I still get angry about dozens of small things (like this particular example) that would’ve made life on Windows easier. Yes, there’s some support for this right now, but come on, we had to wait until 2018 for this? They can add dozens of complicated features, but this was “planned” for 2018? It’s a “fun” OS where the support of such things rises to the level of news.

2018-05-30 12:13 am
FlyingJester
I believe the two main reasons are:
* Mac and Windows are both case-insensitive by default (lacking this and not using UFS in OS X, assuming that is still an option like it used to be), so it’s more “normal” to be case insensitive.
* Case doesn’t distinguish names: “Nicholas Cage” and “NICHOLAS CAGE” are clearly the same thing in meaning.
I’m not saying those are good reasons, but those are the reasons I hear the most.

2018-05-30 5:08 am
l3v1
Mac and Windows are both case-insensitive by default (lacking this and not using UFS in OS X, assuming that is still an option like it used to be), so it’s more “normal” to be case insensitive.
Or, you could just as well say both are really dumb for not supporting it from day 1.

2018-05-30 4:53 pm
FlyingJester
Mac always did support it, through UFS. I’ve had Panther installed using UFS in the past, and it worked fine. I’m sure there are applications out there that would not have worked, but I did not notice any.

2018-05-30 5:13 am
Kochise
Just some food for thought :
2006 https://www.hanselman.com/blog/SubversionCasesensitivityProblems.asp…
2011 https://superuser.com/questions/266110/how-do-you-make-windows-7-ful…
2014 https://github.com/owncloud/client/issues/1348
2014 https://github.com/syncthing/syncthing/issues/430
2014 https://stackoverflow.com/questions/20969987/dropbox-unicode-encodin…
2016 https://www.endpoint.com/blog/2016/01/07/file-names-same-except-for
2017 https://www.dropboxforum.com/t5/Syncing-and-uploads/Suddenly-have-ma…
Looks like an all-time problem. And you also have UTC to handle.

2018-05-30 6:13 am
galvanash
Case doesn’t distinguish names: “Nicholas Cage” and “NICHOLAS CAGE” are clearly the same thing in meaning.
That isn’t universally true… I know it’s a nitpick, but “kb” (kilobit) and “KB” (kilobyte) mean two entirely different things. There are probably other examples regarding abbreviations, although you can blame most of them on the metric system.
Or what about “Polish” and “polish”? Those are two entirely different words…
Just saying, there are actually a few corner cases where the case of letters completely changes the meaning of a word or phrase. Not many at all, and probably not enough to matter, but they do exist.
https://en.wikipedia.org/wiki/Capitonym
ps. As noted in link above, in German all nouns (not just proper nouns) are capitalized, so this is actually a very common thing to run into in German.
Edited 2018-05-30 06:22 UTC

2018-05-30 2:28 pm
Alfman verbose=1
galvanash,
That isn’t universally true… I know it’s a nitpick, but “kb” (kilobit) and “KB” (kilobyte) mean two entirely different things. There are probably other examples regarding abbreviations, although you can blame most of them on the metric system.
You should be right about kb and kB, but realistically we are so inconsistent in practice that assuming the author’s intended meaning based on letter case is not a sure thing. I don’t know if you realize it or not, but you took the liberty of arbitrarily flipping the case of the “K” to write “KB”, which is technically a wrong SI unit.
The pedantic interpretation of “mb” would be “milli-bits”, although it’s far more likely the author actually means megabytes and didn’t bother with letter casing. The incorrect use of letter cases is so prevalent that we can’t really make assumptions based on the case of a letter. We can blame the users, vendors, and even technical users for using the wrong case, but I think it’s futile and in hindsight it was a bad idea to have upper and lower case of “B” to represent bytes and bits.

2018-05-30 8:25 am
Lobotomik
“Begoña Fernández”, “Begoña Fernandez”, “BEGOÑA FERNANDEZ” and “BEGOÑA FERNÁNDEZ” are the same person too. Also “BEGONA FERNANDEZ”, which is very wrong but enforced, for example, by airlines. And rules start not being so obvious.
In Spanish, diacriticals are often omitted in upper case letters, though for the sake of correctness they shouldn’t be. It is very bad style to omit them from lower-case letters, but still it is sometimes done in the computer world because so many computer systems choke on them. And what do you do with the “ñ”? It is considered a real letter, not an “n” with a “~”. Oh, and sorts should always be both case and diacritical-insensitive (“ñ” is between “n” and “o”).
And that is just in Spanish, which is quite a simple case; surely Cestina, Hungarian or Viet make things vastly more complex. Not to talk about Arabic, or other languages with more than two cases. Maybe, in some, case can change meaning. Case insensitivity is a complex subject, not just a matter of subtracting 0x20 from each char.
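A small Python sketch of that last point, assuming a made-up `ascii_upper` helper that stands in for the "subtract 0x20" approach. It is only an illustration, not how any particular file system upper-cases names.

```python
def ascii_upper(text):
    """Upper-case by subtracting 0x20, which is only valid for a-z."""
    return "".join(
        chr(ord(ch) - 0x20) if "a" <= ch <= "z" else ch
        for ch in text
    )


name = "begoña fernández"

print(ascii_upper(name))  # ñ and á are left in lower case
print(name.upper())       # Unicode-aware: Ñ and Á are upper-cased
```

Two names that differ only in the case of "ñ" would therefore count as different files under the ASCII rule and as the same file under the Unicode-aware one.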

2018-05-30 1:31 pm
panzi
And in Unicode you can write letters like ö in 2 different ways: as NFD (2 codepoints) or NFC (1 codepoint). Are they the same file name? For crappy (=basically all) American and UK systems I have to write my last name as Panzenboeck instead of Panzenböck. Is that the same file name?
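The two forms can be compared with Python's standard `unicodedata` module. This is a minimal sketch; the variable names are illustrative.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "Panzenböck")
nfd = unicodedata.normalize("NFD", "Panzenböck")

print(len(nfc), len(nfd))  # the NFD form is one code point longer
print(nfc == nfd)          # a binary comparison sees two different names

# Normalizing both sides to the same form makes them compare equal.
same = unicodedata.normalize("NFC", nfd) == nfc
print(same)
```

A file system that compares names byte for byte would treat the two spellings as distinct files unless it normalizes them first.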

2018-06-01 10:50 pm
ajs124
There isn’t only the differentiation between normalizations (Normalformen, whatever), but also the fact that o != о (the second one is the Cyrillic letter). But that’s a whole different problem.

2018-05-30 12:18 am
evert
This makes life easier for me. Unison and Syncthing are great sync tools but this is one issue that was never really solved because of the underlying filesystem case handling.

2018-05-30 1:00 am
galvanash
I’m sure there are countless technical reasons as to why case sensitive is the preferred route to go, but is there a case to be made for case insensitivity being simpler and less confusing to use?
Simplest answer is it’s not always ideal to tie your file systems to a specific locale, and without doing that, case folding is virtually impossible (and you have to do case folding if you want a case-insensitive but case-preserving file system). Even when you DO tie to a specific locale, the rules are often complex and sometimes ambiguous.
You don’t always name your files, sometimes someone else did. For example, take these three words:
Maße
MASSE
Masse
In case-insensitive English, the first word is distinct from the others, but the last two are equal. However, in Swiss German, the first and third are equivalent, but in German the first and second are (but not the third)…
It’s easy to say “just use the user’s locale”, but that often leads to naming issues between their files and files that come from 3rd parties. This problem doesn’t happen in case-sensitive file systems, because all 3 of the above are distinct.
Of course this isn’t the most common problem in the world, but it IS a problem that case-sensitive file systems handily avoid.
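How much the answer depends on the rule chosen can be seen with Python's built-in string methods, which implement Unicode's locale-independent mappings. That is just one possible policy, used here purely as an illustration.

```python
words = ["Maße", "MASSE", "Masse"]

for word in words:
    # lower() keeps ß, upper() expands it to SS, casefold() maps it to ss
    print(word, word.lower(), word.upper(), word.casefold())

# Under Unicode default case folding all three names collide.
folded = {word.casefold() for word in words}
print(folded)

# With a plain lower() comparison the first word stays distinct.
lowered = {word.lower() for word in words}
print(lowered)
```

Two reasonable-looking comparison rules disagree on whether these are one file or two, which is the ambiguity a case-sensitive file system never has to resolve.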
I personally don’t find case-sensitive file systems confusing, but I understand the argument…
Edited 2018-05-30 01:11 UTC

2018-05-30 7:11 am
Drumhellar
Too bad I commented before I voted your comment up.
I actually came here to argue in favor of case-insensitivity, but your post completely changed my mind. My opinion was definitely Western-centric, and specifically English-centric.
So, thanks for that.

2018-05-30 1:54 pm
mkone
I think that is a good argument for case preserving file naming, but not for case sensitivity.
By all means, filesystems should preserve the case that the user chose, but I think the case for case sensitive filesystems is really a case of the tail wagging the dog. Case sensitivity causes more problems than it solves in my opinion.

2018-05-30 2:51 pm
Alfman verbose=1
mkone,
I think that is a good argument for case preserving file naming, but not for case sensitivity.
By all means, filesystems should preserve the case that the user chose, but I think the case for case sensitive filesystems is really a case of the tail wagging the dog. Case sensitivity causes more problems than it solves in my opinion.
+1 for bringing this up. Case preserving is obviously very useful, yet it doesn’t imply case sensitivity. I find it bad practice to store multiple files differing only in letter case. It creates confusion and doesn’t seem like something I’d want to do intentionally. The main problem is that case insensitivity can get complex in other languages. A file system that supports Unicode suddenly has to deal with that. It makes it hard to find an ideal solution.

2018-05-30 3:19 pm
galvanash
I think that is a good argument for case preserving file naming, but not for case sensitivity.
The point is that a case-insensitive/case-preserving file system must make a choice as to which locale it wants to function in, because at that point case folding becomes a mandatory operation just in order to store a file by name. Having file naming and uniqueness rules be different for different users would be a nightmare.
Case-sensitive file systems don’t have to deal with this. They can take locale into account for things like searching and sorting, but those things are in a sense UI operations, so the file system can just present them to each user according to their chosen locale (and operate in a case-sensitive manner under the hood). It’s in the fundamental process of naming things where this becomes a challenge to deal with across locales, and case-sensitive file systems can happily not bother.
Case-insensitive/case-preserving file systems have to function in a system locale, because naming things (and the requisite case folding required) has impact beyond the current user on a multi-user system, and it is an operation that has to be performed even when there is no user other than the system itself (on behalf of a process for example).
What happens when a user uploads a file to a web server? Do you know what locale they operate in? Do you know what locale the user who will next look at the file operates in? Sure, you can just name the file whatever you want when stored and ignore or override the uploader’s filename on store (and that is what most systems actually do), but I’m just pointing out the challenges involved for case-insensitive file systems. All a case-sensitive file system has to do is see if a file with the exact same name exists, it doesn’t have to deal with locale until later (if it chooses to)…
Again, I’m not trying to make a strong argument either way. I’m just pointing out there are factors involved most people don’t think about.
Edited 2018-05-30 15:22 UTC

2018-05-30 5:35 pm
Alfman verbose=1
galvanash,
The point is that a case-insensitive/case-preserving file system must make a choice as to which locale it wants to function in, because at that point case folding becomes a mandatory operation just in order to store a file by name. Having file naming and uniqueness rules be different for different users would be a nightmare.
Correct me if I’m wrong, but isn’t it set once for the whole file system? I don’t think setting code page per user would be viable.
Case-sensitive file systems don’t have to deal with this. They can take locale into account for things like searching and sorting, but those things are in a sense UI operations, so the file system can just present them to each user according to their chosen locale (and operate in a case-sensitive manner under the hood).
I think you are making many good points, but at the same time you are overlooking some technical issues. When the file system semantics don’t match the UI, trivial file operations become much less efficient.
With a case sensitive file system and a case insensitive UI, it becomes impossible to open/access/refer to files by name without a full scan. For example: mydocs/myfile.txt could be MyDocs/MyFile.txt or MyDocs/Myfile.TXT or MYDOCS/MYFILE.TXT, etc. In Windows, this doesn’t matter at all because the file system is already normalized and will match any case. However, with a case sensitive file system you have to resort to scanning entire directory structures for each component to find potential matches. For large directories and paths, this can result in significantly more disk IO and scanning overhead.
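A rough Python sketch of what that per-component scan looks like in user space. The function name is hypothetical and the temporary directory only exists to make the example self-contained.

```python
import os
import tempfile


def find_case_insensitive(root, relative_path):
    """Resolve relative_path under root ignoring case, one scan per component."""
    current = root
    for component in relative_path.split("/"):
        wanted = component.casefold()
        match = None
        # No index can help here: entries are read and compared one by one.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.name.casefold() == wanted:
                    match = entry.path
                    break
        if match is None:
            return None
        current = match
    return current


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "MyDocs"))
    open(os.path.join(root, "MyDocs", "MyFile.txt"), "w").close()
    found = find_case_insensitive(root, "mydocs/myfile.txt")
    print(os.path.relpath(found, root))
```

The sketch returns the first match it sees; on a case sensitive file system several entries could match, so a real implementation would also need a policy for that.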
What happens when a user uploads a file to a web server? Do you know what locale they operate in? Do you know what locale the user who will next look at the file operates in? Sure, you can just name the file whatever you want when stored and ignore or override the uploader’s filename on store (and that is what most systems actually do), but I’m just pointing out the challenges involved for case-insensitive file systems.
For security purposes, I think it’s a bad idea to allow users to control filenames in the first place. Doubly so for special unicode characters. I was shocked when I first encountered this hack, but sure enough it worked and is convincing:
https://www.howtogeek.com/127154/how-hackers-can-disguise-malicious-…
The straightforward answer is that the file system should be configured to use the same code pages as the web server and that’s that. Of course it’s worth pointing out that there are more “what-ifs” regardless of file system semantics. What if the server is configured for Latin1 and the user wants to enter some Unicode characters? What if the webpage is using UTF8 but the database is not? When it comes to locale, it’s generally the server that tells the clients what to use. We can debate whether that’s acceptable or not, but that’s really a topic unto itself.
Again, I’m not trying to make a strong argument either way. I’m just pointing out there are factors involved most people don’t think about.
It certainly is easiest to stick with a binary comparison, and in the interest of simplicity it seems like the best way to go. However I am still bothered by some of the consequences, like URLs being case sensitive.
Case sensitive file systems are generally the reason why URLs are case sensitive on linux servers even though I think ideally they should not be.
Works:
http://www.osnews.com/story/30418/UTC_is_enough_for_everyone_Right_
Broken:
http://www.osnews.com/Story/30418/UTC_is_enough_for_everyone_Right_
Edited 2018-05-30 17:43 UTC

2018-05-30 5:54 pm
mkone
Case sensitive file systems are generally the reason why URLs are case sensitive on linux servers even though I think ideally they should not be.
Works:
http://www.osnews.com/story/30418/UTC_is_enough_for_everyone_Right_
Broken:
http://www.osnews.com/Story/30418/UTC_is_enough_for_everyone_Right_
And the following is OK:
http://www.osnews.com/story/30418/UTC_IS_ENOUGH_FOR_EVERYONE_RIGHT_
That is just ridiculous in my opinion. I know this is an edge case, but it does not make any sense at all that you can’t capitalise “story” but you can capitalise just about any other part of the URL.

2018-05-30 6:35 pm
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!)
And the following is OK:
http://www.osnews.com/story/30418/UTC_IS_ENOUGH_FOR_EVERYONE_RIGHT_
That is just ridiculous in my opinion. I know this is an edge case, but it does not make any sense at all that you can’t capitalise “story” but you can capitalise just about any other part of the URL.
As a web developer, I can say that what you’re looking at is only tangentially related to case-sensitivity in filesystems.
URL routes are defined using pattern matches (typically regular expressions) against the path portion of the URL as provided by the HTTP server and OSNews is using something like this:
^story/(\d+)(/.*)?$
As you might have guessed from that, the portion between the initial / and the ?, #, or end of the URL can be any string and making it look like a hierarchical path is merely convention.
The “story/” is a literal character-sequence match, the first capture group, which only matches one or more digits, is used as a primary key for a database lookup and the second capture group, which matches / followed by zero or more of anything, is probably ignored, aside from being filled from a post slug column when generating links in order to make more human-friendly URLs.
The following URLs also work:
http://www.osnews.com/story/30418
http://www.osnews.com/story/30418/THIS_CAN_BE_ANYTHING!
http://www.osnews.com/story/30418/C:\WINDOWS\SYSTEM\BLANK.SCR
(Though Firefox does a \ to / normalization on it before loading it.)
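To make the routing behaviour concrete, here is a minimal Python sketch built around the pattern quoted above. The `route` function is hypothetical and only mimics what such a router might do; it is not OSNews code.

```python
import re

# Pattern as quoted above; everything else is illustrative.
STORY_ROUTE = re.compile(r"^story/(\d+)(/.*)?$")


def route(path):
    """Return the story id for a matching path, or None for no match."""
    match = STORY_ROUTE.match(path)
    if match is None:
        return None
    # The second group (the slug) is captured but deliberately ignored.
    return int(match.group(1))


for path in [
    "story/30418/UTC_is_enough_for_everyone_Right_",
    "story/30418/UTC_IS_ENOUGH_FOR_EVERYONE_RIGHT_",
    "story/30418",
    "Story/30418/UTC_is_enough_for_everyone_Right_",
]:
    print(path, "->", route(path))
```

Only the last path fails, because the literal "story/" is matched case-sensitively. Compiling the pattern with `re.IGNORECASE` would be one way to accept "Story/" as well.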
Edited 2018-05-30 18:49 UTC

2018-05-30 6:59 pm
galvanash
Correct me if I’m wrong, but isn’t it set once for the whole file system? I don’t think setting code page per user would be viable.
I’m sure for some file systems it is, but I don’t know any modern ones offhand. On Windows the codepage just changes a few system fonts, it has no effect on the underlying filesystem (or am I wrong about that?).
Afaik NTFS, HFS, EXT2, etc all have baked in collation and naming rules, the system codepage doesn’t affect them. I know NTFS can set collation rules per directory, but I don’t know when or how this feature is really used. Regardless, everything is finally moving to Unicode where codepages mostly don’t matter, so it’s not really an issue with codepages anymore and more just an issue about how the user’s locale interacts with things.
With a case sensitive file system and a case insensitive UI, it becomes impossible to open/access/refer to files by name without a full scan.
Oh definitely. I would never suggest it work that way. I was specifically talking about offering case insensitive search where you can return multiple results, not unique indexes like you need to find a file entry. Just because your file system is case-sensitive doesn’t mean you can’t search it ignoring case. Most operating systems provide search features above the file system level using separate indexes, that is what I was talking about. Same goes for sorting things, you can easily provide locale specific sorting at the user level if you need to. It’s more expensive of course, but you can make it cheaper by providing secondary indexes for it where it matters.
It certainly is easiest to stick with a binary comparison, and in the interest of simplicity it seems like the best way to go. However I am still bothered by some of the consequences, like URLs being case sensitive.
That’s the thing though… URLs are not (necessarily) case sensitive, the web servers are. It’s not even really an issue with the file system being case-sensitive, it’s just that the web server doesn’t bridge the gap properly. Apache fixed this ages ago, and most modern web servers handle it fine (at least optionally). It very slightly impacts the response time on the first request to a static file, but once it has been served once it’s pretty much invisible from a performance perspective.
Again, I have no issue whatsoever with how Windows and OSX handle case-insensitivity. I get it, it works fine in most cases, and it is less confusing for users in many regards. Just playing devil’s advocate.
Edited 2018-05-30 19:03 UTC

2018-05-30 10:55 pm
Alfman verbose=1
galvanash,
Oh definitely. I would never suggest it work that way. I was specifically talking about offering case insensitive search where you can return multiple results, not unique indexes like you need to find a file entry. Just because your file system is case-sensitive doesn’t mean you can’t search it ignoring case. Most operating systems provide search features above the file system level using separate indexes, that is what I was talking about. Same goes for sorting things, you can easily provide locale specific sorting at the user level if you need to. It’s more expensive of course, but you can make it cheaper by providing secondary indexes for it where it matters.
Well, this is where Linux and Windows differ at the file system level. The file system’s own native indexes need to have a policy on case sensitivity. Unless you intend to use extra indexes on top of the native file system, then case matters within the file system and not merely in the UI.
To mimic case insensitivity, one could store all files with a “strlower()” function. This way “MyDoc.txt” and/or “MYDOC.TXT” will always map to “mydoc.txt” in the file system and can be retrieved without a full search. However, we lose the intended letter case. While you can use an external database to fix this limitation on Linux, Windows allows us to keep the intended letter case and uses case insensitive indexing at the same time.
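A toy Python sketch of the "keep the intended case, index the normalized name" idea. The class is invented for illustration, and `lower()` simply mirrors the strlower() mentioned above; a real file system would apply its own folding rules.

```python
class CaseInsensitiveDirectory:
    """Toy directory: lookups ignore case, listings keep the original case."""

    def __init__(self):
        self._entries = {}  # normalized name -> (name as created, content)

    def create(self, name, content=b""):
        key = name.lower()
        if key in self._entries:
            raise FileExistsError(self._entries[key][0])
        self._entries[key] = (name, content)

    def lookup(self, name):
        # One dictionary probe, no scan over the other entries.
        return self._entries[name.lower()]

    def listing(self):
        return sorted(stored for stored, _ in self._entries.values())


d = CaseInsensitiveDirectory()
d.create("MyDoc.txt", b"hello")
print(d.lookup("MYDOC.TXT"))
print(d.listing())
```

Creating "MYDOC.TXT" in the same directory afterwards raises `FileExistsError`, which is the collision behaviour a case insensitive file system needs.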
That’s the thing though… URLs are not (necessarily) case sensitive, the web servers are. It’s not even really an issue with the file system being case-sensitive, it’s just that the web server doesn’t bridge the gap properly. Apache fixed this ages ago, and most modern web servers handle it fine (at least optionally). It very slightly impacts the response time on the first request to a static file, but once it has been served once it’s pretty much invisible from a performance perspective.
Disagree. The overhead of case insensitive searches on top of a case sensitive file system grows with the size of a directory.
I timed this myself on a production server with 1.2M unique files.
Direct file access (uncached):
real 0m0.006s
Full directory scan (uncached):
real 0m4.686s
Full directory scan (cached):
real 0m2.332s
Accessing files directly is virtually instant, yet if I have to scan directories for potential file matches of differing case, the overhead of doing that in user space is ugly.
With Apache, mod_speling is used to do this and you can confirm via the source code that a single file request ends up amplifying into a full directory scan in order to achieve a case insensitive match. Whether or not people care, this is a grossly inefficient way to achieve case insensitivity!
https://github.com/apache/httpd/blob/trunk/modules/mappers/mod_speli…
With nginx, the devs aimed to be as lean as possible. They got rid of all the fat that makes other web servers slower. Consequently the case sensitivity of nginx websites gets exposed as is. Of course people aren’t happy with this, especially when porting a website from a Windows system with case insensitive URLs. It’s to the point where some recommend hosting files over a Samba share or case insensitive file system mount to work around app level case sensitivity issues.
https://serverfault.com/questions/825107/migrating-from-iis-to-nginx
https://unix.stackexchange.com/questions/32467/case-insensitive-file…
Without saying whether this is a good or bad solution, I just wanted to highlight some of the consequences of case sensitivity in file systems.
Again, I have no issue whatsoever with how Windows and OSX handle case-insensitivity. I get it, it works fine in most cases, and it is less confusing for users in many regards. Just playing devil’s advocate.
There may not be a one size fits all solution; being able to configure it is probably the best that we can hope to do. While it’s not a file system, I like the way MySQL handles it, allowing each table and column to set character sets and collation independently in whatever ways are needed by the application.
Edited 2018-05-30 23:06 UTC

2018-05-31 12:08 am
galvanash
Well, this is where Linux and Windows differ at the file system level. The file system’s own native indexes need to have a policy on case sensitivity.
I’m not really that familiar with Linux file systems, but NTFS is neither case-sensitive nor case-insensitive at the file system level. What it does is let you choose whether a particular file should be stored using a ci or cs index upon file creation, so it is actually implemented at the file/directory level. It doesn’t use proper case folding to do this though, it is just straight 1-byte ASCII type conversion (it treats special characters as distinct at all times, which was my original point of contention because this is basically the same thing as the file system being hard coded to English US).
On Windows, because the win32 api is inherently case-insensitive (mostly because of historic baggage, but that is enough of a reason to be honest), it just defaults all files created using the API to flagged as ci and everything works as expected.
This works perfectly fine until you mix things up, which it doesn’t really stop you from doing. It doesn’t work well unless you have something up higher in the call chain deciding which type of index to use in a logical manner (which on Windows is a single registry entry determining the default behavior). If two files exist on Windows with the same name (differing only in case) generally things will work fine until an open call happens using the wrong flag…
This isn’t a new feature (which the article incorrectly indicates). It has been this way since Windows 2000… The new feature is a command line tool to allow users to flip the flag (was always possible with low level NT api calls, but I don’t think the win32 api could do it), primarily for better compatibility for directories used primarily by the Linux subsystem, I assume.
Disagree. The overhead of case insensitive searches on top of a case sensitive file system grows with the size of a directory. I timed this myself on a production server with 1.2M unique files.
Fair enough. I expected it would have less of an impact, even on that large a directory… I just never run into this problem though, static file serving has become almost a non-issue in my work (i.e. it so rarely happens that it just doesn’t matter much). With CDNs caching everything, my servers rarely even see them to be honest, and when they do I only have a few hundred files to scan at most. But yeah, I can see it becoming an issue with millions of files…
Edited 2018-05-31 00:26 UTC

2018-05-31 2:12 am
Alfman verbose=1
galvanash,
I’m not really that familiar with Linux file systems, but NTFS is neither case-sensitive nor case-insensitive at the file system level. What it does is let you choose whether a particular file should be stored using a ci or cs index upon file creation, so it is actually implemented at the file/directory level. It doesn’t use proper case folding to do this though, it is just straight ASCII type conversion (it treats special characters as distinct at all times, which was my original point of contention because this is basically the same thing as the file system being hard coded to English US).
I think our discussion is making progress, albeit in a long winded way.
You could be right, I don’t know if Windows is doing full/proper case folding in file systems.
https://www.w3.org/International/wiki/Case_folding
Nevertheless, that’s an implementation detail, one that doesn’t affect the design requirements of a case insensitive file system in principle. I agree that NTFS’s case sensitivity can be toggled on and off; I assume this is the reason you say “NTFS is neither case-sensitive or case-insensitive at the file system level”? The thing is, when case insensitivity is enabled, it does change the indexing behavior at the file system level by applying case-normalization to filenames before they get indexed. Consider that if the raw values got indexed without case normalization first, then it would be impossible to find matching files without doing a full scan of the index.
Pretend this is just a tiny subset of an enormous B+tree index containing millions of entries:
touch “AB file.txt”
touch “AB random1.txt”
touch “Ab file.txt”
touch “Ab random2.txt”
touch “aB file.txt”
touch “aB random3.txt”
touch “ab file.txt”
touch “ab random4.txt”
…
In a case sensitive file system, the index should contain all entries…
“AB file.txt”
“AB random1.txt”
“Ab file.txt”
“Ab random2.txt”
“aB file.txt”
“aB random3.txt”
“ab file.txt”
“ab random4.txt”
However in a case insensitive file system, we need all of the “ab file.txt” variants to collide with each other and only produce one single file, which means only one of these is allowed to exist…
“AB file.txt” <- *
“AB random1.txt”
“Ab file.txt” <- *
“Ab random2.txt”
“aB file.txt” <- *
“aB random3.txt”
“ab file.txt” <- *
“ab random4.txt”
If we index the raw filename as is with no change, that would mean we don’t know where to locate the file in the index. Depending on case variations, it could be found at the beginning, middle, or end, if it is present at all. So not applying case normalization implies a full scan is needed to find case insensitive collisions/matches.
However, applying case normalization prior to indexing removes all ambiguity about where the files are in the index and eliminates the need for a full scan.
“ab file.txt” <- *
“ab random1.txt”
“ab random2.txt”
“ab random3.txt”
“ab random4.txt”
For me, this is the main difference between case sensitive file systems and case insensitive file systems. It’s not a particularly complex change, but I still consider it an important difference.
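The difference between the two kinds of index can be sketched in Python. This is only a toy model and not how NTFS or any real file system stores its index: a sorted list stands in for the B+tree, `str.lower()` stands in for the case-normalization step, and the helper names are made up for illustration.

```python
import bisect

NAMES = [
    "AB file.txt", "AB random1.txt", "Ab file.txt", "Ab random2.txt",
    "aB file.txt", "aB random3.txt", "ab file.txt", "ab random4.txt",
]

def find_ci_raw(sorted_raw, name):
    """Raw index: keys are sorted exactly as written, so case variants are
    scattered and every entry has to be examined (a full scan)."""
    wanted = name.lower()
    return [n for n in sorted_raw if n.lower() == wanted]

def build_normalized(names):
    """Normalized index: sorted on the lowercased key, with at most one
    entry per key so that case variants collide."""
    index = {}
    for n in names:
        index.setdefault(n.lower(), n)  # first creator wins, later ones collide
    return sorted(index.items())

def find_ci_normalized(index, name):
    """Binary search on the normalized key; no scan is needed."""
    key = name.lower()
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None
```

With the raw index, a case insensitive search for "ab file.txt" has to visit all eight entries to find its four variants. With the normalized index only five entries exist and the lookup is a plain binary search.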
In an oracle database, case insensitive indexes are used explicitly with the use of function based indexes:
http://www.dba-oracle.com/t_oracle10g_release_2_case_insensitive_se…
create index upper_full_name on customer ( upper(full_name));
select full_name from customer
where upper(full_name) = 'DON BURLESON';
This is more powerful than one might realize at first glance because you can index the result of almost any function, including those you write! It would be awesome to have a similarly powerful indexing capability inside of file systems!
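For anyone who wants to try the idea without an Oracle install, here is a rough equivalent using SQLite through Python's `sqlite3` module. This swaps in SQLite's indexes on expressions for Oracle's function-based indexes, so treat it as an analogy only; the table contents are invented for the example.

```python
import sqlite3

def demo():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (full_name TEXT)")
    con.executemany(
        "INSERT INTO customer VALUES (?)",
        [("Don Burleson",), ("don burleson",), ("Jane Doe",)],
    )
    # Index the result of upper(full_name) rather than the raw column.
    con.execute("CREATE INDEX upper_full_name ON customer (upper(full_name))")
    rows = con.execute(
        "SELECT full_name FROM customer "
        "WHERE upper(full_name) = ? ORDER BY full_name",
        ("DON BURLESON",),
    ).fetchall()
    con.close()
    return [r[0] for r in rows]
```

Both spellings of the name come back from a single query, while the stored values keep their original case.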
2018-05-31 3:04 am
galvanash
When I said “at the file system level” I just meant that it was not a property of the volume as a whole, it can be toggled per file/directory. On the other hand, it is my understanding that on OSX/APFS case-sensitivity is actually a volume property that gets baked in on format.
Anyway, don’t disagree with anything. For a good example of the pain required to do proper case folding/unicode normalization, read up on the development of APFS. This guy’s blog covers a lot of it and how it has slowly progressed in the last couple of years to being usable.
https://eclecticlight.co/2017/07/05/high-sierra-and-filenames-apple-…
https://eclecticlight.co/2017/04/06/apfs-is-currently-unusable-with-…
https://eclecticlight.co/2017/04/07/apfs-and-macos-10-13-many-apps-a…
Edited 2018-05-31 03:21 UTC
2018-05-31 4:49 am
oiaohm
Well, this is where linux and windows differ at the file system level. The file system’s own native indexes need to have a policy on case sensitivity.
I’m not really that familiar with linux file systems, but NTFS is neither case-sensitive or case-insensitive at the file system level. [/q]
The reality here NTFS as a file system is case-sensitive. Its operating system features on top that present file names as case-insensitive.
HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\obcaseinsensitive
The above value set to 0 instead of the default 1. Shows the raw NTFS file system nature. Nothing on file system changes when you switch that value between 0/1 the value purely makes the kernel code perform tasks on the file system so to users think the file system is case insensitive if the value is 1.
This is why when Windows has case insensitive off you get a performance boost.
Also, when you turn case insensitivity off on Windows NT based operating systems (NT3-Windows 10), your file system performance improves for file lookups.
There is an overhead to doing case insensitivity, as it requires extra processing.
[q]The new feature is a command line tool to allow users to flip the flag (was always possible with low level NT api calls, but I don’t think the win32 api could do it), primarily for better compatibility for directories used primarily by the linux subsystem I assume.
If the kernel-mode case insensitivity is off, then the win32 api is fully case sensitive. It is basically a big myth that Windows is not case sensitive. The Windows NT line of operating systems is case sensitive, but before the new Windows 10 feature you either turned case insensitivity on and off system wide or used the NT direct functions. The reason the NT direct functions could handle case sensitivity is that if you had created files with case insensitivity off and they were double-ups, i.e. hello.txt and Hello.txt in the same directory, both files would not be readable to win32 applications in insensitive mode.
There is a common mistake by Windows application developers of incorrectly presuming the Windows API is always in case insensitive mode, and this causes problems when you are attempting to tweak a Windows server for performance.
Really from a performance point of view you don’t want application code demanding case insensitivity. For user created documents there might be a reason for case insensitivity.
2018-05-31 2:28 pm
Alfman verbose=1
oiaohm,
The reality here NTFS as a file system is case-sensitive. Its operating system features on top that present file names as case-insensitive.
HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\obcaseinsensitive
The above value set to 0 instead of the default 1. Shows the raw NTFS file system nature. Nothing on file system changes when you switch that value between 0/1 the value purely makes the kernel code perform tasks on the file system so to users think the file system is case insensitive if the value is 1. [/q]
This is provably wrong, the collation (organization) of the index changes. While the linux code for NTFS is feature incomplete and doesn’t support writing because of missing filename collation…
https://wiki.archlinux.org/index.php/NTFS-3G
https://sourceforge.net/p/linux-ntfs/mailman/linux-ntfs-dev/?viewmon…
…we do have a complete NTFS implementation for fuse, which is installed in ubuntu and other distros. If you take a close look you can see exactly how collation is used within the NTFS file system. You are right that it can be turned on and off, but you are wrong that “Nothing on file system changes when you switch that value between 0/1 the value purely makes the kernel code perform tasks on the file system so to users think the file system is case insensitive” and that “Its operating system features on top that present file names as case-insensitive.”
For one, it changes filename comparisons within NTFS to be case insensitive (depending on the flag).
https://www.tuxera.com/community/open-source-ntfs-3g/
BOOL ntfs_names_are_equal(const ntfschar *s1, size_t s1_len,
        const ntfschar *s2, size_t s2_len,
        const IGNORE_CASE_BOOL ic,
        const ntfschar *upcase, const u32 upcase_size)
{
    if (s1_len != s2_len)
        return FALSE;
    if (!s1_len)
        return TRUE;
    if (ic == CASE_SENSITIVE)
        return ntfs_ucsncmp(s1, s2, s1_len) ? FALSE : TRUE;
    return ntfs_ucsncasecmp(s1, s2, s1_len, upcase, upcase_size) ? FALSE :
                                                                   TRUE;
}
For another, the flag changes the collation order that NTFS uses through an upper case mapping table defined by windows.
int ntfs_names_full_collate(const ntfschar *name1, const u32 name1_len,
        const ntfschar *name2, const u32 name2_len,
        const IGNORE_CASE_BOOL ic, const ntfschar *upcase,
        const u32 upcase_len)
{
    u32 cnt;
    u16 c1, c2;
    u16 u1, u2;
…
You can see the functions that generate the letter case mapping tables.
void ntfs_upcase_table_build(ntfschar *uc, u32 uc_len)
{
    struct NEWUPPERCASE {
        unsigned short first;
        unsigned short last;
        short diff;
        unsigned char step;
        unsigned char osmajor;
        unsigned char osminor;
    } ;
    /*
     * This is the table as defined by Windows XP
     */
    static int uc_run_table[][3] = { /* Start, End, Add */
        {0x0061, 0x007B, -32}, {0x0451, 0x045D, -80}, {0x1F70, 0x1F72, 74},
…
There’s a lot more interesting tidbits, but suffice it to say that NTFS does support case insensitivity at the file system level! It’s not merely something added on top of a case sensitive file system. If you read my earlier posts it should become clear why adding case insensitive interface on top of a case sensitive file system doesn’t work too well.
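To make the role of the upcase table concrete, here is a loose Python paraphrase of the comparison logic quoted above. It is a sketch and not the ntfs-3g code: the function names are made up, only the first run from the quoted table is used, and treating the run's end value as exclusive is my assumption.

```python
def build_upcase(runs, size=0x10000):
    """Build a code-unit -> upcased code-unit table from (start, end, add) runs."""
    table = list(range(size))            # identity mapping by default
    for start, end, add in runs:
        for cp in range(start, end):     # assumption: 'end' is exclusive
            table[cp] = cp + add
    return table

def upcased(ch, upcase):
    """Map one character through the table (BMP only in this sketch)."""
    cp = ord(ch)
    return upcase[cp] if cp < len(upcase) else cp

def names_are_equal(s1, s2, ignore_case, upcase):
    if len(s1) != len(s2):
        return False
    if not ignore_case:
        return s1 == s2                  # plain binary comparison
    return all(upcased(a, upcase) == upcased(b, upcase)
               for a, b in zip(s1, s2))

# First run from the quoted table: a..z map to A..Z.
UPCASE = build_upcase([(0x0061, 0x007B, -32)])
```

The same pair of names compares equal or unequal depending only on the `ignore_case` argument, and what counts as "equal ignoring case" is decided entirely by the table.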
Edited 2018-05-31 14:37 UTC
2018-05-31 3:04 pm
oiaohm
oiaohm,
The reality here NTFS as a file system is case-sensitive. Its operating system features on top that present file names as case-insensitive.
HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\obcaseinsensitive
The above value set to 0 instead of the default 1. Shows the raw NTFS file system nature. Nothing on file system changes when you switch that value between 0/1 the value purely makes the kernel code perform tasks on the file system so to users think the file system is case insensitive if the value is 1.
This is provably wrong, the collation (organization) of the index changes. While the linux code for NTFS is feature incomplete and doesn’t support writing because of missing filename collation… [/q]
https://msdn.microsoft.com/en-us/library/dn410382.aspx
Sorry, no. Collation processing has nothing to do with whether case insensitivity is on or not with proper Microsoft NTFS drivers.
$UpCase: Table of uppercase characters used for collating
The collating order of NTFS indexes is defined by what is in this file. So it has absolutely nothing to do with the case insensitive status in the Microsoft Windows NTFS drivers.
For another, the flag changes the collation order that NTFS uses through an upper case mapping table defined by windows.
This is true for totally broken clones of NTFS. Windows NTFS driver does not behave this way. Why because having to rewrite the indexs just because you have turned case sensitivity on or off is just pure insanity.
[q]There’s a lot more interesting tidbits, but suffice it to say that NTFS does support case insensitivity at the file system level!
Nothing you quoted is really written to the NTFS disc by Windows; it is just the way the open source implementation has decided to do it.
The NTFS file system itself, being the disc format, is case sensitive. Ordering of file entries on disc has capitals ahead of lower case. So Abc comes ahead of abc. This is important: if you turn case insensitivity on in Windows after creating Abc and abc while it was off, the result is that “Abc” opens and “abc” is hidden. Turn it back off and both files appear again. Windows NTFS case insensitivity is basically: open the first file in the index that matches as if the name were all caps.
If your NTFS driver is changing the ordering of things based on case insensitive or case sensitive mode, it is broken. This does bring out different bugs.
2018-05-31 3:27 pm
Alfman verbose=1
oiaohm,
Your statements were proven wrong, if you have another point to make, then make it, otherwise I’m not interested in arguing over facts.
Edited 2018-05-31 15:30 UTC
2018-05-31 4:49 pm
malxau
Hi, I used to work on the Windows NTFS driver.
oiaohm’s comments are roughly correct. NTFS collates indexes case insensitively, then applies case sensitive matching after case insensitive matching. This means that “a” comes before “B” in a directory, for example. And I’d wholeheartedly agree with his comment that “having to rewrite the indexs just because you have turned case sensitivity on or off is just pure insanity”; obviously if you can change a registry key and reboot, and your system does in fact boot successfully, then that tells you about how NTFS collation here works. You didn’t need to reformat or rewrite all the indexes, because the current (insensitive) indexes are correct for case sensitive mode. If other implementations do something different for case sensitive behavior, they won’t interop well with Windows. Every index collation needs to go through $Upcase.
There are two things I’d disagree with oiaohm on though.
First, I don’t think the system will be more efficient in case sensitive mode. The above behavior implies it will be somewhat less efficient, because every lookup is always insensitive first (since the trees are collated insensitively), but if a case sensitive lookup is requested there’s additional work to look through the case insensitive matches for a case sensitive match. In case insensitive mode, any match is a valid match, so there’s no need to perform additional case sensitive compares.
Second, the behavior of obcaseinsensitive is a poorly understood mess. Pre-XP, the NT API is case sensitive by default, but individual opens request case insensitive behavior via OBJ_CASE_INSENSITIVE. Win32 generally requests insensitive behavior, unless FILE_FLAG_POSIX_SEMANTICS is specified, where it requests case sensitive behavior. obcaseinsensitive, added in XP, overrides the NT API and forces _all_ opens to be case insensitive. So changing the registry key doesn’t make the system case sensitive; it makes NT callers using default options case sensitive and it allows Win32 applications to request case sensitive semantics. After changing the key, most Win32 applications will be case insensitive, and the behavior of NT callers is as good as random.
Connecting these back to the earlier point though, note that case sensitive behavior is a per-open request. That also tells you that directory collation order isn’t going to change as a result of case sensitivity.
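A toy Python model of the behavior described here may help: the directory is always collated on an upcased key, and a case sensitive open just adds an exact-match pass over the insensitive matches. This is a sketch of the description above, not NTFS code; `str.upper()` stands in for the volume's $Upcase table, the class and method names are invented, and entries with the same key are kept in insertion order for simplicity.

```python
import bisect

def collation_key(name):
    # Stand-in for mapping the name through the volume's $Upcase table.
    return name.upper()

class Directory:
    def __init__(self):
        self._keys = []    # upcased keys, kept sorted
        self._names = []   # names as stored, parallel to _keys

    def add(self, name):
        key = collation_key(name)
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._names.insert(i, name)

    def lookup(self, name, case_sensitive=False):
        key = collation_key(name)                  # always upcase first
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        candidates = self._names[lo:hi]            # all insensitive matches
        if not case_sensitive:
            return candidates[0] if candidates else None  # any match is valid
        for stored in candidates:                  # extra work in sensitive mode
            if stored == name:
                return stored
        return None

    def listing(self):
        return list(self._names)
```

Because the ordering never depends on the per-open flag, the same structure serves both kinds of lookup, and "a" is listed before "B".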
2018-05-31 9:36 pm
Alfman verbose=1
malxau,
oiaohm’s comments are roughly correct. NTFS collates indexes case insensitively, then applies case sensitive matching after case insensitive matching. This means that “a” comes before “B” in a directory, for example. And I’d wholeheartedly agree with his comment that “having to rewrite the indexs just because you have turned case sensitivity on or off is just pure insanity”; obviously if you can change a registry key and reboot, and your system does in fact boot successfully, then that tells you about how NTFS collation here works. You didn’t need to reformat or rewrite all the indexes, because the current (insensitive) indexes are correct for case sensitive mode. If other implementations do something different for case sensitive behavior, they won’t interop well with Windows. Every index collation needs to go through $Upcase.
Yes I agree that the filename index is always collated with case insensitivity. The thing about NTFS is that it is a super-set of both case sensitive and case insensitive file systems. Can we all agree on that?
One of the consequences of this is that unlike purely case sensitive file systems like ext3, the physical organization of NTFS indexes on disk is dependent upon unicode international case mappings. Different versions of windows have updated this mapping over time to accommodate unicode additions. This is NOT a property of case sensitive file systems. Ext3 doesn’t even consider what characters mean according to unicode, it’s just bytes. The NTFS indexes on the other hand have a hard dependency on the meaning of letters from the unicode standard. Future letter code points would require updates to the mapping used by NTFS, Agreed?
This is why I am not going to agree with oiaohm that NTFS is a case sensitive file system in the same sense that ext3 or other file systems are. NTFS is a hybrid whose disk structure depends on letter cases.
Edit:
I never had access to the windows source code like you, so I’d be curious if you have any opinions on how the open source NTFS driver got things right or wrong? I’m guessing all the open source code came about through reverse engineering.
Edited 2018-05-31 21:55 UTC
2018-06-01 1:13 am
malxau
malxau,
[T]he physical organization of NTFS indexes on disk is dependent upon unicode international case mappings. Different versions of windows have updated this mapping over time to accommodate unicode additions. This is NOT a property of case sensitive file systems. Ext3 doesn’t even consider what characters mean according to unicode, it’s just bytes. The NTFS indexes on the other hand have a hard dependency on the meaning of letters from the unicode standard. Future letter code points would require updates to the mapping used by NTFS, Agreed?
Roughly, yes. Each NTFS volume has a $Upcase file that specifies the mapping in use by that volume. Volumes formatted on different systems will be collated using a different version of the unicode mapping table. Indeed, it’d be a valid NTFS volume that has an $Upcase table where nothing is upcased, and the result would collate like ext3. But note the effect of this is the directories are always collated in a fixed way that will not change for the lifetime of the volume (without major surgery.)
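The point about an $Upcase table "where nothing is upcased" can be illustrated with a few lines of Python. The dictionaries below are hypothetical stand-ins for the per-volume table, not its real on-disk format.

```python
def collate(names, upcase):
    """Sort names by mapping each character through the given upcase table.
    Characters missing from the table map to themselves."""
    return sorted(names, key=lambda n: [upcase.get(ord(c), ord(c)) for c in n])

# A table that upcases a..z, and an identity table that upcases nothing.
ASCII_UPCASE = {cp: cp - 32 for cp in range(ord("a"), ord("z") + 1)}
IDENTITY_UPCASE = {}
```

Given the names "b", "a", "B", "A", the first table groups case variants together, while the identity table yields plain code-point order with all capitals first.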
[q]I’d be curious if you have any opinions on how the open source NTFS driver got things right or wrong? I’m guessing all the open source code came about through reverse engineering.
I have the opposite problem, I don’t want to look at this particular piece of open source code, which makes comparison difficult. What I’m hearing though, which your earlier posts appeared to imply, is that it’s not applying $Upcase from the volume at all and is assuming the current system Unicode table matches the collation order on the volume. This would be wrong/dangerous, but without carefully looking at the code, I’m not certain that’s what’s happening.
2018-06-01 1:58 am
Alfman verbose=1
malxau,
Roughly, yes. Each NTFS volume has a $Upcase file that specifies the mapping in use by that volume. Volumes formatted on different systems will be collated using a different version of the unicode mapping table. Indeed, it’d be a valid NTFS volume that has an $Upcase table where nothing is upcased, and the result would collate like ext3. But note the effect of this is the directories are always collated in a fixed way that will not change for the lifetime of the volume (without major surgery.)
The fuse NTFS implementation also loads the $upcase map on existing volumes, which makes sense for data integrity, although it means we have to reformat NTFS volumes to incorporate any unicode updates.
2018-06-02 1:23 pm
zima
I have the opposite problem, I don’t want to look at this particular piece of open source code
So it won’t “contaminate” you?
2018-06-01 1:55 am
oiaohm
Yes I agree that the filename index is always collated with case insensitivity. The thing about NTFS is that it is a super-set of both case sensitive and case insensitive file systems. Can we all agree on that? [/q]
I would not agree about this. NTFS is a case sensitive file system end story. If it was a case insensitive file system perform a case sensitive look up would be impossible.
[q]One of the consequences of this is that unlike purely case sensitive file systems like ext3, the physical organization of NTFS indexes on disk is dependent upon unicode international case mappings. Different versions of windows have updated this mapping over time to accommodate unicode additions. This is NOT a property of case sensitive file systems. Ext3 doesn’t even consider what characters mean according to unicode, it’s just bytes. The NTFS indexes on the other hand have a hard dependency on the meaning of letters from the unicode standard. Future letter code points would require updates to the mapping used by NTFS, Agreed?
This is not a unique feature of case insensitive file systems. You do find case sensitive file systems ordering their look-ups based on many different rules.
There is a downside to this, and this downside explains why it’s not a popular feature. When unicode mappings get updated or ordering rules change, the data on disc can now be wrong. Please note that FAT, which is a pure case insensitive file system, does not do ordering. Ordering is a look-up optimisation.
Most of the Unix/Linux/BSD file systems don’t bother with look-up optimisations that can change over time, due to the trouble they can bring. Please note I said most; you do see rare ones that have done ordering based on unicode and other things.
There was an attempt to add unicode ordering and unicode case insensitivity to xfs; this was dropped due to the multiple levels of broken.
As people have stated, whether a letter in unicode is uppercase or lowercase in fact changes depending on what language you are using. If you are writing a file system to have a constant disc format between different users with different languages, you cannot do case insensitivity with Unicode.
The upper/lower case categories of the unicode standard are only a rough guide. Yes, in the unicode standard it mostly comes down to whether this is a large written char (uppercase) or a small written char (lowercase), not whether that makes any sense for the language the person is using. So when you have a language where the small written char has a different meaning to the large written char, the unicode upper/lower case screws you over.
Windows NTFS usage does not magically fix the broken.
https://www.fileformat.info/info/unicode/category/Ll/list.htm
This is the lower case list of unicode. Have a good look at how many times you have a letter “a” with a different unicode number. Yes even case folding upper and lower you are still left with a headache and a half.
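A quick way to see this from Python, using only the standard `unicodedata` module. The three characters are just a sample I picked (Latin, Cyrillic and fullwidth small a), not an exhaustive list.

```python
import unicodedata

# Latin a, Cyrillic a, fullwidth Latin a: three code points, one apparent letter.
LOOKALIKES = ["a", "\u0430", "\uff41"]

def describe(chars):
    """Return (official name, general category) for each character."""
    return [(unicodedata.name(c), unicodedata.category(c)) for c in chars]

def distinct_after_casefold(chars):
    """Count distinct strings left after case folding alone."""
    return len({c.casefold() for c in chars})

def distinct_after_nfkc_casefold(chars):
    """Count distinct strings left after NFKC normalization plus case folding."""
    return len({unicodedata.normalize("NFKC", c).casefold() for c in chars})
```

All three are category Ll. Case folding alone leaves them as three different names, and adding NFKC normalization only merges the fullwidth form into the Latin one.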
Lot of the problem starts with the introduction of the printing press and the standard char-sets. This resulted in upper chars being abused as lower in particular languages mostly to avoid the book production house requiring to make a unique set of letters for different languages. So a historic short cut comes forwards today.
2018-06-01 2:02 am
Alfman verbose=1
oiaohm,
NTFS is a case sensitive file system end story. If it was a case insensitive file system perform a case sensitive look up would be impossible.
NTFS clearly is not a pure case sensitive file system nor is it a pure case insensitive file system. It is a hybrid built around unicode case mappings with properties of both. If you want to disagree over semantics then so be it, let’s agree to disagree.
Edited 2018-06-01 02:15 UTC
2018-06-01 1:13 am
oiaohm
Pre-XP, the NT API is case sensitive by default, but individual opens request case insensitive behavior via OBJ_CASE_INSENSITIVE. Win32 generally requests insensitive behavior, unless FILE_FLAG_POSIX_SEMANTICS is specified, where it requests case sensitive behavior. obcaseinsensitive, added in XP, overrides the NT API and forces _all_ opens to be case insensitive.
There is an older undocumented flag, present from the first version of Windows NT through 2000, to force case insensitivity off. It is still an on/off flag. XP onward provides it as standard in a different location in the registry, just to be fun. Yes, it was a win32 subsystem flag.
It is in fact required because there is a nice little issue with case insensitive due to the upper and lower case being bases on local setting not written to early NTFS so you had a drive from a different country/locale setting install of Windows and you could have problems seeing the files. So work around for this was always with NT turn everything case sensitive. XP the work around was formalised. But I do agree Windows implementation of case sensitive and case insensitive can be highly random at times what happens.
You do see overhead on NTFS when applications are stupid and don’t keep their case constant. Yes case insensitive is on by default but the fast path is always the case sensitive look up. If an application has its file names wrong, it will be costing you performance in case insensitive mode and will fail when you are in case sensitive mode. This is why people developing applications should really check in case sensitive mode; if their program bites it, there is a problem.
2018-06-01 2:47 am
malxau
It is in fact required because there is a nice little issue with case insensitive due to the upper and lower case being bases on local setting not written to early NTFS so you had a drive from a different country/locale setting install of Windows and you could have problems seeing the files. [/q]
I’m skeptical about this. I see the Upcase file being written to disk as early as 1992, before the first release of NT. I think the real issue in the whole design is that an application cannot call stricmp (using whatever Unicode table/locale is in effect for that application) and assume that NTFS will resolve things the same way. However, this is something applications often want/need to do, which makes writing “correct” code all but impossible.
XP the work around was formalised. But I do agree Windows implementation of case sensitive and case insensitive can be highly random at times what happens.
The XP change was mainly a security change to avoid squatting attacks where insensitive openers could be tricked into opening an attacker’s file.
[q]You do see overhead on NTFS when applications are stupid and don’t keep their case constant. Yes case insensitive is on by default but the fast path is always the case sensitive look up.
I don’t believe this is true and this thread has gone into a lot of detail as to why. A file called “a” is collated before “B”. If an open for a file on disk that is called “a” arrives as “a”, it still needs to be upcased to “A” to locate the file in the index. The fact that case matches won’t make the index lookup any faster, because the index lookup isn’t based on the original case of the file. The only time a case sensitive compare needs to occur is to open a file in case sensitive mode – ie., this is purely a performance tax on case sensitive operations. I’d really love to see a benchmark that shows case sensitive being faster, because all the code I’ve seen will not behave that way.
I could believe there is other code in the universe that has such a “fast path” (eg. a case sensitive hash table based cache) which causes case mismatch to trigger a slow path, but that is not code in NTFS.
2018-06-01 7:28 pm
galvanash
That’s why I love discussions like this on osnews… I felt like I knew a bit about how NTFS worked, only to find out my knowledge only scratched the surface.
Thanks for all the info everyone.

2018-05-30 5:48 pm
mkone
I think that is a good argument for case preserving file naming, but not for case sensitivity.
The point is that a case-insensitive/case-preserving file system must make a choice as to which locale it wants to function in, because at that point case folding becomes a mandatory operation just in order to store a file by name. Having file naming and uniqueness rules be different for different users would be a nightmare.
I am possibly quite ignorant about this, but I wasn’t suggesting that the filesystem shouldn’t be case sensitive “behind the scenes”. Only that the user shouldn’t have to deal with case sensitivity in the filesystem that they are exposed to. I would add that even programmers should not have to deal with case sensitive filesystems. Therefore, having a file called “Names.txt” in a folder should preclude having a file called “names.TXT” in the same folder – the OS should prevent that. Therefore the uniqueness rules would be the same for all users. I think users can handle the OS telling them that they can’t use that filename because there is already another filenames with the same (or very similar) filenames.

2018-05-30 6:27 pm
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!)
I am possibly quite ignorant about this, but I wasn’t suggesting that the filesystem shouldn’t be case sensitive “behind the scenes”. Only that the user shouldn’t have to deal with case sensitivity in the filesystem that they are exposed to. I would add that even programmers should not have to deal with case sensitive filesystems. Therefore, having a file called “Names.txt” in a folder should preclude having a file called “names.TXT” in the same folder – the OS should prevent that. Therefore the uniqueness rules would be the same for all users. I think users can handle the OS telling them that they can’t use that filename because there is already another filenames with the same (or very similar) filenames.
The problem people keep covering is that different locales have different rules for which glyphs form lowercase-uppercase pairs, and, among other things, different users on the same system may have selected different locales.
The most famous example being that, in most locales, i and I are a lowercase-uppercase pair but, in Turkish, they aren’t because Turkish adds a dotless “i” and a dotted “I”.
Imagine if a Turkish person had “i.txt” and “I.txt” and they were allowed, because they’re not lowercase/uppercase variations on each other in Turkish, then you log in with another locale.
…or do you think Turkish people should be confusingly forced to obey another language’s case-folding rules?
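The Turkish example is easy to reproduce in Python. The built-in `str.upper()` is locale-independent, so the Turkish rules below are a hand-written table covering just the two letters in question; the function names are made up for the illustration.

```python
# Turkish pairs: i <-> İ (U+0130) and ı (U+0131) <-> I.
TURKISH_UPPER = {"i": "\u0130", "\u0131": "I"}

def upper_tr(s):
    """Uppercase with Turkish rules for i and dotless i, default rules otherwise."""
    return "".join(TURKISH_UPPER.get(c, c.upper()) for c in s)

def same_name(a, b, upper):
    """Would a case insensitive file system using 'upper' treat a and b as one file?"""
    return upper(a) == upper(b)
```

Under the default mapping "i.txt" and "I.txt" are the same file; under the Turkish mapping they are two different files, while "ı.txt" and "I.txt" collide instead.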
2018-05-31 12:56 am
mkone
Imagine if a Turkish person had “i.txt” and “I.txt” and they were allowed, because they’re not lowercase/uppercase variations on each other in Turkish, then you log in with another locale.
…or do you think Turkish people should be confusingly forced to obey another language’s case-folding rules?
It’s not about Turkish people being forced to obey another language’s case folding rules. It’s about a computer making certain things unambiguous for the user. Again, I am not an expert in this, but I can’t see why you couldn’t have a glyph replacements for a Turkish locale so that the dotted i is the i we know and love in the “west”, while the dotless i is a different letter completely. However, in that case, the dotted i is never capitalised to the dotless i and the dotless i is never “lower cased” to the dotted i.
2018-05-31 7:06 am
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!)
It’s not about Turkish people being forced to obey another language’s case folding rules. It’s about a computer making certain things unambiguous for the user. Again, I am not an expert in this, but I can’t see why you couldn’t have a glyph replacements for a Turkish locale so that the dotted i is the i we know and love in the “west”, while the dotless i is a different letter completely. However, in that case, the dotted i is never capitalised to the dotless i and the dotless i is never “lower cased” to the dotted i.
But then how do you implement your case-insensitive filesystem semantics if two different users of the same filesystem are using locales with contradictory interpretations of characters in filenames?

2018-05-30 3:59 am
oiaohm
https://superuser.com/questions/266110/how-do-you-make-windows-7-ful…
Reality is that all versions of Windows from the NT line using NTFS have had the ability to turn case sensitivity on by turning case insensitivity off.
It is fun with wine to see how many Windows applications have been built and only tested with case insensitivity on.
Yes the fun problem is not all windows programs are able to work on case sensitive file systems due to programmer coding issues. Some of these can be nice never ending loops like the following
1) Create file “abc”
2) Check for file existence but using “ABC” file does not exist
3) since file does not exist was return attempt to create file “abc” fail with error because “abc” exists. Goto 2.
Yes there are windows programs out there that are that stupid. Of course they should have failed quality control if quality control was done with case insensitivity off. Maybe this change will get all applications testing with case insensitivity off.
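The three steps above can be sketched with a toy model in Python. CaseSensitiveDir and buggy_ensure_file are made-up names for illustration, not Windows or Wine APIs, and the loop is capped with max_attempts so that the example terminates.

```python
# A toy model, not real Windows API calls: a "directory" is a set of names
# compared byte-for-byte, i.e. case-sensitively.

class CaseSensitiveDir:
    def __init__(self):
        self.names = set()

    def exists(self, name: str) -> bool:
        return name in self.names            # exact-match lookup

    def create(self, name: str) -> None:
        if name in self.names:
            raise FileExistsError(name)
        self.names.add(name)

def buggy_ensure_file(directory: CaseSensitiveDir, max_attempts: int = 3) -> int:
    """Mimic the loop described above; return the number of failed creates."""
    failures = 0
    directory.create("abc")                  # step 1: create "abc"
    while not directory.exists("ABC"):       # step 2: check with the wrong case
        if failures >= max_attempts:         # safety cap; the real bug has none
            break
        try:
            directory.create("abc")          # step 3: fails, "abc" already exists
        except FileExistsError:
            failures += 1
    return failures

print(buggy_ensure_file(CaseSensitiveDir()))  # 3
```

On a case-insensitive lookup, step 2 would succeed on the first pass and the loop body would never run, which is why such a bug can go unnoticed.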

2018-05-30 6:41 am
Drumhellar
It isn’t really fair to blame Windows programmers.
While NTFS is case-sensitive, Win32 is not. If you want case-sensitivity in your file handling, you have to step outside of Win32 and deal with NT system calls, which means your software wouldn’t be guaranteed to be portable across Windows versions.
While this isn’t really an issue now since everything is NT, this wasn’t the case when Win9x was around, and might not always be the case.

2018-05-30 7:04 am
bert64
Although prior to Win95 there was no such thing as a lowercase filename: all filenames were entirely in uppercase. So rather than introduce sensible filename support, they did a kludge which has resulted in further problems down the line.

2018-05-30 7:26 am
Drumhellar
No, long case-preserving filenames in Windows existed before 9x – specifically, Windows NT 3.1
And, what problems has it caused, other than “It isn’t the same as other OSs?” Because, while not being the same is an inconvenience, it isn’t the same as being a problem, at least a problem that falls squarely on Microsoft.

2018-05-30 8:07 am
Duke
Inconvenience IS a problem.
Apparently, you have your own definition of a “problem”?
2018-05-30 10:16 am
Drumhellar
What’s inconvenient for one person is convenient for another.
I want to know about any actual problems other than “It isn’t the way my preferred system does things.”
As in, what actual problems exist by the nature of being case-insensitive that aren’t merely “It isn’t case sensitive”?
I find case sensitivity an inconvenience. From my perspective, the problem exists in Unix, and the Windows way of doing things solves that problem.
2018-05-30 10:46 am
Duke
By that logic, random OS crashes are merely an inconvenience. You know, they may be an inconvenience for you, but for me they provide a nice break to go make some coffee.

2018-05-30 5:46 am
Duke
I have always assumed a case-sensitive filesystem to be a dumb thing from a user’s perspective. As a user, one will always get annoyed by case sensitivity in file/folder names. Maybe it’s important for programmers, but as an end user I feel it’s stupid to assume two file names are different just because of case even though they contain exactly the same string of letters. Does the meaning of the word change if you capitalize the first letter? No? Does it change if you write it all in CAPS? No? Then why the hell would you think otherwise in filenames?
Let’s take 3 cases of folder name:
1. “My Vacation Pictures”
2. “my vacation pictures”
3. “My vacation pictures”
Who could ever think these should be 3 separate folders??? That’s just… Dumb.
Edited 2018-05-30 05:48 UTC

2018-05-30 7:00 am
Kochise
It’s all about freedom. Some are used to capitalization, hence more comfortable with case sensitivity.
But they cannot cope with spaces in file/folder names and replace them with whatever else (dots, underscores, etc.). It’s like the endianness war.
I don’t care at all about case in file names since, as you put it, WFT is the same as wtc; the meaning is not in the filename, it’s in its content.

2018-05-30 7:15 am
Duke
Again, only programmers/admins/geeks care about the things you mentioned. I am confident that the majority of regular consumers don’t give a rat’s ass about case sensitivity and would be very baffled if the Windows (or Mac) FS became case-sensitive by default. It’s just counterintuitive and you need to learn/be conditioned to be OK with it.
2018-05-30 8:55 am
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!)
But they cannot cope with spaces in file/folder names and replace them with whatever else (dots, underscores, etc.). It’s like the endianness war.
Actually, that’s because shell script has such a screwy approach to quoting (especially the “split apart by default” handling of variables), some other Unixy things inherited that mistake via the system(3) function for executing subprocesses via strings rather than arrays, and, if you don’t account for that, you find mysterious errors in the darndest places.
(eg. My mother uses LyX as a “looks great by default” way to typeset books. She was getting a mysterious error message. On a hunch, I tried replacing the spaces in the filename with underscores. Sure enough, somewhere deep in the maze of LaTeX scripts and config files, there was a spot that couldn’t handle filenames with spaces.)
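As a rough illustration of the string-versus-array difference, here is a Python sketch that uses shlex to mimic shell-style word splitting. The command name “latex” and the filename are placeholders chosen for the example; nothing is actually executed.

```python
import shlex

filename = "My Vacation Pictures.tex"   # placeholder filename containing spaces

# Building one command string and letting shell-style parsing split it
# breaks the filename apart at every space.
unquoted = shlex.split("latex " + filename)
print(unquoted)   # ['latex', 'My', 'Vacation', 'Pictures.tex']

# Quoting the filename keeps it together when a string has to be used.
quoted = shlex.split("latex " + shlex.quote(filename))
print(quoted)     # ['latex', 'My Vacation Pictures.tex']

# Passing an argument array (the form subprocess.run accepts as a list)
# skips the parsing step entirely, so there is nothing to get wrong.
argv = ["latex", filename]
print(argv == quoted)   # True
```

The first form is what a script does when it forgets to quote a variable; the last form is the “arrays rather than strings” approach.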
Edited 2018-05-30 09:03 UTC

2018-05-30 10:49 am
kwan_e
some other Unixy things inherited that mistake via the system(3) function for executing subprocesses via strings rather than arrays,
Well, system(3) is inherited from C. The UNIXy way is with posix_spawn.

2018-05-30 1:58 pm
Bill Shooter of Bul Platinum Prime
ARE YOU TELLING ME THAT NO ADDITIONAL INFORMATION CAN EVER BE RELAYED BY CHANGING THE CASE? SO THIS COMMENT DOESN’T SEEM LIKE I’M SHOUTING AT FULL VOLUME INTO YOUR EAR DRUMS? STRANGE. OK. HAVE A NICE DAY, AND DON’T FORGET TO GET YOUR PET SPAYED OR NEUTERED.

2018-05-31 9:55 am
mkone
ARE YOU TELLING ME THAT NO ADDITIONAL INFORMATION CAN EVER BE RELAYED BY CHANGING THE CASE? SO THIS COMMENT DOESN’T SEEM LIKE I’M SHOUTING AT FULL VOLUME INTO YOUR EAR DRUMS? STRANGE. OK. HAVE A NICE DAY, AND DON’T FORGET TO GET YOUR PET SPAYED OR NEUTERED.
Case insensitive doesn’t mean you can’t use upper case. It means that if that is a filename, then you can’t have a file with the name “are you telling me that no additional information can ever be relayed by changing the case? so this comment doesn’t seem like i’m shouting at full volume into your ear drums? strange. ok. have a nice day, and don’t forget to get your pet spayed or neutered” in the same folder.

2018-05-30 7:09 am
bert64
So now you can have a single filesystem where some locations are case sensitive and some are not… And no doubt lots of software which is not case sensitive and will break when it encounters a case sensitive location.
Poor design, resulting in massive complexity which leads to bugs and security problems.

2018-05-30 7:16 am
Duke
Poor design, resulting in massive complexity which leads to bugs and security problems.
Every Windows version ever.

2018-05-30 10:05 am
Kochise
But Windows deals pretty well with spaces in file/folder names while Linux doesn’t, because reasons (see the comment a bit above).

2018-05-30 10:47 am
Duke
That is also true… I am always laughing when I run into another person with this stigma of “never use spaces in file names”. I always use spaces in file names and never had a problem in my life.
2018-05-30 2:53 pm
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!)
But Windows deals pretty well with spaces in file/folder names while Linux doesn’t, because reasons (see the comment a bit above).
Linux itself deals well with spaces in filenames, as does most of the infrastructure.
Heck, the vast majority of my stuff deals perfectly well when some kind of encoding or copy-paste goof results in filenames containing newlines, bytestrings that aren’t valid UTF-8, etc.
It’s just that, as with case-sensitivity on Windows, there are some UNIX/Linux applications which rely on old APIs and aren’t properly tested with spaces.
Edited 2018-05-30 14:55 UTC

2018-05-30 1:37 pm
TommyD
I tend to manage my system such that it doesn’t matter. Here are some of my general rules:
1. Don’t use spaces in file names.
2. Don’t have files that differ only by the case of the name.
3. Have a standard for naming things, just like you do in programming (don’t you??). For instance, when do you capitalize the first letter, do you use camel case, or underscores to separate names, etc. Keep it consistent.
4. Avoid anything but letters, numbers and underscores.
5. When putting a date stamp in a file name, always put yyyy_mm_dd so the files sort in a natural order.
I’ve followed these rules since my early days of computing just so I could avoid these kinds of issues. It has served me well.
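Rule 5 is easy to check with a short Python sketch. The “report_” prefix and the three dates are made up for the example; the day-first naming is included only for contrast.

```python
from datetime import date

# Three arbitrary dates, deliberately out of order.
dates = [date(2018, 12, 1), date(2017, 5, 30), date(2018, 1, 15)]

def names(fmt: str, ds) -> list:
    """Build filenames from dates using a strftime pattern."""
    return [d.strftime(fmt) for d in ds]

ymd = "report_%Y_%m_%d.txt"   # year first, as in rule 5
dmy = "report_%d_%m_%Y.txt"   # day first, for contrast

# Plain alphabetical sorting of year-first names matches chronological order.
print(sorted(names(ymd, dates)) == names(ymd, sorted(dates)))   # True

# The same dates with day-first names sort into a non-chronological order.
print(sorted(names(dmy, dates)) == names(dmy, sorted(dates)))   # False
```

This works because the most significant field comes first and every field is zero-padded to a fixed width.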

2018-05-31 11:24 am
Duke
1. Don’t use spaces in file names.
This one is approaching the stage of archetypal myth where everyone knows it’s “wrong” but no one knows why any more. Ask any regular non-techie computer user why they are avoiding spaces and/or national letters in filenames and the only thing they will be able to tell you is that using spaces in filenames “will make bad things happen”.

2018-05-31 11:29 am
TommyD
I know why. I want to make my life easier. I know I can use spaces on all major OS’s. But I am a programmer. I have specifically had issues with scripts and code that didn’t always quote the spaces correctly. So don’t say I don’t know why. I do. Just say it doesn’t matter to you.

2018-05-31 11:49 am
Duke
Maybe you should read my post again. Especially the sentence where I write “non-techie”.
Not sure about the world you are living in, but here we consider a programmer to be a pretty “techie” type.

2018-05-31 11:51 am
TommyD
Good point.

2018-05-31 11:35 am
ahferroin7
The biggest thing to understand here is that filesystems were originally case insensitive because the operating systems they were designed for were case insensitive. In other words, the VFS layer (or in some older cases, the input layer) itself did case folding, so the filesystem didn’t need to store letters in differing cases.
Proper case folding is hard, however, just like Unicode normalization, and is very dependent on the language being used. The only reason those older systems had no issues is that they weren’t localized; they were English only, so they could get away with just ignoring bit 6 if bit 7 is set and the low 5 bits are between 1 and 26. In modern systems, though, it leads to some rather complicated processing on every system call interacting with the filesystem, and the only practical reason that it’s still kept around is compatibility.
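That bit trick can be written out as a small Python sketch, assuming bits are numbered from 1 at the least significant end (so bit 6 is 0x20 and bit 7 is 0x40). The names ascii_fold and fold_name are made up for illustration, and the extra check for 7-bit input is an addition not mentioned above.

```python
def ascii_fold(byte: int) -> int:
    """Fold one ASCII byte to upper case using only bit tests."""
    is_letter = (
        byte < 0x80                     # assumption: plain 7-bit ASCII only
        and byte & 0x40                 # "bit 7" set: the letter columns
        and 1 <= (byte & 0x1F) <= 26    # low 5 bits select A..Z
    )
    # Clearing "bit 6" (0x20) maps a..z onto A..Z and leaves A..Z alone.
    return byte & ~0x20 if is_letter else byte

def fold_name(name: bytes) -> bytes:
    """Apply the per-byte fold to a whole filename."""
    return bytes(ascii_fold(b) for b in name)

print(fold_name(b"ReadMe.txt"))   # b'README.TXT'
print(fold_name(b"a_b[1]`{"))     # b'A_B[1]`{'
```

The range test on the low 5 bits is what keeps punctuation such as “[”, “`” and “{” from being altered; nothing comparable exists once non-English letters are involved.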
Of course, if you want to go with ‘normalizing’ input, why not just store a language tag with every file and directory name, and accept all the different phonetizations and transliterations too? I mean, I’m sure it would be useful to be able to treat the Japanese word 富士山 the same as any of: ふじさん, Fujisan, Huzisan, and Huzisan (it’s the same in Nihon-shiki and Kunrei-shiki romanizations), and that’s not even including wāpuro rōmaji, IPA phonetization, and non-romanized transliterations (using, for example, Cyrillic or Hangul).

2018-05-31 2:03 pm
oiaohm
The biggest thing to understand here is that filesystems were originally case insensitive because the operating systems they were designed for were case insensitive. In other words, the VFS layer (or in some older cases, the input layer) itself did case folding, so the filesystem didn’t need to store letters in differing cases.
The majority of file systems are in fact case sensitive on disc.
https://lwn.net/Articles/754508/
There are a few rare ones, like XFS created with “case insensitive”, where the disc format is modified.
https://en.wikipedia.org/wiki/Comparison_of_file_systems
Do note the “Case sensitivity” and “Case preservation” columns.
Most file systems that are case insensitive on disc also don’t have case preservation, so all file names have to be stored in the same case. For example, every file name could be upper case, and that would be stated in the file system specification; this is the case for the ODS-2 file system and others. This keeps case-insensitive processing simple: create a file, normalise the file name, done.
Yes, there are truly case-insensitive file systems out there. The reality is that most operating systems started out case sensitive. Of course, the biggest exception was MS-DOS with the FAT file system, which was case insensitive and lacked case preservation.
But it’s fewer than 30 operating systems in total that had case-insensitive file systems as the operating system’s file system, with most of them being from Microsoft or clones of Microsoft’s.
So it turns out that a file system being case insensitive is a rarity in operating system history.

2018-05-31 2:27 pm
ahferroin7
It’s a rarity, though, simply because the OS did case folding. For almost all applications in existence other than some really low-level forensics and data recovery stuff, what matters is whether the driver for that particular filesystem does case folding (or the VFS layer does it), not whether the on-disk format is case insensitive (strictly speaking, the on-disk format is never case insensitive unless an encoding is used that doesn’t provide case differences; only the driver is).
There are two other things to keep in mind, though:
* You’re not accounting for really old legacy stuff that used the 1963 version of ASCII, which did not have lower case letters, as well as other systems which used similarly case-insensitive and non-case-preserving encodings. Handling of those encodings is why DNS names and the scheme and host parts of URIs are case insensitive (though most people don’t know this because almost all browsers in existence case-fold those parts to lowercase), and the same goes for almost all other case-insensitive protocols.
* It doesn’t really matter if only 30 or so OSes were case insensitive, because the ones that were were the important ones, and therefore they had the most influence on modern systems. In particular, the original Mac OS, MS-DOS and derivatives are all case insensitive, and they have largely shaped modern client system development. The only case-sensitive system that had a huge impact on modern system designs was UNIX (I mean, VMS could be counted, but it’s functionally dead and most of its modern influence was on Windows).