One of the innovations that the V7 Bourne shell introduced was built in shell wildcard globbing, which is to say expanding things like *, ?, and so on. Of course Unix had shell wildcards well before V7, but in V6 and earlier, the shell didn’t implement globbing itself; instead this was delegated to an external program, /etc/glob (this affects things like looking into the history of Unix shell wildcards, because you have to know to look at the glob source, not the shell).
↫ Chris Siebenmann
I never knew expanding wildcards in UNIX shells was once done by a separate program, but if you stop and think about the original UNIX philosophy, it kind of makes sense. On a slightly related note, I’m currently very deep into setting up, playing with, and actively using HP-UX 11i v1 on the HP c8000 I was able to buy thanks to countless donations from you all, OSNews readers, and one of the things I want to get working is email in dtmail, the CDE email program. However, dtmail is old, and wants you to do email the UNIX way: instead of dtmail retrieving and sending email itself, it expects other programs to do those tasks for you.
In other words, to set up and use dtmail (instead of relying on a 2010 port of Thunderbird), I’ll have to learn how to set up things like sendmail, fetchmail, or alternatives to those tools. Those programs will in turn dump the emails in the maildir format for dtmail to work with. Configuring these tools could very well be above my pay grade, but I’ll do my best to try and get it working – I think it’s more authentic to use something like dtmail than a random Thunderbird port.
In any event, this, too, feels very UNIX-y, much like delegating wildcard expansion to a separate program. What this also shows is that the “UNIX philosophy” was subject to erosion from the very beginning, and really isn’t a modern phenomenon like many people seem to imply. I doubt many of the people complaining about the demise of the UNIX philosophy today even knew wildcard expansion used to be done by a separate program.
Mandatory Prof. Kernighan video on the whole one-program-one-action thing, with very cool pipelining demos:
https://www.youtube.com/watch?v=tc4ROCJYbm0
I strongly recommend his UNIX: A History and a Memoir. Tons of tales about how specific UNIX utilities came to be.
The UNIX philosophy is alive for the ones who want it. Tons of utilities remain alive and maintained in Linux and the BSDs.
Case in point – I write some quite complex automations for Azure in Bash and I prefer to use curl or wget and the REST APIs vs. Azure CLI.
Azure CLI is so heavy and so slow and eats so much RAM that using plain REST actually saves me from requiring more RAM in the machines that run the automations.
I can’t find the article now but, a year or so ago, I came across a blog post where the author claimed to have parsed a huge (few-TB) dataset an order of magnitude faster using basic awk and sort than natively via SQL.
Sometimes I prefer to use the basic tools because I know they will be there. Not everyone using my scripts can install applications, so I know I can count on awk being installed, but I can’t always count on jq. Usually, there will be curl or wget, so using REST makes my automations compatible out-of-the-box with BSD, rather than counting on the user installing Azure CLI, etc.
Thanks to the magic of the open source world and people being as different as they are, we can live the raw pure UNIX way if we want. Or count on modern combined tools if we wish. For example, I have full access to my email on my self hosted server from my HP 712, point-to-point, in my “vintage VLAN”. Different rules apply connecting from the Internet. All thanks to the power of mixing and matching UNIX tools. =)
Or… https://www.terminal.shop
There’s really something for everyone these days!
Shiunbird,
We take it all for granted today, but in those years it was revolutionary.
Ahh, I absolutely hate shell programming, haha. I would go straight to any real programming language for all but the most trivial of commands. I don’t mind shells executing commands as they were designed to do in the beginning, but IMHO cramming programming capabilities into them has created some of the most hamfisted & awkward programming languages we have. Even simple algorithms and structures have to get translated into obtuse shell form. Just because we can doesn’t make them a good choice. But to each their own 🙂
SQL databases can be optimized. However I do find that getting data into a relational database in the first place can take time and create bottlenecks. When that’s the case external tools that compute results directly on source data can eliminate tons of data transfer/conversion overhead. If you don’t need a database or the power of SQL, then go ahead and use simpler tools instead 🙂
We actually went from a C# application doing tons of string concatenations to call Linux commands to Bash scripts, which was DEFINITELY the correct move. =)
I use Bash a lot to automate system administration tasks, and this is probably the ideal use case for it. Also, I do a lot of one liners to go through text/parse output of commands, as second nature already – tons and tons of pipes. But it’s nothing that I’d recommend to someone else. I just grew into it naturally.
It seems that we are in agreement – there is always the best tool for the job. And “best” depends on the operator, too. Sometimes better to use the solution you know best than doing a shitty job with the “theoretically ideal” solution.
We have choice these days, and computers are fast enough to run most things just fine.
Shiunbird,
Everyone’s entitled to an opinion, I say to each their own 🙂
But I can assure you that you will not convince me that bash is good for programming of anything significant. If you are only using it to call other command line tools that’s one thing, but the native programming facilities are awful and I do think the majority of programmers would feel the same way.
That’s true, computers can be fast enough to make inefficient approaches viable. However my main criticism of bash scripting isn’t the spawning of tons of children and opening up lots of pipes, but rather how painful it is to implement advanced programming algorithms and structures natively in bash on its own merits. As a professional programmer, why would I bother using tools that are both beneath my skill set and harder to use?
I agree with you about how cool unix pipes are though.
Very true – it is painful to implement advanced algorithms.
If you need a lot of that, it is truly not the tool for the job. We are actually going to take some of our flows out of Bash in the near future.
About efficiency and “best tool for the job”, we also had an interesting situation some time ago.
We needed, at some point, a quick and dirty way to store outgoing emails coming out of our application, all running locally, to troubleshoot a bug.
I am not a good programmer by any stretch of the imagination. My background is sysadmin (thus, Bash). I put together 238 lines of C in one hour, pretending to be an SMTP server, doing a very basic handshake, dumping the message into a file and waiting for the next message. Tested, checked for memory leaks, done. The binary is 18K and it consumes 868K of memory when it runs.
We discussed this exercise with 2 other colleagues – one would use Python and the other, .NET/C#. The Python colleague completed the exercise in 15 minutes and the C# colleague took 30 minutes. The Python file is smaller but takes a few seconds to load. I don’t recall the memory usage for them, but in both cases they were loading multiple email handling libraries and nothing would consume less than 100M of memory. When we went to do a mass test in Kubernetes with multiple instances, the lightweight C program helped us use fewer nodes than we would have otherwise and saved money.
Anyway – it is like that. I love multiple options to do things and, as time goes, we specialize and, agreed, we pick our tools based on our skills and preferences. Multiple roads go to Rome.
Shiunbird,
Garbage collected languages typically use more memory, although it really shouldn’t be that bad. If I were to guess it probably wasn’t a language difference so much as a library or implementation difference. Maybe the library does a lot more than what you needed. If you had written your code solution in a language other than C, you could have likely gotten the resource consumption down.
I’m a fan of efficient programming and I also like to develop with resource optimization in mind, but I find this to be relatively uncommon. The prevailing attitude is that we shouldn’t waste developer time on optimization when hardware is cheap. The main reason this bothers me is that the costs of inefficient software get multiplied by time and deployed instances. Spending a bit more time optimizing can make a huge difference on a much bigger scale than the original developer…the problem is that from a project management standpoint short term thinking almost always prevails.
There are lots of ways to do things on almost every level, not only in terms of language and libraries, but also in terms of models. I like designing software around event oriented asynchronous programming models, which can handle thousands of event handlers with low overhead, but the problem is that many languages and even operating systems make async implementations awkward. I like C#’s native async support. Most languages don’t support this though and their libraries favor multithreading instead. Threads are quite expensive however, every thread needs a stack that imposes cache/memory/sync overhead not to mention the notoriety of race conditions.
I might be accused of over-optimizing stuff sometimes, which is fair. But I see so much code being used in production capacities with millions of instances (such as wordpress websites) that ends up performing significantly worse and I can’t help but feel that popular platforms should be better optimized. Oh well.
I prefer it when the program being launched performs the expansion, and only when it’s appropriate, rather than the shell presuming that * inside of command line parameters should match files. IMHO DOS did this better since commands could explicitly decide if that was the appropriate thing to do, which seems superior to me.
Consider something like
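echo test*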
Today on a unix system what this does is ambiguous and it depends on the contents of the current working directory.
If there are no files or directories matching the pattern, then the command is executed as is, printing “test*”. But if there are matches, then you’ll get output like “test test.c test.html” despite the fact that echo has nothing to do with files. This is just an illustrative example, but consider other applications where * makes sense yet the tool is meant to operate beyond the domain of the local file system; maybe the tool performs math, matches text in a file, runs SQL commands, or it’s for a filespec on a remote system…
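Say, something along the lines of:
rsync othermachine:/home/user/* ./backup/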
Most of the time this will execute correctly because “othermachine” does not exist as a local file, but if coincidentally or maliciously the local file does exist then it will be expanded and the wrong rsync command gets executed. This expansion can actually make things unnecessarily dangerous in some cases of user error:
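cp *.txt
(where the intent was something like “cp *.txt backup/” but the destination got left off)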
Since the shell performs the expansion without cp being able to see that a destination file operand is missing, cp doesn’t know any better than to accept the expansion as a source and destination, which is wrong.
Of course I understand that the * can/should be escaped so the shell won’t replace it, but nevertheless I find such inconsistencies to be extremely annoying and this design makes for bad outcomes. I find it would be much better to let the tool decide if/when local file expansion is appropriate…alas the convention is already set in stone and all our command line tools assume the shell is responsible for expanding wildcards. It’s way too late to fix this.
That’s what Windows still does. It’s been a source of multiple exploitable bugs because there’s no guaranteed-to-be-reliable way to provide applications with a subprocess-invoking equivalent to parameterized SQL.
I see and understand your point, but I think a proper solution would be a middle-ground where applications still receive a pre-split list of arguments and any changes in how globbing work occur in the shell.
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!),
You’d need to give me a clear example, but it’s worse when the shell tries to interpret operands when the user & process clearly don’t benefit from it.
Even if you split the arguments into an array from the shell, what those arguments are for shouldn’t be presumed by the shell. It’s not good that unix shells do that and the inconsistent result of expansion is downright bad design. I wouldn’t be opposed to shells that can pass well formed structured data as input to processes, this might have cool use cases, but it needs to be properly designed.
Look up CVE-2024-24576 (Rust) and other expressions of the same vulnerability.
TL;DR: Many programming languages were vulnerable. Here are two examples I could easily pull up.
https://foxoman.hashnode.dev/exploring-command-injection-vulnerabilities-in-windows-with-nim
https://blog.rust-lang.org/2024/04/09/cve-2024-24576.html
This is literally the same song and dance as trying to write secure SQL querying without parameterized SQL.
…but that has nothing to do with how arguments are passed to programs. Look at how it works when you’re using something like the subprocess module from Python’s standard library. It just takes an array that gets passed literally to the exec/spawn function and, if you want globbing or parameter expansion or whatever, you add something like shlex.split() or glob.glob().
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!),
https://nvd.nist.gov/vuln/detail/CVE-2024-24576
That is very interesting and I appreciate the example. It emphasizes the importance of consistency, be it on Windows or UNIX. This became an issue with Rust and other languages precisely because the semantics were inconsistently defined & implemented. We can agree this is a bad outcome stemming from bad/inconsistent specifications.
I don’t consider it the same because SQL is well defined. SQL injection won’t happen when parameters are escaped or sent via command parameters. The problem here isn’t that the libraries aren’t used correctly, but that the libraries have no solution to deal with the broken specifications. I think we can agree that the parameter passing specification needs to be well defined and consistent.
I disagree with this statement, but maybe we aren’t talking about the same thing. Hypothetically, if the POSIX standard supported the passing of structured data in a standard way, then it absolutely needs to concern itself with how arguments are passed to programs. You can’t just say programs shall support structured data input without specifying a consistent mechanism for passing it in. A parameter specification is critical if programs are going to be able to talk to their children. I think it’d be cool to have a standard way to pass structured data back from children too.
That’s what the shell should do too. It’s quite bad that the executed command can change based on files that exist in the working directory (as illustrated by my original post).
“command parameters” (i.e. parameterized SQL) are what the DOS/Windows approach is incapable of because it’s just a single dumb string that the receiving parser defines the semantics of. (And, even on Linux, it’s not uncommon to see people layering non-standard homegrown parsers on top of the argv list they’re given.)
As for escaping, CVE-2024-24576 is literally the “I wrote my escaper based on MySQL but PostgreSQL/SQLite/Oracle/MSSQL/whatever diverges subtly from that” kind of bug that used to be not-uncommon in the bad old days.
I have two responses here.
First, I fully agree and I think the reason “the UNIX philosophy” as many people think of it is dying is because of the lack of this. Ad hoc agreements to have --json options and use the jq tool aren’t sufficient.
That’s why we’re seeing people experiment with things like PowerShell and Nushell.
Second, compared to DOS and Windows, we already have POSIX-standardized support for structured data. With DOS and Windows, it’s a dumb string. With POSIX platforms, it’s a list of strings where any confusion about its contents comes in the user-facing command parser, not the mechanism for invoking a subprocess and passing arguments to it.
In any language of note except shell scripting languages (eg. Bourne Shell), the invoking program gives a list and the invoked program receives that same list, unchanged.
But that’s the thing. That’s ENTIRELY up to the shell.
The only part that the rest of the OS even participates in is the concept of the working directory and the working directory only gets considered when searching for the binary to execute if you explicitly specify a path relative to it (the ./foo trick) or PATH contains a path relative to it and the shell invokes the subprocess with posix_spawnp(), execvp(), or execvpe() instead of posix_spawn() or execv().
The behaviours you dislike are entirely internal to the shell and vary from shell to shell and user configuration to user configuration, as demonstrated by bash options like noglob, extglob, nullglob, failglob, nocaseglob, dotglob, globstar, globasciiranges and variables like GLOBIGNORE.
`set noglob` (A.K.A. `set -f`) will turn off name expansion entirely in bash… but half the problem is that the original POSIX subset of Bourne shell has no concept of proper arrays, so you need to manage whitespace splitting.
It’s a shell issue, not an exec/spawn issue.
Bourne shell just happened to get designed on the assumption that you’d want to manipulate files more often than strings, so, if you want echo to print a literal *, you put it in single quotes to suppress metacharacter processing.
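A quick illustration (assuming the working directory contains test, test.c, and test.html):
$ echo test*
test test.c test.html
$ echo 'test*'
test*
$ set -f
$ echo test*
test*
$ set +f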
Correction: `set -o noglob` or `set -f`. It’s been too long since I toggled bash options.
(I normally use zsh and don’t change my config often.)
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!),
A single dumb string is fine. An array is also fine. Any conceivable structure would be fine. The important thing for programs to interoperate is that the standard, whatever it is, be consistently adhered to for the platform. With a robust specification that all languages can implement, there would be no ambiguity and no problem.
When I said “structured data”, I meant more than just a parameter array like argv. But no platforms do this as far as I’m aware (short of passing in JSON or some other non-standard approach).
Of course, this is exactly the problem. Bash’s inconsistent behavior can lead to bad/surprising results regardless of how parameters are being passed between processes.
Objectively, the program being launched knows better how to interpret what a parameter is for. For the shell to presume each parameter is a potential filespec is a very dirty hack and the fact that the behavior is inconsistent is even worse.
Yes, you’re right, it is a shell issue. Not only does expanding files in the child fix all the aforementioned problems, which IMHO is reason enough to declare shell-side expansion a bad approach, but it also solves more problems that we haven’t even talked about yet…
1) Calling a child process from inside a program (not the shell) will fail to perform expansion even for parameters that are ostensibly meant to be file specifications.
2) useless FS lookups for parameters that don’t really represent files.
3) execution failing due to command line being too long
4) A shell process needs to read the inodes once to generate the list and then the child needs to read them again. This delays execution and harms cache locality. If the expansion is done in the child, it can start executing its loop straight away, using less memory, and it significantly increases the odds that file operations will hit cached inodes and therefore be faster.
Some of these are performance benefits that don’t matter a huge deal for small directories, but on a busy hosting VPS where directories can contain hundreds of thousands of images (for example) shell expansion can actually create major disk IO bottlenecks (ask me how I know).
Given the totality of cons, I think that having the child perform file expansion when needed is clearly the better engineering choice. That said, changing course now would just cause more problems. My point is that it would have been better in hindsight.
Whole-heartedly disagree. If there’s one thing prior experience has demonstrated, it’s that “make something idiot-proof, nature builds a better idiot” is alive and well in the world of argument parsing… and that’s with getopt() being part of libc by POSIX decree.
The only reason your typical Linux CLI works as well as it does is because GNU enforces strict standardization for the wide swath of commands under their control, the vast majority of other things people run are open-source so users can submit patches to fix their argument parsing, and then useful GNU things like help2man also push for compliance.
Even despite that, there’s still the infamous split between GNU-style --long (two dashes) and X11/Java-style -long (one dash)… and that’s before you get to all the little things like UnRAR (tar-style single-character subcommands, and option parsing that says “ERROR: Unknown option:” for --help, -help, or -h; you need a DOS-inspired -?) and various other niche tools that haven’t been hammered into compliance.
In fact, The UNIX-HATERS Handbook, which did agree with your “pass the raw string in” for the purpose of having `rm` be able to say “Whoa there! Are you sure?” for particularly destructive globs (zsh fixes that particular example by just watching for harmful `rm` invocations), also lambasted UNIX for not having a standard command-line argument parser that everything has to use.
My point was that, compared to what DOS/Windows does, what POSIX does is structured data.
No, actually, bash’s behaviour in the context of things like echo is consistent. That’s why I prefer it. It’s bad enough that commands like rsync apply new semantics to things like presence/absence of trailing slashes.
Glob expansion will run or be disabled by quoting/escaping and produce the same result regardless of which command is in argv[0].
I’m calling [citation needed] on the behaviour being inconsistent. I’ve never seen it be inconsistent and that’s part of why I defend it.
I don’t want the shell equivalent of function_call(“foo”, “bar”, “baz”); deciding to second-guess my quoting and commas and open up an exploitable vulnerability.
There’s a reason that protected-mode OSes don’t allow applications to “know better” about how to interface with hardware. There’s a reason Wayland doesn’t allow applications to “know better” about their window management. There’s a reason we’re seeing more and more sandboxing for applications in general.
Experience has proven that the application doesn’t know better on matters where consistency is involved.
Beyond that, people have made various alternative shells which don’t glob or even whitespace-split over the years. It turns out most people glob and whitespace-split enough that they want those to be low-friction.
But it causes more problems. That’s why people who know the history of things and work with this sort of stuff agree that it’s bad to require every program to get the same bit of behaviour correct rather than having one centralized place for it.
That’s by design. The old “DEPRECATED! Don’t use this!” APIs do perform parameter expansion by default and it was found to be an endless parade of security holes because you’d get a literal filename returned by something like os.listdir() which contained a character that also worked as a metacharacter, and then that intended-to-be-literal filename would get interpreted by the invocation of the subcommand unless you remembered to run it through an escaper.
Your attempt to apply the principle of least surprise in one location would un-apply it in a much greater number of locations that just occur less consistently and, thus, are easier to miss until they bite you.
The modern APIs require you to explicitly invoke the kinds of parameter expansion you want before handing the list off to the subprocess… which has the added benefit of giving you the opportunity to do stuff like “canonicalize each path resulting from the glob and refuse to run if any of them turn out to be symlinks to files outside the specified root directory” if that’s relevant to your use-case.
Again, shell problem, not argv problem… and shells are swappable. Your argument reminds me of the Buddhist poem where the student says “If only we could cover the world in leather” to avoid pain and the teacher hands him a pair of shoes.
I don’t like how annoying the whitespace splitting and metacharacter expansion make non-trivial shell scripting… so I don’t write non-trivial scripts in shell script.
Fair problem… but your solution is only a workaround.
The proper solution wouldn’t be akin to how my yt-dlp wrapper hacks around yt-dlp’s filename limiting working on codepoints and not bytes (and thus breaking with very long CJK “titles”, like from somewhere like Twitter) by just specifying a short enough truncation to hopefully do the trick.
Personally, if I’m going to use a workaround, I prefer ones like find’s -exec/-execdir with the + terminator, which causes it to group invocations while maintaining awareness of length limits, or writing to a temporary file in /dev/shm and then asking the command to read from that.
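For example (the directory, pattern, and command here are just placeholders):
find /some/big/directory -name '*.jpg' -exec ls -l {} +
find batches as many matches as fit within the argument-length limit into each ls invocation.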
…and a DOS game can get more efficiency by doing the things that make it only work using Windows 95’s “Reboot in MS-DOS Mode”.
…plus, if it’s performance I care about, I get much more noticeable performance improvements by interpreting the “reusable modules” part of the UNIX philosophy as “Python packages” instead of “binaries” and just writing a Python script. No marshalling/unmarshalling overhead. No fork/exec overhead. etc. etc. etc.
I don’t see how this point outweighs the downsides.
And I’m saying that, in contexts where people saw the results of their design decisions while change was still possible, they actively worked to move away from what you’re pushing for.
To clarify this, Windows mostly dodges this problem by just inheriting the fact that DOS forbade common glob characters in filenames… something you can’t retrofit onto POSIX platforms for the same reason that Windows allows un-paired UTF-16 surrogates in their APIs in case you’ve got some old filenames kicking around from when they implemented UCS-2 instead.
(File/directory names on platforms like Linux are bytestrings… not even valid UTF-8… just bytestrings, where the only two forbidden characters are NUL (string terminator) and / (path separator).)
(Funny enough, the Windows NT kernel, which was designed to surface a POSIX personality as a sibling to the Win32 personality, uses counted strings internally, so NUL is a valid character in its object mapper paths… leaving only the path separator as disallowed.)
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!),
This is getting way off track… If you want to standardize the commands and parameters, I’d consider that an entirely different discussion. It doesn’t refute any of the problems that arise from implementing filespec expansion on the shell side.
I disagree with that terminology, but I think I understand what you are saying.
It is NOT consistent as I proved at the top. The fact of the matter is that the shell has absolutely no idea whether each parameter is supposed to be a literal or a file. Only the child knows the purpose of each argument, the shell is at best opportunistically guessing. Even a parameter that looks like a file may in fact be a remote file or url that should not be expanded by the local shell. The shell simply does not have the information that the child does and that’s why it makes more sense to leave the decision to the child rather than relying on heuristics in the shell that can sometimes be wrong.
Then please review my earlier examples. The shell clearly treats parameters either as a literal, or as a filespec, but the behavior is objectively NOT consistent. The determination of whether an argument will be a literal or a filespec on any given invocation is not determined by the command alone, but rather it is dependent on the files that happen to be in the working directory. This is a bad design especially when local files shouldn’t affect a command.
I have no idea why you’re bringing any of this up… Having a child process explicitly do a search is more robust than having the shell heuristically guess that the child process might want a file list.
Think about it. If the convention were for the child process to perform file searches instead of the shell, the command line usage would still be the same!!! But now it wouldn’t be ambiguous whether an argument was meant to be a file spec, and it has all the benefits we’ve been talking about with no cons for usability.
You haven’t identified any problems though. Standardizing the behavior can be done with standard C library functions, like we do with everything else. Obviously I get that changing things today would cause problems for today’s software, but if unix had standardized this practice from the get-go then neither you, nor I, nor anyone else would be complaining about it…it would just work. Furthermore no one would want the model we have today because the issues I’ve been outlining make it inferior.
That problem doesn’t exist in the proposed model. And not for nothing, but having the shell heuristically perform expansions is actually worse for security. Having the child process perform the search has the benefit of always treating arguments correctly regardless of whether it’s called from a shell or not. There’s no con.
Create a helper function and put it in stdc. Done.
The standard helper functions can be extended to do it or you can even replace them with custom ones. The more sophisticated your requirements are though, the more I’d be inclined to look at chaining together powerful unix tools like find and xargs to achieve the job without having to rely on a specific shell, for instance:
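find . -name '*.log' -print0 | xargs -0 gzip
(the pattern and command are just placeholders; the -print0/-0 pair keeps filenames with whitespace intact)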
Also if you have hard security requirements, my advice would be to look at kernel solutions.
Yes, I know it’s a shell problem! But that’s exactly the point, it’s a shell problem that wouldn’t be a problem at all if it were engineered better.
That’s not right though. There are many challenges to fixing established standards even if we can acknowledge that there were better solutions that we could have gone with at the start.
The more responses I see from you, the more I see we’re talking past each other. Hopefully, this time, concrete examples will help.
First, unless you have something like Apple’s App Store moderation, you can’t stop people from implementing globbing shell-side… something which people do in pursuit of greater consistency between commands (defined as ensuring that the same glob, in the same working directory, will expand to the same list of files no matter what program they’re passed to and whether it calls the glob-evaluating function), more features (eg. providing a single place to implement extended patterns like the img{001..121}.png or foo.{c,h} that I often use when not in COMMAND.COM or CMD.EXE), and improved auditability (ensuring that there’s a single place to audit for how patterns will be expanded).
Hell, that’s been done.
Cygwin bash implements bash’s usual globbing and, because it’s running on Windows, then quotes the resulting expanded argv into a single string to call Windows’s process spawning APIs. Programs which expect to receive the raw string will think that you just manually typed out the result of evaluating the glob.
Take Command Console for Windows (formerly 4NT), which doesn’t have “ported POSIX application” as an excuse, explicitly lists “Enhanced wildcards” as one of its features, which is only possible to implement by doing shell-side glob expansion, and it’s had that feature since it was 4DOS.
If you and the thing which sits between you and process invocation disagree on how argument expansion should work, you lose. Period.
Passing a single string instead of an argv doesn’t limit the shell’s ability to do shell-side glob expansion but does encourage apps to second-guess strings that are meant to be literals if there’s a mismatch between how you and they implemented quoting/escaping and also encourages programs to surprise you by refusing to evaluate a glob when you wanted it to happen.
(And yes, I do often use globs and expansions in places your solution would probably break, such as using “Some Title”{,.html} when something takes Title and Filename as CLI arguments.)
Again, that’s a crappy language syntax… as evidenced by how all these problems go away when you invoke your processes using something like Python’s subprocess.call().
Bourne shell has three kinds of strings:
1. Single-quoted literals (always literal strings. No metacharacter stuff at all.)
2. Double-quoted literals (Literal strings with $VAR substitution.)
3. Barewords
Barewords, in turn, can be either string literals or match patterns:
1. If it contains no globbing characters, it’s a literal string.
2. If it contains globbing characters, it’s a match pattern.
You’re not really even complaining about that. You’re complaining about the default behaviour of match patterns decaying into literal strings if they find zero matches.
ssokolow@monolith-tng:~$ echo "Hello*" World* 'Demo*'
Hello* World* Demo*
ssokolow@monolith-tng:~$ shopt -s failglob
ssokolow@monolith-tng:~$ echo "Hello*" World* 'Demo*'
bash: no match: World*
ssokolow@monolith-tng:~$ shopt -u failglob
ssokolow@monolith-tng:~$ shopt -s nullglob
ssokolow@monolith-tng:~$ echo "Hello*" World* 'Demo*'
Hello* Demo*
ssokolow@monolith-tng:~$
This is like my dislike for CoffeeScript and Haskell making parens optional for calling functions. Don’t like it? Use a different shell/language.
Hell, you don’t even have to use a different shell. Just set either `failglob` or `nullglob` in your .bashrc and get on with your life.
As Cygwin bash demonstrates by implementing the exact same behaviour on Windows, and 4DOS and TCC demonstrate by doing it and calling it a feature, passing a single string to the client does nothing to stop this, because Bash or 4DOS or TCC or whatever just lies to the program about what the user actually typed.
And as I said, Cygwin Bash, 4DOS, and 4NT/Take Command Console are object lessons that you can download and run right now to prove that your argument doesn’t hold water.
That IS the convention on DOS and Windows, and yet those (and no doubt other) shells still implement client-side globbing in the name of retrofitting a richer, more expressive glob syntax.
First, not everything uses the stdc functions. That’s where those occasional CVEs come from.
Second, as I keep pointing to, it wouldn’t actually solve the problem you think it would.
Third, from a software architecture standpoint, if you are using stdc to re-construct argv, it’s just a worse way to arrive at the same result.
If I wouldn’t be, it’s because I’d lack the imagination to envision something better. Plenty of people bemoan the state of argument passing on Windows as yet another irritating DOS-ism because they can see POSIX demonstrating that something less hostile to non-shell subprocess invocation is viable.
Yes, it does, because shells aren’t the only things that invoke subprocesses and, when you are sending a preprocessed list of literal paths to a subprocess, you don’t want it performing metacharacter expansion on them.
Again, CVEs have happened because of that. There’s an entire class of vulnerabilities known as “shell injection” vulnerabilities fundamentally based around tricking things which perform shell metacharacter expansion into getting confused about where the delimiters are and which metacharacters should be escaped.
The POSIX way is superior because it encourages programs to have a default mode of operation where they just treat their path arguments as opaque “ID tokens” to be passed to the OS’s filesystem APIs without attempting to modify them.
(Hell, with interfaces like openat() and platform APIs like WASI with capability-based security models, we’re starting to see a codification of that, where the sandbox host just passes in a pre-opened file descriptor and the application doesn’t even know what the absolute path of the granted resource is… just any path components relative to the root directory they were given a descriptor for.)
You’re trying to kill a fly with a sledgehammer and ignoring how many holes it will make in the walls and floor.
Again, not viable unless you’ve got something like Apple’s App Store denying developers access if they don’t use it when appropriate and use it correctly.
And, as I said in my previous examples, alternative shells for DOS and Windows like 4DOS, Take Command Console, and Cygwin Bash show that your solution does nothing to prevent shell authors from implementing globbing shell-side and then just generating a new quoted string to feed to the subprocess in the name of overriding the default globbing semantics.
Hell, one of Zsh’s big claims to fame is that it provides richer globbing than Bash… not that I really need much of that, given that bash 4+ copied ** for recursive path globs and I don’t use the really fancy ones.
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!),
If someone wants to implement a feature, go right ahead, shells can & should evolve. The problem is that file expansion in a shell that doesn’t understand what the parameters are for creates broken and inconsistent results, which makes for a poor technical foundation on which to build.
I’m not at all opposed to the idea of building more sophisticated shells. A context-sensitive TUI like PowerShell can be appealing. However I argue that we’d be in a stronger place to build more sophisticated shells if we’d started with good standards to begin with. If you’re starting from a point of having to compensate for foundational flaws, including bad assumptions and requiring heuristics, that’s just not very good engineering.
I’m not saying there aren’t other things that could be improved too, surely we could agree there are many such examples. I’m just pointing out that as it stands, file expansion is flawed and the results can’t easily be fixed because software has been engineered around a weak foundation. Again, I know this is very hard to fix in today’s reality, but my point is merely that a better engineered foundation at the start would have been possible and nobody would want to look back.
Not for nothing, but if applications perform their own file spec expansion, it doesn’t break any of your examples where you want to add more features to the shell. A better engineered standard doesn’t prevent you from creating a shell with globbing if you have special requirements.
The discussion about doing “email the UNIX way” goes a bit deeper. Some of the “UNIX way” here is multi-user.
Having all this power on a desktop (never mind a phone) is not how it always was. UNIX was not designed as a single-user desktop. Mail on UNIX was designed with the expectation that relatively simple computers with little storage would be working as terminals to much more powerful computers with “lots” of storage. The mail system in UNIX ran on a central computer where everybody would have their emails stored. A Mail Transfer Agent (MTA), like Sendmail, would fetch, send, and organize mail by user account on the central server as local files in a standard layout and format. The email client that each user would use on their terminal was just a simple front-end to this system.
On a desktop system where even “root” and “user” are the same person, none of this local complexity makes much sense and is in fact a bit contrary to the UNIX philosophy. Mail clients that can directly interact with remote servers are a better reflection of the UNIX philosophy on a modern desktop.
Mail/mailx (POSIX mail) can be configured to interact with remote SMTP servers directly without needing a local MTA like Sendmail. Older versions of mailx may lack this option though.
I do not know much about dtmail. The man page talks about mailx but it is not clear if it uses mailx under the hood or just uses the same commands and directory structure. I suspect the latter. In that case, or if mailx lacks SMTP options, dtmail will require a local MTA.
My favourite email client (or Mail User Agent – MUA) from back in the day was ELM which was actually created at HP (quite likely on HP-UX though I used it on Sun workstations and early Linux).
Getting dtmail working is not that different than setting up mutt or pine locally on a Linux system. Yes, they need external programs to work. You need a mail delivery agent (MDA) and a mail transport agent (MTA).
Rather than set up sendmail (which I can and have done many times) locally, I recommend a smaller drop-in like msmtp (https://marlam.de/msmtp/). This will work like sendmail would when dtmail invokes it, but you can configure msmtp to smart host your email through an ISP, which is more or less required these days. If you have gmail, you can configure msmtp to hand the email you compose over to gmail.
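A minimal ~/.msmtprc sketch, just to give an idea of the shape of it (hostname, address, and password handling below are placeholders you would adapt to your provider):
# ~/.msmtprc (placeholders throughout)
defaults
auth on
tls on
logfile ~/.msmtp.log

account isp
host smtp.example.com
port 587
from you@example.com
user you@example.com
passwordeval "cat ~/.msmtp-password"

account default : isp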
On the receiving end, I would look at mbsync (https://isync.sourceforge.io/mbsync.html), which will let you sync your mail via IMAP. So you can have it locally and wherever else it exists, thus eliminating the problem of having email in one location only.
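And a rough ~/.mbsyncrc sketch for one account, syncing into a local Maildir (again, hostnames and paths are placeholders; directive names vary a bit between isync versions – newer ones use Far/Near where older ones used Master/Slave):
IMAPAccount isp
Host imap.example.com
User you@example.com
PassCmd "cat ~/.imap-password"
SSLType IMAPS

IMAPStore isp-remote
Account isp

MaildirStore isp-local
Path ~/Mail/
Inbox ~/Mail/INBOX

Channel isp
Far :isp-remote:
Near :isp-local:
Patterns *
Create Near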
dtmail is pretty old at this point and I don’t know if it works with Maildir format, but it will work with mbox which is what the above helper programs can do.
The version of sendmail that comes with HP-UX is likely outdated anyway and might even lack certain features you would need to correctly set things up. Also, it being sendmail, it is definitely advisable to not run an old version.
If you have questions, I’m happy to help. Just send me an email.
This is incredibly helpful, thank you! I’ll dive into the docs over the coming days to see if I can get it working.
I’m always reminded of how the meaning of “the UNIX philosophy” has changed over time, with things like “handle text streams” being a later interpretation of its original form.
The original UNIX philosophy from 1978 is as follows:
1. Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
2. Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
3. Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.
4. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.
To me, it sounds like, given the change in hardware limitations and available tooling, they could be rewritten as:
1. Write your programs as well-factored modules. To do a new job, write a new module instead of complicating the existing ones.
2. Ensure what you write is easy to compose and automate.
3. Get to the dogfooding stage as quickly as possible. Don’t be afraid to throw away and redesign parts that aren’t living up to expectations.
4. Assume that “one-off” tasks will turn out to not be “one-off”. Automate the manual labour.
I don’t know about you, but it sure sounds to me like a set of rules that just sort of outgrew the ossified “no standard way to pass structured data or verify APIs match at build time” limitations of POSIX pipes and shell scripting and evolved into the philosophy people are encouraged to follow in languages with package repositories like Crates.io, PyPI, and NPM.
Heck, in a sense, you could say that we’ve been abandoning POSIX pipes and Bourne shell as a means of composing modules out of loyalty to point 4 of the UNIX philosophy. We’re inventing ways to automate away the unskilled labour of constantly marshalling and un-marshalling structured data, and verifying that our APIs are mating up correctly.