I'm sympathetic to his viewpoint, but the Unix shell is a bit of a stretch unless your data entries are one line each. It's easy for shell scripts to become spaghetti code if you have data entries that are more complex or nested; in these cases you want a programming language that is more verbose, but also more readable. And if you have huge data sets, you need to think carefully about memory management.
In particle physics, where experimenters deal in petabytes of data often structured in nested trees, ROOT has become the standard. It takes the philosophy of early optimization and runs with it:
Pure C++, with RTTI and an interpreter tacked on.
In astronomy, where the data is more array-like, people use IRAF and IDL (or its quasi-clones, like SciPy).
I'd be curious to learn how the biologists deal with their gobs of data.
That is where Perl and Python (depending on your camp) come into play. However, the point of those tutorials is to show that basic exploratory data analysis (EDA) can be done from the command line.
Though stuff like that comes into its own if you're a sysadmin and need to quickly churn through log files. I love working in a CLI, but even I wouldn't advocate using Bash et al. for analysing large, complex datasets.
Edited 2012-12-04 00:57 UTC
To be fair, Unix log files are actually designed in a way that makes it easy to analyze them with awk or perl -- they are structured line-by-line. It's a wonderfully sensible convention, set to be discarded by the systemd folks.
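To illustrate why the one-entry-per-line convention is so convenient, here is a minimal Python sketch that counts log entries per program. The line layout ("Mon DD HH:MM:SS host program[pid]: message") is an assumption about a traditional syslog-style file, and the sample lines and the `count_by_program` helper are made up for illustration; real log formats vary.

```python
import re
from collections import Counter

# Assumed traditional syslog-style layout (not guaranteed for every system):
#   "Mon DD HH:MM:SS host program[pid]: message"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s(\S+)\s([^\s:\[]+)(?:\[\d+\])?:\s(.*)$"
)

def count_by_program(lines):
    """Count entries per program name; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts

# Hypothetical sample lines, invented for this example.
SAMPLE = [
    "Dec  4 00:57:01 myhost sshd[1234]: Accepted password for alice",
    "Dec  4 00:57:02 myhost cron[99]: (root) CMD (run-parts)",
    "Dec  4 00:58:10 myhost sshd[1240]: Failed password for bob",
]

if __name__ == "__main__":
    for program, n in count_by_program(SAMPLE).most_common():
        print(program, n)
```

Because each entry is a single line, the same function works unchanged on an open file object, which is iterated lazily and never has to fit in memory.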
Regarding Python, SciPy is a bunch of high-performance C(++) and Fortran libraries glued together by Python+NumPy. It's really becoming a viable substitute for IDL.
Ayyup:
https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97...
To be fair, they let you run your old logger in parallel -- all you have to do is change your tried-and-true tools to use their new super-duper interface:
http://www.freedesktop.org/wiki/Software/systemd/syslog
Are they not merciful?
More than that - not only are they not stopping you from running a traditional logger if you want one, they're providing fancy tools designed specifically for running complex queries over the binary log format.
In short, they're actually making it easier to parse logging data.
Why not the other way around? The systemd binary logging approach by default, and the ability to install a traditional text logger (i.e. a plugin) for those that want it?
Ok, not as horrible as I initially thought. syslog is a pretty shitty logging system, that I can agree with.
I would have liked to see the example use more structured data though and not a freeform "user blah logged in" message.
Don't see why this should be a feature of systemd though and not a standalone system. The arguments for forcing systemd on us are pretty bogus. Tightly integrated? Yeah, right. Good thing we don't have message passing technologies these days or something.
Also, what's with being so defensive about UUIDs? (I don't mind them, it's just fascinating how big a deal it seems to be)
As a big fan of Upstart (and daemontools/runit/etc) I think it's about time the abomination known as SysV init is abandoned (along with runlevels) and in that respect Systemd is a step forward. I kinda wish it didn't try to weasel in everywhere though. (A GNOME dependency? WTF?)
I've been pretty happy with OpenRC on Gentoo, though it does depend on /sbin/init :
https://en.wikipedia.org/wiki/OpenRC
Seems almost fitting... https://en.wikipedia.org/wiki/Poettering ;p
A mad sysadmin's dream company network, where everything runs some bare-bones variant of UNIX and the home partition is mounted in noexec mode?
Not sure if bash would still agree to run shell scripts in the latter case, though. And even if it spontaneously would, the mad sysadmin might well have patched it by hand so that it fails instead. After all, he can patch everything he wants since he never updates anything anyway.
Edited 2012-12-04 07:29 UTC
Agreed - at work, we develop on Linux, and deploy to AIX, Solaris, and HP-UX. And of those, AIX is decent enough, Solaris is a pain in the ass, and the less said about HP-UX the better.
It really depends on the lab and the people there. Where I work, we mostly deal with array data, which is large-but-not-insane: Many GB, but not (yet) the TB-filling unpleasantness of large-scale sequencing. We mostly do things in R, with some of the rougher filtering done in shell scripts and simple command line tools (I ended up writing a fast matrix transposer for tab-separated text files just because R was taking hours while it could be done in minutes with a bit of C).
Going by the tools we run into from other labs, it's roughly what you'd expect from a bunch of biologists allied with whatever IT people they've roped in: Some places really love Perl, some write exclusively in Java, a few tools are C++, and there's always the odd diva company that only really supports a GUI tool for Windows if you want to interpret their raw files. Some Python, though it feels a bit less popular than in comparable fields.
There is an R package for everything, though. The language itself is a bit weird, which is not surprising given the long and meandering development history, and it's the first language where I've looked at the OO system and decided I'd be better off ignoring it. (If you thought stereotypical OO-fetishist Java was obfuscated, you've never looked at an R S3 class). Still, it's by far the best language I've used for dealing with tabular data.
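For anyone wondering what transposing a tab-separated file involves, here is a minimal Python sketch. It is not the C tool mentioned above, just an illustration of the operation; the `transpose_tsv` name is hypothetical, and it assumes the table is rectangular and small enough to hold in memory, which is exactly the assumption that breaks down for the really large files.

```python
import csv
import io

def transpose_tsv(src, dst):
    """Read tab-separated rows from src and write the transposed table to dst.

    Assumes every row has the same number of fields; zip() would silently
    truncate to the shortest row otherwise.
    """
    rows = list(csv.reader(src, delimiter="\t"))
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for column in zip(*rows):
        writer.writerow(column)

if __name__ == "__main__":
    # Tiny made-up table, invented for this example.
    source = io.StringIO("gene\ts1\ts2\nA\t1\t2\nB\t3\t4\n")
    target = io.StringIO()
    transpose_tsv(source, target)
    print(target.getvalue(), end="")
```

A version for files that do not fit in memory would have to work in blocks or with file offsets instead of loading every row at once.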
Edited 2012-12-06 16:01 UTC




