Linked by Thom Holwerda on Mon 3rd Dec 2012 22:51 UTC
General Unix "Few tools are more indispensable to my work than Unix. Manipulating data into different formats, performing transformations, and conducting exploratory data analysis (EDA) is the lingua franca of data science.1 The coffers of Unix hold many simple tools, which by themselves are powerful, but when chained together facilitate complex data manipulations. Unix's use of functional composition eliminates much of the tedious boilerplate of I/0 and text parsing found in scripting languages. This design creates a simple and succinct interface for manipulating data and a foundation upon which custom tools can be built. Although languages like R and Python are invaluable for data analysis, I find Unix to be superior in many scenarios for quick and simple data cleaning, idea prototyping, and understanding data. This post is about how I use Unix for EDA."
Thread beginning with comment 544067
To view parent comment, click here.
To read all comments associated with this story, please click here.
RE[6]: But ....
by Hypnos on Tue 4th Dec 2012 05:07 UTC in reply to "RE[5]: But ...."
Hypnos
Member since:
2008-11-19

I can see how this is useful for corporate deployments with many, many machines. But it's overkill to incude by default for 99% of Linux users. It should at most be a plugin.

Reply Parent Score: 3

RE[7]: But ....
by Delgarde on Tue 4th Dec 2012 20:31 in reply to "RE[6]: But ...."
Delgarde Member since:
2008-08-19

I can see how this is useful for corporate deployments with many, many machines. But it's overkill to incude by default for 99% of Linux users. It should at most be a plugin.


Why not the other way around? The systemd binary logging approach by default, and the ability to install a traditional text logger (i.e a plugin) for those that want it?

Reply Parent Score: 3

RE[8]: But ....
by Hypnos on Wed 5th Dec 2012 00:36 in reply to "RE[7]: But ...."
Hypnos Member since:
2008-11-19

It's better software design for tools to be simpler by default, not more complex by default. This gives better maintainability and fork-ability, as well as fewer bugs and security problems.

Reply Parent Score: 3