
published by (Greg Sabino Mullane) on 2015-07-01 18:22:00 in the "postgres" category

Back in the old days, upgrading Postgres required doing a pg_dump and loading the resulting logical SQL into the new database. This could be a very slow, very painful process, requiring a lot of downtime. While there were other solutions (such as Bucardo) that allowed little (or even zero) downtime, setting them up was a large complex task. Enter the pg_upgrade program, which attempts to upgrade a cluster with minimal downtime. Just how fast is it? I grew tired of answering this question from clients with vague answers such as "it depends" and "really, really fast" and decided to generate some data for ballpark answers.

Spoiler: it's either about 3.5 times as fast as pg_dump, or insanely fast at a flat 15 seconds or so. Before going further, let's discuss the methodology used.

I used the venerable pgbench program to generate some sample tables and data, and then upgraded the resulting database, going from Postgres version 9.3 to 9.4. The pgbench program comes with Postgres, and simply requires an --initialize argument to create the test tables. There is also a --scale argument you can provide to increase the amount of initial data - each increment increases the number of rows in the largest table, pgbench_accounts, by one hundred thousand rows. Here are the scale runs I did, along with the number of rows and overall database size for each level:

Effect of --scale

--scale   Rows in pgbench_accounts   Database size
100       10,000,000                 1418 MB
150       15,000,000                 2123 MB
200       20,000,000                 2829 MB
250       25,000,000                 3535 MB
300       30,000,000                 4241 MB
350       35,000,000                 4947 MB
400       40,000,000                 5652 MB
450       45,000,000                 6358 MB
500       50,000,000                 7064 MB
550       55,000,000                 7770 MB
600       60,000,000                 8476 MB
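
The relationship between --scale and the table above is plain arithmetic. A minimal Python sketch, in which the function name is made up for illustration and the figure of 100,000 rows per scale unit comes from the text:

```python
ROWS_PER_SCALE_UNIT = 100000  # rows added to pgbench_accounts per unit of --scale


def pgbench_account_rows(scale):
    """Return the expected row count of pgbench_accounts for a given --scale."""
    return scale * ROWS_PER_SCALE_UNIT


# Database sizes in MB for two of the scale values measured in the table
measured_mb = {100: 1418, 600: 8476}

for scale, size_mb in sorted(measured_mb.items()):
    rows = pgbench_account_rows(scale)
    # Rough database bytes per row, including indexes and other overhead
    bytes_per_row = size_mb * 1024 * 1024 / float(rows)
    print("scale=%d rows=%d bytes_per_row=%.0f" % (scale, rows, bytes_per_row))
```

Both the row count and the database size grow linearly with --scale, which is why the timings later on are plotted against it.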

To test the speed of the pg_dump program, I used this simple command:

$ pg_dump postgres | psql postgres -q -p 5433 -f -

I did make one important optimization, which was to set fsync off on the target database (version 9.4). Although this setting should never be turned off in production (or anytime you cannot replace all your data), upgrades like this are an excellent time to disable fsync. Just make sure you flip it back on again right away! There are some other minor optimizations one could make (especially boosting maintenance_work_mem), but for the purposes of this test, I decided that the fsync change was enough.

For testing the speed of pg_upgrade, I used the following command:

$ pg_upgrade -b $BIN1 -B $BIN2 -d $DATA1 -D $DATA2 -P 5433

The speed difference can be understood because pg_dump rewrites the entire database, table by table, row by row, and then recreates all the indexes from scratch. The pg_upgrade program simply copies the data files, making the minimum changes needed to support the new version. Because of this, it will always be faster. How much faster depends on a lot of variables, e.g. the number and size of your indexes. The chart below shows a nice linear slope for both methods, yielding on average a 3.48x increase in speed for pg_upgrade versus pg_dump:

pg_dump versus pg_upgrade

--scale   Database size   pg_dump (seconds)   pg_upgrade (seconds)   Speedup
100       1.4 GB          210.0               74.7                   2.82
150       2.1 GB          305.0               79.4                   3.86
200       2.8 GB          457.6               122.2                  3.75
250       3.5 GB          636.1               172.1                  3.70
300       4.2 GB          832.2               215.1                  3.87
350       4.9 GB          1098.8              320.7                  3.43
400       5.7 GB          1172.7              361.4                  3.25
450       6.4 GB          1340.2              426.7                  3.15
500       7.1 GB          1509.6              476.3                  3.17
550       7.8 GB          1664.0              480.0                  3.47
600       8.5 GB          1927.0              607                    3.17

If you graph it out, you can see both of them having a similar slope, but with pg_upgrade as the clear winner:

I mentioned earlier that there were some other optimizations that could be done to make the pg_dump slightly faster. As it turns out, pg_upgrade can also be made faster. Absolutely, beautifully, insanely faster. All we have to do is add the --link argument. Rather than copying the data files, this option simply hard-links them via the filesystem. Thus, each large data file that makes up the majority of a database's size takes a fraction of a second to link to the new version. Here are the new numbers, generated simply by adding a --link to the pg_upgrade command from above:

pg_upgrade --link is crazy fast

--scale   Database size   pg_upgrade --link (seconds)
100       1.4 GB          12.9
150       2.1 GB          13.4
200       2.8 GB          13.5
250       3.5 GB          13.2
300       4.2 GB          13.6
350       4.9 GB          14.4
400       5.7 GB          13.1
450       6.4 GB          13.0
500       7.1 GB          13.2
550       7.8 GB          13.1
600       8.5 GB          12.9

No, those are not typos - an average of thirteen seconds regardless of the size of the database! The only downside to this method is that you cannot access the old system once the new system starts up, but that's a very small price to pay, as you can easily back up the old system first. There is no point in graphing these numbers out - just look at the graph above and imagine a nearly flat line traveling across the bottom of the graph :)
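
Why link mode barely cares about database size can be illustrated with a plain filesystem hard link. This is only a sketch of the underlying filesystem behavior, not of pg_upgrade's internals; the file names are hypothetical stand-ins for a data file in the old and new clusters:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "old_cluster_datafile")  # hypothetical name
new_path = os.path.join(workdir, "new_cluster_datafile")  # hypothetical name

# Write 1 MB of stand-in data to the "old" file
with open(old_path, "wb") as handle:
    handle.write(b"x" * 1024 * 1024)

# A hard link adds a second directory entry for the same data; nothing is copied
os.link(old_path, new_path)

same_file = os.path.samefile(old_path, new_path)
new_stat = os.stat(new_path)
print("same file:", same_file)
print("link count:", new_stat.st_nlink)

shutil.rmtree(workdir)  # clean up the temporary directory
```

Because both names point at the same data on disk, the cost of linking does not depend on the file size. It also explains the downside mentioned above: the old and new clusters share the files.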

Are there any other options that can affect the time? While pgbench has a handy --foreign-keys argument I often use to generate a more "realistic" test database, both pg_dump and pg_upgrade are unaffected by any number of foreign keys. One limitation of pg_upgrade is that it cannot change the --checksum attribute of a database. In other words, if you want to go from a non-checksummed version of Postgres to a checksummed version, you need to use pg_dump or some other method. On the plus side, my testing found negligible difference between upgrading a checksummed versus a non-checksummed version.

Another limitation of the pg_upgrade method is that all internal stats are blown away by the upgrade, so the database starts out in a completely unanalyzed state. This is not as much an issue as it used to be, as pg_upgrade will generate a script to regenerate these stats, using the handy --analyze-in-stages argument to vacuumdb. There are a few other minor limitations to pg_upgrade: read the documentation for a complete list. In the end, pg_upgrade is extraordinarily fast and should be your preferred method for upgrading. Here is a final chart showing the strengths and weaknesses of the major upgrade methods.

Postgres upgrade methods compared

pg_dump
  Strengths:
  • Always works
  • Battle tested
  Weaknesses:
  • Slowest method
  • Maximum downtime
  • Requires lots of disk space

pg_upgrade
  Strengths:
  • Very fast
  • --link mode super fast
  Weaknesses:
  • Cannot always be used (finicky)
  • Stats are lost
  • Minimal but non-zero downtime

Bucardo
  Strengths:
  • Handles complex cases
  • Zero-downtime possible
  Weaknesses:
  • Complex to set up
  • Requires primary keys on large tables
  • Requires lots of disk space

(As an addendum of sorts, pg_upgrade is fantastic, but the Holy Grail is still out of sight: true in-place upgrades. This would mean dropping in a new major version (similar to the way revisions can be dropped in now), and this new version would be able to read both old and new data file formats, doing an update-on-write as needed. Someday!)

published by (Muhammad Najmi Ahmad Zabidi) on 2015-07-01 06:52:00 in the "python" category
Recently I worked on a program which required me to filter hundreds of lines of blog titles. Throughout the assignment I stumbled upon a few interesting problems, some of which are outlined in the following paragraphs.

Non Roman characters issue

During the testing session I missed one title. Investigating why it happened, I found that it was simply because the title contained non-Roman characters.

Here is the code's snippet that I was previously using:

for e in results:
    if freqs.get(simple_author, 0) < 1:
        print parse(e['published']).strftime("%Y-%m-%d"), "--", simple_author, "--", e['title']

And here is the fixed version:

for e in results:
    if freqs.get(simple_author, 0) < 1:
        print parse(e['published']).strftime("%Y-%m-%d"), "--", simple_author, "--", e['title'].encode('UTF-8')

To fix the issue I faced, I added .encode('UTF-8') in order to encode the characters with the UTF-8 encoding. Here is an example title that would have been otherwise left out:

2014-11-18 -- Unknown -- Novo website do Liquid Galaxy em Português!

Python 2.7 uses ASCII as its default encoding, but in our case that wasn't sufficient for scraping web content, which often contains non-ASCII characters encoded as UTF-8. To be more precise, this program fetches an RSS feed in XML format and finds such characters in it. So when the initial Python code I wrote met them while using the default ASCII encoding, it was unable to encode them and raised an error.

Here is an example of the error it gave us when it met non-Roman characters while using the ASCII encoding:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 40: ordinal not in range(128)
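
The failure and the fix can be reproduced in isolation. A minimal sketch using the example title from above; the variable names are illustrative and are not taken from the original program:

```python
# The example title, with the e-circumflex written as an explicit escape
title = u"Novo website do Liquid Galaxy em Portugu\u00eas!"

# Encoding with the ASCII codec fails on the first non-ASCII character
try:
    title.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding with UTF-8 succeeds; the accented letter takes two bytes
utf8_bytes = title.encode("utf-8")

print(ascii_error)
print(len(title), len(utf8_bytes))
```

The exception records where the offending character sits, which is the "position 40" reported in the error message.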

Right and Left text alignment

In addition to the error previously mentioned, I also had the chance to dig into several ways of formatting output.
The following format is the one I used as the initial output format:

Name                                                     Age

Using the "ljust" and "rjust" methods

I want to improve the readability of the example above by left-justifying "Name" in a field of 30 characters and right-justifying "Age" in another field of 30 characters.

Let's try with the '*' fill character. The syntax is str.ljust(width[, fillchar])

Name**************************                           Age

And now let's add .rjust:


Using str.ljust, the first field spans 30 characters counted from the left, including the four characters of "Name". The second field spans another 30 characters, including the three letters of "Age", which gives us the desired output.
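
A minimal sketch of the ljust/rjust combination described above (the variable names are illustrative):

```python
# Left-justify "Name" in a 30-character field padded with '*',
# then right-justify "Age" in a second 30-character field padded with spaces.
name_field = "Name".ljust(30, "*")
age_field = "Age".rjust(30)
header = name_field + age_field

print(header)
print(len(header))
```

Omitting the second argument to ljust pads with spaces instead, which gives the plain "Name ... Age" line shown at the start of this section.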

Using "format" method

Alternatively, it is possible to get the same alignment with the format string method:

print("{!s:<{fill}}{!s:>{fill}}".format("Name", "Age",fill=30))
Name                                                     Age

And with the same progression, it is also possible to do something like:

print("{!s:*<{fill}}{!s:>{fill}}".format("Name", "Age",fill=30))
Name**************************                           Age
print("{!s:*<{fill}}{!s:#>{fill}}".format("Name", "Age",fill=30))

"format" also offers a way to center text. To put the desired string in the middle of the trail of fill characters, simply use the ^ (caret) character:

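A minimal sketch of centering with the caret, assuming the same 30-character width and '*' fill character as in the earlier examples:

```python
# '^' centers the value inside the field; '*' is the fill character.
centered = "{!s:*^{fill}}".format("Name", fill=30)

print(centered)
print(len(centered))
```

With a 30-character field and a four-character string, the 26 fill characters are split evenly on both sides.
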
For more detail, refer to the Python documentation on Unicode and on the "format" method.

published by (Jeff Boes) on 2015-06-26 13:30:00 in the "ajax" category

Perl POD is a handy, convenient, but low-tech approach to embedded documentation. Consider a web service in Dancer:

get '/time' => sub {
  return scalar(localtime());
};

(Disclaimer: my actual use-case of this technique was even more legacy: I was documenting Interchange Actionmaps that returned images, JSON, etc.)

Your application might have several, or even dozens of these, with various parameters, returning data in JSON or TXT or CSV or who-knows-what. I chose to document these in Perl POD (Plain Old Documentation) format, e.g.,


=head1 time

Retrieves the current time

=over 3

=item Parameters

=item Example

=begin html

=end html

=back

=cut


This block gets inserted right in-line with the web service code, so it's immediately obvious to anyone maintaining it (and thus has the best chance of being maintained if and when the code changes!). Now I can generate an HTML page directly from my Perl code:

$ pod2html

Your output looks something like this (excerpted for clarity):


Retrieves the current time




Where the magic comes in is the Javascript code that allows an in-line example, live and accurate, within the documentation page. You'll actually get something more like this:


Retrieves the current time



(results appear here)

Note that the code I have below is not factored by choice; I could move a lot of it out to a common routine, but for clarity I'm leaving it all in-line. I am breaking up the script into a few chunks for discussion, but you can and should construct it all into one file (in my example, "js/example-time.js").

/* example-time.js */
" + /* Note 1 */ "" + "" + "
" + "
" );

Note 1: This being a painfully simple example of a web service, there are no additional inputs. If you have some, you would add them to the HTML being assembled into the <form> tag, and then using jQuery, add them below to the url parameter, or into the data structure as required by your particular web service.

This step just inserts a simple <form> into the document. I chose to embed the form into the Javascript code, rather than the POD, because it reduces the clutter and separates the example from the web service.

    var $form = $('form[action="/time"]');
          'url': $form.attr('action') /* Note 1 also */,
          'data': {},
          'dataType': 'text',
          'async': false,

Here we have a submit handler that performs a very simple AJAX submit using the form's information, and upon success, inserts the results into a result <div> as a pre-formatted block. I added a "json" class which just tweaks the font and other typographic presentation a bit; you can provide your own if you wish.

I'm aware that there are various jQuery plug-ins that will handle AJAX-ifying a form, but I couldn't get the exact behavior I wanted on my first tries, so I bailed out and just constructed this approach.

                 $('#time-result').html('Error retrieving data!')
/* */

(That stray-looking comment above is just a work-around for the syntax highlighter.)

Error handling goes here. If you have something more comprehensive, such as examining the result for error codes or messages, this is where you'd put it.

      return false;

And just a bit of UI kindness: we have a "hide" button to make the example go away. Some of my actual examples ran to dozens of lines of JSON output, so I wanted a way to clean up after the example.

published by (Kannan Ponnusamy) on 2015-06-18 18:14:00 in the "ipython" category
Recently I have been working on Python automation scripts. Very often I use IPython to develop/debug the code.
IPython is an advanced interactive Python shell. It is a powerful tool with many features; here I would like to share some of its cool tricks.

Getting help

Typing object_name? will print all sorts of details about any object, including docstrings, function definition lines (for call arguments) and constructor details for classes.
In [1]: import datetime
In [2]: datetime.datetime?
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints or longs.
File:      /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
Type:      type

Magic commands


The %edit magic brings up an editor to type multiline code and then executes the resulting code.
In [3]: %edit
IPython will make a temporary file named: /var/folders/xh/2m0ydjs51qxd_3y2k7x50hjc0000gn/T/ipython_edit_jnVJ51/
In [3]: %edit -p
This will bring up the editor with the same data as the previous time it was used or saved (in the current session).

Run a script

The %run magic executes a script and prints the results.
In [12]: %run
Current date and time:  2015-06-18 16:10:34.444674
Or like this:  15-06-18-16-10
Week number of the year:  24
Weekday of the week:  4


The %debug magic activates the interactive debugger.
In [15]: %run
Current date and time:  2015-06-18 16:12:32.417691
Or like this: ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/kannan/playground/ in ()
      4 print "Current date and time: " ,
----> 5 print "Or like this: " ,datetime.datetime.strftime("%y-%m-%d-%H-%M")
      6 print "Week number of the year: ","%W")
      7 print "Weekday of the week: ","%w")

TypeError: descriptor 'strftime' requires a '' object but received a 'str'

In [16]: %debug
> /Users/kannan/playground/
      4 print "Current date and time: " ,
----> 5 print "Or like this: " ,datetime.datetime.strftime("%y-%m-%d-%H-%M")
      6 print "Week number of the year: ","%W")

I made an error on line 5; it should look like the line below. The %debug command took me into the Python debugger.
print "Or like this: " ,"%y-%m-%d-%H-%M")
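
The error comes from calling strftime on the datetime class itself instead of on an instance. A minimal sketch of the difference, assuming the script wants the current time (the use of now() here is an assumption, not taken from the original script):

```python
import datetime

fmt = "%y-%m-%d-%H-%M"

# Calling the method on the class passes the format string where an
# instance is expected, so Python raises a TypeError.
class_call_error = None
try:
    datetime.datetime.strftime(fmt)
except TypeError as exc:
    class_call_error = exc

# Calling it on an instance works as intended.
stamp = datetime.datetime.now().strftime(fmt)

print(class_call_error)
print("Or like this: ", stamp)
```

The exact wording of the TypeError differs between Python versions, but the cause is the same.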


The %save magic saves the specified lines to a given file. You can pass any number of arguments separated by spaces.
In [21]: %save 1-2 2-3
The following commands were written to file ``:
import datetime
%edit -p


The %recall magic repeats a command, or brings a command to the input line for editing.
In [28]: %recall 21

In [29]: import datetime


The %timeit magic times the execution of a Python statement or expression.
It can be a one-line or multiline statement. In a one-liner, we can pass multiple statements separated by semicolons.
In [33]: %timeit range(100)
1000000 loops, best of 3: 752 ns per loop
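
Outside of IPython, the standard library's timeit module gives a comparable measurement. A rough sketch; the loop count is an arbitrary choice and the timings will differ from machine to machine:

```python
import timeit

loops = 100000

# Time the same statement that was passed to %timeit above
total_seconds = timeit.timeit("range(100)", number=loops)
per_loop_ns = total_seconds / loops * 1e9

print("%d loops, %.0f ns per loop" % (loops, per_loop_ns))
```

Unlike %timeit, this sketch runs a fixed number of loops once rather than picking a loop count automatically and reporting the best of several runs.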

Shell Commands

Basic UNIX shell integration: you can run simple shell commands such as cp, ls, rm, etc. directly from the IPython command line.

To execute any other shell command, we just need to add '!' at the beginning of the command line. We can also assign the result of the system command to a Python variable for further use.
In [38]: list_of_files = !ls

In [39]: list_of_files


The %history magic prints the input history, with the most recent last.
In [41]: %history 20-22
import datetime
%history ~1/4 #Line 4, from last session
This will list the previous session history.


The %pastebin magic uploads the specified input commands to GitHub's Gist paste bin and displays the URL.
It uploads the code as an anonymous user.
In [43]: %pastebin -d "Date Example" 20-23
Out[43]: u''

For more info on this topic:

published by (Marina Lohova) on 2015-06-17 11:00:00 in the "database" category

If you need to dump the production database locally, Heroku has a nice set of tools to make this as smooth as humanly possible. In short, remember these two magic words: pg:pull and pg:push. This article details the process.

However, when I first tried it I had to resolve a few issues.

My first problem was:

pg:pull not found

To fix this:

1. Uninstall the 'heroku' gem with

gem uninstall heroku (Select 'All Versions')

2. Find your Ruby 'bin' path by running

gem env

3. Cd to the 'bin' folder.

4. Remove the Heroku executable with

rm heroku

5. Restart your shell (close Terminal tab and re-open)

6. Type

heroku version
You should now see something like:
heroku-toolbelt/2.33.1 (x86_64-darwin10.8.0) ruby/1.9.3

Now you can proceed with the transfer:

1. Type

heroku config --app production-app

Note the DATABASE_URL. For example, let's imagine that the production database URL is HEROKU_POSTGRESQL_KANYE_URL, and the staging database URL is HEROKU_POSTGRESQL_NORTH_URL.

2. Run

heroku pg:pull HEROKU_POSTGRESQL_KANYE rtwtransferdb --app production-app
heroku config --app staging-app
heroku pg:push rtwtransferdb HEROKU_POSTGRESQL_NORTH --app staging-app

This is when I hit the second problem:

database is not empty

I fixed it by doing:


Happy database dumping!

published by (Greg Davidson) on 2015-06-11 12:14:00 in the "html5" category

Debugging Broken Maps

A few weeks ago I had to troubleshoot some Google Maps related code that had suddenly stopped working. Some debugging revealed the issue: the code adding markers to the page was attempting to access properties that did not exist. This seemed odd because the latitude and longitude values were the result of a geocoding request which was completing successfully. The other thing which stood out to me was the property names themselves:

var myLoc = new google.maps.LatLng(results[0].geometry.location.k, results[0].geometry.location.D);

It looked like the original author had inspected the geocoded response, found the 'k' and 'D' properties which held latitude and longitude values and used them in their maps code. This had all been working fine until Google released a new version of their JavaScript API. Sites that did not specify a particular version of the API were upgraded to the new version automatically. If you have Google Maps code which stopped working recently this might be the reason why.

The Solution: Use the built-in methods in the LatLng class


I recalled there being some helper methods for LatLng objects and confirmed this with a visit to the docs for the LatLng class which had recently been updated and given the Material design treatment — thanks Google! The lat() and lng() methods were what I needed and updating the code with them fixed the issue. The fixed code was similar to this:

var myLoc = new google.maps.LatLng(results[0].geometry.location.lat(), results[0].geometry.location.lng());

Digging Deeper

I was curious about this so I mapped out the differences between the three latest versions of the API:

API Version              Latitude Property   Longitude Property   Constructor Name
3.21.x (experimental)    A                   F                    rf
3.20.x (release)         A                   F                    pf
3.19.x (frozen)          k                   D                    pf

It seems to me that the property name changes are a result of running the Google Maps API code through the Closure Compiler. Make sure to use the built-in lat() and lng() methods as these property names are very likely to change again in future!

published by (Zdenek Maxa) on 2015-06-09 19:40:00 in the "Chef" category

This post describes some of our experiences at End Point in designing and working on comprehensive QA/CI facilities for a new system which is closely related to the Liquid Galaxy.

Due to the design of the system, the full deployment cycle can be rather lengthy and presents us with extra reasons for investing heavily in unit test development. Because of the very active ongoing development on the system we benefit greatly from running the tests in an automated fashion on the Jenkins CI (Continuous Integration) server.

Our Project's CI Anatomy

Our Jenkins CI service defines 10+ job types (a.k.a. Jenkins projects) that cover our system. These job types differ as far as source code branches are concerned, as well as by combinations of the types of target environments the project builds are executed on.

The skeleton of a Jenkins project is what one finds under the Configure section on the Jenkins service webpage. The source code repository and branch are defined here. Each of our Jenkins projects also fetches a few more source code repositories during the build pre-execution phase. The environment variables are defined in a flat text file:

Another configuration file is in the JSON format and defines variables for the test suite itself. Furthermore, we have a preparation phase bash script and then a second bash script which eventually executes the test suite. Factoring out all degrees of freedom into two pairs of externally managed (by Chef) concise files allows for pure and simple Jenkins job build definition:

It's well possible to have all variables and content of the bash scripts laid out directly in the corresponding text fields in the Jenkins configuration. We used to have that. It's actually a terrible practice, and the above desire for purity comes from the tedious and clumsy experience that changing a variable (e.g. a URL or such) in 10+ job types involves an unbearable amount of mouse clicking through the Jenkins service webpage. When performing some level of debugging of the CI environment (like when setting up the ROS stack which the project depends on), one is in for a repetitive strain injury.

In essence, keeping knowledge about job types on the Jenkins server itself at a minimum and having it managed externally serves us well and is efficient. Another step forward would be managing everything (the entire job type definition) by Chef. We have yet to experiment with the already existing Chef community cookbooks for Jenkins.

The tests themselves are implemented in Python using the pytest unit testing framework. The test cases depend on Selenium - the web automation framework. Python drives the browser through Selenium according to testing scenarios, sometimes rather complex. The Selenium framework provides handles by which the browser is controlled - this includes user data input, clicking buttons, etc.

We use Selenium in two modes:
local mode: Selenium drives a browser running locally on the Jenkins CI machine itself. The browser runs in the Xvfb environment. In this case everything runs on the Jenkins master machine.
remote mode: the remote driver connects to a browser running on a remote machine (node A, B) and drives the browser there, as described in the diagram below. The test cases are run on the Jenkins slave machine located on a private network. The only difference between browser A and B is that they load different Chrome extensions.

The usual unit testing assertions are made on the state or values of HTML elements in the web page.

Custom dashboard

Our Jenkins server runs builds of 10+ various job types. The builds of each type are executed periodically and are also triggered by git pushes as well as by git pull requests. As a result, we get a significant number of builds on a daily basis.

While Jenkins CI is extensible with very many plugins available out there, enabling and configuring a plugin gets cumbersome as the number of job types to configure rises. This is just to explain my personal aversion to experimenting with plugins on Jenkins for our project.

The Jenkins service webpage itself does not offer creating a simple aggregated view across a number of job types to allow for a simple, concise, single page view. Natively, there is just the single job type trends $JOB_URL/buildTimeTrend page (see below).

A view which immediately tells whether there is an infrastructure problem (such as loss of connectivity) or conveys straight away that everything passes on Jenkins seems to be missing. Such a view or feature is even more important in an environment suffering from occasional transient issues. Basically, we wanted a combination of JENKINS/Dashboard+View and JENKINS/Project+Statistics+Plugin, yet a lot simpler (see below).
So yes, we coded up our own wheel, circular just according to our liking and thus developed the jenkins-watcher application.


The application is freely available from this repository, deploys on the Google App Engine platform and so utilizes certain platform features like Datastore, Cron jobs, TaskQueue and Access Control. A single configuration file contains mainly Jenkins CI server access credentials and job type names we are interested in. The above repository merely provides a template of this (secret) config file. AngularJS is used on the frontend and a smashing Jenkins API Python library is used to communicate from Python to the Jenkins CI server through its REST API. See below the result view it provides, the screenshot is cropped to show only 5 job types and their builds within the last 24 hours:

Colour coding in green (passed), red (failed) and grey (aborted) shows a build status and is in fact just standard Jenkins colour coding. Each table row corresponds to 1 build and shows the build ID, build timestamp (start of the build), build duration, and the number of test cases which passed (P), failed (F), were skipped (S), or suffered from errors (E). The last item in the row is a direct link to the build console output, very handy for immediate inspection. In my experience, this is enough for a Jenkins babysitter's swift daily checks. This is nothing fancy: no cool stats, graphs or plots. It is just a brief, useful overview.

The application also performs periodic checks and aborts builds which take too long (yes, a Jenkins plugin with this functionality exists as well).

For example, at a glance it's obvious that the following failed builds suffer from some kind of transient infrastructure problem: no tests were run, nothing failed, and the builds were marked as failures since some command in either their prep or build scripts failed:
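
That way of reading the dashboard can be expressed as a small rule. This is only a sketch with made-up field names and sample data, not code from jenkins-watcher:

```python
def looks_like_infrastructure_failure(build):
    """A failed build in which no test case ran points at the environment, not the tests."""
    tests_run = build["passed"] + build["failed"] + build["skipped"] + build["errors"]
    return build["status"] == "FAILURE" and tests_run == 0


# Hypothetical builds with the P/F/S/E counters shown on the dashboard
builds = [
    {"id": 101, "status": "SUCCESS", "passed": 42, "failed": 0, "skipped": 1, "errors": 0},
    {"id": 102, "status": "FAILURE", "passed": 0, "failed": 0, "skipped": 0, "errors": 0},
    {"id": 103, "status": "FAILURE", "passed": 40, "failed": 2, "skipped": 1, "errors": 0},
]

suspects = [b["id"] for b in builds if looks_like_infrastructure_failure(b)]
print(suspects)
```

Build 103 also failed, but since its tests actually ran, it is more likely a genuine test failure than an infrastructure hiccup.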

Or let's take a look at another situation proving how simple visualisation can sometimes be very useful and immediately provide hints. We observed a test case, interestingly on just one particular job type, which sometimes ended up with a "Connection refused" error between the Selenium driver and the web browser (in the remote mode):

Only after seeing the failures visualized, the pattern struck us. We immediately got an idea that something is rotten in the state of Denmark shortly after midnight: from that point on, the previously mysterious issue boiled down to an erroneous cronjob command. The killall command was killing everything and not just what it was supposed to (bug filed here):

killall --older-than 2h -r chromedriver

Once we fixed the cronjob with a more complex but functional solution, without the killall command this time, so that the builds no longer had the chromedriver blanket pulled from under them while running, the mysterious error disappeared.

Summary, conclusion

Jenkins CI proved in general very useful for our Portal project. Keeping its configuration minimal and handling it externally worked most efficiently. The custom jenkins-watcher application provides a useful, aggregated, dashboard-like view. It is very easily configurable and not in any way dependent on the base project - take it for free, configure a bit and push as your own Google App Engine project. The visualisation can sometimes be a useful debugging tool.

published by Eugenia on 2015-06-07 20:59:47 in the "Entertainment" category
Eugenia Loli-Queru

Sense8 is a Netflix production, originally developed by the people behind “The Matrix” and “Babylon 5”. When I saw the trailer a few weeks ago, I was so stoked about it: DMT, oneness, spiritual and philosophical discussions… Are you kidding me? This would be so cool! But now that I’ve seen all 12 episodes, I’m not as stoked anymore. Here’s a list of what went wrong:

1. No mystery. While the show has 4 more seasons (if they don’t get cancelled) to explain more things, its mysteries aren’t holding together well. They could have gone instead for a full episode per sensate (8 + 4 more exhilarating episodes at the end glueing together the story). Let the sensates and the overall story unveil in a way that is more interesting (not slow, but in a more brainf*ck way), rather than laying out the stories block by block and only having a central mystery to solve at the end of the season (in this case, Whispers and the company behind him). LOST worked because it knew how to build anticipation and thrills by twisting the way it informed the viewer about what is what. Sense8 doesn’t. Sense8 is very traditional in its storytelling instead, even if it likes to think otherwise of itself.

2. Boring, cheesy drama. Especially the ones set in India & Mexico. Very little action (except in the last 1-2 episodes), which is not fit for sci-fi. There have been at least 5-7 extremely cheesy scenes in the season too. For some reason, I cringed in much the same way I did for the Star Wars prequels.

3. Unneeded sex scenes. Even on Game of Thrones, sex usually serves plot threading or character development, rather than filling up time. On Sense8, there was just too much of it, because we have already seen the same lovers having sex over and over again (we get it, they have sex daily, good for them!). I loved the trans story, but the gay one had way too many cheesy scenes in it for me to take it seriously. It felt that the whole series was revolving around the trans & gay sex scenes, rather than these sex scenes being simply part of the story. For the record, I would complain just as much if it was as much hetero-sex from the same lovers over and over too. My complaint is not the gay/trans sex, it’s just the fact that we see the SAME lovers doing the same thing all the time, which is something that doesn’t serve the story and the plot. The only time I felt that the sex scene was excellent AND very much needed by the plot (because it **explains** what sex can be for a sensate) was the sensate orgy scene (3 men + 1 trans woman). This scene needed to be there because it’s the only way we can understand the limitlessness of being a sensate. It was rightfully part of the plot! But seeing the other two sets of lovers making out in each and every episode was unneeded, too much, and ended up being cheesy.

4. Single-dimensional characters. This is mostly because of how the series was set up (everyone sharing screen time with the others, which doesn’t leave much time for development).

5. The language. JMS explained on his Facebook page why they decided to use English as the language set in other countries (see: that’s how Hollywood does it traditionally), but this just doesn’t work today. If anything, it makes the series less interesting, less mysterious even. It levels the playing field in a way that takes realism away.

So my verdict is that this is another FlashForward (remember that show, from ABC?). Great ideas, bad implementation.

published by Eugenia on 2015-06-07 19:34:13 in the "General" category
Eugenia Loli-Queru

I wish some people stopped asking others to “not drink bottled spring water”. While bottling is indeed harmful for a variety of reasons, my health would be harmed even more if I was to drink that fluoridated, chlorinated, DISGUSTING tap water. So, no, I will not stop drinking spring water (bottled or otherwise), and I won’t stop buying wild-only fish (another such thing the same kind of people ask).

Farmed fish has a worse fate than farmed mammals because it’s treated worse: it’s fed food that it didn’t evolve with at all, like soy and beef bone meal. At least the farmed mammals, while miserable in these nightmare farms, eat food that resembles the food they evolved with.

So if you want me to drink tap water, the quality must become as high as spring water. And if you want me to eat farmed fish, then I need the nutritional composition, health of the fish, and feed, to be the same as in the wild. For example, farmed salmon is advised to be eaten *only* once a month, because even the government agrees that it’s a sick fish. True wild salmon, meanwhile, can be eaten daily, without any consequences (only benefits).

So, fix all that, and then we’ll talk. But under no circumstances would I put my own health into jeopardy just to be among the few who fight for environmental causes that never have any true impact. I have been very sick for 38 years, so now, well, now, it’s my turn to be healthy. Having spent 10 of these 38 years in a nightmare health situation, I now owe it to myself to get the best damn water/food I can. Even if it’s detrimental to the rest of the environment. At some point, being selfish only means self-preservation, and not necessarily arrogance.

So get off my face with your “bottled water is evil” shiz. I don’t care.

published by (Greg Sabino Mullane) on 2015-06-06 23:13:00 in the "mediawiki" category

Being able to create a quick copy of your MediaWiki site is an important skill that has many benefits. Any time you are testing an upgrade, whether major or minor, it is great to be able to perform the upgrade on a test site first. Tracking down bugs becomes a lot easier when you can add all the debugging statements you need and not worry about affecting any of the users of your wiki. Creating and modifying extensions also goes a lot smoother when you can work with an identical copy of your production wiki. I will outline the steps I use to create such a copy, also known as a "test wiki".

Before creating a copy, there are two things that should be done to an existing MediaWiki installation: use git, and move the images directory. By "use git", I mean to put your existing mediawiki directory (e.g. where your LocalSettings.php file lives) into version control. Because the MediaWiki software is not that large, it is simplest to just add nearly everything into git, with the exception of the images and the cache information. Here is a recipe to do just that:

$ cd /var/www/mediawiki
$ git init .
Initialized empty Git repository in /var/www/mediawiki/.git/
$ echo /cache/ >> .gitignore
$ echo /images/ >> .gitignore
$ git add --force .
$ git commit -a -m "Initial MediaWiki commit, version 1.24"
[master (root-commit) bd7db2b] Initial MediaWiki commit, version 1.24
 10024 files changed, 1910576 insertions(+)
 create mode 100644 .gitignore

Replace that commit message with your specific version, of course, or whatever you like, although I highly recommend your git commits always mention the version on major changes.

The second thing that should be done is to move the images directory and use a symlink to the new location. The "images" directory in MediaWiki is special in many ways. It is the only directory (except 'cache') directly written by MediaWiki (all other changes are stored in the database, not on disk). It is the only directory that comes pre-populated in the MediaWiki tarballs and is always a pain to upgrade. Finally, it invariably contains a large collection of static files that are not well suited for version control, and are usually better backed up and stored in ways different from the rest of MediaWiki. For all these reasons, I recommend making images into a symlink. The simplest recipe is to just move the images directory "up a level". This will also help us below when cloning the wiki.

$ cd /var/www/mediawiki
$ mv images ..
$ ln -s ../images .

Now that those two important prerequisites are out of the way, let's get a quick overview of the steps to create a clone of your wiki:

  • Make a backup of your existing wiki (files and database)
  • Make a copy of your database
  • Create a new directory, and copy the mediawiki files into it
  • Create a new git branch
  • Adjust the LocalSettings.php file
  • Mark it clearly as a test wiki
  • Do a git commit
  • Adjust your web server

The first step is to make a backup of your existing wiki. You can never have too many backups, and right before you go copying a lot of files is a great time to create one. Before backing up, make sure everything is up to date in git with "git status". Make a backup of the mediawiki directory, for example with tar, making sure the resulting backup file is well labeled:

$ tar cfz /backups/mediawiki.backup.20150601.tar.gz --exclude=mediawiki/cache --anchored mediawiki/

If your images directory is somewhere else, make sure you back that up as well. Backing up your database is dead simple if you are using Postgres:

$ pg_dump wikidb | gzip --fast > /backups/

The next step is to create a new copy of the database for your cloned wiki to access:

$ dropdb test_wikidb
$ createdb -T wikidb test_wikidb
$ psql test_wikidb -c 'alter database test_wikidb owner to wikiuser'

Now we want to create a new directory for the cloned wiki, and populate it with files from the production wiki. For this example, the existing production wiki lives in /var/www/mediawiki, and the new cloned test wiki will live in /var/www/test_mediawiki.

$ cd /var/www
$ mkdir test_mediawiki
$ rsync -a -W --exclude=/images/ mediawiki/ test_mediawiki/
## rsync will copy symlinks as well - such as the images directory!

I like to create a new git branch right away, to avoid any confusion with the "actual" git repository in the production mediawiki directory. If you do end up making any changes in the test directory, it's easy enough to roll them into the other git repo. Branch names should be short and clearly indicate why you have created this copy of the wiki. Doing this means the name shows up as the first line whenever you do a "git status", which is nice.

$ cd /var/www/test_mediawiki
$ git checkout -b testing_version_1.25.2
Switched to a new branch 'testing_version_1.25.2'

The next step is critical: editing the LocalSettings.php file! As this was copied from the production wiki, we need to make sure it points back to itself via paths, and that it connects to our newly created database. Add all these to the bottom of your test_mediawiki/LocalSettings.php file:

## Change important paths:
$wgArticlePath          = '/testwiki/$1';
$wgScriptPath           = '/test_mediawiki';
## Point to the correct database:
$wgDBname               = 'test_wikidb';
## The logo may be hardcoded, so:
$wgLogo                 = '/test_mediawiki/End_Point_logo.png';
## Disable all email notifications:
$wgUsersNotifiedOnAllChanges = array();

It's also a good idea to make this wiki read-only until needed. Also important if you symlinked the images directory is to disallow any uploads. If you need to enable uploads, and thus writes to the images directory, make sure you remove the symlink and create a new images directory! You can either copy all the files from the production wiki, or simply leave it empty and expect to see a lot of "missing file" errors, which can safely be ignored.

$wgReadOnly       = 'Test wiki: upgrading to MediaWiki 1.25.2';
$wgEnableUploads  = false;

The $wgReadOnly message will appear when people try to edit a page, but we want to make it very visible to all users, as soon as they see the wiki, that "here be Danger" (and edits will be lost). To that end, there are four additional steps you can take. First, you can set a sitewide message. This will appear near the top of every page. You can add HTML to this, and it is set in your LocalSettings.php file as $wgSiteNotice. You can also change the $wgSitename parameter, which will appear in the title of every page.

$wgSiteNotice  = '<strong>TEST WIKI ONLY!</strong>';
$wgSitename    = 'TEST WIKI';

The third additional step is to change the CSS of every page. I use this to slightly change the background color of every page. This requires that the $wgUseSiteCss setting is enabled. It is on by default, but there is no harm setting it to true explicitly. Getting it to work on all pages, including the login page, requires enabling $wgAllowSiteCSSOnRestrictedPages as well.

$wgUseSiteCss                     = true;
$wgAllowSiteCSSOnRestrictedPages  = true;

Once the above is done, navigate to MediaWiki:Common.css and add the text below. Note that you may need to wait until the "making the wiki active" step below, and temporarily comment out the $wgReadOnly variable so the page can be edited.

* { background-color: #ddeeff !important }

The last method to mark the wiki as test only is to change the wiki logo. You can replace it with a custom image, or you can modify the existing logo. I like the latter approach. Annotating text is easy from the command line by using ImageMagick. Use the "polaroid" feature to give it a nice effect (use "-polaroid 0" to avoid the neat little rotation). The command:

$ convert End_Point.logo.png -caption "TEST WIKI ONLY!" -gravity center -polaroid 20 End_Point.tilted.testonly.png

At this point, all of the changes to the test wiki are complete, so you should commit all your changes:

$ git commit -a -m "Changes for the test wiki"

The final step is to make your test wiki active by adjusting your web server. Generally this is easy and basically means copying the existing wiki parameters. For Apache, it can be as simple as adding a new Alias directive to your httpd.conf file:

Alias /testwiki /var/www/test_mediawiki/index.php

Reload the web server, and Bob's your uncle. You now have a fully functional, safely sandboxed, magnificently marked-up copy of your production wiki. The above may seem like a lot of work, but this was an overly-detailed post - the actual work only takes around 10 minutes (or much less if you script it!)
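
As a starting point for scripting it, here is a minimal sketch of the database and file steps above. It assumes the same paths, database names, and branch name used in this post, and it only prints the commands unless DRY_RUN=0 is set. The script layout, the variable names, and the run and clone_wiki helpers are made up for illustration. It also assumes a dropdb that supports --if-exists (Postgres 9.2 or later) and a git that supports -C (1.8.5 or later). The backup, the LocalSettings.php edits, and the web server change are deliberately left as manual steps.

```shell
#!/bin/sh
# Sketch only: clone the production wiki into a test copy.
# Every default below mirrors an example value from this post.
PROD_DIR=${PROD_DIR:-/var/www/mediawiki}
TEST_DIR=${TEST_DIR:-/var/www/test_mediawiki}
PROD_DB=${PROD_DB:-wikidb}
TEST_DB=${TEST_DB:-test_wikidb}
DB_OWNER=${DB_OWNER:-wikiuser}
BRANCH=${BRANCH:-testing_version_1.25.2}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clone_wiki() {
    # Fresh copy of the database, owned by the wiki user
    run dropdb --if-exists "$TEST_DB"
    run createdb -T "$PROD_DB" "$TEST_DB"
    run psql "$TEST_DB" -c "alter database $TEST_DB owner to $DB_OWNER"
    # Copy the files (the images symlink comes along), then branch
    run mkdir -p "$TEST_DIR"
    run rsync -a -W --exclude=/images/ "$PROD_DIR/" "$TEST_DIR/"
    run git -C "$TEST_DIR" checkout -b "$BRANCH"
}

clone_wiki
```

Running it as-is shows the six commands it would execute (the dry-run output drops the quoting around the SQL statement); rerun with DRY_RUN=0 once the list looks right.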

published by (Josh Williams) on 2015-06-05 02:14:00 in the "benchmarks" category

Back in April, we published a benchmark report on a number of NoSQL databases including Cassandra, MongoDB, HBase, and Couchbase. We endeavored to keep things fair and configured as identically as possible between the database engines. But a short while later, DataStax caught two incorrect configuration items, in Cassandra and HBase, and contacted us to verify the problem. Even with the effort we put into keeping everything even, a couple of erroneous parameters slipped through the cracks! I'll save the interesting technical details for another post coming soon, but once that was confirmed we jumped back in and started work on getting corrected results.

With the configuration fixed, we re-ran a full suite of tests for both Cassandra and HBase. The updated results have been published in a revised report that you can download in PDF format from the DataStax website (or see the overview link).

The revised results still show Cassandra leading MongoDB, HBase, and Couchbase in the various YCSB tests.

For clarity the paper also includes a few additional configuration details that weren't in the original report. We regret any confusion caused by the prior report, and worked as quickly as possible to correct the data. Feel free to get in contact if you have any questions.

published by Eugenia on 2015-06-04 15:16:56 in the "General" category
Eugenia Loli-Queru

Some say that smoothies aren’t Paleo, but I don’t agree with that sentiment. I would instead say that not ALL smoothies are Paleo, for example, the ones with just exotic fruits in them. I would also argue that it’s all fruit juices that aren’t Paleo, because they lack the fiber that stops fructose from wreaking havoc in the body. But smoothies retain all fiber!

Yes, smoothies have more carbs than most Paleo meals (usually up to 30 gr net carbs, in the versions I make), but if you eat 80-100 gr net carbs per day overall (as I do; I will never go Paleo-keto again, I made that mistake once and I lost my hair), this fits perfectly into that diet regimen. Heck, I’m still low carb!

Smoothies are important in my diet for other reasons too: I get to add some powders that are not palatable otherwise (e.g. exotic berries that I can’t find in my grocery store, added fiber via psyllium husk, home-made goat kefir, goat whey, ginger & turmeric for extra health, ceylon cinnamon for extra blood glucose control), but most importantly, I pack my smoothies with GREENS, and often, other veggies too (e.g. raw carrots, beets). So basically, I get to eat more raw veggies this way!

If some fat is required at breakfast for satiety/leptin/cortisol reasons, one can add a tablespoon of nut or seed butter to it too (except peanut butter, which is a legume and not a nut)! And sure, have an egg on the side too (why only have a smoothie?).

Having said all that, I would argue against the “Paleoification” of baked goods. These are not Paleo, even if they might be using coconut/almond flour and honey (which are Paleo ingredients on their own). When these nuts are flour-ed they become acellular (which is not so good to consume them often), when they’re baked they oxidize, and finally, honey loses ALL its medicinal properties when heated. So, for desserts, I would suggest people make RAW desserts (with the exception of adding some warm-ish grass-fed gelatin if the recipe asks for it), and even then, only ONCE a week, as a treat. Let green smoothies, or plain fruit, be your daily dessert otherwise.

published by Eugenia on 2015-06-03 23:38:14 in the "General" category
Eugenia Loli-Queru

To make an educated guess or decision, you need to first have the educational part checked out. Unfortunately, for most matters, people don’t have the time or the interest to get “educated”, so they end up making bad decisions.

This is true for the subject of nutrition too. No matter the amount of “alternative blogging” and instagraming about how healthy Paleo is, the majority of people won’t follow it unless the government tells them so clearly.

This has worked with smoking. A lot of measures were taken against smoking in the last 40 years, however, one measure that is always omnipresent is that message of “Tobacco severely damages health”, on each and every cigarette product. It has worked for most!

What if we had something similar for all packaged food?

What if, we had an indicator score about the nutrition and anti-nutrition present in the said packaged food? For example, given that we know that wheat bread has many phytates, lectins, and other antinutrients going on, along with its capacity to bind into certain nutrients and neutralize them, it could get a score of, let’s say, 30 out of 100.

But kale or blueberries, having no major antinutrients to speak about, but instead having many nutrients, antioxidants, polyphenols etc etc, would get a score of, let’s say, 80 out of 100.

Yes, this would require re-sending all these products onto labs for full measurement on much more than just the 4 basic vitamins found on each label today (CoQ10, PQQ, iodine etc anyone?), but I’m confident that such a nutritional index could have such an impact.

I’m an artist, and it’s my job to know what makes people tick visually. Such a nutritional index could have huge implications in the decision of an individual to buy or not candy, donuts, bread, or cakes. When they can QUANTIFY how bad they’re doing diet-wise, then they have to ACT.

But right now, things are too abstract for them: eat this but not that, eat vegan not paleo, eat paleo not vegan, etc etc. There’s too much information flying from all over the place, so much, that most people simply prefer to shut out their ears and just ignore the whole thing.

But when they see an easy to understand number (without even knowing all the details behind it), clearly labeling the nutritional value of a product, they won’t be able to ignore it anymore.

published by Eugenia on 2015-06-03 14:44:39 in the "General" category
Eugenia Loli-Queru

What do you know? I’m actually a Pegan (or as I call it, the Chris Kresser’s version of Paleo: Paleo +fermented dairy +some specific soaked beans +rarely some rice). Basically, in my updated Paleo diet regimen, there’s little red meat, due to it being loaded with sialic acid (which creates inflammation). As long as someone doesn’t have Neu5gc antibodies or Hashimoto’s, eating medium amounts of mammalian meat is not hurtful. But if you’re inflamed for no good reason (as I am), then taking out or reducing mammalian meat is probably the logical thing to do (poultry and seafood don’t exhibit much Neu5gc).

So here’s how I’ve decided a few weeks ago to go about it:
– 1 meal a week (probably Sunday lunch): mammalian pastured meat/offal.
– 1 meal a week (Wednesday dinner): organic poultry (I’d eat it more if it wasn’t so loaded with omega-6 here in the US).
– 2 meals a week (dinners): Wild fish.
– 2 meals a week (dinners): Shellfish (farmed ok).
– 1 day a week (3 meals on Monday): totally Vegan (detox).
– For the remaining 12 meals in the week, I’m Vegetarian.

Plus, even just up to 40 years ago, my ancestors didn’t use to eat too much meat (they’d eat red meat 4-5 times a year only, poultry once a month, some fish from the nearby river occasionally), so I think this is what makes sense for me. My grandma lived such a life, and the first 20 years of my parents’ lives were like that too (even if they were goat & sheep herders!). Taking into account that I come from steep, mountainous terrain where life hadn’t changed much for thousands of years prior to the 1970s (when electricity finally came about), it’s safe to assume that most of my ancestors ate that way too (soaked beans were a staple). So I believe it’s detrimental to my health to gorge on meat so much.

published by (Greg Sabino Mullane) on 2015-05-28 16:43:00 in the "nmap" category

The wonderful tail_n_mail program continues to provide me with new mysteries from our Postgres clients. One of the main functions it provides is to send an immediate email to us when an unexpected FATAL (or ERROR or PANIC) message appears in the Postgres logs. While these are often simple application errors, or deeper problems such as running out of disk space, once in a blue moon you see something completely unexpected. Some time ago, I saw a bunch of these messages appear in a tail_n_mail email:

[1] From files A to B Count: 2
First: [A] 2015-12-01T06:30:00 server1 postgres[1948]
Last:  [B] 2015-12-01T06:30:00 server2 postgres[29107]
FATAL: unsupported frontend protocol 65363.19778: server supports 1.0 to 3.0

I knew what caused this error in general, but decided to get to the bottom of the problem. Before we go into the specific error, let's review what causes this particular message to appear. When a Postgres client (such as psql or DBD::Pg) connects to Postgres, the first thing it does is to issue a startup message. One of the things included in this request is the version of the Postgres protocol the client wishes to use. Since 2003, Postgres servers have been using version 3.0. It is very rare to see a client or server that uses anything else. Because this protocol number request occurs at the very start of the connection request, non-Postgres programs often trigger this error, because the server is expecting a number at the start of the request.

We can verify this by use of a small Perl script that connects to the server, and sends an invalid protocol request:

#!/usr/bin/env perl

use strict;
use warnings;
use IO::Socket;

my $server = IO::Socket::UNIX->new('/tmp/.s.PGSQL.5432')
  or die "Could not connect!: $@";

my $packet = pack('nn', 1234,56789) . "userpg";
$packet = pack('N', length($packet) + 4). $packet;
$server->send($packet, 0);

After running the above program, a new error pops up in the Postgres logs as expected:

$ tail -1 /var/lib/pgsql/data/pg_log/postgres-2015-05-20.log
2015-05-21 12:00:00 EDT [unknown]@[unknown] [10281] FATAL:  unsupported frontend protocol 1234.56789: server supports 1.0 to 3.0

There is our error, as expected. The "unknown"s are because my log_line_prefix looks like this: %t %u@%d [%p] . While the timestamp (%t) and the process ID (%p) are easily filled in, the login failed, so both the user (%u) and database (%d) are still unknown.

Now on to our specific error, which you will recall is "unsupported frontend protocol 65363.19778". The above program shows that the protocol number is sent in a specific format. Let's use Perl to display the numbers 65363.19778 and see if there are any clues buried within it:

$ perl -e 'print pack "nn", 65363,19778'

Some sort of unprintable character in there; let's take a deeper look just for completeness:

$ perl -e 'print pack "nn", 65363,19778' | hd
00000000  ff 53 4d 42                                       |.SMB|

Aha! SMB is not just a random placement of three letters, it is a big clue as to what is causing this message. SMB stands for Server Message Block, and is used by a variety of things. We can guess that this is either some program randomly hitting the Postgres port without realizing what it is, or some sort of purposeful port scanner. Why would something want to connect to the port but not log in? For one, you can determine the version of Postgres without logging in.
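
The same check can be run in the other direction. Below is a minimal sketch, assuming a POSIX shell with printf and od, that reads four bytes as two big-endian 16-bit integers (the same layout as the 'nn' pack format above) and prints them as major.minor. The helper name bytes_to_protocol is made up for illustration and is not part of Postgres or nmap.

```shell
# Hypothetical helper: interpret four bytes as two big-endian 16-bit
# integers (major, then minor) and print them as "major.minor".
bytes_to_protocol() {
    # printf %b expands \0NNN octal escapes; od prints each byte as an
    # unsigned decimal number, which the shell splits into $1..$4.
    set -- $(printf '%b' "$1" | od -An -tu1)
    echo "$(( $1 * 256 + $2 )).$(( $3 * 256 + $4 ))"
}

# \0377 is octal for the byte 0xff; the other three bytes are plain ASCII.
bytes_to_protocol '\0377SMB'
```

This prints 65363.19778, the protocol number from the log message: 0xff and "S" (0x53) combine into 65363, while "M" (0x4d) and "B" (0x42) combine into 19778.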

To cut to the chase, the culprit is the nmap program. In addition to simply scanning ports, it has the ability to do a deeper inspection to determine not only what is running on each port, but what version it is as well (with the "-sV" argument). Let's see nmap in action. We will use a non-standard Postgres port so as not to give it any additional hints about what is on that port:

$ nmap -p 5930 localhost -sV
Starting Nmap 6.40 ( ) at 2015-05-20 12:00 EDT
Nmap scan report for localhost (
Host is up (0.000088s latency).
5930/tcp open  postgresql PostgreSQL DB
1 service unrecognized despite returning data. If you know the service/version, please submit the following fingerprint at :

Service detection performed. Please report any incorrect results at .
Nmap done: 1 IP address (1 host up) scanned in 6.73 seconds

It looks like it triggered the "unsupported protocol" message, based on what was returned. Taking a peek at the Postgres 9.3 logs shows our mystery message:

$ tail -1 /var/lib/pgsql/pg9.3/pg_log/postgres-2015-05-20.log
2015-05-21 12:00:00 EDT [unknown]@[unknown] [2318] FATAL:  unsupported frontend protocol 65363.19778: server supports 1.0 to 3.0

As a final check, let's confirm that nmap is using SMB when it runs the version check:

$ nmap localhost -p 5930 -sV --version-trace 2>/dev/null | grep SMB
Service scan sending probe SMBProgNeg to (tcp)
Service scan match (Probe SMBProgNeg matched with SMBProgNeg line 10662): is postgresql.  Version: |PostgreSQL DB|||

Bingo. Mystery solved. If you see that error in your logs, it is most likely caused by someone running nmap in version detection mode.
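
To see how often this happens, a rough sketch along the following lines counts the matching lines in a log file. It assumes the message format shown above; the helper name count_smb_probes is made up for illustration.

```shell
# Hypothetical helper: count log lines whose bogus protocol number is the
# one produced by the bytes 0xff "SMB" (65363.19778).
count_smb_probes() {
    grep -c 'unsupported frontend protocol 65363\.19778' "$1"
}

# Example usage against the log path shown earlier in this post:
# count_smb_probes /var/lib/pgsql/pg9.3/pg_log/postgres-2015-05-20.log
```

A nonzero count only suggests version-detection scans; other SMB-speaking programs hitting the port would produce the same message.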