
published by (Neil Elliott) on 2015-04-20 19:58:00 in the "linux" category

As a high-performance video rendering appliance, the Liquid Galaxy requires really good video cards -- better than your typical on-board integrated video cards. Despite ongoing attempts by competitors to displace them, Nvidia remains the best choice for high-end video, provided you use the proprietary Nvidia driver for Linux.

In addition to providing regular security and system updates, End Point typically provides advanced remote monitoring of our customers' systems for issues such as unanticipated application behavior, driver issues, and hardware errors. One particularly persistent issue presents as an error with an Nvidia kernel module. Unfortunately, relying on the proprietary Nvidia drivers to maintain an acceptable performance level limits the available diagnostic information and options for resolution.

The issue presents when the system ceases all video output functions as Xorg crashes. The kernel log contains the following error message:

2015-04-14T19:59:00.000083+00:00 lg2 kernel: [  719.850677] NVRM: Xid (0000:01:00): 32, Channel ID 00000003 intr 02000000

The message is repeated approximately 11,000 times per second until the disk fills and the ability to log in to the system is lost. The only known resolution at this time is to power-cycle the affected machine. In the error state, the module cannot be removed from the kernel, which also prevents Linux from shutting down properly. All affected systems were running some version of Ubuntu x86-64. The issue seems to be independent of driver version, but is at least present in 343.36 and 340.65, and affects all GeForce cards. Quadro cards seem unaffected.

The Xid message in the kernel log contains an error code that provides a little more information. The Nvidia docs list the error as "Invalid or corrupted push buffer stream". Possible causes listed include driver error, system memory corruption, bus error, thermal error, or frame buffer error. All affected systems were equipped with ECC RAM and were within normal operating temperature range when the issue presented.
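Until the root cause is fixed, the flood itself can at least be caught early, before it fills the disk. Here is a minimal watcher sketch in Python; the log path, threshold, and polling interval are assumptions that would need tuning per system:

```python
import re
import time

# Xid 32 is "Invalid or corrupted push buffer stream" per the Nvidia docs.
XID_PATTERN = re.compile(r"NVRM: Xid \(.*\): 32,")
KERNEL_LOG = "/var/log/kern.log"  # assumed path; differs across distros

def check_for_xid_errors(log_lines, threshold=10):
    """Return True when the Xid 32 message floods past the threshold."""
    hits = sum(1 for line in log_lines if XID_PATTERN.search(line))
    return hits > threshold

def watch(path=KERNEL_LOG, interval=5):
    """Tail the kernel log; alert once the error flood begins."""
    with open(path) as log:
        log.seek(0, 2)  # start from the end of the file
        while True:
            if check_for_xid_errors(log.readlines()):
                # In practice this would page an operator and/or
                # schedule the power cycle that clears the error state.
                print("Xid 32 flood detected; power cycle likely required")
                return
            time.sleep(interval)
```

Given that the message repeats thousands of times per second in the error state, even a coarse check like this catches the failure within one polling interval.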

Dealing with bugs like these can be arduous, but until they can be fixed, we cope by monitoring and responding to problems as quickly as possible.

published by (Brian Gadoury) on 2015-04-17 23:00:00 in the "culture" category

A conversation with a co-worker today about the value of improving one's professional skills reminded me of the talk Joe Mastey gave at the 2015 Mountain West Ruby Conference. That then reminded me that I had never finished my write-up on that conference. Blogger won't let me install harp music and an animated soft-focus flashback overlay, so please just imagine it's the day after the conference as you're reading this. "That reminds me of the time..."

I've just finished my second MWRC and I have to give this one the same 5-star rating I gave last year's. There were a few small sound glitches here and there, but overall the conference is well-run, inclusive, and packed with great speakers and interesting topics. Rather than summarizing each talk, I want to dig into the one most relevant to my interests: "Building a Culture of Learning" by Joe Mastey.

I was excited to catch Joe's talk because learning and teaching have always been very interesting to me, regardless of the particular discipline. I find it incredibly satisfying to improve upon my own learning skills, as well as improving my teaching skills by teasing out how different individuals learn best and then speaking to that. There's magic in that one-on-one interaction when everything comes together just right. I just really dig that.

Joe's work as the Manager of Internal Learning at Enova has gone way beyond the subtleties of the one-on-one. He's taken the not-so-simple acts of learning and training, and scaled them up in an environment that does not sound, on paper, like it would support it. He's created a culture of learning ("oh hey they just said the title of the movie in the movie!") in a financial company that's federally regulated, saw huge growth due to an IPO, and had very real business-driven deadlines for shipping their software.

Joe broke his adventure down into three general phases after refreshingly admitting that "YMMV" and that you can't ignore the existing corporate culture when trying to build a culture of learning within.

Phase 1 - Building Credibility

I would hazard a guess that most software development shops are perpetually at Phase 1: learning is mostly ad hoc, picked up in the course of one's daily work, and few if any people are pushing for more formal training. People probably agree that training is important, but the mandate has not come down from the CTO, and there's "no time for training" because there's so much work to do.

How did Joe help his company evolve past Phase 1? Well, he did a lot of things that I think many devs would be happy to just get one or two of at their company. My two favorites from his list probably appeal to polar opposite personality types, but that's part of why I like them.

My first favorite is that books are all-you-can-eat. If a developer asks Joe for a tech book, he'll say yes, and he'll buy a bunch of extra copies for the office. I like having a paper book to read through to get up to speed on a topic, ideally away from my desk and the computer screen. I've also found that for some technologies, the right book can be faster and less frustrating than potentially spotty documentation online.

My second favorite is how Joe implemented "new hire buddies." Each new hire is teamed up with an experienced dev from a different team. Having a specific person to talk to, and get their perspective on company culture, can really help people integrate into the culture much more quickly. When I joined End Point in 2010, I worked through our new hire "boot camp" training like all new hires. I then had the occasionally-maddening honor of working directly with one of the more senior evil super-geniuses at End Point on a large project that I spent 100% of my time on. He became my de facto new hire buddy and I could tell that despite the disparity in our experience levels, being relatively joined at the hip with someone like that improved my ramp-up and cultural integration time greatly.

Phase 2 - Expand Reach and Create Impact

If my initial guess about Phase 1 is correct, it follows that dev shops in Phase 2 are rarer: more people are learning more, and more people are driving that learning, but it's still mostly focused on new hires and the onboarding process.

Phase 2 is where Joe's successful efforts are a little more intimidating to me, especially given my slightly introverted nature. The efforts here scale up and get more people speaking publicly, both internally and externally. It starts with a more formal onboarding process, and grows to things like weekly tech talks and half-day internal workshops. Here is where I start to make my "yeah, but?" face. We all have it. It's the face you make when someone says something you don't think can work, and you start formulating your rebuttal immediately. E.g. "Yeah, but how do you get management and internal clients to be OK with 'shutting down development for half a day' for training?" Joe does mention the danger of being perceived as "wasting too much time." You'll want to be sure you get ahead of that and communicate the value of what you're spending "all that dev time" on.

Phase 3 - Shift The Culture

It would be interesting to know how many shops are truly in Phase 3 because it sounds pretty intense: Learning is considered part of everyone's job, the successes from the first two phases help push the culture of learning to think and act bigger, the acts of learning and training others are part of job descriptions, and things like FOSS contributions and that golden unicorn of "20% personal project time" actually happen on company time. Joe describes the dangers or downsides of Phase 3 in a bit of a "with great power comes great responsibility" way. I've never personally worked somewhere that's in Phase 3, but it makes sense that the increased upside comes with increased (potential) downside.

At End Point, we have some elements of all three phases, but we're always looking to improve. Joe's talk at MWRC 2015 has inspired me to work on expanding our own culture of learning. I think his talk is also going to serve as a pretty good road-map on how to get to the next phase.

published by (Steph Skardal) on 2015-04-17 13:37:00 in the "Conference" category

Next week, I'm headed to my sixth RailsConf, this year in Atlanta, with the whole family in tow:

The gang. Note: Dogs will not be attending the conference.

This will be a new experience, since my husband will be juggling two kids while I attend the daily sessions. It makes sense to go into the conference fairly organized to aid in the kid juggling, right? So I've picked out a few sessions that I'm looking forward to attending. Here they are:

RailsConf is a multi-track conference, with tracks including Distributed Systems, Culture, Growing Talent, Testing, APIs, Front End, Crafting Code, JavaScript, and Data & Analytics. There are also Beginner and Lab tracks, which might be suitable to those looking for a learning & training oriented experience. As you might be able to tell, the sessions I'm interested in cover a mix of performance, open source, and front-end dev. As I've become a more experienced Rails developer, RailsConf has been more about seeing what's going on in the Rails community and what the future holds, and less about the technical nitty-gritty or training sessions.

Stay tuned for a handful of blog posts from the conference!

published by Eugenia on 2015-04-16 05:20:12 in the "Filmmaking" category
Eugenia Loli-Queru

My mini review of the Cinema FV-5 Android application, tested on the OnePlus One phone. The app is able to do all the basic things needed to get a good quality, flat result out of your phone’s camera. It supports: exposure set & lock, WB set & lock, manual focus and lock, low contrast/saturation/sharpness (“flat” colors), 24p support among other frame rates, and up to 40 Mbps bitrate. On my phone, it supports up to DCI 4K resolution (the free version of Cinema FV-5 does all the above too, but it goes only up to 720p — which can be enough for most cases). In some phones (particularly if they have Lollipop), the app also supports manual shutter speed and ISO settings. If you couple your phone with a variable ND filter for phones (to control the camera better outdoors, where they tend to overexpose), and a small steadicam, you could have a winner.

It worked fine on my OnePlus One, but on the 1st Gen Moto G with Android 4.4.4 it had problems: no focus, no recording (it only recorded once, and then it refused to do so again). It’s very possible that these problems will go away with a Lollipop upgrade, because Lollipop has a much better camera API.

published by (Selvakumar Arumugam) on 2015-04-15 20:23:00 in the "community" category

The 6th edition of RubyConf India 2015 was held in Goa (in my opinion, one of the most amazing places in India). The talks spanned various topics, mostly related to Ruby in general and RoR.

Aaron Patterson (a core member of the Ruby and Rails teams) gave a very interesting talk covering pair programming, benchmarking integration tests vs. controller tests, and precompiling views to increase speed in Rails 5.

Christophe Philemotte presented a wonderful topic, "Diving in the unknown depths of a project", drawing on his experience contributing to the Rails project. He mentioned that 85% of a developer's time is spent reading code and only 15% writing it, so he explained a work process for using a developer's time effectively, one which should adapt well to any kind of development. Here is the list of steps he explained:

  1. Goal (e.g. bug fixing, implementing a new feature)
  2. Map (e.g. code repository, documentation, README)
  3. Equipment (ex: Editor, IDE) and Dive (read, write, run and use)
  4. Next Task

Rajeev from ThoughtWorks talked about "Imperative vs Functional programming" and interesting concepts in Haskell which can be implemented in Ruby, such as function composition, lazy evaluation, thunks, higher order functions, currying, pure functions and immutable objects.

Aakash from C42 Engineering talked about an interesting piece of the upcoming Web Components standard, the "Shadow DOM", which has interoperability, encapsulation, and portability built in. He also mentioned Polymer as a project for developing custom Shadow DOM components.

Vipul and Prathamesh from BigBinary showed an experimental project on "Building an ORM with AReL" called Torm (Tiny Object Relational Mapping), intended to give more control over the ORM process in Rails.

Smit Shah from Flipkart gave a talk on "Resilient by Design", covering design patterns such as:

  1. Bounding - change the default timeouts
  2. Circuit breakers
  3. Fail fast

Christopher Rigor shared some awesome information about "Cryptography for Rails Developers". He explained some concepts like Public Key cryptography, symmetric cryptography, and SSL/TLS versions. He recommended we all use TLS 1.2 and AES-GCM on production to keep the application more secure.
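As a rough illustration of that recommendation outside of Ruby, Python's standard library ssl module (3.7+) can pin the minimum protocol version so that anything older than TLS 1.2 is refused; this sketch is my own example, not from the talk:

```python
import ssl

# Build a client-side context that refuses anything older than TLS 1.2,
# in the spirit of the talk's recommendation.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

Cipher suite selection (such as preferring AES-GCM) is then negotiated from the modern set the default context allows.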

Eleanor McHugh, a British hacker, gave an important talk on "Privacy is always a requirement". The gist of the talk: keep security tight by encrypting all transports, encrypting all passwords, providing two-factor authentication, encrypting all storage, and anchoring trust internally. On today's broken Internet, data won't be kept safe by privacy policies, trust, or contracts.

Laurent Sansonetti, who works on RubyMotion, gave a code-walkthrough demo of a Flappy Bird-style game he created using RubyMotion. RubyMotion is used to develop native mobile applications for the iOS, OS X, and Android platforms. It provides access to the Objective-C and Android APIs, and the whole build process is Rake-based.

Shadab Ahmed gave a wonderful demo of the 'aggrobot' gem, used to perform easy aggregations. aggrobot runs on top of ActiveRecord, and also works directly with the database to provide good performance on query results.

Bryan Helmkamp, founder and CEO of CodeClimate, spoke about "Rigorous Deployment" using a few wonderful tools. The 'rollout' gem is extremely helpful for deploying to specific users, deploying a specific branch, and so on. This reduces the need for a staging environment and helps avoid the classic 'it works in staging, not in production' problem.

Erik Michaels-Ober gave an awesome talk on "Writing Fast Ruby" (must-visit slides) with information about how to tweak regular code to improve the performance of an application. He presented Benchmark/IPS results for two versions of code with the same logic but different execution times. As a rule of thumb, he suggested that any performance-motivated change should show at least a 12% improvement compared to the current code.
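Benchmark/IPS is a Ruby tool, but the same before/after measurement can be sketched in any language. Here is a rough Python equivalent using timeit, with hypothetical "slow" and "fast" versions of the same computation standing in for the kinds of tweaks the talk covers:

```python
import timeit

# A hypothetical "before": sum 0..n-1 with an explicit loop.
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

# A hypothetical "after": the same result via the closed-form formula.
def fast_sum(n):
    return n * (n - 1) // 2

slow = timeit.timeit(lambda: slow_sum(10000), number=1000)
fast = timeit.timeit(lambda: fast_sum(10000), number=1000)
improvement = (slow - fast) / slow  # fraction of time saved by the new version
```

Under the 12% rule of thumb, a change like this would only be worth making if `improvement` comes out above 0.12 (here it easily does, since the rewrite changes the algorithmic complexity).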

The final presenter, Terence Lee from the Ruby Security Team, gave a talk on "Ruby & You" which summarised all the talks and gave information on the Ruby Security Team and its contributions to the Ruby community. He suggested everyone keep their Ruby version up-to-date so as to get the latest security patches and avoid vulnerabilities. He also encouraged the audience to submit bug reports to the official Ruby ticketing system because, quoting him, "Twitter is not bug tracker".

It was really fascinating to interact with like-minded people, and I was very happy to leave the conference with many interesting new ideas about Ruby and RoR, along with some new techie friends.

published by (Jon Jensen) on 2015-04-13 23:44:00 in the "benchmarks" category

Today we are pleased to announce the results of a new NoSQL benchmark we did to compare scale-out performance of Apache Cassandra, MongoDB, Apache HBase, and Couchbase. This represents work done over 8 months by Josh Williams, and was commissioned by DataStax as an update to a similar 3-way NoSQL benchmark we did two years ago.

The database versions we used were Cassandra 2.1.0, Couchbase 3.0, MongoDB 3.0 (with the Wired Tiger storage engine), and HBase 0.98. We used YCSB (the Yahoo! Cloud Serving Benchmark) to generate the client traffic and measure throughput and latency as we scaled each database server cluster from 1 to 32 nodes. We ran a variety of benchmark tests that included load, insert heavy, read intensive, analytic, and other typical transactional workloads.

We avoided using small datasets that fit in RAM, and included single-node deployments only for the sake of comparison, since those scenarios do not exercise the scalability features expected from NoSQL databases. We performed the benchmark on Amazon Web Services (AWS) EC2 instances, with each test run three separate times on three different days to avoid unreproducible anomalies. We used new EC2 instances for each test run to further reduce the impact of any "lame instance" or "noisy neighbor" effect on any one test.

Which database won? It was pretty overwhelmingly Cassandra. One graph serves well as an example. This is the throughput comparison in the Balanced Read/Write Mix:

Our full report, Benchmarking Top NoSQL Databases, contains full details about the configurations, and provides this and other graphs of performance at various node counts. It also provides everything needed for others to perform the same tests and verify in their own environments. But beware: Your AWS bill will grow pretty quickly when testing large numbers of server nodes using EC2 i2.xlarge instances as we did!

(Earlier this morning we also sent out a press release to announce our results and the availability of the report.)

published by (Jon Jensen) on 2015-04-08 23:27:00 in the "devtools" category

Git's birthday was yesterday. It is now 10 years old! Happy birthday, Git!

Git was born on 7 April 2005, as its creator Linus Torvalds recounted in a 2007 mailing list post. At least if we consider the achievement of self-hosting to be "birth" for software like this. :)

Birthdays are really arbitrary moments in time, but they give us a reason to pause and reflect back. Why is Git a big deal?

Even if Git were still relatively obscure, for any serious software project to survive a decade and still be useful and maintained is an accomplishment. But Git is not just surviving.

Over the past 5?6 years, Git has become the standard version control system in the free software / open source world, and more recently, it is becoming the default version control system everywhere, including in the proprietary software world. It is amazing to consider how fast it has overtaken the older systems, and won out against competing newer systems too. It is not unreasonable these days to expect anyone who does software development, and especially anyone who claims to be familiar with version control systems, to be comfortable with Git.

So how did I get to be friends with Git, and end up at this birthday celebration?

After experimenting with Git and other distributed version control systems for a while in early and mid-2007, I started using Git for real work in July 2007. That is the earliest commit date in one of my personal Git repositories (which was converted from an earlier CVS repository I started in 2000). Within a few weeks I was in love with Git. It was so obviously vastly superior to CVS and Subversion, which I had mostly used before. It offered so much more power, control, and flexibility. The learning curve was real but tractable, and it was so much easier to prevent or repair mistakes that I didn't mind the retraining at all.

So I'm sounding like a fanboy. What was so much better?

First, the design. A distributed system where all commits were objects with a SHA-1 hash to identify them and the parent commit(s). Locally editable history. Piecemeal committing thanks to the staging power of the Git index. Cheap and quick branching. Better merging. A commit log that was really useful. Implicit rename tracking. Easy tagging and commit naming. And nothing missing from other systems that I needed.
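That content-addressed design is simple enough to reproduce outside Git. A blob's object name, for example, is just the SHA-1 of a short header plus the file content:

```python
import hashlib

def git_blob_hash(content):
    """Compute the object name Git gives a blob: sha1(b"blob <len>\\0" + content)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has the same well-known object name in every Git repository.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Commit and tree objects are hashed the same way (with their own headers), which is what lets every commit name its parents by hash and makes history tamper-evident.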

Next, the implementation. Trivial setup, with no political and system administrative fuss for client or server. No messing with users and permissions and committer identities, just name & email address like we're all used to. An efficient wire protocol. Simple ssh transport for pushes and/or pulls of remote repositories, if needed. A single .git directory at the repository root, rather than RCS, CVS, or .svn directories scattered throughout the checkout. A simple .git/config configuration file. And speed, so much speed, even for very large repositories with lots of binary blobs.

The speed is worth talking about more.

The speed of Git mattered, and was more than just a bonus. It proved true once again the adage that a big enough quantitative difference becomes a qualitative difference. Some people believed that speed of operations wasn't all that important, but once you are able to complete your version control tasks so quickly that they're not at all bothersome, it changes the way you work.

The ease of setting up an in-place repository on a whim, without worrying about where or if a central repository would ever be made, let alone wasting any time with access control, is a huge benefit. I used to administer CVS and Subversion repositories and life is so much better with the diminished role a "Git repository administrator" plays now.

Cheap topic branches for little experiments are easy. Committing every little thing separately makes sense if I can later reorder or combine or split my commits, and craft a sane commit before pushing it out where anyone else sees it.

Git subsumed everything else for us.

RCS, despite its major limitations, stuck around because CVS and Subversion didn't do the nice quick in-place versioning that RCS does. That kind of workflow is so useful for a system administrator or ad-hoc local development work. But RCS has an ugly implementation and is based on changing single files, not sets. It can't be promoted to real distributed version control later if needed. At End Point we used to use CVS, Subversion, and SVK (a distributed system built on top of Subversion), and also RCS for those cases where it still proved useful. Git replaced them all.

Distributed is better. Even for those who mostly use Git working against a central repository.

The RCS use case was a special limited subset of the bigger topic of distributed version control, which many people resisted and thought was overkill, or out of control, or whatever. But it is essential, and was key to fixing a lot of the problems of CVS and Subversion. Getting over the mental block of Git not having a single sequential integer revision number for each commit as Subversion did was hard for some people, but it forces us to confront the new reality: Commits are their own objects with an independent identity in a distributed world.

When I started using Git, Mercurial and Bazaar were the strongest distributed version control competitors. They were roughly feature-equivalent, and were solid contenders. But they were never as fast or compact on disk, and didn?t have Git?s index, cheap branching, stashing, or so many other niceties.

Then there is the ecosystem.

GitHub arrived on the scene as an unnecessary appendage at first, but its ease of use and popularity, and social coding encouragement, quickly made it an essential part of the Git community. It turned the occasional propagandistic accusation that Git was antisocial and would encourage project forks into a virtue, by calling everyone's clone a "fork".

Over time GitHub has played a major role in making Git as popular as it is. Bypassing the need to set up any server software at all to get a central Git repository going removed a hurdle for many people. GitHub is a centralized service that can go down, but that is no serious risk in a distributed system where you generally have full repository mirrors all over the place, and can switch to other hosting any time if needed.

I realize I'm gushing praise embarrassingly at this point. I find it is warranted, based on my nearly 8 years of using Git and with good familiarity with the alternatives, old and new.

Thanks, Linus, for Git. Thanks, Junio C Hamano, who has maintained the Git open source project since early on.

Presumably someday something better will come along. Until then let's enjoy this rare period of calm where there is an obvious winner to a common technology question and thus no needless debate before work can begin. And the current tool stability means we don't have to learn new version control skills every few years.

To commemorate this 10-year birthday, Linus was interviewed, and it is a worthwhile read.

You may also be interested to read more in the Wikipedia article on Git or the Git wiki's history of Git.

Finally, an author named Stephen has written an article called The case for Git in 2015, which revisits the question of which version control system to use as if it were not yet a settled question. It has many good reminders of why Git has earned its prominent position.

published by (David Christensen) on 2015-04-06 22:10:00 in the "Conference" category
I recently got back from PGConf 2015 NYC. It was an invigorating, fun experience, both attending and speaking at the conference.

What follows is a brief summary of some of the talks I saw, as well as some insights/thoughts:

On Thursday:

"Managing PostgreSQL with Puppet" by Chris Everest.  This talk covered experiences by staff in deploying PostgreSQL instances and integrating with custom Puppet recipes.

"A TARDIS for your ORM - application level timetravel in PostgreSQL" by Magnus Hagander. Demonstrated how to construct a mirror schema of an existing database and manage (via triggers) a view of how data existed at some specific point in time.  This system utilized range types with exclusion constraints, views, and session variables to generate a similar-structured schema to be consumed by an existing ORM application.

"Building a 'Database of Things' with Foreign Data Wrappers" by Rick Otten.  This was a live demonstration of building a custom foreign data wrapper to control such attributes as hue, brightness, and on/off state of Philips Hue bulbs.  Very interesting live demo, nice audience response to the control systems.  Used a python framework to stub out the interface with the foreign data wrapper and integrate fully.

"Advanced use of pg_stat_statements: Filtering, Regression Testing & More" by Lukas Fittl.  Covered how to use the pg_stat_statements extension to normalize queries and locate common performance statistics for the same query.  This talk also covered the pg_query tool/library, a Ruby tool to parse/analyze queries offline and generate a JSON object representing the query.  The talk also covered the example of using a test database and the pg_stat_statements views/data to perform query analysis to theorize about planning of specific queries without particular database indexes, etc.

On Friday:

"Webscale's dead! Long live Postgres!" by Joshua Drake.  This talk covered improvements that PostgreSQL has made over the years, specific technologies that they have incorporated such as JSON, and was a general cheerleading effort about just how awesome PostgreSQL is.  (Which of course we all knew already.)  The highlight of the talk for me was when JD handed out "prizes" at the end for knowing various factoids; I ended up winning a bottle of Macallan 15 for knowing the name of the recent departing member of One Direction.  (Hey, I have daughters, back off!)

"The Elephants In The Room: Limitations of the PostgreSQL Core Technology" by Robert Haas.  This was probably the most popular talk that I attended.  Robert is one of the major developers of the PostgreSQL team, and is heavily knowledgeable in the PostgreSQL internals, so his opinions of the existing weaknesses carry some weight.  This was an interesting look forward at possible future improvements and directions the PostgreSQL project may take.  In particular, Robert looked at the IO approach Postgres currently take and posits a Direct IO idea to give Postgres more direct control over its own IO scheduling, etc.  He also mentioned the on-disk format being somewhat suboptimal, Logical Replication as an area needing improvement, infrastructure needed for Horizontal Scalability and Parallel Query, and integrating Connection Pooling into the core Postgres product.

"PostgreSQL Performance Presentation (9.5devel edition)" by Simon Riggs.  This talked about some of the improvements in the 9.5 HEAD; in particular looking at the BRIN index type, an improvement in some cases over the standard btree index method.  Additional metrics were shown and tested as well, which demonstrated Postgres 9.5's additional performance improvements over the current version.

"Choosing a Logical Replication System" by David Christensen.  As the presenter of this talk, I was also naturally required to attend as well.  This talk covered some of the existing logical replication systems including Slony and Bucardo, and broke down situations where each has strengths.

"The future of PostgreSQL Multi-Master Replication" by Andres Freund.  This talk primarily covered the upcoming BDR system, as well as the specific infrastructure changes in PostgreSQL needed to support these features, such as logical log streaming.  It also looked at the performance characteristics of this system.  The talk also wins for the most quote-able line of the conference:  "BDR is spooning Postgres, not forking", referring to the BDR project's commitment to maintaining the code in conjunction with core Postgres and gradually incorporating this into core.

As part of the closing ceremony, there were lightning talks as well; quick-paced talks (maximum of 5 minutes) which covered a variety of interesting, fun and sometimes silly topics.  In particular some memorable ones were one about using Postgres/PostGIS to extract data about various ice cream-related check-ins on Foursquare, as well as one which proposed a generic (albeit impractical) way to search across all text fields in a database of unknown schema to find instances of key data.

As always, it was good to participate in the PostgreSQL community, and I look forward to seeing participants again at future conferences.

published by (Szymon Guz) on 2015-04-03 08:31:00 in the "python" category

Some time ago I was working on a simple Python script. What the script did is not very important for this article. What is important is the way it parsed arguments, and the way I managed to improve it.

All the examples below look similar to that script; however, I cut most of the code and changed the sensitive information, which I cannot publish.

The main ideas for the options management are:

  • The script reads all config values from a config file, which is a simple ini file.
  • The script values can be overwritten by the command line values.
  • There are special command line arguments, which don't exist in the config file like:
    • --help - shows help in command line
    • --create-config - creates a new config file with default values
    • --config - the path to the config file which should be used
  • If there is no value for a setting in the config file, and in the command line arguments, then a default value should be taken.
  • The option names in the configuration file, and the command line, must be the same. If there is repo-branch in the ini file, then there must be --repo-branch in the command line. However the variable where it will be stored in Python will be named repo_branch, as we cannot use - in the variable name.
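These precedence rules can be sketched independently of the script itself. The helper below is a hypothetical illustration (not part of the original script): defaults are overridden by config values, which are in turn overridden by command line arguments, and argparse converts --repo-branch to the attribute repo_branch automatically:

```python
import argparse

def resolve_options(cli_args, config_values, defaults):
    """Merge option sources: command line > config file > defaults."""
    parser = argparse.ArgumentParser()
    # argparse stores --repo-branch as options.repo_branch automatically.
    parser.add_argument("--repo-branch", default=None)
    options = parser.parse_args(cli_args)

    merged = dict(defaults)
    merged.update(config_values)       # config file overrides defaults
    if options.repo_branch is not None:
        merged["repo_branch"] = options.repo_branch  # CLI wins over everything
    return merged
```

For example, resolve_options(["--repo-branch", "feature"], {"repo_branch": "another"}, {"repo_branch": "master"}) yields {"repo_branch": "feature"}, while with an empty command line the config file's "another" would win.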

The Basic Implementation

The basic config file (with the [example] section header that RawConfigParser requires) is:

[example]
repo-branch = another

The basic implementation was:

#!/usr/bin/env python

import sys
import argparse
import ConfigParser

import logging
logger = logging.getLogger("example")
logger.setLevel(logging.DEBUG)

ch = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s : %(lineno)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

class Options:

    def __init__(self, args):
        self.parser = argparse.ArgumentParser(description="Example script.")
        self.args = args

        self.parser.add_argument("--create-config", dest="create_config",
                                 help="Create configuration file with default values")

        self.parser.add_argument("--config", dest="config", default="/tmp/example.cfg",
                                 help="Path to example.cfg")

        self.parser.add_argument("--repo-branch", dest="repo_branch", default="something",
                                 help="git branch OR git tag from which to build")

        self.options = self.parser.parse_args()
        print "repo-branch from command line is: {}".format(self.options.repo_branch)

    def get_options(self):
        return self.options

    def get_parser(self):
        return self.parser

class UpgradeService():

    def __init__(self, options):
        if not options:
            raise RuntimeError("no options given")
        self.options = options
        if self.options.config:
            self.config_path = self.options.config

    def init_config_file(self):
        """ This function is to process the values provided in the config file """

        self.config = ConfigParser.RawConfigParser()
        self.config.read(self.config_path)

        self.repo_branch = self.config.get('example', 'repo-branch')

        print "repo-branch from config is: {}".format(self.repo_branch)

    def init_options(self):
        """ This function is to process the command line options.
            Command line options always override the values given in the config file.
        """
        if self.options.repo_branch:
            self.repo_branch = self.options.repo_branch

    def run(self):
        # The script's main logic goes here; it is not important for this article.
        pass

if __name__ == "__main__":
    options = Options(sys.argv).get_options()
    upgrade_service = UpgradeService(options)
    upgrade_service.init_config_file()
    upgrade_service.init_options()

    print "repo-branch value to be used is: {}".format(upgrade_service.repo_branch)

The main idea of this code was:

  • All the command line arguments parsing is done in the Options class.
  • The UpgradeService class reads the ini file.
  • The values from the Options class and the ini file are merged into the UpgradeService fields. So a config option like repo-branch will be stored in the upgrade_service.repo_branch field.
  • The run() method does all the script's magic; however, this is not important here.

This way I can run the script with:

  • ./ - which will read the config file from /tmp/example.cfg, and repo_branch should contain another.
  • ./ --config=/tmp/a.cfg - which will read the config from /tmp/a.cfg.
  • ./ --help - which will show the help (this is automatically supported by the argparse module).
  • ./ --repo-branch=1764 - and the repo_branch variable should contain 1764.

The Problems

First of all, there is a lot of repeated code and repeated option names. Repeating code is a great way to introduce lots of bugs. Each option name is mentioned in the command line arguments parser (see line 36). It is repeated later in the config file parser (see line 68). The variable name used for storing each value is repeated a couple of times: first in the argparse declaration (see line 36), then in the function init_options (see line 79). The conditional assignment (as in lines 76-77) is repeated for each option, and for some options it is a little bit different.

This makes the code hard to update, when we change an option name, or want to add a new one.

Another thing is a simple typo bug. There is no check that an option in the config file is a proper one. When a user, by mistake, writes repo_branch instead of repo-branch in the config file, it will be silently ignored.
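A tiny standalone sketch (using the Python 3 spelling of the module; the article's script uses Python 2's ConfigParser) shows how silently such a typo slips through:

```python
import configparser  # Python 3 name of the module

# The config file has a typo: repo_branch instead of repo-branch.
config = configparser.RawConfigParser()
config.read_string("[example]\nrepo_branch = another\n")

items = dict(config.items("example"))
# No error is raised; the misspelled key is simply there, and any code
# looking for "repo-branch" will never see the value.
print("repo-branch" in items, "repo_branch" in items)  # -> False True
```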

The Bug

One question: can you spot the bug in the code?

The problem is that the script reads the config file, then overwrites all the values with the command line ones. What if there is no command line argument for --repo-branch? Then the default one will be used, and it will overwrite the config one.

./ --config=../example.cfg
repo-branch from command line is: something
repo-branch from config is: another
repo-branch value to be used is: something

Fixing Time

The code for the two implementations (the one described above, and the one described below) can be found on github:

I tried to implement a better solution. It should fix the bug, inform the user about bad config values, be easier to change later, and give the same result: the values should be usable as UpgradeService fields.

The Options class is not that bad; we need to store the argparse configuration somewhere. I'd just like to have the option names and default values declared in one place, without repeating them all over.

I kept the Options class, however I moved all the default values to a separate dictionary. There is no default value for any option in the argparse configuration. So now, if there is no command line option, e.g. for --repo-branch, then the repo_branch field in the object returned by the Options.get_options() method will be None.

After the changes, this part of the code is:


# All default values now live in one dictionary
# (the values shown here match the earlier example).
DEFAULT_VALUES = dict(
    create_config=None,
    config="/tmp/example.cfg",
    repo_branch="something",
)

class Options:

    def __init__(self, args):
        self.parser = argparse.ArgumentParser(description="Example script.")
        self.args = args

        self.parser.add_argument("--create-config", dest="create_config",
                                 help="Create configuration file with default values")

        self.parser.add_argument("--config", dest="config",
                                 help="Path to example.cfg")

        self.parser.add_argument("--repo-branch", dest="repo_branch",
                                 help="git branch OR git tag from which to build")

        # Here come the next about 20 arguments

        self.options = self.parser.parse_args()

    def get_options(self):
        return self.options

    def get_parser(self):
        return self.parser

So I have a dictionary with the default values. If I also had a dictionary with the config values and a dictionary with the command line ones, it would be quite easy to merge and compare them.
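The merge order I am after can be sketched in a few lines (with made-up values; None stands for a command line option that was not given):

```python
# Sketch of the intended merge order: defaults first, then the config
# file, then only those command line values which were actually given.
defaults = {"repo_branch": "something", "config": "/tmp/example.cfg"}
from_config = {"repo_branch": "another"}
from_cli = {"repo_branch": None, "config": None}  # nothing given on the command line

merged = dict(defaults)
merged.update(from_config)
merged.update({k: v for k, v in from_cli.items() if v is not None})

print(merged["repo_branch"])  # -> another: the config value survives
```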

Get Command Line Options Dictionary

First let's make a dictionary with the command line values. This can be done with a simple function:

def parse_args():
  return Options(sys.argv).get_options().__dict__

However there are two things to remember:

  • There is the command --create-config which should be supported, and this is the best place to do it.
  • The argument names in the dictionary returned by __dict__ will have underscores instead of dashes.

So let's add creation of the new config file:

def parse_args():
    """ Parses the command line arguments, and returns dictionary with all of them.

    The arguments have dashes in the names, but they are stored in fields with underscores.

    :return: arguments
    :rtype: dictionary
    """
    options = Options(sys.argv).get_options()
    result = options.__dict__
    logger.debug("COMMAND LINE OPTIONS: {}".format(result))

    if options.create_config:
        logger.info("Creating configuration file at: {}".format(options.create_config))
        with open(options.create_config, "w") as c:
            c.write("[{}]\n".format(CONFIG_SECTION_NAME))
            for key in sorted(DEFAULT_VALUES.keys()):
                value = DEFAULT_VALUES[key]
                c.write("{}={}\n".format(key, value or ""))
        sys.exit(0)
    return result

The above function first gets the options from an Options object and converts them to a dictionary. If the create_config option is set, it creates the config file; if not, it returns the dictionary with the values.

Get Config File Dictionary

Converting the config file to a dictionary is also quite simple. However, the dictionary we get has keys exactly as they are written in the config file. These contain dashes, like repo-branch, while in the other dictionaries we have underscores, like repo_branch, so I will also convert all the keys to use underscores instead of dashes.

CONFIG_SECTION_NAME = 'example'

def read_config(fname, section_name=CONFIG_SECTION_NAME):
    """ Reads a configuration file.

    Here the field names contain dashes; in the args parser
    and the default values we have underscores.
    So additionally I will convert the dashes to underscores here.

    :param fname: name of the config file
    :return: dictionary with the config file content
    :rtype: dictionary
    """
    config = ConfigParser.RawConfigParser()
    config.read(fname)

    result = {key.replace('-', '_'): val for key, val in config.items(section_name)}
    logger.info("Read config file {}".format(fname))
    logger.debug("CONFIG FILE OPTIONS: {}".format(result))
    return result

And yes, I'm using dictionary comprehension there.

Merging Time

Now I have three dictionaries with configuration options:

  • The default values, stored in the DEFAULT_VALUES dictionary.
  • The config values, returned by the read_config function.
  • The command line values, returned by the parse_args function.

And I need to merge them. Merging cannot be done automatically, as I need to:

  • Overwrite or add values read from the config file.
  • Overwrite or add values from the command line, but only if the values are not None, which is the default value when an argument is not set.
  • At the end I want to return an object, so I can access an option as settings.branch_name instead of settings['branch_name'].

For merging I created this generic function. It can merge the first dictionary with the second, and can use the default values as the initial dictionary.

At the end it uses the namedtuple to get a nice object with fields' names taken from the keys, and filled with the merged dictionary values.

def merge_options(first, second, default={}):
    """
    This function merges the first argument dictionary with the second.
    The second overrides the first.
    Then it merges the default with the already merged dictionary.

    This is needed, because if the user will set an option `a` in the config file,
    and will not provide the value in the command line options configuration,
    then the command line default value will override the config one.

    With the three-dictionary solution, the algorithm is:
    * get the default values
    * update with the values from the config file
    * update with the command line options, but only for the values
      which are not None (all not set command line options will have None)

    As it is easier and nicer to use code like:
    the merged dictionary is then converted into a namedtuple.

    :param first: first dictionary with options
    :param second: second dictionary with options
    :return: object with both dictionaries merged
    :rtype: namedtuple
    """
    from collections import namedtuple
    options = dict(default)  # copy, so the shared default dict is not mutated
    options.update(first)
    options.update({key: val for key, val in second.items() if val is not None})
    logger.debug("MERGED OPTIONS: {}".format(options))
    return namedtuple('OptionsDict', options.keys())(**options)

Dictionary Difference

The last utility function I need is something to compare dictionaries. I think it is a great idea to inform users when they have a strange option name in the config file. Let's assume that:

  • The main list of the options is the argparse option list.
  • The config file can contain fewer options, but cannot contain options which are not in the argparse list.
  • There are some options which can be in the command line, but cannot be in the config file, like --create-config.

The main idea behind the function is to convert the keys for the dictionaries to sets, and then make a difference of the sets. This must be done for the settings names in both directions:

  • config.keys - commandline.keys - if the result is not an empty set, then it is an error
  • commandline.keys - config.keys - if the result is not an empty set, then we should just show some information about this

The function below gets two arguments, first and second. It returns a tuple like (first-second, second-first). There is also a third argument: a list of keys which should be ignored, like create_config.

def dict_difference(first, second, omit_keys=[]):
    """
    Calculates the difference between the keys of the two dictionaries,
    and returns a tuple with the differences.

    :param first:     the first dictionary to compare
    :param second:    the second dictionary to compare
    :param omit_keys: the keys which should be omitted,
                      as for example we know that it's fine that one dictionary
                      will have this key, and the other won't

    :return: The keys which are different between the two dictionaries.
    :rtype: tuple (first-second, second-first)
    """
    keys_first = set(first.keys())
    keys_second = set(second.keys())
    keys_f_s = keys_first - keys_second - set(omit_keys)
    keys_s_f = keys_second - keys_first - set(omit_keys)

    return (keys_f_s, keys_s_f)

Build The Options

And now the end. The main function for building the options, which will use all the above code. This function:

  • Gets a dictionary with command line options from the parse_args function.
  • Finds the path to the config file (from the command line, or from the default value).
  • Reads the dictionary with config file options from the read_config function.
  • Calculates the differences between the dictionaries using the dict_difference function.
  • Prints information about the options which can be set in the config file, but are not set currently. Those options are in the Options class, but are not in the config file.
  • Prints information about the options which are in the config file, but shouldn't be there, because they are not declared in the argparse options list, in the Options class.
  • If there are any options which cannot be in the config file, the script exits with error code.
  • Then it merges all three dictionaries using the function merge_options, and returns the named tuple.
def build_options():
    """
    Builds an object with the merged options from the command line arguments,
    and the config file.

    If there is an option in command line which doesn't exist in the config file,
    then the command line default value will be used. That's fine, the script
    will just print an info about that.

    If there is an option in the config file, which doesn't exist in the command line,
    then it looks like an error. This time the script will show this as error information,
    and will exit.

    If there is the same option in the command line, and the config file,
    then the command line one overrides the config one.
    """
    options = parse_args()
    config = read_config(options['config'] or DEFAULT_VALUES['config'])

    (f, s) = dict_difference(options, config, COMMAND_LINE_ONLY_ARGS)
    if f:
        for o in f:
            logger.info("There is an option {} which is missing in the config file, "
                        "that's fine, I will use the value: {}".format(o, DEFAULT_VALUES[o]))
    if s:
        logger.error("There are options, which are in the config file, but are not supported:")
        for o in s:
            logger.error(o)
        sys.exit(1)

    merged_options = merge_options(config, options, DEFAULT_VALUES)
    return merged_options

Other Changes

There are some additional changes. I had to add a list of the command line arguments which are fine to be omitted in the config file:

COMMAND_LINE_ONLY_ARGS = ["create_config"]

The UpgradeService class is much simpler now:

class UpgradeService():

    def __init__(self, options):
        if not options:
            raise RuntimeError("no options given")
        self.options = options

    def run(self):
        # The script's main logic goes here; it is not important for this article.
        pass

The runner part also changed a little bit:

if __name__ == "__main__":
    options = build_options()
    upgrade_service = UpgradeService(options)

    print "repo-branch value to be used is: {}".format(upgrade_service.options.repo_branch)

The main difference between the two implementations is that in the first, the options could be accessed as upgrade_service.repo_branch, while in the second they need to be accessed as upgrade_service.options.repo_branch.

Full Code

The code for the two implementations can be found on github:

published by (Kamil Ciemniewski) on 2015-03-26 12:50:00 in the "Elixir" category

Some time ago I started working on the Elixir library that would allow me to send emails as easily as ActionMailer known from the Ruby world does.

The beginnings were exciting: I got to play with a very clean and elegant new language, which Elixir is. I also quickly learned about the openness of the Elixir community. After hacking together a first draft-like version and posting it on GitHub and Google Groups, I got a very warm and thorough code review from the language's author, José Valim! That's just impressive, and it made me even more motivated to help out the community by getting my early code into better shape.

Coding an ActionMailer-like library in a language that was born 3 years ago doesn't sound like a few hours' job; there's lots of functionality to be covered. An email's body has to be somehow compiled from the template, but the email message also has to be transformed into a form which the SMTP server can digest and relay. It's also great if the message's body can be encoded with "quoted printable"; this makes even the oldest SMTP server happy. But there's lots more: connecting with external SMTP servers, using the local in-Elixir implementation, the ability to test, etc.

Fortunately, Elixir is built on top of Erlang's virtual machine, BEAM, which lets you use its libraries, and there are a lot of them. For the huge part of the functionality I needed to cover, I chose the great gen_smtp library. This allowed me to actually send emails to SMTP servers and have them properly encoded. With its focus on developer productivity, Elixir made me come up with a nice set of other features, which you can check out here:

This serves as a shout-out blog post for the Elixir ecosystem and community. The friendliness that it radiates makes open source work like this very rewarding. I invite you to make your contributions as well; you'll like it.

published by (Matt Vollrath) on 2015-03-24 21:16:00 in the "javascript" category

ROS and RobotWebTools have been extremely useful in building our latest crop of distributed interactive experiences. We're continuing to develop browser-fronted ROS experiences very quickly based on their huge catalog of existing device drivers. Whether a customer wants their interaction to use a touchscreen, joystick, lights, sound, or just about anything you can plug into the wall, we now say with confidence: "Yeah, we can do that."

A typical ROS system is made out of a group ("graph") of nodes that communicate with (usually TCP) messaging. Topics for messaging can be either publish/subscribe namespaces or request/response services. ROS bindings exist for several languages, but C++ and Python are the only supported direct programming interfaces. ROS nodes can be custom logic processors, aggregators, arbitrators, command-line tools for debugging, native Arduino sketches, or just about any other imaginable consumer of the data streams from other nodes.

The rosbridge server, implemented with rospy in Python, is a ROS node that provides a web socket interface to the ROS graph with a simple JSON protocol, making it easy to communicate with ROS from any language that can connect to a web socket and parse JSON. Data is published to a messaging topic (or topics) from any node in the graph and the rosbridge server is just another subscriber to those topics. This is the critical piece that brings all the magic of the ROS graph into a browser.
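To make that protocol concrete, here is a small Python sketch (not from the article) of the JSON operations a client sends over the web socket; the "op", "topic", "type" and "msg" fields come from the rosbridge v2 protocol:

```python
import json

# A subscribe request: ask rosbridge to forward messages from a topic.
subscribe = json.dumps({
    "op": "subscribe",
    "topic": "/com/endpoint/example",
    "type": "std_msgs/String",
})

# A publish request: inject a std_msgs/String message into the graph.
publish = json.dumps({
    "op": "publish",
    "topic": "/com/endpoint/example",
    "msg": {"data": "hello"},
})

# These strings are what would travel over ws://hostname:9090.
print(subscribe)
print(publish)
```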

A handy feature of the rosbridge JSON protocol is the ability to create topics on the fly. For interactive exhibits that require multiple screens displaying synchronous content, topics that are only published and subscribed between web socket clients are a quick and dirty way to share data without writing a "third leg" ROS node to handle input arbitration and/or logic. In this case, rosbridge will act as both a publisher and a subscriber of the topic.

To develop a ROS-enabled browser app, all you need is an Ubuntu box with ROS, the rosbridge server and a web socket-capable browser installed. Much has been written about installing ROS (indigo), and once you've installed ros-indigo-ros-base, set up your shell environment, and started the ROS core/master, a rosbridge server is two commands away:

$ sudo apt-get install ros-indigo-rosbridge-suite
$ rosrun rosbridge_server rosbridge_websocket

While rosbridge is running, you can connect to it via ws://hostname:9090 and access the ROS graph using the rosbridge protocol. Interacting with rosbridge from a browser is best done via roslibjs, the JavaScript companion library to rosbridge. All the JavaScripts are available from the roslibjs CDN for your convenience.

<script type="text/javascript"
  src="http://cdn.robotwebtools.org/EventEmitter2/current/eventemitter2.min.js"></script>
<script type="text/javascript"
  src="http://cdn.robotwebtools.org/roslibjs/current/roslib.min.js"></script>

From here, you will probably want some shared code to declare the Ros object and any Topic objects.

//* The Ros object, wrapping a web socket connection to rosbridge.
var ros = new ROSLIB.Ros({
  url: 'ws://localhost:9090' // url to your rosbridge server
});

//* A topic for messaging.
var exampleTopic = new ROSLIB.Topic({
  ros: ros,
  name: '/com/endpoint/example', // use a sensible namespace
  messageType: 'std_msgs/String'
});
The messageType of std_msgs/String means that we are using a message definition from the std_msgs package (which ships with ROS) containing a single string field. Each topic can have only one messageType that must be used by all publishers and subscribers of that topic.

A "proper" ROS communication scheme will use predefined message types to serialize messages for maximum efficiency over the wire. When using the std_msgs package, this means each message will contain a value (or an array of values) of a single, very specific type. See the std_msgs documentation for a complete list. Other message types may be available, depending on which ROS packages are installed on the system.

For cross-browser application development, a bit more flexibility is usually desired. You can roll your own data-to-string encoding and pack everything into a single string topic or use multiple topics of appropriate messageType if you like, but unless you have severe performance needs, a JSON stringify and parse will pack arbitrary JavaScript objects as messages just fine. It will only take a little bit of boilerplate to accomplish this.

/**
 * Serializes an object and publishes it to a std_msgs/String topic.
 * @param {ROSLIB.Topic} topic
 *       A topic to publish to. Must use messageType: std_msgs/String
 * @param {Object} obj
 *       Any object that can be serialized with JSON.stringify
 */
function publishEncoded(topic, obj) {
  var msg = new ROSLIB.Message({
    data: JSON.stringify(obj)
  });
  topic.publish(msg);
}

/**
 * Decodes an object from a std_msgs/String message.
 * @param {Object} msg
 *       Message from a std_msgs/String topic.
 * @return {Object}
 *       Decoded object from the message.
 */
function decodeMessage(msg) {
  return JSON.parse(msg.data);
}
All of the above code can be shared by all pages and views, unless you want some to use different throttle or queue settings on a per-topic basis.

On the receiving side, any old anonymous function can handle the receipt and unpacking of messages.

// Example of subscribing to a topic with decodeMessage().
exampleTopic.subscribe(function(msg) {
  var decoded = decodeMessage(msg);
  // do something with the decoded message object
});
The sender can publish updates at will, and all messages will be felt by the receivers.

// Example of publishing to a topic with publishEncoded().
// Explicitly declare that we intend to publish on this Topic.
exampleTopic.advertise();

setInterval(function() {
  var mySyncObject = {
    myFavoriteColor: 'red'
  };
  publishEncoded(exampleTopic, mySyncObject);
}, 1000);

From here, you can add another layer of data shuffling by writing message handlers for your communication channel. Re-using the EventEmitter2 class upon which roslibjs depends is not a bad way to go. If it feels like you're implementing ROS messaging on top of ROS messaging.. well, that's what you're doing! This approach will generally break down when communicating with other non-browser nodes, so use it sparingly and only for application layer messaging that needs to be flexible.

/**
 * Typed messaging wrapper for a std_msgs/String ROS Topic.
 * Requires decodeMessage() and publishEncoded().
 * @param {ROSLIB.Topic} topic
 *       A std_msgs/String ROS Topic for cross-browser messaging.
 * @constructor
 */
function RosTypedMessaging(topic) {
  this.topic = topic;
  this.topic.subscribe(this.handleMessage_.bind(this));
}
RosTypedMessaging.prototype.__proto__ = EventEmitter2.prototype;

/**
 * Handles an incoming message from the topic by firing an event.
 * @param {Object} msg
 * @private
 */
RosTypedMessaging.prototype.handleMessage_ = function(msg) {
  var decoded = decodeMessage(msg);
  var type = decoded.type;
  var data = decoded.data;
  this.emit(type, data);
};

/**
 * Sends a typed message to the topic.
 * @param {String} type
 * @param {Object} data
 */
RosTypedMessaging.prototype.sendMessage = function(type, data) {
  var msg = {type: type, data: data};
  publishEncoded(this.topic, msg);
};

Here's an example using RosTypedMessaging.

//* Example implementation of RosTypedMessaging.
var myMessageChannel = new RosTypedMessaging(exampleTopic);

myMessageChannel.on('fooo', function(data) {
  console.log('fooo!', data);
});

setInterval(function() {
  var mySyncObject = {
    myFavoriteColor: 'red'
  };
  myMessageChannel.sendMessage('fooo', mySyncObject);
}, 1000);

If you need to troubleshoot communications or are just interested in seeing how it works, ROS comes with some neat command line tools for publishing and subscribing to topics.

### show messages on /example/topicname
$ rostopic echo /example/topicname

### publish a single std_msgs/String message to /example/topicname
### the quotes are tricky, since rostopic pub parses yaml or JSON
$ export MY_MSG="data: '{\"type\":\"fooo\",\"data\":{\"asdf\":\"hjkl\"}}'"
$ rostopic pub -1 /example/topicname std_msgs/String "$MY_MSG"

To factor input, arbitration or logic out of the browser, you could write a roscpp or rospy node acting as a server. Also worth a look are ROS services, which can abstract asynchronous data requests through the same messaging system.

A gist of this example JavaScript is available, much thanks to Jacob Minshall.

published by (Dave Jenkins) on 2015-03-24 19:09:00 in the "Austin" category

End Point enjoyed an opportunity to work with Google.org, who brought a Liquid Galaxy to show their great efforts at last week's SXSW conference in Austin. Google.org has a number of projects worldwide, all focused on how tech can bring about unique and inventive solutions for good. To showcase some of those projects, Google asked us to develop presentations for the Liquid Galaxy where people could fly to a given location, read a brief synopsis of the grantee organizations, and view presentations which included virtual flight animations, map overlays, and videos of the various projects.

Some of the projects included are as follows:
  • Charity:Water - The charity: water presentation included scenes featuring multi screen video of Scott Harrison (Founder/CEO) and Robert Lee (Director of Special Programs), and an animated virtual tour of charity: water well sites in Ethiopia.
  • World Wildlife Fund - The World Wildlife Fund presentation featured a virtual tour of the Bouba N'Djida National Park, Cameroon putting the viewer into the perspective of a drone patrolling the park for poachers. Additional scenes in the presentation revealed pathways of transport for illegal ivory from the park through intermediate stops in Nigeria and Hong Kong before reaching China.
  • Samasource - The Samasource presentation showed slums where workers start and the technology centers they are able to migrate to while serving a global network of technology clients.

But the work didn't stop there! Google.org was also sharing the space with the XPrize opening gala on Monday night. For this event, we drew up a simple game where attendees could participate in various events around the room, receive coded answers at each station, and then enter their code into the Liquid Galaxy touchscreen. If successful, the 7-screen display whisked them to the Space Port in the Mojave, then into orbit. If the wrong code was entered, the participant got splashed into the Pacific Ocean. It was great fun!

The SXSW engagement is the latest in an ongoing campaign to bring attention to global challenges and how Google.org is helping solve those challenges. End Point enjoys a close working relationship with Google and Google.org. We relish the opportunity to bring immersive and inviting displays that convey a wealth of information to viewers in such an entertaining and engrossing manner.

This event ran for 2 days at the Trinity Hall venue just 1 block from the main SXSW convention center in downtown Austin, Texas.

published by (Szymon Guz) on 2015-03-24 08:54:00 in the "AngularJS" category

The best thing about AngularJS is how well it automates updating the data in the HTML code.

To show how easy Angular is to use, I will create a very simple page using AngularJS and Github.

Every Github user can get lots of notifications, all of which can be seen on the Github notification page. There is also the Github API, which can be used for getting the notification information using simple HTTP requests that return JSON.
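As a sketch of the shape of that data (a canned sample in the notifications API's JSON format, not a live call; format_notifications is a hypothetical helper mirroring the page's logic), the rendering boils down to something like:

```python
# Hypothetical helper: "!!!" marks unread notifications; each
# notification carries its title in subject.title, as the API returns it.
def format_notifications(notifications):
    lines = []
    for n in notifications:
        prefix = "!!! " if n.get("unread") else ""
        lines.append(prefix + n["subject"]["title"])
    return lines

# Canned sample in the Github notifications API's JSON shape:
sample = [
    {"unread": True, "subject": {"title": "Fix the build"}},
    {"unread": False, "subject": {"title": "Release notes"}},
]
print(format_notifications(sample))  # -> ['!!! Fix the build', 'Release notes']
```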

I wanted to create a simple page with a list of notifications, with information on whether each notification was read (I used "!!!" for unread ones), and with automatic refreshing every 10 minutes.

To access the Github API, first I generated an application token on the Github token page. Then I downloaded a file from the AngularJS page, and a Github API JavaScript wrapper.

Then I wrote a simple html file:


<html>
  <body ng-app="githubChecker">
    <div ng-controller="mainController">
      <h1>Github Notifications</h1>
      <ul>
        <li ng-repeat="n in notifications">
          <span ng-show="n.unread">!!!</span> {{ n.subject.title }}
        </li>
      </ul>
    </div>
  </body>
</html>

This is the basic structure. Now we need to have some angular code to ask Github for the notifications and fill that into the above html.

The code is also not very complicated:

  var githubChecker = angular.module('githubChecker', []);

  githubChecker.controller("mainController", ['$scope', '$interval', function($scope, $interval){

    $scope.notifications = [];

    var github = new Github({
      username: "USERNAME",
      token:    "TOKEN",
      auth:     "oauth"
    });
    var user = github.getUser();

    var getNotificationsList = function() {
      user.notifications(function(err, notifications) {
        $scope.notifications = notifications;
        $scope.$apply();
      });
    };

    getNotificationsList();
    $interval(getNotificationsList, 10*60*1000);
  }]);
First of all I've created an Angular application object. That object has one controller, in which I created a Github object, which gives me a nice way to access the Github API.

The function getNotificationsList calls the Github API, gets a response, and just stores it in the $scope.notifications object.

Then Angular's magic comes into play. When the $scope fields are updated, Angular automatically updates all the bindings in the HTML page. This time it is not fully automatic, as I had to call the $scope.$apply() function to trigger it. It will loop through $scope.notifications and update the HTML.

For more information about Angular and the features I used, you can check the AngularJS documentation.

published by (Jon Jensen) on 2015-03-23 21:05:00 in the "design" category

A few weeks ago, Google announced that starting on April 21 it will expand its "use of mobile-friendliness as a ranking signal", which "will have a significant impact in our search results".

The world of search engine optimization and online marketing is aflutter about this announcement, given that even subtle changes in Google's ranking algorithm can have major effects to improve or worsen any particular site's ranking. And the announcement was made less than two months in advance of the announced date of the change, so there is not much time to dawdle.

Google has lately been increasing its pressure on webmasters (is that still a real term?) such as with its announcement last fall of an accelerated timetable for sunsetting SSL certificates with SHA-1 signatures. So far these accelerated changes have been a good thing for most people on the Internet.

In this case, Google provides an easy Mobile-Friendly Site Test that you can run on your sites to see if you need to make changes or not:


So get on it and check those sites! I know we have a few that we can do some work on.

published by (Steph Skardal) on 2015-03-18 19:33:00 in the "ecommerce" category

One of my recent projects for Paper Source has been to introduce advanced product filtering (or faceted filtering). Paper Source runs on Interchange, a Perl-based open source ecommerce platform that End Point has been involved with (as core developers & maintainers) for many years.

In the case of Paper Source, personalized products such as wedding invitations and save the dates have advanced filtering to filter by print method, number of photos, style, etc. Advanced product filtering is a very common feature in ecommerce systems with a large number of products that allows a user to narrow down a set of products to meet their needs. Advanced product filtering is not unlike faceted filtering offered by many search engines, which similarly allows a user to narrow down products based on specific tags or facets (e.g. see many Amazon filters on the left column). In the case of Paper Source, I wrote the filtering code layered on top of the current navigation. Below I'll go through some of the details with small code examples.

Data Model

The best place to start is the data model. A simplified existing data model that represents product taxonomy might look like the following:

Basic data model linking categories to products.

The existing data model links products to categories via a many-to-many relationship. This is fairly common in the ecommerce space – while looking at a specific category often identified by URL slug or id, the products tied to that category will be displayed.

And here's where we go with the filtering:

Data model with filtering layered on top of existing category to product relationship.

Some notes on the above filtering data model:

  • filters contains a list of all the filters. Examples of entries in this table include "Style", "Color", "Size"
  • filters_categories links filters to categories, to allow finite control over which filters show on which category pages, in what order. For example, this table would link category "Shirts" to filters "Style", "Color", "Size" and the preferred sort order of those filters.
  • filter_options includes all the options for a specific filter. Examples here for various options include "Large", "Medium", and "Small", all linked to the "Size" filter.
  • filter_options_products links filter options to a specific product id with a many to many relationship.

Filter Options Exclusivity

One thing to consider before coding is the business rules pertaining to filter option exclusivity. If a product is assigned to one filter option, can it also have another filter option for that same filter type? That is, if a product is marked as blue, can it also be marked as red? When a user filters by color, can they select products that are both blue and red? Or, if a product is blue, is it prevented from having any other filter options for that filter? In the case of Paper Source product filtering, we went with the former, where filter options are not exclusive of each other.

A real-life example of filter non-exclusivity is how Paper Source filters wedding invitations. Products are filtered by print method and style. Because some products have multiple print methods and styles, non-exclusivity allows a user to narrow down to a specific combination of filter options, e.g. a wedding invitation that is both tagged as "foil & embossed" and "vintage".
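These rules boil down to a simple combination: within a single filter, selected options combine as OR (a product matches if it has any of them), while across filters they combine as AND. A minimal sketch in Python (the live code is Perl inside Interchange; the function and data shapes here are illustrative assumptions):

```python
def matches(product_options, active_filters):
    """product_options: set of option slugs tagged on the product.
    active_filters: dict mapping filter name -> set of selected option slugs.
    Within a filter, options are OR'ed; across filters, AND'ed."""
    return all(
        product_options & selected  # non-empty intersection: any option matches
        for selected in active_filters.values()
        if selected  # a filter with nothing selected excludes nothing
    )

# A wedding invitation tagged as both "foil-embossed" and "vintage"
invite = {"foil-embossed", "vintage"}
print(matches(invite, {"print_method": {"foil-embossed"}, "style": {"vintage"}}))  # True
print(matches(invite, {"print_method": {"letterpress"}}))                          # False
```

This is exactly why non-exclusivity matters in the example above: the same invitation can satisfy both the print method filter and the style filter at once.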

URL Structure

Another thing to determine before coding is the URL structure. The URL must communicate the current category of products and current filter options (or what I refer to as active/activated filters).

I designed the code to recognize one component of the URL path as the category slug, and the remaining paths to map to the various filter option url slugs. For example, a URL for the category of shirts is "/shirts", a URL for large shirts "/shirts/large", and the URL for large blue shirts "/shirts/blue/large". The code not only has to accept this format, but it also must create consistently ordered URLs, meaning, we don't want both "/shirts/blue/large" and "/shirts/large/blue" (representing the same content) to be generated by the code. Here's what simplified pseudocode might look like to retrieve the category and set the activated filters:

my @url_paths = grep { length } split('/', $request_url); # e.g. /shirts/blue/large
my $category_slug = shift(@url_paths);
# find Category where slug = $category_slug
# redirect if not found
# @url_paths now holds the active filter slugs
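The same parsing, plus the consistent-ordering requirement, can be sketched in Python (the production code is Perl; sorting filter slugs alphabetically is an illustrative assumption — the real ordering rule could just as well follow the filter sort order from filters_categories):

```python
def parse_request_path(path):
    """Split e.g. '/shirts/blue/large' into (category_slug, active_filter_slugs)."""
    parts = [p for p in path.split("/") if p]  # drop empty segments
    if not parts:
        return None, []
    return parts[0], parts[1:]

def build_url(category_slug, filter_slugs):
    """Always emit filter slugs in one fixed order so that
    /shirts/blue/large and /shirts/large/blue can never both be generated."""
    return "/" + "/".join([category_slug] + sorted(filter_slugs))

print(parse_request_path("/shirts/blue/large"))  # ('shirts', ['blue', 'large'])
print(build_url("shirts", ["large", "blue"]))    # /shirts/blue/large
```

The key property is that build_url is deterministic: any set of active filters maps to exactly one URL.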

Applying the Filtering

Next, we need a couple things to happen:

  • If there is an activated filter for any filter option, apply it.
  • Generate URLs to toggle filter options.

First, all products are retrieved in this category with a query like this:

SELECT products.*,
  COALESCE((SELECT GROUP_CONCAT(fo.url_slug) FROM filter_options_products fop
    JOIN filter_options fo ON = fop.filter_option_id
    WHERE fop.product_id =, '') AS filters
FROM products
JOIN categories_products cp ON cp.product_id =
JOIN categories c ON = cp.category_id
WHERE c.url_slug = ?

Next is where the code gets pretty hairy, so instead I'll try to explain with pseudocode:

#@filters = all applicable filters for current category
# loop through @filters
  # loop through filter options for this filter
  # filter product results to include any selected filter options for this filter
  # if there are no filter options selected for this filter, include all products
  # build the url for each filter option, to toggle the filter option (on or off)
# loop through @filters (yes, a second time)
  # loop through filter options for this filter
  # count remaining products for each filter option, if none, set filter option to inactive
  # build the final url for each filter option, based on all filters turned on and off

My pseudocode shows that I iterate through the filters twice, first to apply the filter and determine the base URL to toggle each filter option, and second to count the remaining filtered products and build the URL to toggle each filter. The output of this code is a) a set of filtered products and b) a set of ordered filters and filter options with corresponding counts and links to toggle on or off.

Here's a more specific example:

  • Let's say we have a set of shirts, with the following filters & options: Style (Long Sleeve, Short Sleeve), Color (Red, Blue), Size (Large, Medium, Small).
  • A URL request comes in for /shirts/blue/large
  • The code recognizes this is the shirts category and retrieves all shirts.
  • First, we look at the style filter. No style filter is active in this request, so to toggle these filters on, the activation URLs must include "longsleeve" and "shortsleeve". No products are filtered out here.
  • Next, we look at the color filter. The blue filter option is active because it is present in the URL. In this first loop, products not tagged as blue are removed from the set of products. To toggle the red option on, the activation URL must include "red", and to toggle the blue filter off, the URL must not include "blue", which is set here.
  • Next, we look at the size filter. Products not tagged as large are removed from the set of products. Again, the large filter has to be toggled off in the URL because it is active, and the medium and small filter need to be toggled on.
  • In the second pass through filters, the remaining items applicable to each filter option are counted, for long sleeve, short sleeve, red, medium, and small options. And the URLs are built to turn on and off all filter options (e.g. applying the longsleeve filter will yield the URL "/shirts/longsleeve/blue/large", applying the red filter will yield the URL "/shirts/blue/red/large", turning off the blue filter will yield the URL "/shirts/large").

The important thing to note here is the double pass through filters is required to build non-duplicate URLs and to determine the product count after all filter options have been applied. This isn't simple logic, and of course changing the business rules like exclusivity will change the loop behavior and URL logic.
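The double-pass structure can be sketched in Python (the real implementation is Perl; the data shapes, the alphabetical URL ordering, and the hardcoded category prefix are all assumptions for illustration):

```python
def filter_and_count(products, filters, active):
    """products: list of dicts, each with an 'options' set of option slugs.
    filters: dict of filter name -> list of option slugs for that filter.
    active: set of currently activated option slugs (parsed from the URL).
    Returns (remaining products, per-option counts, per-option toggle URLs)."""
    # Pass 1: apply each filter; a filter with no active options excludes nothing.
    remaining = products
    for options in filters.values():
        selected = active & set(options)
        if selected:
            remaining = [p for p in remaining if p["options"] & selected]
    # Pass 2: count remaining products per option and build the toggle URLs.
    counts, toggle_urls = {}, {}
    for options in filters.values():
        for opt in options:
            counts[opt] = sum(1 for p in remaining if opt in p["options"])
            toggled = active ^ {opt}  # turn the option on if off, off if on
            toggle_urls[opt] = "/shirts/" + "/".join(sorted(toggled))
    return remaining, counts, toggle_urls

shirts = [
    {"sku": "A", "options": {"longsleeve", "blue", "large"}},
    {"sku": "B", "options": {"shortsleeve", "red", "large"}},
    {"sku": "C", "options": {"shortsleeve", "blue", "red", "large"}},
]
filters = {"style": ["longsleeve", "shortsleeve"],
           "color": ["red", "blue"],
           "size":  ["large", "medium", "small"]}

remaining, counts, urls = filter_and_count(shirts, filters, {"blue", "large"})
print([p["sku"] for p in remaining])  # ['A', 'C']
print(urls["red"])                    # /shirts/blue/large/red
print(urls["blue"])                   # /shirts/large
```

This mirrors the walkthrough above: with blue and large active, pass 1 leaves only the blue large shirts, and pass 2 produces the counts and the toggle URLs for every option, active or not.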

Alternative Approaches

Finally, a few notes regarding alternative approaches here:

  • Rather than going with the blacklist approach described here, one could go with a whitelist approach where a set of products is built up based on the filter options set.
  • Filtering could be done entirely via AJAX, in which case URL structure may not be a concern.
  • If the data is simple enough, products could potentially be filtered in the database query itself. In our case, this wasn't feasible since we generate product filter option details from a number of product attributes, not just what is shown in the simplified product filter data model above.
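For contrast, the whitelist approach from the first bullet might look like this Python sketch: rather than removing non-matches from the full product list, the result set is built up from the active options. This simplified version assumes at most one active option per filter (as in /shirts/blue/large), so the active options can simply be intersected:

```python
def whitelist_filter(products_by_option, active):
    """products_by_option: dict of option slug -> set of product ids.
    active: set of active option slugs, at most one per filter."""
    if not active:
        # No filters active: every product is in the result.
        return set().union(*products_by_option.values())
    return set.intersection(*(products_by_option[opt] for opt in active))

by_option = {"blue": {1, 3}, "large": {1, 2, 3}, "red": {2, 3}}
print(whitelist_filter(by_option, {"blue", "large"}))  # {1, 3}
```

With non-exclusive filters and multiple active options per filter, the within-filter sets would first need to be unioned before intersecting across filters, as in the main approach above.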