All opinions expressed are those of the authors and not necessarily those of, our sponsors, or our affiliates.
  Add to My Yahoo!  Subscribe with Bloglines  Subscribe in NewsGator Online

published by Eugenia on 2016-06-18 00:17:34 in the "Hardware" category
Eugenia Loli-Queru

As you may already know, I have the most interesting dreams, hehe…

Apart from seeing weird alien entities during my nap time, I was also shown how the CubeSat idea can be properly commercialized. Beat that, MIT (or DMT).

So basically, I was shown a bunch of 3U CubeSats (around 10 or 12 of them), held together by some sort of string, forming a web. At the edges of the web, there were semi-large solar panels and antennas, while in the middle of the web, there was a propulsion engine, not larger than a 3U CubeSat itself.

Right now, all CubeSats are released in the wild on their own, with no propulsion (sometimes they end up facing the wrong way), terrible power abilities, and even more terrible communication (FM among others!!!). These satellites usually die within 3-5 months, quickly burning in the atmosphere. On top of that, they’re usually get released as secondary payload in LEO, while CubeSats are benefited in higher SSO orbit.

Here’s the business idea behind of what I saw:

– You let customers buy one of the CubeSats and customize it out of an array of most-popular components (third party components that pass evaluation can be accepted — that costs extra).

– The CubeSats run Android, so writing drivers for it, updating them over the air, or even completely erase them to their default status can be done. Each of the 12 CubeSats runs a slightly different version of the OS, and has different hardware — depending on the customer needs.

– The customer can access their CubeSat via a secure backend on the manufacturer’s web site. Even firmware updates can be performed, not just plain updates or downlink data.

– Because of the shared propulsion, the constellation web can be in SSO for up to 5 years.

– 1 year of backend support is included in the overall price, but after that time, owners can continue using it for an additional fee, or lease or sell the rights to their CubeSat to another commercial entity, getting back some of that invested value.

– Even if 1 CubeSat goes bad, the others continue to work, since they’re independent of each other. Triple redundancy system in case of shorting. To avoid over-usage of power due to faulty hardware or software (that could run down the whole system), a pre-agreed specific amount is allocated to each CubeSat daily.

– Eventually, a more complex system could be developed, under agreement with all the responsible parties, to have CubeSats share information with their neighbor CubeSats (either an internal wired network, or Bluetooth — whatever proves more secure and fast). For example, if there’s a hardware ability one CubeSat in the web has, but the others don’t, and one of the other CubeSats needs it, they could ask for its service — for the right price.

– Instead of dispensing the CubeSats one by one, the web is a single machine, about 2/3s the size of a dishwasher. The CubeSats have very specific allowed weight in their specification, so overall, while the volume is medium size, the overall weight doesn’t have to be more than 100 kg. That easily fits on the payload of small, inexpensive rockets, like the upcoming RocketLab Electron, which costs just $4.9 million per launch. Falcon 9 becomes cheaper only if it could launch 13 of these webs at once. While it can very easily lift their weight, it might not have the volume required (the Falcon9 fairing is rather small at 3.2m).

– This comes overall to about $600,000 per CubeSat overall (with a rather normal configuration).

The current 3U CubeSats cost anywhere between $20k and $50k to make, plus another $200k or so to launch. Overall, sure, $600k is more than the current going price, but with the web idea you get enough power, communication that doesn’t suck, propulsion, and an extended life — plus the prospect of actually making money out of them by leasing them or selling them. A lot of the revenue will come after the launch, as a service/marketplace business.

In a sense, this business idea is the equivalent of a shared hosting server service, which has revolutionized the way servers work, and has democratized people’s ability to run code or servers online. PlanetLabs is doing something similar by leasing “time” on their CubeSats, but by releasing them one by one, they fall on the stated shortcomings.

For all of this to become true, the CubeSats themselves would need an overhaul of how customizable their modularity is, and easy access to the latest mobile hardware. Overall, we’re probably 2-3 years away from such an idea getting even started to materialize, and possibly 5 years away from becoming reality. I haven’t seen anyone else suggested it, so, here I am. Thank my weird dreams.

published by (Greg Sabino Mullane) on 2016-06-13 20:47:00 in the "postgres" category

(A Unicode rabbit face 🐰 will never be as cute
as this real bunny. Photo by Wade Simmons)

One of our clients recently reached out to us for help in upgrading their Postgres database. The use of the pg_upgrade program was not an option, primarily because the client was also taking the opportunity to change from their SQL_ASCII encoding to UTF-8. (If any of your databases, gentle reader, are still SQL_ASCII, please do the same!). Naturally, we also took advantage of the lack of pg_upgrade to enable the use of data checksums, another action we highly recommend. Although there were plenty of wrinkles, and stories to be told about this migration/upgrade, I wanted to focus on one particular problem we had: how to detect if a table has changed.

We needed to know if any applications were modifying certain tables because the speed of the migration was very important. If we could assert that no changes were made, there were some shortcuts available that would greatly speed things up. Initial testing showed that the migration was taking over eight hours, a time unacceptable to the client (no worries, we eventually reduced the time to under an hour!).

Looking closer, we found that over half that time was spent converting a single small (50MB) table from SQL_ASCII to UTF-8. How this conversion was performed is a story for another day, but suffice to say the table had some really, really messy bytes inside of it; the conversion program had to struggle mightily. When you are converting a database to a new encoding, it is imperative to examine every byte and make sure it gets changed to a format that Postgres will accept as valid UTF-8, or the entire table import will fail with an error similar to this:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf4 0xa5 0xa3 0xa5

Looking closer at the data in the table showed that it might - just might! - be a historical table. In other words, it no longer receives updates, just selects. We really wanted this to be true, for it meant we could dump the whole table, convert it, and simply load the converted table into the new database (which took only a few seconds!). First, however, we had to confirm that the table was not changing.

Detecting changes may be done in several ways. For all of them, you can never prove that the table shall not change at some point in the future, but you can prove that it has not changed over a certain period of time. How you go about doing that depends on what kind of access you have. If you do not have super-user access, you could add a simple trigger to the table that updates another table when a update, insert, or delete is performed. Then, checking in on the second table will indicate if any changes have been made.

A better solution is to simply look at the underlying file that makes up the table. To do this, you need be a Postgres superuser or have access to the underlying operating system. Basically, we will trust the operating system's information on when the table was last changed to determine if the table itself has changed. Although not foolproof, it is an excellent solution. Let's illustrate it here. First: create a test table and add some rows:

$ psql
greg=# CREATE TABLE catbox AS SELECT 8675309::INT AS id FROM generate_series(1,1000);

Now we can use the pg_stat_file() function, which returns some basic information about a file on disk. With the help of the pg_relation_filepath() function, we can see when the table was last modified:

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') ) xg
Expanded display is on.
-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:36:00-04
change       | 2015-11-08 22:36:00-04
creation     | 
isdir        | f

Next we will revisit the table after some time (e.g. 24 hours) and see if the "modification" timestamp is the same. If it is, then the table has not been modified either. Unfortunately, the possibility of a false positive is possible due to VACUUM, which may change things on disk but does NOT change the data itself. (A regular VACUUM *may* modify the file, and a VACUUM FULL *always* modifies it).

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') ) xg

-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:36:00-04
change       | 2015-11-08 22:36:00-04
creation     | 
isdir        | f

greg=# vacuum catbox;

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') );

2016-06-09 22:53:24-04
-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:40:14-04
change       | 2015-11-08 22:40:14-04
creation     | 
isdir        | f

A second (and more foolproof) method is to simply generate a checksum of the entire table. This is a fairly straightforward approach; just pump the output of pg_dump to a checksum program:

$ pg_dump -t catbox --data-only | sha1sum
6f724565656f455072736e44646c207472536e61  -

The advantage here is that even a VACUUM FULL will not change the checksum. However, because pg_dump does no ORDER BY when dumping out the table, it is possible for the rows to be returned in a different order. To work around that, issue a VACUUM FULL yourself before taking the checksum. As before, come back later (e.g. 24 hours) and re-run the command. If the checksums match, then the table has not changed (and is probably no longer updated by the application). By using this method, we were able to verify that the large, SQL_ASCII byte-soup table was indeed not being updated, and thus we took it out of the direct migration.

Of course, that table needed to be part of the new database, but we simply dumped the table, ran the conversion program on it, and (four hours later), had a complete dump of the table that loads extremely fast into the new database.

That solved only one of the problems, however; another table was also slowing down the migration. Although it did not have the SQL_ASCII conversion issue, it was a large table, and took a large percentage of the remaining migration time. A quick look at this table showed it had a "creation_time" column as well as a SERIAL primary key, and was obviously being updated quite often. Close examination showed that it was possible this was an append-only table, such that older rows were never updated. This called for a similar approach: could we prove that a large chunk of the table was not changing? If we could, we could pre-populate the new database and copy over only the most recent rows during the migration, saving a good bit of time.

The previous tricks would not work for this situation, because the underlying file would change constantly as seen by pg_stat_file(), and a pg_dump checksum would change on every insert. We needed to analyze a slice of the table - in this particular case, we wanted to see about checksumming all rows except those created in the last week. As a primary key lookup is very fast, we used the "creation_time" column to determine an approximate primary key to start with. Then it was simply a matter of feeding all those rows into the sha1sum program:

greg=# CREATE TABLE catbox2 (id SERIAL PRIMARY KEY, creation_time TIMESTAMPTZ);
greg=# INSERT INTO catbox2(creation_time) select now() - '1 year'::interval + (x* '1 hour'::interval) from generate_series(1,24*365) x;
INSERT 0 8760

greg=# select * from catbox2 where creation_time > now()-'1 week'::interval order by 1 limit 1
  id  |         creation_time         
 8617 | 2016-06-11 10:51:00.101971-08

$ psql -Atc "select * from catbox2 where id < 8617 order by 1" | sha1sum
456272656d65486e6f203139353120506173733f  -

## Add some rows to emulate the append-only nature of this table:
greg=# insert into catbox2(creation_time) select now() from generate_series(1,1000)
INSERT 0 1000

## Checksums should still be identical:
$ psql -Atc "select * from catbox2 where id < 8617 order by 1" | sha1sum
456272656d65486e6f203139353120506173733f  -

Despite the large size of this table (around 10 GB), this command did not take that long to run. A week later, we ran the same commands, and got the same checksum! Thus, we were able to prove that the table was mostly append-only - or at least enough for our use case. We copied over the "old" rows, then copied over the rest of the rows during the critical production migration window.

In the future, this client will able to take advantage of pg_upgrade, but getting to UTF-8 and data checksums was absolutely worth the high one-time cost. There were several other tricks used to speed up the final migration, but being able to remove the UTF-8 conversion of the first table, and being able to pre-copy 99% of the second table accounted for the lion's share of the final speed improvements.

published by (Dave Jenkins) on 2016-06-10 20:54:00 in the "Korea" category
image courtesy of retaildesignblog
End Point and AZero, a South Korean system integrator, have partnered to deploy a 10-screen Liquid Galaxy to the newly-opened Hyundai Card Travel Library in Seoul, South Korea. This project presented a number of unique challenges for our teams, but we have launched successfully, to the great satisfaction of AZero's client.

The Hyundai Card Travel Library is an incredible space: wall-to-wall maple bookshelves hold travel brochures, photo books, and other travel-related material. The Liquid Galaxy displays itself sits in a small alcove off the main library space. Being fully enclosed, viewers can control the lighting and get a full immersion experience through the two rows of five 47" screens arrayed in an wall-mounted semi-circle arc. The viewer can control the screens via the podium-mounted touchscreen and SpaceNav mouse controller.

We solved several technical challenges for this deployment: the extremely tight space made cabling and display configuration tricky. Also, this isn't a "standard" 7-screen single row deployment, but rather two rows of 5 screens each. Working with AZero, End Point reconfigured the Liquid Galaxy display configurations to account for this unique layout. NOTE: the Liquid Galaxy scales quite easily, and can be arrayed in any number of configurations. Other recent deployments include a 40-screen control room, with 10 columns of 4 screens each!

Intended as a travel planning platform, Hyundai provided a number of set tours to showcase on the Liquid Galaxy, such as "The Orient Express", "Rio!", or "European Capitals". Each tour shows an overview map as a graphic overlay, while the Liquid Galaxy hops from each destination on the route to the next within Google Earth. At each location, the potential traveler can drop into Google Street View and see the fully panoramic images of street scenes in Paris or the top of Sugarloaf Mountain in Brazil. This should allow potential travelers to virtually experience the tour locations, and make more informed decisions about which trip might suit their tastes. Beyond that, it's a pretty cool way to view the planet inside a travel library.

published by (Richard Peltzman) on 2016-06-10 18:26:00 in the "Business" category
Wall Street. Back where it all started for me some 35 years ago. Only instead of being an employee of Oppenheimer and Company, I had the experience and honor of representing End Point as one of 7 companies chosen to ?ring the bell? at the New York Stock Exchange, chiming in Small Business Week for JPMorgan Chase. (They work with 4,000,000 small companies.)

The morning started early by going through security checks rivaling the airport, except I didn?t have to take my shoes off. After getting my nifty credential, we went back to the street where the President of the NYSE gave a welcoming speech, pointing out the buildings that are still standing from when Hamilton, Monroe and Aaron Burr all started their own banks. As well as where George Washington was sworn in.

All this while also getting free coffee from the main small business honoree, Gregory?s Coffee, and picture taking from the business paparazzi!

We then went inside the storied NYSE building and made our way to the trading floor. The surroundings were immensely impressive, as the Stock Exchange still inhabits a huge floor with gorgeous 40 foot mid-18th century ceilings high above, holding up all sorts of 21st century technology and equipment. In the center of it all is CNBC? stage set, with the show Squawk Box airing live, and with the infamous Mad Money man himself, Jim Kramer, sitting in, talking about the impact the newly minted presumptive Republican Candidate (to be nameless) might have on the markets.

At about 9:15, the 7 small business owners and the head of marketing and small market banking for Chase made our way to the real stage, a perch overlooking the entire floor where the actual apparatus for the bell ringing takes place. In front of four live cameras, we waited for 9:30 to hit, and then the bell was rung....rather, the button was pressed to start the day?s trading. (Sorry they use a button now, not an actual clapper to ring the bell.) Aired live, we shook hands, smiled for a national audience, passed around compliments and enjoyed the moment.

A few moments we went back down to the floor where we were able to float around and I could watch the operation of what is still the heart of the financial world come to life around me.

The Exchange has changed in all those years since my days in one important way; Instead of thousands of crazed floor traders frantically buying and selling millions of shares using ticker tape and hand signals, there were maybe a couple of hundred standing by their booths, watching millions of shares being exchanged electronically. Looking almost bored, they calmly talked with each other sipping more of Gregory?s coffee.

I?ll dispense with the rest of the details of the day and the ensuing reception. Rather, I shall now explain the importance of the event as it relates to End Point and to our clients and my own history.

The first day I was on the job in Wall Street, the chairman of the company said to me something that to this day still guides me as to how I run End Point and I have repeated countless times to whomever will listen. He said, ?Rick, to have a good restaurant, you have to have a great kitchen!? So, he had me learn all aspects of the company from back office, to accounting, to the fledgling world of computers to the front office and to how to work effectively with employees and customers alike.

End Point may be one of the smaller of the ?small companies? chosen by Chase this day. But, we were selected because we are a company that personifies good management, great engineering, unrelenting commitment to our customers, and a company that has great potential. Why? Because we believe in having a great kitchen! Our engineering staff is exemplary, our clients are fully appreciative of our partnerships with them, and we are doing all we can to be a model business.

While I may not have rung the actual bell this year, our banker and Chase has every confidence ? as do I ? that one day we will.

published by (Matt Galvin) on 2016-06-02 18:00:00 in the "" category has disabled the RC4 cipher suite on their test server. Their production server update will follow soon. So, in order to ensure your, or your client's, site(s) do not experience any interruption in payment processing it is wise to place a test order in the test environment.

The projects I was testing were all Spree Gem (2.1.x). The Spree Gem uses the ActiveMerchant Gem (in Spree 2.1.x it's ActiveMerchant version 1.34.x). Spree allows you to sign into the admin and select which server your payment method will hit- production or test. There is another option for selecting a "Test Mode" transaction. The difference between a test server transaction and a test mode transaction is explained quite succinctly on the documentation. To summarize it, test server transactions are never sent to financial institutions for processing but are stored in (so you can see their details). Transactions in test mode however are not stored and return a transaction ID of zero.

I wanted to use my test account to ensure my clients were ready for the RC4 Cypher Suite disablement. I ran across a few strange things. First, for three sites, no matter what I did, I kept getting errors saying my account was either inactive or I was providing the wrong credentials. I signed in to and verified my account was active. I triple checked the credentials, they were right. So, I re-read the Spree docs thinking that perhaps I needed to use a special word or format to actually use the test server ("test" versus "Test" or something like that).

Below is a screenshot of the test payment method I had created and was trying to use.

Since I kept getting errors I looked through the Spree code, then the ActiveMerchant Gem that Spree is using.

Below, you can see that the ActiveMerchant is deciding which URL to use (test or live) based on the value of test? (line 15). active_merchant/lib/active_merchant/billing/gateways/authorize_net.rb

require 'nokogiri'

module ActiveMerchant #:nodoc:
  module Billing #:nodoc:
    class AuthorizeNetGateway < Gateway
      include Empty

      self.test_url = ''
      self.live_url = ''

      def url
        test? ? test_url : live_url

How and where is this set? Spree passes the ActiveMerchant Gem some data which the ActiveMerchant Gem uses to create Response objects. Below is the code where ActiveMerchant handles this data.

module ActiveMerchant #:nodoc:
  module Billing #:nodoc:
    class Error < ActiveMerchantError #:nodoc:

    class Response
      attr_reader :params, :message, :test, :authorization, :avs_result, :cvv_result, :error_code, :emv_authorization
      def test?
      def initialize(success, message, params = {}, options = {})
        @success, @message, @params = success, message, params.stringify_keys
        @test = options[:test] || false
        @authorization = options[:authorization]
        @fraud_review = options[:fraud_review]
        @error_code = options[:error_code]
        @emv_authorization = options[:emv_authorization]

        @avs_result = if options[:avs_result].kind_of?(AVSResult)

        @cvv_result = if options[:cvv_result].kind_of?(CVVResult)
      # Are we running in test mode?
      def test?
        (@options.has_key?(:test) ? @options[:test] : Base.test?)

Now that I was more familiar with ActiveMerchant, I wanted to verify that Spree was passing the data as intended

I could see in spree/core/app/models/spree/gateway.rb that Spree was setting ActiveMerchant::Billing::Base.gateway_mode equal to the server param as a symbol. I verified it with some logging.

    def provider
      gateway_options = options
      gateway_options.delete :login if gateway_options.has_key?(:login) and gateway_options[:login].nil?
      if gateway_options[:server]
        ActiveMerchant::Billing::Base.gateway_mode = gateway_options[:server].to_sym
      @provider ||=

At this point I was satisfied that Spree was sending a server param. I also knew Spree was setting Active Merchant's Base.gateway_mode as intended. I then reviewed active_merchant/lib/active_merchant/billing/gateway.rb once more

      # Are we running in test mode?
      def test?
        (@options.has_key?(:test) ? @options[:test] : Base.test?)
and active_merchant/lib/active_merchant/billing/base.rb
      def self.test?
        self.gateway_mode == :test

So, that's it! We know from the exceptions I raised that Spree is sending a test key and a test_mode key. They seem to be the same value but with different keys (I'm guessing that's a mistake), and they both just seem to indicate if the test mode checkbox was checked or not in the Spree admin. However, Base.test? is the server selection and comes from whatever anyone enters in the server input box in the Spree admin. So, we just need to update the ternary operator to check if @options[:test] (test mode) or Base.test? (test server) is true.

Since this is Spree, I created a decorator to override the test? method.


ActiveMerchant::Billing::Gateway.class_eval do
  def test?
    @options.has_key?(:test) && @options[:test] || ActiveMerchant::Billing::Base.test?

Lastly, I placed some test orders and it all worked as intended.

Summary is disabling the RC4 Cypher suite. If your site(s) uses that, your payment processing may be interrupted. Since the test environment has been updated by, you can see if your site(s) is compliant by posting test transactions to the test environment. If it works, then your site(s) should be compliant and ready when applies the changes to the production server.

Spree 2.1.x (and perhaps all other Spree versions) ALWAYS send the test key, so the ActiveMerchant Gem will always just use the boolean value of that key instead of ever checking to see what the server was set to. Further, this fix makes things a little bit more robust in my opinion by checking if test mode OR the test server was specified, rather than only checking which server (gateway_mode) was specified if the test key was absent.

Alternatively, you could probably make Spree only pass the test key if the value was true. Either way, if you are trying to send test orders to the test environment for a Spree site of at least some versions and have not implemented one of these changes, you will be unable to do so until you add a similar fix as I have described here. If you need any further assistance, please reach out to us at

published by (Greg Sabino Mullane) on 2016-06-01 03:59:00 in the "Bucardo" category

Bucardo is very good at replicating data among Postgres databases (as well as replicating to other things, such as MariaDB, Oracle, and Redis!). However, sometimes you need to work outside the normal flow of trigger-based replication systems such as Bucardo. One such scenario is when many changes need to be made to your replicated tables. And by a lot, I mean many millions of rows. When this happens, it may be faster and easier to find an alternate way to replicate those changes.

When a change is made to a table that is being replicated by Bucardo, a trigger fires and stores the primary key of the row that was changed into a "delta" table. Then the Bucardo daemon comes along, gathers a list of all rows that were changed since the last time it checked, and pushes those rows to the other databases in the sync (a named replication set). Although all of this is done in a fast and efficient manner, there is a bit of overhead that adds up when (for example), updating 650 million rows in one transaction.

The first and best solution is to simply hand-apply all the changes yourself to every database you are replicating to. By disabling the Bucardo triggers first, you can prevent Bucardo from even knowing, or caring, that the changes have been made.

To demonstrate this, let's have Bucardo replicate among five pgbench databases, called A, B, C, D, and E. Databases A, B, and C will be sources; D and E are just targets. Our replication looks like this: ( A <=> B <=> C ) => (D, E). First, we create all the databases and populate them:

## Create a new cluster for this test, and use port 5950 to minimize impact
$ initdb --data-checksums btest
$ echo port=5950 >> btest/postgresql.conf
$ pg_ctl start -D btest -l logfile

## Create the main database and install the pg_bench schema into it
$ export PGPORT=5950
$ createdb alpha
$ pgbench alpha -i --foreign-keys

## Replicated tables need a primary key, so we need to modify things a little:
$ psql alpha -c 'alter table pgbench_history add column hid serial primary key'

## Create the other four databases as exact copies of the first one:
$ for dbname in beta gamma delta epsilon; do createdb $dbname -T alpha; done

Now that those are done, let's install Bucardo, teach it about these databases, and create a sync to replicate among them as described above.

$ bucardo install --batch
$ bucardo add dbs A,B,C,D,E dbname=alpha,beta,gamma,delta,epsilon dbport=5950
$ bucardo add sync fiveway tables=all dbs=A:source,B:source,C:source,D:target,E:target

## Tweak a few default locations to make our tests easier:
$ echo -e "logdest=.npiddir=." > .bucardorc

At this point, we have five databases all ready to go, and Bucardo is setup to replicate among them. Let's do a quick test to make sure everything is working as it should.

$ bucardo start
Checking for existing processes
Starting Bucardo

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "n" " "
alpha|0 beta|0 gamma|0 delta|0 epsilon|0

$ pgbench alpha
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average: 0.000 ms
tps = 60.847066 (including connections establishing)
tps = 62.877481 (excluding connections establishing)

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "n" " "
alpha|6576 beta|6576 gamma|6576 delta|6576 epsilon|6576

$ pgbench beta
starting vacuum...end.
tps = 60.728681 (including connections establishing)
tps = 62.689074 (excluding connections establishing)
$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "n" " "
alpha|7065 beta|7065 gamma|7065 delta|7065 epsilon|7065

Let's imagine that the bank discovered a huge financial error, and needed to increase the balance of every account created in the last two years by 20 dollars. Let's further imagine that this involved 650 million customers. That UPDATE will take a very long time, but will suffer even more because each update will also fire a Bucardo trigger, which in turn will write to another "delta" table. Then, Bucardo will have to read in 650 million rows from the delta table, and (on every other database in the sync) apply those changes by deleting 650 million rows then COPYing over the correct values. This is one situation where you want to sidestep your replication and handle things yourself. There are three solutions to this. The easiest, as mentioned, is to simply do all the changes yourself and prevent Bucardo from worrying about it.

The basic plan is to apply the updates on all the databases in the syncs at once, while using the session_replication_role feature to prevent the triggers from firing. Of course, this will prevent *all* of the triggers on the table from firing. If there are some non-Bucardo triggers that must fire during this update, you might wish to temporarily set them as ALWAYS triggers.

Solution one: manual copy

## First, stop Bucardo. Although not necessary, the databases are going to be busy enough
## that we don't need to worry about Bucardo at the moment.
$ bucardo stop
Creating ./fullstopbucardo ... Done

## In real-life, this query should get run in parallel across all databases,
## which would be on different servers:
$ QUERY='UPDATE pgbench_accounts SET abalance = abalance + 25 WHERE aid > 78657769;'

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "SET session_replication_role='replica'; $QUERY"; done | tr "n" " "
UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 

## For good measure, confirm Bucardo did not try to replicate all those rows:
$ bucardo kick fiveway
Kicked sync fiveway

$ grep Totals log.bucardo
(11144) [Mon May 16 23:08:57 2016] KID (fiveway) Totals: deletes=36 inserts=28 conflicts=0
(11144) [Mon May 16 23:09:02 2016] KID (fiveway) Totals: deletes=38 inserts=29 conflicts=0
(11144) [Mon May 16 23:09:22 2016] KID (fiveway) Totals: deletes=34 inserts=27 conflicts=0
(11144) [Tue May 16 23:15:08 2016] KID (fiveway) Totals: deletes=10 inserts=7 conflicts=0
(11144) [Tue May 16 23:59:00 2016] KID (fiveway) Totals: deletes=126 inserts=73 conflicts=0

Solution two: truncate the delta

As a second solution, what about the event involving a junior DBA who made all those updates on one of the source databases without disabling triggers? When this happens, you would probably find that your databases are all backed up and waiting for Bucardo to handle the giant replication job. If the rows that have changed constitute most of the total rows in the table, your best bet is to simply copy the entire table. You will also need to stop the Bucardo daemon, and prevent it from trying to replicate those rows when it starts up by cleaning out the delta table. As a first step, stop the main Bucardo daemon, and then forcibly stop any active Bucardo processes:

$ bucardo stop
Creating ./fullstopbucardo ... Done

$ pkill -15 Bucardo

Now to clean out the delta table. In this example, the junior DBA updated the "beta" database, so we look there. We may go ahead and truncate it because we are going to copy the entire table after that point.

# The delta tables follow a simple format. Make sure it is the correct one
$ psql beta -Atc 'select count(*) from bucardo.delta_public_pgbench_accounts'
## Yes, this must be the one!

## Truncates are dangerous; be extra careful from this point forward
$ psql beta -Atc 'truncate table bucardo.delta_public_pgbench_accounts'

The delta table will continue to accumulate changes as applications update the table, but that is okay - we got rid of the 650 million rows. Now we know that beta has the canonical information, and we need to get it to all the others. As before, we use session_replication_role. However, we also need to ensure that nobody else will try to add rows before our COPY gets in there, so if you have active source databases, pause your applications. Or simply shut them out for a while via pg_hba.conf! Once that is done, we can copy the data until all databases are identical to "beta":

$ ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts; " ; pg_dump beta --section=data -t pgbench_accounts ) | psql alpha -1 --set ON_ERROR_STOP=on
ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "pgbench_history" references "pgbench_accounts".
HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE ... CASCADE.

Aha! Note that we used the --foreign-keys option when creating the pgbench tables above. We will need to remove the foreign key, or simply copy both tables together. Let's do the latter:

$ ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts, pgbench_history; " ; pg_dump beta --section=data -t pgbench_accounts 
  -t pgbench_history) | psql alpha -1 --set ON_ERROR_STOP=on
COPY 100000
(1 row)
## Do the same for the other databases:
$ for db in gamma delta epsilon; do 
 ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts, pgbench_history; " ; pg_dump $db --section=data -t pgbench_accounts 
  -t pgbench_history) | psql alpha -1 --set ON_ERROR_STOP=on ; done

Note: if your tables have a lot of constraints or indexes, you may want to disable those to speed up the COPY. Or even turn fsync off. But that's the topic of another post.

Solution three: delta excision

Our final solution is a variant on the last one. As before, the junior DBA has done a mass update of one of the databases involved in the Bucardo sync. But this time, you decide it should be easier to simply remove the deltas and apply the changes manually. As before, we shut down Bucardo. Then we determine the timestamp of the mass change by checking the delta table closely:

$ psql beta -Atc 'select txntime, count(*) from bucardo.delta_public_pgbench_accounts group by 1 order by 2 desc limit 3'
2016-05-26 23:23:27.252352-04|65826965
2016-05-26 23:23:22.460731-04|80
2016-05-07 23:20:46.325105-04|73
2016-05-26 23:23:33.501002-04|69

Now we want to carefully excise those deltas. With that many rows, it is quicker to save/truncate/copy than to do a delete:

$ psql beta
beta=# BEGIN;
## To prevent anyone from firing the triggers that write to our delta table
beta=#LOCK TABLE pgbench_accounts;
## Copy all the delta rows we want to save:
beta=# CREATE TEMP TABLE bucardo_store_deltas AS SELECT * FROM bucardo.delta_public_pgbench_accounts WHERE txntime <> '2016-05-07 23:20:46.325105-04';
beta=# TRUNCATE TABLE bucardo.delta_public_pgbench_accounts;
## Repopulate the delta table with our saved edits
beta=# INSERT INTO bucardo.delta_public_pgbench_accounts SELECT * FROM bucardo_store_deltas;
INSERT 0 1885
## This will remove the temp table
beta=# COMMIT;

Now that the deltas are removed, we want to emulate what caused them on all the other servers. Note that this query is a contrived one that may lend itself to concurrency issues. If you go this route, make sure your query will produce the exact same results on all the servers.

## As in the first solution above, this should ideally run in parallel
$ QUERY='UPDATE pgbench_accounts SET abalance = abalance + 25 WHERE aid > 78657769;'

## Unlike before, we do NOT run this against beta
$ for db in alpha gamma delta epsilon; do psql $db -Atc "SET session_replication_role='replica'; $QUERY"; done | tr "n" " "
UPDATE 837265 UPDATE 837265 UPDATE 837265 UPDATE 837265 UPDATE 837265 

## Now we can start Bucardo up again
$ bucardo start
Checking for existing processes
Starting Bucardo

That concludes the solutions for when you have to make a LOT of changes to your database. How do you know how much is enough to worry about the solutions presented here? Generally, you can simply let Bucardo run - you will know when everything crawls to a halt that perhaps trying to insert 465 million rows at once was a bad idea. :)

published by (Marco Matarazzo) on 2016-05-25 15:57:00 in the "nginx" category

While working on a complex project, we had to set up a caching reverse proxying image server with the ability of automatically resize any cached image on the fly.

Looking around on the Internet, I discovered that Nginx has a neat Image Filter module capable of resizing, cropping and rotating images. I decided to try and combine this with Nginx's well-known caching capabilities, to create an Nginx-only solution.

I'll describe here a sample setup to achieve a similar configuration.

Prerequisites and requisites

We obviously need to install Nginx.

Note that the Image Filter module is not installed by default on many Linux distributions, and we may have to install it as a separate module. If we're using Nginx's official repositories, it should just be a matter of installing the nginx_module_image_filter package and restarting the service.

What we want to achieve in this example configuration is to have a URL like:<width>x<height>/<URL>

...that will retrieve the image at:<URL>

...then resize it on the fly, cache it and serve it.

Cache Storage configuration

First of all, we need to set up the cache in our main http section:

proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=nginx_cache:10M max_size=100M inactive=40d;

This will provide us with a 10MB storage space for keys and 100MB for actual images, that will be removed after not being accessed for 40 days. These values can be tuned as needed.

Caching Proxy configuration

Next, we'll configure our front facing virtual host.

In our case, we needed the reverse proxy to live within an already existing site, and that's why we chose the /image/ path prefix.

  server {
      listen       80;
      location /image/ {
          proxy_cache nginx_cache;
          proxy_cache_key "$proxy_host$uri$is_args$args";
          proxy_cache_valid 30d;
          proxy_cache_valid any 10s;
          proxy_cache_lock on;
          proxy_cache_use_stale error invalid_header timeout updating;
          proxy_http_version 1.1;
          expires 30d;
      location / {
          # other locations we may need for the site.
          root /var/www/whatever;

Every URL starting with /image/ will be server from the cache if present, otherwise it will be proxied to our Resizing Server, and cached for 30 days.

Resizing Server configuration

Finally, we'll configure the resizing server. Here, we use a regexp to extract the width, height and URL of the image we desire.

The server will proxy the request to looking for the image, resize it and then serve it back to the Caching Proxy.

  server {
      server_name localhost;
      location ~ ^/image/([0-9]+)x([0-9]+)/(.+) {
          image_filter_buffer 20M; # Will return 415 if image is bigger than this
          image_filter_jpeg_quality 75; # Desired JPG quality
          image_filter_interlace on; # For progressive JPG
          image_filter resize $1 $2;

We may want to tune the buffer size and jpeg quality here.

Note that we may also use image_filter resize and crop options, should we need different results than just resizing.

Testing the final result

You should now be able to fire up your browser and access an URL like:

...and enjoy your caching, resizing, reverse proxying image server.

Optionally securing access to your image server

As a (simple) security measure to prevent abuse from unauthorized access, you can use the Secure Link module.

All we need to do is update the Resizing Server configuration, adding some lines to the location section:

  server {
      server_name localhost;
      location ~ ^/image/([0-9]+)x([0-9]+)/(.+) {
          secure_link $arg_auth;
          secure_link_md5 "$uri your_secret";
          if ($secure_link = "") {
              return 403;
          if ($secure_link = "0") {
              return 410;
          image_filter_buffer 20M; # Return 415 if image is bigger than this
          image_filter_jpeg_quality 75; # Desired JPG quality
          image_filter_interlace on; # For progressive JPG
          image_filter resize $1 $2;

To access your server you will now need to add an auth parameter to the request, with a secure token that can be easily calculated as an MD5 hash.

For example, to access the previous URL you can use the following bash command:

echo -n '/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg your_secret' | openssl md5 -binary | openssl base64 | tr +/ -_ | tr -d =

...and the resulting URL will be:

published by Eugenia on 2016-05-25 08:21:07 in the "Entertainment" category
Eugenia Loli-Queru

A few years ago, Netflix had said that by 2015, they would stop their DVD subscription, and have almost everything streaming instead. 2015 came and left, and we’re not only still have DVD subscriptions on Netflix, but their streaming service has become weaker, with fewer noted titles. At the same time, Netflix is being battled by the establishment, be it Hollywood studios, or internet and cableTV carriers.

Sure, I’d love to have Hollywood’s latest offers streaming for me via Netflix, but this is obviously never going to happen. Not for $10 per month anyway. If people could pay $25 (or even $50) per month, a more full Hollywood catalog could be offered, but that’s not going to gather a lot of subscribers because the price is too high. Creating tiers of subscribers (e.g. $10, $25 etc with different catalogs for each), will anger customers too. So what could Netflix do?

I personally see only one way out of this mess, and it’s two fold:

1. Adopt an iTunes & Amazon model, where most Hollywood movies and TV shows are offered for a rent price (e.g. $3 for a movie, $1 for a TV episode).

2. Produce in-house about 250 productions per year (instead of the current 50 or so — episodes are counted separately here).

Let’s run some numbers on a back of an envelope:
There are 30 million Netflix subscribers in the US today. Each pays $10 per month. This means it has gross sales of $3.6 billion per year. Taxes and operational costs aside, should leave the company with $2 billion to invest in its own productions.

What this means is that on average, each production can cost up to $8 million. Which is plenty of money to shoot amazing movies *if you employ the right talent*. Consider the recent and well regarded sci-fi movies “Another Earth”, and “I, Origins”, by the same director. The first one was shot for just $70k, and the second one for $1 million.

Also, considering that some TV episodes don’t necessarily have to cost more than $2 mil (at least for dramas), it means that some more expensive than the average $8mil movie productions can take place too. I certainly don’t see why the first season of “House of Cards” cost $100 million…

So anyway, every other day a new episode or a new movie can debut on Netflix, that no other service has access to. I’m personally in favor of smaller TV seasons of 6 to 8 episodes (instead of the current 10-13), with the 3 first episodes streaming immediately together, and the rest every few days. Overtime, all these new productions will accumulate, building a strong catalog.

The first couple of years might be rough, while the catalog is building, but I think it can be done successfully, since a lot of their current streaming deals will also be active for a while before they go offline from the subscriber version of Netflix (these can still re-appear on their renting side). Plus, some of these productions (e.g. documentaries) are cheap enough to license anyway, so they can still remain at the streaming side of things.

published by (Matt Galvin) on 2016-05-17 13:00:00

I was recently working on a project that was using DimpleJS, which the docs describe as "An object-oriented API for business analytics powered by d3". I was using it to create a variety of graphs, some of which were line graphs. The client had requested that the line graph display the y-value of the line on the graph. This is easily accomplished with bar graphs in Dimple, however, not so easily done with line graphs.

I had spent some time Googling to find what others had done to add this functionality but could not find it anywhere. So, I read the documentation where they add labels to a bar graph, and "tweaked" it like so:

var s = myChart.addSeries(null, dimple.plot.line);
/*Add prices to line chart*/
s.afterDraw = function (shape, data) {
  // Get the shape as a d3 selection
  var s =;
  var i = 0;
  _.forEach(data.points, function(point) {
    var rect = {
    x: parseFloat(point.x),
    y: parseFloat(point.y)
  // Add a text label for the value
  if(data.markerData[i] != undefined) {
    .attr("x", rect.x)
    .attr("y", rect.y - 10)
    // Centre align
    .style("text-anchor", "middle")
    .style("font-size", "10px")
    .style("font-family", "sans-serif")
    // Format the number

Some styling still needs to be done but you can see that the y-values are now placed on the line graph. We are using lodash on this project but if you do not want to use lodash, just replace the _.forEach (line 10)and this technique should just plug in for you.

If you're reading this it's likely you've run into the same or similar issue and I hope this helps you!

published by (Kent K.) on 2016-05-12 17:33:00 in the "database" category

I was recently asked about options for displaying a random set of items from a table using Ruby on Rails. The request was complicated by the fact that the technology stack hadn?t been completely decided on and one of the items still up in the air was the database. I?ve had an experience with a project I was working on where the decision was made to switch from MySQL to PostgreSQL. During the switch, a sizable amount of hand constructed queries stopped functioning and had to be manually translated before they would work again. Learning from that experience, I favor avoidance of handwritten SQL in my Rails queries when possible. This precludes the option to use built-in database functions like rand() or random().

With the goal set in mind, I decided to look around to find out what other people were doing to solve similar requests. While perusing various suggested implementations, I noticed a lot of comments along the lines of ?Don?t use this approach if you have a large data set.? or ?this handles large data sets, but won?t always give a truly random result.?

These comments and the variety of solutions got me thinking about evaluating based not only on what database is in use, but what the dataset is expected to look like. I really enjoyed the mental gymnastics and thought others might as well.

Let?s pretend we?re working on an average project. The table we?ll be pulling from has several thousand entries and we want to pull back something small like 3-5 random records. The most common solution offered based on the research I performed works perfectly for this situation.

records_desired = 3
count = [OurObject.count, 1].max
offsets = records_desired.times.inject([]) do |offsets|
  offsets << rand(count)
while count > offsets.uniq!.size && offsets.size < records_desired do
  offsets << rand(count)
offsets.collect {|offset| OurObject.offset(offset).first}

Analyzing this approach, we?re looking at minimal processing time and a total of four queries. One to determine the total count and the rest to fetch each of our three objects individually. Seems perfectly reasonable.

What happens if our client needs 100 random records at a time? The processing is still probably within tolerances, but 101 queries? I say no unless our table is Dalmations! Let?s see if we can tweak things to be more large-set friendly.

records_desired = 100
count = [OurObject.count - records_desired, 1].max
offset = rand(count)

How?s this look? Very minimal processing and only 2 queries. Fantastic! But is this result going to appear random to an observer? I think it?s highly possible that you could end up with runs of related looking objects (created at similar times or all updated recently). When people say they want random, they often really mean they want unrelated. Is this solution close enough for most clients? I would say it probably is. But I can imagine the possibility that for some it might not be. Is there something else we can tweak to get a more desirable sampling without blowing processing time sky-high? After a little thought, this is what I came up with.

records_desired = 100
count = records_desired * 3
offset = rand([OurObject.count - count, 1].max)
set = OurObject.limit(count).offset(offset).pluck(:id)

While this approach may not truly provide more random results from a mathematical perspective, by assembling a larger subset and pulling randomly from inside it, I think you may be able to more closely achieve the feel of what people expect from randomness if the previous method seemed to return too many similar records for your needs.

published by (Ben Witten) on 2016-05-12 15:22:00 in the "Liquid Galaxy" category

For the last few weeks, our developers have been working on syncing our Liquid Galaxy with Sketchfab. Our integration makes use of the Sketchfab API to synchronize multiple instances of Sketchfab in the immersive and panoramic environment of the Liquid Galaxy. The Liquid Galaxy already has so many amazing capabilities, and to be able to add Sketchfab to our portfolio is very exciting for us! Sketchfab, known as the ?YouTube for 3D files,? is the leading platform to publish and find 3D and VR content. Sketchfab integrates with all major 3D creation tools and publishing platforms, and is the 3D publishing partner of Adobe Photoshop, Facebook, Microsoft HoloLens and Intel RealSense. Given that Sketchfab can sync with almost any 3D format, we are excited about the new capabilities our integration provides.

Sketchfab content can be deployed onto the system in minutes! Users from many industries use Sketchfab, including architecture, hospitals, museums, gaming, design, and education. There is a natural overlap between the Liquid Galaxy and Sketchfab, as members of all of these industries utilize the Liquid Galaxy for its visually stunning and immersive atmosphere.

We recently had Alban Denoyel, cofounder of Sketchfab, into our office to demo Sketchfab on the Liquid Galaxy. We're happy to report that Alban loved it! He told us about new features that are going to be coming out on Sketchfab soon. These features will automatically roll out to Sketchfab content on the Liquid Galaxy system, and will serve to make the Liquid Galaxy's pull with 3D modeling even greater.

We?re thrilled with how well Sketchfab works on our Liquid Galaxy as is, but we?re in the process of making it even more impressive. Some Sketchfab models take a bit of time to load (on their website and on our system), so our developers are working on having models load in the background so they can be activated instantaneously on the system. We will also be extending our Sketchfab implementation to make use of some of the features already present on Sketchfab's excellent API, including displaying model annotations and animating the models.

You can view a video of Sketchfab content on the Liquid Galaxy below. If you'd like to learn more, you can call us at 212-929-6923, or contact us here.

published by (Patrick Lewis) on 2016-05-02 13:22:00 in the "dependencies" category
The third-party gem ecosystem is one of the biggest selling points of Rails development, but the addition of a single line to your project's Gemfile can introduce literally dozens of new dependencies. A compatibility issue in any one of those gems can bring your development to a halt, and the transition to a new major version of Rails requires even more caution when managing your gem dependencies.

In this post I'll illustrate this issue by showing the steps required to get rails_admin (one of the two most popular admin interface gems for Rails) up and running even partially on a freshly-generated Rails 5 project. I'll also identify some techniques for getting unreleased and forked versions of gems installed as stopgap measures to unblock your development while the gem ecosystem catches up to the new version of Rails.

After installing the current beta3 version of Rails 5 with gem install rails --pre and creating a Rails 5 project with rails new I decided to address the first requirement of my application, admin interface, by installing the popular Rails Admin gem. The rubygems page for rails_admin shows that its most recent release 0.8.1 from mid-November 2015 lists Rails 4 as a requirement. And indeed, trying to install rails_admin 0.8.1 in a Rails 5 app via bundler fails with a dependency error:

Resolving dependencies...
Bundler could not find compatible versions for gem "rails":
In snapshot (Gemfile.lock):
rails (= 5.0.0.beta3)

In Gemfile:
rails (< 5.1, >= 5.0.0.beta3)

rails_admin (~> 0.8.1) was resolved to 0.8.1, which depends on
rails (~> 4.0)

I took a look at the GitHub page for rails_admin and noticed that recent commits make reference to Rails 5, which is an encouraging sign that its developers are working on adding compatibility with Rails 5. Looking at the gemspec in the master branch on GitHub shows that the rails_admin gem dependency has been broadened to include both Rails 4 and 5, so I updated my app's Gemfile to install rails_admin directly from the master branch on GitHub:

gem 'rails_admin', github: 'sferik/rails_admin'

This solved the above dependency of rails_admin on Rails 4 but revealed some new issues with gems that rails_admin itself depends on:

Resolving dependencies...
Bundler could not find compatible versions for gem "rack":
In snapshot (Gemfile.lock):
rack (= 2.0.0.alpha)

In Gemfile:
rails (< 5.1, >= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionmailer (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionpack (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
rack (~> 2.x)

rails_admin was resolved to 0.8.1, which depends on
rack-pjax (~> 0.7) was resolved to 0.7.0, which depends on
rack (~> 1.3)

rails (< 5.1, >= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionmailer (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionpack (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
rack-test (~> 0.6.3) was resolved to 0.6.3, which depends on
rack (>= 1.0)

rails_admin was resolved to 0.8.1, which depends on
sass-rails (< 6, >= 4.0) was resolved to 5.0.4, which depends on
sprockets (< 4.0, >= 2.8) was resolved to 3.6.0, which depends on
rack (< 3, > 1)

This bundler output shows a conflict where Rails 5 depends on rack 2.x while rails_admin's rack-pjax dependency depends on rack 1.x. I ended up resorting to a Google search which led me to the following issue in the rails_admin repo:

Installing rack-pjax from GitHub:

gem 'rack-pjax', github: 'afcapel/rack-pjax', branch: 'master'

resolves the rack dependency conflict, and bundle install now completes without error. Things are looking up! At least until you try to run the Rake task to rails g rails_admin:install and you're presented with this mess:

/Users/patrick/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/actionpack-5.0.0.beta3/lib/action_dispatch/middleware/stack.rb:108:in `assert_index': No such middleware to insert after: ActionDispatch::ParamsParser (RuntimeError)
from /Users/patrick/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/actionpack-5.0.0.beta3/lib/action_dispatch/middleware/stack.rb:80:in `insert_after'

This error is more difficult to understand, especially given the fact that the culprit (the remotipart gem) is not actually mentioned anywhere in the error. Thankfully, commenters on the above-mentioned rails_admin issue #2532 were able to identify the remotipart gem as the source of this error and provide a link to a forked version of that gem which allows rails_admin:install to complete successfully (albeit with some functionality still not working).

In the end, my Gemfile looked something like this:

gem 'rails_admin', github: 'sferik/rails_admin'
# Use github rack-pjax to fix dependency versioning issue with Rails 5
gem 'rack-pjax', github: 'afcapel/rack-pjax'
# Use forked remotipart until following issues are resolved
gem 'remotipart', github: 'mshibuya/remotipart', ref: '3a6acb3'

A total of three unreleased versions of gems, including the forked remotipart gem that breaks some functionality, just to get rails_admin installed and up and running enough to start working with. And some technical debt in the form of comments about follow-up tasks to revisit the various gems as they have new versions released for Rails 5 compatibility.

This process has been a reminder that when working in a Rails 4 app it's easy to take for granted the ability to install gems and have them 'just work' in your application. When dealing with pre-release versions of Rails, don't be surprised when you have to do some investigative work to figure out why gems are failing to install or work as expected.

My experience has also underscored the importance of understanding all of your application's gem dependencies and having some awareness of their developers' intentions when it comes to keeping their gems current with new versions of Rails. As a developer it's in your best interest to minimize the amount of dependencies in your application, because adding just one gem (which turns out to have a dozen of its own dependencies) can greatly increase the potential for encountering incompatibilities.

published by (Greg Sabino Mullane) on 2016-04-29 00:04:00 in the "postgres" category

Postgres has a wonderful feature called concurrent indexes. It allows you to create indexes on a table without blocking reads OR writes, which is quite a handy trick. There are a number of circumstances in which one might want to use concurrent indexes, the most common one being not blocking writes to production tables. There are a few other use cases as well, including:

Photograph by Nicholas A. Tonelli

  • Replacing a corrupted index
  • Replacing a bloated index
  • Replacing an existing index (e.g. better column list)
  • Changing index parameters
  • Restoring a production dump as quickly as possible

In this article, I will focus on that last use case, restoring a database as quickly as possible. We recently upgraded a client from a very old version of Postgres to the current version (9.5 as of this writing). The fact that use of pg_upgrade was not available should give you a clue as to just how old the "very old" version was!

Our strategy was to create a new 9.5 cluster, get it optimized for bulk loading, import the globals and schema, stop write connections to the old database, transfer the data from old to new, and bring the new one up for reading and writing.

The goal was to reduce the application downtime as much as reasonably possible. To that end, we did not want to wait until all the indexes were created before letting people back in, as testing showed that the index creations were the longest part of the process. We used the "--section" flags of pg_dump to create pre-data, data, and post-data sections. All of the index creation statements appeared in the post-data file.

Because the client determined that it was more important for the data to be available, and the tables writable, than it was for them to be fully indexed, we decided to try using CONCURRENT indexes. In this way, writes to the tables could happen at the same time that they were being indexed - and those writes could occur as soon as the table was populated. That was the theory anyway.

The migration went smooth - the data was transferred over quickly, the database was restarted with a new postgresql.conf (e.g. turn fsync back on), and clients were able to connect, albeit with some queries running slower than normal. We parsed the post-data file and created a new file in which all the CREATE INDEX commands were changed to CREATE INDEX CONCURRENTLY. We kicked that off, but after a certain amount of time, it seemed to freeze up.

The frogurt is also cursed.

Looking closer showed that the CREATE INDEX CONCURRENTLY statement was waiting, and waiting, and never able to complete - because other transactions were not finishing. This is why concurrent indexing is both a blessing and a curse. The concurrent index creation is so polite that it never blocks writers, but this means processes can charge ahead and be none the wiser that the create index statement is waiting on them to finish their transaction. When you also have a misbehaving application that stays "idle in transaction", it's a recipe for confusion. (Idle in transaction is what happens when your application keeps a database connection open without doing a COMMIT or ROLLBACK). A concurrent index can only completely finish being created once any transaction that has referenced the table has completed. The problem was that because the create index did not block, the app kept chugging along, spawning new processes that all ended up in idle in transaction.

At that point, the only way to get the concurrent index creation to complete was to forcibly kill all the other idle in transaction processes, forcing them to rollback and causing a lot of distress for the application. In contrast, a regular index creation would have caused other processes to block on their first attempt to access the table, and then carried on once the creation was complete, and nothing would have to rollback.

Another business decision was made - the concurrent indexes were nice, but we needed the indexes, even if some had to be created as regular indexes. Many of the indexes were able to be completed (concurrently) very quickly - and they were on not-very-busy tables - so we plowed through the index creation script, and simply canceled any concurrent index creations that were being blocked for too long. This only left a handful of uncreated indexes, so we simply dropped the "invalid" indexes (these appear when a concurrent index creation is interrupted), and reran with regular CREATE INDEX statements.

The lesson here is that nothing comes without a cost. The overly polite concurrent index creation is great at letting everyone else access the table, but it also means that large complex transactions can chug along without being blocked, and have to have all of their work rolled back. In this case, things worked out as we did 99% of the indexes as CONCURRENT, and the remaining ones as regular. All in all, the use of concurrent indexes was a big win, and they are still an amazing feature of Postgres.

published by (Elizabeth Garrett) on 2016-04-27 18:59:00 in the "community" category

We all love a good ending. I was happy to hear that one of End Point?s clients, Cybergenetics, was involved in a case this week to free a falsely imprisoned man, Darryl Pinkins.

Darryl was convicted of a crime in Indiana in 1991. In 1995 Pinkins sought the help of the Innocence Project. His attorney Frances Watson and her students turned to Cybergenetics and their DNA interpretation technology called TrueAllele® Casework. The TrueAllele DNA identification results exonerated Pinkins. The Indiana Court of Appeals dropped all charges against Pinkins earlier this week and he walked out of jail a free man after fighting for 24 years to clear his name.

TrueAllele can separate out the people who contributed their DNA to a mixed DNA evidence sample. It then compares the separated out DNA identification information to other reference or evidence samples to see if there is a DNA match.

End Point has worked with Cybergenetics since 2003 and consults with them on security, database infrastructure, and website hosting. We congratulate Cybergenetics on their success in being part of the happy ending for Darryl Pinkins and his family!

More of the story is available at Cybergenetics? Newsroom or the Chicago Tribune.

published by (Yaqi Chen) on 2016-04-27 17:49:00 in the "Liquid Galaxy" category

Nowadays, virtual reality is one of the hottest topics in tech, with VR enabling users to enter immersive environments built up by computer technology. I attended Mobile World Congress 2016 a few weeks ago, and it was interesting to see people sit next to one another and totally ignore one another while they were individually immersed in their own virtual reality worlds.

When everyone is so addicted to their little magic boxes, they tend to lose their connections with people around them. End Point has developed a new experience in which users can watch and share their virtually immersive world together. This experience is called the Liquid Galaxy.

When a user stands in front of Liquid Galaxy and is surrounded by a multitude of huge screens arranged in a semicircle, he puts not only his eyes but his whole body into an unprecedented 3D space. These screens are big enough to cover the audience?s entire peripheral vision and bring great visual stimulation from all directions. When using the Liquid Galaxy system, the users become fully immersed in the system and the imagery they view.

Movie Night at End Point

This digital chamber can be considered a sort of VR movie theater, where an audience can enjoy the same content, and probably the same bucket of popcorn! While this setup makes the Liquid Galaxy a natural fit for any sort of exhibit, many End Point employees have also watched full length feature movies on the system during our monthly Movie Night at our Headquarters office in Manhattan. This sort of shared experience is not something that is possible on typical VR, because unlike VR the Liquid Galaxy is serving a larger audience and presenting stories in a more interactive way.

For most meetings, exhibitions, and other special occasions, the Liquid Galaxy helps to provide an amazing and impactful experience to the audience. Any scenario can be built for users to explore, and geospatial data sets can be presented immersively.

With the ability to serve a group of people simultaneously, Liquid Galaxy increases the impact of content presentation and brings a revolutionary visual experience to its audiences. If you'd like to learn more, you can call us at 212-929-6923, or contact us here.