
published by noreply@blogger.com (Greg Sabino Mullane) on 2016-12-07 13:38:00 in the "postgres" category

Anytime you run a query in Postgres, it needs to compile your SQL into a lower-level plan explaining how exactly to retrieve the data. The more it knows about the tables involved, the smarter the planner can be. To get that information, it gathers statistics about the tables and stores them, predictably enough, in the system table known as pg_statistic. The SQL command ANALYZE is responsible for populating that table. It can be done per-cluster, per-database, per-table, or even per-column. One major pain about analyze is that every table *must* be analyzed after a major upgrade. Whether you upgrade via pg_dump, pg_upgrade, Bucardo, or some other means, the pg_statistic table is not copied over and the database starts as a clean slate. Running ANALYZE is thus the first important post-upgrade step.
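
For reference, here is roughly what each level of granularity looks like (the table and column names are just placeholders; the cluster-wide form is simply the vacuumdb wrapper run against every database):

-- Cluster-wide: run "vacuumdb --all --analyze-only" from the shell
ANALYZE;                              -- every table in the current database
ANALYZE pgbench_accounts;             -- a single table
ANALYZE pgbench_accounts (aid, bid);  -- specific columns only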

Unfortunately, analyze can be painfully slow. Slow enough that the default analyze methods sometimes take longer than the entire rest of the upgrade! Although this article will focus on the pg_upgrade program in its examples, the lessons may be applied to any upgrade method. The short version of the lessons is: run vacuumdb in parallel, control the stages yourself, and make sure you handle any custom per-column statistics.

Before digging into the solution in more detail, let's see why all of this is needed. Doesn't pg_upgrade allow for super-fast Postgres major version upgrades, including the system catalogs? It does, with the notable exception of the pg_statistic table. The nominal reason for not copying the data is that the table format may change from version to version. The real reason is that nobody has bothered to write the conversion logic yet, for pg_upgrade could certainly copy the pg_statistic information: the table has not changed for many years.

At some point, a DBA will wonder if it is possible to simply copy the pg_statistic table from one database to another manually. Alas, it contains columns of the type "anyarray", which means it cannot be dumped and restored:

$ pg_dump -t pg_statistic --data-only | psql -q
ERROR:  cannot accept a value of type anyarray
CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{"{i,v}","{v}","{i,o,o}","{i,o,o,o}","{i,i,i,v,o,o,o}","{i,i,o,o}","{i,o}","{o,o,o}","{o,o,o,o}","{o..."

I keep many different versions of Postgres running on my laptop, and use a simple port naming scheme to keep them straight. It's simple enough to use pg_dump and sed to confirm that the structure of the pg_statistic table has not changed from version 9.2 until 9.6:

$ for z in 840 900 910 920 930 940 950; do echo -n $z: ; diff -sq <(pg_dump 
>  --schema-only -p 5$z -t pg_statistic | sed -n '/CREATE TABLE/,/^$/p') <(pg_dump 
>  --schema-only -p 5960 -t pg_statistic | sed -n '/CREATE TABLE/,/^$/p'); done
840:Files /dev/fd/63 and /dev/fd/62 differ
900:Files /dev/fd/63 and /dev/fd/62 differ
910:Files /dev/fd/63 and /dev/fd/62 differ
920:Files /dev/fd/63 and /dev/fd/62 are identical
930:Files /dev/fd/63 and /dev/fd/62 are identical
940:Files /dev/fd/63 and /dev/fd/62 are identical
950:Files /dev/fd/63 and /dev/fd/62 are identical

Of course, the same table structure does not promise that the backends of different versions use it in the same way (spoiler: they do), but that should be something pg_upgrade can handle by itself. Even if the table structure did change, pg_upgrade could be taught to migrate the information from one format to another (its raison d'être). If the new statistics format took a long time to generate, perhaps pg_upgrade could leisurely generate a one-time table on the old database holding the new format, then copy that over as part of the upgrade.

Since pg_upgrade currently does none of those things and omits upgrading the pg_statistic table, the following message appears after pg_upgrade has been run:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Looking at the script in question yields:

#!/bin/sh

echo 'This script will generate minimal optimizer statistics rapidly'
echo 'so your system is usable, and then gather statistics twice more'
echo 'with increasing accuracy.  When it is done, your system will'
echo 'have the default level of optimizer statistics.'
echo

echo 'If you have used ALTER TABLE to modify the statistics target for'
echo 'any tables, you might want to remove them and restore them after'
echo 'running this script because they will delay fast statistics generation.'
echo

echo 'If you would like default statistics as quickly as possible, cancel'
echo 'this script and run:'
echo '    vacuumdb --all --analyze-only'
echo

vacuumdb --all --analyze-in-stages
echo

echo 'Done'

There are many problems in simply running this script. Not only is it going to iterate through each database one-by-one, but it will also process tables one-by-one within each database! As the script states, it is also extremely inefficient if you have any per-column statistics targets. Another issue with the --analyze-in-stages option is that the stages are hard-coded (at "1", "10", and "default"). Additionally, there is no way to easily know when a stage has finished other than watching the command output. Happily, all of these problems can be fairly easily overcome; let's create a sample database to demonstrate.

$ initdb --data-checksums testdb
$ echo port=5555 >> testdb/postgresql.conf 
$ pg_ctl start -D testdb
$ createdb -p 5555 alpha
$ pgbench alpha -p 5555 -i -s 2
$ for i in `seq 1 100`; do echo "create table pgb$i AS SELECT * FROM pgbench_accounts;"; done | psql -p 5555 alpha

Now we can run some tests to see the effect of the --jobs option. Graphing out the times shows some big wins and nice scaling. Here are the results of running vacuumdb alpha --analyze-only with various values of --jobs:

[Graph: analyze time decreases as the number of jobs increases]

The slope of your graph will be determined by how many expensive-to-analyze tables you have. As a rule of thumb, however, you may as well set --jobs to a high number. Anything over your max_connections setting is pointless, but don't be afraid to jack it up to at least a hundred. Experiment on your test box, of course, to find the sweet spot for your system. Note that the --jobs argument will not work on old versions of Postgres. For those cases, I usually whip up a Perl script using Parallel::ForkManager to get the job done. Thanks to Dilip Kumar for adding the --jobs option to vacuumdb!
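
If you want a rough idea of which tables will dominate that slope, relation size is a decent first approximation (actual analyze cost also depends on column types and statistics targets), so a query along these lines can help:

SELECT c.relname, pg_size_pretty(pg_relation_size(c.oid)) AS size
  FROM pg_class c
 WHERE c.relkind = 'r'
 ORDER BY pg_relation_size(c.oid) DESC
 LIMIT 10;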

The next problem to conquer is the use of custom statistics. Postgres' ANALYZE uses the default_statistics_target setting to determine how many rows to sample (the default value in modern versions of Postgres is 100). However, as the name suggests, this is only the default - you may also set a specific target at the column level. Unfortunately, there is no way to disable this quickly, which means that vacuumdb will always use the custom value. This is not what you want, especially if you are using the --analyze-in-stages option, as it will happily (and needlessly!) recalculate columns with specific targets three times. As custom stats are usually set much higher than the default target, this can be a very expensive option:

$ ## Create a largish table:
$ psql -qc 'create unlogged table aztest as select * from pgbench_accounts'
$ for i in {1..5}; do psql -qc 'insert into aztest select * from aztest'; done
$ psql -tc "select pg_size_pretty(pg_relation_size('aztest'))"
820 MB
$ psql -qc '\timing' -c 'analyze aztest'
Time: 590.820 ms  ## Actually best of 10: never get figures from a single run!
$ psql -c 'alter table aztest alter column aid set statistics 1000'
ALTER TABLE
$ psql -qc '\timing' -c 'analyze aztest'
Time: 2324.017 ms ## as before, this is the fastest of 10 runs

As you can see, even a single column can change the analyze duration drastically. What can we do about this? The --analyze-in-stages option is still a useful feature, so we want to set those columns back to a default value before the post-upgrade analyze, and restore them afterwards. While one could reset the stats target and then set it again on each column via a bunch of ALTER TABLE calls, I find it easier to simply update the system catalogs directly. Specifically, the pg_attribute table contains an attstattarget column which has a positive value when a custom target is set. In our example above, the value of attstattarget for the aid column would be 1000. Here is a quick recipe to save the custom statistics values, set them to the default (-1), and then restore them all once the database-wide analyzing is complete:

## Save the values away, then reset to default:
CREATE TABLE custom_targets AS SELECT attrelid, attname, attnum, attstattarget
  FROM pg_attribute WHERE attstattarget > 0;
UPDATE pg_attribute SET attstattarget = -1 WHERE attstattarget > 0;

## Safely run your database-wide analyze now
## All columns will use default_statistics_target

## Restore the values:
UPDATE pg_attribute a SET attstattarget = c.attstattarget
  FROM custom_targets c WHERE a.attrelid = c.attrelid
  AND a.attnum = c.attnum AND a.attname = c.attname;

## Bonus query: run all custom target columns in parallel:
SELECT 'vacuumdb --analyze-only -e -j 100 ' || 
  string_agg(format('-t "%I(%I)" ', attrelid::regclass, attname), NULL)
FROM pg_attribute WHERE attstattarget > 0;

As to the problems of not being able to pick the stage targets for --analyze-in-stages, and not being able to know when a stage has finished, the solution is to simply do it yourself. For example, to run all databases in parallel with a target of "2", you would need to change the default_statistics_target at the database level (via ALTER DATABASE), or at the cluster level (via ALTER SYSTEM). Then invoke vacuumdb, and reset the value:

$ psql -qc 'alter system set default_statistics_target = 2' -qc 'select pg_reload_conf()'
$ vacuumdb --all --analyze-only --jobs 100
$ psql -qc 'alter system reset default_statistics_target' -qc 'select pg_reload_conf()'
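
The same idea can be scoped to a single database via ALTER DATABASE instead; using the sample "alpha" database from earlier, something like this should work (new vacuumdb connections pick up the per-database setting automatically, so no reload is needed):

ALTER DATABASE alpha SET default_statistics_target = 2;
-- now run: vacuumdb alpha --analyze-only --jobs 100
ALTER DATABASE alpha RESET default_statistics_target;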

In summary, don't trust the given vacuumdb suggestions for a post-upgrade analyze. Instead, remove any per-column statistics, run it in parallel, and do whatever stages make sense for you.


published by noreply@blogger.com (Selvakumar Arumugam) on 2016-12-03 16:33:00 in the "ActionCable" category

Ruby on Rails is a wonderful framework for web development, but for a long time it lacked one important feature as the world moved toward realtime data: everyone wants to see realtime data in their applications, and real-time web applications are now mostly accomplished using WebSocket.

WebSocket provides full-duplex communication between server and client over a TCP connection, once the handshake has completed through the HTTP protocol. WebSocket then transfers streams of messages on top of TCP without the client having to solicit each one, which greatly boosts data transfer performance compared to HTTP request/response.

WebSocket was adopted in Rails applications with the help of third-party libraries, but Rails 5 ships with a module called ActionCable, which sits seamlessly within the existing framework and integrates WebSocket into the application. ActionCable provides both a server-side and a client-side framework for implementing WebSocket in the application.

ActionCable Overview:

Server Side:

Connection: The Connection handles only the authentication and authorization part of the logic. A connection object is instantiated when a request from the user comes in through a browser tab, window, or device; multiple connections can be created when the user accesses the server from different devices or browser tabs.

Channel: The Channel is the parent of all custom channels and shares common logic with all of them. A custom channel streams messages to clients once the corresponding channel has been subscribed to by the client.

Client Side:

The client-side JavaScript framework has all the functionality needed to interact with the server side. The Consumer establishes a WebSocket connection with the server for all communication, and a Subscriber subscribes to custom channels to receive messages from the server without making requests.

Prerequisites:
* Ruby 2.2.2 or newer is required for Rails 5. Install the Ruby gem packages and Rails 5 in your environment.
* ActionCable needs a multithreaded server such as Puma (the Rails 5 default) in development.

Let's create the Rails 5 chat application! The application structure will contain the following default ActionCable-related files.

$ rails new action-cable-chat-example
 - Server Side
        app/channels
        app/channels/application_cable
        app/channels/application_cable/connection.rb
        app/channels/application_cable/channel.rb

 - Client Side
        app/assets/javascripts
        app/assets/javascripts/channels
        app/assets/javascripts/channels/.keep
        app/assets/javascripts/application.js
        app/assets/javascripts/cable.js

The following models and controllers need to be created for a basic chat application.

* User, Room and Message models
* users, rooms, messages, sessions and welcome controllers

The commands to create these items are listed below; I'm skipping the application code itself to focus on ActionCable, but it is available on GitHub to review or clone.

$ bundle install
$ rails g model user name:string
$ rails g model room title:string name:string user:references
$ rails g model message content:text user:references room:references
$ rake db:migrate

$ rails g controller users new create show
$ rails g controller rooms new create update edit destroy index show
$ rails g controller messages create

$ rails g controller sessions new create destroy
$ rails g controller welcome about

Make the necessary changes to the controllers, models, and views to create a chat application with chat rooms (refer to the GitHub repository). Start the application with the Puma server to verify the basic functionality.

$ rails s -b 0.0.0.0 -p 8000

The application should support the following actions: a user signs up or logs in with a username to get access to new or existing chat rooms. The user can write messages in a chat room, but at this point the messages won't appear to other users without refreshing the page. Let's see how ActionCable handles that.

Action Cable Implementation:

Configurations:

A few configuration changes are needed to enable ActionCable in the application.

config/routes.rb - The server should be mounted on a specific path to serve WebSocket cable requests.

mount ActionCable.server => '/cable'

app/views/layouts/application.html.erb - The action_cable_meta_tag passes the WebSocket URL (configured in the environment via config.action_cable.url) to the consumer.

<%= action_cable_meta_tag %>

app/assets/javascripts/cable.js - The consumer is created here to establish the WebSocket connection to the URL specified in the action-cable-url meta tag.

(function() {
  this.App || (this.App = {});

  App.cable = ActionCable.createConsumer();

}).call(this);

Once ActionCable is enabled, the WebSocket connection is established whenever the application is accessed from any client, but messages are transmitted only through channels. Here is a sample handshake that creates the WebSocket connection.

General:
Request URL:ws://139.59.24.93:8000/cable
Request Method:GET
Status Code:101 Switching Protocols

Request Headers:
Connection:Upgrade
Host:139.59.24.93:8000
Origin:http://139.59.24.93:8000
Sec-WebSocket-Extensions:permessage-deflate; client_max_window_bits
Sec-WebSocket-Key:c8Xg5vFOibCl8rDpzvdgOA==
Sec-WebSocket-Protocol:actioncable-v1-json, actioncable-unsupported
Sec-WebSocket-Version:13
Upgrade:websocket

Response Headers:
Connection:Upgrade
Sec-WebSocket-Accept:v46QP1XBc0g5JYHW7AdG6aIxYW0=
Sec-WebSocket-Protocol:actioncable-v1-json
Upgrade:websocket

/cable is the default URI. If there is a custom URI, it needs to be set in the environment file, and any origins other than localhost need to be explicitly allowed in the configuration.

environments/development.rb
# config.action_cable.url = 'wss://example.com/cable'
# config.action_cable.allowed_request_origins = [ 'http://example.com', /http:\/\/example.*/ ]

Workflow:

I created a diagram to illustrate how the pieces fit together and explain the workflow.

Channels:

A server-side messages channel needs to be created to stream messages from the server to all subscribed clients, and the client-side framework needs to subscribe to that channel to receive the messages. Run the channel generator to create the messages channel skeleton on both the server and client side.

$ rails generate channel Messages 

 app/channels/messages_channel.rb
 app/assets/javascripts/channels/messages.js

messages_controller.rb - Whenever a user writes a message in a room, it is broadcast to the 'messages' channel after the save action.

class MessagesController < ApplicationController
  def create
    message = Message.new(message_params)
    message.user = current_user
    if message.save
      ActionCable.server.broadcast 'messages',
        message: message.content,
        user: message.user.username
      head :ok
    end
  end

  private

    def message_params
      params.require(:message).permit(:content, :chatroom_id)
    end
end

messages_channel.rb - The messages channel streams those broadcast messages to subscribed clients over the established WebSocket connection.

class MessagesChannel < ApplicationCable::Channel  
  def subscribed
    stream_from 'messages'
  end
end  

messages.js - The MessagesChannel is subscribed to when the user accesses the chat rooms. The client side receives messages per its subscriptions and populates the chat room dynamically.

App.messages = App.cable.subscriptions.create('MessagesChannel', {  
  received: function(data) {
    $("#messages").removeClass('hidden')
    return $('#messages').append(this.renderMessage(data));
  },
  renderMessage: function(data) {
    return "<p>" + data.user + ": " + data.message + "</p>";
  }
});

With these ActionCable channel changes, the chat application now receives messages in realtime.

Conclusion:

Rails ActionCable adds real value to the framework by supplying the much-needed realtime feature. It can also be implemented easily in an existing Rails application, since it interacts naturally with the existing system and follows a similar structure. The channel workflow strategy can be applied to any kind of live data feed. In production, the stack uses Redis by default (config/cable.yml) to send and receive messages through channels.


published by noreply@blogger.com (Sam Batschelet) on 2016-12-01 03:17:00 in the "Dancer2" category

Perl Dancer Conference Day 1

The Perl Dancer Conference is a great event, now in its third year. The event took place in the same location as last year in Vienna, Austria at the Hotel Schani Wien. For those of you who have never visited Vienna, it is a perfect place to bring the family. From visiting the beautiful parks to taking a scenic ride on the Danube River, the beautiful and historic city is known for its rich art and musical culture, and has much to offer.

I was very excited to not only attend but also give a talk this year. My talk, titled "Dancing in the Clouds", coincided with the release of two new Perl modules, Etcd3 and Dancer2::Plugin::Etcd. This article is the first of a three-part series, with the final article focusing on my talk and usage examples with the new modules.

Sawyer X - A bus tour through Dancer core

The Captain of Dancer core, SawyerX, took us on a bus tour through the core functionality of Dancer2. Using practical examples of code blocks from core, he explained how different areas of the code base worked. I personally enjoyed his explanation of how hooks are implemented and created. Learning from the 1st iteration of Dancer, the second version shows maturity and stability.

Stefan Hornburg (Racke) - No Act on ACT

If you have ever taken the time to review a Perl conference's website or even purchase tickets to attend you have no doubt been in contact with ACT. "Act (A Conference Toolkit) is a multilingual, template-driven, multi-conference website that can manage the users, talks, schedule, and payment for your conference." While this package has been around for many years, it is somewhat dreaded because of its lack of features.

Stefan outlined his work with Interchange6::Schema and the perl.dance website, painting a picture of a replacement for ACT. Utilizing Dancer2, DBIx::Class, Moo, and other modern Perl tools, the infrastructure outlined is very compelling. The package has a user admin, e-commerce, and even a module to print out the passes. Although he notes that this is not a plug-and-play replacement for ACT yet, with a bit of work and support it could be the future of Perl conference management.

Andrew Beverly - Implementing i18n in a Dancer application using Plugin::LogReport

Andrew extended his talk from last year about internationalization, this time using the Dancer2::Plugin::LogReport module. Using great examples, he not only outlined the usage of the module, but also got into the details of how the process works on a technical level. Explaining the different ways that internationalization can be done, he outlined how he achieved his task of deploying i18n in a Dancer app.

Theo van Hoesel - Quickstep

Theo was a great addition to the conference this year. He was able to make the event on very short notice after Dancer core developer Jason Crome was not able to attend due to injury. Theo briefly outlined the Act::Voyager project and the general requirements of adding user-friendly features to the ACT toolkit. He also spent a good amount of time explaining the concept of web caching and how many of the existing modules fail at complying with RFC 7234. He then explained how all of this led him to create HTTP::Caching, which has "The RFC 7234 compliant brains to do caching right". Along with this contribution, part of the HTTP::Bundle, his Dancer-focused Dancer2::Plugin::HTTP::Bundle was explained.

Job van Achterberg (jkva) - Dancing with Disabilities

Job's talk was a very interesting look into how a considerate approach to development and small changes to your code can improve a disabled web user's experience. Using the accessibility tools in macOS, Job showed how simple things such as naming a list are reflected in a disabled user's ability to get information. What I found very interesting in this presentation was how awkward the tools were to use even for an experienced pro like Job. It really made me think a lot about the challenges the disabled face in something many of us take for granted.

Jason Lewis - The Lazy Programmer's Guide to building HTML tables in Dancer2

Jason has become a regular on the #dancer freenode IRC channel. This year he decided to travel all the way from Australia to give his presentation on his experiences replacing Crystal Reports with Dancer2, Template::Toolkit, and DataTables. Although a great deal of the presentation focused on the features of the jQuery plugin DataTables, he gave nice examples of code he used to populate reports and the hurdles he faced replacing Crystal Reports functionality. The reports looked beautiful and were very easy to convert to other data types such as PDF and CSV.

Stefan Seifert (nine) - Perl 5 and Perl 6 - a great team

Stefan is a great presence at the conference, and his fun and witty personality carried over to his presentation. After opening with a really funny skit as a reporter reading the news, he went on to outline the current state of Perl 6 and how important it is for all of us as a community to embrace the fact that we are all in this together. He reminded us of the perils of Python 3's launch and the lack of support even today. He then focused on the capabilities of using Perl 5 and Perl 6 together with the Inline::Perl5 and Inline::Perl6 modules. To be honest, before his talk I had given Perl 6 very little attention. Stefan's talk opened my eyes to the possibilities of utilizing the two versions together and the advantages that ecosystem has.

Please stop back for links to day 2 of the conference and a breakdown of my talk outlining etcd integration with Perl and Dancer2.


published by noreply@blogger.com (Josh Williams) on 2016-12-01 00:55:00 in the "postgres" category
We needed a way to suppress specific write commands on a Postgres streaming replica. The replica was set up for a DR configuration, with the applications able to be brought up into full service at a moment's notice. But since it's a hot standby, we'd like to still allow the applications to be used in a working but read-only state.

One of the applications on this database is MediaWiki, which worked great in this configuration. But a couple of the other apps have the classic behavior of updating their user object's "last login" field in one form or another when someone authenticates, which would cause the login process to fail entirely.

Of course we want updates to fail, up until that point when (knock on wood) the master server is declared down for the count and the official fail-over happens. Except for the one command that executes on login.

We don't really care about the "last login" type field -- the data is available through logs and other means. The affected apps could probably all be monkey patched to work around that part of the process. But we had a couple different apps doing this, and that adds a maintenance burden for each. And if we could figure out how to make it work at the database level then it'd work for all of them, plus anything else that might pop up.

The first thing we looked at was writing a trigger to intercept the commands, but triggers don't execute on a hot standby replica so that was out pretty quickly. The next hypothesis was that we could write a foreign data wrapper that'd just absorb the writes, or even just use postgres_fdw to route the commands to a local writable database that's more or less a throw-away data sink. But to our surprise, even writes to foreign tables get rejected on a hot standby. I'm slightly tempted to dig in and see what it'd take to enable that.

The third time was the charm: rules. Rules hook in pretty far down in the query parser, and they can be notoriously tricky to work with. But since they're embedded pretty deep, down to the point where views rely on them, they're obeyed even on a replica.

So the technique was this: On the master (... obviously) we set up a separate schema, inside which a view was created with the same name as the target table and which had certain commands suppressed:

CREATE SCHEMA replica;

CREATE VIEW replica.users AS SELECT * FROM public.users;

CREATE RULE users_disable_update AS ON UPDATE TO replica.users DO INSTEAD NOTHING;
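
If other write commands against that table also need to be silently absorbed, analogous rules can be added in the same way:

CREATE RULE users_disable_insert AS ON INSERT TO replica.users DO INSTEAD NOTHING;
CREATE RULE users_disable_delete AS ON DELETE TO replica.users DO INSTEAD NOTHING;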

Plus any permission adjustments the app user might need. On the master server this schema and view are pretty much ignored, as the application user just uses the default search path. But on the replica, we adjust the default search path in postgresql.conf so that it applies to just that server:

search_path = '"$user",replica,public'
app@app:5432=> UPDATE "auth_user" SET "last_login" = now() WHERE "auth_user"."id" = 13;
UPDATE 0

It doesn't quite work everywhere, sadly! Notice the "UPDATE 0"? We found Django actually checks that, and panics with an error to the user when it tries to change something and the row count it gets back is different than what it expects.

Another caveat is that if the target table's schema changes, the view won't automatically follow. Presumably your user table doesn't receive changes all that often, but if you're applying this technique to something else, that might not be the case. Something to be aware of!
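
One way to handle that, assuming the view really is a simple pass-through like the one above, is to recreate it after the table change; note that dropping the view also drops its rule, so both need to be recreated:

DROP VIEW replica.users;
CREATE VIEW replica.users AS SELECT * FROM public.users;
CREATE RULE users_disable_update AS ON UPDATE TO replica.users DO INSTEAD NOTHING;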

published by noreply@blogger.com (Josh Lavin) on 2016-11-28 12:00:00 in the "editor" category

Vim is a text-based editor that has been around for 25 years. It comes pre-installed on most Linux distributions, so it is a great tool for developing on servers. One of the advantages of Vim is that oft-used keystrokes can be performed without moving your hands from the keyboard (no mouse is needed in Vim).

Many of the engineers here at End Point use Vim for our daily development work, and recently, a few of us got together online to try to learn some new tricks and tips from each other. Being efficient with Vim not only improves productivity, it's a lot of fun.

Similar to playing a round of golf, we tested each other with various editing tasks, to see who could perform the task in the fewest number of keystrokes. This is known as "Vim Golf." There is even an entire website devoted to this.

In this post, we share some of the interesting tricks that were shown, and also some links to further learning about Vim.

Tips & Tricks

  • Indenting text: there are multiple ways to do this, but a few are:
    • Visually-select the lines of text to indent (Ctrl v or Shift v), then > to indent, or < to outdent. Press . to perform this action again and again.
    • Locate the line numbers for the lines you wish to change (:set number to turn on line numbering), then :17,36>> to indent lines 17-36 two indentation levels.
    • Define width of a tab :set tabstop=4 would for example set a tab to 4 spaces.
    • Use spaces defined in tabstop instead of an actual tab character (^I) when the Tab key is pressed :set expandtab or :set et
    • Re-apply the current tab settings to the whole file (or a given range): :retab
    • Re-apply tab settings and also convert runs of spaces: :retab!
  • Visually-selecting text: Ctrl v will perform a visual column selection, while Shift v will do a row selection.
  • :set will show all the currently-set options.
  • For paging up or down, use Ctrl b and Ctrl f. You can also use PgUp and PgDn keys if you want to move your hands a bit :-)
  • Moving the cursor around the page:
    • Type M to move the cursor to the middle of the screen
    • Type H to move the cursor to the top of the screen
    • Type L to move the cursor to the bottom of the screen
    • Type gg to move the cursor to the top of the document
    • Type G to move the cursor to the bottom of the document
  • Moving the page around the cursor:
    • Type zz to make the current position float to the middle of the screen
    • Type zt to make the current position float to the top of the screen
    • Type zb to make the current position float to the bottom of the screen
  • Search and replace:
    • Find and replace all instances of a string: %s/find_this/replace_with_this/g
    • Case-insensitive find and replace all instances of a string: %s/find_this/replace_with_this/gi
    • Find then ask confirmation before replacing: %s/find_this/replace_with_this/c
    • Search history: Vim maintains search history which is easy to access using / or ? then navigation through the list using the up and down arrows.
  • Deleting from the current position to the bottom of the file: dG
  • Jumping to the first position in the current line: 0
  • Find the next occurrence of a character in the current line: f then the character. To search backwards, use F
  • Undo a command: u (run this multiple times for additional undo steps)
  • Redo your undo: Ctrl r
  • Travel back in time to see the document as it was 30 mins ago :earlier 30m then revert with :later 30m
  • Reselect the last visual selection gv
  • Run a system command from within Vim :! [command]
  • Review your previous vim command history q:

For Further Learning

With contributions from Sam Batschelet


published by noreply@blogger.com (Muhammad Najmi Ahmad Zabidi) on 2016-11-16 15:47:00 in the "sysadmin" category

This is yet another story from yesteryear!

I had a chance to attend a DNS/BIND training organized by Malaysia's domain registry (MyNIC). The training took place over two days, November 23 to 24, 2015, in Bangi, Selangor, Malaysia, and was packed with technical training on the Domain Name System (DNS) using the BIND software. Our trainer was Mr Amir Haris, who runs his own DNS specialist company named Localhost Sendirian Berhad (Sendirian Berhad is equivalent to "Private Limited").

Day One

On Day One, the trainer, Mr Amir Haris, taught us the theoretical details of DNS. To start, he explained DNS operation: the basics of what DNS is and the function of the root servers. Later he went further into the root servers' functions and locations, followed by an explanation of the query process.


Mr Amir also explained the differences in DNS implementations across operating system platforms. Since we were using BIND as the name server software for the training, we were also given the historical background of BIND.

The concept of master and slave DNS servers was also taught. The master notifies the secondary server of any change via a NOTIFY message, which serves to inform the slave(s) that the zone file has changed. The NS records in the zone file are used to determine who the slaves are. The benefit of NOTIFY is that it cuts down the propagation delay for changes.

Day Two

The second day was mostly about practical DNS implementation. Each of us was given access to a virtual machine in order to experience our own BIND setup.

The contents of our lab training are as follows:

  • Installing BIND 9
  • Setting up RNDC
  • Setting up logging
  • Recursive and Cache DNS
  • Authoritative NS - Master and Slave
  • Delegation
  • IPv6
  • DNS Security Extensions (DNSSEC)
  • Stealth (split) DNS
  • Hardening DNS systems

Recursive and Cache Nameserver

A minimum of three zone files is needed:

  • localhost (forward zone)
  • localhost (reverse zone)
  • root

Forward zone file

;filename 127.0.0.fwd
$TTL 345600
@               IN      SOA     localhost. hostmaster.localhost. (
                                2015112401      ; Serial
                                3600            ; Refresh
                                900             ; Retry
                                3600000         ; Expire
                                3600            ; Min TTL
                                )
                IN      NS      localhost.
localhost.      IN      A 127.0.0.1

Reverse zone file

; filename 127.0.0.rev
$TTL 345600
@               IN      SOA     localhost. hostmaster.localhost.        (
                                2015112401      ; Serial
                                3600            ; Refresh
                                900             ; Retry
                                3600000         ; Expire
                                3600            ; Min TTL
                                )
                IN      NS      localhost.
1               IN      PTR     localhost.

We also had a chance to "get our hands dirty" setting up a domain name with a DNSSEC key.

At the end of the training we were given a brief explanation of DNS hardening. In short, the recommendations are as follows:

  • Isolate DNS from other service
  • Run named as non-root
  • Hide BIND version
  • Use logging
  • Control zone transfer and queries
  • Prevent DNS server from responding to DNS traffic from certain networks
  • Patch BIND whenever a patch is available or when current bind version has vulnerabilities

In conclusion, we were given a solid exposure to the introductory parts of DNS. Honestly, two days are not enough to cover everything in detail, but the training was well done and gave attendees a good start for further self-study and experimentation.

At End Point we are experienced in hosting our own and our clients' DNS services using BIND (including in more exotic split-horizon setups) and nsd, and in using common SaaS DNS providers such as UltraDNS, SoftLayer, Amazon Web Services Route 53, Hurricane Electric, etc.

DNS has largely become an unseen commodity service to Internet users in general, but that makes it all the more important to have the skill to handle DNS changes safely, to accommodate the occasional need for unusual configurations, and to cope with DDoS attacks such as the recent major attack on Dyn.


published by noreply@blogger.com (Selvakumar Arumugam) on 2016-11-16 11:00:00 in the "database" category
The JPA Repository is a useful Spring Framework library that provides object-relational mapping, allowing Java web applications to connect to a wide variety of databases. Most applications need to establish a connection with a single database to store and retrieve data, though sometimes more than one database must be read from and written to. There are also cases where the application needs to choose which database to use dynamically, based on each request's parameters. Let's see how to configure and establish connections for these three cases.

1. Single Static Connection

In order to use JPA, the following configuration is required to get the database connection handle, and an interface extending JpaRepository is defined to map a database table.

UserRepository.java - this part of the code configures how to map the user table
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository <User, Integer> {
}
persistent-context.xml - the dataSourceReadWrite bean defines the database connection, while the entityManagerFactoryReadWrite bean provides database access for the entities under the base package com.domain
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="entityManagerFactoryReadWrite" transaction-manager-ref="transactionManager">
</jpa:repositories>

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />

<bean id="dataSourceReadWrite" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />

<bean abstract="true" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="abstractEntityManagerFactory" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
</bean>

<bean id="entityManagerFactoryReadWrite" p:datasource-ref="dataSourceReadWrite" p:persistenceunitname="readWritePU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSourceReadWrite" p:entitymanagerfactory-ref="entityManagerFactoryReadWrite" />
...
UserController.java - the userRepository object shows how the single static database configuration is used to fetch a User record
@Api(name = "User", description = "User API Service")
@Controller
public class UserController {

  @Autowired
  private UserRepository userRepository;


  @ApiMethod(
    description = "Return the user object using the userId in the URL",
    produces = {MediaType.APPLICATION_JSON_VALUE},
    roles = {"read", "write"}
  )
  @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
  @ResponseBody
  public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId) throws ServiceException {
    User user = (userRepository.findOne(userId));
    if (user != null) {
      return new UserModel(user);
    }
    else {
      throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
    }
  }
}

2. Multiple Static Connections

In some cases, we may need to connect to more than one database in our application. Usually there is a primary database and a secondary one that syncs data from the primary, most likely a read-only replica used for load balancing. In this case the application needs to be configured to establish connections with two different datasources.

To achieve this result it's possible to define ReadWrite and ReadOnly datasources in the spring configuration and then declare the specific Repository classes for each specific datasource.

UserRepository.java - ReadWrite repository definition under the package com.domain.data
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Integer> {
}
UserReadonlyRepository.java - ReadOnly repository definition under the package com.domain.data.readonly
package com.domain.data.readonly;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserReadonlyRepository extends JpaRepository<User, Integer> {
}
persistent-context.xml - this file defines two different datasources (dataSourceReadWrite and dataSourceReadOnly), while the jpa:repositories elements specify the repository package path for each
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="entityManagerFactoryReadWrite" transaction-manager-ref="transactionManager">
  <repository:exclude-filter expression="com.domain.data.readonly" type="regex"></repository:exclude-filter>
</jpa:repositories>

<jpa:repositories base-package="com.domain.data.readonly" entity-manager-factory-ref="entityManagerFactoryReadOnly" transaction-manager-ref="transactionManagerReadOnly">

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />
<bean id="dataSourceReadWrite" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />
<bean id="dataSourceReadOnly" p:password="${jdbc.readonly.password}" p:url="${jdbc.readonly.url}" p:username="${jdbc.readonly.username}" parent="abstractDataSource" />
<bean abstract="true" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="abstractEntityManagerFactory" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
</bean>

<bean id="entityManagerFactoryReadWrite" p:datasource-ref="dataSourceReadWrite" p:persistenceunitname="readWritePU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean id="entityManagerFactoryReadOnly" p:datasource-ref="dataSourceReadOnly" p:persistenceunitname="readOnlyPU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSourceReadWrite" p:entitymanagerfactory-ref="entityManagerFactoryReadWrite" />

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManagerReadOnly" p:datasource-ref="dataSourceReadOnly" p:entitymanagerfactory-ref="entityManagerFactoryReadOnly" />
...
UserController.java - in this definition it's interesting to note the readOnly flag, which determines whether the connection is made to the ReadWrite or the ReadOnly database, based on its value
@Api(name = "User", description = "User API Service")
@Controller
public class UserController {

  @Autowired
  private UserRepository userRepository;
  @Autowired
  private UserReadonlyRepository userReadOnlyRepository;

  @ApiMethod(
  description = "Return the user object using the userId in the URL",
  produces = {MediaType.APPLICATION_JSON_VALUE},
  roles = {"read", "write"}
  )
  @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
  @ResponseBody
  public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId, @ApiParam(name = "readOnly", description = "Param to set data source to read from") Boolean readOnly) throws ServiceException {
    User user = (readOnly ?
    userReadOnlyRepository.findOne(userId) : userRepository.findOne(userId));

    if (user != null) {
      return new UserModel(user);
    }
    else {
      throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
    }
  }
}

3. Multiple Dynamic Connections

Recently there was an application that needed to choose its database during API request processing. Defining multiple datasources and choosing among them with hard-coded checks in the code is really cumbersome. Instead, it's possible to combine JPA repositories with Spring's AbstractRoutingDataSource, which overrides the datasource lookup dynamically when a request is sent to the application.

UserRepository.java - defines mapping to the user table
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Integer> {
}
persistence-context.xml - the dataSourceRootDB and dataSourceLiveDB beans define two different databases. MultitenantRoutingDataSource holds the datasources available to choose from dynamically in the code
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="genericEntityManagerFactory" transaction-manager-ref="transactionManager">
</jpa:repositories>

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />

<bean id="dataSourceRootDB" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />

<bean id="dataSourceLiveDB" p:password="${jdbc.livedb.password}" p:url="${jdbc.livedb.url}" p:username="${jdbc.livedb.username}" parent="abstractDataSource" />

<bean class="com.domain.route.MultitenantRoutingDataSource" id="dataSource">
  <property name="targetDataSources">
    <map key-type="java.lang.String">
      <entry key="rootdb" value-ref="dataSourceRootDB"></entry>
      <entry key="livedb" value-ref="dataSourceLiveDB"></entry>
    </map>
  </property>
  <property name="defaultTargetDataSource" ref="dataSourceRootDB">
  </property>
</bean>

<bean class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="genericEntityManagerFactory" p:datasource-ref="dataSource" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter" p:persistenceunitname="readWriteDynamicPU">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
  <property name="packagesToScan">
    <list>
      <value>com.data.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSource" p:entitymanagerfactory-ref="genericEntityManagerFactory" />
...
UserController.java - this class chooses the datasource dynamically based on the request and calls the service to complete the action
...
@Api(name = "User", description = "User API Service") 
@Controller public class UserController {
    @Autowired     private UserService userService;

    @ApiMethod(
            description = "Return the user object using the userId in the URL",
            produces = {MediaType.APPLICATION_JSON_VALUE},
            roles = {"oms-read", "oms-write"}
    )
    @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
    @ResponseBody
    public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId, @RequestHeader(value="X-Database", defaultValue= DatabaseEndpointUtils.ROOT_DB, required=false) String databaseEndpoint) throws ServiceException {
        MultiTenantContext.setCurrentTenant(databaseEndpoint);
        return userService.getUser(userId, true);
    }
}
...
MultiTenantContext.java - this class stores the datasource key chosen for the current request from the controller
package com.domain.common;

import com.domain.util.DatabaseEndpointUtils;
import com.domain.supplychain.app.ws.exceptions.InvalidDatabaseEndpointException;
import com.domain.exceptions.ServiceException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MultiTenantContext {
    private static ThreadLocal<Object> currentTenant = new ThreadLocal<>();

    public static Logger logger = LoggerFactory.getLogger(MultiTenantContext.class.getName());
    public static void setCurrentTenant(Object tenant) throws ServiceException {
        logger.info("MultiTenantContext setCurrentTenant: [{}]", tenant);
        if(DatabaseEndpointUtils.isValidEndpoint(tenant.toString())) {
            currentTenant.set(tenant);
        } else {
            throw new InvalidDatabaseEndpointException("Invalid database endpoint");
        }
    }

    public static Object getCurrentTenant() {
        logger.info("MultiTenantContext getCurrentTenant: [{}]", currentTenant.get());
        return currentTenant.get();
    }

}
MultitenantRoutingDataSource.java - here is the definition that determines which datasource the connection is established against; specifically, it returns the lookup key that was set earlier based on the request parameters
package com.domain.route;

import com.domain.common.MultiTenantContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class MultitenantRoutingDataSource extends AbstractRoutingDataSource {

    private Logger logger = LoggerFactory.getLogger(MultitenantRoutingDataSource.class.getName());
    @Override
    protected Object determineCurrentLookupKey() {
        logger.info("MultitenantRoutingDataSource determineCurrentLookupKey: [{}]", MultiTenantContext.getCurrentTenant());
        return MultiTenantContext.getCurrentTenant();
    }

}
DefaultUserService.java - Fetch the user data from the dynamically chosen database.
@Service
public class DefaultUserService implements UserService {

    @Autowired
    private UserRepository userRepository;

    @Override
    @Transactional
    public UserModel getUser(Integer userId, boolean readOnly) throws ServiceException {
        User user = (userRepository.findOne(userId));
        if (user != null) {
            return new UserModel(user);
        }
        else {
            throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
        }
    }
}

Conclusion

The application can establish database connections through any of these methods, depending on the requirements. Single or multiple static connections are the most common approach, but when the database must be chosen dynamically per request, the AbstractRoutingDataSource class in the Spring framework offers an elegant way to implement that functionality, as explained above.

published by noreply@blogger.com (Elizabeth Garrett Christensen) on 2016-11-15 18:10:00 in the "company" category

Over Labor Day weekend I married another End Point employee, David Christensen. I thought I'd take a minute to reflect on life as a co-working couple. In the days before everyone worked in a mad scramble to pay off their student loans, save for their kids' college, and save for retirement, lots of couples shared in the responsibilities of owning a business or farm. Today for most families those days are long gone and each spouse goes off to a long day at the office to meet back at home in the evening.

David and I are really fortunate to work at End Point and work remotely from our home in Lawrence, Kansas. David is a veteran at End Point starting as an application developer a decade ago and now is a project manager and heads up many of End Point's larger sales, database, and VR projects. I am brand-new to End Point and serve as the Client Liaison doing billing, client support, sales, and project management.

Our home office in Lawrence

What I love

Being together all the time

Like any newlywed, I cannot get enough of this guy. He's easy to talk to, fun to be around, and pretty much makes everything better. But enough of that sappiness...

Getting some real insight on personality issues or conflicts

Working with someone that knows you really well and who can see your perspective can be really valuable, especially if you are navigating conflict or a political/personal issue, or just generally need some advice. I have always found deep friendships at work to help me work through these issues and having a spouse around has been great for this.

Getting up-to-speed on company culture

Having another person, especially one who's been at a company for a long time, can really help you understand how things work, how each person or manager thinks, what to do, and what not to do. Who doesn't want that kind of leg up at a new job?

A proofreader

I always like to have a work buddy. Having a spouse next to you to proofread your work, make sure you aren't doing something crazy, and just generally keep an eye on things is really handy. Plus we can help remind each other of priorities and calendar appointments.

Near the End Point NYC Office

Things that take work

Not over-talking

I can be a real chatterbox and so can David. We have to be careful not to over-talk about work. You can quickly burn an hour or two re-hashing a meeting or a project. It takes work and setting boundaries to not overanalyze every work situation.

Setting boundaries

David and I realized that once we were working together, living together, and raising kids together, everything that people experience as separate parts of their lives had now run into one long day called life. We had virtually no separation between work, the house, or our relationship. We created artificial boundaries for work time and try really hard not to talk about work on our personal time unless it's important and vice-versa.

Individual space

When you know so much about a person's workload and personal life, you can sometimes be too helpful. You have to remember to give your spouse the space to be the creative, awesome person you know and love; not micro-manage their day.

Social limitations

By far the biggest challenge for me is the social limitations that working from home with a spouse inevitably creates. I used to work for a company of 400, so co-working from home with only one other person is quite a change. I'm still looking to fill my social needs in other ways. The upside here is that I've become really close with my neighbors.

Thinking about becoming a co-working couple?

I'm pretty new at this, so here are just a couple of notes. Give me a few years and maybe I'll have more!

Putting all your eggs in one basket

One thing that does take some planning and thought is putting all your financial eggs in one basket. Working for the same company is a little riskier than diversifying your paychecks. David and I felt that since we both have marketable skills that would be useful anywhere, this wasn't a deal-breaker for us.

Find a supportive company

Obviously, the biggest hurdle is finding a company that supports having co-working couples and is willing to work with you through it. End Point is a great place to work with a very family-friendly atmosphere which has made this part really easy on us.

Find unique job roles that you'll enjoy

This is true for starting any new job at all, but if you are thinking of starting to co-work with your spouse, make sure this job is something you'll be good at and will enjoy. There's more than just your reputation at stake here.


And with that, I'd better go see what David has been up to...





published by noreply@blogger.com (Kamil Ciemniewski) on 2016-11-02 17:56:00 in the "database" category
Some time ago I was working on customizing a WordPress system for a client. The system was running in a multi-site mode, being a host of a large number of blogs.
Because some blogs had not been updated in a long while, we wanted to pull information about recent posts from all of the blogs. This in turn was going to be used for pruning any blogs that weren't considered 'active'.
While the above description may sound simple, the scale of the system made the task a bit more involved than it would usually be.

How WordPress handles the "multi-site" scenario

The goal of computing a summary of posts for many blogs residing in a hypothetical blogging platform, all within the same database, doesn't seem complicated. Tasks like that are performed all the time with relational databases.
The problem in WordPress arises, though, because of the very unusual way it organizes blog data. Let's first see what the database tables look like in the "normal" mode:
It has a number of tables that start with a user-configurable prefix. In the case of the screenshot above, the prefix was wp_.
We can see there's a wp_posts table which contains rows related to blog posts. Thinking about a multi-blog setup, one would expect some kind of blog_id column in the wp_posts table. Selecting data for a given blog would still be a cinch, and it would be very performant after adding an index on that column.
Instead, this is what we get when setting up WordPress in multi-site mode:
WordPress just creates a new set of the tables we saw before, appending the blog's index to the table prefix! Instead of a nice, clean, easy-to-use wp_posts table with a blog_id column, we get a separate set of tables for each blog: wp_1_posts, wp_2_posts, and so on. Why does it matter so much? Just try to get the count of posts in each blog in one query: it's impossible with such a table layout. Getting that information means querying each table separately, so with every new blog in the system the cost of running that sequence of queries keeps growing. This is the classic N+1 problem. Bad WordPress! Bad!
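To make the pain concrete, here is roughly what counting published posts looks like with per-blog tables. This is just a sketch assuming the default wp_ prefix and three blogs; every additional blog means yet another query:
SELECT COUNT(*) FROM wp_1_posts WHERE post_type = 'post' AND post_status = 'publish';
SELECT COUNT(*) FROM wp_2_posts WHERE post_type = 'post' AND post_status = 'publish';
SELECT COUNT(*) FROM wp_3_posts WHERE post_type = 'post' AND post_status = 'publish';
-- ...and so on, one query per blog, because there is no shared posts table to aggregate over.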

The approach around it

The count of posts for each blog needed to be computed very quickly in my case. The system consisted of hundreds of different mini-blogs, and the stats were to be shown in the admin dashboard. Obviously, making admins wait a long time for the page to load wasn't an option.
I went the route of creating an additional database table holding the number of posts for each blog. This table was then updated upon each post creation, update, and deletion, as well as upon blog removal.
WordPress has a helper function for ensuring the table is created in the database: you feed it the DDL containing the definition of the table and it makes sure it is present in the database.
I created a function that fires on each plugin class instantiation, while making that class a singleton. Here's the code that makes the plugin class a singleton:
class BlogsPruner
{
  private static $instance;

  private function __construct()
  {
    $this->ensureDatabaseSetup();
    $this->ensureBlogStatsGetUpdated();
    // ... other initialization here
  }

  public static function singleton()
  {
    if(!isset(self::$instance))
    {
      $_class = __CLASS__;
      self::$instance = new $_class;
    }
    return self::$instance;
  }

  public function __clone()
  {
    trigger_error('Cannot clone the pruner plugin', E_USER_ERROR);
  }

  // ... rest of the class
}

BlogsPruner::singleton();
The next step was to implement the function ensuring there's a stats table in the database:
function ensureDatabaseSetup()
  {
    global $wpdb;
    $tableName = $this->blogStatsTableName();
    $charsetCollate = $wpdb->get_charset_collate();

    $sql = "CREATE TABLE $tableName (
            blog_id bigint(20),
            count_posts int(2),
            UNIQUE KEY blog_id (blog_id)
    ) $charsetCollate;";

    require_once( ABSPATH . 'wp-admin/includes/upgrade.php' );
    dbDelta( $sql );
    $this->ensureBlogStatsUpdated(); // presumably populates the stats table; its definition is not shown in this excerpt
  }
This code uses a helper function to correctly construct the name for the table:
function blogStatsTableName()
{
  global $wpdb;
  return $wpdb->base_prefix . 'blog_stats';
}
This made sure the table was using the correct prefix, just like all the other tables in the database.
Now, I needed to ensure the stats were updated upon each post change:
function ensureBlogStatsGetUpdated()
{
  add_action('save_post', array($this, 'onPostUpdated'));
  add_action('delete_post', array($this, 'onPostDeleted'));
}

function onPostUpdated($postId)
{
  global $blog_id;
  $post = get_post($postId);
  if(wp_is_post_revision($postId) || $post->post_status == 'auto-draft')
  {
    return;
  }
  $this->updateBlogStats($blog_id);
}

function onPostDeleted()
{
  global $blog_id;
  $this->updateBlogStats($blog_id);
}

function updateBlogStats($blogId)
{
  $count = $this->getBlogUserCreatedPostsCount($blogId);
  $this->updateBlogPostsCount($blogId, $count);
}

// Here we're specifically not including the post that is auto-created
// upon the blog creation:
function getBlogUserCreatedPostsCount($blogId)
{
  global $wpdb;
  $sql = "SELECT
            COUNT(DISTINCT `wp_" . $blogId . "_posts`.`id`) AS count_user_posts
          FROM `wp_blogs`
          INNER JOIN `wp_" . $blogId . "_posts`
                  ON `wp_blogs`.`blog_id` = $blogId
          WHERE
            `wp_" . $blogId . "_posts`.`post_type` = 'post' AND
            `wp_" . $blogId . "_posts`.`post_status` = 'publish' AND
            TIMESTAMPDIFF(SECOND, `wp_" . $blogId . "_posts`.`post_date`, `wp_blogs`.`last_updated`) > 60";
  $row = $wpdb->get_row($sql);
  return intval($row->count_user_posts);
}

function updateBlogPostsCount($blogId, $count)
{
  global $wpdb;
  $data = array('count_posts' => $count);
  $where = array('blog_id' => $blogId);
  $wpdb->update($this->blogStatsTableName(), $data, $where);
}
The actual production plugin implemented many more features than this sample code demonstrates. It listed the blogs that could be considered stale, automatically pruned them after a time period specified in the admin screen, and allowed admins to configure everything via the WordPress interface. The full set of features is beyond the scope of this post.
The overall result made getting statistics about a very large set of blogs very fast. The cost of querying for the number of posts in each blog was moved to small, incremental updates upon each post being created, modified, or removed. For the end user, this cost was imperceptible.
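For example, a dashboard query for stale blogs becomes a single cheap lookup against the stats table. A sketch, assuming the default wp_ base prefix (so the table is named wp_blog_stats) and treating "no user-created posts" as stale:
SELECT blog_id
  FROM wp_blog_stats
 WHERE count_posts = 0;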

Final thoughts

WordPress is loved by many users. If you're not just a user but also working with the code, though, there are a number of traps you may fall into. If I had stuck to the techniques usually advised as the "WordPress default/preferred" way, I'd have ended up with a very unhappy client owning a very broken WordPress system. Fortunately, the set of WordPress tables isn't cast in stone and you can freely extend it, as long as you're cautious and know what you're doing. Provided those two prerequisites are met, WordPress is just a database-backed platform like any other.

published by noreply@blogger.com (Greg Sabino Mullane) on 2016-10-26 19:09:00 in the "postgres" category

Postgres has a wonderfully helpful (but often overlooked) feature called the connection service file (its documentation is quite sparse). In a nutshell, it defines connection aliases you can use from any client. These connections are given simple names, which then map behind the scenes to specific connection parameters, such as host name, Postgres port, username, database name, and many others. This can be an extraordinarily useful feature to have.

The connection service file is named pg_service.conf and is set up in a known location. The entries inside are in the common "INI file" format: a named section, followed by its related entries below it, one per line. To access a named section, just use the service=name string in your application.

## Find the file to access by doing:
$ echo `pg_config --sysconfdir`/pg_service.conf
## Edit the file and add a section that looks like this:
[foobar]
host=ec2-76-113-77-116.compute-2.amazonaws.com
port=8450
user=hammond
dbname=northridge

## Now you can access this database via psql:
$ psql service=foobar

## Or in your Perl code:
my $dbh = DBI->connect('dbi:Pg:service=foobar');

## Other libpq based clients are the same. JDBC, you are out of luck!

So what makes this feature awesome? First, it can save you from extra typing. No more trying to remember long hostnames (or copying and pasting them). Second, it is better than a local shell alias, as the service file can be made globally available to all users. It also works similarly to DNS, in that it insulates you from the details of your connections. Your hostname has changed because of a failover? No problem: just edit the one file, and no clients need to change a thing.

As seen above, the format of the file is simple: a named section, followed by connection parameters in a name=value format. Among the connection parameters one may use, the most common and useful are host, port, user, and dbname. Although you can set a password, I recommend against it, as that belongs in the more secure, per-user .pgpass file.
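For reference, each .pgpass line uses the colon-separated format hostname:port:database:username:password, and the file must be readable only by you (chmod 600) or libpq will ignore it. An entry matching the foobar service above might look like this, with a placeholder password:

ec2-76-113-77-116.compute-2.amazonaws.com:8450:northridge:hammond:not_my_real_password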

The complete list of what may be set can be found in the middle of the database connection documentation page. Most of them will seldom, if ever, be used in a connection service file.

The connection service file is not just limited to basic connections. You can have sections that only differ by user, for example, or in their SSL requirements, making it easy to switch things around by a simple change in the service name. It's also handy for pgbouncer connections, which typically run on non-standard ports. Be creative in your service names, and keep them distinct from each other to avoid fat fingering the wrong one. Comments are allowed and highly encouraged. Here is a slightly edited service file that was recently created while helping a client use Bucardo to migrate a Postgres database from Heroku to RDS:

## Bucardo source: Heroku
[bsource]
host=ec2-does.not.compute-1.amazonaws.com
user=marvin
dbname=zaphod
connect_timeout=10

## Bucardo target: RDS
[btarget]
host=cheshire.cat.us-east-2.rds.amazonaws.com
user=alice
dbname=wonderland
connect_timeout=10

## Test database on RDS
[gregtest]
host=panther.king.us-east-2.rds.amazonaws.com
user=conker
dbname=chocolate
connect_timeout=10

## Hot standby used for schema population
[replica1]
host=ec2-66-111-116-66.compute-1.amazonaws.com
user=scott
dbname=tiger
connect_timeout=10

You may notice above that "connect_timeout" is repeated in each section. Currently, there is no way to set a parameter that applies to all sections, but it's a very minor problem. I also usually set the environment variable PGCONNECT_TIMEOUT to 10 in my .bashrc, but putting it in the pg_service.conf file ensures it is always set regardless of which user I am.
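For completeness, the environment variable fallback is a one-liner in your shell startup file:

## In ~/.bashrc: give up on unresponsive connection attempts after 10 seconds
export PGCONNECT_TIMEOUT=10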

One of the trickier parts of using a service file can be figuring out where the file should be located! Postgres will check for a per-user service file (~/.pg_service.conf in your home directory) and then for a global file. I prefer to always use the global file, as it allows you to switch users with ease and maintain the same aliases. By default, the location of the global Postgres service file is /usr/local/etc/postgresql/pg_service.conf, but in most cases this is not where you will find it, as many distributions specify a different location. Although you can override the location of the file with the PGSERVICEFILE environment variable, and the directory holding the pg_service.conf file with the PGSYSCONFDIR environment variable, I do not like relying on those. One less thing to worry about by simply using the global file.

The location of the global pg_service.conf file can be found by using the pg_config program and looking for the SYSCONFDIR entry. Annoyingly, pg_config is not installed by default on many systems, as it is considered part of the "development" packages (which may be named postgresql-devel, libpq-devel, or libpq-dev). While using pg_config is the best solution, there are times it cannot be installed (e.g. when working on an important production box, or when you simply do not have root). While you can probably discover the right location through some simple investigation and trial-and-error, where is the fun in that? Here are two other methods to determine the location using nothing but psql and some standard Unix tools.

When you invoke psql with a request for a service file entry, it has to look for the service files. We can use this information to quickly find the expected location of the global pg_service.conf file. If you have the strace program installed, just run psql through strace, grep for "pg_service", and you should see two stat() calls pop up: one for the per-user service file, and one for the global service file we are looking for:

$ strace psql service=foobar 2>&1 | grep service.conf
stat("/home/greg/.pg_service.conf", 0x3526366F6637) = -1 ENOENT (No such file or directory)
stat("/var/opt/etc/postgres/pg_service.conf", 0x676F746F3131) = -1 ENOENT (No such file or directory)

What if strace is not installed? Well, perhaps gdb (the GNU debugger) can help us out:

$ gdb -q --args psql service=foobar
Reading symbols from psql...(no debugging symbols found)...done.
(gdb) start
Temporary breakpoint 1 at 0x435356
Starting program: /usr/local/bin/psql service=foobar
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.4".

Temporary breakpoint 1, 0x4452474E4B4C5253 in main ()
(gdb) catch syscall stat
Catchpoint 2 (syscall 'stat' [4])
(gdb) c
Continuing.

Catchpoint 2 (call to syscall stat), 0x216c6f65736a6f72 in __GI___xstat (vers=, name=0x616d7061756c "/usr/local/bin/psql", buf=0x617274687572)
    at ../sysdeps/unix/sysv/linux/wordsize-64/xstat.c:35
35      return INLINE_SYSCALL (stat, 2, name, buf);
(gdb) c 4
Will ignore next 3 crossings of breakpoint 2.  Continuing.

Catchpoint 2 (call to syscall stat), 0x37302B4C49454245 in __GI___xstat (vers=, name=0x53544F442B4C "/var/opt/etc/postgres/pg_service.conf", buf=0x494543485445)
    at ../sysdeps/unix/sysv/linux/wordsize-64/xstat.c:35
35      return INLINE_SYSCALL (stat, 2, name, buf);
(gdb) quit

The use of a connection service file can be a nice addition to your tool chest, especially if you find yourself connecting from many different accounts, or if you just want to abstract away all those long, boring host names!


published by Eugenia on 2016-10-23 21:12:28 in the "Filmmaking" category
Eugenia Loli-Queru

I watched the trailers for The Hobbit at 48fps and Avatar at 60fps (as their directors wished), and I really don’t like the smoothness. Originally, in principle, I was a high frame rate advocate for cinema movies (thinking that 24fps is simply a relic of a century-old tech requirement); however, now I’m against upgrading the temporal velocity for regular film. The difference between my old opinion and my new one is that now I know why 48 or 60 fps don’t work as well as 24 fps. I had to become a collage artist to understand why.

The reason is that a high frame rate becomes extremely distracting. When there are more frames, there is far more information on the screen. The eye and the brain get way too tired trying to follow and analyze it all (they follow it by default; you can’t turn off that natural process). Because of that, the brain runs out of steam to follow the story fully, and so the movie fails because the story doesn’t shine through.

The same is true for color: if you look at Hollywood color grading, apart from black & white, only one or two more color families are actively visible. For example, you will get red, yellow, and that teal color that covers both green and blue. Basically, the fewer color families on screen, the less processing the brain has to do, resulting in the viewer dipping into the story more.

It’s the same in collage: the fewer the color families used, the more successful a collage is. Otherwise, it looks like a mess.

Having said all that, there is a future for high frame rate (and more colors), but that’s only in VR, or in futuristic systems where the image is projected directly on the brain. Then, yes, there’s a requirement for “more”, since the whole point of VR is to “fool” the brain about the reality it displays.

But for TV & film, which are projected away from ourselves and perceivable by only one of our senses (so they must provide less information in order to be processed fast enough), "less is more". That’s why 24fps is here to stay for these mediums in particular.


published by noreply@blogger.com (Szymon LipiƄski) on 2016-10-18 12:00:00 in the "Java" category

While working with one of our clients, I was tasked with integrating a Java project with a Perl project. The Perl project is a web application which has a specific URL for the Java application to use. To ensure that the URL is called only from the Java application, I wanted to send a special hash value calculated using the request parameters, a timestamp, and a secret value.

The Perl code calculating the hash value looks like this:

use strict;
use warnings;

use LWP::UserAgent;
use Digest::HMAC_SHA1;
use Data::Dumper;

my $uri = 'param1/param2/params3';

my $ua = LWP::UserAgent->new;
my $hmac = Digest::HMAC_SHA1->new('secret_something');

my $ts = time;

$hmac->add($_) for (split (m{/}, $uri));
$hmac->add($ts);

my $calculated_hash = $hmac->hexdigest;

My first try for calculating the same hash in the Java code looked something like this (without class/package overhead):

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Hex;

public String calculateHash(String[] values) throws NoSuchAlgorithmException, UnsupportedEncodingException, InvalidKeyException {

    java.util.Date date = new java.util.Date();
    // Unix timestamp in seconds (divide before narrowing to int to avoid overflow)
    Integer timestamp = (int) (date.getTime() / 1000);

    Mac mac = Mac.getInstance("HmacSHA1");

    SecretKeySpec signingKey = new SecretKeySpec("secret_something".getBytes(), "HmacSHA1");
    mac.init(signingKey);

    for (String value : values) {
        mac.update(value.getBytes());
    }

    // First attempt: feed the raw big-endian bytes of the integer into the HMAC.
    // (Integer has no getBytes() method, so this is a plausible reconstruction of
    // the original code; either way, it is not what the Perl side hashes.)
    mac.update(ByteBuffer.allocate(4).putInt(timestamp).array());
    byte[] rawHmac = mac.doFinal();
    byte[] hexBytes = new Hex().encode(rawHmac);
    return new String(hexBytes, "UTF-8");
}

The code looks good and successfully calculated a hash. However, using the same parameters for the Perl and Java code, they were returning different results. After some debugging, I found that the only parameter causing problems was the timestamp. My first guess was that the problem was caused by the use of Integer as the timestamp type instead of some other numeric type. I tried a few things to get around that, but none of them worked.

Another idea was to check why it works for the String params, but not for Integer. I found that Perl treats the timestamp as a string and passes a string to the hash calculating method, so I tried emulating this by converting the timestamp into a String before using the getBytes() method:

import java.io.UnsupportedEncodingException;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Hex;

public String calculateHash(String[] values) throws NoSuchAlgorithmException, UnsupportedEncodingException, InvalidKeyException {

    java.util.Date date = new java.util.Date();
    // Unix timestamp in seconds (divide before narrowing to int to avoid overflow)
    Integer timestamp = (int) (date.getTime() / 1000);

    Mac mac = Mac.getInstance("HmacSHA1");

    SecretKeySpec signingKey = new SecretKeySpec("secret_something".getBytes(), "HmacSHA1");
    mac.init(signingKey);

    for (String value : values) {
        mac.update(value.getBytes());
    }

    // Hash the decimal string form of the timestamp, matching what Perl's
    // Digest::HMAC_SHA1 sees when it is handed the value of time().
    mac.update(timestamp.toString().getBytes());
    byte[] rawHmac = mac.doFinal();
    byte[] hexBytes = new Hex().encode(rawHmac);
    return new String(hexBytes, "UTF-8");
}

This worked perfectly, and there were no other problems with calculating the hash in Perl and Java.


published by noreply@blogger.com (Kirk Harr) on 2016-10-17 11:00:00 in the "Chef" category

Many Containers, Many Build Systems

Modern container systems like Docker, Kubernetes, and Mesosphere each provide methods for building your applications into their containers. However, each build process is specific to that container system, and running similar applications across several container environments would require maintaining each container's build environment. To approach this problem for multiple container environments, Chef Software created a tool that unifies these build systems and creates container-agnostic builds which can be exported into any of the containers. This tool is called Habitat, and it also provides some pre-built images to get applications started quickly.

I recently attended a Habitat Hack event locally in Portland (Oregon), which helped me get more familiar with the system and its capabilities. We worked together in teams to take a deeper dive into various aspects of how Habitat works; you can read about our adventures over on the Chef blog.

To examine how the various parts of the build environment work, I picked an example Node.js application from the Habitat Documentation to build and customize.

Node.js Application into a Docker Container

For the most basic Habitat build, you must define a plan.sh file which will contain all the build process logic as well as all configuration values to define the application. Within my Node.js example, this file contains this content:

pkg_origin=daehlie
pkg_name=mytutorialapp
pkg_version=0.3.0
pkg_maintainer="Kirk Harr <kharr@endpoint.com>"
pkg_license=()
pkg_source=https://s3-us-west-2.amazonaws.com/${pkg_name}/${pkg_name}-${pkg_version}.tar.gz
pkg_shasum=e4e988d9216775a4efa4f4304595d7ff31bdc0276d5b7198ad6166e13630aaa9
pkg_filename=${pkg_name}-${pkg_version}.tar.gz
pkg_deps=(core/node)
pkg_expose=(8080)

do_build() {
  npm install
}

do_install() {
  cp package.json ${pkg_prefix}
  cp server.js ${pkg_prefix}

  mkdir -p ${pkg_prefix}/node_modules/
  cp -vr node_modules/* ${pkg_prefix}/node_modules/
}

Within this are defined all the application details: the maintainer, the version of the application being packaged, and the package name. Each package can also declare a license for the code in use, any code dependencies such as the Node.js application server (core/node), and the source URL for locating the application files. There are also two callback functions, do_build() and do_install(), which run npm install to build the dependencies and then copy the application files into the package prefix.

Additionally, to define this application we must provide the logic for how to start the Node application server, along with configuration for what port to listen on and the message to be displayed once it has started. To do so we create a stub Node.js config.json which provides the port and message values:

{
    "message": "Hello, World",
    "port": "8080"
}

We also need two hooks, which will be executed at package install time and at runtime, respectively. These are named init and run in our case: init sets up the symbolic links to the various Node.js components from the core/node package included in the build, and run provides the entry point for the application, effectively starting the npm application server. Just like with a Dockerfile, any additional logic needed during the process would be included in these two hooks, depending on whether the logic is specific to install time or run time.
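As a rough illustration only (this is not copied from the tutorial; the {{pkg.path}} and {{pkg.svc_path}} template variables and the npm start script are assumptions on my part), the two hooks could look something like this:

hooks/init:

#!/bin/sh
# Link the packaged application files into the running service directory
ln -sf {{pkg.path}}/package.json {{pkg.svc_path}}/package.json
ln -sf {{pkg.path}}/server.js {{pkg.svc_path}}/server.js
ln -sf {{pkg.path}}/node_modules {{pkg.svc_path}}/node_modules

hooks/run:

#!/bin/sh
# Start the Node.js application server; assumes package.json defines a start script
cd {{pkg.svc_path}}
exec npm start 2>&1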

Injected Configuration Values

In this example, both the message to be displayed to the user and the port that the Node.js application server will listen on are hard-coded into our build, so all the images resulting from it would be identical. To allow some customization of the resulting image, you can replace the hard-coded values in the Node.js config.json with variables which can be filled in during the build process:

{
    "message": "{{cfg.message}}",
    "port": "{{cfg.port}}"
}

To complete the replacement, we provide a "Tom's Obvious, Minimal Language" (.toml) file which has a key-value pair for each of the configuration variables we want to set. This .toml file is interpreted during each build to populate those values, creating an opportunity to customize our builds by injecting specific values into the variables defined in the application configuration. Here is the syntax used in this example:

# Message of the Day
message = "Hello, World of Habitat"

# The port number that is listening for requests.
port = 8080

Conclusions

Habitat seeks to fill in the gaps between the various container formats for Docker, Kubernetes, and others by allowing common build infrastructure and dependency libraries to be unified in distribution. By utilizing the same build infrastructure, it becomes more feasible to have a hybrid environment with various container formats in use, without duplicating build infrastructure that performs the same task slightly differently right at the end to package the application into the proper container format. Habitat decouples the actual build process, and all that plumbing, from the process of exporting the build image into the proper format for whatever container is in use. In that way, as new container formats are developed, all that is required to accommodate them is expanding the export function for that new format, without any changes to the overall build process or customization of your code.
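As a concrete example of that export step: once the package above has been built inside the Habitat studio, producing a Docker image should come down to a single command (syntax from memory, so treat it as a sketch rather than gospel):

$ hab pkg export docker daehlie/mytutorialapp

The resulting image can then be run with Docker directly, while the same Habitat artifact remains available for other export targets as they appear.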


published by Eugenia on 2016-10-14 22:34:51 in the "Filmmaking" category
Eugenia Loli-Queru

I received my Moondog Labs anamorphic lens for mobile devices today and ran some tests. It is wonderful to be able to have such a wide field of view with a device like the iPod Touch 6th gen. As you can see in the picture, the image is way wider than when shooting in standard 1080p. But do not make the mistake of thinking that this is all the anamorphic lens’ doing. There are THREE factors that extended the field of view that much:

1. Shooting in 3k instead of 1080p (using the ProMovie app), the sensor gives you a completely different field of view. The crop of the sensor is smaller. This is the biggest hack you can do to get a wider field of view (it’s even bigger than the anamorphic lens hack!).

2. The anamorphic lens.

3. Turning off stabilization (which means that you must have some sort of other stabilizer at hand to shoot properly).

[Image: ipod-touch-crop]

Here’s the test video I shot today:

Here’s how I shot:

1. I used the ProMovie iOS app, which allows me to record at 3k (3200×1800 resolution). I used 100 mbps, at 24 fps. I locked the shutter speed to 1/48th, and then set the ISO and locked the exposure. I also set and locked focus and white balance. The ProMovie app even has an anamorphic screen view! I set stabilization to OFF (that’s why the video is very shaky). Obviously, when shooting something seriously, use a tripod or a stabilizer/gimbal.

2. When using the Moondog Labs anamorphic lens, once you apply the 1.333 aspect ratio in the project properties and on each clip (I use Sony Vegas), the effective resolution becomes 4267×1800 (that is, 3200 × 1.333).

3. Then I color graded this way, plus I added the FilmConvert filter with its “FJ H160 Pro” template, and also tweaked the template’s levels a bit.

4. Then I exported at exactly 3840×1620, at a 100 mbps bitrate (I exported no audio in my case). If using Sony Vegas, you must “disable resample” on all clips in the timeline before you export. Then I uploaded to YouTube. By the way, it is very important to export at the exact resolution stated above for 4k anamorphic; otherwise, people with ultra-wide monitors will get black bars on all four sides! The above resolution is ultra-wide UHD (3840 px wide).


published by noreply@blogger.com (Greg Sabino Mullane) on 2016-10-14 17:22:00

Comparing the schemas of two or more different Postgres databases is a common task, but can be tricky when those databases are running different versions of Postgres. The quick and canonical way to compare schemas is by using the exact same pg_dump program to query each database via the --schema-only option. This works great, but there are some gotchas, especially when dumping database views.

BACKGROUND

First some background as to how this issue was discovered. We have a client that is in the process of upgrading from Postgres 9.2 to Postgres 9.6 (the latest version as of this writing). Using the pg_upgrade program was not an option, because not only are data checksums going to be enabled, but the encoding is being moved to UTF-8. A number of factors, especially the UTF-8 change, meant that the typical upgrade process of pg_dump old_database | psql new_database was not possible. Thus, we have a very custom program that carefully migrates pieces over, performing some transformations along the way.

PROBLEM

As a final sanity check, we wanted to make sure the final schema for the upgraded 9.6 database was as identical as possible to the current production 9.2 database schema. When comparing the pg_dump outputs, we quickly encountered a problem with the way that views were represented. Version 9.2 uses a very bare-bones, single-line output, while 9.6 uses a multi-line pretty printed version. Needless to say, this meant that none of the views matched when trying to diff the pg_dump outputs.

The problem stems from the system function pg_get_viewdef(), which is used by pg_dump to give a human-readable and Postgres-parseable version of the view. To demonstrate the problem and the solution, let's create a simple view on a 9.2 and a 9.6 database, then compare the differences via pg_dump:

$ psql -p 5920 vtest -c 'create view gregtest as select count(*) from pg_class where reltuples = 0'
CREATE VIEW
$ psql -p 5960 vtest -c 'create view gregtest as select count(*) from pg_class where reltuples = 0'
CREATE VIEW
$ diff -u <(pg_dump vtest -x -p 5920 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only)

--- /dev/fd/70          2016-09-29 12:34:56.019700912 -0400
+++ /dev/fd/72          2016-09-29 12:34:56.019720902 -0400
@@ -2,7 +2,7 @@
 -- PostgreSQL database dump
 --
 
--- Dumped from database version 9.2.18
+-- Dumped from database version 9.6.0
 -- Dumped by pg_dump version 9.6.0
 
 SET statement_timeout = 0;
@@ -35,22 +35,14 @@
 --
 
 CREATE VIEW gregtest AS
-SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision);
+ SELECT count(*) AS count
+   FROM pg_class
+  WHERE (pg_class.reltuples = (0)::double precision);
 

The only difference other than the server version is the view, which does not match at all as far as the diff utility is concerned. (For purposes of this article, the minor ways in which schema grants are done have been removed from the output).

As mentioned before, the culprit is the pg_get_viewdef() function. Its job is to present the inner guts of a view in a sane, readable fashion. There are basically two adjustments it can make to this output: adding more parens, and adding indentation via whitespace. In recent versions, and despite what the docs allude to, the indentation (aka pretty printing) can NOT be disabled, and thus there is no simple way to get a 9.6 server to output a viewdef in a single line the way 9.2 does by default. To further muddy the waters, there are five versions of the pg_get_viewdef function, each taking different arguments:

  1. by view name
  2. by view name and a boolean argument
  3. by OID
  4. by OID and a boolean argument
  5. by OID with integer argument

In Postgres 9.2, the pg_get_viewdef(text,boolean) version will toggle indentation on and off, and we can see the default is no indentation:

$ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest')"
 SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision);

$ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest',false)"
 SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision);

$ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest',true)"
 SELECT count(*) AS count                        +
   FROM pg_class                                 +
  WHERE pg_class.reltuples = 0::double precision;

In Postgres 9.6, however, you are always stuck with the pretty indentation, regardless of which of the five function variations you choose and what arguments you give them! Here are the same function calls as above in version 9.6:

$ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest')"
  SELECT count(*) AS count
   FROM pg_class
  WHERE (pg_class.reltuples = (0)::double precision);

$ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest',false)"
  SELECT count(*) AS count
   FROM pg_class
  WHERE (pg_class.reltuples = (0)::double precision);

$ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest',true)"
  SELECT count(*) AS count
   FROM pg_class
  WHERE pg_class.reltuples = 0::double precision;

SOLUTIONS

When I first ran into this problem, the three solutions that popped into my mind were:

  1. Write a script to transform and normalize the schema output
  2. Modify the Postgres source code such that pg_get_viewdef changes its behavior
  3. Have pg_dump call pg_get_viewdef in a way that creates identical output

My original instinct was that a quick Perl script would be the easiest route overall. And while I eventually did get one working, it was a real pain to "un-pretty" the output, especially the whitespace and use of parens. A brute-force approach of simply removing all parens, brackets, and extra whitespace from the rule and view definitions almost did the trick, but the resulting output was quite ugly and hard to read, and there were still some lingering whitespace problems.

Approach two, hacking the Postgres source code, is actually fairly easy. At some point, the Postgres source code was changed such that all indenting is forced "on". A single character change to the file src/backend/utils/adt/ruleutils.c did the trick:

- #define PRETTYFLAG_INDENT    2
+ #define PRETTYFLAG_INDENT    0

Although this solution will clear up the indentation and whitespace, the parentheses are still different, and not as easily solved. Overall, not a great solution.

The third solution was to modify the pg_dump source code. In particular, it uses the pg_get_viewdef(oid) form of the function. By switching that to the pg_get_viewdef(oid,integer) form of the function, and giving it an argument of 0, both 9.2 and 9.6 output the same thing:

$ psql vtest -p 5920 -tc "select pg_get_viewdef('gregtest'::regclass, 0)"
  SELECT count(*) AS count                        +
    FROM pg_class                                 +
   WHERE pg_class.reltuples > 0::double precision;

$ psql vtest -p 5960 -tc "select pg_get_viewdef('gregtest'::regclass, 0)"
  SELECT count(*) AS count                        +
    FROM pg_class                                 +
   WHERE pg_class.reltuples > 0::double precision;

This modified version will produce the same schema in our test database:

$ diff -u <(pg_dump vtest -x -p 5920 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only)

--- /dev/fd/80               2016-09-29 12:34:56.019801980 -0400
+++ /dev/fd/88               2016-09-29 12:34:56.019881988 -0400
@@ -2,7 +2,7 @@
 -- PostgreSQL database dump
 --
 
--- Dumped from database version 9.2.18
+-- Dumped from database version 9.6.0
 -- Dumped by pg_dump version 9.6.0
 
 SET statement_timeout = 0;

The best solution, as pointed out by my colleague David Christensen, is to simply make Postgres do all the heavy lifting with some import/export magic. At the end of the day, the output of pg_dump is not only human readable, but designed to be parseable by Postgres. Thus, we can feed the old 9.2 schema into a temporary 9.6 database, then turn around and dump it. That way, the same pg_get_viewdef() calls are used for both schemas. Here it is on our example databases:

$ createdb -p 5960 vtest92

$ pg_dump vtest -p 5920 | psql -q -p 5960 vtest92

$ diff -s -u <(pg_dump vtest92 -x -p 5960 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only)
Files /dev/fd/63 and /dev/fd/62 are identical

CONCLUSION

Trying to compare schema dumps taken from different versions of Postgres can be difficult, so it's best not to try. Dumping and recreating schemas is a cheap operation, so simply load them both into the same backend version, then do the comparison.