All opinions expressed are those of the authors and not necessarily those of OSNews.com, our sponsors, or our affiliates.


published by noreply@blogger.com (Selena Deckelmann) on 2010-02-08 16:48:00 in the "mondaylinks" category
Just a few links from the past week that are worth checking out:
  • "Spices: the internet of the ancient world!" - Planet Money Podcast. Great storytelling about the ancient spice trade and how information about where certain spices came from eventually leaked out and popped the spice trading bubble/monopoly.
  • Enterprise software is entirely bereft of soul. Reading this reminded me of antifeatures and the competitive advantages of open source software.
  • Emulating Empathy. Nice summary of how interacting with users of software (customers) on non-technical issues, or over high-level requirements, provokes creativity. It also makes the point that good customer communication is a *skill*, not an innate talent - meaning it can be taught and learned. :)
  • Interaxon. Other than the cute name, this is a fascinating company and concept based in Vancouver, BC. Thought controlled computing! Looking forward to seeing what comes out of their "Bright Ideas" exhibit during the Winter Olympics.

published by noreply@blogger.com (Greg Sabino Mullane) on 2010-02-05 21:55:00 in the "open-source" category

So when will PostgreSQL version 9.0 come out? I decided to "run the numbers" and take a look at how the Postgres project has done historically. Here's a quick graph showing the approximate number of days each major release since version 6.0 took:

Some interesting things can be seen here: there is a rough correlation between the complexity of a new release and the time it takes, major releases take longer, and the trend is gradually towards more days per release. Overall the project is doing great, releasing on average every 288 days since version 6. If we only look at version 7 and onwards, the releases are on average 367 days apart. If we look at *just* version 7, the average is 324 days. If we look at *just* version 8, the average is 410. Since the last major version that came out was on July 1, 2009, the numbers predict 9.0 will be released on July 3, 2010, based on the version 7 and 8 averages, and on August 15, 2010, based on just the version 8 averages. However, this upcoming version has two very major features, streaming replication (SR) and hot standby (HS). How those will affect the release schedule remains to be seen, but I suspect the 9.0 to 9.1 window will be short indeed.
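The projected dates above are plain date arithmetic: the date of the last major release plus an average interval in days. A minimal sketch, assuming GNU date (its -d option is not available in the same form on BSD or macOS) and using a hypothetical helper name, project_release:

```shell
#!/bin/bash
# Project a release date by adding an average release interval (in days)
# to the date of the previous major release.
# Assumption: GNU date, which understands "-d 'DATE +N days'".

project_release() {
    local last_release=$1 interval_days=$2
    # -u keeps the arithmetic in UTC so daylight-saving shifts cannot
    # move the result by a day
    date -u -d "$last_release +$interval_days days" +%Y-%m-%d
}

project_release 2009-07-01 367   # average over versions 7 and 8
project_release 2009-07-01 410   # average over version 8 only
```

With the averages quoted above, the two calls should reproduce the July 3, 2010 and August 15, 2010 projections.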

As a recap, the Postgres project only bumps the first part of the version number for major changes (although many, myself included, would argue that 7.4 was such a major jump it should have been called 8.0). The second number is incremented any time a "new release" happens, and means new features and enhancements. The final number, the revision, is only incremented for security and bug fixes, and is almost always a 100% binary compatible drop-in for the previous revision in the branch. (What's the average (mean) number of days between revisions? 84 days since version 6, and 88 days since version 7. The medians are 84 and 87 respectively.)
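To make the numbering scheme concrete, here is a small sketch that splits a three-part version string into the pieces described above. The function names (describe_version, same_branch) are made up for illustration, and the code assumes the version always has all three parts:

```shell
#!/bin/bash
# Split a version string such as "8.4.2" into major, release, and revision.
describe_version() {
    local major release revision
    # IFS=. applies only to this read, so the string is split on dots
    IFS=. read -r major release revision <<< "$1"
    echo "major=$major release=$release revision=$revision"
}

# Two versions are on the same branch when everything except the final
# revision number matches; only bug and security fixes separate them.
same_branch() {
    [ "${1%.*}" = "${2%.*}" ]
}

describe_version 8.4.2
same_branch 8.4.1 8.4.2 && echo "8.4.1 and 8.4.2: same branch"
same_branch 8.3.9 8.4.2 || echo "8.3.9 and 8.4.2: different branches"
```

Under this scheme, moving from 8.4.1 to 8.4.2 is the "drop-in" case, while moving from 8.3.9 to 8.4.2 brings new features.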

How busy were those periods? Here's the number of commits per release period. Note that I said release period, not release, as commits are still being made to old branches, although this is a very small minority of the commits, so I did not bother to break it down at that level.

There is a strong correlation with the previous chart. Of note is version 8.1, which had few commits and was released relatively quickly. Also note that version 8.0 is still winning as far as the sheer number of commits, most likely due to the fact that native Windows support was added in that version.

Some other items of interest from the data:

  • There have been roughly 140,000 commits from version 6.0 to 8.4.2.
  • There have been 32 CVS committers since the start of the project (and of course, many hundreds of others whose work was funnelled through those committers)
  • The mean number of commits per person is 4383, but the distribution is very skewed: Bruce, Peter, and Tom account for 80% of all commits, with a mean among the three of them of about 37,000 commits each.
  • Commits changed about 40 lines on average.
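
As a rough sanity check on how skewed that distribution is, the figures in the list can be run through integer arithmetic. This sketch uses only the rounded numbers quoted above (140,000 commits, 32 committers, 80%), so its results are approximations rather than the exact values from the data:

```shell
#!/bin/bash
# Rough check of the commit statistics, using the rounded figures above.
total_commits=140000
committers=32
top_three_share=80   # percent of all commits by the three busiest committers

mean_per_committer=$(( total_commits / committers ))
top_three_commits=$(( total_commits * top_three_share / 100 ))
mean_top_three=$(( top_three_commits / 3 ))
mean_everyone_else=$(( (total_commits - top_three_commits) / (committers - 3) ))

echo "mean per committer:     $mean_per_committer"
echo "mean of the top three:  $mean_top_three"
echo "mean of the other 29:   $mean_everyone_else"
```

The overall mean lands near the quoted 4383 (the small gap comes from rounding the total), while the remaining 29 committers average under a thousand commits each.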

Alright, two final charts: commits per time period. I'll let the data speak for itself this time. Stay tuned for future blog posts exploring this data further!


published by noreply@blogger.com (Selena Deckelmann) on 2010-02-04 14:44:00 in the "postgres" category

I had the pleasure of attending and presenting at LinuxConf.AU this year in Wellington, NZ. LinuxConf.AU is an institution whose friendliness and focus on the practical business of creating and sustaining open source projects were truly inspirational.

My talk this year was "A Survey of Open Source Databases", where I actually created a survey and asked over 35 open source database projects to respond. I have received about 15 responses so far, and also did my own research on the over 50 projects I identified. I created a place-holder site for my research at: ossdbsurvey.org. I'm hoping to revise the survey (make it shorter!!) and get more projects to provide information.

Ultimately, I'd like the site to be a central location for finding information and comparing different projects. Performance of each is a huge issue, and there are a lot of individuals constructing good (and bad) systems for comparing. I don't think I want to dive into that pool, yet. But I would like to start collecting the work others have done in a central place. Right now it is really far too difficult to find all of this information.

Part of the talk was also a foray into the dangerous world of classification. I tried to put together basic categories, based on conversations with individual developers and some fine-tuning with Josh Berkus. Josh gave a short overview of database models during "Relational vs Non-relational" in the Data Storage mini-conf, and we collaborated some on category definition. I also saw Devdas Bhagat give a use case talk on using Postgres, yet again confirming how wonderful transactional DDL is for developers. I also gave a lightning talk (WITHOUT SLIDES!) on Bucardo at the tail end of the Data Storage mini-conf.

Josh Berkus, during "PostgreSQL Development Today", announced to the world that the new version of Postgres would be version 9.0! And he did a live demonstration of streaming replication and hot standby. The audience seemed pleased.

I was delighted to see representatives from the Postgres community on the main stage of the conference three times during LCA!

And finally, I had the pleasure of participating in the Friday keynote lightning talks. I kicked things off by telling the story of the elections in Ondo State, Nigeria, in 5 minutes. I saw that one of the IT people I met while in Akure was now helping Osun state investigate and correct election fraud in January. So glad to see that their good work continues!


published by david@endpoint.com (David Christensen) on 2010-02-01 02:54:00

A common task that comes up in PostgreSQL is the need to dump/edit a specific function. While ideally, you're using DDL files and version control (hello, git!) to manage your schema, you don't always have the luxury of working in such a controlled environment. Recent versions of psql have the \ef command to edit a function from within your favorite editor, but this is available from version 8.4 onward only.

An alternate approach is to use the following invocation:

  pg_dump -Fc -s | pg_restore -P 'funcname(args)'

The -s flag is the short form of --schema-only; i.e., we don't care about wasting time/space with the data. -P tells pg_restore to extract the function with the following signature.

As always, there are some caveats: the function name must be spelled out explicitly using the full types as they occur in the dump's custom format (i.e., you must use 'foo_func(integer)' instead of 'foo_func(int)'). You can always see a list of all of the available functions by using the command:

  pg_dump -Fc -s | pg_restore -l | grep FUNCTION
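
The two invocations above can be wrapped in small shell functions so the database name and signature become arguments. This is only a sketch: the function names and the database name in the usage comments are hypothetical, and newer pg_restore releases may want an explicit output target (such as -f -) before they write to standard output.

```shell
#!/bin/bash
# dump_function DBNAME 'funcname(argtypes)'
# Prints the definition of a single function to stdout.
dump_function() {
    local db=$1 signature=$2
    # -Fc: custom format, -s: schema only, -P: extract just this function
    pg_dump -Fc -s "$db" | pg_restore -P "$signature"
}

# list_functions DBNAME
# Shows the function entries in the dump's table of contents, which is
# where the exact spelling of each signature can be looked up.
list_functions() {
    local db=$1
    pg_dump -Fc -s "$db" | pg_restore -l | grep FUNCTION
}

# Example usage (mydb is a placeholder database name):
#   list_functions mydb
#   dump_function mydb 'foo_func(integer)' > foo_func.sql
```

Redirecting the output to a file gives you something you can edit and load back with psql.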

published by david@endpoint.com (David Christensen) on 2010-01-28 20:48:00 in the "scalability" category

Sometimes you run into a situation where you need to replicate one dataset to many machines in multiple datacenters, with different costs associated with sending to each (either real costs as in bandwidth, or virtual costs as in the amount of time it takes to transmit to each machine). Defining a Slony cluster to handle this is easy, as you can specify the topology and paths taken to replicate any changes.

    Basic topology:
  • Data center A, with machines A1, A2, A3, and A4.
  • Data center B, with machines B1, B2, B3, and B4.
  • Data center C, with machines C1, C2, C3, and C4.


Figure 1: Non-cascaded slony replication nodes/pathways.

Node A1 is the master, which propagates its changes to all other machines. In the simple setup, A1 would push all of its changes to each node. However, if data centers B and C have high costs associated with transfer to their nodes, you end up transferring 4x the data needed for each data center. (We are assuming that traffic on the local subnet at each data center is cheap and fast.)

The basic idea, then, is to push the changes only once to each datacenter, and let the "master" machine in that data center push the changes out to the others in the data center. This reduces traffic from the master to each datacenter, and removes any other costs associated with pushing to every node.


Figure 2: Cascaded slony replication nodes/pathways

Let's look at an example configuration:

cluster_init.sh:
    #!/bin/bash

    # admin node definitions and other slony-related information are
    # stored in our preamble file.  This will define the $PREAMBLE
    # environment variable that contains basic information common to all
    # Slony-related scripts, such as slony cluster name, the nodes
    # present, and how to reach them to install slony, etc.

    . slony_preamble.sh

    slonik <<EOF
    $PREAMBLE

    init cluster ( id = 1, comment = 'A1' );

    store node (id=2,  comment = 'A2', event node=1);
    store node (id=3,  comment = 'A3', event node=1);
    store node (id=4,  comment = 'A4', event node=1);
    store node (id=5,  comment = 'B1', event node=1);
    store node (id=6,  comment = 'B2', event node=1);
    store node (id=7,  comment = 'B3', event node=1);
    store node (id=8,  comment = 'B4', event node=1);
    store node (id=9,  comment = 'C1', event node=1);
    store node (id=10, comment = 'C2', event node=1);
    store node (id=11, comment = 'C3', event node=1);
    store node (id=12, comment = 'C4', event node=1);

    # In each "store path", conninfo is the connection string the client
    # node uses to reach the server node, so it names the server's host.

    # pathways from A1 -> A2, A3, A4 and back
    store path (server = 1, client = 2, conninfo = 'dbname=data host=node1.datacenter-a.com');
    store path (server = 1, client = 3, conninfo = 'dbname=data host=node1.datacenter-a.com');
    store path (server = 1, client = 4, conninfo = 'dbname=data host=node1.datacenter-a.com');
    store path (server = 2, client = 1, conninfo = 'dbname=data host=node2.datacenter-a.com');
    store path (server = 3, client = 1, conninfo = 'dbname=data host=node3.datacenter-a.com');
    store path (server = 4, client = 1, conninfo = 'dbname=data host=node4.datacenter-a.com');

    # pathway from A1 -> B1 and back
    store path (server = 1, client = 5, conninfo = 'dbname=data host=node1.datacenter-a.com');
    store path (server = 5, client = 1, conninfo = 'dbname=data host=node1.datacenter-b.com');

    # pathways from B1 -> B2, B3, B4 and back
    store path (server = 5, client = 6, conninfo = 'dbname=data host=node1.datacenter-b.com');
    store path (server = 5, client = 7, conninfo = 'dbname=data host=node1.datacenter-b.com');
    store path (server = 5, client = 8, conninfo = 'dbname=data host=node1.datacenter-b.com');
    store path (server = 6, client = 5, conninfo = 'dbname=data host=node2.datacenter-b.com');
    store path (server = 7, client = 5, conninfo = 'dbname=data host=node3.datacenter-b.com');
    store path (server = 8, client = 5, conninfo = 'dbname=data host=node4.datacenter-b.com');

    # pathway from A1 -> C1 and back
    store path (server = 1, client = 9, conninfo = 'dbname=data host=node1.datacenter-a.com');
    store path (server = 9, client = 1, conninfo = 'dbname=data host=node1.datacenter-c.com');

    # pathways from C1 -> C2, C3, C4 and back
    store path (server = 9, client = 10, conninfo = 'dbname=data host=node1.datacenter-c.com');
    store path (server = 9, client = 11, conninfo = 'dbname=data host=node1.datacenter-c.com');
    store path (server = 9, client = 12, conninfo = 'dbname=data host=node1.datacenter-c.com');
    store path (server = 10, client = 9, conninfo = 'dbname=data host=node2.datacenter-c.com');
    store path (server = 11, client = 9, conninfo = 'dbname=data host=node3.datacenter-c.com');
    store path (server = 12, client = 9, conninfo = 'dbname=data host=node4.datacenter-c.com');

    EOF

As you can see in the initialization script, we're defining the basic topology for the cluster. We're defining each individual node, and the paths that slony will use to communicate events and other status. Since slony needs to communicate status both ways, we need to define the paths for each node's edge both ways. In particular, we've defined pathways from A1 to each of the other A nodes, A1 to B1 and C1, and B1 and C1 to each of their respective nodes.

Now it's a matter of defining the replication sets and describing the subscriptions for each. We will use something like the following for our script:

cluster_define_set1.sh:
    #!/bin/bash

    # reusing our standard cluster information
    . slony_preamble.sh

    slonik <<EOF
    $PREAMBLE

    create set ( id = 1, origin = 1, comment = 'set 1' );

    set add table ( set id = 1, origin = 1, id = 1, fully qualified name = 'public.table1');
    set add table ( set id = 1, origin = 1, id = 2, fully qualified name = 'public.table2');
    set add table ( set id = 1, origin = 1, id = 3, fully qualified name = 'public.table3');

    EOF

Here we've defined the tables that we want replicated from A1 to the entire cluster; there is nothing specific to this particular scenario that we need to consider.

cluster_subscribe_set1.sh:
    #!/bin/bash

    # reusing our standard cluster information
    . slony_preamble.sh

    slonik <<EOF
    $PREAMBLE

    # define our forwarding subscriptions (i.e., A1 -> B1, C1)
    subscribe set ( id = 1, provider = 1, receiver = 5, forward = yes);
    subscribe set ( id = 1, provider = 1, receiver = 9, forward = yes);

    # define the subscriptions for each of the datacenter sets
    # A1 -> A2, A3, A4
    subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);
    subscribe set ( id = 1, provider = 1, receiver = 3, forward = no);
    subscribe set ( id = 1, provider = 1, receiver = 4, forward = no);

    # B1 -> B2, B3, B4
    subscribe set ( id = 1, provider = 5, receiver = 6, forward = no);
    subscribe set ( id = 1, provider = 5, receiver = 7, forward = no);
    subscribe set ( id = 1, provider = 5, receiver = 8, forward = no);

    # C1 -> C2, C3, C4
    subscribe set ( id = 1, provider = 9, receiver = 10, forward = no);
    subscribe set ( id = 1, provider = 9, receiver = 11, forward = no);
    subscribe set ( id = 1, provider = 9, receiver = 12, forward = no);

    EOF

The key point here is that the provider and receiver nodes of each subscription determine how replication flows. For the subscription to any cascade point (i.e., B1 and C1), you need to have the 'forward = yes' parameter to ensure that the events properly cascade to the sub-nodes. In any of the other nodes' subscriptions, you should set 'forward = no'.

In actual deployment of this setup, you would want to wait for the subscription from A1 -> B1 and A1 -> C1 to complete successfully before subscribing the sub-nodes. Additionally, this solution assumes high availability between nodes and does not address failure of particular machines; in particular, A1, B1, and C1 are key to maintaining the full replication.
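
One possible way to handle the waiting described above is to follow each subscription with a sync and a wait for event, so the script blocks until the receiver has caught up. This is a sketch under assumptions, not a tested deployment script: the helper name subscribe_and_wait is made up, it expects $PREAMBLE to have been set by sourcing slony_preamble.sh as in the scripts above, and the exact behavior of wait for event can differ between Slony versions.

```shell
#!/bin/bash
# subscribe_and_wait PROVIDER RECEIVER FORWARD
# Subscribes RECEIVER to set 1 via PROVIDER, then blocks until RECEIVER
# has confirmed a SYNC event generated on the origin (node 1).
# Assumes $PREAMBLE is already set, e.g. via ". slony_preamble.sh".
subscribe_and_wait() {
    local provider=$1 receiver=$2 forward=$3
    slonik <<EOF
$PREAMBLE

subscribe set ( id = 1, provider = $provider, receiver = $receiver, forward = $forward );

# generate a SYNC on the origin and wait for the receiver to confirm it
sync ( id = 1 );
wait for event ( origin = 1, confirmed = $receiver, wait on = 1 );
EOF
}

# Example ordering for data center B (cascade point first, then sub-nodes):
#   subscribe_and_wait 1 5 yes
#   subscribe_and_wait 5 6 no
#   subscribe_and_wait 5 7 no
#   subscribe_and_wait 5 8 no
```

Running one slonik invocation per subscription keeps the ordering explicit, at the cost of a slower initial setup.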


published by noreply@blogger.com (Greg Sabino Mullane) on 2010-01-28 15:44:00 in the "open-source" category

It looks like 2010 *might* be the year that Postgres officially makes the jump to git. Currently, the project uses CVS, with a script that moves things to the now canonical Postgres git repo at git.postgresql.org. This script has been causing problems, and is still continuing to do so, as CVS is not atomic. Once the project flips over, CVS will still be available, but CVS will be the slave and git the master, to put things in database terms. The conversion from git to CVS is trivial compared to the other way around, so there is no reason Postgres cannot continue to offer CVS access to the code for those unwilling or unable to use git.

On that note, I'm happy to see that the number of developers and committers who are using git - and publicly stating their happiness with doing so - has grown sharply in the last couple of years. Peter Eisentraut (with some help from myself) set up git.postgresql.org in 2008, but interest at that time was not terribly high, and there was still a lingering question of whether git was really the replacement for CVS, or if it would be some other version control system. There is little doubt now that git is going to win. Not only for the Postgres project, but across the development world in general (both open and closed source).

To drive the point home, Andrew has announced he is working on git integration with the Postgres build farm. Of course, I submitted a patch to do just that back in March 2008, but I was ahead of my time :). Besides, mine was a simple proof of concept, while it sounds like Andrew is actually going to do it the right way. Go Andrew!

Of all the projects I work on, the great majority are using git now. We've been using git at End Point as our preferred VCS for both internal projects and client work for a while now, and are very happy with our choice. There is only one other project I work on besides Postgres that uses CVS, but it's a small project. I don't know of any other project of Postgres' size that is still using CVS (anyone know of any?). Even emacs recently switched away from CVS, although they went with bazaar instead of git for some reason. Subversion is still being used by a substantial minority of the projects I'm involved with, mostly due to the historical fact that there was a window of time in which CVS was showing its limitations, but subversion was the only viable option. Sure would be nice if perl.org would offer git for Perl modules, as they do for subversion currently (/hint). Finally, there are a few of my projects that use something else (mercurial, monotone, etc.). Overall, git accounts for the lion's share of all my projects, and I'm very happy about that. There is a very steep learning curve with git, but the effort is well worth it.

If you want to try out git with the Postgres project, first start by installing git. Unfortunately, git is still new enough, and actively developed enough, that it may not be available on your distro's packaging system, or worse, the version available may be too old to be useful. Anything older than 1.5 should *not* be used, period, and 1.6 is highly preferred. I'd recommend taking the trouble to install from source if git is older than 1.6. Once installed, here's how to clone the Postgres repo:

git clone git://git.postgresql.org/git/postgresql.git postgres

This step may take a while, as git is basically putting the entire Postgres project on your computer - history and all! It took me three and a half minutes to run, but your time may vary.

Once that is done, you'll have a directory named "postgres". Change to it, and you can now poke around in the code, just like CVS, but without all the ugly CVS directories. :)
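
If you are unsure whether your installed git meets the version advice given earlier, a quick check along these lines may help. It is a sketch with made-up helper names (version_at_least, check_git), and it assumes "git --version" prints a line of the form "git version X.Y.Z":

```shell
#!/bin/bash
# version_at_least VERSION MAJOR MINOR
# Succeeds when VERSION (e.g. "1.6.3") is at least MAJOR.MINOR.
version_at_least() {
    local major minor rest
    IFS=. read -r major minor rest <<< "$1"
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "${minor:-0}" -ge "$3" ]; }
}

# Report how the installed git compares with the 1.5 / 1.6 thresholds.
check_git() {
    local installed
    # the version number is the third word of "git version X.Y.Z"
    installed=$(git --version | awk '{print $3}')
    if version_at_least "$installed" 1 6; then
        echo "git $installed: fine"
    elif version_at_least "$installed" 1 5; then
        echo "git $installed: usable, but 1.6 or newer is preferred"
    else
        echo "git $installed: too old, consider installing from source"
    fi
}

# Example usage:
#   check_git
```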

For more information, check out the "Working with git" page on the Postgres wiki.

Here's to 2010 being the year Postgres finally abandons CVS!


published by steph@endpoint.com (Steph Powell) on 2010-01-25 17:45:00 in the "seo" category

Today, Chris sent me an email with two questions for one of our ecommerce clients:

  • For ecommerce client A, should a forum or blog be added?
  • For ecommerce client A, should the client use Blogger or WordPress if they add a blog?

These are relevant questions to all of our clients because forums and blogs can provide value to a static site or ecommerce site. I answered Chris' questions and thought I'd expand on them a bit for a brief article.

First, a rundown comparing the pros and cons of blog versus forum:

Blog
Pros
  • Content tends to be more organized.
  • Content can be built to be more targeted for search.
  • Content can be syndicated easily.
Cons
  • User generated content will remain minimal if comments are the only form of user generated content in a blog.
  • If internal staff is responsible for authoring content, you can't write as much content as users can contribute.

Forum
Pros
  • There can be much more content because users are contributing content.
  • Since there is more user generated content, it has the potential to cover more of the long tail in search.
  • There is more potential for user involvement and encouragement to build and contribute to a community.
Cons
  • A forum requires management to prevent user spam.
  • A forum requires organization to maintain usability and search engine friendliness.

If we assume that it takes the same amount of effort to write articles as it does to manage user generated content, the decision comes down to whether or not you want to utilize user contributions as part of the content. If the effort involved to write content or manage user generated content is different, a decision should be made based on how much effort the site owners want to make. Other opportunities for user generated content include product reviews and user QnA.

Next, a rundown comparing the pros and cons of Blogger versus self-hosted WordPress:

 
Blogger
Pros
  • There is a decent number of widgets available to integrate into a Blogger instance.
  • Fast Google indexing of content may result since the content is hosted by Google.
  • There is decent search implementation on Blogger.
  • A Blogger instance is very easy to create and easy to use.
Cons
  • The Blogger taxonomy system is limited (using labels) and label pages are blocked in robots.txt to reduce indexation and search traffic of the label pages.
  • Blogger does not allow for a flexible URL structure. Once an article is published and a title is changed, the URL does not change.
  • Developers must be familiar with the Blogger template language to customize the template.
  • With Blogger, a blog can't be hosted at http://www.yoursite.com/blog/. It can be hosted at http://blog.yoursite.com/. While this results in a strong subdomain, it does not strengthen your domain for search through external links to the blog.

Self-hosted WordPress
Pros
  • There is a very large feature set available through the WordPress plugin community.
  • Self-hosted WordPress blogs are relatively easy to set up. Many hosting platforms include WordPress installation and setup at the click of a button.
  • WordPress gives you control over the URL structure (articles, categories, tags) through permalinks.
  • Self-hosted WordPress can live at www.yoursite.com/blog/ which can strengthen your domain value in search through external links.
  • WordPress has a very flexible taxonomy system.
Cons
  • Self-hosted WordPress requires your own hosting, setup and installation.
  • Self-hosted WordPress requires management of upgrades and plugins. Plugins may require code changes to the template files.
  • Self-hosted WordPress allows you to select existing themes, but you must be familiar with the WordPress template structure if you want a custom blog look.

The decision to create a Blogger blog or install a WordPress blog will depend on resources such as engineering or designer involvement. A self-hosted blog solution will likely provide a larger feature set and more flexibility, but it also requires more time to enhance, manage and maintain the software. A hosted blog solution such as Blogger will be easy to set up and maintain, but has disadvantages because it is a less flexible solution. I didn't discuss a WordPress-hosted solution because I'm not very familiar with this type of setup. However, I believe the WordPress-hosted solution limits the use of plugins and themes.

For our ecommerce clients, installing a self-hosted WordPress instance on top of their Spree or Interchange ecommerce site has been relatively simple. For another one of our clients, we developed a Radiant plugin to integrate Blogger article links into their site, which has worked well to fit their needs.


published by steph@endpoint.com (Steph Powell) on 2010-01-22 22:38:00 in the "seo" category

Yesterday I attended SEOMoz's webinar titled "SEO Strategies for 2010". Some interesting factoids, comments and resources for SEO in 2010 were presented that I thought I'd highlight:

  • Mobile browser search
    • Mobile search and ecommerce will be a large area of growth in 2010.
    • Google Webmaster Tools allows you to submit mobile sitemaps, which can help battle duplicate content between non-mobile and mobile versions of site content. Another way to handle duplicate content would be to write semantic HTML that allows sites to serve non-mobile and mobile CSS.
  • Social Media: Real Time Search
    • Real time search marked its presence in 2009. The involvement of Twitter in search is evolving.
    • Tracking and monitoring on URL shortening services should be set up to measure traffic and benefit from Twitter.
    • Dan Zarrella published research on The Science of Retweeting. This is an interesting resource with fascinating statistics on retweets.
  • Social Media: Facebook's Dominance
    • Recent research by comScore has shown that 5.5% of all time on the web is spent in Facebook.
    • Facebook has very affordable advertising. Facebook has so much demographic and psychographic data that it allows sites to deliver very targeted advertisements.
    • Facebook shouldn't be ignored as a potential business network, but metrics should be put in place to determine the value it brings.
  • Social Media: Shifting LinkGraph
    • In the past, sites received links from blog resources which became a factor in the site's popularity rankings in search. Now, linking has shifted to microblogging such as Twitter or other social media platforms. Some folks are stingier about passing links through sites than through social media. It's interesting to observe how links and information are passed through the web and consider how this can affect search.
  • Bing
    • Despite the fact that Google is responsible for a large percentage of search, Bing shouldn't be ignored.
    • Bing has shown some differences in ranking such as being less sensitive to TLDs (.info, .cc, .net, etc.), and giving more weight to sites with keywords in the domain than other search engines.
  • Other
    • Personalized search is on the rise. This is something to pay attention to, but hard to measure.
    • QDF (query deserves freshness), a search factor related to the freshness of content, has led to search engines indexing content faster. 2010 search strategies recommend becoming a news source to improve search performance.
    • Local search is definitely something to be aware of in 2010. Google's Place Rank algorithm is similar to the PageRank algorithm - it looks at specific location or local attributes as a factor in local search.

I found that the discussion kept returning to metrics: not just having good metrics, but having the right ones, such as conversion and engagement. Any of the recommendations above (improving your mobile browsing, getting involved in social media, optimizing for Bing) should be measured against conversion to determine the value of the effort. Also, multivariate or A/B testing was recommended for testing local search optimization and other efforts.


published by noreply@blogger.com (Greg Sabino Mullane) on 2010-01-20 17:45:00 in the "open-source" category

I've just released a small Perl script that has helped me solve a specific problem with Postgres dump files. When you use pg_dump or pg_dumpall, it outputs things in the following order, per database:

  1. schema creation commands (e.g. CREATE TABLE)
  2. data loading command (e.g. COPY tablename FROM STDIN)
  3. post-data schema commands (e.g. CREATE INDEX)

The problem is that using the --schema-only flag outputs the first and third sections into a single file. Hence, if you load the file and then load a separate --data-only dump, it can be very slow as all the constraints, indexes, and triggers are already in place. The split_postgres_dump script breaks the dump file into two segments, a "pre" and a "post". (It doesn't handle a file with a data section yet, only a --schema-only version)

Why would you need to do this instead of just using a full dump? Some reasons I've found include:

  • When you need to load the data more than once, such as debugging a data load error.
  • When you want to stop after the data load step (which you can't do with a full dump)
  • When you need to make adjustments to the schema before the data is loaded (seen quite a bit on major version upgrades)

Usage is simply ./split_postgres_dump.pl yourdumpfile.pg, which will then create two new files, yourdumpfile.pg.pre and yourdumpfile.pg.post. It doesn't produce perfectly formatted files, but it gets the job done!
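
As an illustration of how the two output files might be used together with a separate data dump, here is a hedged sketch. The function name load_split_dump and the file and database names in the comments are hypothetical; the psql options used (-X, -q, -v ON_ERROR_STOP=1, -f) are standard ones.

```shell
#!/bin/bash
# load_split_dump SCHEMA_DUMP DATA_DUMP TARGET_DB
# Loads the "pre" schema, then the data, then the "post" schema.
# Expects SCHEMA_DUMP.pre and SCHEMA_DUMP.post to exist already, i.e.
# split_postgres_dump.pl has been run on SCHEMA_DUMP.
load_split_dump() {
    local schema_dump=$1 data_dump=$2 db=$3 file
    for file in "$schema_dump.pre" "$data_dump" "$schema_dump.post"; do
        # ON_ERROR_STOP makes psql exit non-zero on the first SQL error,
        # so a failed step stops the sequence instead of carrying on
        psql -X -q -v ON_ERROR_STOP=1 -f "$file" "$db" || return 1
    done
}

# Example workflow (olddb, newdb, and the file names are placeholders):
#   pg_dump --schema-only olddb > schema.pg
#   pg_dump --data-only olddb > data.pg
#   ./split_postgres_dump.pl schema.pg
#   load_split_dump schema.pg data.pg newdb
```

Because the indexes, constraints, and triggers live in the "post" file, the data load in the middle step runs against bare tables.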

It's a small script, so it has no bug tracker, git repo, etc. but it does have a small wiki page at http://bucardo.org/wiki/Split_postgres_dump from which you can download the latest version.

Future versions of pg_dump will allow you to break things into pre and post data sections with flags, but until then, I hope somebody finds this script useful.

Update: There is now a git repo:
git clone git://bucardo.org/split_postgres_dump.git


published by steph@endpoint.com (Steph Powell) on 2010-01-13 14:30:00 in the "Spree" category

Yesterday, I went through some examples using hook and theme implementation in Spree, an open source Ruby on Rails ecommerce platform. I decided to follow-up with closing thoughts and comments today.

I only spent a few hours working with the new Spree edge code (Version 0.9.5), but I was relatively happy with the Spree theme and hook implementation, as it does a better job decoupling the extension views from Spree core functionality and views. However, I found several issues that are potential areas for improvement with this release or releases to come.

Theme too clunky?

One concern I have is that the entire "views" directory from SPREE_ROOT/app was moved into the theme with this theme-hook work (all of the "V" in MVC). Yesterday, I discussed how WordPress had designed a successful theme and plugin interaction and one thing I mentioned was that a WordPress theme was lightweight and comprised of several customer-facing PHP files (index, single post page, archive pages, search result page). Moving all of the Spree core views to the theme presents a couple of issues, in my opinion:

  • A developer that jumps into theme development is immediately met with more than 50 files in the theme directory to understand and work with. What you may notice from my tutorial yesterday is that I actually changed the look of Spree through an extension rather than creating a new theme - I believe there is better separation of my custom design and the Spree core if I included the custom styling in the extension rather than creating a new theme and copying over 50+ files to edit. I'm also more comfortable working with CSS to manipulate the appearance rather than editing and maintaining those files. Now, the next time the Spree core and default template are updated, I don't have to worry about copying and pasting all the theme files into my custom theme and managing modifications. I think over time, Spree should aim to improve separation of theme views and core views and simplify the theme views.
  • The new default Spree includes the admin views. Spree developers and users are probably more interested in changing and modifying customer-facing pages than admin pages. I believe that Spree should focus on developing a strong admin interface and assume that only more advanced developers will need to override the admin views. The admin view would contain a set of predefined core hooks to add tabs and reports. Rather than having a theme with all of the rails views, the theme should be a lightweight collection of files that are likely to be edited by users and the Spree core should include files that are less likely to be modified (and in theory, have an awesome admin interface that would only be extended with additional reports or additional fields for object updates and edits).

Theme-Hook Decoupling?


Extension views or text are hooked through the hooks defined in the theme.

Another big concern I have is the tight coupling between Spree themes and hooks. All of the hooks are defined in the Spree theme. If someone were to switch from one theme to another, there is the potential for functionality to be lost if consistency between theme hooks isn't enforced. This issue piggybacks off of the first issue: I think the Spree core should have control of all the admin views and admin hooks. It would be great to see the views simplified or refactored and allow Spree core to control and instantiate many hooks. I think it's great to provide the flexibility to instantiate hooks in themes, but I think the core code (admin, especially) should be more opinionated and contain its own set of views with hooks that would likely be overridden less frequently.


A more ideal approach to decouple appearance and functionality would require hooks to be defined in the Spree core.
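
To make the coupling concern concrete, here is a minimal Ruby sketch of what can go wrong when hooks live in the theme. It is not Spree code: the hook names are borrowed from the default theme, but which theme defines which hook, and the missing_hooks helper itself, are assumptions made up for illustration.

```ruby
# Hypothetical data: the hooks each theme wraps, and the hooks an extension targets.
default_theme_hooks = %i[inside_head sidebar homepage_products product_description]
custom_theme_hooks  = %i[inside_head homepage_products]
extension_targets   = %i[inside_head homepage_products product_description]

# Hooks the extension relies on that the given theme does not define.
# Any modification registered against these would silently stop rendering.
def missing_hooks(theme_hooks, extension_targets)
  extension_targets - theme_hooks
end

lost_on_default = missing_hooks(default_theme_hooks, extension_targets)
lost_on_custom  = missing_hooks(custom_theme_hooks, extension_targets)

puts "lost on default theme: #{lost_on_default.inspect}"
puts "lost on custom theme:  #{lost_on_custom.inspect}"
```

If the core owned the hook definitions, a check like this would be unnecessary, since every theme would render the same set of hooks.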

Conclusion

In the tutorial, I also didn't address extending core functionality with models and controllers in extensions. The logic discussed in the articles Rails Ecommerce Product Optioning in Spree and Rails Approach for Spree Shopping Cart Customization should still work, with some view modifications to use existing hooks instead of overriding core views.


A screenshot of the tutorial app in yesterday's article.

Despite the issues mentioned above, I think that the hook and theme work in the upcoming Spree 0.9.5 release is a big step in the right direction to improve the customization building blocks of Spree. It was mentioned in yesterday's article that the release hasn't been made official, but several developers have expressed an interest in working with the code. Hopefully the final kinks of theme and hook implementation will be worked out and the new release will be announced soon. Over time, the hook and theme implementation will advance and more examples and documentation will become available.

Learn more about End Point's general Rails development and Rails shopping cart development.


published by steph@endpoint.com (Steph Powell) on 2010-01-13 02:18:00 in the "Spree" category

In the last couple of months, there's been a bit of buzz around theme and hook implementation in Spree. The Spree team hasn't officially announced the newest version 0.9.5, but the edge code is available at http://github.com/railsdog/spree and developers have been encouraged to work with the edge code to check out the new features. Additionally, there is decent documentation about theme and hook implementation. In this article, I'll go through several examples of how I would approach site customization using hooks in the upcoming Spree 0.9.5 release.

Background

I've been a big proponent of how WordPress implements themes, plugins, and hooks in the spree-user Google group. The idea behind WordPress themes is that a theme includes a set of PHP files that contain the display logic, HTML, and CSS for the customer-facing pages:

  • index
  • a post page
  • archive pages (monthly, category, tag archives)
  • search result page
  • etc.

In many cases, themes include sections (referred to as partial views in Rails), or components that are included in multiple template pages. An example of this partial view is the sidebar that is likely to be included in all of the page types mentioned above. The WordPress theme community is abundant; there are many free or at-cost themes available.

The concept behind WordPress plugins is much like Spree extension functionality - a plugin includes modular functionality to add to your site that is decoupled from the core functionality. Judging from the popularity of the WordPress plugin community, WordPress has done a great job designing the Plugin API. In most cases, the Plugin API is used to extend or override core functionality and add to the views without having to update the theme files themselves. An example of using the WordPress plugin API to add an action to the wp_footer hook is accomplished with the following code:

/* inside plugin */
function add_footer_text() {
    echo '<p>Extra Footer Text!!</p>';
}
add_action('wp_footer', 'add_footer_text');

WordPress themes and plugins with hooks are the building blocks of WordPress: with them, you piece together the appearance and functionality for your site. I reference WordPress as a basis of comparison for Spree, because like WordPress users, Spree users aim to piece together the appearance and functionality for their site. One thing to note is that the hook implementation in Spree is based on hook implementation in Redmine.

Spree Code

I grabbed the latest code at http://github.com/railsdog/spree. After examining the code and reviewing the SpreeGuides documentation, the first thing I learned is that there are four ways to work with hooks:

  • insert before a hook component
  • insert after hook component
  • replace a hook component's contents
  • remove a hook component

The next thing I researched was the hook components or elements. Below are the specific locations of hooks. The specific locations are more meaningful if you are familiar with the Spree views. The hooks are merely wrapped around parts of pages (or layouts) like the product page, product search, homepage, etc. Any of the methods listed above can act on any of the components listed below.

  • layout: inside_head, sidebar
  • homepage: homepage_sidebar_navigation, homepage_products
  • product search: search_results
  • taxon: taxon_side_navigation, taxon_products, taxon_children
  • view product: product_description, product_properties, product_taxons, product_price, product_cart_form, inside_product_cart_form
  • etc.
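
To show how the four methods relate to a hook component, here is a toy Ruby model. It is only a sketch and not Spree's implementation: the HookRegistry class, its render method, and the way changes are stored are all hypothetical. Only the method names insert_before, insert_after, replace, and remove, and the hook names, come from the lists above.

```ruby
# Toy model of the four hook operations. NOT Spree's implementation;
# HookRegistry and render are hypothetical names used for illustration.
class HookRegistry
  def initialize
    # Each hook name maps to an ordered list of [operation, content] pairs.
    @modifiers = Hash.new { |hash, key| hash[key] = [] }
  end

  def insert_before(hook, content)
    @modifiers[hook] << [:before, content]
  end

  def insert_after(hook, content)
    @modifiers[hook] << [:after, content]
  end

  def replace(hook, content)
    @modifiers[hook] << [:replace, content]
  end

  # Removing a component is modeled as replacing it with nothing.
  def remove(hook)
    @modifiers[hook] << [:replace, '']
  end

  # Apply every registered change, in order, to the hook's default content.
  def render(hook, default)
    @modifiers[hook].reduce(default) do |output, (operation, content)|
      case operation
      when :before  then content + output
      when :after   then output + content
      when :replace then content
      end
    end
  end
end

registry = HookRegistry.new
registry.insert_after(:inside_head, '<link rel="stylesheet" href="/stylesheets/stephs_photos.css">')
registry.remove(:sidebar)

head_html    = registry.render(:inside_head, '<title>Spree</title>')
sidebar_html = registry.render(:sidebar, '<div id="sidebar">...</div>')
puts head_html
puts sidebar_html.empty? ? '(sidebar removed)' : sidebar_html
```

The point of the model is that an extension never edits the theme's markup directly; it only registers a change against a named component.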

After I spent time figuring out the hook methods and components, I was ready to do stuff. First, I got Spree up and running (refer to the SpreeGuides for more information):


Spree startup with seed data and images.

Next, I updated the product list with a few pretend products. Let's take a quick look at the site with the updated products:


Spree with new product data for test site.

Example #1: Replace the logo and background styling.

First, I created an extension with the following code. Spree's extensions are roughly based off of Radiant's extension system. It's relatively simple to get an extension up and running with the following code (and server restart).

script/generate extension StephsPhotos

Next, I wanted to try out the insert_after method to append a stylesheet to the default theme inside the <head> html element. I also wanted to remove the sidebar because my test site only has 8 products (lame!) and I don't need sidebar navigation. This was accomplished with the following changes:

  • First, I added the insert_after hook to add a view that contains my extra stylesheet. I also added the remove hook to remove the sidebar element:
    # RAILS_ROOT/vendor/extensions/stephs_photos/stephs_photos_hooks.rb
    insert_after :inside_head, 'shared/styles'
    remove :sidebar
    
  • Next, I added a new view in the extension to include the new stylesheet.
    # RAILS_ROOT/vendor/extensions/stephs_photos/app/views/shared/_styles.erb
    <link type="text/css" rel="stylesheet" href="/stylesheets/stephs_photos.css">
    
  • Next, I created a new stylesheet in the extension.
    /* RAILS_ROOT/vendor/extensions/stephs_photos/public/stylesheets/stephs_photos.css */
    body { background: #000; }
    body.two-col div#wrapper { background: none; }
    a, #header a { color: #FFF; text-decoration: none; }
    
    ul#nav-bar { width: 280px; line-height: 30px; margin-top: 87px; font-size: 1.0em; }
    ul#nav-bar li form { display: none; }
    
    .container { width: 750px; }
    #wrapper { padding-top: 0px; }
    
    .product-listing li { background: #FFF; height: 140px; }
    .product-listing li a.info { background: #FFF; }
    
    body#product-details div#wrapper { background: #000; }
    body#product-details div#content, body#product-details div#content h1 { color: #FFF; margin-left: 10px; }
    #taxon-crumbs { display: none; }
    #product-description { width: 190px; border: none; }
    .price.selling { color: #FFF; }
    #product-image #main-image { min-height: 170px; }
    
    /* Styling in this extension only applies to product and main page */
    
    div#footer { display: none; }
    

One more small change was required to update the logo via a Rails preference. I set the logo preference variable to a new logo image and uploaded the logo to RAILS_ROOT/vendor/extensions/stephs_photos/public/images/.

# RAILS_ROOT/vendor/extensions/stephs_photos/stephs_photos_extension.rb
def activate
 AppConfiguration.class_eval do
   preference :logo, :string, :default => 'images/stephs_photos.png'
 end
end

After restarting the server, I was happy with the new look for my site accomplished using the insert_after and remove methods:


New look for Spree accomplished with several small changes.

Note: You can also add a stylesheet with the code shown below. However, I wanted to use the hook method described above for this tutorial.

def activate
  AppConfiguration.class_eval do 
    preference :stylesheets, :string, :default => 'styles'
  end
end 

Example #2: Use insert_before to insert a view containing Spree core functionality.

The next requirement I imagined was adding promo functionality to the product listing page. I wanted to use core Spree logic to determine which promo image to use. The first promo image would be a 10% off discount to users that were logged in. The second promo image would be a 15% off discount offered to users who weren't logged in and created an account. I completed the following changes for this work:

  • First, I added the insert_before method to add the promo view before the homepage_products component, the component that lists the products on the homepage.
    # RAILS_ROOT/vendor/extensions/stephs_photos/stephs_photos_hooks.rb
    insert_before :homepage_products, 'shared/stephs_promo'
    
  • Next, I added the view using core Spree user functionality.
    # RAILS_ROOT/vendor/extensions/stephs_photos/app/views/shared/_stephs_promo.erb
    <% if current_user -%>
    <img src="/images/promo10.png" alt="10 off" />
    <% else -%>
    <img src="/images/promo15.png" alt="15 off" />
    <% end -%>
    
  • Finally, I uploaded my promo images to RAILS_ROOT/vendor/extensions/stephs_photos/public/images/

After another server restart and homepage refresh, I tested the logged in and logged out promo logic.


vs.


Spree core functionality used to display two different promo images inside a partial view.

Note: The promo coupon logic that computes the 10% or 15% off was not included in this tutorial.

Example #3: Use replace method to replace a component on all product pages.

In my third example, I imagined that I wouldn't have time to manage product descriptions when I was rich and famous. I decided to use the replace hook to replace the product description on all product pages. I completed the following steps for this change:

  • First, I added the replace method to replace the :product_description component with a rails partial view.
    # RAILS_ROOT/vendor/extensions/stephs_photos/stephs_photos_hooks.rb
    replace :product_description, 'shared/generic_product_description'
    
  • Next, I created the view with the generic product description.
    # RAILS_ROOT/vendor/extensions/stephs_photos/app/views/shared/_generic_product_description.erb
    all prints are 4x6 matte prints.<br />
    all photos ship in a folder.
    

After yet another server restart and product refresh, I tested the generic product description using the replace hook.


The replace hook was used to replace product descriptions on all product pages.

Intermission

OK, so hopefully you see the trend:

  1. Figure out which component you want to pre-append, post-append, replace, or remove.
  2. Modify extension_name_hooks.rb to include your hook method (and pass the view, if necessary).
  3. Create the new view in your extension.
  4. Restart server and be happy!

I'll note a couple other examples below.

Example #4: Bummer that there's no footer component

In the next step, I intended to add copyright information to the site's footer. I was disappointed to find that there was no hook wrapped around the footer component. So, I decided not to care for now. But in the future, my client (me) may make this a higher priority and the options for making this change might include:

  • Clone the default template and modify the template footer partial view.
  • Clone the default template, create a hook to wrap around the footer component, add the changes via a hook in an extension.
  • Add a view in the extension that overrides the theme footer view.

Example #5: Add text instead of partial view.

Since I couldn't add copyright information below the footer, I decided to add it after the inside_product_cart_form component using the insert_after hook. But since it's a Friday at 5:30pm, I'm too lazy to create a view, so instead I'll just add text for now with the following addition to the extension hooks file:

# RAILS_ROOT/vendor/extensions/stephs_photos/stephs_photos_hooks.rb
insert_after :inside_product_cart_form, :text => '<p>&copy; stephpowell. all rights reserved.</p>'

Server restart, and I'm happy, again:


Text, rather than a partial view, was appended via a hook.

Hopefully my examples were exciting enough for you. There's quite a lot you can do with the hook methods, and over time more documentation and examples will become available through the Spree site, but I wanted to present a few very simple examples of my approach to customization in Spree. I've uploaded the extension to http://github.com/stephp/stephs_photos for this article.

Tomorrow, I'm set to publish closing thoughts and comments on the hook implementation since this article is now too long for a blog post. Stay tuned.



published by noreply@blogger.com (Greg Sabino Mullane) on 2010-01-11 13:00:00 in the "tips" category

Upgrading between major versions of Postgres is a fairly straightforward affair, but Murphy's law often gets in the way. Here at End Point we perform a lot of upgrades, and the following list explains some of the problems that come up, either during the upgrade itself, or afterwards.

When we say upgrade, we mean going from an older major version to a newer major version. We've (recently) migrated client systems as old as 7.2 to as new as 8.4. The canonical way to perform such an upgrade is to simply do:

pg_dumpall -h oldsystem > dumpfile
psql -h newsystem -f dumpfile

The reality can be a little more complicated. Here are the top ten gotchas we've come across, and their solutions. The more common and severe problems are at the top.

1. Removal of implicit casting

Postgres 8.3 removed many of the "implicit casts", meaning that many queries that used to work on previous versions now gave an error. This was a pretty severe regression, and while it is technically correct to not have them, the sudden removal of these casts has caused *lots* of problems. Basically, if you are going from any version of PostgreSQL 8.2 or lower to any version 8.3 or higher, expect to run into this problem.

Solution: The best way of course is to "fix your app", which means specifically casting items to the proper datatype, for example writing "123::int" instead of "123". However, it's not always easy to do this - not only can finding and changing all instances across your code base be a huge undertaking, but the problem also exists for some database drivers and other parts of your system that may be out of your direct control. Therefore, the other option is to add the casts back in. Peter Eisentraut posted a list of casts that restore some of the pre-8.3 behavior. Do not just apply them all, but add in the ones that you need. We've found that the first one (integer AS text) solves 99% of our clients' casting issues.

2. Encoding issues (bad data)

Older databases frequently were not careful about their encoding, and ended up using the default "no encoding" mode of SQL_ASCII. Often this was done because nobody was thinking about, or worrying about, encoding issues when the database was first being designed. Fast forward years later, and people want to move to something better than SQL_ASCII such as the now-standard UTF-8. The problem is that SQL_ASCII accepts everything without complaint, and this can cause your migration to fail as the data will not load into the new database with a different encoding. (Also note that even UTF-8 to UTF-8 may cause problems as it was not until Postgres version 8.1 that UTF-8 input was strictly validated.)

Solution: The best remedy is to clean the data on the "old" database and try the migration again. How to do this depends on the nature of the bad data. If it's just a few known rows, manual updates can be done. Otherwise, we usually write a Perl script to search for invalid characters and replace them. Alternatively, you can pipe the data through iconv in the middle of the upgrade. If all else fails, you can always fall back to SQL_ASCII on the new database, but that should really be a last resort.
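
As a rough illustration of the "search for invalid characters and replace them" step, here is a small sketch. Ruby stands in for the Perl script mentioned above, and the helper names, the sample data, and the choice of "?" as a placeholder are all assumptions. A real cleanup would normally decide on a proper replacement for each bad value rather than blindly substituting.

```ruby
# Return the 1-based line numbers of lines that are not valid UTF-8.
def invalid_utf8_lines(raw)
  bad = []
  raw.each_line.with_index(1) do |line, number|
    bad << number unless line.force_encoding(Encoding::UTF_8).valid_encoding?
  end
  bad
end

# Replace every invalid byte sequence with a placeholder so the data will load.
def scrub_to_utf8(raw, placeholder = '?')
  raw.dup.force_encoding(Encoding::UTF_8).scrub(placeholder)
end

# Hypothetical dump fragment: row 1 is valid UTF-8, row 2 holds a Latin-1 byte.
dump = "1\tcaf\xC3\xA9\n2\tcaf\xE9\n".b

bad_lines = invalid_utf8_lines(dump)
cleaned   = scrub_to_utf8(dump)
puts "invalid lines: #{bad_lines.inspect}"
puts cleaned
```

Listing the offending lines first is useful because it tells you whether you are dealing with a few known rows, which can be fixed by hand, or with a widespread problem.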

3. Time

Since the database is almost always an integral part of the business, minimizing the time it is unavailable for use is very important. People tend to underestimate how much time an upgrade can take. (Here we are talking about the actual migration, not the testing, which is a very important step that should not be neglected.) Creating the new database and schema objects is very fast, of course, but the data must be copied row by row, and then all the constraints and indexes created. For large databases with many indexes, the index creation step can take longer than the data import!

Solution: The first step is to do a practice run with as similar hardware as possible to get an idea of how long it will take. If this time period does not comfortably fit within your downtime window (and by comfortable, I mean add 50% to account for Murphy), then another solution is needed. The easiest way is to use a replication system like Bucardo to "pre-populate" the static part of the database, and then the final migration only involves a small percentage of your database. It should also be noted that recent versions of Postgres can speed things up by using the "-j" flag to the pg_restore utility, which allows some of the restore to be done in parallel.
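
The "add 50%" rule of thumb is simple arithmetic, sketched here in Ruby. The fits_window? helper and the sample durations are hypothetical.

```ruby
# Pad the practice-run duration by a safety margin (50% in the text above)
# and check whether the padded time still fits in the downtime window.
def fits_window?(practice_minutes, window_minutes, margin = 0.5)
  practice_minutes * (1 + margin) <= window_minutes
end

comfortable = fits_window?(80, 120)  # 80 * 1.5 = 120, which just fits
too_tight   = fits_window?(90, 120)  # 90 * 1.5 = 135, which does not fit

puts "80-minute practice run in a 120-minute window: #{comfortable}"
puts "90-minute practice run in a 120-minute window: #{too_tight}"
```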

4. Dependencies

When you upgrade Postgres, you're upgrading the libraries as well, which many other programs (e.g. database drivers) depend on. Therefore, it's important to make sure everything else relying on those libraries still works. If you are installing Postgres with a packaging system, this is usually not a problem as the dependencies are taken care of for you.

Solution: Make sure your test box has all the applications, drivers, cron scripts, etc. that your production box has and make sure that each of them either works with the new version, or has a sane upgrade plan. Note: Postgres may have some hidden indirect dependencies as well. For example, if you are using PL/PerlU, make sure that any external modules used by your functions are installed on the box.

5. Postgres contrib modules

Going from one version of Postgres to another can introduce some serious challenges when it comes to contrib modules. Unfortunately, they are not treated with the same level of care as the Postgres core is. To be fair, most of them will continue to just work, simply by doing a "make install" on the new database before attempting to import. Some modules, however, have functions that no longer exist. Some are not 100% forward compatible, and some even lack important pieces such as uninstall scripts.

Solution: Solving this depends quite a bit on the exact nature of the problem. We've done everything from carefully modifying the --schema-only output, to modifying the underlying C code and recompiling the modules, to removing them entirely and getting the functionality in other ways.

6. Invalid constraints (bad data)

Sometimes when upgrading, we find that the existing constraints are not letting the existing data back in! This can happen for a number of reasons, but basically it means that you have invalid data. This can be mundane (a check constraint is missing a potential value) or more serious (multiple primary keys with the same value).

Solution: The best bet is to fix the underlying problem on the old database. Sometimes this is a few rows, but sometimes (as in a case with multiple identical primary keys), it indicates an underlying hardware problem (e.g. RAM). In the latter case, the damage can be very widespread, and your simple upgrade plan has now turned into a major damage control exercise (but aren't you glad you found such a problem now rather than later?) Detecting and preventing such problems is the topic for another day. :)

7. tsearch2

This is a special case of the contrib module situation mentioned above. The tsearch2 module first appeared in version 7.4, and was moved into the core of Postgres in version 8.3. While there was a good attempt at providing an upgrade path, upgrades can still cause an occasional issue.

Solution: Sometimes the only real solution is to edit the pg_dump output by hand. If you are not using tsearch in that many places (e.g. just a few indexes or columns on a couple of tables), you can also simply remove it before the upgrade, then add it back in afterwards.

8. Application behavior

In addition to the implicit casting issues above, applications sometimes have bad behaviors that were tolerated in older versions of Postgres, but now are not. A typical example is writing queries without explicitly naming all of the tables in the "FROM" section.

Solution: As always, fixing the app is the best solution. However, for some things you can also flip a compatibility switch inside of postgresql.conf. In the example above, one would change the "add_missing_from" from its default of 'off' to 'on'. This should be considered an option of last resort, however.

9. System catalogs

Seldom does a major update go by that doesn't see a change in the system catalogs, the low-level meta-data tables used by Postgres to describe everything in the database. Sometimes programs rely on the catalogs looking a certain way.

Solution: Most programs, if they use the system catalogs directly, are careful about it, and upgrading the program version often solves the problem. At other times, we've had to rewrite the program right then and there, either by having it abstract out the information (for example, by using the information_schema views), or (less preferred) by adding conditionals to the code to handle multiple versions of the system catalogs.

10. Embedded data

This is a rare but annoying problem: triggers on a table rely on certain data being in other tables, such that doing a --schema-only dump before a --data-only dump will always fail when importing.

Solution: The easiest way is to simply use pg_dumpall, which loads the schema, then the data, then the constraints and indexes. However, this may not be possible if you have to separate things for other reasons (such as contrib module issues). In this case, you can break the --schema-only pg_dump output into pre and post segments. We have a script that does this for us, but it is also slated to be an option for pg_dump in the future.
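
One way to picture the pre/post split is the sketch below. It is not the script mentioned above. It is a simplified illustration that assumes a plain-text dump in which each object is introduced by a "-- Name: ...; Type: ...;" comment line, and the list of types treated as "post" is my own choice. Anything before the first such header lands in the pre segment.

```ruby
# Object types that should only be loaded after the data is in place.
# This list is an assumption for illustration; adjust it to your schema.
POST_DATA_TYPES = ['INDEX', 'CONSTRAINT', 'FK CONSTRAINT', 'TRIGGER'].freeze

# Matches an object header comment and captures the object type.
HEADER = /\A-- Name: .*; Type: ([A-Z ]+);/

# Split schema-only dump text into [pre, post] segments by object type.
def split_schema(dump)
  pre = +''
  post = +''
  target = pre
  dump.each_line do |line|
    if (match = HEADER.match(line))
      target = POST_DATA_TYPES.include?(match[1]) ? post : pre
    end
    target << line
  end
  [pre, post]
end

# Hypothetical, heavily trimmed dump fragment.
sample = <<~SQL
  -- Name: orders; Type: TABLE; Schema: public; Owner: postgres
  CREATE TABLE orders (id integer);
  -- Name: orders_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
  ALTER TABLE ONLY orders ADD CONSTRAINT orders_pkey PRIMARY KEY (id);
  -- Name: audit_orders; Type: TRIGGER; Schema: public; Owner: postgres
  CREATE TRIGGER audit_orders BEFORE INSERT ON orders FOR EACH ROW EXECUTE PROCEDURE audit();
SQL

pre_sql, post_sql = split_schema(sample)
puts "--- pre ---", pre_sql, "--- post ---", post_sql
```

The load order would then be the pre segment, the --data-only dump, and finally the post segment, so that triggers and constraints do not exist while the data is being imported.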

That's the list! If you've seen other things, please make a note in the comments. Don't forget to run a database-wide ANALYZE after importing into your new database, as the table statistics are not carried across when using pg_dump.


published by david@endpoint.com (David Christensen) on 2010-01-10 02:59:00 in the "tips" category

I was investigating a recent Postgres server issue, where we had discovered that one of the RAM modules on the server in question had gone bad. Unsurprisingly, one of the things we looked at was the possibility of having to restore from a SQL dump: if the data directory had been corrupted, a base backup would potentially carry the same errors we were trying to avoid by restoring.

As it was already the middle of the night (anyone have a server emergency during the normal business hours?), my investigations were hampered by my lack of sleep.

If there had been some data directory corruption, the pg_dump process would likely fail earlier than in the backup process, and we'd expect the dumps to be truncated; hopefully this wasn't the case, as memory testing had not shown the DIMM to be bad, but the sensor had alerted us all the same.

I logged into the backup server and looked at the backup dumps; from the alerts that we'd gotten, the memory was flagged bad on January 3. I listed the files, and noticed the following oddity:

 -rw-r--r-- 1 postgres postgres  2379274138 Jan  1 04:33 backup-Jan-01.sql.gz
 -rw-r--r-- 1 postgres postgres  1957858685 Jan  2 09:33 backup-Jan-02.sql.gz

Well, this was disconcerting. The memory event had taken place on the 3rd, but there was a large drop in size of the dumps between January 1st and January 2nd (more than 400MB of *compressed* output, for those of you playing along at home). This indicated that either the memory event took place earlier than recorded, or something somewhat catastrophic had happened to the database; perhaps some large deletion or truncation of some key tables.

Racking my brains, I tried to come up with an explanation: we'd had a recent maintenance window that took place between January 1 and January 2; we'd scheduled a CLUSTER/REINDEX to reclaim some of the bloat which was in the database itself. But this would only reduce the size of the data directory; the amount of live data would have stayed the same or increased modestly.

Obviously we needed to compare the two files in order to determine what had changed between the two days. I tried:

 diff <(zcat backup-Jan-01.sql.gz | head -2300) <(zcat backup-Jan-02.sql.gz | head -2300)

Based on my earlier testing, this was the offset in the SQL dumps which defined the actual schema for the database excluding the data; in particular I was interested to see if there had been (say) any temporarily created tables which had been dropped during the maintenance window. However, this showed only minor changes (updates to default sequence values). It was time to do a full diff of the data to try and see if some of the aforementioned temporary tables had been truncated or if some catastrophic deletion had occurred or...you get the idea. I tried:

 diff <(zcat backup-Jan-01.sql.gz) <(zcat backup-Jan-02.sql.gz)

However, this approach fell down when diff ran out of memory. We decided to unzip the files and manually diff the two files in case it had something to do with the parallel unzips, and here was a mystery; after unzipping the dumps in question, we saw the following:

 -rw-r--r-- 1 root root 10200609877 Jan  8 02:19 backup-Jan-01.sql
 -rw-r--r-- 1 root root 10202928838 Jan  8 02:24 backup-Jan-02.sql

The uncompressed versions of these files showed sizes consistent with slow growth; the Jan 02 backup was slightly larger than the Jan 01 backup. This was really weird! Was there some threshold in gzip where given a particular size file it switched to a different compression algorithm? Had someone tweaked the backup script to gzip with a different compression level? Had I just gone delusional from lack of sleep? Since gzip can operate on streams, the first option seemed unlikely, and something I would have heard about before. I verified that the arguments to gzip in the backup job had not changed, so that took that choice off the table. Which left the last option, but I had the terminal scrollback history to back me up.

We finished the rest of our work that night, but the gzip oddity stuck with me through the next day. I was relating the oddity of it all to a co-worker, when insight struck: since we'd CLUSTERed the table, that meant that similar data (in the form of the tables' multi-part primary keys) had been reorganized to be on the same database pages, so when pg_dump read/wrote out the data in page order, gzip had that much more similarity in the same neighborhood to work with, which resulted in the dramatic decrease in the compressed gzip dumps.
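
The effect is easy to reproduce on a small scale. The following Ruby sketch uses synthetic rows with a two-part key (the data and its layout are made up, and are not from the database in question) and compresses the same rows twice: once in key order, as they would be dumped after a CLUSTER, and once in shuffled order. It uses Zlib's deflate, the same algorithm gzip uses.

```ruby
require 'zlib'

# Synthetic rows with a two-part key (customer id, order number); illustrative only.
rows = (0...20_000).map do |i|
  format("%05d\t%03d\tcustomer-%05d\n", i / 100, i % 100, i / 100)
end

clustered = rows.join                                  # key order, as after CLUSTER
scattered = rows.shuffle(random: Random.new(42)).join  # arbitrary page order

clustered_size = Zlib::Deflate.deflate(clustered).bytesize
scattered_size = Zlib::Deflate.deflate(scattered).bytesize

puts "uncompressed: #{clustered.bytesize} bytes in both cases"
puts "clustered:    #{clustered_size} bytes compressed"
puts "scattered:    #{scattered_size} bytes compressed"
```

Both inputs contain exactly the same bytes in total. Only the ordering differs, which matches the uncompressed dumps being nearly the same size while the compressed ones were not.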

So the good news was that CLUSTER will save you space in your SQL dumps as well (if you're compressing); the bad news was that it took an emergency situation and a near heart attack for this engineer to figure it all out. Hope I've saved you the trouble... :-)


published by noreply@blogger.com (Ron Phipps) on 2010-01-09 00:48:00 in the "Camps" category

In the last few months I've been active setting up DevCamps for several of our newer clients. DevCamps is an open source development environment system that, once set up, allows for easily starting up and tearing down a development environment for a specific site/code base.

I've done many camp setups, and you tend to run into surprises from system to system, but what was most interesting and challenging about these latest installs was that they were to be done on systems running Plesk, CPanel, and ISPConfig. Some things that are different between a normal deployment and one on the above-mentioned platforms are:

  • On the Plesk system there was a secured Linux called 'Atomic Secured Linux' which includes the grsecurity module. One restriction of this module is Trusted Path Execution (TPE), which required the camp bin scripts to be owned by root and the bin directory to not be writable by other groups; otherwise they would fail to run.
  • Permissions are a mixed bag: where we typically set all of the files to be owned by the site owner, in Plesk there are special groups such as psacln that the files need to be owned by.
  • On the CPanel system we needed to move the admin images for Interchange to a different directory since CPanel includes Interchange and has aliases for /interchange/ and /interchange-5/ to point at a central location which we would not be using.
  • On ISPConfig and Plesk the home directories of the sites are in different places, which required deploying the code in such places as /var/www/clients/client/user/domain.com or /var/www/vhosts/domain.com.

In the end we were able to get DevCamps to run properly on these various platforms both in development and production. If you are starting a new project or working on an existing project and could use a strong development environment, consider DevCamps.


published by noreply@blogger.com (Ethan Rowe) on 2010-01-07 13:12:00

If you're using SSHFS (as I have been recently, since OpenVPN started crashing frequently on my OpenBSD firewall), note that the ServerAliveInterval option for SSH can have a significant impact on the stability of your mounts.

I set it to 10 seconds on my system and have been happy with the results so far. It could probably safely go considerably higher than that.

It's not on by default, which leaves the stability of your SSH tunnels up to the success of TCP keepalive (which is on by default). On my wireless network, that alone has not been sufficient.