It's no secret that we here at End Point love and encourage the use of version control systems to make life easier for both ourselves and our clients. While a full-fledged development environment is ideal for developing and maintaining client code, not everyone has the time to set one up quickly.
A situation we've sometimes encountered is clients editing or updating production data directly. This can happen through a variety of means: direct server access, scp/sftp, or web-based editing tools which save directly to the file system.
I recently implemented a script to provide transparent version control for a client who uses a web-based tool to manage their content. While they are still making changes to their site directly, we now have the ability to roll back changes on a file-by-file basis as files are created, modified, or deleted.
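As a rough sketch of what a file-by-file rollback can look like, the helper below restores a single file to the state it had in an earlier commit. The function name restore_file, the example path, and the example hash are hypothetical; the underlying commands (git log and git checkout with a commit and a path) are standard git.

```shell
#!/bin/bash
# Hypothetical helper: restore one tracked file to its state in an earlier commit.
# Usage: restore_file <commit> <path>   (run from inside the repository)
restore_file() {
    local commit="$1" path="$2"
    # Check out only this path from the given commit; other files are untouched.
    git checkout "$commit" -- "$path"
}

# To find candidate commits for a file, list the commits that touched it:
#   git log --oneline -- htdocs/index.html
# Then restore it as of a chosen commit (placeholder hash):
#   restore_file abc1234 htdocs/index.html
```

The restored version lands in both the working tree and the index, so the next heartbeat commit would record the rollback like any other change.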
I wanted something that 1) was fast, 2) was useful, and 3) stayed out of the user's way. Naturally, I turned to git.
In the user's account, I executed git init to create a new git repository in their home directory. I then used git add on the parts that we definitely wanted under version control. This included all of the relevant static content, the app server files, and associated configuration: basically anything we might want to track changes to.
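The setup step might look roughly like the following. This is a sketch under stated assumptions: it runs in a scratch directory rather than a real home directory, the two files are stand-ins, and the initial commit with a placeholder identity is an addition for illustration (the first heartbeat would otherwise create it).

```shell
#!/bin/bash
# Sketch of the one-time setup, run in a scratch directory so it is safe to try.
# In the real setup the repository root would be the user's home directory.
REPO_ROOT=$(mktemp -d)
cd "$REPO_ROOT" || exit 1

# Stand-in content so the example has something to track (hypothetical files).
mkdir -p htdocs catalogs/acme/pages
echo "<html></html>" > htdocs/index.html
echo "page" > catalogs/acme/pages/index.html

# Create the repository and add the paths we want under version control.
git init -q
git add htdocs catalogs/acme/pages

# Record the initial state (assumption: identity passed inline, placeholder values).
git -c user.name="Heartbeat" -c user.email="heartbeat@example.com" \
    commit -q -m "Initial import"
```

Only the paths passed to git add are tracked; everything else in the directory stays untracked until it is explicitly added.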
Finally, I determined the list of directories in which we would like to automatically detect any newly created files. These corresponded to the usual places where new content was apt to show up. I codified the automatic update of the git repo in a script called git_heartbeat, which is called periodically from cron.
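For the cron side, an hourly entry could be put together as sketched here. The script location /home/acme/bin/git_heartbeat is an assumption, as is the exact schedule; the snippet only prints the entry so it can be reviewed before being installed.

```shell
#!/bin/bash
# Build an hourly crontab entry for the heartbeat script.
# The script location is an assumption; adjust it to wherever git_heartbeat lives.
HEARTBEAT_SCRIPT="/home/acme/bin/git_heartbeat"
CRON_LINE="0 * * * * $HEARTBEAT_SCRIPT"

# Print the entry for review rather than installing it.
echo "$CRON_LINE"

# To install it for the current user while keeping existing entries:
#   (crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
```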
The basic listing for git_heartbeat:
    #!/bin/bash

    # automatically add any new files in these space-separated directories
    AUTO_ADD_DIRS="catalogs/acme/pages htdocs"

    # make sure we're in the proper git root directory; bail out if we can't get there
    cd /home/acme || exit 1

    # actually add any newly created files in $AUTO_ADD_DIRS
    # ($AUTO_ADD_DIRS is deliberately unquoted so it splits into separate directories)
    find $AUTO_ADD_DIRS -print0 | xargs -0 git add

    DATE=$(date)
    git commit -q -a -m "Acme Co git heartbeat - $DATE" > /dev/null
A couple notes:
- git commit -a takes care of the modification/deletion of any already tracked files. The git add ensures that any newly created files are in the index and will be included in the commit.
- If no files have been added, modified, or deleted, no checkpoint is created. This ensures that every commit in the log is meaningful and corresponds to an actual change to the site itself.
- Compared to other VCSs which keep metadata in each versioned subdirectory (such as Subversion), this approach stays out of the user's way; we don't have to worry about the user accidentally overwriting/deleting data in their upload directories and thus corrupting the repository.
- This approach is fast; it runs nearly instantaneously for thousands of files, so we could even push the cron interval to every minute if desired. For our purposes, this system works great as is.
- Once the git tools are installed, there is no need to set up a central repository; git repos are very cheap to create/use and for a use case such as this, require little to no maintenance beyond the initial setup.
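The first two notes can be checked in a scratch repository. The sketch below wraps the same add-then-commit steps in a function (the name heartbeat, the single htdocs directory, and the placeholder identity are assumptions for the demo) and runs it three times: after creating a file, with no changes, and after a modification.

```shell
#!/bin/bash
# Demonstrate that the heartbeat only commits when something changed.
# Runs in a scratch repository; the identity values are placeholders.
export GIT_AUTHOR_NAME="Heartbeat" GIT_AUTHOR_EMAIL="heartbeat@example.com"
export GIT_COMMITTER_NAME="Heartbeat" GIT_COMMITTER_EMAIL="heartbeat@example.com"

DEMO_REPO=$(mktemp -d)
cd "$DEMO_REPO" || exit 1
git init -q
mkdir htdocs

# Same steps as git_heartbeat, reduced to one directory.
heartbeat() {
    find htdocs -print0 | xargs -0 git add
    git commit -q -a -m "heartbeat - $(date)" > /dev/null
}

echo "hello" > htdocs/index.html
heartbeat                                    # new file: creates a commit
heartbeat || echo "no changes, no commit"    # git commit exits non-zero here
echo "hello again" > htdocs/index.html
heartbeat                                    # modification: creates a second commit

COMMIT_COUNT=$(git rev-list --count HEAD)
echo "commits: $COMMIT_COUNT"                # prints "commits: 2"
```

The middle run leaves the log untouched because git commit refuses to create a commit when nothing is staged or modified.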
Areas of improvement/known issues:
- This script could definitely be improved by providing more information about which files were added/modified/deleted. However, git's own tools can be quite useful here; for instance, git log --stat will show the files which each heartbeat commit affected.
- Since this is set up as a general cron job running every hour (the period is configurable, obviously), it does preclude extended stagings for non-heartbeat commits; basically, any work in progress that sits uncommitted for longer than the heartbeat interval will be inadvertently committed by the next heartbeat.
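One possible way to address the first item is to put a summary of the staged changes into the commit message itself. The sketch below is not part of the original script: the function name heartbeat_message is hypothetical, and it assumes all changes have already been staged before it is called.

```shell
#!/bin/bash
# Hypothetical helper: build a commit message listing what is currently staged.
# Run inside the repository, after new and changed files have been staged.
heartbeat_message() {
    local changes
    # One line per staged path, prefixed with a status letter such as
    # A (added), M (modified), or D (deleted).
    changes=$(git diff --cached --name-status)
    printf 'Acme Co git heartbeat - %s\n\n%s\n' "$(date)" "$changes"
}

# Possible use in git_heartbeat, replacing the fixed -m message:
#   find $AUTO_ADD_DIRS -print0 | xargs -0 git add   # new files
#   git add -u                                       # modified/deleted tracked files
#   git commit -q -m "$(heartbeat_message)" > /dev/null
```

With this in place, a plain git log would show the affected files for each heartbeat without needing --stat.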

