Version 1.0
by Raymond Hettinger, ActiveState

Managing Package Installation and Updates

Disposable Instances

The first principle of building AMI environments is to treat them as temporary, disposable instances that can easily be rebuilt from scratch. Ideally, every state change should be initiated from a build script. Avoid logging in via ssh to run installs or edit configurations; instead, edit the build script and redeploy.

Typically, hand-written scripts assume a single target machine, often the local development machine. A more versatile approach is to use a tool like Fabric, which is designed to run multiple deployment steps across multiple machines. A fabfile then becomes executable documentation of how to build an AMI, run tests, and bring the system live.
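The core idea of a fabfile, an ordered list of build steps applied to each machine, can be sketched without Fabric itself. The code below is a stdlib-only stand-in, not Fabric's API: the host names and build steps are hypothetical, each step is wrapped in a plain `ssh host command` invocation, and execution stops at the first failing step. By default it performs a dry run and only returns the commands it would execute, so the plan can be reviewed first.

```python
import shlex
import subprocess

# Hypothetical inventory and build steps; replace with your own.
HOSTS = ["web1.example.com", "web2.example.com"]
BUILD_STEPS = [
    "sudo yum -y install httpd",
    "pip install -r requirements.txt",
    "sudo service httpd restart",
]


def ssh_command(host, step):
    """Build the argument list that runs one shell step on a remote host."""
    return ["ssh", host, step]


def deploy(hosts=HOSTS, steps=BUILD_STEPS, dry_run=True):
    """Apply every step to every host, in order.

    With dry_run=True nothing is executed; the planned commands are
    returned so the build can be reviewed before it touches any machine.
    """
    planned = []
    for host in hosts:
        for step in steps:
            cmd = ssh_command(host, step)
            planned.append(cmd)
            if not dry_run:
                # check=True raises CalledProcessError on a non-zero exit,
                # so one broken step halts the whole deployment.
                subprocess.run(cmd, check=True)
    return planned


if __name__ == "__main__":
    for cmd in deploy():
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Because the whole build lives in one script, rebuilding a disposable instance, or a new one of a different size, is a matter of rerunning it with a different host list.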

Treating AMIs as disposable also simplifies applying software updates and altering configurations: just edit the fabfile and redeploy. The same strategy is useful for moving to larger or smaller instances and for replicating instances when load balancing.

See http://docs.fabfile.org/0.9.3/ for instructions on getting started with Fabric.

Package Managers

The second principle of building AMI environments is to use system package managers such as yum and aptitude for tools such as Apache.

For Python-based tools, two installers are preferred. The ActiveState™ Python distribution comes with a package manager, PyPM, that is preloaded with several major Python tools known to work well together and that have had a license review by ActiveState™. Instructions for PyPM are found at http://docs.activestate.com/activepython/2.6/pypm.html.

For other Python tools listed in the Python Package Index at http://pypi.python.org/pypi, installation can be fully automated using pip, found at http://pip.openplans.org/. There are other Python installers, but pip and PyPM are excellent choices because they track dependencies, automate updates, and can save their current state in a requirements file listing the specific version of each installed package. That last feature is important for creating reproducible builds.
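To illustrate what a requirements file captures, the sketch below approximates the output of `pip freeze` using the standard library's `importlib.metadata` (Python 3.8+) instead of pip itself. The `name==version` format, one pin per line, follows pip's convention; sorting case-insensitively is an assumption of this sketch, and it ignores details pip handles, such as editable installs.

```python
from importlib.metadata import distributions


def freeze():
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Save the current environment's pins so a later build can reproduce it."""
    with open(path, "w") as f:
        f.write("\n".join(freeze()) + "\n")


if __name__ == "__main__":
    print("\n".join(freeze()))
```

A rebuild can then install exactly those versions with `pip install -r requirements.txt`.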

Another install-friendly feature of pip is that it downloads all dependencies before it begins installing. If any needed package is unavailable, it fails early without leaving the system in a partially installed state.
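The fail-early behavior amounts to a two-phase install: first resolve every package and dependency, then install only if the whole set was found. The sketch below models that idea with a hypothetical in-memory index and a plain `set` standing in for the installed packages; it illustrates the principle and is not pip's implementation.

```python
class MissingPackage(Exception):
    """Raised when a requested package or dependency cannot be found."""


# Hypothetical index: package name -> names of its dependencies.
INDEX = {
    "twisted": ["zope.interface"],
    "zope.interface": [],
    "django": [],
}


def resolve(names, index=INDEX):
    """Phase 1: collect every package and dependency, failing on any gap."""
    needed, stack = [], list(names)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        if name not in index:
            raise MissingPackage(name)
        needed.append(name)
        stack.extend(index[name])
    return needed


def install(names, installed, index=INDEX):
    """Phase 2 runs only after phase 1 succeeds, so a failure installs nothing."""
    to_install = resolve(names, index)
    installed.update(to_install)
    return to_install
```

Requesting `["django", "nosuchpkg"]` raises `MissingPackage` before `installed` is touched, so even `django` is left uninstalled rather than half of the request being applied.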

For the most part, PyPM has the same capabilities as pip, and it has the advantage of a single central repository of pre-built binaries that have been tested on a number of platforms. In contrast, pip pulls resources from multiple sources, so an install can be delayed if one of the dependencies is temporarily inaccessible.

Isolated Environments

The third principle of installs is to compartmentalize and isolate the build environment. For Python, the dominant tools are virtualenv and virtualenvwrapper, found at http://pypi.python.org/pypi/virtualenv and http://www.doughellmann.com/projects/virtualenvwrapper/ respectively.

The virtualenv tools let you easily set up and switch between multiple isolated Python environments. Developers like the isolation because it lets them fearlessly install new tools for experimentation without worrying about corrupting an existing setup.

Having multiple environments also solves another recurring problem: incompatible version dependencies. For example, tool x may require a supporting tool y at version z and fail with version z+1, while another tool requires version z+1 and fails with version z. One solution is to create two virtual environments, one with version z installed and one with z+1. This works because each virtual environment has its own site-packages directory.
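Python 3.3 and later ship a standard-library `venv` module derived from virtualenv, which makes the separate site-packages directories easy to observe. As a minimal sketch, assuming Python 3.8+ and two hypothetical environment names, `y_z` and `y_z_plus_1`, the code below creates both environments side by side and asks each environment's own interpreter where it installs packages. The paths differ, which is what lets each hold a different version of tool y.

```python
import os
import subprocess
import tempfile
import venv


def make_env(path):
    """Create an isolated environment and return its interpreter path.

    pip is skipped (with_pip=False) to keep the sketch fast; symlinks are
    used on POSIX, matching the `python -m venv` default there.
    """
    venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt")).create(path)
    if os.name == "nt":
        return os.path.join(path, "Scripts", "python.exe")
    return os.path.join(path, "bin", "python")


def site_packages(python):
    """Ask an environment's own interpreter for its site-packages directory."""
    code = "import sysconfig; print(sysconfig.get_paths()['purelib'])"
    out = subprocess.run([python, "-c", code], check=True,
                         capture_output=True, text=True)
    return out.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("y_z", "y_z_plus_1"):
            python = make_env(os.path.join(root, name))
            print(name, "->", site_packages(python))
```

Installing y at version z with one environment's pip and at z+1 with the other's then places each copy in its own directory.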

Here is a sample session with an experimental installation of Twisted, a tool for asynchronous programming:

$ mkvirtualenv my_experiment    # create a new isolated environment (and activate it)
$ workon my_experiment          # make that environment active (e.g., in a later shell)
$ pip install twisted           # install twisted and its dependencies
 . . .                          # do work with twisted installed
$ deactivate                    # restore the original pristine environment

Strategy for Building a Stack

Relocatable Components

In early development and deployment, it is convenient to put all of the major components on the same AMI. The Apache server, Django web application, Nginx front-end, database store, and database cache can be co-located on the same virtual machine.

To make life easier, the components should communicate through sockets and address each other by distinct hostnames rather than specific IP addresses. Real hostnames are preferred to “localhost” because moving a component then becomes a matter of updating DNS entries, and running servers do not have to be restarted.
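As a minimal sketch of hostname-based addressing, the code below keeps a single map from component names to hostnames and ports; the service names, hostnames, and ports are hypothetical. Because `socket.create_connection` asks the resolver on every call, a changed DNS entry is picked up by new connections once it propagates, without editing this configuration.

```python
import socket

# Hypothetical service map: each component is addressed by hostname, never by IP.
SERVICES = {
    "db": ("db.internal.example.com", 5432),
    "cache": ("cache.internal.example.com", 11211),
    "app": ("app.internal.example.com", 8000),
}


def connect(service, services=SERVICES, timeout=5.0):
    """Open a TCP connection to a named component.

    The hostname is resolved at connect time, so moving the component to
    another machine only requires a DNS update, not a code change.
    """
    host, port = services[service]
    return socket.create_connection((host, port), timeout=timeout)
```

When every component still lives on one AMI, all of these names can point at that single machine; splitting the stack later changes only DNS.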

At any given time, it should be straightforward to clone the AMI and retarget the hostnames so that, for example, the database runs on a different machine than the web server. Likewise, the Nginx front-end should be able to run separately from the underlying web server(s). With this design, it even becomes possible to clone the web servers onto different virtual machines so that Nginx can balance the load across them.
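When the web servers are cloned, the Nginx front-end needs a list of the clones. The hedged sketch below renders an Nginx `upstream` block from a list of web-server hostnames, so adding a clone means adding a name and regenerating the configuration. The upstream name `django_app`, the hostnames, and port 8000 are assumptions for illustration.

```python
def render_upstream(name, servers, port=8000):
    """Render an Nginx upstream block that balances across cloned web servers."""
    lines = [f"upstream {name} {{"]
    for host in servers:
        lines.append(f"    server {host}:{port};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical clones of the web server.
WEB_SERVERS = ["web1.internal.example.com", "web2.internal.example.com"]

if __name__ == "__main__":
    print(render_upstream("django_app", WEB_SERVERS))
```

A `location` block would then forward requests with `proxy_pass http://django_app;`, and by default Nginx distributes them round-robin across the listed servers.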

Moving the components to different virtual machines provides a starting point for scalability. Another advantage of relocatable and cloneable components is that failover and recovery become easier to implement.
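Cloned components make a simple form of failover possible. The sketch below, assuming a hypothetical list of interchangeable database replicas, tries each replica in order and returns the first connection that succeeds; health checks and retry limits are left out to keep it short.

```python
import socket

# Hypothetical interchangeable replicas of the same component.
DB_REPLICAS = [
    ("db1.internal.example.com", 5432),
    ("db2.internal.example.com", 5432),
]


def connect_with_failover(replicas=DB_REPLICAS, timeout=2.0):
    """Return a connection to the first reachable replica, trying them in order."""
    errors = []
    for host, port in replicas:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:  # refused, timed out, or unresolvable
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("all replicas failed: " + "; ".join(errors))
```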

Working with Databases

Initially, the choice of database may be driven by doing “the simplest thing that could possibly work”. Later, the preferred data store may change in response to load or storage requirements. So the smart play is to decouple the database from the application and to design for scalability from the start.

Object Relational Mappers

The purpose of an ORM is twofold. It serves as a decoupling buffer between application code and the database, and it serves as an adapter between two different styles of data access: objects in the application and rows in relational tables.

A key architectural consideration is how to decouple the data store from the application. Without an early, firm decision to maintain that decoupling, database logic commonly permeates an application. This makes it difficult to change the database schema or the database engine (for example, switching from SQLite to MySQL).
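A minimal sketch of that decoupling, using the standard library's `sqlite3` rather than a full ORM such as Django's: application code works with plain `Customer` objects and calls a small repository class, and only that class contains SQL. The `Customer` record, the `customer` table, and the repository methods are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Application-side object; it knows nothing about tables or SQL."""
    name: str
    email: str
    id: Optional[int] = None


class CustomerRepository:
    """The only place that speaks SQL; changing engines is confined to here."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
        )

    def add(self, customer):
        # Map an application object into a relational row.
        cur = self.conn.execute(
            "INSERT INTO customer (name, email) VALUES (?, ?)",
            (customer.name, customer.email),
        )
        self.conn.commit()
        customer.id = cur.lastrowid
        return customer

    def get(self, customer_id):
        # Map a relational row back into an application object.
        row = self.conn.execute(
            "SELECT id, name, email FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        return Customer(id=row[0], name=row[1], email=row[2])


if __name__ == "__main__":
    repo = CustomerRepository(sqlite3.connect(":memory:"))
    saved = repo.add(Customer("Ada", "ada@example.com"))
    print(repo.get(saved.id))
```

Even parameter placeholders differ between drivers (sqlite3 uses `?`, while MySQL drivers such as MySQLdb use `%s`), which is one more reason to keep SQL behind a single boundary.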
