Massive LAMP scalability on a startup budget


Amazon’s web services have been around for a while now. The Elastic Compute Cloud provides incredibly cheap Xen-based virtualization, and the Simple Storage Service provides terabytes of storage for around the cost of cinema candy (and not the expensive cinema candy either).

They’re popular with a bunch of startups, including 37Signals’ Basecamp, but haven’t been suitable for running a full LAMP stack.

Until now.

Amazon’s web services are Amazon’s own infrastructure provided as a product to third parties. There are a bunch currently available, but two are of particular interest:

Elastic Compute Cloud (EC2) is a shared computing service. While Amazon don’t make this clear, EC2 is based on Xen virtualization, with each VM providing a 1.7GHz CPU and 1.75GB of memory. EC2 is still in beta, which may explain why the default install is an old release of Fedora, but you can easily make images for Ubuntu LTS, CentOS or another enterprise OS. Servers are controlled with a few clicks in a Firefox plugin, or shell commands if you prefer. The elastic part of the equation is being able to scale up and down using these tools. You can go from 1 to 20 servers (and beyond if you call Amazon and make arrangements) in a couple of minutes. All the servers share the same Xen image, and differences are written to copy-on-write snapshots.

Elastic Compute Cloud is purely a computing resource, and does not provide persistent storage – i.e., the servers will revert to their original state and the snapshots will disappear if you shut down the machine or the host server fails. That’s why EC2 has historically been limited to crunching code, transcoding video, and so on.

Simple Storage Service (S3) is Amazon’s storage service. It provides literally terabytes of storage for a few bucks (check the pricing on the linked page). The storage is made available by HTTP to the wider internet and EC2.

But HTTP is not a popular storage access method compared to, say, iSCSI, NFS or CIFS. Most people run a database storage engine (like MyISAM or InnoDB) that lives on a mounted filesystem. This is why LAMP has been unable to take advantage of Amazon’s services.

In the last few months, two solutions to the problem have emerged. More are likely on the way:


S3DFS allows you to write to S3 like a regular disk. It’s a filesystem driver that runs in userspace, like the SSH and HTTP drivers provided in desktop Linux distributions. It’s proprietary, and relatively expensive, but the commercial support options may be attractive.

S3 Storage Engine for MySQL

The S3 storage engine for MySQL replaces the regular MyISAM or InnoDB storage engines, and writes content directly to S3. It has only one developer, but he offers paid support. It’s also open source, and there may be performance advantages from cutting out the filesystem layer.

Here’s where each fits into the equation, compared to a traditional MySQL setup.

The Advantages

Either solution allows you to scale your LAMP systems and storage at peak times more quickly than a traditional data center, meaning you don’t have to purchase these assets in anticipation of load. Furthermore, you can actually scale downwards when you’re off peak, rather than wasting money on unutilized assets. Both of these are killer features for startups, who need to handle massive growth spikes without turning away customers but don’t have the massive amounts of money needed to purchase real datacenter infrastructure.
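To put a rough shape on that argument, here is a minimal sketch comparing the server-hours you pay for when you provision for peak all day against running only what each hour needs. The hourly demand figures are invented purely for illustration, and no pricing is assumed; plug in your own traffic profile.

```python
# Hypothetical demand: servers needed in each hour of one day.
# These numbers are made up for illustration, not measurements.
hourly_demand = [2] * 8 + [5] * 4 + [20] * 2 + [5] * 6 + [2] * 4  # 24 hours


def fixed_server_hours(demand):
    """Traditional approach: own enough servers for the peak, all day long."""
    return max(demand) * len(demand)


def elastic_server_hours(demand):
    """Elastic approach: run only what each hour actually needs."""
    return sum(demand)


fixed = fixed_server_hours(hourly_demand)
elastic = elastic_server_hours(hourly_demand)

print(f"Provisioned for peak: {fixed} server-hours")
print(f"Scaled with demand:   {elastic} server-hours")
print(f"Idle capacity avoided: {1 - elastic / fixed:.0%}")
```

The spikier your traffic, the bigger the gap between the two numbers; a flat load profile gets little benefit from scaling down.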

Storage is simple: if you want something that isn’t in your EC2 Xen image, write it to S3 and you’ll be able to retrieve it.
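As a sketch of that write-it-then-retrieve-it pattern, the class below keeps a local copy on the instance’s disk and falls back to the remote store when the local copy is missing. Everything here is hypothetical: `CachedStore` is an illustrative name, and a plain Python dict stands in for S3, so no real Amazon API is being shown.

```python
import hashlib
import os
import tempfile


class CachedStore:
    """Write-through, read-through cache in front of a remote object store.

    `remote` is any dict-like object. In this sketch a plain dict stands in
    for S3; a real client would go where the dict is.
    """

    def __init__(self, remote, cache_dir):
        self.remote = remote
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        # Hash the key so any key maps to a safe local filename.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name)

    def put(self, key, data):
        # Write to the remote store first: it is the only durable copy.
        self.remote[key] = data
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        path = self._path(key)
        if os.path.exists(path):
            # Cache hit: serve from the instance's local disk.
            with open(path, "rb") as f:
                return f.read()
        # Cache miss (e.g. a fresh instance): pull from remote, then cache.
        data = self.remote[key]
        with open(path, "wb") as f:
            f.write(data)
        return data


remote = {}                      # stand-in for S3
cache_dir = tempfile.mkdtemp()   # stand-in for the instance's local disk
store = CachedStore(remote, cache_dir)
store.put("uploads/logo.png", b"fake image bytes")
```

If the instance disappears, only the cache directory is lost; a new instance pointed at the same remote store refills its cache on the first read of each key.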

Amazon don’t offer SLAs, but traditional SLAs don’t guarantee anything anyway – they just provide a partial refund when they’re broken.

The Drawbacks

Word around town is that the connectivity between EC2 and S3 gives you about 10MB/sec of throughput and 200ms of latency. You’ll want a fairly cache-intensive design – be glad each EC2 VM has a 160GB disk – and even then this may not be sufficient for your workload. If your VM is powered off or its host fails, you’ll lose cached data and your database will start re-pulling from the storage engine.
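Some back-of-the-envelope arithmetic shows what those numbers mean in practice. The sketch below takes the rumoured figures at face value (they are hearsay, not measurements), assumes decimal units, and models a fetch as one round trip of latency per request plus the time to move the bytes.

```python
# Rumoured EC2-to-S3 figures from the paragraph above, taken at face value.
BANDWIDTH_BYTES_PER_SEC = 10 * 10**6   # 10MB/sec, assuming decimal megabytes
LATENCY_SEC = 0.2                      # 200ms per request


def fetch_seconds(size_bytes, requests=1):
    """Rough time to pull `size_bytes` from S3 in `requests` sequential requests."""
    return requests * LATENCY_SEC + size_bytes / BANDWIDTH_BYTES_PER_SEC


small_object = fetch_seconds(10**6)        # a single 1MB object
full_disk = fetch_seconds(160 * 10**9)     # refilling a 160GB local cache

print(f"1MB object:  {small_object:.1f} seconds")
print(f"160GB cache: {full_disk / 3600:.1f} hours")
```

On these assumptions, a cold 160GB cache takes roughly four and a half hours to refill in one stream, and every small uncached read pays the 200ms up front. That is the cost of losing an instance, so test with your own workload before trusting it.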


Ultimately the use of either of these depends on your workload. Neither solution has been around for long enough to be used by a well-known startup. There’s a place for Amazon’s services in your infrastructure – whether that place is your LAMP stack should depend on your testing. But this is certainly a space to watch.
