Noblogs is based on WordPress, as most of you should know. I chose the words carefully: “based on”, not “powered by”, as our WordPress setup has diverged more and more over time from a standard multisite install. As we love to share our knowledge (and to brag about our achievements), I will describe as much as possible of our current installation.
An interesting problem to tackle.
As you may know, A/I services are distributed across several servers located in different server farms in different countries. This means that all our infrastructure must support geographical distribution of services, which in turn implies that network latencies are high and highly variable, and that the network connections between nodes of the cluster are in general unreliable. We therefore want to achieve a “shared-nothing” architecture as much as possible, but most of the software we use is not designed to work that way. The free WordPress version that wordpress.org ships does an abysmal job of scaling beyond one server; in particular:
- WordPress expects all user files (images, attachments, etc.) to be available on the local filesystem
- WordPress does not allow database resources to be distributed across different database instances (so-called database sharding).
Managing the codebase: git to the rescue
An (almost) fully scaled-out, geographically distributed wordpress installation
First of all, the only thing that should really be shared across the whole noblogs cluster is the global database, which includes, most notably, the list of users and the list of blogs. Fortunately, although WordPress reads a lot from this database, it seldom writes to it, so if we need to replicate it, the stream of data to propagate is small enough. All other data (files and db tables) are local to a single blog, so in theory they can be placed on a single node. But the standard version of WordPress does not allow this. Luckily enough, for the databases there is a component that you can install within WordPress that allows partitioning of the database: HyperDB. We chose to use a consistent hashing function for partitioning, so that blogs are (almost) evenly distributed across the nodes, and adding a node triggers a rebalance that moves as few blogs as possible.
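To make the consistent hashing idea concrete, here is a minimal sketch in Python. It is not our HyperDB configuration (HyperDB is PHP, and our actual hash function is not shown here): the node names, the blog names, the choice of MD5 and the number of virtual points per node are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a 128-bit integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Scatter the node over the ring so the load evens out.
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, blog):
        # A blog belongs to the first node point after its own hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self._points, _hash(blog)) % len(self._points)
        return self._owner[self._points[idx]]


# Hypothetical blogs and nodes, only to show the rebalancing behaviour.
blogs = [f"blog{i}.noblogs.org" for i in range(10000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {blog: ring.node_for(blog) for blog in blogs}

ring.add_node("node-d")
after = {blog: ring.node_for(blog) for blog in blogs}
moved = [blog for blog in blogs if before[blog] != after[blog]]
print(f"{len(moved)} of {len(blogs)} blogs moved, all of them to node-d")
```

The property that matters is that every blog that changes home moves to the newly added node, and nothing is shuffled between the old ones. A naive `hash(blog) % number_of_nodes` scheme would instead relocate most blogs whenever a node is added.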
Once you have installed HyperDB, you are able to distribute the database, but you are still stuck with a single instance of the WordPress code itself. In our case, this had a major drawback: it limited the number of blogs we could host (remember, all files are supposed to be on the local filesystem, so in our setup growth is basically limited by disk space).
Also, as our servers are spread across 3 continents, communication between the PHP frontend and the database suffered from serious performance issues. If a simple 1-row SELECT takes 0.1 seconds of transoceanic round trip, and your PHP application does a LOT of such queries (and WordPress DOES perform a lot of queries, especially if you install a lot of lame plugins because “users want it”), you end up with abysmal page generation times: one random noblogs page took between 6 and 12 seconds to render!
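A back-of-the-envelope check of those numbers, as a small Python sketch. It assumes that queries are issued one after another and that network latency dominates page generation time. The 0.1 second round trip and the 6 to 12 second page times come from the paragraph above, while the 1 ms same-host round trip is an assumed figure, not a measurement.

```python
OCEANIC_ROUND_TRIP_S = 0.1  # per-query latency quoted above
LOCAL_ROUND_TRIP_S = 0.001  # ASSUMPTION: illustrative same-host latency


def implied_queries(page_time_s: float, round_trip_s: float) -> int:
    """Number of sequential queries that would explain a page time."""
    return round(page_time_s / round_trip_s)


def latency_cost(queries: int, round_trip_s: float) -> float:
    """Time spent purely waiting on the network for sequential queries."""
    return queries * round_trip_s


low = implied_queries(6, OCEANIC_ROUND_TRIP_S)
high = implied_queries(12, OCEANIC_ROUND_TRIP_S)
print(f"implied queries per page: {low} to {high}")
print(
    "same queries against a local database: "
    f"{latency_cost(low, LOCAL_ROUND_TRIP_S):.2f}s to "
    f"{latency_cost(high, LOCAL_ROUND_TRIP_S):.2f}s of waiting"
)
```

Under these assumptions a page needs roughly 60 to 120 queries, so the wait is almost entirely a function of where the database sits relative to PHP.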
Even the most aggressive caching would only slightly mitigate the problem: at first we started using wp-super-cache, which is the best, no-nonsense cache for WordPress, but that helped only with already-seen pages. Then we created a memcached-based query cache using mysql-proxy locally on the frontend machine. But this was not enough to have an acceptable user experience, so we decided that php AND files should be served from the SAME server where the database is located.
The solution, R* way
Thus, we installed the PHP backend on every machine where the databases are located, we distributed the user files so that the files of a specific blog reside on the same server as its database, and we set up master/slave replication of the global WordPress database (remember? this is the only thing that must be globally shared).
We needed to create a software layer that knows that 0blivian.noblogs.org should be served by a server in the US, and not by one in Iceland; we also need to be able to change this on the fly and, as a plus, we want to hide from the outside world where the data really is (we take data privacy very seriously). Since we use NGINX on our public web frontends, we decided to generate a dynamic map of blogs to servers; this is updated by a script at regular intervals, so that the public web frontend routes each request to the correct PHP installation. This is simple, fully distributed, acceptably reliable (thus far…) and lets us render the average noblogs page in under 0.3 seconds, while being able to scale out to virtually tens of thousands of blogs.
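As an illustration of what such a map-generating script could look like, here is a minimal Python sketch. It is not our actual script. It assumes the map is expressed with nginx's `map` directive, and the variable name `$noblogs_backend`, the backend addresses and the output path are all hypothetical.

```python
import os
import tempfile


def render_nginx_map(assignments, default_backend, variable="$noblogs_backend"):
    """Render a blog -> backend mapping as an nginx `map` block."""
    lines = [f"map $host {variable} {{", f"    default {default_backend};"]
    for blog in sorted(assignments):  # stable order keeps diffs readable
        lines.append(f"    {blog} {assignments[blog]};")
    lines.append("}")
    return "\n".join(lines) + "\n"


def write_atomically(path, content):
    """Write via a temp file and rename, so nginx never reads a partial map."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".blogmap-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


# Hypothetical assignments; in practice they would be derived from the
# same partitioning that decides where each blog's data lives.
config = render_nginx_map(
    {"0blivian.noblogs.org": "192.0.2.21:8080"},
    default_backend="192.0.2.10:8080",
)
print(config)
```

A frontend could then include the generated file and use something like `proxy_pass http://$noblogs_backend;`, reloading nginx after each regeneration. Only the frontends know the mapping, so the outside world never sees which backend actually holds a blog.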
That’s (not) it, folks
Of course this setup is quite complex and requires a lot of maintenance work (especially as it is performed by volunteers in their free time, with little or no money to invest in infrastructure; please remember to donate to the project), but we have all the tools in place to manage our WordPress installation optimally. I will describe in more detail both the tools we use and the ones we’ve created in subsequent posts.