Some insight on the inner workings of noblogs

Noblogs is based on WordPress, as most of you should know. I chose the words carefully: “based”, not “powered by”, as our WordPress setup has diverged more and more over time from a standard multisite install. As we love to share our knowledge (and to brag about our achievements), I will describe as much of our current installation as possible.

An interesting problem to tackle.

As you may know, A/I services are distributed across several servers located in different server farms in different countries. This means that all our infrastructure must support geographical distribution of services, which implies that network latencies are high and highly variable, and that the network connections between nodes of the cluster are in general unreliable. We therefore want to achieve a “shared-nothing” architecture as much as possible, but most of the software we use is not meant to work in such a coordinated way. The free WordPress version that wordpress.org ships does an abysmal job of scaling to more than one server; in particular:

WordPress expects all user files (images, attachments, etc.) to be available on the local filesystem
WordPress does not allow distributing database resources across different database instances (so-called database sharding).

Thus, scaling to more than one server is a real issue with WordPress. Also, if you maintain a series of patches to the code base (which we are doing, for example, for privacy reasons), any upgrade to WordPress is going to be a real challenge.

So, we needed to set up a solid and reliable system that could both work around the intrinsic WordPress limitations and allow us to manage our patch set in a sensible way.

Managing the codebase: git to the rescue

I cannot stress enough how wonderful git is as a distributed VCS. How do we synchronize our codebase with upstream? We have two main branches of noblogs: the “upstream” branch, where we update the upstream codebase, and the “noblogs” branch, where we keep the code that goes live on our servers, that is, upstream plus our patch set.

When we want to upgrade to a new WordPress version, we first perform the upgrade of the codebase in the upstream branch, then we rebase the noblogs branch on it and look for merge issues. As you may have noticed, sometimes things do not go well, mainly because our installation is not standard, but also because we install a lot of additional plugins, whose code is usually really, really poorly written.

So, this is how we manage the code base. How about distributing noblogs on multiple servers?

An (almost) fully scaled-out, geographically distributed WordPress installation

First of all, the only thing that really has to be shared across the whole noblogs cluster is the global database, which includes, most notably, the list of users and of blogs. Fortunately, although WordPress reads a lot from this database, it seldom writes to it, so if we need to replicate it, the stream of data to propagate is small enough. All other data (files and db tables) are local to a single blog, so in theory they can be placed on a single node. But the standard version of WordPress does not allow this. Luckily enough, for the databases there is a component that you can install within WordPress that allows partitioning of the database: HyperDB. We chose to use a consistent hashing function for partitioning, so that blogs are (almost) evenly distributed across the nodes, and adding a node triggers a logical rebalance that moves as few blogs as possible.
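To make the consistent hashing idea concrete, here is a minimal sketch in Python. It is not our actual HyperDB configuration (which is PHP): the node names, the number of virtual points per node and the choice of SHA-1 are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring; node names are hypothetical."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several points so that load spreads out evenly.
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, blog_id):
        # A blog belongs to the first point clockwise from its own hash.
        point = ring_hash(str(blog_id))
        idx = bisect.bisect_right(self._points, point) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["db1", "db2", "db3"])
before = {blog: ring.node_for(blog) for blog in range(10000)}
ring.add_node("db4")
after = {blog: ring.node_for(blog) for blog in range(10000)}
moved = [blog for blog in before if before[blog] != after[blog]]
print(f"{len(moved)} of {len(before)} blogs moved, all of them to db4")
```

With a naive "blog id modulo number of nodes" scheme, adding a fourth node would reassign most blogs; on the ring, only the blogs that now fall to the new node have to move.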

Once you have installed HyperDB, you are able to distribute the database, but you are still stuck with a single instance of the WordPress code itself. In our case, this had a major drawback: it limits the number of blogs you can host (remember, all files are supposed to be on the local filesystem, so in our setup growth is basically limited by disk space).

Also, as our servers are spread across 3 continents, communication between the php frontend and the database suffered from serious performance issues. If a simple 1-row SELECT takes 0.1 seconds of transoceanic round-trip, and your php application does a LOT of such queries (and WordPress DOES perform a lot of queries, especially if you install a lot of lame plugins because “users want it”), you end up with abysmal page generation times: one random noblogs page took between 6 and 12 seconds to render!
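A back-of-the-envelope model shows why the round trip dominates. The sketch below assumes that queries run one after another and that everything else (php execution, rendering) costs nothing; the query counts and the local round-trip time are hypothetical, chosen only to match the orders of magnitude above.

```python
def page_time(queries: int, round_trip_s: float, php_time_s: float = 0.0) -> float:
    """Estimated page generation time when queries are issued sequentially."""
    return queries * round_trip_s + php_time_s


# Hypothetical query counts; 0.1 s is the remote round trip quoted above.
for queries in (60, 120):
    remote = page_time(queries, round_trip_s=0.1)
    local = page_time(queries, round_trip_s=0.0005)  # assumed same-host latency
    print(f"{queries} queries: remote {remote:.1f} s, local {local:.2f} s")
```

Under these assumptions, a 6 to 12 second page corresponds to something like 60 to 120 sequential queries, and the same queries against a database on the same host would cost a small fraction of a second.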

Even the most aggressive caching would only slightly mitigate the problem: at first we started using wp-super-cache, which is the best, no-nonsense cache for WordPress, but that helped only with already-seen pages. Then we created a memcached-based query cache using mysql-proxy locally on the frontend machine. But this was not enough to have an acceptable user experience, so we decided that php AND files should be served from the SAME server where the database is located.

The solution, R* way

Thus, we installed the php backend on every machine where the databases are located, we distributed the user files so that the files of a specific blog reside on the same server as its database, and we set up master/slave replication for the global WordPress database (remember? this is the only thing that must be globally shared).

We needed to create a software layer that knows that 0blivian.noblogs.org should be served by a server in the US, and not by one in Iceland; we also needed to be able to change this on the fly, and as a plus we wanted to hide from the outside world where the data really are (we take data privacy very seriously). Since we use NGINX on our public web frontends, we decided to generate a dynamic map of blogs to servers; this is updated by a script at regular intervals, so that the public web frontend routes each request to the correct php installation. This is simple, fully distributed, acceptably reliable (thus far…) and allows us to render the average noblogs page in under 0.3 seconds, while being able to scale out to virtually tens of thousands of blogs.
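As an illustration of the idea (not our actual script), here is a minimal Python sketch that renders a blog-to-backend table as an NGINX map block. The variable name $noblogs_backend, the backend names and the second hostname are hypothetical; a real deployment would also have to write the file atomically and reload NGINX.

```python
def render_nginx_map(blog_to_backend, default_backend, variable="$noblogs_backend"):
    """Render a hostname -> backend table as an NGINX `map` block."""
    lines = [f"map $host {variable} {{", f"    default {default_backend};"]
    # Sort hostnames so the output is stable between runs and easy to diff.
    for host in sorted(blog_to_backend):
        lines.append(f"    {host} {blog_to_backend[host]};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical assignment of blogs to backends.
mapping = {
    "0blivian.noblogs.org": "backend_a",
    "example.noblogs.org": "backend_b",
}
config = render_nginx_map(mapping, default_backend="backend_a")
print(config)
```

The frontend configuration can then use the mapped variable to pick the upstream for each request, and moving a blog only requires regenerating the map.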

That’s (not) it, folks

Of course this setup is quite complex and requires a lot of maintenance work (especially as it is performed by volunteers in their free time, with little or no money to invest in infrastructure – please remember to donate to the project), but we have all the tools in place to manage our WordPress installation optimally. I will describe in more detail both the tools we use and the ones we’ve created in subsequent posts.

3 Comments

0blivian says:

2012/02/26 at 23:04

prova
Elliptical machine says:

2011/11/28 at 11:16

I suggest adding a facebook like button for the blog!
Helen
Martín says:

2011/11/16 at 20:37

This is all very interesting. I just got an e-mail account in autistici and I’m beginning to investigate all about this R* way and I love it. Thinking about starting a blog here too.
