re: axonflux on building and scaling a startup

Yesterday the posterous guys put up a very good article on scaling rails applications from the ground up (the ground up bit is key; we’re focusing on what you need to do as a start-up, not once you are Facebook). It’s a must read for any Rails shop, and a good read regardless of the application stack you work with. One of the comments to the post was from jonathanwallace asked about stories for folks using a Postgres back-end. I started to work up a response for this, but before long decided it needed to live as a full post. I should mention that much of what was posted here applies just as well to Postgres (or even Oracle); mostly I just wanted to point out some differences for those trying to scale on a Postgres backend instead, so this blog post is essentially a “diff” to the original article. Read that first, then read this.

“You’re not going to run full text search out of your DB.”

If you’re using Postgres, this probably isn’t true. Postgres has a built in full text search implementation that runs very fast, gives multiple indexing options, and has all the flexibility of multiple languages, custom stop words, and custom dictionaries, that you would expect from a solid full text search implementation. It might not beat a solution built on Sphinx/Lucene/Solr on straight performance numbers, but the ability to maintain your full text data in a transactional manner, all using one piece of technology rather than having to bolt on an additional application, make it worth it to start with the built-in stuff. For many people, this will be all you ever need.

“Storage engine matters, and you should probably use InnoDB”

Ok, this isn’t exactly about Postgres, but the reasons he gives for InnoDB (crash resistence, non-locking) also apply to Postgres. (And yes, there’s a certain irony that all the reasons why you want to switch to InnoDB are the same reasons you should have just picked Postgres in the first place.) This ability to handle higher concurrency rates is actually even more important with Postgres because…

“if you can start with some replication in place, do it. You’ll want at least one slave for backups anyway. ”

You don’t need a slave for backups (pg_dump will give you online backups just find thank you), but you should create a PITR slave for failover. This wont allow you to scale reads or writes; you’ll need something like Slony for that, but since Postgres is going to scale way up, you don’t need to worry about that for now, just keep buying bigger hardware.

“Fix your DB bottlenecks”

No matter what you code in, having tools like the ones laid out for Ruby/Rails can be very helpful. If you use any kind of ORM, you need to watch for excessive queries and dumb queries. PostgreSQL provides a slow query logging option, so that’s one place to start. You can also do profiling of queries using pgfouine or PGSI. Oh, and Postgres’s EXPLAIN tool kicks the MySQL EXPLAIN’s butt. Learn it. Use it. And yes, you should start here before you move on to memcached and other types of query caching, because in a lot of cases you may not ever need to go there (honestly memcached use isn’t very widespread in the Postgres world, and a lot of the reason is it’s not needed. fwiw, it’s even less so in the Oracle world.) Oh, it’s not (just) that those databases are just full of more magic data grabbing juice, it’s that you’ll also likely want to implement materialized views inside the database before you start bolting on external solutions like memcached. Note that this doesn’t apply to static content caching (images/js libraries/etc…), you’ll probably want to move into caching that stuff much sooner.

“Offline Job Queues… I don’t know why people don’t talk about this more, because if you run a site that basically does anything, you need something like this.”

In case you’re not sure if it needs to be part of Rails, postgres has a notification service built in, and there are also plenty of external queuing systems available. The interesting thing is that it probably doesn’t matter which technology you pick; the pain points will be obvious (resize an image 5 different ways and cache them on 3 different servers upon web form submission?) and moving those out of the critical path will be an obvious win.

“If you don’t monitor it, it will probably go down, and you will never know.”

Again a universal truth. In the Postgres world, check_postgres is hands down the best off the shelf monitoring you can use. You should also think about trending; In MySQL land, Cacti with Baron’s templates are very solid, but there isn’t a clear cut winner on the Postgres side. If you hired us for operations support and don’t have an existing data trending solution, we’d likely use Cacti or Noit, although options like Staplr and I suppse evenMRTG are ok too; point here being you have options, use them. The rest of the post is mostly Rails focused, and I’ll leave it be at that, but hopefully this gives you a better picture of how things will scale if you’re doing Rails/Postgres. The picture is mostly the same, just the walls are moved further away.