Actually, the Relational Model Doesn't Scale

Before all my fellow DBAs’ heads explode, let me just say that I am a relational guy. I like the relational model, think it’s the best tool for the job, and think every programmer (not just DBAs) should aspire to be as familiar with it as they are with AJAX, MVC, or whatever other technology pattern you think is important. I’ll even take that a step further: I think the NoSQL movement is mostly a re-hash of failed technologies from the last century. Object and document databases had their run in the market (some might say “they had their time”), and they were pretty thoroughly beaten by the RDBMS; that some people have reinvented that wheel doesn’t change the game.

That said, I find that the recent comments from Jeff Davis on the relational model and scalability overlook some things. The state of computing tasks has changed over the past two decades, and what we know about computer engineering has also changed. When you work on highly scalable systems like we do at OmniTI, you can’t escape some of the problems inherent in these types of environments. As much as I’d like the answer to every problem to be “just use an RDBMS”, Brewer’s CAP theorem just isn’t something you can ignore.

When most people think about the relational model, they think of it in terms of parent-child relationships between tables. Without getting too deep into the details, I think it’s pretty fair to say that Primary Keys and Foreign Keys are a very large part of any relational implementation, and that pretty much every RDBMS strives to let you add these constraints to your model; it’s what helps keep the data consistent. But there’s the rub. CAP theorem points out that a distributed system cannot guarantee consistency, availability, and partition tolerance all at once, so as we strive for tighter and tighter consistency, we pull away from availability, partition tolerance, or both. The relational model and CAP are two theoretical systems that run smack dab into each other in the real world.
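To make the consistency point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for whatever RDBMS you actually run. The customers and orders tables, and both helper functions, are hypothetical names invented for illustration. It only shows what a foreign key buys you on a single node; it does not model distribution or CAP itself.

```python
import sqlite3


def build_db(enforce_fk: bool) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical parent/child schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(id),
               total REAL NOT NULL
           )"""
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    return conn


def try_orphan_order(conn: sqlite3.Connection) -> bool:
    """Return True if an order for a nonexistent customer was accepted."""
    try:
        conn.execute("INSERT INTO orders (customer_id, total) VALUES (999, 10.0)")
        return True
    except sqlite3.IntegrityError:
        # The database refused the row because customer 999 does not exist.
        return False


if __name__ == "__main__":
    strict = build_db(enforce_fk=True)
    loose = build_db(enforce_fk=False)
    print("constraint enforced, orphan accepted:", try_orphan_order(strict))
    print("constraint dropped, orphan accepted:", try_orphan_order(loose))
```

With the constraint enforced, the database itself rejects the orphaned order. With it dropped, the row goes in quietly and it becomes the application's job to notice, which is the integrity you trade away when you drop a foreign key.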
This isn’t really something new; if you have ever denormalized, dropped a foreign key, or split data across multiple nodes, you’ve run into this before.

Now, where CAP theorem falls on its face (imho) is that it ignores another holy trinity of software development: Cheap, Fast, and Good. The size of your problem is dictated by the resources you have available; if you can afford decent tools (and let’s be clear, decent is not your web dev throwing up MySQL on an EC2 instance), it is quite likely that the stressors of the relational model will never impact you in the way that most CAP folks are worried about. This is also one of the places the NoSQL movement fails: it throws the baby out with the bathwater. Giving up your data integrity before you have scalability issues is a form of premature optimization. The trick, as Theo would say, is having the experience to know when such optimizations are and aren’t premature.

So what’s the takeaway? I like to say that you use the relational model because it is best, and you use something else because it is necessary. Most SQL implementations can scale very well, and they should be your first choice when starting a new project. But we also can’t pretend that there aren’t inherent problems as these systems grow larger; let’s understand the trade-offs and engineer appropriately.