Monday, March 8. 2010
Actually, the Relational Model doesn't scale
Comments
Editor's note: fixed a problem with the link for the CAP theorem, so now you really can't ignore it.
A fundamental mistake is thinking the relational model is about integrity constraints. The relational model is about relations, commonly if mistakenly represented as tables, not at all about relationships.
The thing about primary keys is that it is cheaper to enforce them near the data than in the application. If the data was clean before it was inserted, there is no need to have them enforced again at the DBMS. Also, primary keys have not been part of the relational model for quite some time now. Each relation must have a natural key, or it will not be a relation, but it can have several, with no single one being called primary. Thus, your comments simply do not apply to the relational model; they apply to pseudo-relational implementations that add some arbitrary restrictions of their own.
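To make the "several keys, none of them primary" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `employee` table, its columns, and the `try_insert` helper are hypothetical names chosen for illustration; SQLite is only a stand-in for "a DBMS that enforces keys near the data", not a truly relational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with two candidate keys (badge_no and email).
# Neither is declared PRIMARY KEY; both are simply NOT NULL UNIQUE.
conn.execute(
    """
    CREATE TABLE employee (
        badge_no INTEGER NOT NULL UNIQUE,
        email    TEXT    NOT NULL UNIQUE,
        name     TEXT    NOT NULL
    )
    """
)
conn.execute("INSERT INTO employee VALUES (1, 'ann@example.com', 'Ann')")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a key is violated."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


dup_badge = try_insert((1, "bob@example.com", "Bob"))  # badge_no collides
dup_email = try_insert((2, "ann@example.com", "Bob"))  # email collides
fresh = try_insert((2, "bob@example.com", "Bob"))      # both keys are new

print(dup_badge, dup_email, fresh)
```

Both keys are checked by the DBMS on every insert, so the application does not need its own duplicate check for either of them.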
I didn't say anything different. Re-read the third paragraph; I am talking about the common application of the relational model to real-world problems. This isn't a purist's definition, and I don't claim it to be.
Seems that you're falling into the very same fallacy Jeff Davis blogged about. You say that the relational model doesn't scale, and then point to issues that are not inherent to the relational model but rather to typical implementations of it.
Imagine an RDBMS that offers no constraint support and no transaction support as we know it, and would therefore be easy to scale across hundreds of nodes. It would still be relational. It could even still use SQL, though it would probably not be standard-compliant. I would love to see a DBMS that offers such a tradeoff. Otherwise a lot of people will (be forced to) throw out the baby with the bathwater, go with non-relational tools, and experience the same problems that plagued network and hierarchical databases in the past.
I have imagined such an RDBMS, and it looked a lot like drizzle.
>> Imagine an RDBMS that offers no constraint support and no transaction support as we know it
It's called MySQL < 5.x, and has no usefulness other than what it is: a SQL parser fronting the file system, which allows coders who use it to claim that they "do relational", while doing exactly what a 1970s COBOL coder did with VSAM (less, actually).
I am with the other comments here; while Date/Codd saw great value in ACID, there is nothing in the relational model that dictates that it must be the case. The relational algebra has great semantics for data regardless of consistency.
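As a rough illustration of that last sentence, here is a toy relational algebra in plain Python. Everything in it (the `relation`, `select`, `project`, and `natural_join` helpers, and the parent/child data) is a hypothetical sketch rather than any real library. A relation is modeled as a set of tuples, each tuple being a frozenset of (attribute, value) pairs. Note that the operators stay well defined even though one child row points at a parent that does not exist.

```python
def relation(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attr, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def select(rel, pred):
    """Restriction: keep the tuples for which pred(tuple-as-dict) is true."""
    return {t for t in rel if pred(dict(t))}


def project(rel, attrs):
    """Projection: keep only the named attributes; duplicates collapse."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


parents = relation({"pid": 1, "pname": "Ann"})
# pid 99 has no matching parent: no foreign key is enforced here.
children = relation({"cid": 10, "pid": 1}, {"cid": 11, "pid": 99})

joined = natural_join(parents, children)
names = project(joined, {"pname", "cid"})
orphans = select(children, lambda row: row["pid"] == 99)
```

The orphan row simply drops out of the join; nothing about the algebra itself requires the constraint to have been enforced.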
I understand your example of parent-child relationships, but this is a specious argument; if we're saying "people expect parent-child relationships to be consistent in a relational system, so the relational model as they expect it doesn't scale", then what's the alternative? A system that doesn't even offer the possibility of declaring parent-child relationships? I would rather have the richer data model sans consistency than a system which offers neither consistency nor higher-level operations and a rich data model. Note that foreign key constraints, and constraints themselves, again, are not intrinsic to the relational data model; a nice thing to have, but not required.
Yeah, I know the title of this post isn't fair (which is why I tried to qualify it). The problem is that defending the relational model in a discussion about scalability misses the point; how do you get all these relational systems we so love to actually friggin' scale? I think that is where the RDBMS folks need to spend their time, not in explaining why the relational model is above the discussion. (BTW, getting people to admit we can throw away ACID is an accomplishment in and of itself imo)
"getting people to admit we can throw away ACID is an accomplishment in and of itself"
["throw away" is too strong... I think you mean "make optional" or "make not sacred"] Exactly. That's the point. As long as people think that the relational model is the problem, nobody (particularly not RDBMS people) will actually fix the real problems. Additionally, it will make people think that the scalability problems are inherent, so they turn to much weaker languages. That's why, whenever someone is talking about scalability and the relational model, we need to set the terminology straight. If you read my post on "terminology confusion" (cited in my post), you will see how much time is wasted when two sides of a debate are basing their arguments on different definitions.
>> (BTW, getting people to admit we can throw away ACID is an accomplishment in and of itself imo)
Rule 5: The comprehensive data sublanguage rule: The system must support at least one relational language that
1. Has a linear syntax
2. Can be used both interactively and within application programs,
3. Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).
It was Gray and Reuter who wrote the first definitive text on transactions, and made ACID a term that didn't mean LSD. However, Rule 5 is implicitly ACID, otherwise it is meaningless. I will note that Date, in his latest edition, jumps the shark with a subsection "Dropping ACID". He's wrong, too.
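For readers who want to see the begin, commit, and rollback operations from Rule 5 in action, here is a small sketch using Python's built-in sqlite3 module with explicit transaction statements. The `account` table, the `transfer` and `balances` helpers, and the amounts are all hypothetical; this only illustrates atomicity on a single node and says nothing about how such guarantees behave across many nodes.

```python
import sqlite3

# isolation_level=None disables the module's implicit transactions,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("INSERT INTO account VALUES (2, 0)")


def transfer(src, dst, amount):
    """Move amount from src to dst; undo both updates if either one fails."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft: discard the credit too.
        conn.execute("ROLLBACK")
        return False


def balances():
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


failed = transfer(1, 2, 150)   # would overdraw account 1
after_failed = balances()
ok = transfer(1, 2, 40)
after_ok = balances()
```

The failed transfer leaves both balances untouched even though the credit to account 2 had already been applied inside the transaction.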
With regard to Date's "dropping ACID", what little I know about it (second hand) says that the proposal is to just drop the exact terminology, not the principles behind it; AFAIK, it is more of a subtle thing, arguing about terminology minutiae, and is nowhere near as dire as the title "dropping ACID" may suggest.
I'm way too lazy to type it all, and it's not on the web, so far as I know, but here's the bottom of pg. 485:
So ACID is a nice acronym - but do the concepts it represents really stand up to close examination? ... in general, no.
His objections aren't just syntactical, although they are somewhat picky, in my opinion.
>> Note that foreign key constraints, and constraints themselves, again, are not intrinsic to the relational data model; a nice thing to have, but not required.
Rule 10: Integrity independence: Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.
So, wrong.
Thank you both Heikki and Ryan, those are exactly the points I was trying to make.
"The trick, as Theo would say, is having the experience to know when such optimizations are and aren't premature."
Asking out of curiosity: Did you try to generalize such experiences and formulate them as some rules of thumb on NoSQL usage?
Finally. The point of the relational model (and industrial strength SQL databases) is to slay the scalability dragon by avoiding the byte bloat in the first place. BCNF databases are the most parsimonious storage; removing redundant data from both the model and the implementation is most of the point of the RM. The reasons for doing this, from Codd, were related to data integrity (removing update anomalies), but the freebie that comes along for the ride is, loosely speaking, a minimal cover of the datastore. With SSD and multi-core/processor machines, there is no join penalty, and thus no reason not to build with BCNF databases.
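The update anomaly mentioned above can be shown with a small sketch, again using Python's built-in sqlite3 module. The table names (`orders_flat`, `customer`, `orders`) and the data are hypothetical. In the flat table the city is repeated on every order row, so the dependency customer -> city violates BCNF; the split design stores each fact once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Flat design: the customer's city is repeated on every order row.
    CREATE TABLE orders_flat (order_id INTEGER, customer TEXT, city TEXT);
    INSERT INTO orders_flat VALUES (1, 'ann', 'Baltimore');
    INSERT INTO orders_flat VALUES (2, 'ann', 'Baltimore');
    INSERT INTO orders_flat VALUES (3, 'ann', 'Baltimore');

    -- Normalized design: the city is stored exactly once per customer.
    CREATE TABLE customer (customer TEXT NOT NULL UNIQUE, city TEXT NOT NULL);
    CREATE TABLE orders (order_id INTEGER NOT NULL UNIQUE, customer TEXT NOT NULL);
    INSERT INTO customer VALUES ('ann', 'Baltimore');
    INSERT INTO orders VALUES (1, 'ann');
    INSERT INTO orders VALUES (2, 'ann');
    INSERT INTO orders VALUES (3, 'ann');
    """
)

# A careless update touches only one of the redundant copies.
conn.execute("UPDATE orders_flat SET city = 'Portland' WHERE order_id = 1")
flat_cities = {
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT city FROM orders_flat WHERE customer = 'ann'"
    )
}

# In the normalized design there is only one copy to update.
conn.execute("UPDATE customer SET city = 'Portland' WHERE customer = 'ann'")
norm_cities = {
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT c.city FROM orders o"
        " JOIN customer c ON c.customer = o.customer"
        " WHERE o.customer = 'ann'"
    )
}

print(flat_cities, norm_cities)
```

After the updates the flat table reports two different cities for the same customer, while the normalized pair reports one. The price is the join at read time, which is the cost the comment above argues has become negligible.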
But we're not there yet, in large measure because the OO folk think XML is something really neat and new. Sigh.
"The point of the relational model... is to slay the scalability dragon by avoiding the byte bloat in the first place."
Surely that's not the point of the relational model. That might be an incidental benefit.
Jeff: >> That might be an incidental benefit.
Robert: >> but the freebie that comes along for the ride is, loosely speaking, a minimal cover of the datastore.
OK, so I got a little carried away. OTOH, one might argue that parsimony of data is a logical consequence of the RM. I know many writers (I'd need to do some searching for a linkable quote; most of what I know is from dead tree sources) have made the point.
This is the weblog of Robert Treat.