Monday, March 8. 2010
Actually, the Relational Model doesn't scale
Before all my fellow DBAs' heads explode, let me just say that I am a relational guy. I like the relational model, think it's the best tool for the job, and think every programmer (not just DBAs) should aspire to be as familiar with it as they are with AJAX, MVC, or whatever other technology pattern you think is important. I'll even take that a step further: I think the NoSQL movement is mostly a rehash of failed technologies from the last century. Object and document databases had their run in the market (some might say "they had their time"), and they were pretty thoroughly beaten by the RDBMS; the fact that some people have reinvented that wheel doesn't change the game.
That said, I think the recent comments from Jeff Davis on the relational model and scalability overlook some things. The nature of computing tasks has changed over the past two decades, and so has what we know about computer engineering. Working on highly scalable systems like we do at OmniTI, you can't escape some of the problems inherent to these types of environments. As much as I'd like the answer to every problem to be "just use an RDBMS", Brewer's CAP theorem just isn't something you can ignore.
When most people think about the relational model, they think of it in terms of parent-child relationships between tables. Without getting too deep into the details, I think it's fair to say that primary keys and foreign keys are a very large part of any relational implementation, and that pretty much every RDBMS strives to let you add these constraints to your model; they are what help keep the data consistent. But there's the rub. The CAP theorem points out that as we strive for tighter and tighter consistency, we pull away from availability or sacrifice partition tolerance. Here are two theoretical systems that run smack dab into each other in the real world. This isn't really anything new; if you have ever de-normalized, dropped a foreign key, or split data across multiple nodes, you've run into it before.
Now, where the CAP theorem falls on its face (IMHO) is that it also ignores another holy trinity of software development: cheap, fast, and good. The size of your problem is dictated by the resources you have available; if you can afford decent tools (and let's be clear, decent is not your web dev throwing up MySQL on an EC2 instance), it is quite likely that the stresses on the relational model will never affect you in the ways most CAP folks worry about. This is also one of the places where the NoSQL movement fails: it throws the baby out with the bath water.
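To make the foreign key point concrete, here is a toy sketch using Python's built-in sqlite3 module purely as a stand-in for an RDBMS. The `customers`/`orders` tables are hypothetical, and the two "nodes" are just two separate in-memory databases, an assumption meant to mimic splitting parent and child data across machines. On one node the database rejects an orphaned child row; once the tables live apart, the constraint can no longer be declared and the orphan is silently accepted, leaving the consistency check to the application.

```python
import sqlite3


def single_node_rejects_orphan():
    """Parent and child tables share one database, so the FK is enforced."""
    db = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless this pragma is set.
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(id))"
    )
    db.execute("INSERT INTO customers (id) VALUES (1)")
    db.execute("INSERT INTO orders (id, customer_id) VALUES (1, 1)")  # valid parent
    try:
        # Customer 99 does not exist, so the database refuses this row.
        db.execute("INSERT INTO orders (id, customer_id) VALUES (2, 99)")
    except sqlite3.IntegrityError:
        return True
    finally:
        db.close()
    return False


def split_nodes_orphans():
    """Parent and child on separate hypothetical 'nodes': no FK can span them."""
    node_a = sqlite3.connect(":memory:")  # hypothetical node holding customers
    node_b = sqlite3.connect(":memory:")  # hypothetical node holding orders
    node_a.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    # No REFERENCES clause: node_b cannot see node_a's customers table.
    node_b.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
    )
    node_a.execute("INSERT INTO customers (id) VALUES (1)")
    node_b.execute("INSERT INTO orders (id, customer_id) VALUES (1, 1)")
    node_b.execute("INSERT INTO orders (id, customer_id) VALUES (2, 99)")  # accepted

    # The integrity check now has to happen in application code.
    known = {row[0] for row in node_a.execute("SELECT id FROM customers")}
    orphans = [
        order_id
        for order_id, customer_id in node_b.execute(
            "SELECT id, customer_id FROM orders ORDER BY id"
        )
        if customer_id not in known
    ]
    node_a.close()
    node_b.close()
    return orphans


if __name__ == "__main__":
    print("single node rejects orphan:", single_node_rejects_orphan())  # True
    print("orphaned orders on split nodes:", split_nodes_orphans())  # [2]
```

The after-the-fact sweep in `split_nodes_orphans` is only one way to recover some integrity once the data is partitioned; the point of the sketch is that the guarantee moves out of the database and becomes your problem.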
Giving up your data integrity before you have scalability issues is a form of premature optimization. The trick, as Theo would say, is having the experience to know when such optimizations are and aren't premature. So what's the takeaway? I like to say that you use the relational model because it is best, and you use something else because it is necessary. Most SQL implementations can scale very well, and they should be your first choice when starting a new project. But we also can't pretend that there aren't inherent problems as these systems grow larger; let's understand the trade-offs and engineer appropriately.
Thursday, December 24. 2009
MySQL, open source's version of "Too Big To Fail" ?
When I was younger, I remember hearing the phrase "too big to fail" being used to describe very large companies in the US, often financial institutions of some type. At the time I thought the phrase was an indicator of the size of a company, the diversity of its business dealings, and its financial reserves. The idea was that, as a company grew, its ability to withstand a hit in any one market would increase, because other areas of the business could keep it going. Last year, as the financial crisis was getting into full swing and our government was looking at bailing out companies, the phrase took on a fairly different meaning, referring more to the idea that a company had grown so big and so well integrated into the daily economy that its failure would be catastrophic to the larger financial ecosystem. Or, as I more cynically thought of it, the company had grown so big it was able to grease politicians at every level of the system, thereby ensuring its future. Too big to fail indeed.
Continue reading "MySQL, open source's version of "Too Big To Fail" ?"
Tuesday, October 27. 2009
Amazon Offers New RDS (aka MySQL) Service and New Database Related Virtual Machines
Amazon Web Services has announced a new service it is calling Amazon Relational Database Service, designed to handle the operational management side of running a relational database. To be specific, the service is built around MySQL, and, as the announcement reads:
"Amazon RDS provides a fully featured MySQL database, so the code, applications, and tools that you use today with your existing MySQL databases work in Amazon RDS without modification. The service automatically handles common database administration tasks, such as setup and provisioning, patch management, and backup."
It is certainly an interesting offering for folks running MySQL, especially if you are already managing your own MySQL instances in Amazon's cloud infrastructure. I didn't see the available storage engines listed anywhere, which would be the first blocker for moving to such a service (I'm guessing it will offer both InnoDB and MyISAM, but it doesn't say). I also have some questions about how its backup system works. The announcement mentions several times that backups can be done "automatically", and that you can use file system snapshots to restore your database to "any point in time" once deployed on the service. I'm a little skeptical about that, as filesystem snapshots don't necessarily Just Work(TM) when it comes to database backups, and MySQL backups are easy enough to get wrong in general, but it's certainly testable and would be a nice approach to solving the problem if it works.
The other thing worth noting is that the service doesn't offer replicated slaves, yet. From the Amazon RDS site, one of the new services they plan to offer "soon":
"High Availability Offering — For developers and business who want additional resilience beyond the automated backups provided by Amazon RDS at no additional charge. With the high availability offer, developers and business can easily and cost-effectively provision synchronously replicated DB Instances in multiple availability zones (AZ’s), to protect against failure within a single location."
Well, that doesn't sound like MySQL replicated slaves anyway, so running multiple servers might still be a manual exercise.
It's actually an important detail in my book. While Amazon is talking up the ability to scale up the new RDS service, MySQL is probably the worst of the five major databases (Oracle, DB2, MS SQL, MySQL, Postgres) for scaling up a database instance, being designed far better for scaling out; so any tools to help with this operation are key factors in the new service for me.
Speaking of scaling up, tucked away in the overall Amazon RDS announcement is the announcement of new higher-class EC2 instances, designed with running databases in mind.
* Double Extra Large: 34.2 GB memory, 13 ECU (4 virtual cores with 3.25 ECU each), 850 GB storage, 64-bit platform
Perhaps I'm just biased by the number of large scale instances we work with, but 32GB seems about the baseline of where I'd want to start for my database servers, so these new instances look promising. These EC2 instances aren't tied to the RDS service; you can run Oracle, Postgres, or whatever on them. I'd still like to see this scale up more (if folks running Postgres could go from the current "large" instance up to a 32 core, 256/512GB machine without having to get new hardware... the software could handle that and there would be no additional licensing... well, that would be pretty compelling). Overall, Amazon has made a pretty big move into the database space with these announcements. I'm actually kind of curious what impact this might have on Microsoft's Azure service. Anyway, I'd encourage you to check out the new Amazon RDS site and the new EC2 instance information (they've lowered some prices, btw).
Thursday, August 20. 2009
Denish looks at RubyRep
The OmniTI database team is known not only for managing large, high-volume systems, but also for doing so in heterogeneous database environments. Accomplishing that is not always easy, and often requires custom solutions, so we try to keep our ears to the ground and investigate new tools as they come along. When RubyRep was first announced, I put it on the back burner: a multi-master replication system that can work on both MySQL and Postgres? Sounds pretty pie in the sky to me. Luckily Denish (one of the DBAs on our team) isn't as cynical as I am, so he took it for a test drive. I have to admit that so far it looks good, enough so that I prodded him to write up some notes on it, which are now online. If you're into database replication solutions at all, I'd encourage you to take a look at RubyRep as well.
Wednesday, July 22. 2009
First Issue of Open Source Database Magazine Is Out
Happy to see the first issue of the new Open Source Database Magazine has been released.
For those who aren't aware, Open Source Database Magazine is a reincarnation of the old MySQL Magazine. The open source database ecosystem has grown a lot over the last year, with the rising popularity of newish systems like Drizzle and MariaDB, the continued growth of the PostgreSQL community, the revival of old concepts like CouchDB, and really groundbreaking stuff like HadoopDB. So, check out the website, check out my article, and let the OSDBzine folks know which new open source database topics you'd like to see more about.
Thursday, June 11. 2009
The Asynchronous Services Analogy
Today I had a chance to sit through a sneak preview of Theo Schlossnagle's new talk, "Scalable Internet Architectures", to be delivered next week at Velocity 2009 (dev sessions are an underrated side benefit of working at OmniTI). As always, Theo packs a lot of good information into his talks (I could probably write blog entries on half a dozen ideas I jotted down), but I wanted to highlight something he mentioned about scaling websites via asynchronous services.
Continue reading "The Asynchronous Services Analogy"
Friday, June 5. 2009
The First Rule of Postgres Club
I had to chuckle when reading the comments on Alan Snelson's blog entry about PostgreSQL's project stability in the face of recent MySQL unsteadiness. Of course, it shouldn't be a surprise that if you mention Postgres around MySQL people, you might get some type of negative reaction. And vice versa, of course. And with other db communities too, not just these two. But I digress. What I really liked about the comments was this quote:
"PostgreSQL vs. MySQL is a non-issue for most people because the decision is already made. I don’t know what’s the point in bringing that up every-time, it shines a bad light on the postgres community."
The beauty here is that I found this blog post via Planet MySQL, not Planet Postgres. Further, I'd never heard of Alan Snelson, the post's author; he hasn't spoken at a Postgres event, and isn't syndicated on Planet Postgres. As far as I know, he's just some guy who uses MySQL a lot and is now looking at Postgres. Apparently he likes it and wants to spread the word. And of course people who don't like hearing about Postgres use this as a mark against the Postgres community, even when the Postgres project has nothing to do with it. You know software is no good when people who use it want to tell you how awesome it is!
In truth, that's how a lot of Postgres advocacy works these days. When you see people posting on Slashdot, most of the time it's not people working directly with the project. Those of us who have been using Postgres for a long time have long since gotten the message that MySQL users don't want to hear about Postgres. When I sat in the MariaDB feature request round table at the Percona Performance conference, listening to people ask for feature after feature that Postgres already has (partial indexes, index fillfactor, online DDL, extensible data types, DTrace probes, PITR; the list goes on), I didn't wave the Postgres banner. The only time I spoke about Postgres was when Monty had questions about Postgres' implementation. Some people asked me why I wasn't more vocal, and why I didn't tell everyone about all the wonderful things that Postgres can do. What can I say... it's the first rule of Postgres club.
This is the weblog of Robert Treat. I lead the Database Operations Group at OmniTI, where we work on some of today's largest database challenges.
Hire me! Need help with your database? We are available for large scale or short term engagements.
Hire you! If you have experience with Postgres, MySQL, or Oracle, we are looking for people to join our team.
Upcoming Events
PG East 2010: March 25th - 28th, Philadelphia, Pennsylvania
PGCon 2010: May 18th - 21st, Ottawa, Canada