Monday, March 8. 2010
Actually, the Relational Model doesn't scale
Before all my fellow DBAs' heads explode, let me just say that I am a relational guy. I like the relational model, think it's the best tool for the job, and think every programmer (not just DBAs) should aspire to be as familiar with it as they are with AJAX, MVC, or whatever other technology pattern you think is important. I'll even take that a step further: I think the NoSQL movement is mostly a rehash of failed technologies from the last century. Object and document databases had their run in the market (some might say "they had their time"), and they were pretty thoroughly beaten by the RDBMS; that some people have reinvented that wheel doesn't change the game.
That said, I find that the recent comments from Jeff Davis on the relational model and scalability overlook some things. The state of computing tasks has changed over the past two decades, and what we know about computer engineering has also changed. Working on highly scalable systems like we do at OmniTI, you can't escape some of the inherent problems you face in these types of environments. As much as I'd like the answer to every problem to be "just use an RDBMS", Brewer's CAP theorem just isn't something you can ignore.
When most people think about the relational model, they think of it in terms of parent-child relationships between tables. Without getting too deep into the details, I think it's fair to say that primary keys and foreign keys are a very large part of any relational implementation, and that pretty much every RDBMS strives to let you add these constraints to your model; they are what help keep the data consistent. But there's the rub. The CAP theorem points out that as we strive for tighter and tighter consistency, we pull away from availability and sacrifice partition tolerance. These are two theoretical systems that run smack dab into each other in the real world. This isn't really something new; if you have ever denormalized, dropped a foreign key, or split data across multiple nodes, you've run into this before.
Now, where the CAP theorem falls on its face (imho) is that it ignores another holy trinity of software development: Cheap, Fast, and Good. The size of your problem is dictated by the resources you have available; if you can afford decent tools (and let's be clear, decent is not your web dev throwing up MySQL on an EC2 instance), it is quite likely that the stressors of the relational model will never affect you in the way most CAP folks are worried about. This is also one of the places where the NoSQL movement fails: it throws the baby out with the bath water.
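To make the foreign key point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a full RDBMS, with two separate in-memory databases standing in for two nodes. The customers/orders tables and the function names are hypothetical, chosen only for illustration. On a single node the database refuses an order that points at a nonexistent customer. Once the tables are split across nodes, no constraint can span them, so the orphaned row is accepted and catching it becomes an after-the-fact application check.

```python
import sqlite3

SCHEMA_CUSTOMERS = "CREATE TABLE customers (id INTEGER PRIMARY KEY)"


def single_node_rejects_orphan() -> bool:
    """Parent and child tables share one database, so the FK is enforced."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
        conn.execute(SCHEMA_CUSTOMERS)
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
            "customer_id INTEGER NOT NULL REFERENCES customers(id))"
        )
        try:
            # Customer 42 does not exist, so this write violates the FK.
            conn.execute("INSERT INTO orders (id, customer_id) VALUES (1, 42)")
        except sqlite3.IntegrityError:
            return True  # the database refused the orphaned order
        return False
    finally:
        conn.close()


def split_nodes_orphans() -> list:
    """Parent and child tables live on separate 'nodes'; no FK can span them."""
    customers_node = sqlite3.connect(":memory:")
    orders_node = sqlite3.connect(":memory:")
    try:
        customers_node.execute(SCHEMA_CUSTOMERS)
        orders_node.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
        )
        # Nothing stops this write: customer 42 does not exist anywhere.
        orders_node.execute("INSERT INTO orders (id, customer_id) VALUES (1, 42)")

        # Integrity is now the application's job, checked after the fact.
        # In a live system this check would race with concurrent writes on both nodes.
        known = {row[0] for row in customers_node.execute("SELECT id FROM customers")}
        return [
            order_id
            for order_id, customer_id in orders_node.execute(
                "SELECT id, customer_id FROM orders"
            )
            if customer_id not in known
        ]
    finally:
        customers_node.close()
        orders_node.close()


if __name__ == "__main__":
    print(single_node_rejects_orphan())
    print(split_nodes_orphans())
```

The cross-node check only reports damage after it has happened; the guarantee the single database gave for free is exactly the consistency you trade away when you partition.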
Giving up your data integrity before you have scalability issues is a form of premature optimization. The trick, as Theo would say, is having the experience to know when such optimizations are and aren't premature. So what's the takeaway? I like to say that you use the relational model because it is best, and you use something else because it is necessary. Most SQL implementations can scale very well, and they should be your first choice when starting a new project. But we also can't pretend that there aren't inherent problems as these systems grow larger; let's understand the trade-offs and engineer appropriately.
Tuesday, March 2. 2010
OmniTI is heading to PGEast 2010
PGEast is the premier Postgres conference held inside the U.S. each year, and this year's conference, in Philadelphia, is now less than a month away. The organization and format have evolved a little from previous years, but one thing continues: a very strong presentation lineup. We at OmniTI are very happy to be among that group of people, with four talks in this year's lineup:
Know More Waiting, A Guide To PostgreSQL 9.0, by Robert Treat (hey, that's me), will give an overview of the upcoming PostgreSQL 9.0 release. While we're still a few months away from release, we have a pretty good idea of what's coming, and this talk will help you start planning how to take advantage of the new features coming our way.
PostgreSQL, meet AMQP, by Theo Schlossnagle, looks at pg_amqp, a "contrib"-style module for Postgres that provides transaction-style message queuing from inside Postgres, using the AMQP standard.
Yet Another Replication Tool: RubyRep, by Denish Patel, will delve into one of the newer Postgres replication solutions on the block. RubyRep is designed for dead-simple installation and setup, while still delivering advanced features like data comparison, synchronization between servers, and even master-master replication options.
Database Scalability Patterns, by Robert Treat (me again), takes a look at the common patterns around scaling your database solution, and at some of the different options available to people scaling with Postgres.
But wait, there's more! While we at OmniTI are definitely excited to be participating in PGEast this year, there are a number of other good talks and speakers, including Magnus Hagander, Jeff Davis, Baron Schwartz, and many others. For full talk details, check out the conference talks page. I hope you'll journey out and say hi; it should be a pretty good time.
Monday, February 22. 2010
BWPUG March 10th, Falls Church take two.
Barring a repeat of last month's snowmageddon / snowpocalypse, we're going to take another stab at heading down to Falls Church, VA, for the March BWPUG meeting. If you haven't felt like trucking out to Columbia, please try to make this one. Depending on the response, we may be able to do this more often.
When: March 10th, 6:30PM
Where: 3150 Fairview Park Dr, Falls Church, VA
Host: Noblis, Inc. and the Noblis Innovation and Collaboration Center (NICC)
It's basically at 495 and 50 in Northern Virginia. We'll be discussing the upcoming PostgreSQL 9.0 release, including a preview of my upcoming talk at PG East. If you're planning to attend, please RSVP to Stephen Frost via sfrost at noblis.org. The host facility requires a government-issued photo ID (e.g., a driver's license or passport). Hope to see you there!
Thursday, December 24. 2009
MySQL, open source's version of "Too Big To Fail"?
When I was younger, I remember hearing the phrase "too big to fail" used to describe very large companies in the US, often financial institutions of some type. At the time I thought the phrase was an indicator of a company's size, the diversity of its business dealings, and its financial reserves. The idea was that as a company grew, its ability to withstand a hit in any one market would increase, because other areas of the business could keep it going. Last year, as the financial crisis was getting into full swing and our government was looking at bailing out companies, the phrase took on a fairly different meaning, referring more to the idea that a company had grown so big and so well integrated into the daily economy that its failure would be catastrophic to the larger financial ecosystem. Or, as I more cynically thought of it, the company had grown so big it was able to grease politicians at every level of the system, thereby ensuring its future. Too big to fail indeed.
Monday, November 9. 2009
LISA 2009 Wrap-up
While a good portion of the Postgres community was making its way to France for PGDay Europe, fellow BWPUG member Greg Smith and I were manning the home front at the 2009 Large Installation and Systems Administration (aka LISA) conference, held this year in Baltimore, MD. The two of us took to the exhibition floor to staff a booth for the PostgreSQL project, a two-day stint that gave us plenty of face time with the LISA attendees. For me it had been three years since my last LISA conference (in my other local city, Washington, D.C.), so I was curious to see how things had changed since then. Some thoughts/notes I took while working the show floor:
1) There were a lot of Postgres users at the show. A lot of happy Postgres users. Compared to three years ago, when we ran into just a few, more than half the people who stopped at the booth were already using Postgres.
2) No one asked me "so why should I use Postgres instead of MySQL?" That's not to say the topic of MySQL didn't come up, but that question is by far the #1 question I normally hear working community booths (I even got it at OSCON this summer), so not getting it from anyone was quite a surprise in retrospect. I think this is probably due to two factors: first, Postgres advocacy has been working hard to make the case for Postgres and clarify the differences between the two projects; and second, we've gotten a lot of converts over the past three years, so there's much more knowledge about Postgres these days. A couple of people showed there is still work to do: some glossed over differences between the projects, and one person even thought Postgres was the commercial version of MySQL. So the job of Postgres advocacy goes on.
3) So where did they come from? Many of the people who told us they were happy Postgres users also mentioned previous database systems they had worked on. These aren't formal numbers, but I'd say the breakdown was close to 55% MySQL, 35% Oracle, 5% Sybase, and 5% MSSQL. Again, rough numbers, but that seems about right. As the LISA crowd is heavy on system administrators, the complaints were mostly that MySQL was a pain to keep running (regular corruption issues and similar problems), and that Oracle's cost just couldn't be justified.
4) One person I spoke with told me about a problem they had setting up authentication. They run a university where they initially set up authentication for students via LDAP, which they thought was pretty nice. They then ran into a problem because students had to write scripts for classes, which required them to hard-code their LDAP passwords into the scripts, which were easily read by other students. They ended up solving the problem by configuring the Apache server to run files as the script owner rather than the more standard "nobody" user, which allowed them to prevent others from reading individual scripts. This isn't the first person I've run into with this type of problem; I'd love to see more people blogging on topics like this.
5) Several people asked about the business model behind Postgres. Many people get stuck on the idea that every open source project has a single corporate backer/owner. I've been a big proponent of highlighting both the strength of the Postgres community and the nature of being a true Open Source project, so for me these are great questions to get to talk about, but it's something we should make sure other folks volunteering for booth duty are prepared to answer.
Finally, I want to say a big thanks to the folks running LISA and to the crowd at large. At a conference thin on DBAs, we still managed to get a number of donations, which will help with further advocacy efforts. I guess system admins are into solid database software too.
Tuesday, October 27. 2009
Amazon Offers New RDS (aka MySQL) Service and New Database Related Virtual Machines
Amazon Web Services has announced a new service it is touting as Amazon Relational Database Services, designed to handle the operational management side of running a relational database. To be specific, the service is built around MySQL, and as the announcement reads:
"Amazon RDS provides a fully featured MySQL database, so the code, applications, and tools that you use today with your existing MySQL databases work in Amazon RDS without modification. The service automatically handles common database administration tasks, such as setup and provisioning, patch management, and backup."
It is certainly an interesting offering for folks running MySQL, especially if you are already managing your own MySQL instances in Amazon's cloud infrastructure. I didn't see anything that listed the storage engines available with the offering, which would be the first blocker for moving to such a service (I'm guessing it will offer both InnoDB and MyISAM, but it doesn't say).
I also have some questions about how its backup system works. It mentions several times that backups can be done "automatically", and that you can use file system snapshots to restore your database to "any point in time" once deployed on their service. I'm a little skeptical about that, as filesystem snapshots don't necessarily just work (tm) when it comes to database backups, and MySQL backups are easy enough to get wrong in general; but it's certainly testable, and it would be a nice approach to solving the problem if it works.
The other thing worth noting is that the service doesn't offer replicated slaves, yet. From the Amazon RDS site, one of the new services they plan to offer "soon":
"High Availability Offering: For developers and business who want additional resilience beyond the automated backups provided by Amazon RDS at no additional charge. With the high availability offer, developers and business can easily and cost-effectively provision synchronously replicated DB Instances in multiple availability zones (AZ’s), to protect against failure within a single location."
Well, that doesn't sound like MySQL replicated slaves anyway, so running multiple services might still be a manual exercise.
It's actually an important detail in my book. While Amazon is talking up the ability to scale up the new RDS service, MySQL is probably the worst of the 5 major databases (Oracle, DB2, MS SQL, MySQL, Postgres) for scaling up a database instance, being designed far better for scaling out; so any tools to help with scaling up are key factors in the new service for me.
Speaking of scaling up, tucked away in the overall Amazon RDS announcement is the announcement of new, larger EC2 instance types designed with running databases in mind:
* Double Extra Large: 34.2 GB memory, 13 ECU (4 virtual cores with 3.25 ECU each), 850 GB storage, 64-bit platform
Perhaps I'm just biased by the number of large-scale instances we work with, but 32GB seems about the baseline I'd want to start out with for my database servers, so these new instances look promising. These EC2 instances aren't tied to the RDS service; you can run Oracle, Postgres, or whatever on them. I'd still like to see this scale up more (if folks running Postgres could go from the current "large" instance up to a 32-core, 256/512GB machine without having to get new hardware... the software could handle that and there would be no additional licensing... well, that would be pretty compelling).
Anyway, Amazon has made a pretty big move into the database space with these announcements. I'm actually kind of curious what impact this might have on Microsoft's Azure service. In any case, I'd encourage you to check out the new Amazon RDS site and the new EC2 instance information (they've lowered some prices, btw).
Thursday, August 20. 2009
Denish looks at RubyRep
The OmniTI database team is known not only for managing large, high-volume systems, but also for doing so in heterogeneous database environments. Accomplishing that is not always easy and often requires custom solutions, so we try to keep our ears to the ground and investigate new tools as they come along. When RubyRep was first announced, I put it on the back burner: a multi-master replication system that can work on both MySQL and Postgres? Sounds pretty pie in the sky to me. Luckily Denish (one of the DBAs on our team) isn't as cynical as I am, so he took it for a test drive. I have to admit that so far it looks good, enough so that I prodded him to write down some notes on it, which we've now put online. If you're into database replication solutions at all, I'd encourage you to take a look at RubyRep as well.
This is the weblog of Robert Treat. I lead the Database Operations Group at OmniTI, where we work on some of today's largest database challenges.
Hire me! Need help with your database? We are available for large-scale or short-term engagements.
Hire you! If you have experience with Postgres, MySQL, or Oracle, we are looking for people to join our team.
Upcoming Events
PG East 2010: March 25th - 28th, Philadelphia, Pennsylvania
PGCon 2010: May 18th - 21st, Ottawa, Canada
