Friday, July 23. 2010
Howdy folks,
slides are up for my talk, "Database Scalability Patterns", which I gave this week at OSCon 2010. You can get them from the OSCon page, from slideshare, or just view them below.
Saturday, April 10. 2010
One of the things I preach about a lot is good monitoring of your database servers; having tools in place to tell you both what good looks like and when things go bad is critical for large scale success. But sometimes you just need to monitor a momentary process, where setting up a check in your normal monitoring software is overkill. In these cases one tool that can help out is the watch command.
Case in point, the other day I needed to back up a fairly large partitioned table (about 1.3TB on disk). The plan? A quick little script to pg_dump each of the partitions (about 325). Feed the script through xargs -P so I don't swamp the box, but I get some concurrency out of things. And of course, I planned to run the whole thing in a screen session. But dumping this much data will take some time, so how to check on the progress?
When working on databases, one of the most natural things to me is to whip up some SQL to see what's going on inside my database. Then you pipe that through watch, and you have some quick and simple monitoring. This example happens to be on postgres, but you could do it with any database's command line program.
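The approach described above can be sketched roughly as follows. This is not the script from the full post, just a minimal illustration under some loud assumptions: the database name `mydb`, the `events_` partition prefix, the concurrency of 4, and the 5 second refresh are all hypothetical, and the `pid`, `query`, and `query_start` columns of `pg_stat_activity` assume PostgreSQL 9.2 or later (older releases name them differently).

```shell
#!/bin/sh
# Sketch only: "mydb" and the "events_" prefix are hypothetical names.
DB=mydb

# SQL showing which COPY commands (issued by pg_dump) are currently running.
PROGRESS_SQL="SELECT pid, now() - query_start AS runtime, query FROM pg_stat_activity WHERE query LIKE 'COPY %' ORDER BY query_start"

# Print the partition names, one per line (-A unaligned, -t tuples only).
# Note that "_" in LIKE matches any single character; good enough for a sketch.
list_partitions() {
    psql -At -d "$DB" -c \
        "SELECT tablename FROM pg_tables WHERE tablename LIKE 'events_%' ORDER BY 1"
}

# Dump every partition to its own file, running at most 4 dumps at once.
dump_partitions() {
    list_partitions | xargs -P 4 -I{} pg_dump -t {} -f {}.sql "$DB"
}

# Print the psql command line that watch will re-run.
progress_cmd() {
    printf '%s\n' "psql -d $DB -c \"$PROGRESS_SQL\""
}

# Refresh the progress query every 5 seconds until interrupted.
watch_progress() {
    watch -n 5 "$(progress_cmd)"
}
```

You would run `dump_partitions` inside the screen session and `watch_progress` from a second window; since watch just re-runs a command line, the same idea works with any database's command line client.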
Continue reading "watch for momentary monitoring"
Sunday, March 14. 2010
During the MySQL conference Call for Papers there was some talk of getting one or two Postgres sessions into the mix, as a lot of MySQL users seem to have questions about Postgres these days. Alas, looking through the MySQLcon schedule I don't see any on there. I've also looked through the BOFs and there is nothing about Postgres to be found there either. So, maybe no one is interested in Postgres after all.
However, I held a Postgres BOF at MySQLcon last year and we got a handful of people, and since I am going to be at MySQLcon again this year, I might as well host one again. I think it's too late to schedule one formally, but I can put some info on the schedule sheets once I'm at the conference; if you are interested in learning some more about Postgres, please keep an eye out.
Monday, March 8. 2010
Before all my fellow DBAs' heads explode, let me just say that I am a relational guy. I like the relational model, think it's the best tool for the job, and think every programmer (not just DBAs) should aspire to be as familiar with it as they are with AJAX, MVC, or whatever other technology pattern you think is important. I'll even take that a step further; I think the NoSQL movement is mostly a re-hash of failed technologies from the last century. Object and document databases had their run in the market (some might say "they had their time"), and they were pretty thoroughly beaten by the RDBMS; that some people have reinvented that wheel doesn't change the game.
That said, I find the recent comments from Jeff Davis on the relational model and scalability to be overlooking some things. The state of computing tasks has changed over the past two decades, and what we know about computer engineering has also changed. Working on highly scalable systems like we do at OmniTI, you can't escape some of the inherent problems that you face when working in these types of environments. As much as I'd like the answer to every problem to be "just use an RDBMS", Brewer's CAP theorem just isn't something you can ignore.
When most people think about the relational model, they think of it in terms of parent-child relationships between tables. Without getting too deep in the details of it, I think it's pretty fair to say that Primary Keys and Foreign Keys are a very large part of any relational implementation, and that pretty much all RDBMS strive to allow you to add these constraints to your model; it's what helps keep the data consistent. But there's the rub. CAP theorem points out that as we strive for tighter and tighter consistency, we are pulling away from availability, and sacrificing partition tolerance. Two theoretical systems that run smack dab into each other in the real world. This isn't really something new; if you have ever de-normalized, dropped a foreign key, or split data across multiple nodes, you've run into this before.
Now, where CAP theorem falls on its face (imho) is that it also ignores another holy trinity of software development: Cheap, Fast, and Good. The size of your problem is dictated by the resources you have available; if you can afford decent tools (and let's be clear, decent is not your web dev throwing up MySQL on an EC2 instance) it is quite likely that the stressors of the relational model will never impact you in a way that most CAP folks are worried about. This is also one of the places the NoSQL movement fails: by throwing the baby out with the bath water. Giving up your data integrity before you have scalability issues is a form of premature optimization. The trick, as Theo would say, is having the experience to know when such optimizations are and aren't premature.
So what's the takeaway? I like to say that you use the relational model because it is best, and you use something else because it is necessary. Most SQL implementations can scale very well, and they should be your first choice when starting a new project. But we also can't pretend that there aren't inherent problems as these systems grow larger; let's understand the trade-offs and engineer appropriately.
Thursday, December 24. 2009
When I was younger, I remember hearing the phrase "too big to fail" being used to describe very large companies in the US, often financial institutions of some type. At the time I had thought the meaning of this phrase was an indicator of the size of a company, the diversity of its business dealings, and its financial reserves. The idea was that, as the size of the company grew, its ability to withstand a hit in any one market would increase, because other areas of the business could keep it going. Last year as the financial crisis was getting into full swing and our government was looking at bailing out companies, this phrase took on a fairly different meaning, referring more to the idea that a company had grown so big and so well integrated into the daily economy that its failure would be catastrophic to the larger financial ecosystem. Or as I more cynically thought of it, the company had grown so big it was able to grease politicians at every level of the system, thereby ensuring its future. Too big to fail indeed.
Continue reading "MySQL, open source's version of "Too Big To Fail" ? "
Tuesday, October 27. 2009
Amazon Web Services has announced a new service it is touting as Amazon Relational Database Service (RDS), designed to handle the operational management side of running a relational database. To be specific, the service is built around MySQL, and as the announcement reads:
"Amazon RDS provides a fully featured MySQL database, so the code, applications, and tools that you use today with your existing MySQL databases work in Amazon RDS without modification. The service automatically handles common database administration tasks, such as setup and provisioning, patch management, and backup."
It is certainly an interesting offering for folks running MySQL, especially if you are managing your own MySQL instances in Amazon's cloud infrastructure already. I didn't see anywhere that it listed the storage engines that would be available with the offering, which would be the first blocker for moving to such a service (I'm guessing that it will offer both InnoDB and MyISAM, but it doesn't say).
There are also some questions I have about how its back-up system works. It mentioned several times that backups can be done "automatically", and that you can use file system snapshots to restore your database to "any point in time" once deployed on their service. I'm a little skeptical about that, as filesystem snapshots don't necessarily just work (tm) when it comes to database backups, and MySQL backups are easy enough to get wrong in general, but it's certainly testable and would be a nice approach to solving the problem if it works.
The other thing worth noting is that the service doesn't offer replicated slaves, yet. From the Amazon RDS site, one of the new services they plan to offer "soon":
"High Availability Offering — For developers and business who want additional resilience beyond the automated backups provided by Amazon RDS at no additional charge. With the high availability offer, developers and business can easily and cost-effectively provision synchronously replicated DB Instances in multiple availability zones (AZ’s), to protect against failure within a single location."
Well, that doesn't sound like MySQL replicated slaves anyway, so running multiple services might still be a manual exercise. It's actually an important detail in my book; while Amazon is talking up the ability to scale up the new RDS service, MySQL is probably the worst of the 5 major databases (Oracle, DB2, MS SQL, MySQL, Postgres) for scaling up a database instance, being designed far better for scaling out, so any tools to help with this operation are key factors in the new service for me.
Speaking of scaling up, tucked away in the overall Amazon RDS announcement is also the announcement of new higher class EC2 instances, designed with running databases in mind.
* Double Extra Large: 34.2 GB memory, 13 ECU (4 virtual cores with 3.25 ECU each), 850 GB storage, 64-bit platform
* Quadruple Extra Large: 68.4 GB memory, 26 ECU (8 virtual cores with 3.25 ECU each), 1690 GB storage, 64-bit platform
Perhaps I'm just biased by the number of large scale instances we work with, but 32GB seems about the baseline of where I'd want to start out for my database servers, so these new instances look promising. These EC2 instances aren't tied to the RDS service; you can run Oracle, Postgres, or whatever on them. I'd still like to see this scale up more (if folks running Postgres could go from the current "large" instance up to a 32 core, 256/512GB machine without having to get new hardware... the software could handle that and there would be no additional licensing... well that would be pretty compelling).
Amazon has made a pretty big move into the database space with these announcements. I'm kind of curious what impact this might have on Microsoft's Azure service actually. Anyway, I'd encourage you to check out the new Amazon RDS site, and the new EC2 instance information (they've lowered some prices btw).
Thursday, August 20. 2009
The OmniTI database team is noted not only for managing large and high volume systems, but also for doing this in heterogeneous database environments. Accomplishing that is not always easy, often requiring custom solutions, so we try to keep our ears to the ground and investigate new tools as they come along. When RubyRep was first announced, I put it on the back-burner: a multi-master replication system that can work on both mysql and postgres? Sounds pretty pie in the sky to me. Luckily Denish (one of the DBAs on our team) isn't as cynical as me, so he took it for a test drive. I have to admit that so far it looks good, enough so that I prodded him to write down some notes on it, which we've now put up online. If you're into database replication solutions at all, I'd encourage you to take a look at RubyRep as well.