Friday, August 26. 2011
On Clouds And Data
I'm sitting in SFO tonight, awaiting my return trip back to Hurricane Pending Maryland. (As a former Floridian, I must of course scoff at any notions that this hurricane is significant). Walking through the airport I noticed a large billboard about "Big Data and the Cloud". This is the kind of billboard you only see in Silicon Valley; I don't see signs like that in Portland or Ottawa, and certainly not when I had to change flights in Detroit this year.
Anyway, these two buzzwords aren't a local phenomenon; they are taking the tech world by storm. Big Data has become serious enough that there are now multiple conferences for folks interested in the topic. And cloud, well, is perhaps harder to define, but more businesses are moving to the cloud every day. The problem is that most of the traditional ideas about big data run entirely counter to the ideas that work well in the cloud. Last spring I moderated a panel at PGEast in New York that focused on Postgres in the cloud. As someone who works on multi-terabyte systems, and who deals with cloud servers on at least a semi-regular basis, I tried to prod and poke my panelists into sharing how they see Postgres's role in the cloud. Not too surprisingly, the idea of "Big Data" on Postgres in the cloud was not a particularly popular one. The tools you need to do the job effectively with Postgres just aren't there. That's not to say you can't try, but so far I haven't seen many wild successes.
Next month at Surge, though, I'm going to be involved in another panel, this one focusing on "Pushing Big Data To The Cloud". This time I'm turning over moderating duties to Baron Schwartz, a long-time thought leader in the MySQL community. Joining me on the panel are several folks who all have a stake in the idea of Big Data in the cloud: John Hugg and Philip Wickline from VoltDB and Hadapt, respectively, two new database vendors built with scale-out in mind; Bryan Cantrill, VP of Engineering at Joyent, a cloud provider with their own strong opinions on dealing with data in the cloud; and Kate Matsudaira, who is currently managing those multi-TB databases, all in the cloud, over at SEOMoz. This should be a really good mix of people using different technologies, with different biases about the problems involved. If you're looking to work on Big Data in the cloud, I hope you'll join us; it should be a lot of fun.
Monday, August 15. 2011
Paying Attention Pays Off
I often run my ops like I take care of data: a bit overzealously. Case in point: when setting up a new database, I like to add a metric for database size, which feeds both a graph for trending and an alert on database size. Everyone is always on board with trending database size in a graph, but the alert is one people tend to question. This is not entirely without justification.
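As a minimal sketch of the projection step this size alert leads into, assuming daily size samples stored as (day number, size in GB) pairs: fit a least-squares line over the sample window, extrapolate it forward, and round up to get a candidate threshold. The function names are hypothetical, and a straight line is only one way to project growth; a 30-day sample projected 90 days ahead matches the month-long sample projected out three months that this post describes.

```python
import math
from typing import List, Tuple

# Hypothetical input format: (day_number, size_in_gb) from the size metric.
Sample = Tuple[float, float]


def fit_line(samples: List[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit of size against day; returns (slope, intercept)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(samples: List[Sample], days_ahead: float) -> float:
    """Projected size (GB) `days_ahead` days after the last sample."""
    slope, intercept = fit_line(samples)
    last_day = max(x for x, _ in samples)
    return slope * (last_day + days_ahead) + intercept


def suggest_threshold(samples: List[Sample], days_ahead: float = 90) -> int:
    """Round the projection up to a whole GB to use as the next alert level."""
    return math.ceil(project(samples, days_ahead))
```

With made-up samples growing a steady 0.05 GB/day from 2GB over days 0 through 29, the fit projects about 7.95GB ninety days past the last sample, so `suggest_threshold` would propose 8GB. If the alert fires well before that date, that is the cue to go look at the data patterns rather than just raising the number.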
On a new database, with no data or activity, deciding when to alert is pretty fuzzy. When we set up a new client within our managed hosting service, I usually just pick an arbitrary number, like 2GB. The idea isn't that a 2GB database is a problem; it's that when we cross 2GB, we should take a look at the trending graph and do a projection. Depending on how things look, we'll bump the alert threshold up to a new level, based on when we think we might want to look at things again. For example, in this graph we take a month-long sample and then project it out for three months. We can then set a new threshold somewhere along that line.
While this is good for capacity planning, there's more to be gained from this process. The act of alerting forces us to pay attention, and if we get notices sooner than we expected, we go back and re-evaluate the data patterns. Of course, sometimes people will question this. Getting a notice that your database has passed 4GB can seem pointless when you have 100+ GB of free space on your disks. And besides, isn't that what free space monitors are for?
Here is a graph of another client's database growth. Their data size is not particularly large (don't confuse scalability with size; it doesn't take a large database to have scalability issues), but what's important is that we kept getting notices that the size was growing, and when we talked with the developers, no one thought it should be growing at nearly this rate. Eventually we tracked the problem down to a purging job that had gone awry. Once that was fixed, the growth pattern leveled off completely (and the database size returned to the tiny amount that was expected!)
Monday, August 8. 2011
Maybe they just like it better?
There has been a lot of chatter this past week about Apple replacing MySQL with Postgres in the new OS X Lion Server [U.S. | England | New Zealand]. Most of it ties things back to Oracle's new stewardship of the MySQL project, much of that stemming from what I would call FUD from the EnterpriseDB folks, predicting doom and gloom about the way Oracle might handle the project in the future. Not that the FUD is entirely unwarranted; while Oracle has done a pretty decent job with MySQL so far, looking at what Oracle has done to projects like OpenSolaris would certainly make one queasy. And yes, we've seen an uptick in people asking for help with Oracle/MySQL to Postgres migrations since the acquisition of Sun. That said, I have an alternative theory. Maybe they just like it better?
Wednesday, March 16. 2011
Upserting via Writeable CTE
Earlier today my colleague Depesz posted a nice write-up showing one of the use cases for the new 9.1 feature, "Writable CTEs". It certainly shows one handy thing this feature will enable, but it's not the only one. Here's a quick bit of SQL I have been playing with for some time that re-implements the infamous "UPSERT" command (a long-time sticking point for people trying to make MySQL apps more cross-database compatible) in Postgres.
pagila=# select * from actor where first_name = 'AMBER' and last_name = 'LEE';
Now, to be fair, this bit of SQL does have a race condition (think two people trying to insert the same actor at the same time), so it doesn't really solve all of your problems, but if you are looking for a quick hack, it might just do the trick. Also, don't be afraid to play with it; this was about two minutes of thought plus making sure the syntax worked, so you could certainly try turning it around or coming up with other variants. That's actually one of the coolest things about this feature: waiting to see what use cases people come up with for it.
Monday, August 16. 2010
Now What? (wrt OpenSolaris and your database)
Last week's "announcement" of the death of OpenSolaris has steered a lot of questions my way about where people should go, and/or where OmniTI will go, now that OpenSolaris's future looks non-existent. As one of the more open users of Solaris-related technology, running some beefy loads on top of it, it makes sense that people would be curious about what we might do next. I should start by saying that, as a company, we don't have an official policy on this yet, and probably won't; we evaluate each situation on a customer-by-customer basis. So what follows is more my personal feeling about what people should do at this point in time.
The one thing I have noticed from the people I have already spoken with is that there seem to be two major camps. It's an oversimplification to be sure, but I break this down into the free software camp (those for whom remaining on, and supporting, open source and free software is a primary driver of technology decisions), and those more interested in the technology than the ideals behind it. Depending on where you fall on that spectrum, you have different options available to you, and will likely reach very different conclusions.
Too Soon?
The first thing I have said to everyone is that it is honestly too soon to make any moves. Oracle is notorious for poor communication, and at this point I don't think we've seen enough official communication to really know what's going to happen. This doesn't mean you can't start planning, though! We've been looking at some of the available options since before the Oracle/Sun merger closed, so it doesn't hurt to start evaluating what's out there. However, there's no need to rush into things; the announcement of OpenSolaris's death might be premature. I personally don't believe Solaris can survive on the model we've just seen laid out; there are too many people learning the GNU tool chain who won't be willing to invest big money in a tool that is hard for them to use. Solaris needs a low-cost or free option for people to familiarize themselves with (all the better if it installs the GNU tools by default). There's an outside chance Oracle might come to this conclusion, which would give new life to OpenSolaris. A more likely alternative is that some other group might pick up OpenSolaris maintenance and start pushing it forward.
Certainly not an easy task, but there are already several different distributions of OpenSolaris available, so the userland-level management has the resources; we would mostly need to figure out how to handle the core technologies that have been maintained by Sun. I think this might also be possible: there are numerous companies already heavily invested in OpenSolaris technology, and there are Solaris internals hackers looking to move out of Oracle, so it's not an impossible leap to think we might see something worked out. And if Oracle continues to make technology available via the CDDL (which most current signs seem to indicate), this could work out. It might not resemble OpenSolaris as it is now, but it could definitely be an option for current users who'd like to remain on the OpenSolaris platform.
Other Options?
Of course, you might not want to put all your eggs in that basket. So what other options do we have? Well, that mostly depends on what you're getting out of OpenSolaris now, and what you want out of your OS going forward. For many people, I suspect Solaris 11 Express might be a suitable replacement, especially for those running mixed OpenSolaris / Solaris environments. Migrating up to full Solaris 11 will also cover most of your technology needs, so depending on pricing, people may find that a cheaper alternative to migrating to a new platform. Of course, if you want to stick with a free software solution, this won't really be an option.
FreeBSD seems to be the most obvious alternative platform. If you're currently taking advantage of dtrace, zfs, and zones, FreeBSD gives you options to cover all three. It won't be the same; the dtrace and zfs implementations are pretty close, as I understand it, but for zones you'll probably have to use either Jails or OpenVS, neither of which I'm a fan of. I think you'd also find a larger overlap in system utilities (tar, find, grep, etc.) between FreeBSD and Solaris, so for people (and scripts) making the transition, this might be an easier move. The big question here is probably hardware support; if you can't get FreeBSD running on your hardware, that's likely to be a showstopper, unless you can work a new hardware purchase into the transition.
So, if you don't want to go to closed Solaris, and FreeBSD isn't an option, that probably leaves you on Linux. People sometimes think I don't like Linux; I'm actually very comfortable on it. My first "unix" was Linux, and we run some extremely demanding systems on Linux, where it has performed well. However, if you're trying to do deep introspection, systemtap is a poor man's dtrace. And if you are relying on zfs, you'll have a hard time finding a suitable replacement among the current Linux options. Personally, I am most comfortable on ext3, but that means giving up file system snapshots, which is a painful concession if you have to make it. XFS is probably the next most common option, and generally I have no qualms about using it if you want to avoid ext3. Of the three "advanced" replacements (ext4, btrfs, and zfs on Linux), I think ext4 is probably your best bet, but only because zfs on Linux is too new for any serious database system, and if you are moving off OpenSolaris to get away from Oracle, btrfs ("butter"), which originated at Oracle, seems like an odd choice.
And so...
I think it's wise to keep things in perspective. There are some cases where you want to be a technology leader (we've been running Postgres 9 for months on some systems), but generally speaking, when it comes to picking the operating system and filesystem for your database, it's best to tread lightly. Now is a fine time to start evaluating your options; at least figure out which features critical to your enterprise you'll need to replace (and don't just think about the database; you might be relying on crossbow for something, or who knows what else).
We'll certainly be watching the available options, and I suspect diversifying a little where we can over the next 6 months, as we wait for the picture to clear up. We're not in a hurry (after all, we do have the source code of what we're running now), and I don't see much reason for others to be either.
Friday, July 23. 2010
Database Scalability Patterns - OSCon 2010
Howdy folks,
Slides are up for my talk, "Database Scalability Patterns", which I gave this week at OSCon 2010. You can get them from the OSCon page or from SlideShare.
Saturday, April 10. 2010
watch for momentary monitoring
One of the things I preach about a lot is good monitoring of your database servers; having tools in place to tell you both what good looks like and when things go bad is critical for large scale success. But sometimes you just need to monitor a momentary process, where setting up a check in your normal monitoring software is overkill. In these cases one tool that can help out is the watch command.
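For readers curious what watch is doing, here is a rough Python analogue, offered as a sketch rather than a replacement for the real command: it re-runs a check every `interval` seconds (watch's default interval is 2 seconds) and prints a timestamped result. The `psql_check` helper and any query you pass it are hypothetical; it shells out to `psql -At -c`, where `-A` gives unaligned output and `-t` suppresses headers and footers.

```python
import subprocess
import time
from typing import Callable, List, Optional


def watch(check: Callable[[], str], interval: float = 2.0,
          iterations: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Run `check` repeatedly, printing each result with a timestamp,
    roughly like `watch -n INTERVAL command`. Stops after `iterations`
    runs if given, otherwise runs until interrupted with Ctrl-C.
    Returns the collected outputs."""
    outputs: List[str] = []
    try:
        while iterations is None or len(outputs) < iterations:
            result = check()
            print(time.strftime("%H:%M:%S"), result)
            outputs.append(result)
            # Skip the pause after the final run when a limit is set.
            if iterations is None or len(outputs) < iterations:
                sleep(interval)
    except KeyboardInterrupt:
        pass
    return outputs


def psql_check(query: str, dbname: str = "postgres") -> str:
    """Hypothetical check: run one query through psql and return its
    trimmed, unaligned, tuples-only output."""
    proc = subprocess.run(["psql", "-At", "-c", query, dbname],
                          capture_output=True, text=True, check=True)
    return proc.stdout.strip()
```

For example, `watch(lambda: psql_check("select count(*) from pg_stat_activity"), interval=10)` would print the session count every ten seconds until you hit Ctrl-C. The `sleep` parameter exists only so the loop can be exercised without real waiting.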
Case in point: the other day I needed to back up a fairly large partitioned table (about 1.3TB on disk). The plan? A quick little script to pg_dump each of the partitions (about 325), fed through xargs -P so I'd get some concurrency without swamping the box. And of course, I planned to run the whole thing in a screen session. But dumping this much data takes some time, so how do you check on the progress? When working on databases, one of the most natural things for me is to whip up some SQL to see what's going on inside the database. Run that through watch, and you have some quick and simple monitoring. This example happens to be on Postgres, but you could do it with any database's command-line program.
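The dump side of the plan can be sketched in Python as well, as a hedged analogue of a script fed through `xargs -P`: build one pg_dump command per partition and run them with a fixed cap on concurrency. The partition names, output directory, and use of pg_dump's custom format (`-Fc`) are assumptions for illustration, not details of the original script.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def dump_commands(dbname: str, tables: Sequence[str], outdir: str) -> List[List[str]]:
    """One pg_dump invocation per partition: custom format (-Fc),
    a single table (-t), written to its own file (-f)."""
    return [
        ["pg_dump", "-Fc", "-t", table,
         "-f", os.path.join(outdir, f"{table}.dump"), dbname]
        for table in tables
    ]


def _run_subprocess(cmd: List[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_bounded(commands: Sequence[List[str]], max_parallel: int = 4,
                runner: Optional[Callable[[List[str]], int]] = None) -> List[int]:
    """Run commands with at most `max_parallel` in flight (like xargs -P N).
    Returns exit codes in the same order as `commands`."""
    run = runner or _run_subprocess
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))
```

For example, `run_bounded(dump_commands("mydb", partitions, "/backups"), max_parallel=4)` returns one exit code per partition, which makes it easy to list the dumps that need a retry. Note that each pg_dump process takes its own snapshot, so dumps of different partitions are not guaranteed to be consistent with one another.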
Hi! I'm Robert Treat, COO of OmniTI, perhaps the best internet technology consulting company on the planet. A veteran open source developer and advocate, I have been recognized as a major contributor to the PostgreSQL project, and can often be found speaking on open source, databases, and large-scale web operations.
