Monday, February 6. 2012
Interest free (technical) debt is risky
Earlier today I read a post from Javier Salado that asked the question "If the interest rate is 0%, do you want to pay back your debt?". In this case Javier was referring to technical debt, but I felt like the conclusion he reached rested on the same misunderstanding that people apply to regular debt. Let me back up a bit. In Javier's post, he lays out the following scenario:
"Imagine you convince a bank (not likely) to grant you a loan with 0% interest rate until the end of time, would you pay back? I wouldn’t. It’s free money. Who doesn’t like free money?"
He then goes on to apply this thinking to technical debt.
"You have an application with, let’s say, $1,000,000 measured technical debt. It was developed 10 years ago when your organization didn’t have a fixed quality model nor coding standards for the particular technologies involved, hence the debt. Overtime, the application has been steadily provided useful functionality to users and what they have to say about it is mainly good. You have adapted to your organization’s new quality process, the maintenance cost is reasonable and any changes you have to make have an expected time-to-market that allows business growth. We could say the interest rate on your debt is close to 0%, why should I invest in reducing the debt?"
I think the answer to both questions is yes, and I think he makes the same mistake a lot of people make when it comes to taking on debt (technical or otherwise). The cost of debt cannot be calculated from the interest rate alone; you must also factor in risk. In financial transactions, even a debt with 0% interest likely has some form of payment terms and collateral. (One might argue that Javier really meant a loan from a bank that was 0% interest, required no collateral, and had no terms for repayment. I'd argue that's a gift, not a loan.)
It turns out 0% interest loans aren't just make-believe. A simple, real world example is a 0% interest car loan. While this looks great from an interest point of view, it's not so good from a risk assessment point of view; if you get into an accident, you now owe a bunch of money and no longer have the collateral to pay it off. It's a double whammy if you figure you might also have to deal with fallout from the accident itself.
So the question is, does risk assessment carry over to the technical debt metaphor? I believe it does. In most cases technical debt comes from legacy code, which means the people who can work on it are all folks who have been around a long time. In most cases, rather than teach new people how to develop on the legacy system, you just have the "old timers" deal with it when needed. But of course, as time goes by, you probably have fewer and fewer people who can serve in this role. This is a risk.
You also have to be aware that, while you carry a large amount of managed technical debt, it's always possible that some new, unforeseen event could change the dynamic of things. Perhaps a large client or market opens up to you, or some similar opportunity arises. Perhaps a merger with another company is proposed. You now have to re-evaluate your technical situation, and in many cases that technical debt may come back to bite you.
In the end, I don't think Javier was way off base with his recommendation, which was essentially to follow Elizabeth Naramore's "D.E.B.T." system (pdf/slides): measure your debt and then decide how and what needs to be paid off. But I think it's important to remember that once you have identified your debt, even if the "interest" on that debt is still low, it does represent risk within your organization (or your personal finances), and you would be best to eliminate as much of it as you can.
Wednesday, December 14. 2011
Monitoring for "E-Tailers"
As we sit in the midst of record traffic and holiday rushes online, with people scrambling to get their gifts ordered and shipped before time runs out, I recently wrote a piece for Retail Info Sys News about various best practices for monitoring web operations during the holiday rush. The folks at Circonus asked me to expand on that, which I did in this guest post on the Circonus blog. If you run web operations, do e-commerce, or are just wondering about what goes on behind the scenes, I'd encourage you to check it out.
Wednesday, November 23. 2011
Cloudy With A Chance Of Scale
Recently I met with a company looking for some long term advice on building out their database infrastructure. They had a pretty good mix of scaling vertically for overall architecture, while scaling horizontally by segmenting customers into their own schemas. They had a failover server in place, but as the business was growing, they were looking at ways to better future-proof operations against growth, and also build more redundancy into the system, including multi-datacenter redundancy. After talking with them for a bit, I drew up a radical solution: "To The Cloud!"
I think I am generally considered a cloud skeptic. Most of how we are taught to scale systems and databases from a technical standpoint doesn't work well in the cloud. I mean, if you have a good methodology for problem solving you can make a lot of improvements in any environment; we've certainly seen that with customers we've worked with at OmniTI. But if you are just into looking at low-level numbers, or optimizing performance around disk I/O (generally the most common problem in databases), those methods just aren't going to be as effective in the cloud. That said, if you are willing to embrace some of the properties that make for successful cloud operations, I think it can be a pretty successful strategy.
One of the key factors which I often see overlooked in most "will the cloud work for me" discussions is whether or not your business lends itself well to the way cloud operations work. In the case of this particular client, it's a really good match. First, this company already segments their customer data, so there is a natural way to split up the database and operations. Second, they don't do any significant amount of cross-customer data work, which means they don't have to re-engineer those bits to make the switch. Further, the customers have different dataset sizes, different access patterns, and different operational needs, and most importantly, they pay different rates based on desired levels of service.
This matches up extremely well with a service like postgres.heroku.com. Imagine that, instead of buying that next bigger server, instead of setting up cross-data-center WAL shipping, instead of buying machines in a different colo somewhere across the country, they could buy individual servers with Heroku, sized according to customer data size and performance needs. For smaller customers you start with minimal resources, and as the customer grows, you dial up the server instance size.
Furthermore, you get automated failover setups, and can also easily store backups in a different datacenter based on given regions. You can even work to match customers to different availability zones based on their users' endpoints. And if you want to do performance testing or development work, you can create copies of the production databases and hack away. These are the kinds of services OmniTI has built on top of Solaris, Zones, and ZFS, and believe me, they will change the way you think about database operations.
Of course, it's not all ponies and rainbows. You still have to move clients on to the new infrastructure, but that should be pretty manageable. You'd also need to build out some infrastructure for monitoring, and you'll need to be able to juggle operational changes. Some of this is not significantly different; pushing DDL changes across schemas is pretty similar to doing it across servers, but you'll probably want to create some toolsets around this. Also, you're less likely to bear fruit from micro-optimizations; that doesn't mean that you throw away your pgfouine reports, but the return on performance improvements and query optimization will be much lower.
That said, if you can get good enough performance for your largest customers (and remember, you'll have easy capabilities for distributing read loads), you end up with an extremely scalable system, not just technically, but from a business standpoint as well. If you aren't building this on top of Heroku's Postgres service, the numbers will probably look different, but the idea that you've matched your infrastructure capabilities to a significant range of possible growth patterns should be compelling for both suits and the people who maintain the systems.
Wednesday, November 16. 2011
Checkpoints, Buffers, and Graphs
Last night at BWPUG, Greg Smith gave his talk on "Managing High Volume Writes with Postgres", which dives deep into the intersection of checkpoint behavior and shared buffers, and also into dealing with vacuum. One of the things I always like about Greg's talks is that they are a good way to measure what we've learned between reading code and running large scale / highly loaded systems in the wild. Even in the cases where we disagree, it's good to get a different point of view on things. If you manage Postgres systems and get the chance to see this talk, it's worth taking a look (and I suspect he'll post the slides up somewhere this week, if they aren't already available).
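For anyone who wants to look at this kind of checkpoint and buffer activity on their own systems, Postgres of this era (9.1) exposes the counters in the `pg_stat_bgwriter` view, reported as numbers of buffers rather than bytes. Below is a minimal sketch of turning those counts into MB for graphing. It assumes the default 8 kB block size, and the function names and sample numbers are hypothetical, made up purely for illustration; it is not the actual check we run.

```python
# Query one could hand to a monitoring tool. The column names are those of
# pg_stat_bgwriter in Postgres 9.1; every value is a count of buffers.
BGWRITER_QUERY = """
SELECT buffers_checkpoint, buffers_clean, buffers_backend, buffers_alloc
FROM pg_stat_bgwriter
"""

# Assumption: the server uses the default 8 kB block size.
BLOCK_SIZE_BYTES = 8192


def buffers_to_mb(buffers, block_size=BLOCK_SIZE_BYTES):
    """Convert a buffer count into megabytes (1 MB = 1024 * 1024 bytes)."""
    return buffers * block_size / (1024 * 1024)


def row_to_mb(row, block_size=BLOCK_SIZE_BYTES):
    """Convert every buffer counter in a result row (a dict) into MB."""
    return {name: buffers_to_mb(count, block_size) for name, count in row.items()}


# Hypothetical sample row, standing in for one result of BGWRITER_QUERY.
sample = {
    "buffers_checkpoint": 12800,
    "buffers_clean": 0,
    "buffers_backend": 256,
    "buffers_alloc": 1280,
}

for name, megabytes in row_to_mb(sample).items():
    print(f"{name}: {megabytes:.1f} MB")
```

If your server was built with a non-default block size, pass that value as `block_size` instead.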
One of the other cool things that came out of the talk was one of the guys on my team again validating why we love working with Circonus. We have an unofficial slogan that with Circonus, "if you can write a query, you can make a graph". Well, Keith noticed that we didn't have any monitoring for the background writer info on one of our multi-TB Postgres systems, recently upgraded from 8.3 to 9.1, so he jumped into Circonus and, just like that, we had metrics and a graph faster than Greg could move off the slide. This will be awesome once we accumulate some more data, but here's a screenshot I took last night while we were in the talk:
![]()
Yay graphs!
Update: Shortly after posting, Keith mentioned that he had updated the graph to speak in MB rather than buffers. So, here is an updated screenshot with friendlier output and more data. (Note that Phil, one of our other DBAs, also flipped the buffers allocated to a right axis.)
![]()
Monday, September 19. 2011
Reminder: BWPUG Meeting Tomorrow, Sept 20th
Hey Folks!
Looks like we had a snafu with the Meetup site where it was showing the meet on our old schedule last week rather than the new schedule. We're in the process of fixing that, but wanted to make sure everyone knew that we are still going to meet on our new night, which is tomorrow, Tuesday, September 20th.
This month, Theo will talk about application and systems performance measurement and why almost everyone does it wrong. It's not hard to do right, but people often approach these things completely wrong. So, we'll look at some numbers, understand why they are misleading and talk about the right way to approach these problems. Since we can't always approach things the right way, we'll talk a bit about adding a tiny bit of value to the "wrong" approach.
When: September 20th, ~6:30PM.
Where: 7070 Samuel Morse Dr, Columbia, MD, 21042.
Host: OmniTI
As always we will have time for networking and we can do some more open Q & A, and we'll likely hit one of the local restaurants after the meet.
BWPUG Meetup Page
BWPUG Mailing List
Wednesday, August 17. 2011
A funny thing happened on the way to September
In spite of all previous notions to the contrary, thanks to some last minute wrangling by the conference organizers, I will be making the trek out to Chicago this September for Postgres Open after all. I had been planning to sit out the event and just stay focused on Surge (which, I must say, looks even more kick ass than last year), but after looking at the schedule, and some persuading at OSCon, I'm very excited about what has been put together, and look forward to seeing many of my fellow Postgres community members once again.
Oh, and in case you were wondering, I'll be reprising my talk from this year's Velocity conference, "Managing Databases in a DevOps Environment". At Velocity, the talk was intended to highlight how people already familiar with DevOps should approach their database systems. I'm not sure how well "DevOps" is understood within the Postgres community, so I think I'll try to emphasize the differences between managing databases and traditional services, to hopefully give better expectations to DBAs whose organizations might be undergoing such a change. If you're going to be at Postgres Open and are interested in the topic, I'd love to hear your feedback on what aspects of this topic you're most interested in. (PS. I'll also be heading to the Velocity Summit next week in San Francisco; for those attending, I'd love to hear your thoughts on this topic as well.)
Monday, August 15. 2011
Paying Attention Pays Off
I often run my ops like I take care of data: a bit overzealously. Case in point, when setting up a new database, I like to throw on a metric for database size, which gets turned into both a graph for trending and an alert on database size. Everyone is always on board with trending database size in a graph, but the alert is one people tend to question. This is not entirely without justification.
On a new database, with no data or activity, deciding when to alert is pretty fuzzy. When we set up a new client within our managed hosting service, I usually just toss up an arbitrary number, like 2GB or something. The idea isn't that a 2GB database is a problem; it's that when we cross 2GB, we should probably take a look at the trending graph and do a projection. Depending on how things look, we'll bump up the threshold on the alert to a new level, based on when we think we might want to look at things again. For example, in this graph we take a month long sample, and then project it out for three months. We can then set a new threshold somewhere along that line.
![]()
While this is good for capacity planning, there's more that can be gained from this process. The act of alerting forces us to pay attention. And if we get notices earlier than we expected, we go back in and re-evaluate the data patterns. Of course, sometimes people will question this. Getting a notice that your database has passed 4GB can seem pointless when you have 100+ GB of free space on your disks. And besides, isn't that what free space monitors are for?
Here is a graph of another client's database growth. Their data size is not particularly large (don't confuse scalability with size; it doesn't take a large database to have scalability issues), but what's important is that we kept getting notices that the size was growing, and when talking with the developers, no one thought it should be growing at nearly this rate. Eventually we were able to track down the problem to a purging job that had gone awry. Once that was fixed, the growth pattern leveled off completely (and the database size returned to the tiny amount that was expected!)
![]()
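The "sample a month, project three months, pick a new threshold" step described earlier in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a plain least-squares straight line, which may not be what your graphing tool does, and the function names and sample sizes are hypothetical. The size samples themselves could come from something like `SELECT pg_database_size(current_database())` collected once a day.

```python
def linear_fit(days, sizes_gb):
    """Least-squares straight line through (day, size) samples.

    Returns (slope in GB per day, intercept in GB).
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(sizes_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, sizes_gb))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def projected_size_gb(days, sizes_gb, days_ahead):
    """Projected size, days_ahead days after the last sample."""
    slope, intercept = linear_fit(days, sizes_gb)
    return intercept + slope * (days[-1] + days_ahead)


def next_threshold_gb(days, sizes_gb, review_in_days):
    """Put the new alert threshold where the trend line will be at the time
    we next want to take a look (a judgment call, not a fixed rule)."""
    return projected_size_gb(days, sizes_gb, review_in_days)


# Hypothetical month of samples: day number and database size in GB.
sample_days = [0, 10, 20, 30]
sample_sizes = [1.0, 1.5, 2.0, 2.5]

print("Projected size in 90 days (GB):",
      round(projected_size_gb(sample_days, sample_sizes, 90), 2))
print("Threshold if we want to look again in 30 days (GB):",
      round(next_threshold_gb(sample_days, sample_sizes, 30), 2))
```

The point of the second function is that the alert fires when we planned to look again; if it fires much sooner, the growth pattern has changed and deserves attention.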
Hi! I'm Robert Treat, COO of OmniTI, perhaps the best internet technology consulting company on the planet. A veteran open source developer and advocate, I have been recognized as a major contributor to the PostgreSQL project, and can often be found speaking on open source, databases, and large scale web operations.