The Web in Your Database

I’ve heard a lot of talk recently about Yahoo’s new service ”Yahoo Pipes”. For those that haven’t seen it, it basically allows you to mash-up rss feeds to create your own customized feed for reuse and display. It’s not exactly revolutionary but it does lower the barrier to entry for a number of folks. In any case, last night, by way of log buffer and David Van Couvering, I came across Alex Iskold’s interesting entry on the topic comparing the idea of Yahoo Pipes to that of Relational Databases. A good read for sure, but it struck me that maybe if you want to have “the web as database” maybe you should just put “the web in your database”. Assuming your database gives you some type of remote data capabilities it shouldn’t be that hard; if your database lets you do it in PHP it’s downright “simple”.
pagila=# create type blog as (blog text, entry text, pubdate timestamptz, link text); CREATE TYPE pagila=# create or replace function getblog(text) pagila-# returns setof blog pagila-# language plphp pagila-# as $$ pagila$# $blog = simplexml_load_file($args[0]); pagila$# pagila$# $r[0] = (string)$blog->channel[0]->title; pagila$# pagila$# foreach($blog->channel[0]->item as $entry) { pagila$# $r[1] = (string)$entry->title; pagila$# $r[2] = (string)$entry->pubDate; pagila$# $r[3] = str_replace('http://people.planetpostgresql.org/','~',(string)$entry->link); pagila$# return_next($r); pagila$# } pagila$# $$; CREATE FUNCTION
Yep, that pretty much does it. Now I can pass in my favorite feed and get back a standard sql result set.
pagila=# select entry,link from pagila-# getblog('http://people.planetpostgresql.org/xzilla/index.php?/feeds/rss2.xml'); entry | link -----------------------------------------------------+----------------------------------------------------------------------------------------- Web team updates | ~xzilla/index.php?/archives/289-Web-team-updates.html getting killed by dbx | ~xzilla/index.php?/archives/288-getting-killed-by-dbx.html killing invincible processes with dbx | ~xzilla/index.php?/archives/287-killing-invincible-processes-with-dbx.html phpPgAdmin 4.1 Released | ~xzilla/index.php?/archives/286-phpPgAdmin-4.1-Released.html planetpostgresql blogger feed urls fixed | ~xzilla/index.php?/archives/285-planetpostgresql-blogger-feed-urls-fixed.html Log Buffer #27, A Carnival of the Vanities for DBAs | ~xzilla/index.php?/archives/284-Log-Buffer-27,-A-Carnival-of-the-Vanities-for-DBAs.html Go Gators! | ~xzilla/index.php?/archives/283-Go-Gators!.html 2006 Year In Review | ~xzilla/index.php?/archives/282-2006-Year-In-Review.html sizing up your partitions | ~xzilla/index.php?/archives/281-sizing-up-your-partitions.html sql servers charindex function in postgresql | ~xzilla/index.php?/archives/279-sql-servers-charindex-function-in-postgresql.html PostgreSQL full text search testing PART II | ~xzilla/index.php?/archives/280-PostgreSQL-full-text-search-testing-PART-II.html PostgreSQL full text search testing | ~xzilla/index.php?/archives/278-PostgreSQL-full-text-search-testing.html PostgreSQL bios page updated | ~xzilla/index.php?/archives/277-PostgreSQL-bios-page-updated.html PostgreSQL 8.2 RC1 Released, are you testing? | ~xzilla/index.php?/archives/276-PostgreSQL-8.2-RC1-Released,-are-you-testing.html Why usability matters | ~xzilla/index.php?/archives/275-Why-usability-matters.html (15 rows)
What’s better is I can make all of this transparent…
pagila=# create or replace view omniti as pagila-# select \* from getblog('http://www.schlossnagle.org/~george/blog/index.php?/feeds/index.rss2') pagila-# union select \* from getblog('http://feeds.feedburner.com/shiflett') pagila-# union select \* from getblog('http://www.lethargy.org/~jesus/feeds/index.rss2') pagila-# union select \* from getblog('http://netevil.org/feeds/rss.xml') pagila-# union select \* from getblog('http://people.planetpostgresql.org/xzilla/index.php?/feeds/rss2.xml') pagila-# ; CREATE VIEW
At this point we can look at all of our blogs just like any other type of data in our database, and we can do all kinds of (quasi) interesting stuff, like grab the list of recent posts
pagila=# select blog, entry, pubdate from omniti where pagila-# pubdate > date_trunc('month',current_date) pagila-# order by pubdate desc; blog | entry | pubdate -----------------+------------------------------------------------------------+------------------------ Evil, as in Dr. | Jyte.com: digg crossed with social networking | 2007-02-15 17:40:03-05 Evil, as in Dr. | AOL switches on OpenID provider | 2007-02-15 17:29:51-05 Evil, as in Dr. | OpenID (and TypeKey) using native OpenSSL functions in PHP | 2007-02-15 17:23:48-05 zillablog | Web team updates | 2007-02-08 21:18:08-05 Esoteric Curio | Debugging a memory-exhausting mod_perl app | 2007-02-03 21:12:07-05 (5 rows)
or maybe plot the relative activity of each of us on our blogs
pagila=# select blog,to_char(date_trunc('month',pubdate),'mon YY'), count(*) from omniti pagila-# where pubdate > date_trunc('month',current_date) - '2 months'::interval pagila-# group by blog, date_trunc('month',pubdate) pagila-# order by date_trunc('month',pubdate) desc,3 desc; blog | to_char | count -----------------+---------+------- Evil, as in Dr. | feb 07 | 3 Esoteric Curio | feb 07 | 1 zillablog | feb 07 | 1 zillablog | jan 07 | 7 Chris Shiflett | jan 07 | 4 Esoteric Curio | jan 07 | 2 Evil, as in Dr. | jan 07 | 2 George's Blog | jan 07 | 1 zillablog | dec 06 | 5 Chris Shiflett | dec 06 | 5 Esoteric Curio | dec 06 | 3 Evil, as in Dr. | dec 06 | 2 (12 rows)
So there you have it. Granted, this may not be the most practical application of this functionality, but being able to take any information on the internet and present as a standard object in your database is bound to come in handy at some point.