EAV Databases comment

(Originally posted in response to a blog entry by linoj, but mangled by his comment system.)

Hey linoj! You're not on IRC, so I'll post this here. :)

It seems that people have been getting more interested in the EAV (entity-attribute-value) model lately, since all the crazy semantic web stuff they want to build needs flexible databases.

It turns out that it's possible to write an efficient EAV system -- the problem seems to be that everyone was trying to do it on top of RDBMSes, which push every query through a really complex pipeline (locking, transactions, query optimization, and other miscellaneous things). An EAV store can get by with a much, much simpler data access pipeline. :)
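To make "simpler data access pipeline" a bit more concrete, here's a minimal sketch of an in-memory EAV store in Python. It's a toy under stated assumptions, not how any real system does it: the class name `EAVStore` and its methods are made up for illustration, values are assumed to be hashable, and there is no persistence, locking, or query planning at all. A read is just a couple of dictionary lookups.

```python
from collections import defaultdict


class EAVStore:
    """Toy in-memory entity-attribute-value store (hypothetical, for illustration)."""

    def __init__(self):
        # entity -> {attribute -> value}
        self.by_entity = defaultdict(dict)
        # (attribute, value) -> set of entities, for reverse lookups
        self.by_attr_value = defaultdict(set)

    def put(self, entity, attribute, value):
        """Set one attribute of one entity, keeping the reverse index in sync."""
        attrs = self.by_entity[entity]
        if attribute in attrs:
            # Drop the stale reverse-index entry before overwriting.
            self.by_attr_value[(attribute, attrs[attribute])].discard(entity)
        attrs[attribute] = value
        self.by_attr_value[(attribute, value)].add(entity)

    def get(self, entity, attribute):
        """Return the stored value, or None if the entity or attribute is unknown."""
        return self.by_entity.get(entity, {}).get(attribute)

    def find(self, attribute, value):
        """Return the set of entities whose attribute currently equals value."""
        return set(self.by_attr_value.get((attribute, value), set()))


store = EAVStore()
store.put("e1", "name", "Ada")
store.put("e2", "name", "Ada")
store.put("e1", "name", "Grace")  # overwrites e1's earlier name
```

After those three calls, `store.get("e1", "name")` gives `"Grace"` and `store.find("name", "Ada")` gives `{"e2"}`.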

Daniel Abadi (http://cs-www.cs.yale.edu/homes/dna/) has totally dug into the current 30-year-old RDBMS technology and figured out some really interesting things. This paper is awesome: http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.pdf -- he and his co-authors show that roughly 95% of the work an OLTP database does is overhead (locking, latching, logging, and buffer management). If you remove all that stuff, and solve the concurrency/consistency problems by making the database single-threaded and replicated, you get a significant boost in performance.

Here's a great diagram:
http://epi.ponzo.net/images/OLTP_overhead.png

Another interesting development is the Freebase.com database called "graphd" (http://blog.freebase.com/2008/04/09/a-brief-tour-of-graphd/). It's got good performance (Freebase is freakin' huge), and the way you query it is neat -- instead of asking for rows, you pass it a (possibly nested) hash of attribute=>value pairs. If a value is filled in, it matches on that, and if a value is nil, that's one of the fields it returns in your result. You basically leave out pieces, and it fills them in. :D

You can see this in action by playing with their query editor: http://www.freebase.com/tools/queryeditor (a nice set of example queries is at the bottom).
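The fill-in-the-blanks query style described above can be sketched in a few lines of Python. To be clear, this is not graphd's API or Freebase's actual query language. It's only an illustration of the idea, run against a plain list of nested dicts, with `None` standing in for nil. The function names and the sample records are made up.

```python
def fill_in(query, record):
    """Return a copy of query with its None slots filled from record,
    or None if the record lacks a queried key or disagrees with a given value."""
    result = {}
    for key, wanted in query.items():
        if key not in record:
            return None
        actual = record[key]
        if wanted is None:
            # A blank in the query: copy the record's value into the result.
            result[key] = actual
        elif isinstance(wanted, dict):
            # A nested query: recurse into the nested record.
            if not isinstance(actual, dict):
                return None
            sub = fill_in(wanted, actual)
            if sub is None:
                return None
            result[key] = sub
        elif wanted != actual:
            # A filled-in value that does not match: reject this record.
            return None
        else:
            result[key] = actual
    return result


def run_query(query, records):
    """Return the filled-in query for every record that matches it."""
    results = []
    for record in records:
        filled = fill_in(query, record)
        if filled is not None:
            results.append(filled)
    return results


# Made-up sample data, not real Freebase content.
records = [
    {"type": "album", "name": "Album One", "artist": {"name": "Band A", "city": "X"}},
    {"type": "album", "name": "Album Two", "artist": {"name": "Band B", "city": "Y"}},
    {"type": "film", "name": "Film One"},
]

query = {"type": "album", "name": None, "artist": {"name": "Band A"}}
matches = run_query(query, records)
```

Here `matches` comes back as `[{"type": "album", "name": "Album One", "artist": {"name": "Band A"}}]`: the filled-in parts acted as filters and the blank `name` got filled in.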

I can see people moving to databases that are similar to what CouchDB et al. are doing -- you store all your data in a homogeneous form, but have "views" of the data which are kinda like dynamic schemas, and which generate index structures for performance.
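Here's a rough Python sketch of that "views over homogeneous documents" idea. It's loosely inspired by CouchDB-style map functions but it is not CouchDB's API: the `View` class, the `posts_by_author` function, and the sample documents are all hypothetical, and a sorted list with binary search stands in for a real on-disk index. Keys are assumed to be mutually comparable.

```python
import bisect


class View:
    """A sorted index built by running a map function over every document."""

    def __init__(self, docs, map_fn):
        # map_fn yields zero or more (key, value) pairs per document.
        rows = [
            (key, doc_id, value)
            for doc_id, doc in docs.items()
            for key, value in map_fn(doc)
        ]
        rows.sort(key=lambda row: (row[0], row[1]))
        self.rows = rows
        self.keys = [row[0] for row in rows]

    def lookup(self, key):
        """Return all (key, doc_id, value) rows for key, found by binary search."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.rows[lo:hi]


def posts_by_author(doc):
    """Map function: index only documents that look like posts."""
    if doc.get("type") == "post":
        yield doc["author"], doc["title"]


# Made-up documents; they share one store but have no common schema.
docs = {
    "d1": {"type": "post", "author": "alice", "title": "Hello"},
    "d2": {"type": "comment", "author": "bob", "text": "Hi"},
    "d3": {"type": "post", "author": "alice", "title": "Again"},
    "d4": {"type": "post", "author": "carol", "title": "Notes"},
}

view = View(docs, posts_by_author)
alice_posts = view.lookup("alice")
```

The documents never declare a schema. The map function decides what counts as a "post" and which field to index, which is the "dynamic schema" part, and `alice_posts` ends up as `[("alice", "d1", "Hello"), ("alice", "d3", "Again")]`.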

EAV_Databases_comment (last edited 2010-04-24 09:29:34 by localhost)