The case of ActiveRecord vs. CouchDB
After my talk “Ruby sittin’ on the Couch” about Ruby frameworks and CouchDB at Scotland on Rails last week, a bit of a debate started. In the talk I compared the available Ruby frameworks and discussed how well they fit the CouchDB way of doing things. In my conclusion I recommended using either CouchRest or RelaxDB for development and at the same time urged people not to use one of the ActiveRecord-like libraries such as CouchFoo.
Since it seems to me that the point of my talk hasn’t reached all of its audience yet, I’ll try to make it again in this blog post.
The frameworks I looked at can be divided into two classes: the ones using CouchDB semantics (e.g. CouchRest/RelaxDB) and the ones trying to provide an ActiveRecord-like interface for applications (e.g. CouchFoo). The reasons I recommended not using ActiveRecord semantics are:
CouchDB is not about relations
In ActiveRecord we have to shape our domain models so they fit into a relational schema. That means flat tables and relationships between two tables, through a third and fourth table, etc. To get results from the database we usually join a few tables and get back the resulting rows, nicely converted into Ruby objects for us. That is (sort of) fine for a relational database, because all it has are those tables, but it doesn’t work at all with CouchDB.
To use CouchDB to its full potential you need to understand and use its views
CouchDB doesn’t have a concept of tables at all, and the way you pull your data from CouchDB is fundamentally different. Instead of joining data from different tables via an SQL query, you procedurally build up an index of your data by providing map and reduce functions, and then you query that index.
In order to fully use CouchDB you have to write custom map/reduce functions, and an API that was designed for generating SQL queries doesn’t let you do that.
I could go into more detail but that really is my main point. I know from my own experience that changing the wiring in your head after too many years of ActiveRecord is hard, so I don’t expect anyone to agree with me immediately. But I do believe that the only viable way of using CouchDB is through the interface that was designed for it and not through an abstraction that just happened to be there first.
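To make the idea of "building an index with map and reduce functions" a bit more concrete, here is a minimal sketch in plain Ruby. It is only an in-memory simulation and not the CouchDB API: real views are usually written as JavaScript functions stored in a design document and built by the server. The documents, field names and the helpers `build_index` and `query` are made up for illustration.

```ruby
# In-memory simulation of a CouchDB-style view. This is NOT the CouchDB API;
# all documents, field names and helper methods are hypothetical.
docs = [
  { "_id" => "1", "type" => "book",   "author" => "Lem",     "pages" => 200 },
  { "_id" => "2", "type" => "book",   "author" => "Lem",     "pages" => 300 },
  { "_id" => "3", "type" => "book",   "author" => "Pynchon", "pages" => 900 },
  { "_id" => "4", "type" => "author", "name"   => "Lem" }
]

# Map step: called once per document; it may emit any number of key/value rows.
map_fn = lambda do |doc, emit|
  emit.call(doc["author"], doc["pages"]) if doc["type"] == "book"
end

# Reduce step: collapses all values that were emitted under the same key.
reduce_fn = ->(values) { values.sum }

# Run the map function over every document and keep the rows sorted by key.
# The sorted rows play the role of the view index.
def build_index(docs, map_fn)
  rows = []
  emit = ->(key, value) { rows << [key, value] }
  docs.each { |doc| map_fn.call(doc, emit) }
  rows.sort
end

# Querying means looking up a key in the prebuilt index, not scanning documents.
def query(index, reduce_fn, key)
  values = index.select { |k, _| k == key }.map { |_, v| v }
  reduce_fn.call(values)
end

index = build_index(docs, map_fn)
pages_by_lem = query(index, reduce_fn, "Lem")
```

The point of the sketch is the division of labour: the map and reduce functions are fixed up front, and a query can only ask for keys or key ranges in the index they produce.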
I’m looking forward to a lively discussion, either in the comments or on other blogs.
Tags: activerecord, couchdb, frameworks, ruby, scotlandonrails, sor09




March 31st, 2009 at 17:03
hi alex, i am also strongly looking forward to a debate on this subject.
in my spare time i am building a “mapper” (or helper layer, or whatever you want to call it) for couchdb in scala. after all these years of relational i’m still trying to wrap my head around these concepts, for instance thinking in “dynamic data with static queries” versus relational’s “static data with dynamic queries”.
so instead of carrying on trying to fit a square into a circle, i am giving it some time until i picture more clearly what this couchdb helper layer should look like, to get the most out of the document-oriented way of thinking.
just ranting here, first thing that comes up is: parse view results JSON into classes and let clients provide their own functions that map from that structure to whatever they need (one example could be, a list of a certain domain model). ultimately, it’s only them who are aware of the map/reduce functions and what they expect to receive.
March 31st, 2009 at 17:05
Hi Alex -
Can’t agree more .. I have been talking about the same stuff on Twitter over the last couple of days. Check out .. http://twitter.com/debasishg/status/1412770189 and http://twitter.com/debasishg/status/1412775001 and http://twitter.com/debasishg/status/1417237103. The entire purpose of document oriented storage is to model data close to the domain. Hence we talk about less impedance mismatch with this paradigm, which does away with the necessity of a big fat abstraction named “ORM”. Whatever is left will be the thin layer of mapping between the domain objects and JSON.
Cheers.
March 31st, 2009 at 18:01
CouchDB wrappers should look like http. However we’ve written some helpers that cache parts of views on other documents to give ‘relationship’ like properties..
e.g. if you have a book document and an author document, you might want to ‘cache’ the book’s author name on the book. Our helper monitors changes to book document types and copies some data (based on a view) onto the book. Hence you can just get a book out and use some of the author data immediately (great for lists of books for instance). Of course if your authors aren’t really being used for anything other than name, just store the author in the book document
March 31st, 2009 at 18:05
I attended a presentation @janl gave on CouchDB. I came away with the impression that the hardest part of ‘getting it right’ would be switching off one’s ingrained inclination to think of the domain in an RDBMS way and to adopt instead a document store mindset.
The naive example, I mentioned after your presentation, of data that I thought would fit well with CouchDB is music records. The meta data on jazz records has a totally different data structure to that of classical records, which are different again to pop and rock records. Neither do they map easily in an object relational sense, the object graph is too chaotic. Far better I’m thinking to just switch back and forth between JSON and some form of R(uby)ON.
Hope I’m on the right track.
March 31st, 2009 at 20:30
so what about stuff you get for free with ORMs, such as lazy loading? the music records example is valid, but i would like to understand more complex scenarios, where you would perhaps even have circular references in your domain model.
the difference is that the mapper can no longer generate queries for us, because of its static nature. this means the way i use my domain entities throughout my code will inevitably change, if i used to rely on the ORM to transparently go fetch data as needed.
i was doing something similar to approach #2 on the couchdb joins post by cmlenz (http://www.cmlenz.net/archives/2007/10/couchdb-joins), so i had to create a permanent view to make the join happen, adding specific parameters to the http request (startkey, endkey, etc).
how would the couchdb way be, if i had many of these cases?
– create a permanent view for every join – but could quickly become annoying
– don’t use joins and stuff everything inside the document – but there are reasons not to do so, e.g. to avoid conflicts if the doc is frequently updated, or not to send the whole doc for every append
– tim parkin’s method seems fine but i suppose authors don’t have a reference to their books. so again you would need a view if you have to access “all the author’s books” (to simulate the fact that authors point to their books).
perhaps the solution resides on a mix of different techniques. what do you think? maybe i’m still way into the relational mindset and can’t see it clearly.
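For readers who have not seen the view-based join with startkey and endkey mentioned above, here is a rough sketch of the idea, simulated in plain Ruby. It assumes a view that emits compound keys of the form [post id, 0] for a post and [post id, 1] for each of its comments; the documents, field names and the `range` helper are hypothetical, and Ruby's array comparison only approximates CouchDB's key collation.

```ruby
# Simulation of a "join" done through sorted compound keys.
# Not the CouchDB API; documents and helpers are made up for illustration.
docs = [
  { "_id" => "post-1", "type" => "post",    "title" => "Hello" },
  { "_id" => "c1",     "type" => "comment", "post"  => "post-1", "text" => "first" },
  { "_id" => "c2",     "type" => "comment", "post"  => "post-2", "text" => "other" },
  { "_id" => "post-2", "type" => "post",    "title" => "World" },
  { "_id" => "c3",     "type" => "comment", "post"  => "post-1", "text" => "second" }
]

# Map step: a post is keyed as [its id, 0], a comment as [its post's id, 1],
# so after sorting each post is directly followed by its comments.
rows = []
docs.each do |doc|
  case doc["type"]
  when "post"    then rows << [[doc["_id"], 0], doc]
  when "comment" then rows << [[doc["post"], 1], doc]
  end
end
index = rows.sort_by { |key, doc| [key, doc["_id"]] }

# A key range query plays the role of the startkey/endkey request parameters.
def range(index, startkey, endkey)
  index.select { |key, _| (key <=> startkey) >= 0 && (key <=> endkey) <= 0 }
end

result   = range(index, ["post-1"], ["post-1", 2])
post     = result.first[1]
comments = result.drop(1).map { |_, doc| doc }
```

One request returns the post followed by its comments, which is why a single permanent view can cover this kind of has_many lookup for every post.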
March 31st, 2009 at 20:33
Are the slides and/or a transcript of your talk available? I didn’t find them on the Scotland on Rails page. I’m just starting to discover the new database paradigm and maybe your talk is a great introduction.
March 31st, 2009 at 21:15
hi ralph,
http://www.slideshare.net/langalex/ruby-sittin-on-the-couch
April 1st, 2009 at 09:35
I was fortunate to see your talk at Scotland on Rails, which opened my mind to new ways of thinking about relational vs. dynamic databases. I will definitely be exploring CouchDB as it looks like an elegant, scalable solution: simply JSON, indexes and RESTful HTTP requests. Excellent talk by the way, really enjoyed it!
April 1st, 2009 at 09:49
@francisco any code we can see yet? i completely agree with you, the whole custom map/reduce functions and how to handle what comes back from those is a central question.
@Debasish those tweets could have been mine
@Anthony your example sounds interesting, i’d love to see some example code and what you are trying to achieve
@francisco 2: well in rdbms you have a separate sql query for each … query so why not many couchdb views? i also find that it helps sometimes to consider turning certain more complex queries into domain models, e.g. you might have a BookList model (probably something more complex)
btw i have already started to rewrite couch potato, for anyone who’s interested you can follow the commits here: http://github.com/langalex/couch_potato
April 1st, 2009 at 10:08
“In order to fully use CouchDB you have to write custom map/reduce functions and abstracting that into an API that was designed for generating SQL queries doesn’t allow you that.”
@alex – Can you give some examples (other than find_by_sql)? I can’t see how find(), save(), validates_presence_of, has_many … apply any less to CouchDB
April 1st, 2009 at 10:25
@george i just posted an example to your blog
also i wouldn’t say that save/validations are something activerecord specific – i think we all agree we want validations and save.
April 1st, 2009 at 10:51
Hi Alex – I am the author of ActiveCouch and I agree that a lot of what ActiveCouch is based on stems from the fact that I was most familiar with ActiveRecord semantics when I first started writing ActiveCouch (which was slightly more than a year ago) and CouchDB itself was still very young.
I agree that it is a very naive way of looking at things and it doesn’t make sense for a lot of cases, but it did (and still does) work for my particular use-case and is still being used in production over at wego.com.
I’m planning to get around to re-thinking and re-working ActiveCouch to be more in tune with CouchDB’s philosophy but life at a young startup can be very demanding and I haven’t had the time to get around to it.
Thanks for your feedback
April 1st, 2009 at 11:50
@arun i would be very interested in seeing your particular use case and how that works for you… any chance to see some code or an example of how you structured your data and why?
April 1st, 2009 at 21:00
@debasish “The entire purpose of document oriented storage is to model data close to the domain”.
mind explaining what exactly you meant by this? that it is closer to the domain because of fewer superfluous relationships?
@alex no code to show yet – mainly because it’s highly experimental. i will dress it up and push it to github – but only once i have grasped these concepts. i need to firmly understand what i want to do before carrying on with coding
“well in rdbms you have a separate sql query for each … query so why not many couchdb views?”
yes, but the difference is that with sql i have the orm dynamically generating 99% of the queries i use – whereas in couchdb all queries (in the ‘view’ sense) are manually created. unless you’re planning to use temporary (aka slow) views. or do you suggest that these ‘static’ views should be managed with the aid of the helper/mapper?
“i also find that it helps sometimes to consider turning certain more complex queries into domain models, e.g. you might have a BookList model (probably something more complex)”
yes, but let’s say i have cmlenz’s example i cited before (blog post and a list of comments). i think it just makes sense for the Post model to have a list of Comments. i don’t get why i would want to introduce a CommentList model.
this situation could easily come up plenty of times throughout a non-trivial app.
this weekend i will try to dust off my ruby and have a serious look to couchpotato’s improvements. cheers
April 2nd, 2009 at 10:09
@francisco no i’m not planning to use temporary views. nobody should
and yes, couchrest/relaxdb/couchpotato all manage static views for you in that they create and update them for you.
yes, for a list of comments you would probably just load the list. the example i posted over at george’s blog (http://www.rowtheboat.com/archives/37) might be a better fit for a view that turns into a model: getting a list of posts each with its comment count.
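To illustrate what "a view that turns into a model" could look like, here is a small sketch in plain Ruby. It is not the example from the linked post and it does not use any CouchDB library; it only simulates a map step, a grouped reduce, and a thin model around each result row. The documents and the `PostWithCommentCount` struct are hypothetical.

```ruby
# Simulation of a grouped map/reduce view whose rows become small models.
# Not the CouchDB API; all names here are made up for illustration.
docs = [
  { "_id" => "post-1", "type" => "post",    "title"   => "Hello" },
  { "_id" => "post-2", "type" => "post",    "title"   => "World" },
  { "_id" => "c1",     "type" => "comment", "post_id" => "post-1" },
  { "_id" => "c2",     "type" => "comment", "post_id" => "post-1" }
]

# Map step: a post emits 0 so that posts without comments still show up,
# and every comment emits 1 under the id of its post.
emitted = []
docs.each do |doc|
  emitted << [doc["_id"], 0]     if doc["type"] == "post"
  emitted << [doc["post_id"], 1] if doc["type"] == "comment"
end

# Reduce step, grouped by key: sum the emitted values for each post id.
counts = emitted
  .group_by { |key, _| key }
  .map { |key, rows| [key, rows.sum { |_, value| value }] }
  .sort
  .to_h

# Each reduced row is wrapped in a tiny model instead of a full Post object.
PostWithCommentCount = Struct.new(:post_id, :comment_count)
models = counts.map { |post_id, count| PostWithCommentCount.new(post_id, count) }
```

The model here is defined by the shape of the view's result rather than by a document type, which is the shift in thinking the comment describes.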
April 4th, 2009 at 15:37
@alex i twittered you about a question in this (http://blip.tv/file/1949416) talk.
disclaimer: i have no idea of how mongodb “nested documents” work. but i suppose my point was: with that feature you could avoid having to “join” sibling documents when you want to store e.g. a blog post in 1 doc + comments in their own docs.
i guess the advantage with nested comments is you treat them as if they were inline json data, but with less chance of conflicts and more performant because there are still different docs.
now i should go read a bit more about the subject
April 4th, 2009 at 17:04
well I don’t understand the advantage nested documents would provide but I’ll look into it and report back. Or whoever is faster can add the new wisdom here. I’d be very interested.
April 7th, 2009 at 01:41
[...] Ghosh are also engaged in topics of this kind, and here you can see a few words from both of them in this post (very current, by the way) by Alexander [...]
April 7th, 2009 at 08:57
two interesting reads that add to the topic: http://debasishg.blogspot.com/2009/04/framework-inertia-couchdb-and-case-of.html (talking about developers being trapped in convenience) and http://www.cmlenz.net/archives/2007/10/couchdb-joins (3 ways to model/retrieve what AR knows as a has_many relationship)
April 10th, 2009 at 14:12
[...] Alexander Lang has written a nice article about why CouchDB is not compatible with ActiveRecord, and why you should not try to coerce CouchDB into some kind of library that mimics ActiveRecord. It really is a very different thing altogether. Read: The case of ActiveRecord vs. CouchDB [...]
April 10th, 2009 at 21:17
I’ll not speak of the specifics of ActiveRecord/Ruby, but I don’t see how CouchDB views aren’t almost exactly what views are in RDBMSes. They filter and aggregate and return a set of ‘documents’ from a set of tables.
April 12th, 2009 at 13:39
well, i don’t know too much about rdbms views but from what i know they are sort of what activerecord simulates with named scopes, i.e. sql queries that you put in the database and that you can run additional queries on. hence, what they do is the same as what a sql query would do. (correct me if i’m wrong).
i’d say the only thing sql and couch views have in common is that they return table data. and here are a few of the differences:
1. couch views are the only way to get data out of couch while sql views are just a shortcut for “normal” queries
2. couch views generate indexes, not tables that you can again query
3. couch views are built using map/reduce functions (with javascript, an imperative language) while sql views are built by a sql query (no idea if sql views are also precomputed?)
4. sql views work on table data. you can join and aggregate the fields of these tables. couch views work on documents (i.e. structures, potentially complex data). you don’t access your data on an attribute/column level but have access to anything you want, e.g. the average number of letters in that array of values
5. not sure about sql views but with couch views you can generate multiple entries in your result set from a single document by issuing multiple emit() calls per document
…
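Point 5 in the list above is probably the easiest one to show in code. The following is a sketch in plain Ruby that simulates a map function emitting one row per tag of a document; it is not the CouchDB API, and the documents, the `tags` field and the `ids_for` lambda are made up for illustration.

```ruby
# Simulation of a map function that emits several rows for a single document.
# Not the CouchDB API; documents and names are hypothetical.
docs = [
  { "_id" => "a", "tags" => ["ruby", "couchdb"] },
  { "_id" => "b", "tags" => ["couchdb"] },
  { "_id" => "c" } # a document without tags emits nothing
]

rows = []
emit = ->(key, value) { rows << [key, value] }

# One emit call per tag, so document "a" ends up with two rows in the index.
map_fn = lambda do |doc|
  (doc["tags"] || []).each { |tag| emit.call(tag, doc["_id"]) }
end

docs.each { |doc| map_fn.call(doc) }
rows.sort!

# Looking up a key returns every document id that emitted a row for that tag.
ids_for = ->(tag) { rows.select { |key, _| key == tag }.map { |_, id| id } }
couchdb_ids = ids_for.call("couchdb")
```

A single document can therefore appear under many keys, or under none, which has no direct equivalent in a row-per-record SQL view.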
May 28th, 2009 at 01:04
well, nesting things in a document is the easiest way to do something, so if what you want to do allows that i’d do that first.