July 30, 2010 by alex
So we’ve been told that all these fancy new NoSQL stores don’t support transactions because that wouldn’t scale, and we’d just have to live with that. So yes, technically, CouchDB doesn’t support transactions, yet it still does. In a way.
What CouchDB doesn’t support is transactions that span multiple read/write operations, i.e. write document a, then write document b, if something goes wrong, roll back both writes. What it does support is single document “transactions”, i.e. a document is either written completely or not. So if our application requires a transaction, all we have to do is make sure that transaction happens in a single document.
Here’s our use case: at Cobot, our coworking space management service, we have a feature where a coworking space can charge a coworker for a one time service, e.g. usage of a meeting room. The way it works is that the manager of a space goes to cobot and enters an amount and description, e.g. “$10” and “meeting room”. At the end of the month, a cron job sends out invoices for all the one-time charges.
Our problem lies with this cron job. The job has to find all the charges for a coworker that haven’t been invoiced yet, create an invoice and mark the charges as invoiced. Without transactions, if something went wrong between creating the invoice and marking the charge as invoiced, the charge could end up being invoiced twice, because it hasn’t been marked as invoiced the first time.
Clearly we need to throw CouchDB away now and go back to a proper™ (a.k.a. relational) database, right? Well, no. As I said earlier we can try to move the transaction into a single document.
In our relational past, we would have had an invoices table and a charges table. The invoices table would carry all the invoices data (have fun with that), and the charges would have a field for amount and description, as well as a boolean field invoiced.
In the world of NoSQL (documents, no transactions), instead we have 2 documents: an invoice document that contains all the invoice data and a charge document with the amount and description again. Oh, and the documents also have _id_s.
Now when we create the invoice, instead of marking the charge as invoiced, we add the charge’s id to an array invoiced_charges in the invoice.
That was pretty easy. Now the tricky part is, next month, to determine which charges have been invoiced already. The first approach could be to make two requests:
Fetch all the charges by using a view that just emits every document that is a charge
function(doc) {
if(doc.type == 'charge') {
emit(doc.id, null);
}
}
Then fetch all the charge ids from the invoices:
function(doc) {
if(doc.invoiced_charges) {
doc.invoiced_charges.forEach(function(id) {
emit(id, null);
});
}
}
Then on the client side we can throw all the charges whose ids are in the list of invoiced charges.
But we can do better: We can use a list function to do the filtering within CouchDB. First we have to combine the above views into one:
function(doc) {
if(doc.type == 'charge') {
emit('charge', null);
}
if(doc.invoiced_charges) {
doc.invoiced_ids.forEach(function(id) {
emit('_invoiced', null)
});
}
}
Now we add the following list function:
function() {
// some helpers
Array.prototype.index = function(val) {
for(var i = 0, l = this.length; i < l; i++) {
if(this[i] == val) return i;
}
return null;
}
Array.prototype.include = function(val) {
return this.index(val) !== null;
}
// interesting stuff
var used_ids = [];
send_json_results(function(row, sender) {
if(row['key'] == '_invoiced') {
used_ids.push(row['id']);
} else {
if(!used_ids.include(row['id'])) {
sender.send_row(row);
};
};
});
// more helpers
// this just makes sending json from a list function easier
function send_json_results(callback) {
send('{"rows": [');
var first_row = true, sender = {};
sender.send_row = function(json) {
if(!first_row) {
send(',');
}
first_row = false;
send(JSON.stringify(json));
};
while(row = getRow()) {
callback(row, sender);
}
send(']}');
};
}
(Disclaimer: this is not exactly the same code as used in cobot and I haven’t tested it again.)
Apart from all the helpers, what this essentially does it put all the invoiced charges ids into a list and check every charge against that list, only sending it down to the client if it hasn’t been invoiced already.
As the view is sorted by the keys (‘_invoiced’, ‘charge’) we can be sure that all the invoiced charge ids have been collected before the list function checks the first charge.
There you have it, transactions in CouchDB.
P.S. If you are writing your stuff in Ruby (and possibly Rails), Couch Potato now has support for creating and querying list functions. See the readme.