CouchDB: The Definitive Guide (couchdb.org)
65 points by pelle on Aug 23, 2010 | 10 comments



Good job putting the book on the web. I bought the PDF version a long time ago, and it was useful in getting me up to speed. Although I use MongoDB for most of my work and not CouchDB, I spend a good deal of play time with CouchDB. Also, CouchDB looks to be hackable at the implementation level: I saw a good article on hacking the Erlang source to customize CouchDB.


What a gorgeous way to put a book on the web!


Do you mind if we quote you on this?


Quote away!


I'm a big fan of this book. It's one of the rare technical books that is so well-written that it's a great read regardless of whether or not you're actually trying to use the technology.


The CouchDB group is hosting CouchCamp September 8-10 in Petaluma, CA (http://www.couch.io/couchcamp). It's quite cheap ($600, including room and meals) and should be a great chance to learn more about CouchDB from the original sources.

I've not signed up for it (yet) myself; I'm concerned about a schedule conflict, but I may go ahead and do it anyway.

Even if you don't plan on using CouchDB directly, it's worth a look just to see how neatly they've implemented RESTful APIs.
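
For a taste of what that looks like from a client, here's a rough sketch against CouchDB's HTTP API using Python's requests library; the host, database name, and document ID are just placeholders for illustration:

    # Rough sketch of talking to CouchDB's HTTP API; everything is a plain
    # RESTful verb on a URL. Host, database name, and doc ID are placeholders.
    import requests

    BASE = "http://localhost:5984"

    # Create a database: PUT /albums
    requests.put(BASE + "/albums")

    # Create a document: PUT /albums/<doc id> with a JSON body
    doc = {"title": "There is Nothing Left to Lose", "artist": "Foo Fighters"}
    resp = requests.put(BASE + "/albums/6e1295ed6c29495e", json=doc)
    print(resp.json())  # e.g. {"ok": True, "id": "...", "rev": "1-..."}

    # Read it back: GET /albums/<doc id>
    print(requests.get(BASE + "/albums/6e1295ed6c29495e").json())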


One thing I wonder about CouchDB is whether writing scales.

Assume you have a cluster of CouchDB nodes. They all replicate to each other, so you can spread reads and writes over the cluster. As you add nodes, read performance goes up. Initial write performance should as well.

But when you write to node N, that data needs to be written to nodes 0 through N-1 as well, so every write goes to every node (through replication). Since writes on CouchDB are synchronous (i.e., only one write at a time), there is a fixed total write bandwidth on each node. And since a write on any node needs to be replicated to all the others, then, assuming the cluster uses identical machines, the total write throughput of the cluster would be the same as the write throughput of a single node, right?
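
To put rough numbers on it (a toy sketch with made-up figures, assuming full replication across identical nodes):

    # Back-of-the-envelope sketch of the argument above (numbers made up):
    # with full replication, every logical write costs one physical write on
    # every node, so adding nodes does not add write capacity.
    def cluster_write_throughput(nodes, physical_writes_per_sec_per_node):
        total_physical = nodes * physical_writes_per_sec_per_node
        replicas_per_logical_write = nodes  # every write lands on every node
        return total_physical / replicas_per_logical_write

    for n in (1, 2, 8):
        print(n, cluster_write_throughput(n, 1000))  # prints 1000.0 for every n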

This probably isn't a problem for most cases, as reads outnumber writes for most web businesses. But it seems like this is an area where CouchDB isn't scalable.

My assessment is based on the perception that writing data that arrives via replication is exactly as expensive as writing data that arrives from the web: replicated data goes through the filters, and replicated data has its views recalculated as well.


Scaling writes requires partitioning, which can be done in CouchDB with either:

http://tilgovi.github.com/couchdb-lounge/

or in a hosted Erlang version:

http://blog.cloudant.com/dynamo-and-couchdb-clusters
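
For what it's worth, the basic idea behind both is the same shape: hash the document ID to pick which shard owns the document, so each node only sees a fraction of the writes. A toy sketch of that idea (not lounge's or Cloudant's actual scheme; node URLs are made up):

    # Toy sketch of hash partitioning (not lounge's or Cloudant's actual
    # scheme): the document ID determines the shard, so writes spread out
    # across nodes instead of every node applying every write.
    import hashlib

    NODES = ["http://node0:5984", "http://node1:5984", "http://node2:5984"]

    def node_for(doc_id):
        h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
        return NODES[h % len(NODES)]

    print(node_for("6e1295ed6c29495e"))  # always the same node for a given ID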


The thing you're missing is that these single writes can be a "bulk" write of a large number of documents. Moreover, the (semantically) single-threaded writing is an artifact of having append-only writes to the DB.
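
To make that concrete, here's a hedged sketch of a bulk write against CouchDB's _bulk_docs endpoint (the database name and documents are made up); many documents go in with a single request:

    # Sketch of a bulk write via CouchDB's _bulk_docs endpoint; the database
    # name and documents are placeholders. One request carries many docs.
    import requests

    docs = [{"artist": "Foo Fighters", "album": "One by One"},
            {"artist": "Queens of the Stone Age", "album": "Rated R"}]

    resp = requests.post("http://localhost:5984/albums/_bulk_docs",
                         json={"docs": docs})
    print(resp.json())  # one {"id": ..., "rev": ...} entry per document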

There are two major wins when you take this approach to a DB:

1) Append-only memory allocation is FAST. If we were talking about writing to RAM, it's as simple as allocating the interval from i to i + (amount of memory needed), where i is the heap pointer (there's a tiny sketch of this after point 2).

2) You don't have to deal with locks! Locks aren't bad per se, but in many cases lock-based data structures lead to pain later on (just look at how much work has gone into Python and Linux over time in this area).
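
Here's the sketch mentioned in point 1: a toy bump-pointer allocator (buffer size and names are made up), just to show why append-only allocation is cheap.

    # Toy bump-pointer allocator, just to make the point concrete:
    # "allocating" the interval from i to i + n is a single pointer bump.
    class BumpAllocator:
        def __init__(self, size):
            self.buf = bytearray(size)
            self.ptr = 0  # this is the "heap pointer" i from point 1

        def alloc(self, n):
            if self.ptr + n > len(self.buf):
                raise MemoryError("out of space")
            start = self.ptr
            self.ptr += n  # the entire cost of the allocation
            return memoryview(self.buf)[start:start + n]

    a = BumpAllocator(1024)
    block = a.alloc(16)  # returns the interval [i, i + 16)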

Also, I believe that when there aren't conflicting versions of the same doc on two CouchDB nodes, replication is essentially limited only by bandwidth.

Take the time to read up on CouchDB. It may not be perfect or appropriate for everything, but it is a very, very nice piece of software. And what's really awesome is that, as a consequence of the append-only semantics, the data-loss bug they recently had was totally recoverable for anyone who hadn't compacted their DBs! (How many DBs can say that about their data-loss bugs?)


They must have just pushed this, because I was looking at it yesterday and it was the old, tired, and (by comparison) broken version of the page.

Has the edition changed? I'm a bit confused. I'm reading the O'Reilly eBook and it seems like what I'm reading is a draft. There are chapters seemingly missing (for instance, they described Map well enough but skipped Reduce and went right into talking about Sofa without ever introducing Sofa).

However, I am reading it cover to cover and it is pretty thorough and not a bad book.

I just wonder which version I have and how it compares to the two on the new webpage.

Maybe we need a standard for book versions: rather than just "editions," something with a smaller increment, like edition/revision.



