Posts

Showing posts from March, 2009

Distributed Databases and Cassandra at PyCon

I'll be leading an open-spaces discussion about distributed database architecture, implementation, and use today at 5:00 PM in the Lambert room. Specifically, we will cover Bigtable, Dynamo, and Cassandra, and how to port a typical relational schema to Cassandra's ColumnFamily model. Yesterday I wrote a little background on why I think Cassandra in particular is compelling.
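
To give a feel for what porting a relational table to a ColumnFamily might look like, here is a minimal sketch in plain Python. It is not Cassandra's API: the `users` table, the `to_column_family` helper, and the nested-dict layout (row key mapping to a set of named columns) are all assumptions made up for illustration, based on the general Bigtable-style model.

```python
# Hypothetical rows from a relational "users" table (illustrative data only).
relational_rows = [
    {"id": 1, "name": "alice", "email": "alice@example.com"},
    {"id": 2, "name": "bob", "email": None},
]


def to_column_family(rows, key_field):
    """Model a ColumnFamily as {row_key: {column_name: value}}.

    This is a plain-dict stand-in, not a real Cassandra client call.
    NULL columns are simply left out, so rows may have different columns.
    """
    column_family = {}
    for row in rows:
        row_key = str(row[key_field])  # the primary key becomes the row key
        column_family[row_key] = {
            name: value
            for name, value in row.items()
            if name != key_field and value is not None
        }
    return column_family


users = to_column_family(relational_rows, "id")
```

The point of the sketch is that each row carries its own set of columns, so a missing value is an absent column rather than a NULL in a fixed schema.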

Why I like the Cassandra distributed database

I need a distributed database. A real distributed database; replication doesn't count, because under a replication-oriented database each node still needs to be able to handle the full write volume, and you can only throw hardware at that for so long. So, I'm working on the Cassandra distributed database. I gave a lightning talk on it at PyCon this morning. Cassandra is written in Java and implements a sort of hybrid between Dynamo and Bigtable. (Both papers are worth reading.) It takes its distribution algorithm from Dynamo and its data model from Bigtable, sort of the best of both worlds. Avinash Lakshman, Cassandra's architect, is one of the authors of the Dynamo paper. There is a video about Cassandra here. The first quarter is about using Cassandra, and the rest is mostly about the internals. Cassandra is very bleeding edge. Facebook runs several Cassandra clusters in production (the largest is 120 machines and 40TB of data), but there are sharp edges th