Databases Social Networks

How Twitter Is Moving To the Cassandra Database

Posted by kdawson
from the big-table-doesn't-capture-the-half-of-it dept.
MyNoSQL has up an interview with Ryan King on how Twitter is transitioning to the Cassandra database. Here's some detailed background on Cassandra, which aims to "bring together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model." Before settling on Cassandra, the Twitter team looked into: "...HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, and probably some others I'm forgetting. ... We're currently moving our largest (and most painful to maintain) table — the statuses table, which contains all tweets and retweets. ... Some side notes here about importing. We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast — it would saturate the backplane of our network. We've switched back to using the Thrift interface for bulk loading (and we still have to throttle it). The whole process takes about a week now. With infinite network bandwidth we could do it in about 7 hours on our current cluster." Relatedly, an anonymous reader notes that the upcoming NoSQL Live conference, which will take place in Boston March 11th, has announced their lineup of speakers and panelists including Ryan King and folks from LinkedIn, StumbleUpon, and Rackspace.
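The throttling mentioned in the interview can be illustrated with a token bucket. This is only a minimal sketch, not Twitter's loader: `TokenBucket`, `bulk_load`, and the `write_batch` callback are hypothetical names, and `write_batch` stands in for whatever client call actually writes rows (the interview says Thrift, but no real Cassandra API is used here). For scale, a week is 168 hours, so the throttled import quoted above runs roughly 24 times slower than the 7-hour figure given for unlimited bandwidth.

```python
import time


class TokenBucket:
    """Token-bucket throttle: permits at most `rate` units per second on average.

    `clock` and `sleep` are injectable so the throttle can be tested
    without real waiting. All names here are illustrative assumptions.
    """

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self, amount):
        """Block until `amount` tokens are available, then consume them."""
        if amount > self.capacity:
            raise ValueError("amount exceeds bucket capacity")
        while True:
            now = self.clock()
            # Refill in proportion to the time elapsed since the last check.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= amount:
                self.tokens -= amount
                return
            # Wait just long enough for the missing tokens to accumulate.
            self.sleep((amount - self.tokens) / self.rate)


def bulk_load(rows, write_batch, bucket, batch_size=100):
    """Send rows in batches, spending one token per row before each write.

    `write_batch` is a hypothetical stand-in for the real write call.
    Returns the number of rows sent.
    """
    batch, sent = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            bucket.acquire(len(batch))
            write_batch(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        bucket.acquire(len(batch))
        write_batch(batch)
        sent += len(batch)
    return sent
```

Lowering `rate` leaves more bandwidth for time-sensitive traffic at the cost of a longer import, which is the trade-off described in the interview.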

  • Re:network issues? (Score:3, Informative)

    by Bill, Shooter of Bul (629286) on Tuesday February 23, 2010 @04:33PM (#31250796) Journal
    Yes and no. They are specifically talking about importing their data into Cassandra, which will be a one-time event and so not worth upgrading the network bandwidth for. They need to throttle it to leave bandwidth for more time-sensitive traffic. The bandwidth to the database in normal use will be much, much less than the import bandwidth.
  • Re:network issues? (Score:4, Informative)

    by ryansking (1752556) on Tuesday February 23, 2010 @04:55PM (#31251186)
    If we're going to have to slow the system down, we'd rather use the standard interface, because that means the bulk loading doubles as a load test and the tools we build for it can be re-used for normal operations.
  • by ryansking (1752556) on Tuesday February 23, 2010 @05:07PM (#31251402)
    You're right, I failed to mention disaster recovery. It was something we looked at; it's just been a while since we went through the evaluation process, so I've forgotten a few things. We actually liked Cassandra for DR scenarios: the snapshot functionality makes backups relatively straightforward, plus multi-DC support will make operational continuity in the case of losing a whole DC a possibility.
  • by zuperduperman (1206922) on Tuesday February 23, 2010 @06:31PM (#31252544)

    Sure, but I think the whole point is that you'd be smiling even more if they were using one of the modern and trendy dynamic languages, because you'd likely have 2-3 times the amount of hardware to look after. I'm not sure what alternative you would propose that uses less hardware, but there actually aren't many that are better than the JVM these days.
