How Twitter Is Moving To the Cassandra Database

Posted by kdawson on Tuesday February 23, 2010 @02:55PM from the big-table-doesn't-capture-the-half-of-it dept.

MyNoSQL has up an interview with Ryan King on how Twitter is transitioning to the Cassandra database. Here's some detailed background on Cassandra, which aims to "bring together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model." Before settling on Cassandra, the Twitter team looked into: "...HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, and probably some others I'm forgetting. ... We're currently moving our largest (and most painful to maintain) table — the statuses table, which contains all tweets and retweets. ... Some side notes here about importing. We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast — it would saturate the backplane of our network. We've switched back to using the Thrift interface for bulk loading (and we still have to throttle it). The whole process takes about a week now. With infinite network bandwidth we could do it in about 7 hours on our current cluster." Relatedly, an anonymous reader notes that the upcoming NoSQL Live conference, which will take place in Boston March 11th, has announced their lineup of speakers and panelists including Ryan King and folks from LinkedIn, StumbleUpon, and Rackspace.

Re:network issues? (Score:3, Informative)

by Bill, Shooter of Bul ( 629286 ) writes: on Tuesday February 23, 2010 @05:33PM (#31250796) Journal

Yes and no. They are specifically talking about importing their data into Cassandra, which will be a one-time event and so not worth upgrading the network bandwidth for. They need to throttle it so that more time-sensitive traffic can use the bandwidth. The bandwidth to the database in normal use will be much, much less than the import bandwidth.

Re:network issues? (Score:4, Informative)

by ryansking ( 1752556 ) writes: on Tuesday February 23, 2010 @05:55PM (#31251186)

If we're going to have to slow the system down, we'd rather use the standard interface, because that means the bulk loading doubles as a load test and the tools we build for it can be re-used for normal operations.
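To make the throttling idea in this sub-thread concrete, here is a minimal token-bucket sketch in Python. It is not Twitter's loader and does not use any real Cassandra or Thrift API: `TokenBucket`, `bulk_load`, and the `send` callable are hypothetical names standing in for whatever client call actually writes a batch. The only assumption is that the import can be split into batches of bytes and that the goal is to cap the average bytes per second.

```python
import time


class TokenBucket:
    """Hypothetical throttle: allows at most `rate` bytes per second on average."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)          # tokens (bytes) added per second
        self.capacity = float(capacity)  # maximum burst size in bytes
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable so the logic can be tested
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self, nbytes):
        """Block until `nbytes` tokens are available, then spend them."""
        if nbytes > self.capacity:
            raise ValueError("batch larger than bucket capacity")
        while True:
            self._refill()
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            # Sleep roughly long enough for the missing tokens to accumulate.
            self.sleep((nbytes - self.tokens) / self.rate)


def bulk_load(batches, send, bucket):
    """Pass each batch (a bytes object) to `send`, throttled by `bucket`.

    `send` is a placeholder for the real write call (assumption).
    Returns the total number of bytes sent.
    """
    sent = 0
    for batch in batches:
        bucket.acquire(len(batch))
        send(batch)
        sent += len(batch)
    return sent
```

With something like `bulk_load(batches, client_write, TokenBucket(rate=50_000_000, capacity=50_000_000))`, where `client_write` and the 50 MB/s figure are purely illustrative, the import would leave the rest of the link free for time-sensitive traffic, at the cost of a longer total load time.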

Re:Twitter needs scalability experts (Score:2, Informative)

by ryansking ( 1752556 ) writes: on Tuesday February 23, 2010 @06:07PM (#31251402)

You're right, I failed to mention disaster recovery. It was something we looked at; it's just been a while since we went through the evaluation process, so I've forgotten a few things. We actually liked Cassandra for DR scenarios: the snapshot functionality makes backups relatively straightforward, plus multi-DC support will make operational continuity in the case of losing a whole DC a possibility.

Re:Java / JVM Wins Again ... (Score:3, Informative)

by zuperduperman ( 1206922 ) writes: on Tuesday February 23, 2010 @07:31PM (#31252544)

Sure - but I think the whole point is that you'd be smiling even more if they were using one of the modern & trendy dynamic languages because you'd likely have 2 - 3 times the amount of hardware to look after. I'm not sure what alternative you would propose that uses less hardware but there actually aren't many that are better than the JVM these days.
