Databases Social Networks

How Twitter Is Moving To the Cassandra Database

Posted by kdawson
from the big-table-doesn't-capture-the-half-of-it dept.
MyNoSQL has up an interview with Ryan King on how Twitter is transitioning to the Cassandra database. Here's some detailed background on Cassandra, which aims to "bring together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model." Before settling on Cassandra, the Twitter team looked into: "...HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, and probably some others I'm forgetting. ... We're currently moving our largest (and most painful to maintain) table — the statuses table, which contains all tweets and retweets. ... Some side notes here about importing. We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast — it would saturate the backplane of our network. We've switched back to using the Thrift interface for bulk loading (and we still have to throttle it). The whole process takes about a week now. With infinite network bandwidth we could do it in about 7 hours on our current cluster." Relatedly, an anonymous reader notes that the upcoming NoSQL Live conference, which will take place in Boston March 11th, has announced their lineup of speakers and panelists including Ryan King and folks from LinkedIn, StumbleUpon, and Rackspace.
This discussion has been archived. No new comments can be posted.

  • by smellsofbikes (890263) on Tuesday February 23, 2010 @03:07PM (#31248230) Journal
    They keep saying that the Cassandra database is better, but somehow I don't believe them. I can't imagine they know what they're talking about. Maybe in the long-term they'll be proven right but I really don't think they are. I don't know why, though...

    heh heh heh.

    • by Yvan256 (722131)

      Do you have an ex-girlfriend called Cassandra, by any chance?

      • by einhverfr (238914)

        Drink a few beers. Read the Iliad. You'll feel better.

        (AJAX: When second-best is good enough. Or maybe AJAX is almost as good as ACHILLES.)

    • I took an axe to my last Cassandra cluster and feel quite better now.
    • by sconeu (64226)

      Damn... you beat me to it. I was going to say, "Cassandra? I don't believe it!"

    • by mariushm (1022195)

      For some reason my mind went to Cassandra Crossing (http://en.wikipedia.org/wiki/The_Cassandra_Crossing)

  • by maugle (1369813) on Tuesday February 23, 2010 @03:15PM (#31248370)
    I hear Cassandra can even predict when disastrous system failures are going to occur! Unfortunately, for some reason nobody ever believes the warnings.
  • network issues? (Score:5, Insightful)

    by QuietLagoon (813062) on Tuesday February 23, 2010 @03:26PM (#31248592)
    We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast - it would saturate the backplane of our network.

    First time I have ever heard anyone say that a database was too fast. Maybe there are network problems that also need to be addressed.

    • Could someone explain to me why this kind of speed would be a problem? It seems to me that if BinaryMemtable is so incredibly fast that other things become a bottleneck, then you're in a great position. You have something very fast for storing and retrieving data - you just need to get bigger, faster pipes.

      • Re: (Score:3, Insightful)

        by b0bby (201198)

        I know next to nothing about NoSQL, but what they're talking about there seems to be using BinaryMemtable for the one-time move of data. You can see that you wouldn't want to "saturate the backplane of our network" for several days while that completes, so they're using a slower method & throttling it. It will take a week to do the move, but everything else will keep working.

      • Re: (Score:3, Informative)

        Yes and no. They are specifically talking about importing their data into Cassandra, which will be a one-time event and not worth upgrading the network bandwidth for. They need to throttle it to allow more time-sensitive traffic to use the bandwidth. The bandwidth to the database in normal use will be much, much less than the import bandwidth.
        • Ah, that makes sense. For some reason I thought they were talking about general usage. Thanks for clearing that up. (o:

      • by geniusj (140174)

        I haven't checked, but I'd bet that BinaryMemtable uses UDP, which, combined with the high speed, could easily cause significant network saturation.
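
The throttling discussed in this thread can be illustrated with a token-bucket rate limiter. This is only a minimal sketch of the general technique, not Twitter's loader: the names `TokenBucket`, `throttled_send`, and the `send` callback are hypothetical, and nothing here talks to Cassandra or Thrift.

```python
import time


class TokenBucket:
    """Allow about `rate` bytes per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def delay_for(self, nbytes):
        """Reserve `nbytes` and return how many seconds the caller should wait first."""
        now = self.clock()
        # Refill for the time that has passed, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # A negative balance is a debt that is repaid by waiting.
        return -self.tokens / self.rate


def throttled_send(batches, bucket, send, sleep=time.sleep):
    """Send each batch via `send` (a hypothetical callback), pausing to respect the bucket."""
    for batch in batches:
        wait = bucket.delay_for(len(batch))
        if wait > 0:
            sleep(wait)
        send(batch)
```

A lower `rate` leaves more headroom for latency-sensitive traffic at the cost of a longer import, which is the tradeoff the summary describes: about a week throttled versus about 7 hours unthrottled.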

  • I look forward to a brand new twitter that randomly doesn't display expected data and sometimes doesn't take my status updates!

  • by Lunix Nutcase (1092239) on Tuesday February 23, 2010 @03:36PM (#31248842)

    Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it?

    • by BarryJacobsen (526926) on Tuesday February 23, 2010 @03:44PM (#31248990) Homepage

      Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it?

      Because the change prevented them from posting it to twitter.

    • by Gruuk (18480) on Tuesday February 23, 2010 @03:44PM (#31248994)

      Scaling. If something turns out to be robust and fast enough for Twitter, it is definitely of interest to anyone working on significantly large and busy websites.

      • by Lunix Nutcase (1092239) on Tuesday February 23, 2010 @03:56PM (#31249246)

        Yes, because twitter is the epitome of robustness and speed. Oh wait... Just in the 2 months of this year alone they've had something like 4 outages.

        • by kriston (7886) on Tuesday February 23, 2010 @04:53PM (#31250134) Homepage Journal

          No way. Their architecture is about as "best guess" engineering as Facebook. I don't think that's actually what engineering is. "Maybe this one will work?"

          In the meantime, I have not been able to update my avatar image on Twitter, and a TwitPic-like feature is still a faint glimmer in Twitter's amateur eyes. Speaking of missed opportunities, why drive so much traffic to Twitter parasites Bit.ly, TwitPic, TinyURL, Twitition, TwitLonger?

          What in the world are Twitter's engineers actually DOING should be the real question.

          • by haruchai (17472)

            That may not be what actual engineering is but that describes a lot of software "engineering"

        • by e2d2 (115622)

          Which is exactly why developers need to pay attention - So we can avoid these mistakes ourselves.

        • by Xest (935314)

          Well that's actually why I like this news.

          I like to think of Twitter's technology experiments as how not to build and run a high performance web application. Hell, they brought us confirmation that Ruby on Rails wasn't exactly ready for prime time in terms of high performance work, for example.

          We have a lot to thank them for, but you're right, one of those things is not how to run a stable, secure, scalable web site, it is the opposite- how not to. I suspect before long we'll be able to see for ourselves ho

      • Re: (Score:3, Interesting)

        by u38cg (607297)
        Does Twitter really have loads which are more difficult to manage than, say, the BBC, CNN, Google, or Wikipedia? I would have thought serving up a fairly straightforward page, a stylesheet, a background image and the tweets or twits or whatever they're called can't be that difficult compared to, say, Facebook.
        • by roman_mir (125474)

          Twwweeeeter can also probably generate static pages just as well on some large node and then push them to web servers, that just might have worked better for them.

          Do they really need dynamic pages at all or could they live with something that's regenerated every 10 minutes? Just saying.

        • Does Twitter really have loads which are more difficult to manage than, say, the BBC, CNN, Google, or Wikipedia?

          (1) In some measures, probably;
          (2) When Google or Wikipedia makes announcements about technology (whether it's a "change" or not) they use in their backend, that's often a front-page story on Slashdot, too. The BBC and CNN don't, AFAIK, tend to make big public announcements about back-end technology.

          I would have thought serving up a fairly straightforward page, a stylesheet, a background i

    • Re: (Score:3, Insightful)

      I suppose then why would we care if any site made any random change to any part of its infrastructure?

      Twitter is a -very- busy site.

      They are changing their infrastructure to accommodate. Here's what they looked at, here is what they chose. If you are looking for something with equal performance, you don't have to shop around.

    • Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it?

      Because in some areas Twitter is at an extreme of scale, so what they are doing to deal with that extreme of scale (even if it isn't necessarily always the ideal choice) is usually interesting since, if you are looking for things that have been done in production to deal with the kind of scaling they experience, there aren't a lot of other data points to find.

  • who cares what twuufter is running off.

    The more interesting aspect of all of this 'NoSQL' movement is how they believe that if they achieve some speed improvement against some relational databases, how that makes them so much better.

    If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs? Databases are not being replaced by NoSQL in projects that need databases. The projects that may not have ever needed databases may b

    • by codepunk (167897)

      Is there really a huge issue with rdbms speeds? I don't know; perhaps you should pose that question to Google, for instance.

      • Re: (Score:3, Insightful)

        by roman_mir (125474)

        your question is answered in my post: google does not need a database for ACID properties.

        Can you complain much if in one location google gives you results that are very different for the same search query as for the same query in a different location at the same time? Well, if you do complain, you can ask google for your money back.

        • by maraist (68387) *
          Regional data has nothing to do with BigTable or RDBMS. Have you read the white-papers on BigTable? If google leverages any IP isolated network solutions, then it's at the networking/application level ABOVE BigTable.

          BigTable itself leverages map-reduce to cascade the query to potentially thousands of machines, reducing their results back to a SINGLE requesting node.

          Geo-location would pick one of several data-centers which house an isolated effective database. The upper layered code would act identically
    • by AndrewNeo (979708) on Tuesday February 23, 2010 @03:56PM (#31249250) Homepage

      I think their point is not everything needs an RDBMS, whereas before it was the 'go to' method of storing data.

      • by Abcd1234 (188840) on Tuesday February 23, 2010 @04:51PM (#31250098) Homepage

        Or: use the right tool for the job. The only difference is, now alternative tools actually exist.

        • Or: use the right tool for the job. The only difference is, now alternative tools actually exist.

          In point of fact alternative persistence mechanisms to relational databases predate relational databases.

          • by Abcd1234 (188840)

            Yeah, no kidding, it's called a filesystem. But when was the last time you heard announced a mainstream, high-performance, non-relational data store that was intended to be an alternative to an RDBMS (BTW, I'm intentionally discounting OODBMSes, as I think they and RDBMSes are intended to target largely the same application space)? I know I haven't. People simply rolled their own and moved on. But times are changing and that niche is finally being filled (in part because that niche isn't so niche anymor

            • But when was the last time you heard announced a mainstream, high-performance, non-relational data store that was intended to be an alternative to an RDBMS (BTW, I'm intentionally discounting OODBMSes, as I think they and RDBMSes are intended to target largely the same application space)? I know I haven't. People simply rolled their own and moved on. But times are changing and that niche is finally being filled (in part because that niche isn't so niche anymore).

              One of the most recent, well-known major suc

              • by Abcd1234 (188840)

                One of the most recent, well-known major successes before the recent "NoSQL" movement, in terms of a product that sacrificed ACID for performance as an alternative to databases providing ACID guarantees, was MySQL.

                I said nothing about ACID compliance. I specifically mentioned non-relational datastores, and clearly MySQL isn't that. As such, it still forces the developer to work with a relational data model, and one of the main things these so-called "NoSQL" projects do is lift that requirement.

                Aside from

                • I said nothing about ACID compliance. I specifically mentioned non-relational datastores, and clearly MySQL isn't that.

                  Um, the reason MySQL with MyISAM doesn't provide ACID guarantees (particularly, its deficiencies with regard to consistency) is related to the ways in which MySQL with MyISAM fails to implement the relational model. Merely using a dialect of SQL as a query language doesn't make a database relational.

                  Well bully for you having a chance to show off your obscure knowledge of non-relational data

                • by haruchai (17472)

                  Just because you haven't heard of something doesn't make it obscure. Tens of millions of Americans still can't find Iraq on a map - doesn't mean it doesn't exist or isn't "mainstream".

                  And at least one major "mainstream" US news network can't tell Egypt from Iraq.

                  http://mediamatters.org/mmtv/200907270040 [mediamatters.org]

                  Caché / MUMPS is heavily used in Healthcare and Finance.
                  Your life and your financial future may well depend on apps that run on them.

                  Just because something isn't incessantly hyped by egocentric CE

      • Re: (Score:2, Insightful)

        by roman_mir (125474)

        You know, the truth is, most data is still stored in individual files, not in databases. So RDBMSs were always a very niche thing used for projects because they are understood and it's easier to develop for them if you really have massive data requirements.

        Files - that's what many projects even today use, not databases. This is basically what they are going back to - files with whatever window dressing on top - a facade of hashes, it's all key/value pairs. It is, my friends, the old old idea of property

      • I think their point is not everything needs an RDBMS, whereas before it was the 'go to' method of storing data.

        Except, of course, that it never was the "go to" method of storing data. There was no point in history where RDBMS's were anywhere close to the exclusive method of persisting data. Non-relational document-oriented storage has pretty much always dominated in the era in which relational databases existed, whether it was proprietary binary document formats, fairly direct text-based document formats, o

    • by azmodean+1 (1328653) on Tuesday February 23, 2010 @04:08PM (#31249450)

      I think you're missing the point here, the problem with RDBMSs isn't that they are "slow" per se, which implies that they just need some good ol' fashioned optimization. The problem is that there is a cost associated with the data integrity guarantees they make (usually appears in scalability bottlenecks rather than in pure computational inefficiencies), regardless of how good the implementation is, and if you don't need some of those guarantees, you can dispense with them and end up with better performance (again, this typically means better scalability). Additionally, this is the kind of bottleneck that you just can't throw more resources at. Sure you can find the bottleneck and beef up that particular component to do more transactions/second, but at a certain point you've isolated the bottleneck on a world-class server that is doing nothing but that, and it's still a bottleneck. At that point (preferably long before you reach that point) you have to look at transitioning to an infrastructure that makes some kind of tradeoff that allows the removal of the bottleneck, which is what NoSQL does.

      I doubt Twitter wants very many RDBMS-type data coherency guarantees at all. 140-character text strings with a similarly-sized amount of metadata, and no real-time delivery guarantees? Sounds like their database can get pretty inconsistent without messing things up badly. It seems to me they would be well served by using a database that offers just what they want/need in that area and better performance.

      Oh and this:

      Is there really a huge issue with rdbms speeds?

      yes, and what are you smoking that you would even ask this question?

      • by roman_mir (125474)

        As I said, there are projects and then there are projects. Tweater is not the project that requires any real database in the first place, who cares if a commit is transactional there?

        As for your last comment: problems with database performance are all about design. You think NoSQL will not hit the same roadblocks in projects that don't do design right? What are they going to move to when that one fails? NoNoSQL++?

        • by einhverfr (238914)

          I am going to add a few other things here. The first is that "not possible to scale" is not really accurate. I believe there are ways to design structures so that write capacity on an RDBMS can scale upward with the nodes on the network. Of course this only works for some types of applications (the approach I have in mind would work with Twitter, for example). And even with Amazon, you would CERTAINLY want RI on purchases even if you don't care about reviews.

          However, the larger point is that an RDBMS is

    • Speed is latency. (how long it takes)
      Scalability is throughput. (how many concurrent). Or put another way; Speed is the quality, throughput is the width.
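
One way to relate the two terms above is Little's law, a standard queueing result that the poster does not mention: the average number of requests in flight equals throughput multiplied by latency. The sketch below is purely illustrative; the function name and the numbers are assumptions, not measurements from Twitter or Cassandra.

```python
def throughput(concurrency, latency_s):
    """Little's law rearranged: requests/second = requests in flight / seconds per request."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_s


# Illustrative numbers only: latency (speed) stays the same,
# while the number of requests served concurrently grows.
one_node = throughput(concurrency=100, latency_s=0.25)
four_nodes = throughput(concurrency=400, latency_s=0.25)
```

In this toy model, adding nodes raises throughput without making any single request faster, which is the distinction the post draws between speed and scalability.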

      who cares what twuufter is running off.

      Well, developers, and their managers do. They're nothing if not fashion victims.

      RDBMS aren't the be all and end all of scalability (or speed, they perform a shit load of management functions you may or may not need). While attempting to scale conventional rdbms you get into write consistency problem, lookup performance problems unless you specifically desig

    • by Knowbuddy (21314)

      I don't think you understand the niche that NoSQL databases are trying to fill.

      The more interesting aspect of all of this 'NoSQL' movement is how they believe that if they achieve some speed improvement against some relational databases, how that makes them so much better.

      It's not a black and white, panacea-type situation. Relational databases are good at some things, non-relational databases are good at others. Where non-relational databases are better is at solving very specific problems, many of which

      • by roman_mir (125474)

        Freedom of choice, definitely. I had projects just recently where I used property files as a database - inserts, deletes, updates, all in a property file. Easy enough because it is just a hash map. You don't impress me with any of it, it's not in any way new first of all, but it does not replace any RDBMS where RDBMS is needed.

        My entire point is that Twooter never needed an RDBMS in the first place. They should be just fine without any database usage on the front end, and forget about JSON. The problem with

    • by Doomdark (136619)
      Is there really a huge issue with rdbms speeds? Well if there is something there, that's what needs to be looked at. If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.

      What makes you think this has not been done? Sometimes the combination of arrogance and ignorance here is amazing. Very bright minds are working on all kinds of approaches; and of course Oracle (et al) are working on their set of tools to improve them as well.

      In reality it is ALL about different

    • by Eil (82413)

      Just like there is no universal programming language for every type of software, there is no universal database engine for every type of data storage.

    • by maraist (68387) *
      RDBMS's are optimized for READS, not writes. You can produce a 1,000 machine mysql-INNODB cluster that will be faster than memcached and be fully ACID compliant. But you'll only ever have 1 write node. You CAN do sharded masters with interleaved auto-incremented values, but then your foreign keys are totally out the window - as is your ACIDity. Oracle has clustered lock managers, but very quickly is going to max out its scalability - especially if it's limited to a single SAN.
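
The "interleaved auto-incremented values" mentioned above can be sketched as follows. MySQL exposes this through the `auto_increment_increment` and `auto_increment_offset` system variables, although the post does not name them; the Python below only mimics the numbering scheme, and the helper names are hypothetical.

```python
from itertools import count, islice


def interleaved_ids(offset, increment):
    """IDs handed out by one master: offset, offset + increment, offset + 2 * increment, ..."""
    return count(offset, increment)


def first_ids(offset, increment, n):
    """The first n IDs that a given master would generate."""
    return list(islice(interleaved_ids(offset, increment), n))


# Assume three write masters; each gets its own offset and a shared increment,
# so no two masters can ever generate the same primary key.
masters = 3
ids_by_master = {m: first_ids(m, masters, 4) for m in range(1, masters + 1)}
```

The scheme only prevents key collisions. A row written on one master may still reference a row that lives on another, which no single node can check, and that is the loss of foreign keys the post points out.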

      Relatively expensive 15,00
      • by QuoteMstr (55051)

        Thank you for the informative and thought-provoking post. It's certainly refreshing to see discourse on a level above "I hate MySQL, therefore SQL sucks." You make some good points.

        Nevertheless, an RDBMS is still the way to go. You hint at the reason in your last paragraph, actually. The entire NoSQL "movement" is predicated on a confusion of implementation and interface. You describe various problems with the way conventional RDBMSes employ the disk: who said RDBMSes had to use those approaches?

        There's not

        • by TheSunborn (68004)

          The problem is that with this kind of backing storage, you can't implement most of SQL effectively. So you might end up with a 'sql' database where you can't use joins in production due to performance. So you end up with the worst of both worlds. A 'relational' database where you can't use most of the relational operations due to performance issues. And you still have a relational interface, so you can't do the kind of magic optimizations you can do with a simple key/value storage.

          As i see it, the problem is t

        • by maraist (68387) *
          I totally love RDBMS's, don't get me wrong. You can manipulate schema on-the-fly (more or less). Introduce optimizations independently of the source-code. You don't have to think about fringe cases, or data-integrity. But when a project grows to a certain point, you have two decisions: Go to a hyper-expensive RDBMS solution ($100k .. $500k) (for a project that may only be worth $100k), or identify the key bottlenecks and try to re-architect the tables of interest.

          I've often found that DRBD+NFS flat file
    • If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs?

      There is a difference between needing a structured storage mechanism (database) and needing a database that implements the relational model and provides ACID guarantees. Further, many non-relational databases provide specific, weaker forms of ACID guarantees that are better than (say) naive flat file storage would, while providing better scalability in certain appli

    • http://slashdot.org/~roman_mir/comments - I imagine twater storm of moderation points was spent well this time, every single post I had on this issue was above 3 point and now within 1 hour, all comments were moderated down. To me that's just funny - someone does not like the truth.

      I just wonder is it the twater birds or does it have something to do with the nosql ideologists?

  • They should move to Intersystems [intersystems.com] Caché [intersystems.com]. SQL, objects, XML and even MUMPS. It will make SQL and NoSQL fans equally happy. And it's damn fast. Much leaner than Oracle, DB2 or Informix, too. Excellent support. Extremely good. Not cheap, though.
    • by edmicman (830206)

      Not cheap, though.

      That might be part of it....

  • I hear Cassandra is really a trojan. Can anyone verify? I don't want a trojan on my computer.....

    • I hear Cassandra is really a trojan. Can anyone verify? I don't want a trojan on my computer.....

      But, but... what if I gift it you? I swear I'm not Trojan but Greek.

  • by Heretic2 (117767) on Tuesday February 23, 2010 @04:51PM (#31250104)

    I love how ass backwards twitter has always been with learning how to scale their 90s infrastructure up. I remember when they called out the Ruby community because they didn't understand MySQL replication and memcached.

    I guess without a profit model they couldn't use a real RDBMS like Oracle. EFD (Enterprise Flash Drive) support anyone? 11g supports EFD on native SSD block-levels. Write scale? How about 1+ million transactions/sec on a single node Oracle DB using <$100K worth of equipment and licenses? Anyway, I've built HUGE databases for a long time, odds are most of you have interfaced with them. Just because it's free and open-source doesn't make it cheap.

    I love FOSS don't get me wrong, but best-in-class is best-in-class. I only use FOSS when it happens to be best-in-class. I laugh at how none of the requirements included disaster recovery. No single point of failure does not preclude failing at every point simultaneously. EMP bomb at your primary datacenter anyone?

    • by msimm (580077)

      ...real RDBMS like Oracle...

      Holy fuck, the right tool for the right job, please? Oracle does some things for some markets really well but for the rest of us who don't need such a high degree of transactional safety that $90k + two-node RAC price tag might just end up taking your great web 3.0 business through development, maybe early beta before you begin liquidating assets. That's per-processor licensing too on a database that scales vertically well (very well really) but not horizontally well (sharding a

    • by mini me (132455)

      They are seeing about 1/2 million transactions per second with this setup based on the information given, but no word of what their cluster consists of. If it is just a handful of generic PCs, $100,000 for your setup looks pretty expensive.

    • Re: (Score:2, Informative)

      by ryansking (1752556)
      You're right, I failed to mention disaster recovery – it was something we looked at, it's just been a while since we went through the evaluation process, so I've forgotten a few things. We actually liked Cassandra for DR scenarios – the snapshot functionality makes backups relatively straightforward, plus multi-DC support will make operational continuity in the case of losing a whole DC a possibility.
    • by codepunk (167897)

      I love Oracle; it is a fine database. Would I personally buy it? Nope, but as long as it is OPM (Other People's Money) I am perfectly fine with it. Now, say I was designing something like a medical records system: Oracle would be a no-brainer. Missing a couple of tweets here and there, who is really going to care?

    • by lawpoop (604919)

      I only use FOSS when it happens to be best-in-class

      Just curious, what FOSS have/do you use?

    • by Bazouel (105242)

      I am curious what someone with your experience thinks of PostgreSQL? Would you say that it can scale properly as Oracle does?

      This is a genuine question as I am pondering between both for my startup. Even though I have already done my investigations, one more opinion cannot hurt :) Assuming my current DB design holds, it will have about 50 tables, most having less than 10,000 records and some having a few million records (they will be partitioned). The volume of reads will be much higher than writes. Write quer

    • by Eil (82413)

      I laugh at how none of the requirements included disaster recovery. No single point of failure does not preclude failing at every point simultaneously. EMP bomb at your primary datacenter anyone?

      1) They never said they didn't plan for disaster recovery. It's silly to deride them for not discussing the entirety of their backups and disaster recovery efforts when the whole topic of the article was their move to Cassandra as a primary data store.

      2) Disaster recovery looks at realistic threat scenarios. Fire, s

  • It's fascinating how after initially being a posterboy for the post-Java revolution Twitter is gradually moving their architecture to the JVM, piece by piece. I think it's actually a credit to them that they seem to have level heads and are evaluating technology on its merits (whereas if you talk to most of the ruby / python crowd they would rather stick toothpicks in their eyes than endorse a solution that involves java).

    • Re: (Score:3, Funny)

      by codepunk (167897)

      Until recently I thought the same way, I would never endorse a solution that involves java. However, I recently came to the same realization that Sun did when they created it. Java is a fantastic way to oversell gobs of expensive hardware. I am a system administrator so the more hardware it takes to run a solution the better off I am, more machines, more money and better job security. So I have now fully jumped on the java bandwagon, java makes me smile.

      • Re: (Score:3, Informative)

        Sure - but I think the whole point is that you'd be smiling even more if they were using one of the modern & trendy dynamic languages because you'd likely have 2 - 3 times the amount of hardware to look after. I'm not sure what alternative you would propose that uses less hardware but there actually aren't many that are better than the JVM these days.

    • It's fascinating how after initially being a posterboy for the post-Java revolution Twitter is gradually moving their architecture to the JVM piece by piece.

      I think it's fascinating, too -- but probably in a very different way than you do. You seem to think that it is a repudiation of some mythical "post-Java revolution", when in many ways I think it is a validation of exactly the approach that was common to pushing Ruby, Python, and similar languages as more agile alternatives to Java. The appeal of tools n

  • A lot of the complaints from NoSQL seem to be regarding DBMSes being too slow and SQL being too hard. And yet a lot of them invent query languages/query languages similar to SQL. Supposedly Oracle scales up really well. There is a paper that compares mapreduce to parallel databases and Hadoop takes a huge beating from the RDBMSes in performance. Now the funny thing is that Oracle was not included, yet most contend that if you pay enough Oracle scales really well. DB2 also scales, because in 1999 I work
    • by einhverfr (238914)

      But most open sources databases seem to not be able to compete with the likes of the commercial parallel databases. But it seems like an open source parallel database would do a lot to silence many nosql critics. There is still the complaint about needing to define a schema, however if you are not exploring the data and are processing the same data over and over again, it seems like a good idea to define a schema anyway, that way you can better detect files that don't conform.

      I have actually thought it woul

    • by maraist (68387) *
      [complaining that] "SQL being too hard"? Well, one can assume you can ignore this class of amateurs - there's no lack of free learning tools for SQL - and it's dirt simple.

      "And yet a lot of them invent query languages/query languages similar to SQL. " - See, I think you're magically associating two classes of programmers. There are people, like myself that love the expressiveness of SQL over virtually any other language for data-set manipulation. Thus we would like as an optional to utilize SQL on even a
      • by einhverfr (238914)

        "But it seems like an open source parallel database would do a lot to silence many nosql critics" - you're not going to silence people that think of data as simple key-value pairs, or highly specialized full-text-searching (which is related to but independent of RDBMS activity).

        Simple key/value pairs work for some things. Most data cannot be managed reasonably as key/value pairs. And full text searches are entirely orthogonal. That involves searching through text rather than questions of semantic informa

  • Cassandra has the goods for high availability and is optimized for non-financial data.

    That said, I am amazed at how much time, money, and effort has gone into Twitter.

    Now a distributed scalable super duper database will keep track of who is pooping. http://poop.obtoose.com/ [obtoose.com]

