Databases IT

A Tale of Two Databases, Revisited: DynamoDB and MongoDB

Questioning his belief in relational database dogma, new submitter Travis Brown happened to evaluate Amazon's DynamoDB and MongoDB. His situation was the opposite of Jeff Cogswell's: he started off wanting to prefer DynamoDB, but came to the conclusion that the benefits of Amazon managing the database for him didn't outweigh the features Mongo offers. From the article: "DynamoDB technically isn't a database, it's a database service. Amazon is responsible for the availability, durability, performance, configuration, optimization and all other manner of minutia that I didn't want occupying my mind. I've never been a big fan of managing the day-to-day operations of a database, so I liked the idea of taking that task off my plate. ... DynamoDB only allows you to query against the primary key, or the primary key and range. There are ways to periodically index your data using a separate service like CloudSearch, but we are quickly losing the initial simplicity of it being a database service. ... However, it turns out MongoDB isn't quite as difficult as the nerds had me believe, at least not at our scale. MongoDB works as advertised and auto-shards and provides a very simple way to get up and running with replica sets." His weblog entry has a few code snippets illustrating how he came to his conclusions.
This discussion has been archived. No new comments can be posted.

  • Did he compare MongoDB to the correct product then? I'd love to have seen him also include Amazon SimpleDB [amazon.com].

  • by Anonymous Coward on Friday February 22, 2013 @07:07PM (#42986247)

    Nice astroturf. See here for a detailed analysis of why MongoDB is broken by design [hackingdistributed.com].

    • by stephanruby ( 542433 ) on Saturday February 23, 2013 @04:28AM (#42988205)

      Nice astroturf. See here for a detailed analysis of why MongoDB is broken by design [hackingdistributed.com].

      Speaking of which, the same back to you.

      Did you know the author of the article you pointed to is a competitor and recommends his NoSQL Database called HyperDex as a reasonable alternative (although, he doesn't state he's the developer for HyperDex, nor does he state the fact that HyperDex is still in alpha and doesn't work properly)?

      Nicely played Anonymous Coward!

      • by Anonymous Coward

        Actually, his connection to HyperDex is pretty clear on his pages. And if anything it puts him in a position of authority. I don't like Mongo pushing these kinds of blog posts with low information content and I really don't like Mongo maligning other hackers who have technical points. Ad hominem attacks show that Mongo has no technical comeback.

        • Actually, his connection to HyperDex is pretty clear on his pages.

          My mistake, I didn't read everything in the "About Me" box on the right of his blog, but still his connection could have been made clearer. This was a very-very long blog post, which I actually read completely, which he obviously crafted very carefully, and I only found out about his connection to HyperDex in the comments of others.

          And if anything it puts him in a position of authority.

          Why couldn't it be both?

          By being a competitor posting about MongoDB, he's in a position of authority, and he also has a conflict of interest possibly motivating his opinion.

          ...and I really don't like Mongo maligning other hackers who have technical points. Ad hominem attacks show that Mongo has no technical comeback.

          It w

      • The author of that article clearly references his association with HyperDex in the "About Me [palegray.net]" box in the top right area of the page. Following the "more..." link, you find another reference to HyperDex [palegray.net]. He's not exactly hiding his involvement in the project; how are you supposing people would miss it or somehow feel the urge to claim deceit on his part?

        I didn't know about HyperDex before reading the GPP comment, and might play with it this weekend. Maybe you should, too, especially since it's apparently a

  • by Anonymous Coward

    MongoDB is web scale [youtube.com]

    And... /thread.

    • by Anonymous Coward

      This application could easily have been done in mysql and maybe memcached

  • by Anonymous Coward

    The comments on the last story show what a joke this was considered before.

    "No one cares. Stop click-baiting the buzzword Slashdot sub-sites. If we wanted to go to them we would do so voluntarily."

    "Having actually RTFA, it just enforces how poorly most programmers understand relational databases and shouldn't be let near them. It's so consistently wrong it could be just straight trolling (which given it's posted to post-Taco Slashdot, is likely)."

    "I think the author and his team failed the customer in this

  • by Anonymous Coward

    I wanted DynamoDB to work, but concluded that Mongo is a safer and more accessible place for my data.

  • by zmooc ( 33175 ) <zmooc@zmooc.DEGASnet minus painter> on Friday February 22, 2013 @07:26PM (#42986375) Homepage

    "in some strange way my brain had been conditioned to think of modeled data in a relational way"

    The relational model is not much more or less than the mathematically sound way of dealing with sets and relations between their items in ways that enforce and maintain consistency. There is no alternative to that. It's not merely the status quo, as the article states. Even when designing a datamodel for storage in a NoSQL database, the rules of the relational model are best taken into account.

    The only sound reason for deviating from the relational model and its rules is that your (reasonably priced) relational database server has shortcomings, typically related to dealing with large datasets in clusters, situations in which relational database solutions typically don't scale well and a compromise is needed.

    Note that NoSQL has its place and I have encountered and worked on projects in which there was just no alternative, but I wouldn't trust my precious data to any developer that chooses NoSQL over a proper datamodel for arguments other than those mentioned above, because they're bound to be wrong.

    I don't get how anybody educated in computer science fails to understand this.

    All hail Edgar F. Codd!

    • by 21mhz ( 443080 )

      Well said, sir. I started to exclude Slashdot from my regular browsing routine because of all the dumb shit going on here recently (like articles written by incompetents about NoSQL solutions they don't understand the purpose of), but comments like yours make me reconsider.

    • by martin-boundary ( 547041 ) on Saturday February 23, 2013 @02:32AM (#42987955)

      The only sound reason for deviating from the relational model and its rules is that your (reasonably priced) relational database server has shortcomings, typically related to dealing with large datasets in clusters, situations in which relational database solutions typically don't scale well and a compromise is needed.

      That's unfortunately incorrect. The Codd model is not as fundamental as you imply. It is a finite dimensional model, suitable for when your data is naturally representable as a finite number of attributes such as name, address, etc. If there are N attributes, then each record is representable as a point in an N dimensional cartesian product.

      Perhaps the simplest example where that assumption fails is when representing a free text document as a bag of words, which is a standard representation for information retrieval applications (e.g. the Google web index). In this case, the natural data representation is infinite dimensional, i.e. there can be arbitrarily many attributes in a document. In such applications, even defining meaningful schemas as done in RDBMSs is impossible.

      Google would not have amounted to anything had they tried to work with relational models.

      • Perhaps the simplest example where that assumption fails is when representing a free text document as a bag of words, which is a standard representation for information retrieval applications (e.g. the Google web index). In this case, the natural data representation is infinite dimensional, i.e. there can be arbitrarily many attributes in a document. In such applications, even defining meaningful schemas as done in RDBMSs is impossible.

        Document ID, word number, word. There's no reason why you'd need to repres
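The (document ID, word number, word) layout suggested here can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name `doc_words`, the helper names, and the whitespace tokenizer are all hypothetical and are not taken from any comment in the thread.

```python
import sqlite3


def build_word_table(documents):
    """Store each document as (doc_id, word_number, word) rows in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_words ("
        " doc_id INTEGER NOT NULL,"
        " word_number INTEGER NOT NULL,"
        " word TEXT NOT NULL,"
        " PRIMARY KEY (doc_id, word_number))"
    )
    for doc_id, text in documents.items():
        # Naive tokenizer (assumption): split on whitespace and lowercase.
        rows = [(doc_id, n, w.lower()) for n, w in enumerate(text.split())]
        conn.executemany("INSERT INTO doc_words VALUES (?, ?, ?)", rows)
    return conn


def bag_of_words(conn, doc_id):
    """Recover the bag-of-words view (word -> count) for one document."""
    cur = conn.execute(
        "SELECT word, COUNT(*) FROM doc_words WHERE doc_id = ? GROUP BY word",
        (doc_id,),
    )
    return dict(cur.fetchall())


def docs_with_phrase(conn, first, second):
    """Find documents in which `second` immediately follows `first`."""
    cur = conn.execute(
        "SELECT DISTINCT a.doc_id FROM doc_words a"
        " JOIN doc_words b"
        "   ON b.doc_id = a.doc_id AND b.word_number = a.word_number + 1"
        " WHERE a.word = ? AND b.word = ?"
        " ORDER BY a.doc_id",
        (first, second),
    )
    return [row[0] for row in cur.fetchall()]
```

A document with any number of distinct words fits the same three-column schema, and the self-join on `word_number` shows one way adjacent-word searches could work. Whether this scales to a web-sized index is a separate question that the sketch does not address.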

        • Document ID, word number, word. There's no reason why you'd need to represent a single document in a single row. In fact you'd probably not want to, because this structure allows for easier subexpression searching.

          You can represent literally every finite object dataset as a set of triples (object-number,variable-within-object-number,value), and in fact all you need is a single number to encode the pair (object-number,variable-within-object-number), using a Cantor enumeration [wikipedia.org] for example. This is a p
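The Cantor enumeration mentioned above can be written down directly. The sketch below uses the standard Cantor pairing function for non-negative integers; the function names and the `to_keyed_values` helper are hypothetical, chosen only for illustration.

```python
from math import isqrt


def cantor_pair(x, y):
    """Map a pair of non-negative integers to a single non-negative integer."""
    return (x + y) * (x + y + 1) // 2 + y


def cantor_unpair(z):
    """Invert cantor_pair: recover (x, y) from the single integer z."""
    w = (isqrt(8 * z + 1) - 1) // 2  # index of the diagonal containing z
    t = w * (w + 1) // 2             # first value on that diagonal
    y = z - t
    x = w - y
    return x, y


def to_keyed_values(triples):
    """Collapse (object_number, variable_number, value) triples into key -> value."""
    return {cantor_pair(obj, var): value for obj, var, value in triples}
```

Because the pairing is a bijection, each (object-number, variable-number) pair gets a unique key and can be recovered with `cantor_unpair`, so the triple form and the single-key form carry the same information.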

    • by Jonner ( 189691 )

      Unfortunately, the vast majority of people talking about databases don't know the difference between the relational model and SQL. They can't understand that limitations of SQL and SQL databases are not necessarily limitations of the relational model. They don't realize that certain features of SQL databases such as ACID have absolutely nothing to do with the relational model. It's very often unclear whether "NoSQL" is a reaction against SQL, the relational model, or features of many SQL databases such as A

  • by HalAtWork ( 926717 ) on Friday February 22, 2013 @07:46PM (#42986517)
    Mongo not just pawn in game of DB
  • by Anonymous Coward

    "Sure, it looks simple enough, but what about the 50th and 100th time you have to remodel your hierarchical data into different, but still hierarchical data?"

    The structure of data should reflect its semantic relationships, and these are necessarily pretty static (information wouldn't be much use if its meaning constantly changed.) Therefore, I find it hard to believe that the author's data semantics are changing rapidly enough to demand 50-100 remodellings. More plausible, I think, is that a great many of t

    • For clarification, I did not mean the 50th - 100th time you remodel the same data, I meant the 50th - 100th time you have to remodel different data. In retrospect, I should have said the 50th to 100th time you have to write code to TRANSFORM data. I used the wrong word.
    • We'd all like to think that the data model is 100% clear at the beginning of a project and comes from the heavens in perfectly formed 3NF.

      Reality bites. Needs change... the domain expands, human judgement is limited (there are multiple ways to model the same data, at least once you have tuples in your relational language... and we're not perfect), etc.

      I mean, ... you really only need triples so why bother with supporting operations on tuples?

  • He's still using "ironically" incorrectly, despite the fact that I defined it for him in the comments on the last story. He saw and replied to my comment, acknowledging his error.

    http://www.youtube.com/watch?v=WY_amJ0YZrM [youtube.com]

    "Ironically, it was a session at AWS Re:invent that initially scared me away from MongoDB."

    I'm not sure which word he wants, but "ironically" isn't it.

    • The article was written before I received my vocabulary lesson. I vow to never use this word again on slashdot, as I clearly do not understand its meaning.
    • The author's use looked reasonable to me, and in agreement with common usage, so I looked into it a bit more, and it is not so simple.

      Wikipedia's entry on irony includes these statements:

      The American Heritage Dictionary's secondary meaning for irony: "incongruity between what might be expected and what actually occurs". This sense, however, is not synonymous with "incongruous" but merely a definition of dramatic or situational irony.

      Situational irony: This is a relatively modern use of the term [citation ne

      • The author's use looked reasonable to me, and in agreement with common usage, so I looked into it a bit more, and it is not so simple.

        Wikipedia's entry on irony includes these statements:

        The American Heritage Dictionary's secondary meaning for irony: "incongruity between what might be expected and what actually occurs". This sense, however, is not synonymous with "incongruous" but merely a definition of dramatic or situational irony.

        Situational irony: This is a relatively modern use of the term [citation needed], and describes a discrepancy between the expected result and actual results in a certain situation.

        The free online dictionary has this usage note:
        The words ironic, irony, and ironically are sometimes used of events and circumstances that might better be described as simply "coincidental" or "improbable," in that they suggest no particular lessons about human vanity or folly. Thus 78 percent of the Usage Panel rejects the use of ironically in the sentence 'In 1969 Susie moved from Ithaca to California where she met her husband-to-be, who, ironically, also came from upstate New York.' Some Panelists noted that this particular usage might be acceptable if Susie had in fact moved to California in order to find a husband, in which case the story could be taken as exemplifying the folly of supposing that we can know what fate has in store for us. By contrast, 73 percent accepted the sentence 'Ironically, even as the government was fulminating against American policy, American jeans and videocassettes were the hottest items in the stalls of the market', where the incongruity can be seen as an example of human inconsistency.

        The author describes a situation that involves situational irony as defined by Wikipedia: at an event promoting the use of MongoDB, he sees something that dissuades him from using it, at least temporarily. In fact, there is not just a discrepancy between expected and actual results, but an opposition. Furthermore, it does not fall foul of the 'mere coincidence' rule in the usage note. FWIW, I would not avoid using 'ironically' here.

        And Wikipedia is wrong. Irony has to do with the literal intention of words being different from the meaning intended by the person using them. That's it. If the genuine intent is the same as the literal intent, then there is no irony. Even if an eventual outcome seems humorous, incongruous, or unexpected when compared to the literal intent, there is no irony.

        Telling an actor to "break a leg" to wish them well is ironic.
        An actor actually breaking their leg after being told that is not ironic.
        If the spea

        • ... irony ... irony ... fucking horse shit ... ironic ... gaggle of shitwicks ... irony

          What a terrible situation you are in! You know something, but nobody gives a shit. It must be incredibly galling, and I am deeply sympathetic (would you call that irony, given that I am actually amused by your pedantic petulance?)

    • From the Dilbert Newsletter [freerepublic.com]:

      I've also learned recently that "ironic" means anything you want it to mean. Example:

      Me: "I heard that Bob was killed by a meteor."

      Induhvidual: "Wow. That's ironic."

      Me: "Why is it ironic? Was he an astronomer?"

      Induhvidual: "No, it's ironic because, you know, what are the odds?"

      Me: "So anything unlikely is automatically ironic?"

      Induhvidual: "No, it also needs to be bad."

      Me: "This conversation is ironic."

  • by EmperorOfCanada ( 1332175 ) on Friday February 22, 2013 @09:48PM (#42987145)
    I am sufficiently freaked out by Amazon pricing that I just can't use it. I have two simple fears. One is that I screw up with my code and do a bazillion transactions per second that result in either a ridiculous bill or exhausting my budget resulting in my service having to shut down. Secondly I am scared that some kind of DDOS would blow through my life savings. I much prefer the control of having my own dedicated servers with extremely fixed costs. It might not be the most efficient scheme but with the service I use they can throw extra servers on pretty quickly and I can set them up in a flash. So yes I am potentially hosed if Opera or Slashdot feature my work but I sleep like a baby knowing that amazon won't be billing me a house tomorrow. I even tried out their free service and while loving it was deathly afraid of getting billed.

    If my sites really grew I would even contemplate going a step further and running my own physical servers. The joy of being able to reach out and jam USB sticks into them would be pretty good.
    • by MariusBoo ( 883340 ) on Saturday February 23, 2013 @03:23AM (#42988033)
      Actually there is no reason to be freaked out by their pricing. Just buy the number of instances that you need (one for example) and don't set up any auto-scaling. This way if you get slashdotted your instance will just fail as a normal server would and you will incur no charges. Also no service...

      I have worked with amazon aws and with dedicated server providers. Amazon has been much faster and reliable.

      Furthermore, the way to protect your life savings from a potential business failure is not through inefficient procurement practices. Just incorporate; otherwise you will be open to all kinds of risks (most of them unknown to you, and uninsured).
      • But what about bandwidth charges? With my present service if I get slashdotted the server will be overwhelmed but I have no expanding charges with bandwidth. On Amazon more bandwidth = more money, way more bandwidth=way more money.

        So maybe the instance falters but the bandwidth keeps being used.

        And I agree setting up a server on Amazon was at first confusing but once I figured it out quite easy. With my present system the easiest and fastest way to set up a new server is to phone them. But as I say, my
    • If my sites really grew I would even contemplate going a step further and running my own physical servers. The joy of being able to reach out and jam USB sticks into them would be pretty good.

      Except that, it does suck to have to service your own servers 24 hours a day / 7 days a week (and your own backup generators, etc.)

  • apples and oranges (Score:5, Interesting)

    by bennini ( 800479 ) on Saturday February 23, 2013 @03:18AM (#42988029) Homepage
    We are heavy users of MySQL (Percona) and MongoDB at my work. Recently I have been researching DynamoDB because of a specific use-case. A side project I run uses Google App Engine with Datastore (aka bigtable) for persistence.

    Comparing DynamoDB with MongoDB is like comparing apples and oranges. The only thing the two share in common really is the fact that neither supports SQL (and for that reason are called NoSQL databases). Their intended purpose is completely different which is why I found it strange that the author of the original Slashdot story would pit them against each other the way he did.

    If DynamoDB is to be compared against another datastore, the most similar alternative would probably be Google App Engine's Datastore/big table.

    Similarities between DynamoDB and GAE Datastore
    • both use "schema-less" table structures for storing items (i.e. two items in a single table can have different columns)
    • both support relatively simple primary keys (GAE only allows a single column PK, Dynamo allows a pseudo-two-column PK)
    • both encourage only efficient queries (GAE forces it, Dynamo allows full table scans but they are highly discouraged)
    • both support list properties (a column with multiple string values for example)
    • both are hosted "in the cloud" and scale horizontally almost infinitely
    • both are billed based on reads/writes + total stored data (Dynamo has an extra dimension to cost which is throughput)
    • both have very limited support for referential integrity between items (GAE supports "embedded" entities and recently added basic relationships but nothing like many to many)
    • GAE supports transactions across entities within the same group (i.e. on the same server) and recently added support for XA transactions (tx's across entities in different groups/on different servers). Dynamo does not have transactions but it supports some atomic operations on an individual item, like compare-and-set (conditional writes).
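To make the single-item atomic operation in the last bullet concrete, here is a minimal in-memory sketch of a conditional write. It is not DynamoDB's API: the `ItemStore` class, `put_if`, and `ConditionFailed` are hypothetical names, and a real service would evaluate the condition atomically on the server rather than in single-threaded client code.

```python
class ConditionFailed(Exception):
    """Raised when the stored item does not match the expected attribute values."""


class ItemStore:
    """Toy key -> item store illustrating a conditional (compare-and-set style) write."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        item = self._items.get(key)
        return dict(item) if item is not None else None

    def put_if(self, key, new_item, expected=None):
        """Write new_item only if every attribute in `expected` matches the stored item."""
        current = self._items.get(key, {})
        for attr, value in (expected or {}).items():
            if current.get(attr) != value:
                raise ConditionFailed(f"{attr!r} is {current.get(attr)!r}, expected {value!r}")
        self._items[key] = dict(new_item)
```

A caller reads an item, computes a new version, and writes it back with `expected` set to what it read. If another writer got there first, the write is rejected and the caller can retry, which is how optimistic concurrency can stand in for a transaction on one item.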

    Differences between DynamoDB and GAE Datastore
    One major difference between GAE Datastore and DynamoDB is that GAE supports single and multi property indexes while Dynamo does not support indexes at all aside from a table's primary key. GAE datastore supports efficient queries that use the indexes (if you try to run a query that does not use an index it will fail) along with some basic predicates like equality, inequality, greater than and less than expressions, etc. In DynamoDB, if you want an index, you have to build it yourself in a supplementary table.
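The "build it yourself in a supplementary table" approach can be sketched in plain Python, with two dictionaries standing in for the main table and the index table. The class and method names are hypothetical, and nothing here calls a real DynamoDB client.

```python
class IndexedTable:
    """Main table keyed by primary key, plus a hand-maintained index on one attribute."""

    def __init__(self, indexed_attr):
        self.indexed_attr = indexed_attr
        self.items = {}  # main table: primary key -> item
        self.index = {}  # supplementary table: attribute value -> set of primary keys

    def _unindex(self, key, item):
        value = item.get(self.indexed_attr)
        keys = self.index.get(value)
        if keys is not None:
            keys.discard(key)
            if not keys:
                del self.index[value]

    def put(self, key, item):
        old = self.items.get(key)
        if old is not None:
            self._unindex(key, old)  # drop the stale index entry first
        self.items[key] = dict(item)
        value = item.get(self.indexed_attr)
        if value is not None:
            self.index.setdefault(value, set()).add(key)

    def delete(self, key):
        old = self.items.pop(key, None)
        if old is not None:
            self._unindex(key, old)

    def query_by_attr(self, value):
        """Look up items via the index instead of scanning the whole main table."""
        return [self.items[k] for k in sorted(self.index.get(value, ()))]
```

Every write touches both structures, and in a real deployment those would be two separate table writes with no transaction around them. The application would therefore also need a way to detect or repair an index that has drifted from the main table.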

    GAE Datastore Self-Merge Joins
    GAE datastore also supports what they call "self-merge joins" which are super powerful. I don't know if any other schema-less datastore has this.

    DynamoDB Purpose
    The main reason one would use DynamoDB is when they need scalable throughput; in other words, when your needs for write and/or read speeds fluctuate drastically and when you know you will occasionally spike to extremely high throughput requirements. For times when you expect to have huge throughput for writing, you can pay to scale for that small period of time and then you can reduce your costs by throttling down to a more sane limit. You can run MapReduce jobs over DynamoDB tables using Amazon Elastic Map Reduce. And you can also copy a DynamoDB table into an Amazon Redshift "warehouse"; once the data is copied into Redshift you can run efficient SQL queries over it and Redshift can efficiently do that over petabytes worth of data.

    MongoDB
    MongoDB, on the other hand, is a "schema-less," document oriented database that is good for organizing clumps of information as a single "item" in the datastore. So for example, you can have a single book document which contains nested information about its authors, keywords, reader reviews, and statistics about word usage in the book....all in a single MongoDB "record." This is essentially impossible in DynamoDB (unless you do what the previous article's author did by
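As a rough picture of the single-record book document described above, the sketch below builds one as a plain Python dictionary. All field names and values are made up for illustration. With a driver such as pymongo, a dictionary like this could presumably be passed to a collection's `insert_one` method, but no database is contacted here.

```python
import json

# One "book" document with nested authors, keywords, reviews and word statistics.
book = {
    "title": "An Example Book",
    "authors": [
        {"name": "A. Author", "role": "primary"},
        {"name": "B. Writer", "role": "contributor"},
    ],
    "keywords": ["databases", "nosql"],
    "reviews": [
        {"reader": "reader1", "rating": 4, "text": "Useful overview."},
        {"reader": "reader2", "rating": 5, "text": "Clear and short."},
    ],
    "word_stats": {"total_words": 52000, "top_words": {"data": 310, "index": 120}},
}


def average_rating(doc):
    """Compute the mean review rating from the nested reviews list, or None if empty."""
    ratings = [review["rating"] for review in doc.get("reviews", [])]
    return sum(ratings) / len(ratings) if ratings else None


# The whole nested structure serializes as one JSON value, i.e. one "record".
serialized = json.dumps(book)
```

Everything about the book travels together, so reading it back takes one lookup and no joins. The trade-off is that data shared across books, such as an author's details, is duplicated in each document.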

    • by Anonymous Coward

      Someone who actually knows what they're talking about joins the conversation.
