Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
Oracle Databases Upgrades

Oracle Promises 100x Faster DB Queries With New In-Memory Option 174

Hugh Pickens DOT Com writes "ZDNet reports that Oracle's Larry Elison kicked off Oracle OpenWorld 2013 promising a 100x speed-up querying OTLP database or data warehouse batches by means of a 'dual format' for both row and column in-memory formats for the same data and table. Using Oracle's 'dual-format in-memory database' option, every transaction is recorded in row format simultaneously with writing the same data into a columnar database. 'This is pure in-memory columnar technology,' said Ellison, explaining that means no logging and very little overhead on data changes while the CPU core scans local in-memory columns. Ellison followed up with the introduction of Oracle's new M6-32 'Big Memory Machine,' touted to be the fastest in-memory machine in the world, hosting 32 terabytes of DRAM memory and up to 384 processor cores with 8-threads per core."
This discussion has been archived. No new comments can be posted.

Oracle Promises 100x Faster DB Queries With New In-Memory Option

Comments Filter:
  • by rossdee ( 243626 ) on Monday September 23, 2013 @05:06AM (#44922365)

    Especially upwind, but not 100x

    still Emirates Team NZ only need to win one more race..to take back the Americas cup

    • Anybody know how much this costs?

      (It's Monday morning and I need a good laugh...)

    • Well, they can always cheat [bizjournals.com] to go faster...

    • Oracle CEO promises faster than light drive

      NASA promises to stop using Russians for manned space flights

      Which of these is more believable?

      • by armanox ( 826486 )

        If Larry Elison thought it would be profitable, we'd get FTL drive....

        • I'm pretty sure that isn't profitability that is stopping us from getting FTL drive.

          Maybe it's xenophobia. Last time we did that in the movies, a bunch of pointy eared illegal aliens dropped by.

  • by shri ( 17709 ) <shriramc@@@gmail...com> on Monday September 23, 2013 @05:19AM (#44922405) Homepage

    " hosting 32 terabytes of DRAM memory and up to 384 processor cores with 8-threads per core. "

    Let me be the first to point out the Beowulf possibilities with a few hundred of these clustered together. :)

    • by sjwt ( 161428 )

      Finally a cluster that can Slashdot slashdot.org

      • by ae1294 ( 1547521 )

        Finally a cluster that can Slashdot slashdot.org

        Not running oracle it can't!
        {I never worked with it. They seemed to much like a drug dealer when they came around, they had a building here in SE-VA in Chesapeake}

    • by nmb3000 ( 741169 )

      hosting 32 terabytes of DRAM memory and up to 384 processor cores with 8-threads per core.

      Good news everyone! There's finally a machine that can run a Minecraft server without throwing a java.lang.OutOfMemoryError!

  • by Anonymous Coward on Monday September 23, 2013 @05:24AM (#44922413)

    With increasing surveillance on American citizens such database will provide security forces with instant profile of each person. Let's combine that with license plate scanning, cell phone tracking, sexual preferences and health records.

    Now we can sleep well at night, our children are safe.

  • by VortexCortex ( 1117377 ) <VortexCortex AT ... trograde DOT com> on Monday September 23, 2013 @05:26AM (#44922425)

    "Big Memory Machine"... So, they finally built Deepthought?

    In-memory IO is grand, when that's your're bottleneck. Mine tends to be in the network level, so I use a local daemon for query result caching at the application level as "in-memory" speedup. The speedups are nice, but pricey. Color me unimpressed -- that's pink, BTW; I'm a Caucasoid your colors may vary, but only up to VARCHAR(20);

    Uhg. Is "in memory" now just another buz-word? I guess we've come full circle back to Mainframe? Big memory banks are faster and better for a while, but then the bandwidth goes up and the price, reliability and scalability will favor distributed systems (as currently). I wonder which phase of the cycle quantum computing will favor: distributed / localized? You have to take into consideration your user distribution too...

    So, eventually you'll want a hybrid system where the memory is distributed and cloned at each query-able interface, but still maintaining the entire dataset "in memory"...
    SELECT * FROM earth WHERE answer LIKE "everything";
    ...
    42 rows returned

    • Re: (Score:3, Funny)

      by Anonymous Coward

      If you used ECC memory, the answer wouldn't have been 42.

    • by ae1294 ( 1547521 )

      I wonder if all that RAM has ECC or not... I am betting not....

    • Re: (Score:3, Funny)

      That should be Varchar2(20).
    • The cloud is effectively a reinvention of the mainframe. IBM should sue them.

  • by hebertrich ( 472331 ) on Monday September 23, 2013 @05:30AM (#44922437)

    As long as they keep with the extorsion techniques Oracle is famous for they can keep their hardware.

  • by pla ( 258480 ) on Monday September 23, 2013 @05:38AM (#44922461) Journal
    First, let me say that I would love to have a table option to keep a particularly heavily-hit table always in memory.

    This ain't it.

    From TFA, "Maintaining those indexes is expensive and slows down transaction processing. Let's get rid of them," Ellison remarked. "Let's throw all of those analytic indexes away and replace the indexes with in-memory column sort."

    This merely minimizes the penalties of poor indexing and RBAR by making complete table scans on arbitrary columns faster. Apparently Mr. Ellison has forgotten his algoithmics and combinatorics - Oh, wait, no he didn't, he dropped out as a sophmore. Pity, because had he stayed, he would have learned that even with a 1000x slower storage medium, an O(log N) algorithm (index seek) will eventually beat an O(N log N) algorithm (column sort).

    Thanks, Larry, but you want to make Oracle faster? Remove cursors from the core language, and although that alone won't "fix" it, you'll see all the hacks who can't think in set-based logic drop out overnight.
    • by homb ( 82455 ) on Monday September 23, 2013 @06:27AM (#44922621)

      From TFA, "Maintaining those indexes is expensive and slows down transaction processing. Let's get rid of them," Ellison remarked. "Let's throw all of those analytic indexes away and replace the indexes with in-memory column sort."

      This merely minimizes the penalties of poor indexing and RBAR by making complete table scans on arbitrary columns faster. Apparently Mr. Ellison has forgotten his algoithmics and combinatorics - Oh, wait, no he didn't, he dropped out as a sophmore. Pity, because had he stayed, he would have learned that even with a 1000x slower storage medium, an O(log N) algorithm (index seek) will eventually beat an O(N log N) algorithm (column sort).

      I think you misunderstand the way columnar databases work. They are not doing a column sort the way you think. The column itself is an index.
      Of course the inanities coming out of Ellison's mouth don't help explain things correctly. No Larry, you don't do away with indexes. You mostly store indexes on everything, automatically.

      Thanks, Larry, but you want to make Oracle faster? Remove cursors from the core language, and although that alone won't "fix" it, you'll see all the hacks who can't think in set-based logic drop out overnight.

      Can't argue there!

      • by rev0lt ( 1950662 )

        (...) Oh, wait, no he didn't, he dropped out as a sophmore (...)

        If a guy had to go the university to learn algorithmics (or how the O(n) notation works), I'd be pretty f*****.

        • by Luyseyal ( 3154 )

          No kidding. I was just looking through some vendor PL/SQL and there you have it: a cursor looping through rows to find the most recent date in a column instead of using a dad-blasted ORDER BY clause and limiting rows to 1 result. What the...???

          -l

          • Look you can't trust those database queries. The syntax is so simple you can't possibly be certain that it's not making a mistake without checking every possible row. It's not like there's any way to be sure the database doesn't have invalid data!

            I recently looked at a stored proc used on our system. It uses nested cursors. And the cursors are all defined as "scrollable" meaning they take far more memory than normal, even though the procedure never does anything other than "fetch next". Oh, and the ent

          • Once I saw a system running "SELECT * FROM customers,orders" to find the last customer and order number for a week end sales report, because the only SQL the developer knew was "SELECT * FROM". It was annoying to track down because the server would crash, running out of memory for that join, before it was logged it as a slow query. There was no evidence of what happened unless you caught it while it was running, and the damn thing fired at 4AM on a weekend morning.

    • Re: (Score:3, Insightful)

      by oranGoo ( 961287 )

      From TFA, "Maintaining those indexes is expensive and slows down transaction processing. Let's get rid of them," Ellison remarked. "Let's throw all of those analytic indexes away and replace the indexes with in-memory column sort." This merely minimizes the penalties of poor indexing and RBAR by making complete table scans on arbitrary columns faster. Apparently Mr. Ellison has forgotten his algoithmics and combinatorics - Oh, wait, no he didn't, he dropped out as a sophmore. Pity, because had he stayed, he would have learned that even with a 1000x slower storage medium, an O(log N) algorithm (index seek) will eventually beat an O(N log N) algorithm (column sort).

      RTA - the improvement is there specifically for real time analytic workloads. In these kind of workflows the optimal algorithm is O(n) in general case and indexes are useless (query optimizing engine will always choose scans as you need to visit a lot of data). You might know a thing or two about algorithms, but you should brush up on problem analysis 101.

      Other mistakes in logic: Index seek and column sort are not different algorithms for the same task so comparing them brings little insight (without cons

      • Re: (Score:2, Informative)

        by Anonymous Coward

        This leads you to nonsensical claim that O(log N) will eventually beat or be equal to O(N log N). It is not eventually, the first will be always faster or equal.

        Er... no, not if you think about constant terms. Consider 8 * (log N) + 3 > 2 * ( N log N) for N=4 for example. Big O notation is about the asymptotic performance for large N. It's quite common that the most efficient algorithms for large data sets may provide no benefit for trivial cases and may in fact be slower.

        • by Jorl17 ( 1716772 )
          woosh!
          • by pla ( 258480 ) on Monday September 23, 2013 @09:55AM (#44923971) Journal
            Okay, you'll have to 'splain it too me as well then, because I don't see the joke (and only refrained from posting substantially the same response because an AC beat me to it).

            Memory runs roughly 1000x faster than disk (it can get down to around 50x on an array of SSDs, but up to 100,000x for random seeks across physical platters). Holding all else equal, 1000*O(log N) will take longer than 1*O(N log N) until N=1000, despite the lower time complexity. Additionally, the AC made a good point about the relative constant factor of the algorithms themselves, in that a binary search of a sorted list has virtually no overhead, while a good general purpose sorting algorithm does.


            And all that said, yes, I see now that I made an error because this change applies to columnar rather than row-oriented data; O(log N) will still eventually beat O(N), however.
            • by Jorl17 ( 1716772 )
              My "woosh!" was in agreement with the AC's response! I totally agree with it, and find it somewhat ironical that parent, the one with the mistake, stated that gp should "brush up on problem analysis 101".
  • whats OTLP?

  • by SigmundFloyd ( 994648 ) on Monday September 23, 2013 @06:16AM (#44922587)

    14 hours ago, itnews.com.au runs a story (promptly picked up by /.) about how the social networks are staying with MySQL [itnews.com.au]. In the article, it is suggested that the switch to MariaDB by some Linux distros is a "political move", and that Google's switch might be a retaliation against an unrelated lawsuit from Oracle. Also, it's mentioned (twice, with the same wording) that the Mozilla foundation is "upgrading from MariaDB to MySQL 5.6" (emphasis added).

    7 hours ago, itnews.com.au runs a story (promptly picked up by /.) about how Oracle's 12C database will be 100x faster, despite the fact that we only have Oracle's CEO word for it.

    Now that's what I call an Oracle-friendly site (or two?)

    • MySQL is a piece of shit. MongoDB and PostgreSQL, depending on the data layout (MongoDB if you're looking for an indexed XML-YAML-JSON type of data with reliable data integrity in a cluster; PostgreSQL if you're looking for an indexed CSV with good replication and only minimal probability of basically silent data loss in a fail-over scenario).

      • MongoDB
        reliable
        data integrity
        in a cluster

        You mean if you run 3 copies of same database and constantly compare them with each other so in case of discrepancy you can at least guess which one is correct?

        • by rtaylor ( 70602 )

          MongoDB makes a great caching layer.

          Write your data to Pg into a permanent data-store. Use a trigger to push the data into MongoDB (foreign data wrapper).

          Enjoy the query benefits of Mongo and the reliability of Pg with the caveat that you need to occasionally rebuild your cache (MongoDB copy).

          • Writeable foreign data wrappers for PostgreSQL just appeared in 9.3, released a few weeks ago. I'm looking forward to seeing the hybrid storage approaches people build with that, with triggers pushing into a FDW being just one of the possibilities. All sorts of neat data queuing approaches are possible if you also combine that with the new background workers [postgresql.org] interface. You don't even need to make the FDW write synchronous. When eventual consistency is good enough for your data set, you can have the trig

        • by bluefoxlucid ( 723572 ) on Monday September 23, 2013 @01:22PM (#44926287) Homepage Journal

          No, I mean MongoDB will take a 3 database cluster and let you "Replica Acknowledge" a transaction with "Majority" count. Once it hits 50%+1 servers, it's 100% guaranteed solid unless you lose both servers. If both servers suffer a power drop at that point, the last server refuses to accept writes; when those servers come back, they will replay their oplog back to the last server to synchronize it. There's one flaw here: there's no "Replica Journal Acknowledge", so it's theoretically possible to lose that transaction anyway; both servers have to suffer a system failure (power drop, kernel panic) within 100mS of receiving the operation, since they write out their data to disk every 100mS. In practice this is extremely unlikely.

          That means once you've sent it and gotten back that it's written, it is written. You'd have to lose both (or more--3 servers in a cluster of 5, etc.) servers' power or hard drives (corruption, failure) before the data is propagated further.

          By contrast, Percona and MariaDB have XtraDB. XtraDB does optimistic locking: in normal autocommit, the transaction might get rolled back silently--it will write successfully to one server and return success, but if another server simultaneously gets a write that conflicts and starts propagating it then the transaction will be silently rolled back (i.e. undone, removed, lost, failed). With BEGIN-COMMIT transactions, you may get a Deadlock on "COMMIT" and then you're informed that it did in fact roll back the transaction and you must re-submit (i.e. do this if you actually care about durability of the data). With autocommit, as well as with any transactions (even explicit COMMIT) on MySQL master-slave replication or PostgreSQL WAL replication, you may in fact be informed that the transaction is 100% committed and then have that server FAIL and the slave comes up without that transaction--unavoidable silent data loss.

          The failure mode expressed by MySQL master-slave replication and PostgreSQL WAL replication [postgresql.org] in the default asynchronous streaming replication mode is the same failure mode as with "Journaled" write concern in MongoDB. When running "Journaled" rather than "Replica Acknowledged," you write to exactly one server and are told it's committed when it's written to disk--it's durable on that server, but not necessarily replicated. If that server power drops and comes back up, it may find new operations have made its non-replicated operations invalid; it will then silently roll those back.

          Therefor, in cluster layouts, it is possible for MongoDB to have a negligible reliability advantage over PostgreSQL's most common replication methods. PostgreSQL has settings that make up that last bit of reliability, putting it roughly on par with MongoDB. MongoDB has a guaranteed "It has reached enough servers that it is valid on the cluster unless God hates you" write concern by which the data is likely to actually be there if it tells you it's there, unless a very specific subset of servers experience a catastrophic failure in an extremely small (tenths of a second) window--a subset large enough to take down your entire cluster.

          Short version: MongoDB allows you to, on a per-query basis, write data into the database at any level of reliability that MySQL and PostgreSQL provide. Single-server Journaled (WAL log shipping, WAL asynchronous streaming, MySQL master-slave replication), multi-server Replica Acknowledged (PostgreSQL WAL synchronous streaming), and a single-server "Acknowledged" mode that is faster but gives a weaker data durability guarantee (transaction is valid, not yet to disk, and not replicated).

          *"PostgreSQL streaming replication is asynchronous by default. If the primary server crashes then some transactions that were committed may not have been replicated to the standby server, causing data loss. The amount of data loss is proportional to the replication delay at the time of failover."

          • PostgreSQL does have a synchronous commit mechanism mechanism too, which requires data be durably written to two nodes. It has a feature that as far as I know is unique to its implementation too: synchronous_commit [postgresql.org] is a per-transaction behavior. You have to specifically setup a synchronous standby server, but once it's there you can adjust the durability level you want at each commit. So you can pay for important data be fully synchronous, while streaming by less important things in standard mode, where

            • PostgreSQL has settings that make up that last bit of reliability, putting it roughly on par with MongoDB.

              And...

              Short version: MongoDB allows you to, on a per-query basis, write data into the database at any level of reliability that MySQL and PostgreSQL provide. Single-server Journaled (WAL log shipping, WAL asynchronous streaming, MySQL master-slave replication), multi-server Replica Acknowledged (PostgreSQL WAL synchronous streaming), and a single-server "Acknowledged" mode that is faster but gives a weaker data durability guarantee (transaction is valid, not yet to disk, and not replicated).

              • Look at that, MongoDB write concerns [mongodb.org] have been improved since I last checked. Thanks for usefully pointing to that as a reference, instead of, say, quoting yourself just to be smug.

                • As I was comparatively defending MongoDB, which was facing criticism for being "unreliable" compared to traditional SQL-interface fixed-column row-based relational database software, I felt the need to point at the documentation for PostgreSQL (a top tier product) while explaining the potential for data loss in its replication configurations. Trying to spout a bunch of fluff about how PostgreSQL typically has a window of data loss in replication configurations aside from an expensive synchronous setting w

          • The lucid in your handle is valid; yours is the first explanation on making sure that data has gotten to where it's supposed to that I've been able to follow. (This stuff is so far from anything I know about as to be beyond laughable but I'm curious nonetheless.) Thank you.

            However I may not have followed as well as I think, so a question, if I may: for the "extremely unlikely" case in the first para, would you know the transaction has been lost and be able to re-do it (i.e., rather than finding out later

            • "Extremely unlikely" comes out because the data's been replicated enough that it should be propagating further. Notably, 50%+1 nodes, so 2 nodes in a 3 node cluster, or 3 of 5, or 5 of 7. The data is also flushed to disk every 100mS ("j" write concern on a single node flushes the journal immediately and won't acknowledge until it's on disk). Once the data is to disk, it's 100% guaranteed to be there if the server experiences a recoverable system failure (power drop, kernel panic); it will be lost if it

              • Wow, fantastic, and thank you.

                And yep, I followed that. Your posts give me a much better grounding when reading about datacenter, server, and DB stuff, and help me separate wheat from chaff. Thanks again for sharing your knowledge and having the patience to explain it.

    • Now that's what I call an Oracle-friendly site (or two?)

      Consider it a testament to corporate PR:
      Get journalists to the conference and then pump them full of press releases.

      Brett Winterford travelled to OpenWorld as a guest of Oracle.

    • Or Oracle is hosting a large event which generally produces a lot of news related to the host.

      You will also probably see a good bit of Apple News during Apple World Wide Developer Conference. Probably a good deal of Microsoft news during BUILD and a good deal of iD news during Quake Con. Companies have discovered that hosting their own conferences allows them to flood the news for a day instead of attending group conventions like GDC or CES and having to share the headlines with a 100 other companies.

  • by Rosco P. Coltrane ( 209368 ) on Monday September 23, 2013 @06:21AM (#44922603)

    Just you hope Oracle maintains the batteries properly, especially since an emergency save-to-disk is going to take more than a few minutes...

    • lol, you should see our battery room. We've got a 10 story building and every single computer in it is on the UPS. And that is just there to keep everything up long enough for the two 500hp natural gas generators can kick in. They feed off both the city feed as well as have their own separate tanks. Short of a meteor hitting the building the power will never go out. And we're pretty much small time.

      • Short of a meteor hitting the building the power will never go out

        A small fire in a strategic place... Or perhaps, just some minor failure of something you thought was separated enough.

        • Maybe not do they have mo e than one feed for the grid? though this sounds like a telco exchange set up (central office for you colonials)
      • by dbIII ( 701233 )

        two 500hp natural gas generators can kick in.

        Two is definitely a good idea. One workplace I was at had a 20MW generator that was started for testing every month for thirty years without fail and then just could not get going the day it was needed.

      • by ae1294 ( 1547521 )

        I'm so glad you just said that...
        A pox upon your data center! Curses, unknowable horrid darkness will be at your throat, unseen and without form.... Soon... May it consume your data and your soul for my Master waits for the...

    • Just you hope Oracle maintains the batteries properly, especially since an emergency save-to-disk is going to take more than a few minutes...

      Obviously, if you are in the market for a machine with 32TB of RAM, your problem is probably important and/or expensive enough to not skimp on (relatively) cheap details like backup power; but, so long as you were willing to stop what you were doing and dump to disk, 32TB wouldn't be too bad.

      Magnetics can't do random I/O for shit; but even the cheap consumer crap plays surprisingly nicely with linear reads or writes (high areal density will do that for you).

      You would never actually want to shut down

    • by dbIII ( 701233 )
      Here's one supplier of many that can feed a mountain of such machines
      http://cfaspower.com/11MW-20MW_CTG.html [cfaspower.com]
      Note the age of some of the things. This stuff is not rocket science, it's 1950s jet engine science!
    • by larien ( 5608 )
      *facepalm*

      No, you don't use main memory as a write cache, that would be idiotic. You use it for read cache. Any updates will be written to disk (and memory) before acknowledging to the client.

      In any case, it doesn't matter how good your batteries/UPS are if your server or DB software crashes for any reason and you have uncommitted writes in memory, hence why caching writes on a database is a Bad Idea (TM).

  • Now if we just put a few of those in a beowolf cluster...

  • by Skinkie ( 815924 ) on Monday September 23, 2013 @06:53AM (#44922725) Homepage
    Columnstore databases such as MonetDB and their commercial spinoff Vectorwise [wikipedia.org] (now Actian) already showed [tpc.org] this can be achieved with open source and proprietary code.
  • So a 32GB RAM stick costs ~$1000. 32TB of RAM would cost $1,024,000. Given that RAM is about 1/4 of typical system build costs. This system costs about $4M raw. Oracle's markup is easily 1000% so $40M for this beast. And then the yearly licensing cost for Oracle at $2500/core: ~$1M.

    At that cost you could probably develop your own MemCache/BDB cluster.

  • So...in memory database = a fancy new term for a database that's (essentially) loaded onto a giant ram-disc?

    As an added bonus I'm pretty sure that option would save you a mint in licensing fees.

Every nonzero finite dimensional inner product space has an orthonormal basis. It makes sense, when you don't think about it.

Working...