
The 1-Petabyte Barrier Is Crumbling 217

Posted by CmdrTaco
from the so-much-data dept.
CurtMonash writes "I had been a database industry analyst for a decade before I found 1-gigabyte databases to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling. Specifically, we are about to see data warehouses — running on commercial database management systems — that contain over 1 petabyte of actual user data. For example, Greenplum is slated to have two of them within 60 days. Given how close it was a year ago, Teradata may have crossed the 1-petabyte mark by now too. And by the way, Yahoo already has a petabyte+ database running on a home-grown system. Meanwhile, the 100-terabyte mark is almost old hat. Besides the vendors already mentioned above, others with 100+ terabyte databases deployed include Netezza, DATAllegro, Dataupia, and even SAS."

  • by Anonymous Coward on Monday August 25, 2008 @08:42AM (#24735487)

    Since 500GB drives, this has been a reality. A couple of companies started selling petabyte arrays at about the time those drives were established.

  • by Anonymous Coward on Monday August 25, 2008 @08:44AM (#24735515)

    They have many towns now with fewer than 50k people completely photographed, every street in high res. That has to be well over 1 petabyte, though I doubt it's all in one location. It must be distributed?

  • Re:I am confused !! (Score:1, Interesting)

    by n3xg3n (994581) on Monday August 25, 2008 @08:49AM (#24735573)
    1 Library of Congress = 10 Terabytes = ~0.009 Petabytes
  • No big news here.... (Score:5, Interesting)

    by edwardd (127355) on Monday August 25, 2008 @08:49AM (#24735577) Journal

    Take a look at almost any large financial firm. The email retention system alone is much larger than a petabyte, and that's just dealing with the online media, not including what's spooled to tape. Due to deficiencies in RDBMS systems, each of the large firms usually develops its own system for managing the archive on top of the database.

  • Oh, come on. (Score:5, Interesting)

    by seven of five (578993) on Monday August 25, 2008 @08:50AM (#24735583) Homepage
    Call me old-fashioned, but I don't see why anyone but a search engine like Google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like: fill your garage with sh1t, build a bigger garage.
  • by houghi (78078) on Monday August 25, 2008 @08:53AM (#24735607)

    This is intended as a joke, I assume, but it also brings up the fact that it is a different sort of data that is now collected.

    When I look at CRM systems, they used to contain basically the address and perhaps logs from calls they made to the call center. Now whole phone conversations are logged as well as faxes and letters that are scanned, together with images and video that is available.
    Faxes and letters used to have only a reference number and you could look them up in a file cabinet.

    So even though there is not that much more data collected, (things were already available) they are now all put in the database. Where it used to be an entry 'customer was extremely angry and cursed a lot' it now saves the mp3 for all eternity (where legal).

    So yes, the HD space it takes is bigger and thus the amount is bigger, yet it does not automatically mean that the sort of data is bigger, e.g. do we suddenly have shoe size or other data available? Could be, but it could also be that we just have different file formats that we now save in the database.

  • by cjonslashdot (904508) on Monday August 25, 2008 @08:57AM (#24735641)
    I remember encountering a 1+ petabyte database 10 years ago: it was the database to record and analyze particle accelerator experiment data at CERN. And it was built using a commercial object database - not relational. Oh but wait - the relational vendors have told us that OO databases don't scale....

    That was ten years ago.
  • by petes_PoV (912422) on Monday August 25, 2008 @09:09AM (#24735751)
    or more correctly, restore time.

    Any organisation that wishes to be classed in any way professional knows that the value in its databases has to be protected. That requires them to have the means to recover the data if something bad happens. A hot-mirrored copy is simply not good enough (one corruption would get written to both copies).

    As a consequence, the size of commercial databases is limited by the amount of time the organisation is willing to have it unavailable while it is restored, in the case of a disaster, or the time taken to create/update secure, offline, copies.

    Not by intrinsic properties of the database or host architecture.
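The restore-time argument in the comment above comes down to simple arithmetic: size divided by sustained throughput. Here is a minimal sketch of it, assuming decimal units and a purely hypothetical sustained restore rate. The function names and every figure are illustrative assumptions, not measurements from any system mentioned in this thread.

```python
def restore_hours(size_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to stream size_tb terabytes at a sustained rate (decimal units)."""
    size_mb = size_tb * 1_000_000  # 1 TB = 10^6 MB
    return size_mb / throughput_mb_per_s / 3600


def max_size_tb(window_hours: float, throughput_mb_per_s: float) -> float:
    """Largest database (in TB) that can be restored within window_hours."""
    return window_hours * 3600 * throughput_mb_per_s / 1_000_000


if __name__ == "__main__":
    # Hypothetical: a 1 PB (1000 TB) database restored at a sustained 1000 MB/s.
    print(f"1 PB at 1000 MB/s: {restore_hours(1000, 1000):.1f} hours")
    # Hypothetical: the largest database that fits an 8-hour outage window.
    print(f"8-hour window at 1000 MB/s: {max_size_tb(8, 1000):.1f} TB")
```

Under these assumed numbers, the tolerable outage window caps the database size well before any engine limit does. That is the commenter's point.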

  • by Anonymous Coward on Monday August 25, 2008 @09:09AM (#24735753)

    ... DB design and old data that should be purged. Color me unimpressed.

    I'm convinced now that regardless of attempted discrimination, HUMANS are pack-rats. THAT I can deal with, as people can be trained to actually throw shit away. The problem is when lawyers get involved in the matter. Yes, most of the shit we have today in the corporate world we are FORCED to keep due to some insane lawsuit and follow-up "fix-it-forever" law that calls for us to keep a copy of every damn thing that flows electronically for the next 7 - 70 years.

    Could you almost call it corruption? Yes, I can. The similarities between supply and demand feeding the corruption of oil companies can also be seen in data storage markets. Hard drives probably wouldn't be eclipsing 80GB if it were not for laws driving it that way. New personal computers with almost a terabyte of storage, yeah like Grandma is ever gonna fill that up. Give me a break.

  • How is this news? (Score:5, Interesting)

    by Dark$ide (732508) on Monday August 25, 2008 @10:27AM (#24736617) Journal
    We've had petabyte databases on mainframes for a good couple of years. DB2 v9 on zSeries has two new tablespace types that make managing these humungous databases much easier.

    So it may be news for the PC world but it's bordering on ancient history on IBM mainframes.
  • by littlewink (996298) on Monday August 25, 2008 @11:12AM (#24737285)
    You are mistaken. While certainly almost everything (right or wrong) has been said at some time by someone, nobody respectable who knew what they were doing ever claimed that object-oriented databases would not scale.

    In fact OO and similar (CODASYL, network-style, etc.) databases were used and continue to be used very heavily in applications where relational databases do not scale.

  • Re:Oh, come on. (Score:5, Interesting)

    by Alpha830RulZ (939527) on Monday August 25, 2008 @12:02PM (#24737993)

    Data mining is statistically based. The more information that's available to mine, the more accurate the results will be.

    A minor quibble. I do data mining for a living. With most data sets, we end up sampling them down, because more data ramps up processing time faster than it improves accuracy. With most problems, more data doesn't improve accuracy measurably once you've reached a certain critical mass in the dataset. Simplistically, you don't need to flip the coin a billion times to figure out that it comes up heads 50% of the time.

    It's a rare problem that we use more than 100,000 records for. They exist, but they're rare.
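The coin-flip point in the comment above can be put in numbers. The standard error of an estimated proportion shrinks only with the square root of the sample size. Below is a minimal sketch, assuming independent trials and the usual normal approximation for a 95% interval. The function names are illustrative and this is not the commenter's actual workflow.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error (normal approximation, z = 1.96)."""
    return z * standard_error(p, n)


if __name__ == "__main__":
    # A fair coin (p = 0.5) estimated from increasingly large samples.
    for n in (1_000, 100_000, 10_000_000, 1_000_000_000):
        print(f"n={n:>13,}  margin of error ~ {margin_of_error(0.5, n):.6f}")
```

A 100-fold increase in records narrows the margin by only a factor of 10, while processing cost grows at least linearly with the record count. That matches the case for sampling down.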

  • by blahplusplus (757119) on Monday August 25, 2008 @12:58PM (#24738789)

    "they used to contain basically the address and perhaps logs from calls they made to the call center. Now whole phone conversations are logged as well as faxes and letters that are scanned, together with images and video that is available."

    Reminds me of David Brin's The Transparent Society

    http://www.davidbrin.com/tschp1.html [davidbrin.com]

    http://www.amazon.com/Transparent-Society-Technology-Between-Privacy/dp/0738201448/ [amazon.com]

  • by mcrbids (148650) on Monday August 25, 2008 @04:23PM (#24741815) Journal

    On the other hand, if you have the extra space, it invites the usual waste in the form of archive directories for closed-out years, development junk, etc. Spinning round and round, doing nothing.

    Yep. That's exactly it. $200 today buys a 1 TB drive. $200 a few years ago bought a 1 GB drive. As the price has fallen, the value of the HDD has risen relative to its cost. Those archive directories and development junk aren't being deleted because they have value. Sure, it's not enough value to justify keeping them around when a 1 GB drive costs $200, but they are worth keeping around when a 1 TB drive costs that much.

    They aren't "doing nothing" - they just aren't doing enough to be worth keeping until the price drops enough.

    All of this is making the 1 TB drive considerably more valuable than the 1 GB drive, despite their original purchase price parity. This is long-tail economics at work [wired.com]. As the individual bits become worth less and less, the value of the bits in total continues to rise, resulting in a completely new set of capabilities.

    My DVR is an excellent example of this - it's a thorough change in the way that I watch television. Suddenly, it's a family event that we can all share, because when I want to comment, I can just hit pause, and share my thought. Nothing's lost, if needed we can just hit rewind a bit, and suddenly, instead of being annoyed at my daughter for wanting to comment on a point during a televised debate, I'm excited and interested! No more SHUSHSTing at my family, it's now a much more shared experience.

    The price of nonlinear access media has dropped so incredibly that marginal-value bits (like video) are suddenly cheap enough to make it all possible.

  • by TheSunborn (68004) <tillerNO@SPAMdaimi.au.dk> on Monday August 25, 2008 @09:41PM (#24745859)

    Only problem is, where do you find an OO database with a good index and search implementation that doesn't cost so much that, when you ask the company for a price, they don't even want to reply?

  • by cjonslashdot (904508) on Monday August 25, 2008 @10:17PM (#24746197)
    Point well taken. The problem now is the reality that OO database products were decimated by their failure to explain their value to the market. However, there is a little bit of a resurgence. See http://www.service-architecture.com/products/object-oriented_databases.html [service-architecture.com]
