


The 1-Petabyte Barrier Is Crumbling

CurtMonash writes "I had been a database industry analyst for a decade before I found 1-gigabyte databases to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling. Specifically, we are about to see data warehouses — running on commercial database management systems — that contain over 1 petabyte of actual user data. For example, Greenplum is slated to have two of them within 60 days. Given how close it was a year ago, Teradata may have crossed the 1-petabyte mark by now too. And by the way, Yahoo already has a petabyte+ database running on a home-grown system. Meanwhile, the 100-terabyte mark is almost old hat. Besides the vendors already mentioned above, others with 100+ terabyte databases deployed include Netezza, DATAllegro, Dataupia, and even SAS."
This discussion has been archived. No new comments can be posted.

  • by Plantain ( 1207762 ) on Monday August 25, 2008 @08:58AM (#24735651)

    Google Maps' database is far bigger...

    A base of 8 tiles, with each becoming four more smaller tiles, in two modes (map/satellite), and 16 zoom levels.

    Each tile is approx. 30kB.

    (((0.03* (8 * (4^16)))/1024)/1024) == 983.04TB right there.

    My calculator doesn't handle numbers big enough for streetview. O_O
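The estimate above can be reproduced with a short sketch. All inputs are the comment's own assumptions (8 base tiles, each tile splitting into 4 per zoom step, 16 steps, about 30 kB per tile), not published figures, and the helper names are hypothetical. The comment's formula counts only the deepest zoom step in a single mode, so the sketch also sums every step and both modes for comparison.

```python
# Back-of-envelope tile storage estimate.
# Every constant here is an assumption taken from the comment, not real data.
TILE_MB = 0.03      # ~30 kB per tile, expressed in MB as in the comment
BASE_TILES = 8      # tiles at the coarsest level
ZOOM_STEPS = 16     # number of times each tile is split into four
MODES = 2           # map and satellite


def tiles_at(step: int) -> int:
    """Number of tiles after `step` rounds of splitting each tile into four."""
    return BASE_TILES * 4 ** step


def mb_to_tb(mb: float) -> float:
    """Convert MB to TB using the comment's divide-by-1024-twice convention."""
    return mb / 1024 / 1024


# The comment's formula: deepest step only, one mode.
deepest_tb = mb_to_tb(TILE_MB * tiles_at(ZOOM_STEPS))

# All steps from the base level down to the deepest, one mode.
all_steps_tb = mb_to_tb(TILE_MB * sum(tiles_at(s) for s in range(ZOOM_STEPS + 1)))

# All steps, both modes.
both_modes_tb = MODES * all_steps_tb

print(f"deepest step, one mode: {deepest_tb:.2f} TB")
print(f"all steps, one mode:    {all_steps_tb:.2f} TB")
print(f"all steps, both modes:  {both_modes_tb:.2f} TB")
```

Because each step has four times as many tiles as the one before, the deepest step dominates: including the shallower steps adds only about a third to the total.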

  • Science! (Score:5, Informative)

    by edremy ( 36408 ) on Monday August 25, 2008 @09:15AM (#24735791) Journal
    Petabytes are actually pretty common in the sciences. I visited NCAR (National Center for Atmospheric Research [ucar.edu]) in Boulder five years ago and their main database was in the 2PB region even then. I'm sure it's a lot larger today.

    The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope [lsst.org]. These projects aren't all that uncommon.

  • Re:I am confused !! (Score:4, Informative)

    by Anonymous Coward on Monday August 25, 2008 @09:31AM (#24735945)

    1 Petabyte = 1,000 Terabytes
    1 LoC = 10 Terabytes
    100 LoC = 1,000 Terabytes
    100 LoC = 1 Petabyte
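The conversion above can be written as a tiny sketch. The 10 TB per Library of Congress figure is the comment's rule of thumb, not a measured value, and the function names are hypothetical.

```python
# Converting between "Libraries of Congress" (LoC) and petabytes.
# TB_PER_LOC is the comment's rule of thumb, treated here as an assumption.
TB_PER_LOC = 10
TB_PER_PB = 1000  # decimal (SI) prefixes, as used in the comment


def loc_to_pb(loc: float) -> float:
    """Size of `loc` Libraries of Congress, in petabytes."""
    return loc * TB_PER_LOC / TB_PER_PB


def pb_to_loc(pb: float) -> float:
    """Number of Libraries of Congress that fit in `pb` petabytes."""
    return pb * TB_PER_PB / TB_PER_LOC


print(loc_to_pb(100))  # petabytes in 100 LoC
print(pb_to_loc(1))    # LoC in one petabyte
```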

  • by Anonymous Coward on Monday August 25, 2008 @09:37AM (#24736021)
  • by captaindomon ( 870655 ) on Monday August 25, 2008 @09:44AM (#24736091)
    WalMart's data warehouse is already 4 petabytes: http://storefrontbacktalk.com/story/080307walmart.php [storefrontbacktalk.com]
  • by gravis777 ( 123605 ) on Monday August 25, 2008 @10:10AM (#24736395)

    Okay, I know that the article is referring to databases, but the comments seem to have drifted toward disk storage, so I will take the bait and go off topic.

    Petabyte drives would not really be that impractical for people who like to archive stuff. I just filled up a 300 gig drive and a 750 gig drive with just stuff off of the DVR in under a year. While National Geographic HD may be compressed so badly that it barely looks better than SD, and a one hour show is under 2 gig, try archiving something with a higher bandwidth. For example, I recorded the Olympics, and saved the opening and closing ceremonies and all gymnastics events. A single 4 hour day saved is around 40 gig.

    So, let's think media server for HD material. Let's just stick with HDTV for a while. Let's say that I want to archive a Blu-ray disc on a media server. Let's say, for the sake of argument, that the movie takes up all 50 gig of the disc. Ten movies, 500 gig. 100 movies, 5 terabytes. 1000 movies, 50 terabytes.
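The movie arithmetic in this paragraph can be checked with a minimal sketch. The 50 GB per movie figure is the comment's worst-case assumption (a full dual-layer disc), decimal units are assumed throughout, and the names are hypothetical.

```python
# Media-server sizing under the comment's assumption of 50 GB per movie.
GB_PER_MOVIE = 50       # assumption: every movie fills a 50 GB disc
GB_PER_TB = 1000        # decimal units
GB_PER_PB = 1_000_000   # decimal units


def library_tb(movies: int) -> float:
    """Storage needed for `movies` full-disc rips, in terabytes."""
    return movies * GB_PER_MOVIE / GB_PER_TB


def movies_per_pb() -> int:
    """How many full-disc rips fit in one petabyte."""
    return GB_PER_PB // GB_PER_MOVIE


for count in (10, 100, 1000):
    print(f"{count} movies: {library_tb(count)} TB")
print(f"movies per petabyte: {movies_per_pb()}")
```

Under these assumptions a petabyte holds twenty times the 1000-movie library, which is why the comment goes on to look at higher-resolution sources.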

    Now let's say that we are an IMAX theater, and upgrading to the new IMAX Digital standard. I read not too long ago that an IMAX film is equivalent to 18K (most digital theaters project 2K, although some are now installing 4K systems). So, rather than keeping around the big, massive film prints of the 20-year-old science documentaries that we keep in rotation, we get the digital versions of these. Does anyone want to do the math?

    I am waiting for the day when neural implants can actually read the human brain, and as such, you can archive experiences to some type of storage medium. I am sure Wikipedia has somewhere how much information the human brain processes per second. Now, I am sure we will find a way of compressing stuff. We can already do audio and video, so I am sure one day we will have the ability to compress smell, taste and touch, granted that we actually have a way of capturing these. Still, the amount of data would be massive, and will probably be a whole new avenue for the porn industry.

    Granted, these are extremes, but who would have thought 15 years ago when we first started hitting the 1 gig barrier, that in 2008 we would have discs used for storing movies that have a capacity of 50 gig, and we would even consider saving stuff at a resolution of 1920x1080 and have PCM sound at a bitrate of 4.6Mbps?

    Give us the storage space, and we will find a use for it.

  • Re:I am confused !! (Score:2, Informative)

    by Lachlan Hunt ( 1021263 ) on Monday August 25, 2008 @10:10AM (#24736401) Homepage

    You seem to be trying to calculate in Tebibytes (TiB) and Pebibytes (PiB), which are based on the binary system, rather than Terabytes (TB) and Petabytes (PB), which are base 10.

    Although some operating systems incorrectly use the decimal-based units with binary-based values (i.e. 1TB = 1024GB), that is technically wrong. Hard drive manufacturers actually report correctly using the decimal-based values (i.e. 1TB = 1000GB).

    Also, you still got your maths wrong. 10TiB = ~0.0098PiB.
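The difference between decimal and binary prefixes is easy to check with a minimal sketch. The unit sizes are the standard SI and IEC definitions; the `convert` helper is a hypothetical name used only for illustration.

```python
# Decimal (SI) versus binary (IEC) storage units, all expressed in bytes.
GB, TB, PB = 10**9, 10**12, 10**15      # powers of 1000
GIB, TIB, PIB = 2**30, 2**40, 2**50     # powers of 1024


def convert(value: float, from_unit: int, to_unit: int) -> float:
    """Convert `value` from one unit to another via bytes."""
    return value * from_unit / to_unit


print(f"1 TB   = {convert(1, TB, GB):.0f} GB")
print(f"1 TiB  = {convert(1, TIB, GIB):.0f} GiB")
print(f"1 TB   = {convert(1, TB, TIB):.4f} TiB")
print(f"1 PB   = {convert(1, PB, PIB):.4f} PiB")
print(f"10 TiB = {convert(10, TIB, PIB):.4f} PiB")
```

The gap between the two systems grows with each prefix step, so a mix-up that costs a few percent at the gigabyte level costs noticeably more at the petabyte level.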

  • by cjjjer ( 530715 ) <cjjjerNO@SPAMhotmail.com> on Monday August 25, 2008 @11:20AM (#24737375)
    Seems that Yahoo made this claim months [computerworld.com] ago but for a 2 petabyte database. The article goes on to list a couple of others that have more than 2 petabytes of archived data. So it's safe to say that the petabyte data barrier has been broken for some time.
  • Re:I am confused !! (Score:1, Informative)

    by Anonymous Coward on Monday August 25, 2008 @11:56AM (#24737923)

    Data transmission isn't done in power-of-two unit sizes (packets can be variable size), so they should indeed use base 10 units, and bits, not bytes. 10Mbps, no problem.

    Hard drives are formatted with block sizes that are a power of two (e.g., 512 bytes). Thus it is more useful to see how many of them you would have on a filesystem than some power of ten figure that also conveniently inflates the capacity.

    Imagine RAM being sold in base 10, it would be stupid.

  • LHC data production (Score:4, Informative)

    by SlowMovingTarget ( 550823 ) on Monday August 25, 2008 @12:56PM (#24738759) Homepage

    So when active, the Large Hadron Collider will generate the equivalent volume of data of 50 Libraries of Congress every second.
