The 1-Petabyte Barrier Is Crumbling

Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

The 1-Petabyte Barrier Is Crumbling 217

Posted by CmdrTaco on Monday August 25, 2008 @08:35AM from the so-much-data dept.

CurtMonash writes "I had been a database industry analyst for a decade before I found 1-gigabyte databases to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling. Specifically, we are about to see data warehouses — running on commercial database management systems — that contain over 1 petabyte of actual user data. For example, Greenplum is slated to have two of them within 60 days. Given how close it was a year ago, Teradata may have crossed the 1-petabyte mark by now too. And by the way, Yahoo already has a petabyte+ database running on a home-grown system. Meanwhile, the 100-terabyte mark is almost old hat. Besides the vendors already mentioned above, others with 100+ terabyte databases deployed include Netezza, DATAllegro, Dataupia, and even SAS."

This discussion has been archived. No new comments can be posted.

The 1-Petabyte Barrier Is Crumbling

Load All Comments

Search 217 Comments Log In/Create an Account

Comments Filter:

Porn collection (Score:4, Funny)

by Anonymous Coward writes: on Monday August 25, 2008 @08:39AM (#24735439)

No porn collection jokes please.

Share
twitter facebook
- Re: (Score:3, Insightful)
  
  by aywwts4 ( 610966 ) writes:
  
  No porn collection jokes please.
  +1 Futile
- Re: (Score:2)
  
  by Lisandro ( 799651 ) writes:
  
  I don't know about yours, but my porn collection is no joke.
Won't somebody think of the children.... (Score:2, Funny)

by Anonymous Coward writes:

Oh wait, that was petabyte...
Fixed it for you... (Score:5, Funny)

by hyperz69 ( 1226464 ) writes: on Monday August 25, 2008 @08:41AM (#24735477)

I had been a Porn Collector for a decade before I found 1-gigabyte Porn Collections to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling.

Share
twitter facebook
- Noob (Score:5, Funny)
  
  by SmallFurryCreature ( 593017 ) writes: on Monday August 25, 2008 @09:16AM (#24735795) Journal
  
  My porn collection has long since achieved infinity.
  
  Parent Share
  twitter facebook
  - Re:Noob (Score:5, Funny)
    
    by Gilmoure ( 18428 ) writes: on Monday August 25, 2008 @09:44AM (#24736089) Journal
    
    It has an event horizon and is actively acquiring porn on it's own?
    
    Parent Share
    twitter facebook
    - I've seen porn collections like that... (Score:2)
      
      by GameboyRMH ( 1153867 ) writes:
      
      ...on virus-infested Windows PCs.
    - Re: (Score:2)
      
      by Kjella ( 173770 ) writes:
      
      My porn collection has long since achieved infinity.
      It has an event horizon and is actively acquiring porn on it's own?
      <voice series="Futurama" character="Hermes Conrad">
      That would be a singularity. Since the universe is infinite, you can have an infinitely large porn collection by using an infinitely large volume rather than create a singularity.
      </voice>
      - Re: (Score:2)
        
        by Gilmoure ( 18428 ) writes:
        
        So... the entire universe may just be an infinite porn collection encoded in matter and energy? Damn it! Where's the key?
    - Re: (Score:3, Funny)
      
      by infinite9 ( 319274 ) writes:
      
      ...event horizon...
      Awesome! That's what I'm going to call it now! My "event horizon"!
      "Here it comes baby, the point of no return!"
- - Re: (Score:2)
    
    by Poltras ( 680608 ) writes:
    
    Please. Stop. Just stop.
Petabyte DBs are old news to... (Score:3, Funny)

by C_Kode ( 102755 ) writes: on Monday August 25, 2008 @08:42AM (#24735489) Journal

Petabyte DBs are old news to techie porn collectors. They always mix their two favorite subjects into one. Tech + Porn = Petabyte+ Porn Database

Share
twitter facebook
- Comment removed (Score:5, Interesting)
  
  by account_deleted ( 4530225 ) writes: on Monday August 25, 2008 @08:53AM (#24735607)
  
  Comment removed based on user account deletion
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by Lodragandraoidh ( 639696 ) writes:
    
    You hit the nail on the head. The technology allows for a richer experience for the user -- hence the ability to collect more useful information to make the customer experience better/faster/stronger/etc.
  - Re:Petabyte DBs are old news to... (Score:4, Interesting)
    
    by blahplusplus ( 757119 ) writes: on Monday August 25, 2008 @12:58PM (#24738789)
    
    "they used to contain basically the address and perhaps logs from calls they made to the call center. Now whole phone conversations are logged as well as faxes and letters that are scanned, together with images and video that is available."
    Reminds me of David brin's Transparent society
    http://www.davidbrin.com/tschp1.html [davidbrin.com]
    http://www.amazon.com/Transparent-Society-Technology-Between-Privacy/dp/0738201448/ [amazon.com]
    
    Parent Share
    twitter facebook
  - - Re: (Score:2)
      
      by zippthorne ( 748122 ) writes:
      
      If they had the time to listen to you while on hold.. why would they put you on hold?
      - Re: (Score:3, Insightful)
        
        by QuantumRiff ( 120817 ) writes:
        
        When my unemployment was running out years ago, I took a job at a call center to pay the bills.. When I had to ask a co-worker a question, I often would hit Mute instead of hold after asking them to hold. It was pretty entertaining!
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
    - Re: (Score:2)
      
      by myth24601 ( 893486 ) writes:
      
      Hmmm... you say most places record and store the whole call now? Like, even what I say when on hold?
      Uh-oh.
      I guess let's just hope nobody listens to my recordings, lest they find out how I truly feel about their hold music.
      If you simply play copyrighted material while on hold and they record it can they be sued but the Record Industry?
Oh s***! I'm calling my Congressman! (Score:5, Funny)

by BitterOldGUy ( 1330491 ) writes: on Monday August 25, 2008 @08:43AM (#24735497)

We must protect the children from the petabytes! These petabytes are everywhere trying to have sex with our children!
I have to find my kid. Last time I saw her, she was with her Uncle Micky while he was having his morning martini.

Share
twitter facebook
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re: (Score:2)
    
    by billcopc ( 196330 ) writes:
    
    Hotlinking FAIL!
Google Street View must be most massive db ever? (Score:3, Interesting)

by Anonymous Coward writes: on Monday August 25, 2008 @08:44AM (#24735515)

They have many towns now with less than 50k people completely photographed, every street in high res. That has to be well over 1-petabyte, though I doubt it's all in one location, must be distributed?

Share
twitter facebook
- Re:Google Street View must be most massive db ever (Score:5, Informative)
  
  by Anonymous Coward writes: on Monday August 25, 2008 @09:37AM (#24736021)
  
  http://labs.google.com/papers/bigtable.html [google.com]
  
  Parent Share
  twitter facebook
I am confused !! (Score:5, Funny)

by neonux ( 1000992 ) writes: on Monday August 25, 2008 @08:45AM (#24735523) Homepage

How many Libraries of Congress are necessary to break the 1-petabyte barrier ??

Share
twitter facebook
- - Re: (Score:2, Informative)
    
    by Lachlan Hunt ( 1021263 ) writes:
    
    You seem to be trying to calculate in Tebibytes (TiB) and Pebibytes (PiB), which are based on the binary system, rather than Terabytes (TB) and Petabytes (PB), which are base 10.
    Although some operating systems incorrectly use the decimal-based units with binary-based values (i.e. 1TB = 1024MB), that is technically wrong. Hard drive manufacturers actually report correctly using the decimal-based values (i.e. 1TB = 1000MB).
    Also, you still got your maths wrong. 10TiB = ~0.09PiB.
    - - Re: (Score:2)
        
        by Lachlan Hunt ( 1021263 ) writes:
        
        Yeah, well, like it or not, hard drive manufacturers and data transmission rates use the base 10 SI units.
        
        Re: (Score:2)
        
        by Lachlan Hunt ( 1021263 ) writes:
        
        Hard drives are formatted with block sizes that are a power of two (e.g., 512 bytes). Thus it is more useful to see how many of them you would have on a filesystem than some power of ten figure that also conveniently inflates the capacity.
        The issue being discussed isn't whether they should use base 10 or base 2 values, it's about which SI Prefix names that should be used for reporting the values.
        It is an indisputable fact that hard drive manufacturers do currently use base 10 values and the base 10 prefixes. If you think they should use base 2 values, then fine, you may have a valid point. But you would have take it up with their marketing departments. However, if they did, they would also have to switch to the base 2 prefixes to avoid any c
      - Re: (Score:2)
        
        by novakreo ( 598689 ) writes:
        
        SI is the older, historical standard, dating back to the nineteenth century. And you are using the 'inanely stupid' SI names, but breaking the standard by redefining them for your own purposes. How difficult is it to write TiB instead of TB when you want to be unambiguous?
  - Re: (Score:2)
    
    by suggsjc ( 726146 ) writes:
    
    Isn't the Library of Congress continually growing? If so, doesn't that need to be a dynamic algorithm to adjust for its rate of growth? I couldn't find any documentation or any historical data, but I would think its out there somewhere...then we can start working on this algorithm.
    - Re: (Score:2)
      
      by zippthorne ( 748122 ) writes:
      
      Further, do they mean, the size of the *text* of the books when ascii-encoded, or do they mean images of every page in the books, and all the media encoded appropriately and "losslessly?"
      Even further: RGB filters only? What about reflective inks/bindings, embossed covers? lenticular "hologram" covers?
  - - Re:I am confused !! (Score:4, Informative)
      
      by Anonymous Coward writes: on Monday August 25, 2008 @09:31AM (#24735945)
      
      1 Petabyte = 1,000 Terabytes
      1 LoC = 10 Terabytes
      100 LoC = 1,000 Terabytes
      ======
      100 LoC = 1 Petabyte
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by Lodragandraoidh ( 639696 ) writes:
        
        If you think you're confused now, wrap your head around this:
        1 64 bit address space = 18 Quintillion Bytes = 18 Million Petabytes = 18 Billion Libraries of Congress -- directly addressable by one machine (I wouldn't want the electric bill from that machine).
      - LHC data production (Score:4, Informative)
        
        by SlowMovingTarget ( 550823 ) writes: on Monday August 25, 2008 @12:56PM (#24738759) Homepage
        
        So when active, the Large Hadron Collider will generate the equivalent volume of data of 50 Libraries of Congress every second.
        
        Parent Share
        twitter facebook
      - Re: (Score:2)
        
        by SlowMovingTarget ( 550823 ) writes:
        
        I'm not sure what language you code in, but I've never had a line of code take up 10 terabytes...
        The AC obviously codes in Lisp, all those closing parens really add up!
No big news here.... (Score:5, Interesting)

by edwardd ( 127355 ) writes: on Monday August 25, 2008 @08:49AM (#24735577) Journal

Take a look at almost any large financial firm. The email retention system alone is much larger than a petabyte, and that's just dealing with the online media, not including what's spooled to tape. Due to deficiencies in RDBMS ssytems, each of the large firms usually develop their own systems for managing the archival system on top of the database.

Share
twitter facebook
Oh, come on. (Score:5, Interesting)

by seven of five ( 578993 ) writes: on Monday August 25, 2008 @08:50AM (#24735583)

Call me old fashioned, but I don't see why anyone but a search engine like google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like, fill your garage with sh1t, build a bigger garage.

Share
twitter facebook
- Re:Oh, come on. (Score:5, Insightful)
  
  by poetmatt ( 793785 ) writes: on Monday August 25, 2008 @08:58AM (#24735649) Journal
  
  So the fact that movies have gone from 780mb (dvdrips) to 4.8gb (straight up copies) to 25gig (blu ray) doesn't bear any significance to you?
  Or how about games which have gone from 1mb to installations that are upwards of 10gigs now (warhammer IIRC is 9 something).
  Not to mention MS's fiasco of their Office XML format where things take up a ridiculous amount of space in comparison to open office (10mb docx vs 2.9mb open office)...it's all about the level of tech knowledge of someone that determines their space usage.
  I wouldn't mind 3-4 TB, I'd split it off into about 4 partitions or raid stripe and call it a day for a while.
  However consumer use is indicative of business use, so I would expect things to head towards exabyte eventually.
  
  Parent Share
  twitter facebook
  - Re:Oh, come on. (Score:5, Insightful)
    
    by seven of five ( 578993 ) writes: on Monday August 25, 2008 @09:25AM (#24735869)
    
    However consumer use is indicative of business use, so I would expect things to head towards exabyte eventually.
    
    This is kind of my point. Do companies keep libraries of pr0n, video, music? Sure, if you're a media company you will. But say you're a plumbing distributor. You'll have the usual accounting stuff, and media for marketing, and some BS overhead, but don't tell me it adds up to a TB much less a PB.
    
    On the other hand, if you have the extra space, it invites the usual waste in the form of archive directories for closed-out years, development junk, etc. Spinning round and round, doing nothing.
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by blahplusplus ( 757119 ) writes:
      
      "This is kind of my point. Do companies keep libraries of pr0n, video, music? Sure, if you're a media company you will. But say you're a plumbing distributor. You'll have the usual accounting stuff, and media for marketing, and some BS overhead, but don't tell me it adds up to a TB much less a PB."
      That's true for small companies but places like Digg and any site that gets a lot of comments would very quickly fill up that TB.
    - Re: (Score:2)
      
      by nasor ( 690345 ) writes:
      
      That's exactly what I was thinking. Okay, a hi-def movie is 25 GB - but does some company really have 40k hi-def movies to stored?
    - More long-tail economics! (Score:5, Interesting)
      
      by mcrbids ( 148650 ) writes: on Monday August 25, 2008 @04:23PM (#24741815) Journal
      
      On the other hand, if you have the extra space, it invites the usual waste in the form of archive directories for closed-out years, development junk, etc. Spinning round and round, doing nothing.
      Yep. That's exactly it. $200 today buys a 1 TB drive. $200 a few years ago bought a 1 GB drive. As the price has fallen the value of the HDD has risen relative to its cost. Those archive directories and development junk aren't being deleted because they have value. Sure, it's enough value to justify keeping them around when a 1 GB drive costs $200, but they are worth keeping around with a 1 TB drive costs that much.
      They aren't "doing nothing" - they just aren't doing enough that it's worth keeping it until the price drops enough.
      All of this is making the 1 TB drive considerably more valuable than the 1 GB drive, despite their original purchase price parity. This is long-tail economics at work [wired.com]. As the individual bits become worth less and less, the value in of the bits in total continues to rise, resulting in a completely new set of capabilities.
      My DVR is an excellent example of this - it's a thorough change in the way that I watch television. Suddenly, it's a family event that we can all share, because when I want to comment, I can just hit pause, and share my thought. Nothing's lost, if needed we can just hit rewind a bit, and suddenly, instead of being annoyed at my daughter for wanting to comment on a point during a televised debate, I'm excited and interested! No more SHUSHSTing at my family, it's now a much more shared experience.
      The price of nonlinear access media has dropped so incredibly that marginal-value bits (like video) are suddenly cheap enough to make it all possible.
      
      Parent Share
      twitter facebook
- Re:Oh, come on. (Score:4, Insightful)
  
  by AP31R0N ( 723649 ) writes: on Monday August 25, 2008 @08:59AM (#24735665)
  
  Agreed.
  And i'd also be worried about losing a PB all at once. There are TB drives at my local Best Buy, but that's a lot to lose at once. i'd rather split my files and programs between two or more smaller drives (and have a RAID).
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by tekiegreg ( 674773 ) * writes:
    
    This might be going slightly offtopic but yeah I've noticed that with the increases in data size, an increase in backup awareness and redundancy has been percolating down even to the home users.
    For example, recently I set up a mirrored drive system for my stepdad for his home photos (which are somewhere in the 200GB range as he is semi-professional) just in case one drive goes out. Also I've been looking at a cheap DVD Autoload backup option. Any ideas there from the Slashdot crowd?
    - Re: (Score:2)
      
      by Fweeky ( 41046 ) writes:
      
      Also I've been looking at a cheap DVD Autoload backup option. Any ideas there from the Slashdot crowd?
      Backup 200GB+ of data to DVD's? Are you mad? That's 25-50 disks just for the initial backup, and you probably want twice that to handle discs going bad.
      Get two or three external disks (ESATA ideally; you can run SMART self tests, get better transfer rates, etc). Use a decent incremental backup tool to make versioned snapshots to them, rotating the drives periodically; keep one in storage, and ideally one off-site. Faster, less hassle, more robust and more flexible than a pile-o-DVDs.
- I won't call you old fashioned... (Score:4, Insightful)
  
  by VampireByte ( 447578 ) writes: on Monday August 25, 2008 @09:04AM (#24735699) Homepage
  
  ... but I do wonder if you've ever heard of Sarbanes-Oxley.
  
  Parent Share
  twitter facebook
- Science! (Score:5, Informative)
  
  by edremy ( 36408 ) writes: on Monday August 25, 2008 @09:15AM (#24735791) Journal
  
  Petabytes are actually pretty common in the sciences. I visited NCAR (National Center for Atmospheric Research [ucar.edu]) in Boulder five years ago and their main database was in the 2PB region even then. I'm sure it's a lot larger today
  The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope [lsst.org]. These projects aren't all that uncommon.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by dargaud ( 518470 ) writes:
    
    The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope [lsst.org]. These projects aren't all that uncommon.
    Shit, I'm working on those 2 projects. I'd better ask for a bigger hard drive to management...
  - Re: (Score:2)
    
    by steelfood ( 895457 ) writes:
    
    The LHC will generate several PB of data per year
    I know 1080p60 takes a lot of space, but I'm not sure I want to see that much hardon's colliding...
- Re:Oh, come on. (Score:4, Funny)
  
  by secondhand_Buddah ( 906643 ) writes: <secondhand.buddah@NoSPAm.gmail.com> on Monday August 25, 2008 @09:21AM (#24735839) Homepage Journal
  
  Bill, is that you???
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by garcia ( 6573 ) writes:
  
  You can have only so much useful information about anything.
  If you have the space available and the tools to utilize the stored data, why not? The more data you keep, the more information you will have available when techniques or routines become available to you to utilize this data.
- Re:Oh, come on. (Score:4, Insightful)
  
  by Kjella ( 173770 ) writes: on Monday August 25, 2008 @09:59AM (#24736275) Homepage
  
  Call me old fashioned, but I don't see why anyone but a search engine like google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like, fill your garage with sh1t, build a bigger garage.
  Unfortunately, you gather up a lot of digital stuff fast and most of the time it's not useful. Take for example my business mail, it's full of old presentations and random versions of various documents and whatnot. Is it worth cleaning up? No. Is it worth keeping? Well, from time to time clients start asking about old things and it's very useful to have it. I figure 90% of it could be deleted, only keeping final versions and important mails. Of those 90% will never be asked for again, so I keep 100% for maybe 1%. Make a company with hundreds of thousands of people all like that and you get huge, huge amounts of data. It's still cheaper than to go through those huge, huge amounts of data. That goes double for many automated data collection processes - it's cheaper to keep until it's all guaranteed useless than trying to sort it out.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by abigor ( 540274 ) writes:
  
  a. How on earth would you know? Do you work in a data-intensive industry?
  b. Do you understand what a data warehouse even is?
  c. Data mining is statistically based. The more information that's available to mine, the more accurate the results will be. And by "information", I don't mean some kid's hard drive filled with terrible mp3s and downloaded movies.
  - Re:Oh, come on. (Score:5, Interesting)
    
    by Alpha830RulZ ( 939527 ) writes: on Monday August 25, 2008 @12:02PM (#24737993)
    
    Data mining is statistically based. The more information that's available to mine, the more accurate the results will be.
    A minor quibble. I do data mining for a living. With most data sets, we end up sampling them down, because more data ramps up processing time faster than it improves accuracy. With most problems, more data doesn't improve accuracy measureably, once you've reached a certain critical mass size in the dataset. Simplistically, you don't need to flip the coin a billion times to figure out that it comes up heads 50% of the time.
    It's a rare problem that we use more than 100,000 records for. They exist, but they're rare.
    
    Parent Share
    twitter facebook
- Re: (Score:2)
  
  by MrMarket ( 983874 ) writes:
  
  I'm guessing most of these databases are keeping CYA information, most of which will never be used.
Too Bad Most of that is Due to Poor... (Score:2, Insightful)

by eno2001 ( 527078 ) writes:

... DB design and old data that should be purged. Color me unimpressed.
- Re: (Score:2, Interesting)
  
  by Anonymous Coward writes:
  
  ... DB design and old data that should be purged. Color me unimpressed.
  I'm convinced now that regardless of attempted discrimination, HUMANS are pack-rats. THAT I can deal with, as people can be trained to actually throw shit away. The problem is when lawyers get involved in the matter. Yes, most of the shit we have today in the corporate world we are FORCED to keep due to some insane lawsuit and follow-up "fix-it-forever" law that calls for us to keep a copy of every damn thing that flows electronically for the next 7 - 70 years.
  Could you almost call it corruption? Yes, I
- Effect of the scale (Score:2, Insightful)
  
  by cefek ( 148764 ) writes:
  
  Imagine having tens of millions, or just millions users - all of them with their records, history, targeted ads data. Or some mail provider that stores attachments in a database. Or a file sharing service like those you and I know. That's a plenty of information to manage. Add an overhead, and it's easy to overfill even the biggest database.
  Also I agree with you that bad design might be a concern. Of course there's no big database that couldn't get on a "purge" diet.
  Now seems to me we might have a problem w
OO databases have done this ten years ago (Score:5, Interesting)

by cjonslashdot ( 904508 ) writes: on Monday August 25, 2008 @08:57AM (#24735641)

I remember encountering a 1+ petabyte database 10 years ago: it was the database to record and analyze particle accelerator experiment data at CERN. And it was built using a commercial object database - not relational. Oh but wait - the relational vendors have told us that OO databases don't scale....

That was ten years ago.

Share
twitter facebook
- Re: (Score:2)
  
  by dfetter ( 2035 ) writes:
  
  Storing it is one thing. Querying is a very different thing. What happens when somebody wants to find out something not specifically envisioned in the original experiment?
- Re: (Score:3, Interesting)
  
  by littlewink ( 996298 ) writes:
  
  You are mistaken. While certainly almost everything (right or wrong) has been said at some time by someone, nobody respectable who knew what they were doing ever claimed that object-oriented databases would not scale.
  In fact OO and similar (CODASYL, network-style, etc. ) databases were used and continue to be used very heavily in applications where relational database do not scale.
  - - Re: (Score:3, Interesting)
      
      by TheSunborn ( 68004 ) writes:
      
      Only problem is, where do you find an oo database with a good index and search implementation, that don't cost to much that when you ask the company for a price, they don't even want to reply.
      - Re: (Score:3, Interesting)
        
        by cjonslashdot ( 904508 ) writes:
        
        Point well taken. The problem now is the reality that OO databases database products were decimated by their failure to explain their value to the market. However, there is a little bit of a resurgence. See http://www.service-architecture.com/products/object-oriented_databases.html [service-architecture.com]
Google Maps is way bigger... (Score:3, Informative)

by Plantain ( 1207762 ) writes: on Monday August 25, 2008 @08:58AM (#24735651)

Google Maps' database is far bigger...
A base of 8 tiles, with each becoming four more smaller tiles, in two modes (map/satellite), and 16 zoom levels.
Each tile is approx. 30kB.
(((0.03* (8 * (4^16)))/1024)/1024) == 983.04TB right there.
My calculator doesn't handle numbers big enough for streetview. O_O

Share
twitter facebook
- Re:Google Maps is way bigger... (Score:5, Funny)
  
  by Speare ( 84249 ) writes: on Monday August 25, 2008 @09:02AM (#24735689) Homepage Journal
  
  Google Maps' database is far bigger...
  
  A base of 8 tiles, with each becoming four more smaller tiles, in two modes (map/satellite), and 16 zoom levels.
  We are sorry, but we don't
  have maps at this zoom
  level for this region.
  Try zooming out for a
  broader look.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by TheGreatGraySkwid ( 553871 ) writes:
    
    That's the worst haiku I've ever seen.
  - - Re: (Score:2)
      
      by Speare ( 84249 ) writes:
      
      My point is, two thirds of the surface of the earth is water. Oceans have maybe two or three zoom levels. Given the fractal nature of the data, your estimate of "16 levels" as the global average is waaaaaay off base. I'd be very surprised if all the unique graphics for all modes ends up being more than 1 terabyte.
      - Re: (Score:2)
        
        by imsabbel ( 611519 ) writes:
        
        Then be surprised.
        The landsat data alone comes close to 1TB.
        And that is just the whole world in the broad 30m or so array.
        (I know, because waaay back, i mirrored part of the Nasa WorldWind data)
        This data is in no way fractal in nature.
        And just do the math (just to see that your argument is bogus):
        A km^2 at level 20 has 4^4=256 times as much data as one at level 16.
        If you do the math, central europe alone is enough to push the world to an average of level 16 (germany, e.g., is completely covered in airplane
When the petafile barrier crumbles ... (Score:5, Funny)

by cpu_fusion ( 705735 ) writes: on Monday August 25, 2008 @09:05AM (#24735709)

... we'll need an army of Chris Hansens and a mountain of beartraps. God help us.

Share
twitter facebook
the only *real* barrier is backup time (Score:5, Interesting)

by petes_PoV ( 912422 ) writes: on Monday August 25, 2008 @09:09AM (#24735751)

or more correctly, restore time.
Any organisation that wishes to be classed in any way professional knows that the value in it's databases has to be protected. That requires them to have the means to recover the data if something bad happens. A hot-mirrored copy is simply not good enough (one corruption would get written to both copies).
As a consequence, the size of commercial databases is limited by the amount of time the organisation is willing to have it unavailable while it is restored, in the case of a disaster, or the time taken to create/update secure, offline, copies.
Not by intrinsic properties of the database or host architecture

Share
twitter facebook
- Re: (Score:2)
  
  by TheLink ( 130905 ) writes:
  
  Exactly.
  
  When various Important People are standing behind you making "supportive" noises, while other people are coming by every 5 minutes to ask "Is it fixed yet?", you'll start to realize that restore time is very important, and that disk I/O is pathetic, and tape is overrated.
s/barrier/arbitrary round number/g (Score:5, Insightful)

by ivan256 ( 17499 ) writes: on Monday August 25, 2008 @09:17AM (#24735813)

That is all.

Share
twitter facebook
- Re: (Score:2)
  
  by Spatial ( 1235392 ) writes:
  
  I'm an arbitrary round number, you insensitive clod!
The world will only ever need 5 large databases (Score:5, Funny)

by davidwr ( 791652 ) writes: on Monday August 25, 2008 @09:23AM (#24735857) Homepage Journal

The world will only need 5 large databases.
None of them will never need more than 640KB^H^HMB^H^HGBMB^H^HTB of RAM and 32MB^H^HGB^H^HTB^H^HPB of storage.

Share
twitter facebook
WalMart has a 4 petabyte database already (Score:4, Informative)

by captaindomon ( 870655 ) writes: on Monday August 25, 2008 @09:44AM (#24736091)

WalMart's data warehouse is already 4 petabytes: http://storefrontbacktalk.com/story/080307walmart.php [storefrontbacktalk.com]

Share
twitter facebook
- Re: (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  They only needed one petabyte, but the Chinese cut them a deal on 4.
IBM Boulder (Score:2, Insightful)

by Abattoir ( 16282 ) writes:

Is the location of IBM's Managed Storage Services (MSS) division, which deploys SAN for customers in Boulder (including IBM internal) and other locations (over high speed fibre links) on IBM "Shark" (ESS) and DS6000/DS8000 devices. When I worked at IBM their marketing materials stated they were managing over 4 petabytes of data for enterprise customers out of that location alone - that was four years ago! That doesn't count for other MSS locations either, nor all the other areas where IBM implements large a
- Re: (Score:2)
  
  by serviscope_minor ( 664417 ) writes:
  
  So you want to talk about high levels of storage - IBM has the game covered, considering they invented the HDD.
  Actually, this is about databases rather than disks per se. But that's OK since they invented the relational database, too [wikipedia.org].
I wonder (Score:2)

by DragonTHC ( 208439 ) writes:

How much of that data is marketing information?
seriously, is all of that data current and necessary?
seems to me that they should prune off and backup old data.
- Re: (Score:2)
  
  by Shados ( 741919 ) writes:
  
  When you're doing automated data projections, using previous years of data to try and predict, from trends, the future (so to speak), having 10+ years of data isn't a luxury. And in our field, 10 years of data is often -all- of your data...so well...
Johnny Mnemonic (Score:5, Funny)

by vjmurphy ( 190266 ) writes: on Monday August 25, 2008 @10:05AM (#24736347) Homepage

I need measurements I can understand, like how many Keanu Reeves' brains is a petabyte? And could he hold it indefinitely, or would his head explode at some point? If the latter, can we get him started on it now?

Share
twitter facebook
- Re: (Score:2)
  
  by Lodragandraoidh ( 639696 ) writes:
  
  I believe 1 'Keanu' = 64 Kilobytes, but I would have to check the literature...
- Re: (Score:2)
  
  by Johnny Mnemonic ( 176043 ) writes:
  
  Johnny's brain could hold 80GB, or 160GB if he used a "doubler". So a PB is 12.5 times the capacity of Johnny's brain, undoubled.
  I should know. ;)
  - Re: (Score:2)
    
    by Leebert ( 1694 ) writes:
    
    I should know. ;)
    That's a bummer then, since you're off by a factor of 1000. ;)
How is this news? (Score:5, Interesting)

by Dark$ide ( 732508 ) writes: on Monday August 25, 2008 @10:27AM (#24736617) Journal

We've had petabyte databases on mainframes for a good couple of years. DB2 v9 on zSeries has two new tablespace types that make managing these humungous databases much easier.

So it may be news for the PC world but it's bordering on ancient history on IBM mainframes.

Share
twitter facebook
Re: (Score:2, Informative)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
Pet peeve: misuse of "barrier" (Score:2)

by JoeBuck ( 7947 ) writes:

Round numbers are not "barriers", they are just round numbers. The term "barrier" should only be used when there is something special about the number that creates special engineering challenges to overcome.
Example: the sound barrier. The aerodynamics of a moving airplane are completely different when traveling faster than the speed of sound, than when traveling slower, so it was a real barrier that required engineering effort to overcome.
Another barrier had to do with fabricating electronic component
- Re: (Score:2)
  
  by CurtMonash ( 986884 ) writes:
  
  You have a point.
  But the nice round numbers lead to marketing false alarms, so I think it's noteworthy when hype gives way to reality.
  This also happens to be an area that lends itself to round numbers right now, since 10 terabytes is about the level where Oracle has totally run out of gas, and 100 terabytes used to be the hard limit on Netezza configurations.
  CAM
Greenplum is based on Postgresql (Score:2)

by TheNarrator ( 200498 ) writes:

From the Greenplum article mentioned in the summary:
Most or all of the PostgreSQL data access methods are left intact. The big changes to PostgreSQL lie in the areas of query optimization, planning, and execution. I.e., Greenplum has its own way of breaking up a query into pieces â" and of course of seeing that data gets shipped among nodes â" but the low-level operators for storage and access are from PostgreSQL.
- Re: (Score:2, Flamebait)
  
  by bconway ( 63464 ) writes:
  
  Database, not filesystem. Thanks for almost bothering to read the summary, though.
  - Re: (Score:3, Insightful)
    
    by Beale ( 676138 ) writes:
    
    As soon as you have the capacity, people will fill the capacity. There's always more data to collect.
  - - Re: (Score:2)
      
      by camperdave ( 969942 ) writes:
      
      And a database is a file system.
- I could see practical applications (Score:3, Informative)
  
  by gravis777 ( 123605 ) writes:
  
  Okay, I know that the article is refering to database, but the comments seem to have gone into the way of disc storage, so I will take the bait and go off topic.
  Petabyte drives would not really be that unpractical of an application for people who like to archive stuff. I just filled up a 300 gig drive and a 750 gig drive with just stuff off of the DVR in under a year. While National Geographic HD may be compressed so badly that it barely looks better than HD, and a one hour show is under 2 gig, try archivin
  - Re: (Score:2)
    
    by Kjella ( 173770 ) writes:
    
    Granted, these are extremes, but who would have thought 15 years ago when we first started hitting the 1 gig barrier, that in 2008 we would have discs used for storing movies that have a capacity of 50 gig, and we would even consider saving stuff at a resolution of 1920x1080 and have PCM sound at a bitrate of 4.6Mbps?
    Actually, very many. The infamous Moore's "law" was well underway and everything was growing nice and exponential. Though what the future needs is the bandwidth revolution, it's not "We can sto
- Re: (Score:2)
  
  by Daimanta ( 1140543 ) writes:
  
  "Is it likely to explode once it reaches 1 petabyte?"
  No, but your head will.
- - Re: (Score:2)
    
    by rdebath ( 884132 ) writes:
    
    The same barrier exists at 2TB or 2^32 disk sectors.
    After that MSDOS style partition tables aren't good enough any more.
- Re: (Score:2)
  
  by JPLemme ( 106723 ) writes:
  
  I'm assuming, because back in 1992 I remember reading that MCI had a 1 TB (!) database. It was big enough news to make it into PC Week.
- Re: (Score:2)
  
  by CurtMonash ( 986884 ) writes:
  
  how are you measuring that? Total database size? Raw input data size?
  It's true user data. I make a point of that.
  Following through to the links re Teradata gives a sense of what kind of back and forth that can engender.
  CAM
- Re: (Score:2)
  
  by CurtMonash ( 986884 ) writes:
  
  The post surprisingly does not mention Aster Data Systems [asterdata.com] which is the datawarehouse behind MySpace. When web sites start to store and analyze every single user click then you quickly get into massive amount of data. It's no surprise that the Petabyte barrier is reached especially with the density of storage increasing at constant cost.
  I met with Aster Data last Thursday, and will be writing about them soon. Aster's MySpace installation is a big database. But it's not petabyte-scale yet.
  CAM

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Porn collection (Score:4, Funny)

Re: (Score:3, Insightful)

Re: (Score:2)

Won't somebody think of the children.... (Score:2, Funny)

Fixed it for you... (Score:5, Funny)

Noob (Score:5, Funny)

Re:Noob (Score:5, Funny)

I've seen porn collections like that... (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Funny)

Re: (Score:2)

Petabyte DBs are old news to... (Score:3, Funny)

Comment removed (Score:5, Interesting)

Re: (Score:2)

Re:Petabyte DBs are old news to... (Score:4, Interesting)

Re: (Score:2)

Re: (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Oh s***! I'm calling my Congressman! (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Google Street View must be most massive db ever? (Score:3, Interesting)

Re:Google Street View must be most massive db ever (Score:5, Informative)

I am confused !! (Score:5, Funny)

Re: (Score:2, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:I am confused !! (Score:4, Informative)

Re: (Score:2)

LHC data production (Score:4, Informative)

Re: (Score:2)

No big news here.... (Score:5, Interesting)

Oh, come on. (Score:5, Interesting)

Re:Oh, come on. (Score:5, Insightful)

Re:Oh, come on. (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

More long-tail economics! (Score:5, Interesting)

Re:Oh, come on. (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

I won't call you old fashioned... (Score:4, Insightful)

Science! (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Re:Oh, come on. (Score:4, Funny)

Re: (Score:2)

Re:Oh, come on. (Score:4, Insightful)

Re: (Score:2)

Re:Oh, come on. (Score:5, Interesting)

Re: (Score:2)

Too Bad Most of that is Due to Poor... (Score:2, Insightful)

Re: (Score:2, Interesting)

Effect of the scale (Score:2, Insightful)

OO databases have done this ten years ago (Score:5, Interesting)

Re: (Score:2)

Re: (Score:3, Interesting)

Re: (Score:3, Interesting)

Re: (Score:3, Interesting)

Google Maps is way bigger... (Score:3, Informative)

Re:Google Maps is way bigger... (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

When the petafile barrier crumbles ... (Score:5, Funny)

the only *real* barrier is backup time (Score:5, Interesting)

Re: (Score:2)

s/barrier/arbitrary round number/g (Score:5, Insightful)

Re: (Score:2)

The world will only ever need 5 large databases (Score:5, Funny)

WalMart has a 4 petabyte database already (Score:4, Informative)

Re: (Score:2, Funny)

IBM Boulder (Score:2, Insightful)

Re: (Score:2)

I wonder (Score:2)

the only real barrier is backup time (Score:5, Interesting)