The 1-Petabyte Barrier Is Crumbling 217
CurtMonash writes "I had been a database industry analyst for a decade before I found 1-gigabyte databases to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling. Specifically, we are about to see data warehouses — running on commercial database management systems — that contain over 1 petabyte of actual user data. For example, Greenplum is slated to have two of them within 60 days. Given how close it was a year ago, Teradata may have crossed the 1-petabyte mark by now too. And by the way, Yahoo already has a petabyte+ database running on a home-grown system. Meanwhile, the 100-terabyte mark is almost old hat. Besides the vendors already mentioned above, others with 100+ terabyte databases deployed include Netezza, DATAllegro, Dataupia, and even SAS."
Google Maps is way bigger... (Score:3, Informative)
Google Maps' database is far bigger...
A base of 8 tiles, each splitting into four smaller tiles at the next zoom level, in two modes (map/satellite), across 16 zoom levels.
Each tile is approx. 30kB.
(0.03 MB × (8 × 4^16)) / 1024 / 1024 == 983.04 TB right there.
My calculator doesn't handle numbers big enough for streetview. O_O
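A quick script reproduces the figure above using the same rough numbers (8 base tiles, 16 quadrupling zoom levels, ~30 kB per tile); like the comment, it counts only the deepest zoom level and one of the two modes.

```python
# Back-of-the-envelope check of the tile math above: 8 base tiles,
# each zoom level quadrupling the count, 16 levels, ~30 kB (0.03 MB)
# per tile. Only the deepest zoom level is counted, as in the comment.
base_tiles = 8
zoom_levels = 16
tile_mb = 0.03  # ~30 kB per tile, treated as 0.03 MB

tiles = base_tiles * 4 ** zoom_levels      # tiles at the deepest level
total_tb = tiles * tile_mb / 1024 / 1024   # MB -> GB -> TB
print(f"{tiles:,} tiles ~= {total_tb:.2f} TB")
```

Counting the whole tile pyramid (every zoom level, not just the deepest) would add roughly another third on top of that.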
Science! (Score:5, Informative)
The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope [lsst.org]. These projects aren't all that uncommon.
Re:I am confused !! (Score:4, Informative)
1 Petabyte = 1,000 Terabytes
1 LoC = 10 Terabytes
100 LoC = 1,000 Terabytes
======
100 LoC = 1 Petabyte
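The same unit bookkeeping in a couple of lines, using the thread's rough 10 TB-per-Library-of-Congress figure:

```python
# Thread's rough conversion: 1 Library of Congress ~ 10 TB of data,
# 1 PB = 1,000 TB (decimal units), so one petabyte is 100 LoC.
LOC_TB = 10
PB_TB = 1000
print(PB_TB / LOC_TB)  # 100.0
```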
Re:Google Street View must be most massive db ever (Score:5, Informative)
WalMart has a 4 petabyte database already (Score:4, Informative)
I could see practical applications (Score:3, Informative)
Okay, I know that the article is referring to databases, but the comments seem to have drifted toward disk storage, so I will take the bait and go off topic.
Petabyte drives would not really be that impractical for people who like to archive stuff. I just filled up a 300 gig drive and a 750 gig drive with nothing but stuff off the DVR in under a year. While National Geographic HD may be compressed so badly that it barely looks better than SD, and a one-hour show comes in under 2 gig, try archiving something with a higher bitrate. For example, I recorded the Olympics and saved the opening and closing ceremonies and all the gymnastics events. A single 4-hour day saved is around 40 gig.
So, let's think media server for HD material, and stick with HDTV for a while. Say I want to archive a Blu-ray disc on a media server, and for the sake of argument say the movie takes up all 50 gig of the disc. Ten movies, 500 gig. 100 movies, 5 terabytes. 1,000 movies, 50 terabytes.
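Extending that scaling one more step, here is how many full 50-gig rips various capacities would hold (the one-whole-disc-per-movie figure is the comment's worst-case assumption):

```python
# How many full 50 GB Blu-ray rips fit on a given capacity, continuing
# the comment's worst-case assumption of one whole disc per movie.
MOVIE_GB = 50

for label, capacity_gb in [("500 gig", 500), ("50 TB", 50_000), ("1 PB", 1_000_000)]:
    print(f"{label}: {capacity_gb // MOVIE_GB:,} movies")
```

So a petabyte drive holds a 20,000-movie library even at full Blu-ray quality.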
Now let's say that we are an IMAX theater upgrading to the new IMAX Digital standard. I read not too long ago that an IMAX film is equivalent to 18K (most digital theaters project 2K, although some are now installing 4K systems). So, to avoid keeping big massive film prints around for the 20-year-old science documentaries we keep in rotation, we get the digital versions instead. Does anyone want to do the math?
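Taking the bait on "do the math": pixel count scales with the square of the resolution, so 18K carries roughly (18/2)^2 = 81 times the data of a 2K master at the same bit depth and frame rate. The ~200 GB size for a 2K feature below is an assumption for illustration, not a published spec.

```python
# Hedged IMAX math: data volume scales with pixel count, i.e. with the
# square of the horizontal resolution (same bit depth and frame rate).
# TWO_K_FEATURE_GB is an assumed, illustrative size, not a real spec.
TWO_K_FEATURE_GB = 200

scale = (18 / 2) ** 2                      # 81x the pixels of 2K
imax_tb = TWO_K_FEATURE_GB * scale / 1024  # GB -> TB
print(f"{scale:.0f}x pixels, ~{imax_tb:.1f} TB per 18K feature")
```

Even under those generous assumptions, a single 18K feature lands in the tens of terabytes, so a modest back catalog clears 100 TB easily.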
I am waiting for the day when neural implants can actually read the human brain, so that you can archive experiences to some type of storage medium. I am sure Wikipedia says somewhere how much information the human brain processes per second. We will no doubt find ways of compressing it; we can already do audio and video, so one day we may be able to compress smell, taste, and touch, granted that we actually have a way of capturing them in the first place. Still, the amount of data would be massive, and it would probably open up a whole new avenue for the porn industry.
Granted, these are extremes, but who would have thought 15 years ago, when we first started hitting the 1-gig barrier, that in 2008 we would have discs for storing movies with a capacity of 50 gig, and would even consider saving video at a resolution of 1920x1080 with PCM sound at a bitrate of 4.6 Mbps?
Give us the storage space, and we will find a use for it.
Re:I am confused !! (Score:2, Informative)
You seem to be trying to calculate in Tebibytes (TiB) and Pebibytes (PiB), which are based on the binary system, rather than Terabytes (TB) and Petabytes (PB), which are base 10.
Although some operating systems incorrectly attach the decimal-based units to binary-based values (i.e. treating 1 TB as 1024 GB), that is technically wrong. Hard drive manufacturers actually report correctly using the decimal-based values (i.e. 1 TB = 1000 GB).
Also, you still got your maths wrong: 10 TiB = ~0.01 PiB, not 0.09.
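The binary/decimal distinction spelled out, with the 10 TiB conversion checked both ways:

```python
# Binary (IEC) vs decimal (SI) units: 1 TiB = 2**40 bytes and
# 1 PiB = 2**50 bytes, while 1 TB = 10**12 and 1 PB = 10**15 bytes.
TIB = 2 ** 40
PIB = 2 ** 50

print(10 * TIB / PIB)        # 0.009765625 -> 10 TiB is ~0.01 PiB
print(10 * 10**12 / 10**15)  # 0.01        -> 10 TB is exactly 0.01 PB
```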
Comment removed (Score:2, Informative)
Re:I am confused !! (Score:1, Informative)
Data transmission isn't done in power of two unit sizes (packets can be variable size), so they should indeed use base 10 units, and bits, not bytes. 10Mbps, no problem.
Hard drives are formatted with block sizes that are a power of two (e.g., 512 bytes). Thus it is more useful to see how many of them you would have on a filesystem than some power of ten figure that also conveniently inflates the capacity.
Imagine RAM being sold in base 10, it would be stupid.
LHC data production (Score:4, Informative)
So when active, the Large Hadron Collider will generate a volume of data equivalent to 50 Libraries of Congress every second.
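For scale, using the thread's rough 10 TB-per-LoC figure, that claim works out to about half a petabyte per second; presumably this describes the raw, pre-trigger detector rate, since only a few petabytes per year actually get written to disk.

```python
# The 50-LoC-per-second claim in raw units, taking the thread's rough
# 10 TB-per-Library-of-Congress figure at face value.
LOC_TB = 10
rate_tb_per_s = 50 * LOC_TB
print(f"{rate_tb_per_s} TB/s = {rate_tb_per_s / 1000} PB/s")  # 500 TB/s = 0.5 PB/s
```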