Call me old fashioned, but I don't see why anyone but a search engine like google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like, fill your garage with sh1t, build a bigger garage.
So the fact that movies have gone from 780 MB (DVD rips) to 4.8 GB (straight-up copies) to 25 GB (Blu-ray) doesn't mean anything to you?
Or how about games, which have gone from 1 MB to installations upwards of 10 GB now (Warhammer, IIRC, is 9-something)?
Not to mention MS's fiasco of an Office XML format, where files take up a ridiculous amount of space in comparison to OpenOffice (a 10 MB .docx vs. 2.9 MB in OpenOffice)... it's the level of someone's tech knowledge that determines their space usage.
I wouldn't mind 3-4 TB; I'd split it into about four partitions or a RAID stripe and call it a day for a while.
However, consumer use is indicative of business use, so I would expect things to head toward exabytes eventually.
This is kind of my point. Do companies keep libraries of pr0n, video, music? Sure, if you're a media company you will. But say you're a plumbing distributor. You'll have the usual accounting stuff, and media for marketing, and some BS overhead, but don't tell me it adds up to a TB much less a PB.
On the other hand, if you have the extra space, it invites the usual waste in the form of archive directories for closed-out years, development junk, etc. Spinning round and round, doing nothing.
"This is kind of my point. Do companies keep libraries of pr0n, video, music? Sure, if you're a media company you will. But say you're a plumbing distributor. You'll have the usual accounting stuff, and media for marketing, and some BS overhead, but don't tell me it adds up to a TB much less a PB."
That's true for small companies, but places like Digg, or any site that gets a lot of comments, would very quickly fill up that TB.
On the other hand, if you have the extra space, it invites the usual waste in the form of archive directories for closed-out years, development junk, etc. Spinning round and round, doing nothing.
Yep. That's exactly it. $200 today buys a 1 TB drive; $200 a few years ago bought a 1 GB drive. As the price has fallen, the value of the HDD has risen relative to its cost. Those archive directories and development junk aren't being deleted, because they have value. Sure, it's not enough value to justify keeping them around when a 1 GB drive costs $200, but they are worth keeping around when a 1 TB drive costs that much.
They aren't "doing nothing" - they just weren't doing enough to be worth keeping until the price dropped far enough.
All of this makes the 1 TB drive considerably more valuable than the 1 GB drive, despite their original purchase-price parity. This is long-tail economics at work [wired.com]. As the individual bits become worth less and less, the value of the bits in total continues to rise, resulting in a completely new set of capabilities.
My DVR is an excellent example of this - it's a thorough change in the way that I watch television. Suddenly it's a family event that we can all share, because when I want to comment, I can just hit pause and share my thought. Nothing's lost; if needed we can just rewind a bit. Suddenly, instead of being annoyed at my daughter for wanting to comment on a point during a televised debate, I'm excited and interested! No more SHUSHing at my family; it's now a much more shared experience.
The price of nonlinear access media has dropped so incredibly that marginal-value bits (like video) are suddenly cheap enough to make it all possible.
While most companies don't need to keep this level of data, there are a number that do.
Think of banks and credit card companies that need to store every transaction that happens on their cards, or supermarkets that store a record of every item anyone purchases. There are a number of businesses that need to store hundreds of billions of transactions.
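Just to put that in perspective, here's a rough back-of-the-envelope calculation; the per-record size and the overhead multiplier are made-up illustrative assumptions, not figures from any real bank:

    # Rough, hypothetical sizing of a card-transaction archive.
    # The 200-byte record size is an illustrative assumption, not a real figure.
    transactions = 100e9        # "hundreds of billions" of transactions
    bytes_per_record = 200      # assumed: IDs, timestamp, amount, merchant, flags

    raw = transactions * bytes_per_record
    print(f"raw records: {raw / 1e12:.0f} TB")               # ~20 TB

    overhead = 5                # assumed multiplier for indexes, replicas, backups
    print(f"with overhead: {raw * overhead / 1e15:.2f} PB")  # ~0.10 PB

So even plain transaction rows land in the tens of terabytes before anything else is counted.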
So the fact that movies have gone from 780 MB (DVD rips) to 4.8 GB (straight-up copies) to 25 GB (Blu-ray) doesn't mean anything to you?
Are people actually storing BD movies on their hard drives these days? In BitTorrent land, movies are still only a gig or so, even the ones ripped from BD, since they always use a better codec like H.264 or Xvid rather than the ridiculously obsolete MPEG-2.
Not to mention MS's fiasco of an Office XML format, where files take up a ridiculous amount of space in comparison
Agreed. And I'd also be worried about losing a PB all at once. There are TB drives at my local Best Buy, but that's a lot to lose at once. I'd rather split my files and programs between two or more smaller drives (and have a RAID).
This might be going slightly off-topic, but yeah, I've noticed that along with the increase in data sizes, an increased awareness of backups and redundancy has been percolating down even to home users.
For example, I recently set up a mirrored drive system for my stepdad for his home photos (which are somewhere in the 200 GB range, as he is semi-professional) just in case one drive goes out. Also I've been looking at a cheap DVD Autoload backup option. Any ideas there from the Slashdot crowd?
Also I've been looking at a cheap DVD Autoload backup option. Any ideas there from the Slashdot crowd?
Back up 200 GB+ of data to DVDs? Are you mad? That's 25-50 discs just for the initial backup, and you probably want twice that to handle discs going bad.
Get two or three external disks (eSATA ideally; you can run SMART self-tests, get better transfer rates, etc.). Use a decent incremental backup tool to make versioned snapshots to them, rotating the drives periodically; keep one in storage, and ideally one off-site. Faster, less hassle, more robust, and more flexible than a pile-o'-DVDs.
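To make the "versioned snapshots" part concrete, here's a minimal sketch of one way to do it, assuming rsync is installed and the external drive is mounted at /mnt/backup (the paths and the ~/photos source are placeholders, not anything the poster specified). Each run creates a dated snapshot directory, and unchanged files are hard-linked against the previous snapshot, so each snapshot looks complete but only changed files take new space:

    import datetime
    import pathlib
    import subprocess

    # Assumed locations (placeholders, not from the original post).
    SOURCE = pathlib.Path.home() / "photos"
    DEST_ROOT = pathlib.Path("/mnt/backup/snapshots")

    def make_snapshot() -> pathlib.Path:
        """Create a dated snapshot, hard-linking unchanged files to the last one."""
        DEST_ROOT.mkdir(parents=True, exist_ok=True)
        dest = DEST_ROOT / datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S")

        previous = sorted(p for p in DEST_ROOT.iterdir() if p.is_dir())
        cmd = ["rsync", "-a", "--delete"]
        if previous:
            # Unchanged files become hard links into the newest earlier snapshot.
            cmd.append(f"--link-dest={previous[-1]}")
        cmd += [f"{SOURCE}/", str(dest)]

        subprocess.run(cmd, check=True)
        return dest

    if __name__ == "__main__":
        print("snapshot written to", make_snapshot())

Rotating two or three drives then just means pointing DEST_ROOT at whichever disk is currently plugged in; the off-site copy is simply the drive that isn't.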
While I was only looking to back up maybe a 15-20 GB subset of that data when floating the idea of DVDs, you do have a point there. I can probably do a decent backup with more external HDs, and it'd be cheaper in the long run too. Thanks for the sanity check, bud!
Petabytes are actually pretty common in the sciences. I visited NCAR (National Center for Atmospheric Research [ucar.edu]) in Boulder five years ago, and their main database was in the 2 PB region even then. I'm sure it's a lot larger today.
The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope [lsst.org]. These projects aren't all that uncommon.
Don't forget projects like LOFAR [wikipedia.org] (snippets from the LOFAR website):
In the first digital processing step 256 kHz subbands are formed. Only a subset of these bands is further processed. The maximum total bandwidth selected for further processing will be 32 MHz. Each Remote Station delivers a single dual polarization beam at 32 MHz, or 8 dual polarization beams at 4 MHz or any combination in between. The resulting output data rate is 2 Gb/s. The secondary filtering stage (to 1kHz channels) is done in the Central P
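Taking just the 2 Gb/s per-station figure quoted above at face value (and ignoring whatever reduction happens downstream in central processing), the arithmetic adds up quickly:

    # Back-of-the-envelope rate for a single LOFAR remote station,
    # using only the 2 Gb/s figure quoted above.
    rate_bits_per_s = 2e9
    bytes_per_day = rate_bits_per_s / 8 * 86_400
    bytes_per_year = bytes_per_day * 365

    print(f"{bytes_per_day / 1e12:.1f} TB/day")    # ~21.6 TB/day
    print(f"{bytes_per_year / 1e15:.1f} PB/year")  # ~7.9 PB/year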
You can have only so much useful information about anything.
If you have the space available and the tools to utilize the stored data, why not? The more data you keep, the more information you will have available when techniques or routines for utilizing it come along.
Call me old fashioned, but I don't see why anyone but a search engine like google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like, fill your garage with sh1t, build a bigger garage.
Unfortunately, you gather up a lot of digital stuff fast, and most of the time it's not useful. Take my business mail, for example: it's full of old presentations and random versions of various documents and whatnot. Is it worth cleaning up? No. Is it worth keeping? Well, from time to time clients start asking about old things, and then it's very useful to have. I figure 90% of it could be deleted, keeping only final versions and important mails; of that 90%, almost all will never be asked for again, so in effect I keep 100% for the sake of maybe 1%. Take a company with hundreds of thousands of people, all like that, and you get huge, huge amounts of data. Keeping it all is still cheaper than going through those huge, huge amounts of data. That goes double for many automated data-collection processes: it's cheaper to keep everything until it's all guaranteed useless than to try to sort it out.
I'm currently working on a project which has a working database of around 1.5 petabytes (at last count).
What's more, this database is constantly ingesting more data and shuffling old data off to tape archives. If the technology were available, this DB would be even bigger, so we wouldn't have to retrieve data from the archives in order to query anything more than a year old.
There is an unbelievable amount of data out there. As long as there is somewhere to put it, we will find reasons to stick it in a database.
a. How on earth would you know? Do you work in a data-intensive industry?
b. Do you understand what a data warehouse even is?
c. Data mining is statistically based. The more information that's available to mine, the more accurate the results will be. And by "information", I don't mean some kid's hard drive filled with terrible mp3s and downloaded movies.
Data mining is statistically based. The more information that's available to mine, the more accurate the results will be.
A minor quibble: I do data mining for a living. With most data sets, we end up sampling them down, because more data ramps up processing time faster than it improves accuracy. With most problems, more data doesn't improve accuracy measurably once you've reached a certain critical mass in the dataset. Simplistically, you don't need to flip a coin a billion times to figure out that it comes up heads 50% of the time.
It's a rare problem that we use more than 100,000 records for. They exist, but they're rare.
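The coin-flip point is easy to see numerically: the error of an estimated rate shrinks roughly like 1/sqrt(n), so each extra order of magnitude of data buys less and less accuracy while the processing cost keeps growing tenfold. A quick, generic illustration (not tied to any particular mining tool):

    import random

    random.seed(1)

    # Estimate the heads rate of a fair coin from samples of increasing size.
    # Accuracy improves quickly at first, then barely moves, while the work
    # (number of records processed) keeps growing tenfold.
    for n in (100, 1_000, 10_000, 100_000, 1_000_000):
        heads = sum(random.random() < 0.5 for _ in range(n))
        estimate = heads / n
        print(f"n={n:>9,}  estimate={estimate:.4f}  error={abs(estimate - 0.5):.4f}")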
If you download your ass off and don't want to delete your porn, games, movies (Blu-ray), and music (uncompressed), and your HDDs/SSDs are in a RAID so that they back each other up, with a journaling filesystem, and they're partitioned for all your Linux/*BSD/Windows distros, and you have never thrown a single file away because you back everything up and put it all back after a clean install... I think you are going to want petabyte storage capacity.
In the financial business, we need to keep all trading data for at least 7 years. Most client firms use automated quoting systems, so traffic is substantial: hundreds of megabytes of data are generated each day across multiple systems (quoting, trading, clearing, reporting, ...). I/O performance and millisecond ACKs are also very important.
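Taking those figures at face value, the 7-year retention alone adds up; the 500 MB/day and four systems below are stand-in assumptions for "hundreds of megabytes" and "multiple systems", not figures from the post:

    # Rough retention sizing from the figures in the post above.
    # 500 MB/day and 4 systems are assumed stand-ins for "hundreds of
    # megabytes" and "multiple systems (quoting, trading, clearing, reporting)".
    mb_per_day_per_system = 500
    systems = 4
    retention_years = 7

    total_tb = mb_per_day_per_system * systems * 365 * retention_years / 1e6
    print(f"~{total_tb:.1f} TB before indexes, replicas, or backups")  # ~5.1 TB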
$200? (Score:2)
For $200 you could almost get two 1TB drives [pricewatch.com].
I won't call you old fashioned... (Score:4, Insightful)
... but I do wonder if you've ever heard of Sarbanes-Oxley.
The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope [lsst.org]. These projects aren't all that uncommon.
Shit, I'm working on both of those projects. I'd better ask management for a bigger hard drive...
The LHC will generate several PB of data per year
I know 1080p60 takes a lot of space, but I'm not sure I want to see that many hardons colliding...