Call me old-fashioned, but I don't see why anyone but a search engine like Google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like: fill your garage with sh1t, build a bigger garage.
a. How on earth would you know? Do you work in a data-intensive industry?
b. Do you understand what a data warehouse even is?
c. Data mining is statistically based. The more information that's available to mine, the more accurate the results will be. And by "information", I don't mean some kid's hard drive filled with terrible mp3s and downloaded movies.
Data mining is statistically based. The more information that's available to mine, the more accurate the results will be.
A minor quibble. I do data mining for a living. With most data sets, we end up sampling them down, because more data ramps up processing time faster than it improves accuracy. With most problems, more data doesn't measurably improve accuracy once the dataset has reached a certain critical mass. Simplistically, you don't need to flip a coin a billion times to figure out that it comes up heads 50% of the time.
It's a rare problem that we use more than 100,000 records for. They exist, but they're rare.
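To put rough numbers on the coin-flip point, here is a minimal sketch, assuming a fair coin and independent flips. The function names (standard_error, estimate_heads) and the sample sizes are made up for illustration, not taken from any real data mining workflow. The standard error of an estimated proportion shrinks with the square root of the sample size, so 100 times more flips buys only about 10 times less error.

```python
import math
import random


def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent flips."""
    return math.sqrt(p * (1 - p) / n)


def estimate_heads(n, seed=0):
    """Simulate n fair coin flips and return the observed fraction of heads."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n


# Illustrative sample sizes only: each step is 100x more data than the last.
for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  estimate={estimate_heads(n):.4f}  "
          f"std_error={standard_error(0.5, n):.5f}")
```

At 10,000 flips the standard error is already half a percentage point, which is why sampling a huge data set down often costs little accuracy while saving a lot of processing time. Real problems with rare events or many variables can need far more records than this toy case.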