Amazon Launches Public Data Sets To Spur Research
turnkeylinux writes "Amazon just launched its Public Data Sets service. The project encourages developers, researchers, universities, and businesses to upload large (non-confidential) data sets to Amazon, such as census data and genomes, and then let others integrate that data into their own AWS applications. AWS is hosting the public data sets at no charge for the community, and as with all AWS services, users pay only for the compute and storage they consume with their own applications. Data sets already available include various US Census databases, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl."
Re:Check off privacy (Score:5, Interesting)
The less privacy we have, the less we have to worry about our privacy. That sounds flip, and along the lines of "if you have nothing to hide..." but it isn't.
We want privacy primarily due to shame.
We have shame because we wear masks almost 100% of the time.
We wear masks because we don't want people to realize who we 'really are', either mentally or physically.
We don't want people to really know us because we have been convinced to hold ourselves to standards that no one actually meets.
We hold ourselves to these standards because everyone else is wearing masks, and while we can tell ourselves that 'they are just like us', it's hard to grasp that cognitively without actual proof.
If there were no privacy, no one could wear a mask. If no one were wearing a mask, we would realize that the standards we hold ourselves to are unrealistic. If we realize the standards we hold ourselves to are unrealistic, we are freed from shame. If we are freed from shame, we no longer find privacy necessary.
Re:Sounds like "Give us data so we can charge you" (Score:3, Interesting)
Anyway, now that it's done, putting something like this on Amazon would be great (if I had the rights to the original clips). Not only would it save someone else the work, but researchers would be using a real, tough data set. Plus, it might get corrections (no way I didn't make at least a few mistakes in all those clips), and it might get added to (there are so many different sounds in this world, no way is this data set complete). Alternatively, if I were a researcher now and I got my hands on this, it would save months of work, months of pay to an RA, a semester's tuition, even if I did have to pay for cycles.
On the other hand, I think there are a few places that do this, possibly for free. I want to say...Wolfram maybe? Plus, there are specialty ones. I think there's a big facial recognition set, etc.
Re:Catch 22 (Score:4, Interesting)
Note that on Amazon's website they say that you can only access the data if you're paying them to crunch numbers on their cloud computers.
That is, you can't just download the data off their sites, which would be the nice thing to do.
And you know what you can do with a cloud computer, my little rocket scientist? You can set up a frickin' web server. And then you can download anything your precious heart desires.
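For what it's worth, the "set up a web server" step can be tiny. Here is a rough sketch using only the Python standard library, assuming the data set is already readable as ordinary files on the instance. The function name `make_server` and the path `/mnt/public-dataset` are made up for illustration, and nothing here is specific to how Amazon actually exposes the data.

```python
import functools
import http.server


def make_server(directory, port=8000, host="0.0.0.0"):
    """Build an HTTP server that serves the files under `directory`.

    `directory` is a hypothetical mount point for the data set; this
    sketch assumes the files are already visible on the instance.
    """
    # Bind the handler to one directory instead of the current working dir.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # One thread per request, so a large download does not block others.
    return http.server.ThreadingHTTPServer((host, port), handler)


# On the instance, this would block and serve files until interrupted:
#   make_server("/mnt/public-dataset").serve_forever()
```

Anyone who can reach the instance on that port could then fetch files with a plain HTTP client. There is no authentication or TLS here, so treat it as a toy and not something to leave running.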
Re:Selling EC2 service? (Score:3, Interesting)
> If they had a culture that was mainly revenue-focused, I'd expect this idea to get shot
> down, because some penny-pincher would argue that they'd make more money from people
> uploading duplicates of these giant data sets over and over.
And a clever marketing man would counter that this is an opportunity to achieve lock-in by establishing exclusive access to a large number of datasets. Once people have built large, complex applications that use a number of these datasets in Amazon's environment and format, it will be very difficult for them to move elsewhere. To marketing people "community"=="locked-in customers".