Databases Science

Amazon Launches Public Data Sets To Spur Research

Posted by kdawson
from the put-it-there dept.
turnkeylinux writes "Amazon just launched its Public Data Sets service (home). The project encourages developers, researchers, universities, and businesses to upload large (non-confidential) data sets to Amazon — things like census data, genomes, etc. — and then let others integrate that data into their own AWS applications. AWS is hosting the public data sets at no charge for the community, and like all of AWS services, users pay only for the compute and storage they consume with their own applications. Data sets already available include various US Census databases, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl."

  • Re:Check off privacy (Score:5, Interesting)

    by Chyeld (713439) <(moc.liamg) (ta) (dleyhc)> on Friday December 05, 2008 @11:38AM (#26003153)

    The less privacy we have, the less we have to worry about our privacy. That sounds flip, and along the lines of "if you have nothing to hide..." but it isn't.

    We want privacy primarily due to shame.

    We have shame because we wear masks almost 100% of the time.

    We wear masks because we don't want people to realize who we 'really are' either mentally or physically.

    We don't want people to really know us because we have been convinced to hold ourselves to standards that no one actually meets.

    We hold ourselves to these standards because everyone else is wearing masks and while we can tell ourselves that 'they are just like us', it's hard to grasp that cognitively without actual proof.

    If there were no privacy, no one could wear a mask. If no one were wearing a mask, we would realize that the standards we hold ourselves to are unrealistic. If we realize the standards we hold ourselves to are unrealistic, we are freed from shame. If we are freed from shame, we no longer find privacy necessary.

  • by cecille (583022) on Friday December 05, 2008 @12:46PM (#26004019)
    Oh, agreed...it's totally a business move for them, wrapped in the veneer of a good deed. On the other hand...if this is implemented correctly, it could be amazing. I say this as a researcher who has spent more time than necessary gathering data sets. Just as a quick (and painful) example...during my Master's degree, I was doing CI research for a hearing aid application. Without boring you with the details, the idea was to create a system to classify the audio background environment so it could be more effectively removed. For this, I needed a large set of ~1-sec clips of background noise with as much variety as possible. I didn't want to use what we normally call a "toy" data set because this was intended to be actually used. So I wanted variety, but I also wanted combo sounds - it's easy to tell a highway from a room of people, but what about a cityscape, with cars AND people AND a bah-zillion other sounds. Anyway, the result was that I spent MONTHS in a sound booth splitting audio files and listening to EACH 1-sec clip individually and recording exactly what sounds were in the clip and then parsing audio features. It SUCKED.

    Anyway, now that it's done, putting something like this on Amazon would be great (if I had the rights to the original clips). Not only would it save someone else the work, but researchers would be using a real, tough data set. Plus, it might get corrections (no way I didn't make at least a few mistakes in all those clips), and it might get added to (there are so many different sounds in this world, no way is this data set complete). Alternately, if I was a researcher now and I got my hands on this, it would save months of work, months of pay to an RA, a semester's tuition, even if I did have to pay for cycles.

    On the other hand, I think there are a few places that do this, possibly for free. I want to say...Wolfram maybe? Plus, there's specialty ones. I think there's a big facial recognition set etc.
  • Re:Catch 22 (Score:4, Interesting)

    by dubl-u (51156) * <2523987012@pota . t o> on Friday December 05, 2008 @01:47PM (#26004803)

    > Note that on Amazon's website they say that you can only access the data if you're paying them to crunch numbers on their cloud computers.
    > That is, you can't just download the data off their sites, which would be the nice thing to do.

    And you know what you can do with a cloud computer, my little rocket scientist? You can set up a frickin' web server. And then you can download anything your precious heart desires.
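    For the curious, here is a minimal sketch of that idea using only Python's standard library. It assumes the data set is already readable as an ordinary directory on the instance; the function name `serve_directory` and its parameters are made up for illustration, not anything Amazon provides.

```python
import functools
import http.server
import threading


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve the files under `directory` over HTTP from a background thread.

    `directory` is a hypothetical path where the data set is readable.
    port=0 asks the OS for any free port; the port actually chosen is
    available afterwards as server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # Daemon thread so the server does not keep the process alive on exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

    To reach it from outside you would presumably bind to a public interface (for example host="0.0.0.0") and open the port in the instance's firewall settings, and the outbound transfer would still be billed.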

  • by John Hasler (414242) on Friday December 05, 2008 @02:06PM (#26005091) Homepage

    > If they had a culture that was mainly revenue-focused, I'd expect this idea to get shot
    > down, because some penny-pincher would argue that they'd make more money from people
    > uploading duplicates of these giant data sets over and over.

    And a clever marketing man would counter that this is an opportunity to achieve lock-in by establishing exclusive access to a large number of datasets. Once people have built large, complex applications that use a number of these datasets in Amazon's environment and format it will be very difficult for them to move elsewhere. To marketing people "community"=="locked-in customers".

