Databases Science

Amazon Launches Public Data Sets To Spur Research 82

turnkeylinux writes "Amazon just launched its Public Data Sets service (home). The project encourages developers, researchers, universities, and businesses to upload large (non-confidential) data sets to Amazon — things like census data, genomes, etc. — and then let others integrate that data into their own AWS applications. AWS is hosting the public data sets at no charge for the community, and like all of AWS services, users pay only for the compute and storage they consume with their own applications. Data sets already available include various US Census databases, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl."
This discussion has been archived. No new comments can be posted.

  • Re:Privacy? (Score:5, Informative)

    by Frosty Piss ( 770223 ) on Friday December 05, 2008 @11:17AM (#26002909)
    The US Census Bureau charges for access to many of its datasets.
  • What's the license? (Score:3, Informative)

    by SanityInAnarchy ( 655584 ) <ninja@slaphack.com> on Friday December 05, 2008 @01:11PM (#26004389) Journal

    You'll recall that Amazon's "cloud computers" (ugh) are billed by the hour, and are pretty much root access to a VM. Unless there's a specific legal reason you can't, it's always possible to just download the data -- you'd just pay a bit for the time that instance must be up, and for the data transferred.

    However, for those of us who already are using EC2, it's nice to not have to download the whole set -- which can be terabytes, for some of these -- and instead be able to simply mount it from wherever it is and work with it right away. Especially when you consider the cost of downloading terabytes worth of data from Amazon's web services, at 17 cents per gigabyte -- reasonable, but still probably more than you wanted to just query the stuff.

    I suspect, also, that at least some of these will be made available via a web service of some sort, maybe even free, by some of those people using that service.
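
The download cost mentioned in the comment above can be roughed out in a few lines. This is only a sketch: the 17 cents per gigabyte figure is the one quoted in the comment, while the function name, the flat (untiered) rate, and the decimal convention of 1 TB = 1000 GB are assumptions made for illustration.

```python
# Rough estimate of what it costs to download a data set out of AWS.
# The default rate is the figure quoted in the comment; a single flat
# rate with no pricing tiers is an assumption.

def transfer_cost_usd(size_gb: float, rate_per_gb: float = 0.17) -> float:
    """Return the estimated transfer cost in US dollars."""
    return size_gb * rate_per_gb

# Hypothetical data set sizes, in GB (1 TB taken as 1000 GB).
for size_gb in (200, 1000, 5000):
    print(f"{size_gb:>5} GB -> ${transfer_cost_usd(size_gb):,.2f}")
```

At that rate, a terabyte-scale set costs on the order of hundreds of dollars to pull down, which is the commenter's point about mounting it in place instead.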

  • by dubl-u ( 51156 ) * <.ot.atop. .ta. .2107893252.> on Friday December 05, 2008 @01:28PM (#26004555)

    If the uploaded data is not available for download, but is only available to AWS applications running on Amazon's (paid for) compute service, then Amazon deserves nothing but contempt and an "Up yours" for this.

    Seriously? Or did somebody just put sand in your pancakes this morning?

    As an AWS user, I think this is great. It means I don't have to waste time and money copying over a public dataset. When I read about this I fired up a virtual Linux box, attached the census data as /dev/sdb, and spent a couple hours rummaging. Total cost: $0.70. If I had had to copy everything over first, it would have been $20 in bandwidth, plus a long time waiting for the 200 GB to transfer.

    You realize that these datasets are public, right? For the census one, you can already download it for free [census.gov]. Do you want Amazon to make it extra-super-free or something?

    I presume it's the same for the others. But if not, you should put your money where your very active mouth is. It would take maybe 15 minutes work to get an Amazon server up and running, attach all the public datasets, and set up a web server.

    I'm so very tired of people who say "somebody should do X!" but aren't willing to be that somebody.
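
The numbers in the comment above can be compared directly. The sketch below uses only the figures the commenter gives ($0.70 for the attached session, $20 and 200 GB for the copy); the variable names are made up for illustration, and the per-gigabyte rate is derived from those figures rather than stated anywhere in the thread.

```python
# Figures as given in the comment.
attach_cost_usd = 0.70   # a couple of hours on an instance with the data attached
copy_cost_usd = 20.00    # estimated bandwidth cost of copying the data over first
dataset_gb = 200         # size of the data set being copied

# Derived values (not stated in the thread).
implied_rate_per_gb = copy_cost_usd / dataset_gb
cost_ratio = copy_cost_usd / attach_cost_usd

print(f"Implied transfer rate: ${implied_rate_per_gb:.2f} per GB")
print(f"Copying first costs about {cost_ratio:.1f}x the attached session")
```

The implied rate differs from the 17 cents per gigabyte quoted elsewhere in the thread; the thread does not say why.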

  • by Anonymous Coward on Friday December 05, 2008 @02:21PM (#26005265)

    These data must be public to begin with, or they wouldn't host them:

    If you have a public domain or non-proprietary data set that you think is useful and interesting to the AWS community [...] You must have the right to make the data freely available.

    (How to share a public data set on AWS [amazon.com])

    So I guess there isn't even a license. Free as in go grab 'em.

  • by Anonymous Coward on Friday December 05, 2008 @02:36PM (#26005467)

    plus much, much more at:
    http://genome.ucsc.edu/

    http://www.ensembl.org/index.html

    This is just a way to access it from Amazon's compute cloud.
