Databases Science

Amazon Launches Public Data Sets To Spur Research 82

Posted by kdawson
from the put-it-there dept.
turnkeylinux writes "Amazon just launched its Public Data Sets service (home). The project encourages developers, researchers, universities, and businesses to upload large (non-confidential) data sets to Amazon — things like census data, genomes, etc. — and then let others integrate that data into their own AWS applications. AWS is hosting the public data sets at no charge for the community, and like all of AWS services, users pay only for the compute and storage they consume with their own applications. Data sets already available include various US Census databases, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl."
This discussion has been archived. No new comments can be posted.

  • by kellyb9 (954229) on Friday December 05, 2008 @11:00AM (#26002711)

    One more step to a non private world CHECK

    Depends on what you upload. Census data isn't private.

  • Re:Privacy? (Score:5, Insightful)

    by russotto (537200) on Friday December 05, 2008 @11:07AM (#26002797) Journal

    It is my understanding that this data was already obtainable in the first place.

    This is true. But the easier it is to obtain datasets like these, the easier it is for anyone to do data mining and correlate the public (presumably non-identified) datasets with any private data they do happen to have.

  • by bonyari (697573) on Friday December 05, 2008 @11:14AM (#26002865)
    This just looks like a way to sell their cloud computing services. They provide the free data and you provide the monthly service fee.
  • Catch 22 (Score:3, Insightful)

    by Anonymous Coward on Friday December 05, 2008 @11:21AM (#26002943)

    Note that on Amazon's website they say that you can only access the data if you're paying them to crunch numbers on their cloud computers.
    That is, you can't just download the data off their sites, which would be the nice thing to do.
    As such, this article is nothing more than a slashvertizement.

  • by johnsonav (1098915) on Friday December 05, 2008 @11:29AM (#26003027) Journal

    One more step to a non private world CHECK

    Privacy, as we have experienced in the last hundred years, is on its way out anyway. The sheer volume, immortality, and interconnection of even publicly available datasets inadvertently reveal information most of us would rather keep private. Much like how most people don't have a problem with beat cops regularly patrolling an area, but feel threatened by cameras monitoring, recording, analyzing, and storing information about the same public area.

    That said, it's here to stay. The data's here as long as we use credit cards for most purchases, use I-Pass (or similar) toll paying systems, carry GPS-enabled cell phones, and expect the police to protect us from 100% of terrorist and criminal bogeymen. We might as well get some private research done, rather than leave it all to the government and big business.

  • by truthsearch (249536) on Friday December 05, 2008 @11:43AM (#26003225) Homepage Journal

    Privacy, as we have experienced in the last hundred years, is on its way out anyway.

    It was only recently on its way in. For most of history people lived in small communities where everyone knew each other's business. Privacy only seemed to become a major concern when technology let us share information across large distances and with many more people.

    I'm not commenting on whether that's a good or bad thing.

  • by tylerni7 (944579) on Friday December 05, 2008 @11:58AM (#26003373) Homepage
    We (or at least some of us) also want privacy to prevent annoyances and for protection.

    I certainly don't want to have to answer to the government anytime I say the word "bomb" or "terrorist" on the telephone, in email, or in an IM.
    I also don't want some company complaining anytime they see me buy a product from one of their competitors.
    I also don't want to have everyone on the internet knowing my social security number, address, license plate number, or telephone number.

    That isn't because of "shame"; that's because people can be assholes, and some people will abuse information. I don't care if people that I trust know these things, but I don't think shame or masks or whatever has anything to do with getting one's identity stolen, or having the government ensure you don't say anything bad about them.

    That said, I don't think this public dataset business really affects individual privacy. This is more a database of already public, but hard to find, data, that doesn't contain personally identifiable anything in it.
    Let's just hope they keep it that way.
  • by Morgaine (4316) on Friday December 05, 2008 @12:07PM (#26003503)

    If the uploaded data is not available for download, but is only available to AWS applications running on Amazon's (paid for) compute service, then Amazon deserves nothing but contempt and an "Up yours" for this.

    It seems that working for a living is out of fashion at Amazon. They expect people to supply them with resources so that they can charge them and others for their use. It's creative business bullshit, and not even remotely funny.

    Amazon, how about you PAY BACK for the privilege of having the datasets uploaded to you by hosting them freely for the Internet community, and only on the back of that you charge for local, higher-speed access by AWS applications? Or would that be too "fair" for an Amazon business practice?

  • by MikeURL (890801) on Friday December 05, 2008 @12:14PM (#26003589) Journal
    Most people never experienced Usenet. They never got to see offhand comments they made 15 years ago still searchable today.

    I think if everyone had a chance to really live the immortality of data in that way they'd be a LOT more scared. As it is, most 'immortal' data lives out of our sight and lurks behind the scenes. Our credit card charge history isn't available in Google so it is easy to think it is gone.

    Having said that... I think you're right. There is a cloud-like structure developing out there where virtually EVERY electronic transaction will leave a permanent record. There will come a skynet-like point where we won't even have the option of simply restoring privacy. The entire system is built upon the premise that privacy isn't really all that important. As it is, you can get people to surrender virtually all explicit privacy if you give them a free iPhone (or whatever gadget they were offering), and implicit privacy is mostly an illusion already.
  • Not the same (Score:3, Insightful)

    by Nerdposeur (910128) on Friday December 05, 2008 @12:46PM (#26004025) Journal

    [Privacy] was only recently on its way in. For most of history people lived in small communities where everyone knew each other's business.

    Which is very different from a large society in which some people know everybody else's business.

    Even if this stuff is public, the time and money and knowledge necessary to use it will not be evenly distributed.

  • by dubl-u (51156) * <2523987012&pota,to> on Friday December 05, 2008 @12:55PM (#26004175)

    This just looks like a way to sell their cloud computing services. They provide the free data and you provide the monthly service fee.

    I'd bet that's not quite how they think about it.

    I once had the fortune to work on a small project for a guy who had built a pretty large software company and then sold it. He said that he always looked to do something interesting first, and then figured out how to make it not lose money, because money-losers aren't sustainable.

    I don't know anybody at Amazon anymore, but from my pals who did work there, my guess is that AWS has a similar culture: they seek out the useful and interesting, and actually do the ideas they can make pay for themselves.

    If they had a culture that was mainly revenue-focused, I'd expect this idea to get shot down, because some penny-pincher would argue that they'd make more money from people uploading duplicates of these giant data sets over and over.

  • Re:Not the same (Score:4, Insightful)

    by johnsonav (1098915) on Friday December 05, 2008 @01:06PM (#26004323) Journal

    Which is very different from a large society in which some people know everybody else's business.
    Even if this stuff is public, the time and money and knowledge necessary to use it will not be evenly distributed.

    Information has never been evenly distributed. In small communities it was the neighborhood gossip, the corner pharmacist, the village priest, or the county sheriff who knew everybody's business. The replacement of social capital with monetary capital is the only difference.

    Those small communities had, however, a fast-acting, closely monitored feedback system. If someone abused their position of power and trust, it was caught quickly and it was easy to remove them from the loop. A similar system is needed now, only on a national or worldwide scale. I think the only way to accomplish this, without going back to a pre-computer society, is to make sure that as much information about the watchers as possible is publicly accessible. Hopefully, the same spirit that makes the OSS community so vibrant and quick to act will transfer to this new domain.

  • Re:Privacy? (Score:3, Insightful)

    by dubl-u (51156) * <2523987012&pota,to> on Friday December 05, 2008 @01:43PM (#26004755)

    Yes, but at least now we are all able to do data mining in large databases.

    This is absolutely the case.

    The web has made vast amounts of information available, so you would think it would play into the "computers will bring about the age of big brother" that was so prominent during the 60s. But it hasn't. Instead, because everybody can afford computers and bandwidth, it has distributed power rather than concentrating it.

    The rich and powerful already have access to vast datasets, and the computing and human power necessary to mine them. Things like Google and Wikipedia and blogs have given everybody a taste of that power, and I'm in favor of anything that helps level the playing field.

  • by tylerni7 (944579) on Friday December 05, 2008 @02:00PM (#26004985) Homepage
    If my phone number and address were available, then people could easily contact and harass me. It's true that they could do the same to anyone, but that doesn't mean they will stop harassing people altogether. Instead, what would (probably) happen is that people would just choose who they want to harass. (Just think about 4chan, for instance: they don't do it because it's difficult, they do it to harass people.)

    Likewise, the government wouldn't just change laws, instead they would (probably) just use the information they have to go after people they don't like.

    I am just speculating of course, and you do have a lot of valid points, like with SSNs for instance. But I don't agree that if society was completely open, people would suddenly stop abusing their power and stop being assholes to other people. Instead, it would just be easier for them to do these things.
  • by Slashdot Parent (995749) on Friday December 05, 2008 @06:10PM (#26008039)

    A) Home Bandwidth is a sunk cost. Transferring it wouldn't have cost you more than a penny more than you are paying. Assuming you pay a flat rate.

    My time is not a sunk cost.

    B) Transferring the data would have made it available to you for free, anytime.

    Most of these datasets are hundreds of GB in size. That's going to take a long time to download and it's going to mean buying a new hard disk and/or deleting your pornography collection.

    The whole idea here is that if you are an AWS customer, and you're crunching a bunch of numbers, and need to crunch some census/genome/whatever data, you can type 'ec2-create-volume --snapshot <snapshotId>' and now that dataset can be attached to any EC2 instance. You don't have to wait to transfer the data in, and you don't have to pay the $0.10/GB to transfer the data in. The data sets are there for you when you need them.

    If you are not an AWS customer, then this isn't for you. Move along, now.
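
    The trade-off in the comment above can be put in rough numbers. The sketch below is only back-of-the-envelope arithmetic: the $0.10/GB transfer-in rate is the figure quoted in the comment, while the 300 GB data set size and the 10 Mbit/s home link are assumptions chosen for illustration, not figures from Amazon.

```python
# Rough arithmetic only. The per-GB rate comes from the comment above;
# the data set size and link speed below are illustrative assumptions.
TRANSFER_IN_USD_PER_GB = 0.10


def transfer_in_cost_usd(size_gb: float) -> float:
    """Cost of uploading your own copy of a data set at the quoted rate."""
    return size_gb * TRANSFER_IN_USD_PER_GB


def download_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Hours needed to move size_gb over a link of the given speed.

    Uses decimal units (1 GB = 8000 megabits) and ignores protocol overhead.
    """
    megabits = size_gb * 8 * 1000
    seconds = megabits / link_mbit_per_s
    return seconds / 3600


if __name__ == "__main__":
    size_gb = 300.0          # assumed: "hundreds of GB"
    link_mbit_per_s = 10.0   # assumed home connection speed
    print(f"transfer-in cost: ${transfer_in_cost_usd(size_gb):.2f}")
    print(f"download time: {download_hours(size_gb, link_mbit_per_s):.1f} hours")
```

    Under those assumptions, creating a volume from the hosted snapshot skips both the per-GB transfer charge and the multi-day wait, which seems to be the point the comment is making.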
