Facebook's Prism, Soon To Be Open Sourced, Gives Hadoop Delay Tolerance
snydeq writes "Facebook has said that it will soon open source Prism, an internal project that supports geographically distributed Hadoop data stores, thereby removing the limits on Hadoop's capacity to crunch data. 'The problem is that Hadoop must confine data to one physical data center location. Although Hadoop is a batch processing system, it's tightly coupled, and it will not tolerate more than a few milliseconds delay among servers in a Hadoop cluster. With Prism, a logical abstraction layer is added so that a Hadoop cluster can run across multiple data centers, effectively removing limits on capacity.'"
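The "logical abstraction layer" the summary mentions can be sketched in miniature. This is purely a hypothetical illustration, not Prism's actual design: the class, mount table, and cluster names below are all assumptions. The idea is that one logical namespace is partitioned across data centers, so any single job resolves to, and runs entirely within, one low-latency cluster.

```python
# Hypothetical sketch (NOT Facebook's actual Prism code): a namespace router
# that maps paths in one logical Hadoop namespace onto physically separate
# sub-clusters, so no single job has to span a high-latency WAN link.

class NamespaceRouter:
    """Routes logical HDFS-style paths to the data center that owns them."""

    def __init__(self, mounts):
        # mounts: {logical path prefix -> cluster name}, matched by
        # longest prefix, like a mount table.
        self.mounts = dict(mounts)

    def resolve(self, path):
        best = None
        for prefix, cluster in self.mounts.items():
            if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
                best = (prefix, cluster)
        if best is None:
            raise KeyError("no cluster owns " + path)
        return best[1]

# Usage: a job consults the router once, then runs wholly inside one cluster.
router = NamespaceRouter({
    "/warehouse/ads": "cluster-west",
    "/warehouse/messages": "cluster-east",
})
print(router.resolve("/warehouse/ads/2012-08/part-0000"))  # cluster-west
```

The point of the sketch is that the millisecond-scale coupling stays inside each cluster; only the (latency-insensitive) namespace lookup crosses data-center boundaries.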
This changes everything (Score:1)
...but when?
Re:This changes everything (Score:5, Insightful)
Re:This changes everything (Score:4, Funny)
Sounds like it shouldn't be hard to...Hadooplicate?
Re: (Score:2)
soon
I think it's great (Score:1)
Love to see useful stuff open sourced, but part of me is annoyed it is Facebook doing it.
Facebook Prison (Score:1)
What? (Score:2)
Hard drives typically have O(ms) seek latency. Does this mean Facebook kept all its data in RAM before it built Prism?
Re: (Score:2)
Google does that; I wouldn't be surprised if Facebook does it too.
Re: (Score:2)
AltaVista was doing this way back. When the typical Windows desktop had 16-32 MB of RAM, they had a RAM cache of up to 64 GB.
(Relatively) lay explanation of bottleneck? (Score:2)
What is the sub-problem in running a Hadoop job that creates this bottleneck and requires such low latency? Is it something that could have been avoided from the start?
And how does a logical abstraction layer solve this problem (or, since the media reports predictably don't explain it, how *would* it) in a way that Hadoop's own programmers couldn't have achieved more easily within the application's code?