How Facebook Stores Billions of Photos
Posted by CmdrTaco on Wed Jun 25, 2008 10:57 AM
from the laser-printer-and-a-warehouse-i-figure dept.
David Gobaud writes "Jason Sobel, the manager of infrastructure engineering at Facebook, gave an interesting presentation titled Needle in a Haystack: Efficient Storage of Billions of Photos at Stanford for the Stanford ACM. Jason explains how Facebook efficiently stores ~6.5 billion images, in 4 or 5 sizes each, totaling ~30 billion files, and a total of 540 TB and serving 475,000 images per second at peak. The presentation is now online here in the form of a Flowgram."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Photos? You mean people use FB for photos too? (Score:5, Funny)
I thought it was created just so that you could have all your spam and silly forwards in one place.
Re:Photos? You mean people use FB for photos too? (Score:5, Insightful)
If you used the service, you'd know that Facebook privacy settings are actually implemented very well. For example, I set up an account for my mother so she can look at all her siblings' photos. She hasn't been bothered by anyone outside of the family, and is really enjoying the ability to communicate with everyone.
The best thing I can compare it to is AOL. It's got a built-in email clone, IM service, forums, groups, and of course, profiles. But unlike AOL, Facebook is just a web page. There's no lock-in; it's more of a resource provider than a service provider.
Re:Photos? You mean people use FB for photos too? (Score:5, Funny)
I thought it was created just so that you could have all your spam and silly forwards in one place.
Then you proceed to further prove the GP post by saying:
The best thing I can compare it to is AOL
Re:Photos? You mean people use FB for photos too? (Score:5, Insightful)
This stuff is cool either way, even if it is just "childish spam." Many of us only dream to work on something that will become this large scale.
Facebook started off (stolen idea or not) as a site with some php and a database. In the early years there were no applications or photos. They've managed to scale PHP beyond what most slashdotters will say PHP can even do. They've even contributed some of their stuff back to the PHP community. [facebook.com]
Look at some other similar 'home grown' sites that have had to quickly scale and invent stuff just to stay afloat.
Archive.org has their Petabox [archive.org]
Google has their Google File System [google.com] and all of their own hardware design.
Hopefully the site will recover. 540TB of data and 500k images per second while at the same time being able to process photos near instantly in the background to 4-5 different sizes is nothing to ignore. Fortune 500 companies could probably learn a thing or two...
Already been done. (Score:5, Informative)
This stuff is cool either way, even if it is just "childish spam." Many of us only dream to work on something that will become this large scale.
...Fortune 500 companies could probably learn a thing or two...
I dunno. (Score:5, Funny)
But seeing as how this just got posted and already it's Slashdotted, I'll bet it's not the same way Flowgram stores its presentations.
Re:I dunno. (Score:4, Informative)
Re:I dunno. (Score:5, Funny)
Re:I dunno. (Score:5, Funny)
In other news, companies in the UK reported record productivity this afternoon.
Re:I dunno. (Score:5, Insightful)
In the late 90's we stopped using documents (with images and text), because they had the following disadvantages:
1) Printable
2) Searchable
3) You could look over them at a glance to find information
We replaced them by the fabulous presentation with voice-over.
It removed part of the ability to scan over information, to search, and to print.
Unfortunately, it still had the disadvantage of letting the user seek to some part of the presentation, so another iteration was needed.
Now, welcome to the 21st century. Thanks to Flowgram, you don't have to worry about printing anymore (you can't), or searching (you can't), or even pausing, going forward, or doing anything (you can't).
If you get a phone call in the middle of the presentation, tough luck. And of course, you have no way of knowing how long it is, how long is left, or anything. And if you miss a word or a sentence, you can always restart the presentation and listen more carefully the next time.
I must congratulate the folks over at flowgram.com. It seems very hard to come up with an idea that could be less usable. I'm pretty sure there is someone somewhere working hard at this, and some VC will give him money for that, but, for now, if you want to put a shitty unusable presentation online, Flowgram is the way to go.
How X Stores Billions of Photos (Score:5, Funny)
he has a premium acct... (Score:5, Funny)
at Flickr
FLASH?! (Score:4, Funny)
"You either have javascript turned off or you have an older version of Adobe Flash."
That was an informative article but I didn't see anything about Facebook. At least there weren't ads and they kept it to one page!
Slashdotted (Score:5, Funny)
Does anyone see the irony in Flowgram's demonstration?
Flowgram Guy 1: "OK, this is how Facebook stores billions of photos and serves thousands of them each second"
Flowgram Guy 2: "Cool, maybe we should implement that technology"
Flowgram Guy 1: "Why? It's not as if we're ever going to have our servers swamped with thousands of requests..."
Re:Slashdotted (Score:5, Funny)
The peak is a paltry 0.45e6/s? (Score:4, Funny)
Let's all go look at pictures on fb from 12 noon EST to 12:05 EST. That ought to show them...
I <3 Myspace hunni!
Cheers!
Transcript? (Score:5, Insightful)
I don't suppose there's a transcript of this anywhere, is there? That + slides would be infinitely more useful....
Full sized images, please (Score:5, Insightful)
I wish that Facebook wouldn't resize its images on the backend. My friends all post pictures from parties, trips, etc. there, and I'd love to be able to just download the full-res version to send off to be printed, but Facebook resizes the largest dimension to be ~600px, which is pretty worthless for printing.
Yeah, yeah, there are other sites that don't, and I post my stuff there (to Flickr, personally), but convincing that one person who took the nice photo of you to do it too is near impossible.
Not hard (Score:5, Insightful)
While the article is slashdotted, this is not a hard problem. It has an expense involved, but it is not difficult.
So, as another poster implied, 18K per photo on average, so about 8Gig per second, peak.
So, assuming that the pictures are evenly distributed, you'd need a bunch of machines and a good number of "tubes" and a way of directing requests to the correct image server or server cluster.
So, what's the problem? Why would you think this is difficult? It's all off the shelf technology, just a bunch of it.
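For anyone checking the 18K and 8Gig figures above, here is a rough back-of-the-envelope sketch using only the numbers from the story summary. It assumes decimal units (1 TB = 10^12 bytes) and that images served at peak are about the same size as the average stored file, which may not hold if most requests are for thumbnails. The variable names are just for illustration.

```python
# Figures quoted in the story summary.
TOTAL_BYTES = 540e12            # ~540 TB stored (assuming decimal terabytes)
TOTAL_FILES = 30e9              # ~30 billion files across all sizes
PEAK_IMAGES_PER_SEC = 475_000   # images served per second at peak

# Average size of one stored file.
avg_bytes_per_file = TOTAL_BYTES / TOTAL_FILES

# Peak throughput, ASSUMING served images match the average stored size.
peak_bytes_per_sec = avg_bytes_per_file * PEAK_IMAGES_PER_SEC

print(f"average file size: {avg_bytes_per_file / 1e3:.0f} KB")
print(f"peak throughput:   {peak_bytes_per_sec / 1e9:.2f} GB/s")
```

That gives 18 KB per file and roughly 8.5 GB per second at peak, which matches the estimate in the comment.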
I find (Score:5, Insightful)
But if you already know everything, by all means, shoot. But the outline that just got you modded as insightful isn't an application, didn't detail redundancy of any sort and would be a management nightmare (ie, all the interesting stuff).
I mean really, we could propose that solution to just about any web-based application, but that's hardly the story, is it?
When you are talking 500Tb, you hit limits (Score:5, Insightful)
No, I don't work for them, but I do work for another company facing similar storage/distribution problems. When things get this big, it's not simply "take what works and just make it bigger or get more of them", you have to start redesigning things. For a bad car analogy: it's like saying a passenger train is just a bunch of Greyhound buses.
tm
Facebook needs to add more processing capacity (Score:5, Interesting)
I put some short video clips on Facebook's video application (just stuff of my daughter for my friends and family to see). These are AVI files generated by my digital camera, about 20-30MB in size, lasting about 1-1.5 minutes each.
They uploaded pretty quickly, but then they were put in a queue to be encoded for their flash player. It took over 3 days for them to be online in my profile! It seems they don't need to just have large capacity for storage, but a bunch more CPU for processing.
Next paper (Score:5, Funny)
User-mode GoogleFS (Score:5, Informative)
(summarizing the big long presentation)
They basically want to make a user-mode GoogleFS. Their biggest problem is reducing reads, which are hampered by POSIX file standards (inodes, metadata, etc...)
Instead they use a database-like index/data file arrangement. The index stays in memory and files are stored together in large contiguous spaces in a single file. It's possible to utilize a LUN for storage, but they're not there yet.
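A minimal sketch of that index/data file arrangement, to make the idea concrete. This is not Facebook's code or on-disk format; the class name `HaystackSketch` and its methods are made up for illustration, and it leaves out things a real system would need (persisting the index, deletes, checksums, concurrency).

```python
import os
import tempfile


class HaystackSketch:
    """Toy append-only blob store: one big data file plus an in-memory index."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # photo_id -> (offset, size), kept entirely in RAM
        open(path, "ab").close()  # create the data file if it is missing

    def put(self, photo_id, data):
        # Append the blob to the end of the single data file.
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        # One seek and one read; no per-photo inode or directory lookup.
        offset, size = self.index[photo_id]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)


# Usage: two "photos" stored back to back in the same file.
store = HaystackSketch(os.path.join(tempfile.mkdtemp(), "haystack.dat"))
store.put("a", b"first photo bytes")
store.put("b", b"second")
print(store.get("b"))
```

The point of the layout is that the filesystem only ever sees one large file, so the per-photo metadata lookups that the summary blames on POSIX are replaced by a dictionary lookup in memory.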
There... where's my cookie?
(Oddly enough - I'm writing the exact same code they are... bazaar world, eh??)
Re:540TB / 30 billion images (Score:5, Informative)