How Facebook Stores Billions of Photos
David Gobaud writes "Jason Sobel, the manager of infrastructure engineering at Facebook, gave an interesting presentation titled Needle in a Haystack: Efficient Storage of Billions of Photos at Stanford for the Stanford ACM. Jason explains how Facebook efficiently stores ~6.5 billion images, in 4 or 5 sizes each, totaling ~30 billion files and 540 TB, while serving 475,000 images per second at peak. The presentation is now online here in the form of a Flowgram."
Photos? You mean people use FB for photos too? (Score:5, Funny)
I thought it was created just so that you could have all your spam and silly forwards in one place.
Re:Photos? You mean people use FB for photos too? (Score:5, Insightful)
If you used the service, you'd know that Facebook privacy settings are actually implemented very well. For example, I set up an account for my mother so she can look at all her siblings' photos. She hasn't been bothered by anyone outside of the family, and is really enjoying the ability to communicate with everyone.
The best thing I can compare it to is AOL. It's got a built-in Email clone, IM service, Forums, Groups, and of course, profiles. But unlike AOL, Facebook is just a web page. There's no lock-in - it's more of a resource provider than a service provider.
Re:Photos? You mean people use FB for photos too? (Score:5, Funny)
I thought it was created just so that you could have all your spam and silly forwards in one place.
Then proceed to further prove the GP post by saying:
The best thing I can compare it to is AOL
Re: (Score:1)
Not everyone prides themselves on using a 'cool' isp.
Re:Photos? You mean people use FB for photos too? (Score:4, Funny)
Me too!
Not everyone prides themselves on using a 'cool' isp.
Re: (Score:2)
Re: (Score:2)
Seriously, MySpace is some of the worst software I've ever, ever seen
Re:Photos? You mean people use FB for photos too? (Score:4, Insightful)
If you used the service, you'd know that Facebook privacy settings are actually implemented very well.
Given that I can't look at my sister's photos without signing up for an account I'd say her privacy is being 'protected' solely to induce all her friends and siblings to sacrifice theirs by joining facebook.
I set up an account for my mother so she can look at all her siblings' photos.
You don't need facebook for that.
and is really enjoying the ability to communicate with everyone.
or that.
But unlike AOL, Facebook is just a web page. There's no lock-in - it's more of a resource provider than a service provider.
How exactly is requiring me to create and log in to a facebook account to view content someone else wants me to be able to see not lock-in?
That's like requiring me to create a gmail account to receive email from people with gmail accounts. Or requiring me to sign up to AOL to see websites hosted by AOL. Facebook is pretty much the definition of lock-in.
Re: (Score:3, Interesting)
I'd need to test it again, but I'm fairly certain FB had a function that let you share albums with non-users by having FB generate a special link you'd give to the user.
Re: (Score:2, Funny)
I totally agree with you about how having friends on facebook is highly offensive and joining it will lead to identity theft and involuntary permanent incarceration in guantanamo bay. My neighbour tried to give facebook fake details, and mark zuckerberg showed up and stabbed him in the eye.
Re: (Score:2)
I totally agree with you about how having friends on facebook is highly offensive and joining it will lead to identity theft and involuntary permanent incarceration in guantanamo bay.
Not all of us are willing to 'join free services' that exist for the sole purpose of collecting our information, profiling us, and selling us to advertisers in exchange for stupid web shinys. I don't have a gmail account either for the same reason. Google collects enough data on me against my will, without me handing it to them
Re: (Score:2)
Re: (Score:3, Interesting)
It's nice to have principles but at the end of the day my friends come first. I can always (ad)block adverts. Oh no, what if they wheedle into my subconscious or the ToS change? Then I'll occasionally make a marginally worse purchasing decision. It's not like I never do that anyway.
Re: (Score:3, Interesting)
Sure, that's a nice idea. But of course then you're paying for it, and most likely so must all your friends and family if they want to share its best features with you. I think a social network built on that model would not grow large. It might fill a niche, but it would have nowhere near the utility of a free-to-join network that promotes sharing information.
What makes a social networking site really gre
Re: (Score:3, Interesting)
Sure, that's a nice idea. But of course then you're paying for it, and most likely so must all your friends and family if they want to share its best features with you.
Let's see, my ISP offers 'free' email, pop3, imap, and webmail access. They offer 'free' access to a reasonable number of usenet groups, and offer a small and fairly limited but entirely usable web hosting package, with tools to make it easy to set up multiple small websites, upload and share photos, and so on.
Is it really 'free'? Of course not
Re: (Score:2)
Yeah my ISP provides all those things too. All the ISPs I've used have. My hosting provider provides the
Re: (Score:3, Interesting)
I haven't used my ISP's mail since I lived on campus back in college. Before I got there, I did not use my dialup ISP's email service. Two reasons: a) email address lock-in; b) the interface sucks.
re: a - the same applies to gmail or any other provider.
re: b - there was likely no 'gmail' when you were 'back in college on your dialup isp', and most people used standalone clients, many still have little need for webmail.
Nowadays I've solved the lock-in problem by paying for a domain
makes sense.
and the sucky int
Privacy is a problem ... (Score:2)
And given enough info, our information mining overlords will be able to predict what passwords you use, what sort of "private" proclivities you indulge in, etc. Then your Big Brother issues a subpoena for that shit, and you're f$cked.
Re: (Score:2)
I started to use Facebook, and it seemed to be quite a good way to share things.
Then I read this bit of their terms of use [facebook.com] :
They want your real name. (Score:2)
As your screen name.
That was enough to put me off.
Re: (Score:2)
You mean, unless they specifically set that album to be open to everybody. Which they can do.
By default, profiles and pictures are hidden unless you grant access, which you can do both by friending somebody and by sending/replying to a message. (However, access granted to a recipient merely by sending them a message is temporary)
Re: (Score:1)
I get updates like "Friend A has left a comment on photo X" with a link to the photo and comment - where photo X is in an album of person B - somebody I do not know. I can go view all the photos in that particular album. I'm not very up on how things at facebook work, but has Person B allowed full public access to their photos for me to do this?
Re:Photos? You mean people use FB for photos too? (Score:5, Insightful)
This stuff is cool either way, even if it is just "childish spam." Many of us only dream of working on something that will become this large scale.
Facebook started off (stolen idea or not) as a site with some php and a database. In the early years there were no applications or photos. They've managed to scale PHP beyond what most slashdotters will say PHP can even do. They've even contributed some of their stuff back to the PHP community. [facebook.com]
Look at some other similar 'home grown' sites that have had to quickly scale and invent stuff just to stay afloat.
Archive.org has their Petabox [archive.org]
Google has their Google File System [google.com] and all of their own hardware design.
Hopefully the site will recover. 540TB of data and 500k images per second while at the same time being able to process photos near instantly in the background to 4-5 different sizes is nothing to ignore. Fortune 500 companies could probably learn a thing or two...
Already been done. (Score:5, Informative)
This stuff is cool either way, even if it is just "childish spam." Many of us only dream of working on something that will become this large scale.
...Fortune 500 companies could probably learn a thing or two...
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
Obviously Datatree patented their setup, so they decided to use a very different implementation that works significantly differently to avoid having to pay royalties...
Re: (Score:2)
Serving images alone is very different from accepting 100 million uploads weekly.
Also, they were unable to withstand you linking to them.
Re: (Score:1, Funny)
Fortune 500 companies could probably learn a thing or two...
Re: (Score:3, Funny)
Facebook is where you spam your friends with pointless messages about how you've hurled a squirrel at them.
I dunno. (Score:5, Funny)
But seeing as how this just got posted and already it's Slashdotted, I'll bet it's not the same way Flowgram stores its presentations.
Re:I dunno. (Score:4, Informative)
Re:I dunno. (Score:5, Funny)
Re:I dunno. (Score:5, Funny)
In other news, companies in the UK reported record productivity this afternoon.
Re: (Score:3, Informative)
To view the slideshow . . err I mean 'flowgram' (whatever the fuck that's supposed to mean), you don't need to register.
Re:I dunno. (Score:5, Insightful)
In the late 90's we stopped using documents (with images and text), because they had the following disadvantages:
1) Printable
2) Searchable
3) You could look over them at a glance to find information
We replaced them by the fabulous presentation with voice-over.
It removed part of the ability to scan over information, to search, and to print.
Unfortunately, it still had the disadvantage of letting the user seek to some part of the presentation, so another iteration was needed.
Now, welcome to the 21st century. Thanks to flowgram, you don't have to worry about printing anymore (you can't), or searching (you can't), or even pausing, going forward, or doing anything (you can't).
If you get a phone call in the middle of the presentation, tough luck. And of course, you have no way of knowing how long it is, how long is left, or anything. And if you miss a word or a sentence, you can always restart the presentation and listen more carefully the next time.
I must congratulate the folks over at flowgram.com. It seems very hard to have some idea that could be less usable. I'm pretty sure there is someone somewhere working hard at this, and some VC will give him money for that, but, for now, if you want to put a shitty unusable presentation online, flowgram is the way to go.
Re: (Score:2)
How X Stores Billions of Photos (Score:5, Funny)
Re: (Score:2)
Cue the cuneiforms of cute girls with no acumen queuing up to watch Cusack's cucumber
Cheers!
--
Vig
he has a premium acct... (Score:5, Funny)
at Flickr
FLASH?! (Score:4, Funny)
"You either have javascript turned off or you have an older version of Adobe Flash."
That was an informative article but I didn't see anything about Facebook. At least there weren't ads and they kept it to one page!
Re: (Score:1)
Re: (Score:2)
They should teach Firefox and Opera how to play video directly. It's not much harder than displaying an image file.
Re: (Score:3, Insightful)
I think it's becoming part of the HTML5 spec; however, it's tremendously more complicated due to the limitless plethora of video formats. With web-oriented images, it's almost all jpegs for photos and typically pngs for graphics, with plenty of gifs around. Tiff is a very established format but never sees use in websites since the files are stupidly large, and most other formats are specific to some editing program. With video, you've got half a dozen Quicktime formats, DivX, XviD, h.264, x264, WMV, Real
Re: (Score:2)
Opera has some support already for embedding video without plugins.
Re: (Score:2, Funny)
Re: (Score:2, Informative)
Re: (Score:3, Informative)
Worked for me from Ubuntu.
Very interesting (Score:3, Informative)
Re: (Score:2)
Slashdotted (Score:5, Funny)
Does anyone see the irony in Flowgram's demonstration?
Flowgram Guy 1: "OK, this is how Facebook stores billions of photos and serves thousands of them each second"
Flowgram Guy 2: "Cool, maybe we should implement that technology"
Flowgram Guy 1: "Why? It's not as if we're ever going to have our servers swamped with thousands of requests..."
Re: (Score:2, Insightful)
That's for sure.
Plus, when their server (singular?) finally responded to me, it requires a later version of Flash than I have. So I can't read the presentation at all. Way to not get the word out, folks.
Re:Slashdotted (Score:5, Funny)
The peak is a paltry 0.45e6/s? (Score:4, Funny)
Let's all go look at pictures on fb from 12 noon EST to 12:05 EST. That ought to show them...
I <3 Myspace hunni!
Cheers!
Re: (Score:2)
Re: (Score:2)
Or 450e3, if you really like engineering notation.
Transcript? (Score:5, Insightful)
I don't suppose there's a transcript of this anywhere, is there? That + slides would be infinitely more useful....
Full sized images, please (Score:5, Insightful)
I wish that facebook wouldn't resize its images on the backend. My friends all post pictures from parties/trips, etc. there, and I'd love to be able to just download the full res version to send off to be printed, but facebook resizes the largest dimension to be ~600px, which is pretty worthless for printing.
Yeah yeah, there are other sites that don't, and I post my stuff there (to flickr, personally), but convincing that one person who took the nice photo of you to do it too is near impossible.
Re: (Score:1)
Re: (Score:2)
Trust me, I do ask my friends for them, and they usually get around to it. With all the drive space they have at facebook and as fat as internet connections are for my friends (either at home or at school), the extra space/time would be insignificant compared to the utility of just being able to go, "print". Besides, if facebook integrated it in the site, they could probably make a killing on letting people print directly from the interface (and take a cut along the way)
There is already a site where my frie
Re: (Score:2)
Aside from the fact that most photos on facebook are blurry drunken crap, the copyright and privacy settings coming from that kind of thing would get VERY weird VERY fast. Facebook profiting from selling my photos? Nuh-uh, I don't think so. If they want to display an ad alongside them as my friends view them, I'm okay with that - it's understood as part of using the service; without at least some sort of profit sharing, that would be a big no-no. Maybe if they want to tie in SmugMug or something that I
Re: (Score:2)
>> Aside from the fact that most photos on facebook are blurry drunken crap
If they were blurry and drunken, I wouldn't want them. I'm thinking things like graduation group photos or pictures from study abroad type stuff.
>> copyright issues
They could make it opt-in or add it to the privacy settings "Allow [GROUP OF PEOPLE] to print photos". Or not even have the photo printing, just offer an 'original resolution' option. There's a number of ways they could work around that issue, the problem is th
Re: (Score:2)
Has nobody done this with a Facebook app? The notable hurdle is that people would have to opt-in before uploading pictures.
Re: (Score:2)
That's the big thing. I'd knock something together myself, but if they are storing full resolution images in the DB, they're not exposed to the API
Re: (Score:2)
That's the big thing. I'd knock something together myself, but if they are storing full resolution images in the DB, they're not exposed to the API
You'd need to intercept on upload and store them on your own server. I'm not familiar enough with the Facebook API (I just read the whitepaper when it came out, that's about it) to know if you can intercept core modules.
If you have to add your own image upload app, that raises the hurdle even higher. If your model has enough value, that might not be a problem.
Re: (Score:2)
Re: (Score:2)
Their current photos have a maximum dimension of 604(ish) pixels on their longest side. Maybe increase by a factor of 4 or 9? Even 4.5PB isn't out of the realm of feasibility, it's only 4,500 1TB drives :)
(I know I know about the drives and the difference between server drives and consumer drives, and how contention on individual drives could bring the thing to a grinding halt)
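The back-of-the-envelope estimate above can be redone as a rough sketch. It assumes (and this is only an assumption) that the whole ~540 TB figure from the story summary scales linearly with the pixel-count factor, and it ignores redundancy, hot spares and per-drive I/O limits; the function name is made up for illustration.

```python
import math

CURRENT_TB = 540   # total storage quoted in the story summary
DRIVE_TB = 1       # the 1TB drives mentioned in the comment

def scaled_storage_tb(factor: int) -> int:
    """Storage in TB if every stored file grew by `factor` (assumption)."""
    return CURRENT_TB * factor

for factor in (4, 9):
    total_tb = scaled_storage_tb(factor)
    drives = math.ceil(total_tb / DRIVE_TB)
    print(f"x{factor}: {total_tb / 1000:.2f} PB, about {drives} drives of {DRIVE_TB} TB")
```

Starting from roughly 500 TB instead of 540 TB, the factor-of-9 case lands on the 4.5PB quoted above.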
Re: (Score:3, Informative)
It's not ideal, but it works quite well. A friend of mine is a professional photographer and she puts all her work up there. Works well for her.
Re: (Score:2)
It's not me that's the problem, it's my friends.
Does it integrate with the core photo app so that when you hit the 'photos of xxx' or 'photos of you and xxx' button it shows both the core photos and the big photos? Can you tag users that don't have the bigphoto app? If not, then it just won't fly.
Re: (Score:2)
Yup. I wish Facebook would implement some way of performing metadata sharing with external photo sites. I can understand not wanting to store the originals of images unless users paid (and I don't think anyone would pay for photo hosting from Facebook...), but it would be nice if there were a way for a photo posted on an external site (like Flickr or SmugMug) to appear on a person's Facebook profile *in an integrated manner*.
There are Facebook apps that put galleries from other sites (flickr, SmugM
Re: (Score:2)
I figure if google can offer me 6 gigs in exchange for advertisements, facebook can toss me some storage space in exchange for my social profile. It would be nice though, I agree.
Flowgram slashdotted? (Score:1, Funny)
Flowgram serving 475000 /. users flawlessly, now that would be impressive.
540TB / 30 billion images (Score:4, Informative)
RS
Re:540TB / 30 billion images (Score:5, Informative)
Looks like beta.flowgram.com should be (Score:2, Funny)
Not hard (Score:5, Insightful)
While the article is slashdotted, this is not a hard problem. It has an expense involved, but it is not difficult.
So, as another poster implied, 18K per photo on average, so about 8Gig per second, peak.
So, assuming that the pictures are evenly distributed, you'd need a bunch of machines and a good number of "tubes" and a way of directing requests to the correct image server or server cluster.
So, what's the problem? Why would you think this is difficult? It's all off the shelf technology, just a bunch of it.
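For reference, the 18K and 8 Gig figures above follow directly from the numbers in the story summary. A minimal sketch of the arithmetic, assuming decimal units (1 TB = 10^12 bytes) and a uniform average file size, neither of which the source confirms:

```python
# Figures quoted in the story summary.
TOTAL_BYTES = 540e12              # ~540 TB stored (decimal TB assumed)
TOTAL_FILES = 30e9                # ~30 billion files
PEAK_REQUESTS_PER_SEC = 475_000   # images served per second at peak

avg_file_bytes = TOTAL_BYTES / TOTAL_FILES
peak_bytes_per_sec = avg_file_bytes * PEAK_REQUESTS_PER_SEC

print(f"average file size: {avg_file_bytes / 1e3:.0f} KB")
print(f"peak throughput:   {peak_bytes_per_sec / 1e9:.2f} GB/s")
print(f"same in bits:      {peak_bytes_per_sec * 8 / 1e9:.1f} Gbit/s")
```

Note that the result is in gigabytes per second; expressed in bits it is eight times larger.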
Re: (Score:1)
Because the access to the pictures is _not_ evenly distributed. Worse, it's also not consistent.
Now, the question is, is it evenly distributed _enough_, or consistent _enough_. My guess is that it is, at best, _barely_ so, to the point that each backend system needs to be able to handle 2-3 times what the peak would be if it was evenly distributed; that's just a WAG, though. Hopefully the presentation answers that question.
I find (Score:5, Insightful)
But if you already know everything, by all means, shoot. But the outline that just got you modded as insightful isn't an application, didn't detail redundancy of any sort and would be a management nightmare (ie, all the interesting stuff).
I mean really, we could propose that solution to just about any web based application but that's hardly the story, is it?
Re: (Score:2)
When you are talking 500Tb, you hit limits (Score:5, Insightful)
No, I don't work for them, but I do work for another company facing similar storage/distribution problems. When things get this big, it's not simply "take what works and just make it bigger or get more of them"; you have to start redesigning things. For a bad car analogy: it's like saying a passenger train is just a bunch of Greyhound buses.
tm
Re: (Score:2)
Limits, like: NetApp filers max out at 16Tb (raw) per volume,
Then use more than one.
Use multiple IP addresses and pipes. Balance the images based on popularity. Use redundant storage, hell even use rsync to keep images redundant.
None of this stuff is rocket science. It is all just an erector set.
I do this stuff for a living and there are much harder problems than this.
Now, if you were transcoding the images on the fly, that might be more fun.
Re:Not hard (Score:4, Interesting)
The issue isn't the number of bytes per second, it's the number of distinct requests. The data is _way_ bigger than will fit in memory, and hard disks can only do 100-150 seeks per second so you need a lot of them to serve from disk. A naive implementation will go to disk many times for a single file, because filesystems aren't designed for this many small files. So this is really an issue of getting exactly the right stuff in memory so you can serve hot content from memory, and if you go to disk you seek exactly once instead of several times.
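To put rough numbers on the seek argument above, here is a small sketch. The 100-150 seeks per second and the 475,000 requests per second come from the thread and the summary; the cache hit rates and the seeks-per-file counts are purely illustrative assumptions, and `disks_needed` is a made-up helper.

```python
import math

PEAK_REQUESTS_PER_SEC = 475_000   # peak figure from the story summary
SEEKS_PER_DISK_PER_SEC = 125      # midpoint of the 100-150 range above

def disks_needed(seeks_per_request: int, cache_hit_rate: float) -> int:
    """Disks required if only cache misses reach disk (ignores redundancy)."""
    disk_requests = PEAK_REQUESTS_PER_SEC * (1.0 - cache_hit_rate)
    return math.ceil(disk_requests * seeks_per_request / SEEKS_PER_DISK_PER_SEC)

# Compare a naive filesystem layout (assumed 3 seeks per file)
# with a layout that needs exactly one seek per file.
for seeks in (3, 1):
    for hit_rate in (0.0, 0.9):
        print(f"{seeks} seek(s)/file, {hit_rate:.0%} served from memory: "
              f"{disks_needed(seeks, hit_rate)} disks")
```

Under these assumptions, both levers the comment names (serving hot content from memory, and seeking once per miss) cut the spindle count by a large constant factor.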
Re: (Score:2)
Most operating systems will do read-ahead caching. If you use something like the sendfile system call then they will swap the entire file in in a single read (if it's only 18KB then this takes a maximum of five seeks, and typically just one). It will then keep it in RAM for a bit, so repeated access to the same photo will be faster.
If they were clever, they would put related photos contiguously on disk and grab them with a single read. If they were really clever then they'd use progressive encoding so
Akamai? (Score:3, Insightful)
Re: (Score:2, Informative)
Re: (Score:2)
Re: (Score:3, Informative)
That won't work considering the number of files. Given the quote (which required nearly a year of hassle with the Akamai morons and signing an NDA, thus the AC post) we got from those idiots, it would cost us almost $200k/year given our bandwidth use to store ~1,000 files. Facebook has 30 billion files and assuming the same price per file as we were quoted, Akamai would charge $6,000,000,000,000/year to host them. To put that number in perspective, that's more than the GDP of Germany plus that of the
Re: (Score:2)
If you watched the presentation you'd know that they do use Akamai as the first layer of their infrastructure.
FaceBook photo viewing is SLLLOOOOOWWWWWW... (Score:2)
Re: (Score:2)
Facebook needs to add more processing capacity (Score:5, Interesting)
I put some short video clips on Facebook's video application (just stuff of my daughter for my friends and family to see). These are AVI files generated by my digital camera, about 20-30MB in size, lasting about 1-1.5 minutes each.
They uploaded pretty quickly, but then they were put in a queue to be encoded for their flash player. It took over 3 days for them to be online in my profile! It seems they don't need to just have large capacity for storage, but a bunch more CPU for processing.
Next paper (Score:5, Funny)
server load fixed (Score:2, Informative)
How Facebook Stores Billions of Photos? (Score:3, Funny)
Facebook? (Score:2)
User-mode GoogleFS (Score:5, Informative)
(summarizing the big long presentation)
They basically want to make a user-mode GoogleFS. Their biggest problem is reducing reads - which are hampered by POSIX file standards (inodes, metadata, etc...)
Instead they use a database-like index/data file arrangement. The index stays in memory and files are stored together in large contiguous spaces in a single file. It's possible to utilize a LUN for storage - but not there yet.
There... where's my cookie?
(Oddly enough - I'm writing the exact same code they are... bizarre world, eh??)
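The index/data file arrangement summarized above can be illustrated with a toy sketch. This is not Facebook's implementation: the class name, methods and on-disk layout are invented for illustration, and a real store would also need checksums, deletion handling and a way to rebuild the index after a restart.

```python
import os
import tempfile

class TinyHaystack:
    """Toy photo store: one large append-only data file plus an in-memory index."""

    def __init__(self, path):
        self._index = {}                 # photo_id -> (offset, size), kept in RAM
        self._file = open(path, "a+b")   # writes always append; reads may seek

    def put(self, photo_id, data):
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()       # new photo starts at the current end
        self._file.write(data)
        self._file.flush()
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self._index[photo_id]   # memory lookup, no disk metadata
        self._file.seek(offset)                # one seek ...
        return self._file.read(size)           # ... and one contiguous read

    def close(self):
        self._file.close()

with tempfile.TemporaryDirectory() as tmp:
    store = TinyHaystack(os.path.join(tmp, "haystack.dat"))
    store.put(1, b"first photo bytes")
    store.put(2, b"second photo")
    print(store.get(1))
    store.close()
```

The point of the design is that locating a photo costs a dictionary lookup rather than a walk through directory entries and inodes, so each read that misses the cache needs only a single disk seek.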
What a waste (Score:2)
Ahh haha (Score:2, Informative)
Re: (Score:2)
It's just an online presentation where you can actually click on the links in their slides.
Re: (Score:2)
Some kind of problem with their Engrams
Better check what KSW says to do when slashdotted ...