How Facebook Stores Billions of Photos
David Gobaud writes "Jason Sobel, the manager of infrastructure engineering at Facebook, gave an interesting presentation at Stanford for the Stanford ACM titled 'Needle in a Haystack: Efficient Storage of Billions of Photos.' Jason explains how Facebook efficiently stores ~6.5 billion images, in 4 or 5 sizes each, totaling ~30 billion files and 540 TB, while serving 475,000 images per second at peak. The presentation is now online here in the form of a Flowgram."
Facebook needs to add more processing capacity (Score:5, Interesting)
I put some short video clips on Facebook's video application (just stuff of my daughter for my friends and family to see). These are AVI files generated by my digital camera, about 20-30MB in size, lasting about 1-1.5 minutes each.
They uploaded pretty quickly, but then they were put in a queue to be encoded for their flash player. It took over 3 days for them to be online in my profile! It seems they don't need to just have large capacity for storage, but a bunch more CPU for processing.
Re:Not hard (Score:4, Interesting)
The issue isn't the number of bytes per second, it's the number of distinct requests. The data is _way_ bigger than will fit in memory, and hard disks can only do 100-150 seeks per second so you need a lot of them to serve from disk. A naive implementation will go to disk many times for a single file, because filesystems aren't designed for this many small files. So this is really an issue of getting exactly the right stuff in memory so you can serve hot content from memory, and if you go to disk you seek exactly once instead of several times.
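To make the "seek exactly once" idea concrete, here is a minimal sketch, not Facebook's actual format or code. It assumes many small photos are appended to one large file while a plain in-memory dictionary maps each photo ID to its offset and size. The names `NeedleStore`, `put`, and `get` are made up for illustration.

```python
import os
import tempfile


class NeedleStore:
    """Hypothetical sketch: one big append-only file plus an in-memory index."""

    def __init__(self, path):
        self.path = path
        # photo_id -> (offset, size). Lives entirely in RAM, so finding a
        # photo's location never touches the disk.
        self.index = {}
        # "a+b" creates the file if needed, appends on write, allows reads.
        self.f = open(path, "a+b")

    def put(self, photo_id, data):
        # Record where the blob will start, then append it to the file.
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(data)
        self.f.flush()
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]  # memory lookup only
        self.f.seek(offset)                  # the single disk seek
        return self.f.read(size)

    def close(self):
        self.f.close()
```

Usage would look like `store = NeedleStore(os.path.join(tempfile.mkdtemp(), "haystack.dat"))`, then `store.put("photo1", data)` and `store.get("photo1")`. A real system would also need to persist or rebuild the index after a restart and handle deletes, which this sketch skips. For scale: if every one of the 475,000 peak requests per second missed cache and cost one seek, at 100-150 seeks per second per disk that would take roughly 3,200 to 4,750 disks, which is why keeping hot content in memory matters so much.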
Re:Photos? You mean people use FB for photos too? (Score:3, Interesting)
I'd need to test it again, but I'm fairly certain FB had a function that let you share albums with non-users by having FB generate a special link you'd give to the user.
Re:Photos? You mean people use FB for photos too? (Score:1, Interesting)
Actually you can look at photos without an account. There's a link at the bottom of every album you own that allows people to see your album without having to log in or have an account with facebook.
There was even a link to allow people without facebook to see a limited profile, but that one is better hidden.
Re:Photos? You mean people use FB for photos too? (Score:3, Interesting)
It's nice to have principles but at the end of the day my friends come first. I can always (ad)block adverts. Oh no, what if they wheedle into my subconscious or the ToS change? Then I'll occasionally make a marginally worse purchasing decision. It's not like I never do that anyway.
Re:Photos? You mean people use FB for photos too? (Score:3, Interesting)
Sure, that's a nice idea. But of course then you're paying for it, and most likely so must all your friends and family if they want to share its best features with you. I think a social network built on that model would not grow large. It might fill a niche, but it would have nowhere near the utility of a free-to-join network that promotes sharing information.
What makes a social networking site really great can't happen unless there are a lot of people using it. The policies, shininess, and penetration of Facebook allow amazing results in short time frames. I've been on Facebook I think less than a year. I don't visit the site often, yet in that time I have regained contact with friends last seen during high school, played games with coworkers, learned about worthwhile charitable causes, hosted memes, and grown closer to people after learning about mutual interests that might not have come to light during normal conversation.
Consider an acquaintance of mine, a person I met several years ago. We've previously exchanged pleasantries and gotten along well at the odd party or around the neighborhood where I work, but never held a conversation about any Deep Topics or connected much more broadly than Shared Entertainment Experiences and Goofy Jokes. About two weeks ago our Facebook networks connected. Tonight I received an invitation to a philosophical roundtable discussion at a library across town. The topic promises to present new ideas and address questions and gaps in my web of understanding. A doorway opens to become better friends with good people. What a serendipitous opportunity! Maybe I would have heard about this event through another medium in a Facebookless world. I doubt it. I don't check the library's events calendar.
I know that Facebook consumes as much information about me as they can stuff into their considerable data hole. So I make sure to only provide information that I don't mind sharing to all and sundry. I don't accept friend requests from people I don't know in meat space. I hesitate to register with apps because I know they get access to everything. I wish Facebook would uncheck their permission boxes by default. But every such border is a barrier to information flow, and networks like Facebook thrive and grow, both in size and utility, on the free flow of information.
Free is the key. Every reward is born from risk.
Re:Already been done. (Score:1, Interesting)
What FB is doing is impressive, mainly because of their robust index/permissions model and because they are dealing with color and transcoding on-the-fly from all different formats. Of course they suck power and burn down servers because they are using scripting systems written by morons, but it allowed them to get their stuff done and out the door. Getting DataTree online was really, really hard! It's just a damned shame they don't have guys like us to go in and rewrite everything now for optimization. I don't know about you, but after what Harish paid me, I'd be looking for eight digits (firm) from FaceBook or Google to fix them and there is just no way those children who run those places are going to pay like that. They would gladly rather give the money to the server/data center people and keep buying jumbo jets. Gates would pay, but he'd be more likely to try and capture our knowledge for incorporation into .NET (as if) and then force us to use .NET. In the end, it doesn't matter --Google will be in every County Recorder's office soon enough. They'll make Sergey go to Data Tree, he'll buy the company from First American (under threat of getting court orders to suck the data through the county's public terminals or manually re-loading the database), then execute an import command to suck the TIFF files into Google Earth and boom end of Data Tree. Anyway, I don't know about you, but at least I'm not making cereal for a living in San Dimas with Harish!
Re:Photos? You mean people use FB for photos too? (Score:3, Interesting)
Sure, that's a nice idea. But of course then you're paying for it, and most likely so must all your friends and family if they want to share its best features with you.
Let's see, my ISP offers 'free' email, pop3, imap, and webmail access. They offer 'free' access to a reasonable number of usenet groups, and offer a small and fairly limited but entirely usable web hosting package, with tools to make it easy to set up multiple small websites, upload and share photos, and so on.
Is it really 'free'? Of course not, it's bundled in with my internet access so I'm paying for it. And while I have no guarantee that my ISP isn't reading my email, and processing my hosted content, that isn't their business model, and they aren't pasting ads up in my site or in my email.
I think a social network built on that model would not grow large. It might fill a niche, but it would have nowhere near the utility of a free-to-join network that promotes sharing information.
You mean the model email and usenet and the web itself were built on couldn't reach the critical mass of users to be really interesting and useful? Give me a break.
What makes a social networking site really great can't happen unless there are a lot of people using it.
Sounds a lot like email, and that's worked out just fine.
Lots of people using it. In fact, I can send messages to people at work, coordinate meetings, organize outings, exchange messages with friends, even grandparents. Some of them use ad-supported hosted services, some of them use paid services, some of them host their own services, all seamlessly interconnecting.
Consider an acquaintance of mine, a person I met several years ago...
I'm not arguing -against- social networking. I'm arguing against accepting facebook lock-in, becoming a product, and selling your information in exchange for features.
It's a fallacy that the only way we can have services like social networking or instant messaging is via accepting ridiculous lock-in and closed standards.
Next thing you'll be telling me there is no way to create a modern fully featured multi-user operating system and application suites that could be downloaded and used for free without either paying exorbitant prices for licensing or signing all your rights to the data on your PC away.
Oh wait... ;p
Re:Photos? You mean people use FB for photos too? (Score:3, Interesting)
I haven't used my ISP's mail since I lived on campus back in college. Before I got there, I did not use my dialup ISP's email service. Two reasons: a) email address lock-in; b) the interface sucks.
re: a - the same applies to gmail or any other provider.
re: b - there was likely no 'gmail' when you were 'back in college on your dialup isp', and most people used standalone clients, many still have little need for webmail.
Nowadays I've solved the lock-in problem by paying for a domain
makes sense.
and the sucky interface problem by having my MX records send all mail to a gmail account. The free-to-join, invasive, ad-supported gmail service works way better than any webmail, IMAP, or POP3 client I've found.
To each their own.
Personally I run a Linux server with Scalix community edition. It works great with my smart phone (push email support, address book sync, etc), has an excellent webmail client for the odd time I need one, and I mostly access my mail via Thunderbird. It works way better than any other solution I've found, has no privacy implications, is ad free, and it meets my needs and principles better than anything else, including gmail.
The cool thing about it though is that I can still send you a message without signing up for a gmail account. YOU can agree to their terms, and I can stand by mine, and we can still interact, exchange messages,
Lock-in? Hardly. I'm also on MySpace. I also use email. I also use IM -- Pidgin, so I don't get locked in to a specific IM service. I also use usenet, web forums, feedback forms, web chat, on and on. Different tools for different tasks. Facebook excels at the task of clustering my friends and exposing information about them.
You don't know what lock in is then.
I don't have or want a facebook account. If all your friends had accounts at different social networking sites, how well would facebook excel at 'clustering your friends and exposing information about them'?
It wouldn't.
The only way facebook excels is if everyone has a facebook account and agrees to facebook's terms of service.
The only way email excels is if everyone has an email account. The difference is that we can get an email account on any service we like, or even host our own, and it makes no nevermind. No matter where I get my email account you can send it messages.
While my refusal to submit to facebook means that I am excluded from that entirely because it won't interoperate with any other site. I know people with multiple accounts on multiple social networking sites, not because they have any desire to do so, but because each site gives them access to different groups of friends they can't access from the other site. THAT is the effect of LOCK-IN. I only need one email account to send to any other email provider. I might have multiple if I have a desire for multiple, but I don't need multiple.
My information costs me nothing to give away. My money costs me money to give away. I'd rather pay for services using a currency that copies on write than one with a 1:1 opportunity cost. Not that I share everything, obviously. Some information will cost to give away - my SSN for example. But most everything about me - my relationship status, my mood, my hobbies - I gain value by giving this information freely.
That's fine. To each their own. I however have little interest in submitting to facebook's terms of service. I publish what I want people to see on my websites. I'd be happy to integrate with facebook to the extent of letting people include me, message me, etc from their facebook account. But I don't want an account with facebook myself. But facebook is a closed system.
IM is a service that started with ridiculous lock-in and closed standards. I still used it then. Eventually a service will arise to tie together your Facebook and Myspace networks just like Trillian or Pidgin did for IM.
Trillian or Pidgin just lets you access your multiple accounts from a single applic