The Computer Science Behind Facebook's 1 Billion Users
pacopico writes "Much has been made about Facebook hitting 1 billion users. But Businessweek has the inside story detailing how the site actually copes with this many people and the software Facebook has invented that pushes the limits of computer science. The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.' To keep Facebooking moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."
Oh bullshit. (Score:5, Insightful)
The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.'
Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video. Web search? Watson?
Facebook is utterly trivial compared to many problems out there.
Your first mistake... (Score:5, Funny)
...is looking for meaningful computer science discussion in a business magazine article.
Re: (Score:1)
Re:Your first mistake... (Score:4, Insightful)
At the risk of stating the obvious, an information technology problem is not the same as a computer science problem.
Re: (Score:2)
There's still useful research to be done there. Even binary search trees aren't a solved problem.
Re: (Score:2)
Facebook and "computer science" have little to do with one another, except to the extent that one has absorbed what others have done, rather like an amoeba.
Re:Your first mistake... (Score:5, Funny)
Re: (Score:2)
What the hell did I ever do to you?
commenting to remove an accidental 'redundant' mod. sorry.
Re: (Score:2)
Facebook's source code is PHP. Which is then compiled into C++ (complete with all assets) and then compiled into a native binary. Linking is a huge problem as it produces a huge (multi-gigabyte) executable that is run directly.
Deployment is another issue - I believe they use a form of Bittorrent to do it, and naturally, the scripts that update from one executable to another don't work completely across the entire server farm - so those failed deploym
the secret is simple (Score:4, Funny)
facebook.pl
it's just one script in perl.
Re: (Score:3)
Indeed. Handily proven by "To keep Facebooking moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."
a) that's a terrible idea, and b) the fact that it's even possible (if it is, sounds like business magazine bs to me) speaks volumes. I only work for Red Hat, we're pretty cool but we're hardly the biggest fish out there, and you can imagine the chaos if we tried that...I'm sure others can apply it to their com
Re:Oh bullshit. (Score:4, Funny)
Hard to believe it takes so long to learn Facebook's code. I work at Google, and I learned every bit of Google's code in one day.
I don't think I'm giving away the store when I tell you the bits were '0' and '1'.
Let's test you... (Score:2)
Bad News: There will be a test.
Good News: it is true / false.
Let's see how your scan-tron scores... R.I.P. [tmz.com]
Re: (Score:2)
I don't think I'm giving away the store when I tell you the bits were '0' and '1'.
Given the fact that '0' stands for 'not-evil' and '1' stands for 'evil', the important question of course is: did you count those 0's and 1's and what is their frequency?
Re: (Score:2)
LOL!
Thank you for that. It made my day!
-dZ.
Re: (Score:3)
Re: (Score:1)
After reading the details, I'm actually less impressed. varnish, apache, hadoop, php.. bfd
Re: (Score:1)
Bfd, huh? You should drop that Zuckerberg guy a line and let him know they can just fire 3,500 of some of the finest IT staff and programmers in the world.
"I think Facebook has the hardest information technology problem on the planet," says Mike Stonebraker, a computer scientist and longtime professor at the University of California at Berkeley. "A company like Google certainly does innovative stuff, but Facebook solves the harder problem."
Each day, Facebook processes 2.7 billion "Likes," 300 million photo uploads, 2.5 billion status updates and check-ins, and countless other bits of data, and uses that mass of transactions to guesstimate which ads to serve up.
And let's not forget, it's constantly figuring out which of those items to show each of those 1 billion variously connected people.
Maybe you could just handle all that high frequency trading for the large exchanges in your off-hours too, for extra cash.
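For a sense of scale, here is a back-of-the-envelope sketch that turns the daily figures quoted above into average per-second rates. The numbers are the ones from the article; the dictionary keys and the function name are made up for illustration, and real traffic would be burstier than a flat average suggests.

```python
# Average per-second rates from the daily totals quoted in the article.
SECONDS_PER_DAY = 24 * 60 * 60

# Daily totals as quoted; the key names are just labels for this sketch.
DAILY_COUNTS = {
    "likes": 2_700_000_000,
    "photo_uploads": 300_000_000,
    "status_updates_and_checkins": 2_500_000_000,
}


def per_second(daily_count):
    """Return the average number of events per second for a daily total."""
    return daily_count / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, count in DAILY_COUNTS.items():
        print(f"{name}: about {per_second(count):,.0f} per second on average")
```

That works out to about 31,250 Likes per second on average, before any peak-hour effect.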
Re:Oh bullshit. (Score:5, Informative)
"Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six-weeks learning every bit of Facebook's code."
Ah, that's Zuckerberg's secret sauce apparently: plenty of overtime for six weeks so that a new engineer can learn every bit of Facebook's code. This way, they can push the limits of computer science [wikipedia.org] (or disregard them completely) and ignore the lessons from the Mythical Man Month [wikipedia.org].
I cringe to think that many business people will actually take Businessweek's article seriously.
Re: (Score:2)
Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video. Web search? Watson?
Facebook is utterly trivial compared to many problems out there.
While I happen to agree with you, none of those other problems get daily front-page attention on slashdot. While facebook is one of the least interesting problems in computer science, it has been a staple of slashdot discussion ever since facebook became a staple of everyday conversation (or perhaps ever since the creator of facebook surpassed cmdrtaco in net worth).
Re:Oh bullshit. (Score:5, Informative)
Actually, Facebook's problem isn't trivial in any sense of the word. The complexity and joins of various database tables must be insane. With YouTube it's all about raw bandwidth, which actually is a fairly easy problem to solve especially since 99% of that data is static. You just physically distribute it and throw money / resources at the problem. As far as database structure, any CS student should be able to reproduce the bulk of it in a single day. You have videos associated with users, and comments associated with videos, etc. The gist of it is straightforward.
Now let's talk about Facebook. There is no compartmentalization of the data. You've heard the "six degrees of separation", whereby any two people on the planet can be socially connected to one another in at most 6 steps. Well, with Facebook, the average degree of separation between any two people is 3.74. What that means is everyone is very closely networked, all the data is dynamic (or more specifically, the data the users really care about is the dynamic and most recent data), and since many people (myself included) open up their information to "friends of friends", there is a tremendous amount of data that any one person can potentially have access to. Even Google searches don't have this problem, because the bulk of the common search terms can be preprocessed for easy retrieval, and having data that's an hour or two old isn't a huge issue.
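To make the "degrees of separation" idea concrete, here is a minimal sketch of how the number of hops between two people can be computed with a breadth-first search. The toy graph and the names in it are invented for illustration; nothing here reflects how Facebook actually measured its 3.74 average.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Return the number of friendship hops from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None  # the two people are not connected at all


# Hypothetical toy friendship graph (undirected, stored as adjacency sets).
FRIENDS = {
    "ann": {"bob"},
    "bob": {"ann", "cho"},
    "cho": {"bob", "dev"},
    "dev": {"cho"},
    "eve": set(),
}

if __name__ == "__main__":
    print(degrees_of_separation(FRIENDS, "ann", "dev"))
```

Breadth-first search visits people in order of distance, so the first time the target shows up is the shortest path. On a graph with a billion nodes and an average distance under four, even a few hops can touch a large share of the network, which is the point being made above.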
So you have this massive database (1 billion users, each with many different types of associated data - posts, images, videos, things they've liked, things they've shared, etc, etc), and each of those 1 billion users has an entirely different set of friends from which recent (basically real-time) data must be polled - over, and over, and over again, all day long. Now, throw in the very complex privacy rules, as to which types of posts can be seen by which types of friends, groups, block lists, etc, and the problem becomes very, very complex. Sure, most of us could bang out something with that core functionality without too much difficulty, but to make it work nearly real-time for 1 billion users at once? That's an incredible undertaking.
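Here is roughly what the "core functionality" looks like when banged out naively: pull recent posts from each friend, apply a per-post privacy check, and keep the newest few. This is a sketch under stated assumptions, not Facebook's design; the `Post` class, the three audience levels, and the block-list layout are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    timestamp: int
    text: str
    audience: str = "friends"  # hypothetical levels: "public", "friends", "only_me"


def can_see(viewer, post, friends, blocked):
    """Apply a toy privacy rule set to decide whether viewer may see post."""
    if post.author == viewer:
        return True
    if viewer in blocked.get(post.author, set()):
        return False
    if post.audience == "public":
        return True
    if post.audience == "friends":
        return viewer in friends.get(post.author, set())
    return False  # "only_me" or anything unrecognised


def build_feed(viewer, friends, posts_by_author, blocked, limit=10):
    """Poll every friend's posts, filter by privacy, return the newest first."""
    candidates = (
        post
        for author in friends.get(viewer, set())
        for post in posts_by_author.get(author, [])
        if can_see(viewer, post, friends, blocked)
    )
    return heapq.nlargest(limit, candidates, key=lambda p: p.timestamp)


if __name__ == "__main__":
    friends = {"ann": {"bob", "cho"}, "bob": {"ann"}, "cho": {"ann"}}
    posts = {
        "bob": [Post("bob", 1, "lunch"), Post("bob", 5, "secret", "only_me")],
        "cho": [Post("cho", 3, "photo")],
    }
    for post in build_feed("ann", friends, posts, blocked={}):
        print(post.timestamp, post.author, post.text)
```

Every page view here walks every friend's posts and re-checks every rule, which is fine for three users and hopeless for a billion. The hard part is doing the equivalent work without actually doing it this way.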
Re: (Score:3, Funny)
The other day I had to write a red-black tree in my CS152 class, now that's a tough problem!
Re:Oh bullshit. (Score:4, Interesting)
I wrote a red-black tree for fun the other day. What's the problem?
Re:Oh bullshit. (Score:5, Interesting)
Re: (Score:2)
I know they have at least one account that isn't "used", but simply sits there for that one time every 18 months or so a family member posts a picture that I actually have an interest in looking at.
I'm sure I'm not the only one out of those billion that doesn't give a crap about information being transmitted to me in real time.
Re: (Score:2)
And your opinion carries any weight, why? You realize that Facebook can pretty easily mine their massive data to link these duplicate Farmville accounts to the real accounts, right? This is pretty basic data analysis that companies like Facebook, Google, etc. can do. And their 1 billion active users is after taking out all the fake and duplicate accounts.
Re:Oh bullshit. (Score:5, Insightful)
You think so? One person in six on this earth, including infants and the elderly in developing countries without regular internet access, has an active Facebook account, do they? Facebook's numbers have never been properly audited; it's not in their best interests to do so. The more users they can claim, the better for them. I would agree with possibly a couple hundred million, but I have a really hard time believing much more than that.
Re: (Score:3)
Re: (Score:2)
Let's not forget product accounts, accounts to market products. Then redundant accounts, people who changed names. Tried and forgotten accounts, because it is so hard to erase your data. Of course there is always Facebook's rather crappy and falling share price: 'hmm', 1 billion users, that'll pump up the share price. So Facebook should have been honest with the investing public and declared how many active user accounts, not presented information to the public with an intent to deceive non-inside investors (those as
Re:Oh bullshit. (Score:5, Interesting)
It's made infinitely easier by being asynchronous and 99% reads. There are no timing issues. If a post is delayed to someone's screen by a minute or two, nobody dies.
It's not terribly difficult to make numerous (near infinite) read-only replicas of a database which are within tens of milliseconds of the primary; so that takes care of 99% of their problems.
Handling their write load is harder but keep in mind the vast majority of their accounts are idle; and again asynchronous writes make it much much easier. They can shove everything through a message queue and put heavy-weight sharding of the data behind that.
I think handling 100 Million banking customers in 2000 was infinitely harder than Facebook has it from a technical standpoint.
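A minimal sketch of the pattern described above: writes go into a queue and are applied later to a shard chosen by hashing the user ID, while reads are served from replicas that may lag behind. Everything here (the class name, the shard count, the in-memory dicts standing in for databases) is a hypothetical illustration, not anyone's production setup.

```python
import hashlib
from collections import deque

NUM_SHARDS = 4  # hypothetical shard count for the sketch


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to a shard index with a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store: queued asynchronous writes, sharded primaries, read replicas."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        self.queue = deque()
        self.primaries = [dict() for _ in range(num_shards)]
        self.replicas = [dict() for _ in range(num_shards)]

    def write(self, user_id, post):
        """Accept a write immediately; it is applied later by drain()."""
        self.queue.append((user_id, post))

    def drain(self):
        """Apply queued writes to the primary shard that owns each user."""
        while self.queue:
            user_id, post = self.queue.popleft()
            shard = self.primaries[shard_for(user_id, self.num_shards)]
            shard.setdefault(user_id, []).append(post)

    def replicate(self):
        """Copy each primary to its read replica (stands in for replication lag)."""
        for primary, replica in zip(self.primaries, self.replicas):
            replica.clear()
            replica.update({user: list(posts) for user, posts in primary.items()})

    def read(self, user_id):
        """Serve reads from the replica, which may be slightly stale."""
        replica = self.replicas[shard_for(user_id, self.num_shards)]
        return list(replica.get(user_id, []))


if __name__ == "__main__":
    store = ShardedStore()
    store.write("ann", "hello")
    print(store.read("ann"))  # the write has not been applied or replicated yet
    store.drain()
    store.replicate()
    print(store.read("ann"))
```

The stale read between `write` and `replicate` is the "nobody dies" window from the comment above. A banking system generally cannot accept that window, which is the contrast being drawn.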
Joins? There's your problem right there (Score:1)
"The complexity and joins of various database tables must be insane."
Nah. You simply put one user's data in one place (well, more than one place for redundancy, but two or 3, not lots of places).
To build a page you can ask each machine processing that person's data. You ask the machines processing their friends' data for that data, and build the page. Arrange your network so that groups of machines are in subnets, and place the user's data based on the connectivity onto machines in the subnet. So more connected us
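A rough sketch of the "ask the machines holding the friends' data" idea: each user lives on one machine, the friend list is grouped by machine, and the page is assembled from one request per machine rather than a join. The placement table, the machine names, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_machine(user_ids, placement):
    """Group user IDs by the machine that holds each user's data."""
    groups = defaultdict(list)
    for user_id in user_ids:
        groups[placement[user_id]].append(user_id)
    return dict(groups)


def build_page(viewer, friends, placement, machines):
    """Fetch each friend's posts with one batched request per machine.

    Returns the collected posts and the number of machines contacted.
    """
    requests = group_by_machine(sorted(friends[viewer]), placement)
    page = []
    for machine_id, user_ids in requests.items():
        store = machines[machine_id]  # stands in for a network call
        for user_id in user_ids:
            page.extend(store.get(user_id, []))
    return page, len(requests)


if __name__ == "__main__":
    # Hypothetical placement: well-connected users share a machine.
    placement = {"ann": "m1", "bob": "m1", "cho": "m2", "dev": "m1"}
    machines = {
        "m1": {"bob": ["bob: hi"], "dev": ["dev: yo"]},
        "m2": {"cho": ["cho: hey"]},
    }
    friends = {"ann": {"bob", "cho", "dev"}}
    page, machines_contacted = build_page("ann", friends, placement, machines)
    print(page, machines_contacted)
```

The better the placement tracks the friendship graph, the fewer machines a page has to contact, which seems to be where the comment was heading before it was cut off.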
Re: (Score:1)
Shrug. It's a hard problem, but it's hardly a unique problem. I work on Google ads backend. Superficially this is a very different system from Facebook, which is all frontend... and yet every problem you describe there is one that we have had to solve at similar scale. (Yes we have an order of magnitude fewer users for example, but our users have an order of magnitude more data, and it is easier to shard many small users than a smaller number of goliath users.)
Once you get big enough, the problems shared by
Re: (Score:1)
Facebook's problem IS trivial compared to the problems that deserve solving on this planet. Facebook does not solve a single real business problem (except their own). Technically, what they do may be a challenge but it doesn't contribute to anything except let some kids looking for attention share stuff that nobody cares about. If you ask me, this is a waste of resources, technically, intellectually, and energy-wise.
Re: (Score:2)
Not sure I completely see what you're saying making much sense.
1bn users accessing a DB which, yes, polls and manages a large amount of data, YES. However, you just slammed YouTube for doing pretty much the same thing, only exponentially better. Let's not forget massive amounts of comment management, video relevancy tools, algorithms that automatically scour video clips for copyright infringement and convert text to speech, and so on and so forth.
You bounced from the statement from " The complexity a
Re: (Score:2)
Unlike many other databases, errors can be tolerated in facebook. If a post gets lost or a connection or two dropped it really doesn't cost Facebook anything--and it's unlikely to be noticed. And downtime and retries are tolerated by the users.
Try running a real-time, financial system like credit card authorization & processing (which probably has more than 1 billion users), needs to balance at the end of the day and has response requirements measuring under 250 ms.
Facebook is just bett
Re: (Score:1)
Agreed. And the low threshold for acceptable eventual consistency and the lack of importance of the data (overall) make it less complex than it would otherwise seem.
Wall Street's types of issues make Facebook look like "Hello World."
Re: (Score:2)
Re: (Score:2)
The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.'
Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video. Web search? Watson?
Facebook is utterly trivial compared to many problems out there.
+5 insightful? Seems like Slashdot has a bug somewhere. Should be +5 Ignorant. Seriously... this is so wrong it's crazy.
Re: (Score:1)
1 billion users (Score:5, Funny)
I totally believe that Facebook has 1 billion users... because I am 4 of them.
Re: (Score:1)
Re: (Score:3)
Would it make you feel worse if the number was a "mere" 250m? Or 100m?
I am currently ignoring 2 different accounts, FWIW. Facebook keeps sending notifications of various uninteresting types to both, I assume that they are both considered "active".
I joined with a bunch of real life friends years ago, and it appears that about 1 in 10 ever post anything on a regular basis.
[Shrugs]
Re: (Score:2)
Re:1 billion users (Score:4, Insightful)
In short, there aren't a billion people on facebook--nowhere near it. An important fact for businesses that are looking to tap into a network of "real" people.
Re: (Score:3)
Put me down for 6.
Re: (Score:2)
Put me down for 0.5 since I check it 1 day per fortnight and I don't use any features besides app blocking, ignoring people, and rejecting photo tags.
Re: (Score:1)
Facebook ID for voting. Vote early and vote often!
PHP (Score:4, Insightful)
Re: (Score:3, Insightful)
Re: (Score:2)
Also Yahoo and Wikipedia, both of which are in the top five.
Re:PHP (Score:5, Informative)
Re:PHP (Score:5, Insightful)
PHP has proven to be the best web development kit. Its only persistent failure is the legacy growth of inconsistent API calls. For the rest, it's Turing complete, does scale well, and most of all is the best tuned hammer for the job. It delivers.
In effect, PHP is a huge C api with its own C like language constructs, a layer of abstraction which takes away the mundane and gets you building web sites.
Now C is hailed for its great power, and not made fun of because of its ability to make real crappy, insecure code.
PHP however is not hailed for its great power, and made fun of because of its ability to make real crappy, insecure code.
It's all a matter of perspective. The problem is low level programmers who can't live with the fact people make a billion dollars without obsessing over pointers or garbage collection.
Re: (Score:1)
PHP has proven to be the best web development kit.[citation required]
Re: (Score:1)
Re: (Score:1)
Good post AC. I think you are on to something with your last sentence too. Technical research on the web is a nightmare, because you have to parse the motivation behind the opinions and filter on that too. Low level programmers ... obsessing over pointers or garbage collection ... indeed. These people can come across as enormously well-informed but their opinions are often worthless outside their tiny, unknowable silos.
Re: (Score:2)
The problem is low level programmers who can't live with the fact people make a billion dollars without obsessing over pointers or garbage collection.
Facebook relies on C++ because PHP is too slow to use it for everything.
Re: (Score:2)
I can think of a few more which were still relevant around the introduction of PHP 5:
Re: (Score:1)
This is obviously not on topic, but what have I done that I should be listed as your foe?
Re: (Score:2)
Oh yes, please tell me all about the computer geniuses that wrote the PHP scripts that power facebook!
Well, I know PHP bashing is all the rage, so how about the computer geniuses at Facebook that wrote HipHop, their PHP-to-binary compiler?
I think it is a pretty cool technical thing (and according to their stats it dropped their CPU usage by some significant figure) - and even better, they open sourced it. Like they do with a lot of their stuff [facebook.com].
Centralized Social Networking = Difficult Problem (Score:1)
Social networking maps very nicely to decentralized resources.
(I know who my friends are, and I can scrape their RSS feeds by myself.)
When you try to cram all that into one data center, and then try to replicate that across many data centers in real time ... yep, you've got a problem.
The mistake is in the belief that it's an "information technology" problem.
Read the article, not much CS inside... (Score:2)
Re: (Score:1)
1 billion users, analyzed (Score:5, Funny)
Terrible (Score:5, Informative)
I don't recommend reading it. There is absolutely nothing in this article about the actual engineering problems behind scaling for this number of users and how these problems are solved. In fact, there is nothing technical at all in this article except for some vague descriptions of the "bootcamp".
another blowjob for zuckerberg (Score:1)
I'm sure there are some smart people working on how to mine every last drop of money out of our private lives at facebook, but IT?
Last I heard, fb uses mysql. That's not cutting edge CS.
It's rather clever (Score:5, Informative)
It's actually a rather impressive setup. Some Facebook architects gave a talk in EE380 at Stanford a few years back. Originally, Facebook's architecture assumed that most "friends" would be regionally local, reflecting Facebook's college-campus origin. That's not how it worked out after some growth. So they have to assemble pages across regions and data centers. There's caching, but there's also active cache invalidation, which they can do because they control both sides of the cache. There's extensive inter-process communication, and it's not HTTP. There's a lot of PHP for the user-facing stuff, but it's compiled with their in-house compiler, not interpreted.
Facebook's purpose is banal, but the technology behind it is non-trivial.
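For anyone unfamiliar with the term, here is a minimal sketch of active cache invalidation: because the same code path handles both the write and the cache, a write can evict the stale entry immediately instead of waiting for a timeout. The class and method names are hypothetical, and a plain dict stands in for the database; the talk's actual design is not described in enough detail here to reproduce.

```python
class InvalidatingCache:
    """Toy read-through cache that evicts an entry whenever its key is written."""

    def __init__(self, database):
        self.database = database  # a dict standing in for the backing store
        self.cache = {}
        self.misses = 0

    def get(self, key):
        """Return the cached value, loading it from the database on a miss."""
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.database.get(key)
        return self.cache[key]

    def put(self, key, value):
        """Write to the database, then actively invalidate the cached copy."""
        self.database[key] = value
        self.cache.pop(key, None)


if __name__ == "__main__":
    store = InvalidatingCache({"status:ann": "at lunch"})
    print(store.get("status:ann"))  # miss, loaded from the database
    print(store.get("status:ann"))  # hit, served from the cache
    store.put("status:ann", "back at work")
    print(store.get("status:ann"))  # miss again, because the write evicted it
    print("misses:", store.misses)
```

With a single process this is trivial. Spreading the invalidation across regions and data centers, as described above, is where it stops being trivial.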
Aside from the million petaquads Google deals with (Score:2)
Is what you meant.
Re: (Score:2)
That information technology problem has nothing to do with servers and storage.
The hardest information technology problem on the planet is: How do the Facebook execs stop the company going the way of Silicon Graphics (NYSE: SGI) - oh wait, no, (DELISTED by NYSE because the share price couldn't stay above $1: SGI); since the company creates no real value, and has done nothing but drop its price since IPO.
Six weeks? (Score:1)
With the way Facebook runs, surely it doesn't take much more than a six-hour lecture to learn.
The Computer Science Behind ... (Score:1)
If you believe there are actually 1 billion completely unique users on Facebook, then I need to ask that each of you turn over your Internet Licenses, power down your computers, and find a new hobby. You are just too dangerous to be allowed on the Internet without adult supervision.
Ever hear of bot nets?
You know...all those virus infected zombie computers that have been u
Billion Users? (Score:3)
How do they count these 'users'? I have six accounts myself and most people I know have at least two. Now, there is a poll for Slashdot; How many Facebook accounts do you have?
Re: (Score:2)
The article does not clearly say how they count them. However, it does suggest that it is not a real, accurate quantification of actual live accounts--more like a statistical figure.
In the article, the figure is compared to the United Nations announcing the population on earth, so I guess it involves a lot of extrapolation based on subscription rates and usage loads.
If you read the article, it's a bit comforting that they have absolutely no idea how many real people are actively using the system, nor which
Jack Meehoff (Score:2)
Facebook actually thinks these are one billion distinct humanoids? Zuck is stoopider than his investors look. As anyone who ever posted a signup sheet in a college dorm for IM softball can attest, at least a third of people who sign up for anything, anywhere, ever, are fake.
If it keeps breaking doesnt mean ur pushing limits (Score:2)
It just means it doesn't work well enough.
Facebook is the worst performing and most opaque large scale site with the worst interface that I use regularly.
Browsing photos, the most basic Facebook activity is still a pain and buggy as hell on a slowish connection, and they keep changing the damn interface just when you figured out the previous unintuitive change. The mobile website sucks, their Android app sucks, I don't know what the new iOS app is like. The interface has gone from simplicity to being clutte
Re: (Score:1)
Yeah, YouTube really nailed the comment system...
The really interesting thing is the number of (Score:1)
Not really one billion (Score:3)
It's not really one billion users. As any developer in any online service knows, the real figure is around 30% of the actual reported total. Still, it's no small challenge.
Wrong, long & boring (Score:2)
I would be interested in learning more about the software and hardware side of Facebook. But after 15 seconds of scrolling I hadn't seen any ... just a lot of tedious "gotta do this" journalism ... and gave up. LOOOOOONG BOOOORING
I cannot decide what is worse .. (Score:1)
.. hearing day after day about Facebook or Zuckerberg, seeing Zuckerberg's face in some *cough* "creative" way or hearing him heralded as some business guru (let's just say I disagree).
I think the face is the worst. I can live with the claims of him being an innovator, I got inured to that after decades worth of Microsoft marketing.
Hell, I may switch back to a text only browser for my news - speeds things up as well.