
The Computer Science Behind Facebook's 1 Billion Users 113

Posted by timothy
from the yet-they-keep-most-recent-instead-of-top-stories dept.
pacopico writes "Much has been made about Facebook hitting 1 billion users. But Businessweek has the inside story detailing how the site actually copes with this many people and the software Facebook has invented that pushes the limits of computer science. The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.' To keep Facebook moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."
This discussion has been archived. No new comments can be posted.


  • Terrible (Score:5, Informative)

    by thePsychologist (1062886) on Friday October 05, 2012 @07:43PM (#41564569) Journal
    The print version is available [businessweek.com].

    I don't recommend reading it. There is absolutely nothing in this article about the actual engineering problems behind scaling for this number of users and how these problems are solved. In fact, there is nothing technical at all in this article except for some vague descriptions of the "bootcamp".
  • It's rather clever (Score:5, Informative)

    by Animats (122034) on Friday October 05, 2012 @07:52PM (#41564625) Homepage

    It's actually a rather impressive setup. Some Facebook architects gave a talk in EE380 at Stanford a few years back. Originally, Facebook's architecture assumed that most "friends" would be regionally local, reflecting Facebook's college-campus origin. That's not how it worked out after some growth. So they have to assemble pages across regions and data centers. There's caching, but there's also active cache invalidation, which they can do because they control both sides of the cache. There's extensive inter-process communication, and it's not HTTP. There's a lot of PHP for the user-facing stuff, but it's compiled with their in-house compiler, not interpreted.
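    The active cache invalidation the parent mentions can be sketched in a few lines (a toy in-process model; Facebook's real setup spans memcached clusters and data centers, and every name below is invented for illustration). The key idea is that because the application controls both the cache and the writes, a write can purge the stale entry immediately instead of waiting for a TTL to expire:

```python
# Toy sketch of a write-through store with active cache invalidation.
# Assumed/invented names throughout; this is not Facebook's actual API.

class Cache:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value

    def invalidate(self, key):
        # Active invalidation: the writer removes the stale entry itself.
        self.store.pop(key, None)


class UserProfiles:
    """Database of record plus a cache it actively invalidates."""

    def __init__(self, cache):
        self.db = {}
        self.cache = cache

    def read(self, user_id):
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached                   # cache hit
        value = self.db.get(user_id)        # cache miss: go to the DB
        if value is not None:
            self.cache.put(user_id, value)  # populate cache on read
        return value

    def write(self, user_id, value):
        self.db[user_id] = value
        self.cache.invalidate(user_id)      # purge stale entry on write
```

    Controlling both sides is what makes this work: a third-party cache you can't reach on the write path would be stuck with TTLs and stale reads.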

    Facebook's purpose is banal, but the technology behind it is non-trivial.

  • Re:Oh bullshit. (Score:5, Informative)

    by stephanruby (542433) on Friday October 05, 2012 @08:06PM (#41564711)

    "Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."

    Ah, that's Zuckerberg's secret sauce, apparently: plenty of overtime for six weeks so that a new engineer can learn every bit of Facebook's code. This way, they can push the limits of computer science [wikipedia.org] (or disregard them completely) and ignore the lessons of The Mythical Man-Month [wikipedia.org].

    I cringe to think that many business people will actually take Businessweek's article seriously.

  • Re:Oh bullshit. (Score:5, Informative)

    by Dan East (318230) on Friday October 05, 2012 @08:15PM (#41564787) Homepage Journal

    Actually, Facebook's problem isn't trivial in any sense of the word. The complexity and joins of various database tables must be insane. With YouTube it's all about raw bandwidth, which actually is a fairly easy problem to solve, especially since 99% of that data is static. You just physically distribute it and throw money / resources at the problem. As far as database structure goes, any CS student should be able to reproduce the bulk of it in a single day. You have videos associated with users, and comments associated with videos, etc. The gist of it is straightforward.
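    The "any CS student in a day" point about a YouTube-style schema can be made concrete with a sketch like this (SQLite via Python's standard library; all table and column names are invented, not YouTube's actual schema):

```python
import sqlite3

# Minimal sketch of the kind of schema described above: videos belong
# to users, comments belong to videos. The hard part of a video site
# is serving the static bytes, not modeling this.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE videos   (id INTEGER PRIMARY KEY,
                       user_id INTEGER REFERENCES users(id),
                       title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY,
                       video_id INTEGER REFERENCES videos(id),
                       user_id INTEGER REFERENCES users(id),
                       body TEXT);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")
conn.execute("INSERT INTO videos VALUES (1, 1, 'cat video')")
conn.execute("INSERT INTO comments VALUES (1, 1, 2, 'nice cat')")

# Typical page query: a video's comments with commenter names.
rows = conn.execute("""
    SELECT v.title, u.name, c.body
    FROM comments c
    JOIN videos v ON v.id = c.video_id
    JOIN users  u ON u.id = c.user_id
""").fetchall()
```

    The joins here stay shallow because the data compartmentalizes cleanly by video, which is exactly the property Facebook's friend graph lacks.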

    Now let's talk about Facebook. There is no compartmentalization of the data. You've heard the "six degrees of separation", whereby any two people on the planet can be socially connected to one another in at most 6 steps. Well, with Facebook, the average degree of separation between any two people is 3.74. What that means is everyone is very closely networked, all the data is dynamic (or more specifically, the data the users really care about is the dynamic and most recent data), and since many people (myself included) open up their information to "friends of friends", there is a tremendous amount of data that any one person can potentially have access to. Even Google searches don't have this problem, because the bulk of the common search terms can be preprocessed for easy retrieval, and having data that's an hour or two old isn't a huge issue.
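    The "degrees of separation" figure is just shortest-path distance in the friendship graph, which a toy breadth-first search makes concrete (the graph data and function name here are invented; computing the 3.74 average over a billion users is the actual hard part):

```python
from collections import deque

def degrees_of_separation(friends, start, target):
    """Shortest number of friendship hops between two users via BFS.

    `friends` maps each user to the set of their friends.
    Returns None if the two users are not connected at all.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, dist = queue.popleft()
        for f in friends.get(user, ()):
            if f == target:
                return dist + 1
            if f not in seen:
                seen.add(f)
                queue.append((f, dist + 1))
    return None  # not connected
```

    On a tiny graph this is instant; at Facebook's scale even one BFS touches enormous swaths of a graph where nearly everyone is within four hops of everyone else.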

    So you have this massive database (1 billion users, each with many different types of associated data - posts, images, videos, things they've liked, things they've shared, etc, etc), and each of those 1 billion users has an entirely different set of friends from which recent (basically real-time) data must be polled - over, and over, and over again, all day long. Now, throw in the very complex privacy rules, as to which types of posts can be seen by which types of friends, groups, block lists, etc, and the problem becomes very, very complex. Sure, most of us could bang out something with that core functionality without too much difficulty, but to make it work nearly real-time for 1 billion users at once? That's an incredible undertaking.
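    A toy version of that feed-assembly problem might look like the sketch below (in-memory, with invented names and a deliberately simplified privacy model of "public" / "friends" / "friends_of_friends"; the real difficulty is doing this near-real-time, with far richer rules, for a billion users at once):

```python
import heapq

def visible_to(post, viewer, friends):
    """Toy privacy check against a simplified three-level model."""
    author = post["author"]
    if post["privacy"] == "public" or viewer == author:
        return True
    if viewer in friends.get(author, set()):
        return True  # direct friends see "friends" and broader posts
    if post["privacy"] == "friends_of_friends":
        # Visible if the viewer is a friend of any of the author's friends.
        return any(viewer in friends.get(f, set())
                   for f in friends.get(author, set()))
    return False

def build_feed(viewer, friends, posts_by_user, limit=10):
    """Pull-model feed: gather friends' posts, apply privacy rules,
    keep the newest `limit` items."""
    candidates = []
    for friend in friends.get(viewer, set()):
        for post in posts_by_user.get(friend, []):
            if visible_to(post, viewer, friends):
                candidates.append(post)
    return heapq.nlargest(limit, candidates, key=lambda p: p["time"])
```

    Even this toy shows why the problem compounds: every feed read fans out across a different set of friends, and every privacy check can itself touch the graph.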

  • Re:PHP (Score:5, Informative)

    by phantomfive (622387) on Friday October 05, 2012 @11:35PM (#41565921) Journal
    Their PHP compiler isn't secret. It's open source and freely available [github.com].

