
The Computer Science Behind Facebook's 1 Billion Users

pacopico writes "Much has been made about Facebook hitting 1 billion users. But Businessweek has the inside story detailing how the site actually copes with this many people and the software Facebook has invented that pushes the limits of computer science. The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.' To keep Facebooking moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."
This discussion has been archived. No new comments can be posted.

  • Oh bullshit. (Score:5, Insightful)

    by Anonymous Coward on Friday October 05, 2012 @07:17PM (#41564365)

    The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.'

    Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video. Web search? Watson?

    Facebook is utterly trivial compared to many problems out there.

    • by MrEricSir ( 398214 ) on Friday October 05, 2012 @07:25PM (#41564413) Homepage

      ...is looking for meaningful computer science discussion in a business magazine article.

    • Indeed. Handily proven by "To keep Facebooking moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."

      a) that's a terrible idea, and b) the fact that it's even possible (if it is, sounds like business magazine bs to me) speaks volumes. I only work for Red Hat, we're pretty cool but we're hardly the biggest fish out there, and you can imagine the chaos if we tried that...I'm sure others can apply it to their com

      • by russotto ( 537200 ) on Friday October 05, 2012 @08:55PM (#41565085) Journal

        Hard to believe it takes so long to learn Facebook's code. I work at Google, and I learned every bit of Google's code in one day.

        I don't think I'm giving away the store when I tell you the bits were '0' and '1'.

        • I don't think I'm giving away the store when I tell you the bits were '0' and '1'.

          Bad News: There will be a test.
          Good News: it is true / false.
          Let's see how your scan-tron scores... R.I.P. [tmz.com]
        • I don't think I'm giving away the store when I tell you the bits were '0' and '1'.

          Given the fact that '0' stands for 'not-evil' and '1' stands for 'evil', the important question of course is: did you count those 0's and 1's and what is their frequency?

        • by dzfoo ( 772245 )

          LOL!

          Thank you for that. It made my day!

                    -dZ.

      • Oh yeah. I worked on an embedded project that had custom kernel code as well as over 2 million lines in system libraries. No one could possibly know every single line of that. The project I was in charge of there had maybe 200,000 lines of code, and I often had to rely on comments to remember what goes where! I had the misfortune of being the only team on an embedded processor and had to fix cross-platform issues with the system libraries too. It was a lot of work.
    • Re:Oh bullshit. (Score:5, Informative)

      by stephanruby ( 542433 ) on Friday October 05, 2012 @08:06PM (#41564711)

      "Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."

      Ah, that's Zuckerberg's secret sauce apparently: plenty of overtime for six weeks so that a new engineer can learn every bit of Facebook's code. This way, they can push the limits of computer science [wikipedia.org] (or disregard them completely) and ignore the lessons from The Mythical Man-Month [wikipedia.org].

      I cringe to think that many business people will actually take Businessweek's article seriously.

    • Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video. Web search? Watson?

      Facebook is utterly trivial compared to many problems out there.

      While I happen to agree with you, none of those other problems get daily front-page attention on Slashdot. While Facebook is one of the least interesting problems in computer science, it has been a staple of Slashdot discussion ever since Facebook became a staple of everyday conversation (or perhaps ever since the creator of Facebook surpassed CmdrTaco in net worth).

    • Re:Oh bullshit. (Score:5, Informative)

      by Dan East ( 318230 ) on Friday October 05, 2012 @08:15PM (#41564787) Journal

      Actually, Facebook's problem isn't trivial in any sense of the word. The complexity and joins of various database tables must be insane. With YouTube it's all about raw bandwidth, which actually is a fairly easy problem to solve especially since 99% of that data is static. You just physically distribute it and throw money / resources at the problem. As far as database structure, any CS student should be able to reproduce the bulk of it in a single day. You have videos associated with users, and comments associated with videos, etc. The gist of it is straightforward.

      Now let's talk about Facebook. There is no compartmentalization of the data. You've heard of "six degrees of separation", whereby any two people on the planet can be socially connected to one another in at most 6 steps. Well, with Facebook, the average degree of separation between any two people is 3.74. What that means is everyone is very closely networked, all the data is dynamic (or more specifically, the data the users really care about is the dynamic and most recent data), and since many people (myself included) open up their information to "friends of friends", there is a tremendous amount of data that any one person can potentially have access to. Even Google searches don't have this problem, because the bulk of the common search terms can be preprocessed for easy retrieval, and having data that's an hour or two old isn't a huge issue.

      So you have this massive database (1 billion users, each with many different types of associated data - posts, images, videos, things they've liked, things they've shared, etc, etc), and each of those 1 billion users has an entirely different set of friends from which recent (basically real-time) data must be polled - over, and over, and over again, all day long. Now, throw in the very complex privacy rules, as to which types of posts can be seen by which types of friends, groups, block lists, etc, and the problem becomes very, very complex. Sure, most of us could bang out something with that core functionality without too much difficulty, but to make it work nearly real-time for 1 billion users at once? That's an incredible undertaking.
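To make the "degrees of separation" and "friends of friends" points above concrete, here is a minimal sketch in Python. It is not Facebook's code; the toy graph, the names in it, and both function names are made up for illustration. It uses a plain breadth-first search to count friendship hops between two people, and a two-hop expansion to show how quickly the set of visible people grows.

```python
from collections import deque

# Hypothetical toy friendship graph (undirected, stored as adjacency sets).
GRAPH = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}


def degrees_of_separation(graph, source, target):
    """Return the number of friendship hops from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None  # the two people are not connected at all


def friends_of_friends(graph, person):
    """Everyone within two hops of person, excluding the person themselves."""
    direct = set(graph.get(person, ()))
    reach = set(direct)
    for friend in direct:
        reach.update(graph.get(friend, ()))
    reach.discard(person)
    return reach
```

On this toy graph, "ann" has two direct friends but three people within two hops. With hundreds of friends per user, the same two-hop expansion can reach tens of thousands of accounts, which is the scale problem the comment is describing.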

      • Re: (Score:3, Funny)

        by kestasjk ( 933987 ) *
        Pff. Apache + hadoop + mysql + varnish. Easy.

        The other day I had to write a red-black tree in my CS152 class, now that's a tough problem!
      • Re:Oh bullshit. (Score:5, Interesting)

        by jittles ( 1613415 ) on Friday October 05, 2012 @08:58PM (#41565113)
        Except that I don't believe they have 1 billion real users. They probably have 100m users and another 900m users in fake accounts people use to play Farmville, etc.
        • by bonehead ( 6382 )

          I know they have at least one account that isn't "used", but simply sits there for that one time every 18 months or so a family member posts a picture that I actually have an interest in looking at.

          I'm sure I'm not the only one out of those billion that doesn't give a crap about information being transmitted to me in real time.

        • And your opinion carries any weight, why? You realize that Facebook can pretty easily mine their massive data to link these duplicate Farmville accounts to the real accounts, right? This is pretty basic data analysis that companies like Facebook, Google, etc. can do. And their 1 billion active users is after taking out all the fake and duplicate accounts.

          • Re:Oh bullshit. (Score:5, Insightful)

            by Intrepid imaginaut ( 1970940 ) on Saturday October 06, 2012 @02:25AM (#41566381)

            You think so? One person in six on this earth, including infants and the elderly in developing countries without regular internet access, has an active Facebook account, do they? Facebook's numbers have never been properly audited; it's not in their best interests to do so. The more users they can claim, the better for them. I would agree with possibly a couple hundred million, but I have a really hard time believing much more than that.

            • It's actually 1 in 7 now ;) I used to work for an ISP back when there were lots of them, and we used to offer one month free for new members. Most people quit after the first month, but that didn't stop us advertising how many customers we had in our database. I'd be willing to bet a lot of money FB is doing the same thing. I have 3 FB accounts myself, one I use, one I use for signing up to all those crap services which only let you use an FB account (hello Spotify) and like to spam your status, and another for
        • by rtb61 ( 674572 )

          Let's not forget product accounts, accounts to market products. Then redundant accounts, people changed names. Tried-and-forgotten accounts, because it is so hard to erase your data. Of course there is always Facebook's rather crappy and falling share price: 'hmm' 1 billion users, that'll pump up the share price. So Facebook should have been honest with the investing public and declared how many active user accounts, not presented information to the public with an intent to deceive non-inside investors (those as

      • Re:Oh bullshit. (Score:5, Interesting)

        by rtaylor ( 70602 ) on Friday October 05, 2012 @09:00PM (#41565131) Homepage

        It's made infinitely easier by being asynchronous and 99% reads. There are no timing issues. If a post is delayed to someone's screen by a minute or two, nobody dies.

        It's not terribly difficult to make numerous (near infinite) read-only replicas of a database which are within tens of milliseconds of the primary; so that takes care of 99% of their problems.

        Handling their write load is harder but keep in mind the vast majority of their accounts are idle; and again asynchronous writes make it much much easier. They can shove everything through a message queue and put heavy-weight sharding of the data behind that.

        I think handling 100 Million banking customers in 2000 was infinitely harder than Facebook has it from a technical standpoint.
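The "message queue in front of sharded data" idea in the comment above can be sketched in a few lines of Python. This is a toy, single-process illustration under stated assumptions, not a description of Facebook's actual system: `NUM_SHARDS`, `shard_for`, and `ShardedStore` are hypothetical names, the shards are plain dicts, and the "queue" is an in-memory deque rather than a real broker.

```python
import hashlib
from collections import deque

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user id to a shard index with a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Writes are queued first and applied to the owning shard later."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]
        self.pending = deque()  # stands in for a message queue

    def post(self, user_id, text):
        """Accept a write immediately; it becomes visible after drain()."""
        self.pending.append((user_id, text))

    def drain(self, limit=None):
        """Apply queued writes to their shards and return how many were applied."""
        applied = 0
        while self.pending and (limit is None or applied < limit):
            user_id, text = self.pending.popleft()
            shard = self.shards[shard_for(user_id, self.num_shards)]
            shard.setdefault(user_id, []).append(text)
            applied += 1
        return applied

    def read(self, user_id):
        """Read a user's posts from the single shard that owns them."""
        shard = self.shards[shard_for(user_id, self.num_shards)]
        return list(shard.get(user_id, []))
```

A post is acknowledged as soon as it is queued and only shows up in reads once the queue has been drained. That delay is exactly the "nobody dies if it's a minute late" trade-off the comment describes, and it is the part a banking system could not accept.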

      • by Anonymous Coward

        "The complexity and joins of various database tables must be insane."

        Nah. You simply put one user's data in one place (well more than one place for redundancy, but two or 3 not lots of places).
        To build a page you can ask each machine processing that person's data. You ask the machines processing their friends' data for that data, and build the page. Arrange your network so that groups of machines are in subnets, and place the users' data based on the connectivity onto machines in the subnet. So more connected us

      • by Anonymous Coward

        Shrug. It's a hard problem, but it's hardly a unique problem. I work on Google ads backend. Superficially this is a very different system from Facebook, which is all frontend... and yet every problem you describe there is one that we have had to solve at similar scale. (Yes we have an order of magnitude fewer users for example, but our users have an order of magnitude more data, and it is easier to shard many small users than a smaller number of goliath users.)

        Once you get big enough, the problems shared by

      • by Anonymous Coward

        Facebook's problem IS trivial compared to the problems that deserve solving on this planet. Facebook does not solve a single real business problem (except their own). Technically, what they do may be a challenge but it doesn't contribute to anything except let some kids looking for attention share stuff that nobody cares about. If you ask me, this is a waste of resources, technically, intellectually, and energy-wise.

      • by oztiks ( 921504 )

        I'm not sure what you're saying makes much sense.

        1bn users accessing a DB which polls and manages a large amount of data, yes. However, you just slammed YouTube for doing pretty much the same thing but only exponentially better. Let's not forget massive amounts of comment management, video relevancy tools, algorithms that automatically scour video clips for copyright infringement and convert text to speech, and so on and so forth.

        You bounced from the statement from " The complexity a

      • by klubar ( 591384 )

        Unlike many other databases, errors can be tolerated in Facebook. If a post gets lost or a connection or two dropped it really doesn't cost Facebook anything, and it's unlikely to be noticed. And downtime and retries are tolerated by the users.

        Try running a real-time financial system like credit card authorization & processing, which probably has more than 1 billion users, needs to balance at the end of the day, and has response requirements under 250 ms.

        Facebook is just bett

    • Agreed. And the low threshold for acceptable eventual consistency and lack of importance of the data (overall) makes it less complex than it would otherwise seem.

      Wall Street's types of issues make Facebook look like "Hello World."

    • Facebook isn't just about status updates. They have a whole robust API they use to interact with apps and other websites. It hosts music, events, photos, videos, app data, along with tons of user data with timeline. You can share anything from a cat video to a milestone of you losing weight. Serving up all that data in a quick and well-presented manner to millions of people around the world is very difficult.
    • The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.'

      Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video. Web search? Watson?

      Facebook is utterly trivial compared to many problems out there.

      +5 Insightful? Seems like Slashdot has a bug somewhere. Should be +5 Ignorant. Seriously... this is so wrong it's crazy.

    • by hceylan ( 982414 )
      As much as I hate Facebook, and as much as I believe the number of true Facebook profiles is fewer than 250M, "To Caesar What Is Caesar's". You may think the value Facebook adds is not rocket science, but Facebook not only uses high-tech software architecture, it also creates software technology and delivers some of it as open source. I would recommend you read http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/ [pingdom.com]. And trust me, when your scale goes above 10-digit numbers, nothing is trivial
  • by Anonymous Coward on Friday October 05, 2012 @07:19PM (#41564379)

    I totally believe that Facebook has 1 billion users... because I am 4 of them.

    • by tomhath ( 637240 )
      Okay, so it's actually 999,999,997. Does that make you feel better?
      • Would it make you feel worse if the number was a "mere" 250m? Or 100m?

        I am currently ignoring 2 different accounts, FWIW. Facebook keeps sending notifications of various uninteresting types to both, I assume that they are both considered "active".

        I joined with a bunch of real life friends years ago, and it appears that about 1 in 10 ever post anything on a regular basis.

        [Shrugs]

      • Re:1 billion users (Score:4, Insightful)

        by L3370 ( 1421413 ) on Friday October 05, 2012 @07:45PM (#41564591)
        If he can make 4, so can the bozo who wants to create fake accounts for pets, browsing ex-girlfriends, gaming Farmville perks, and avoiding the boss's prying eyes.

        In short, there aren't a billion people on Facebook, nowhere near it. An important fact for businesses that are looking to tap into a network of "real" people.
    • Put me down for 6.

    • by Narnie ( 1349029 )

      Put me down for 0.5 since I check it 1 day per fortnight and I don't use any features besides app blocking, ignoring people, and rejecting photo tags.

    • by Anonymous Coward

      Facebook ID for voting. Vote early and vote often!

  • PHP (Score:4, Insightful)

    by Coolhand2120 ( 1001761 ) on Friday October 05, 2012 @07:22PM (#41564397)
    Oh yes, please tell me all about the computer geniuses that wrote the PHP scripts that power facebook!
    • Re: (Score:3, Insightful)

      by Anonymous Coward
      Yea. Because everyone knows no real website could possibly be written in structured, maintainable PHP. Well, except the biggest site on the Internet.
    • Re:PHP (Score:5, Insightful)

      by Anonymous Coward on Friday October 05, 2012 @07:54PM (#41564637)

      PHP has proven to be the best web development kit. Its only persistent failure is the legacy growth of inconsistent API calls. For the rest, it's Turing complete, does scale well, and most of all is the best tuned hammer for the job. It delivers.

      In effect, PHP is a huge C API with its own C-like language constructs, a layer of abstraction which takes away the mundane and gets you building web sites.

      Now C is hailed for its great power, and not made fun of because of its ability to make real crappy, insecure code.
      PHP however is not hailed for its great power, and made fun of because of its ability to make real crappy, insecure code.

      It's all a matter of perspective. The problem is low level programmers who can't live with the fact people make a billion dollars without obsessing over pointers or garbage collection.

      • by Anonymous Coward

        PHP has proven to be the best web development kit.[citation required]

        • by Cruxus ( 657818 )
          Yes, and developers agree [hammerprinciple.com] with that sentiment. PHP has inertia behind it: tons of cheap webhosts and lots of libraries and existing codebases. As a programming language, there are definitely better out there: Python, Ruby, etc.
      • Good post AC. I think you are on to something with your last sentence too. Technical research on the web is a nightmare, because you have to parse the motivation behind the opinions and filter on that too. Low level programmers ... obsessing over pointers or garbage collection ... indeed. These people can come across as enormously well-informed but their opinions are often worthless outside their tiny, unknowable silos.

      • by Raenex ( 947668 )

        The problem is low level programmers who can't live with the fact people make a billion dollars without obsessing over pointers or garbage collection.

        Facebook relies on C++ because PHP is too slow to use it for everything.

      • Its only persistent failure is the legacy growth of inconsistent API calls.

        I can think of a few more which were still relevant around the introduction of PHP 5:

        • Prone to insecurity by design: things like "addslashes" by default (or at all, what a horrible idea), auto-registration of user input variables mixed with implicit variable declarations, mysql_real_escape_string(), etc.
        • Weird decisions with language design as it evolved: Destructors were added, but they got called in the wrong order for RAII wh
    • by trawg ( 308495 )

      Oh yes, please tell me all about the computer geniuses that wrote the PHP scripts that power facebook!

      Well, I know PHP bashing is all the rage, so how about the computer geniuses at Facebook that wrote HipHop, their PHP-to-binary compiler?

      I think it is a pretty cool technical thing (and according to their stats it dropped their CPU usage by some significant figure) - and even better, they open sourced it. Like they do with a lot of their stuff [facebook.com].

  • Social networking maps very nicely to decentralized resources.
    (I know who my friends are, and I can scrape their RSS feeds by myself.)

    When you try to cram all that into one data center, and then try to replicate that across many data centers in real time ... yep, you've got a problem.

    The mistake is in the belief that it's an "information technology" problem.

  • I'm kinda disappointed... I am truly interested in how Facebook scales and was hoping there would be actual Computer Science related material in the article... Any Facebook employees care to comment? What do you guys do to scale stuff? How about /.'ers from other companies that have to deal with scaling? Hell, how do porn sites scale? I've done the traditional Distributed Systems courses in University but I really wanted to know how it's done in the real world by AWS, Facebook etc...
    • Shops anymore tend to scale by throwing RAM and bandwidth at everything... It drives developers crazy because management cares little about what kind of mess they force their developers to ignore due to due-dates. And of course, the only casualties are the developers who were never given a fair shake to start with. Wanna know how something scales? Continual tweaking, and yes, more RAM and bandwidth. It's the only way to scale things anymore.
  • Terrible (Score:5, Informative)

    by thePsychologist ( 1062886 ) on Friday October 05, 2012 @07:43PM (#41564569) Journal
    The print version is available [businessweek.com].

    I don't recommend reading it. There is absolutely nothing in this article about the actual engineering problems behind scaling for this number of users and how these problems are solved. In fact, there is nothing technical at all in this article except for some vague descriptions of the "bootcamp".
  • by Anonymous Coward

    I'm sure there are some smart people working on how to mine every last drop of money out of our private lives at facebook, but IT?

    Last I heard, FB uses MySQL. That's not cutting edge CS.

  • It's rather clever (Score:5, Informative)

    by Animats ( 122034 ) on Friday October 05, 2012 @07:52PM (#41564625) Homepage

    It's actually a rather impressive setup. Some Facebook architects gave a talk in EE380 at Stanford a few years back. Originally, Facebook's architecture assumed that most "friends" would be regionally local, reflecting Facebook's college-campus origin. That's not how it worked out after some growth. So they have to assemble pages across regions and data centers. There's caching, but there's also active cache invalidation, which they can do because they control both sides of the cache. There's extensive inter-process communication, and it's not HTTP. There's a lot of PHP for the user-facing stuff, but it's compiled with their in-house compiler, not interpreted.

    Facebook's purpose is banal, but the technology behind it is non-trivial.
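The "active cache invalidation" mentioned above (possible when you control both the cache and the writer) can be shown with a tiny sketch. This is an assumption-laden toy, not the architecture from the Stanford talk: `InvalidatingCache` is a hypothetical name, and a plain dict stands in for the database.

```python
class InvalidatingCache:
    """Read-through cache whose writer evicts stale entries itself."""

    def __init__(self, backing):
        self.backing = backing  # a dict standing in for the database
        self.cache = {}
        self.misses = 0  # counts reads that had to go to the backing store

    def get(self, key):
        """Serve from the cache when possible, otherwise load and remember."""
        if key in self.cache:
            return self.cache[key]
        self.misses += 1
        value = self.backing.get(key)
        self.cache[key] = value
        return value

    def put(self, key, value):
        """Write to the backing store, then actively invalidate the cached copy."""
        self.backing[key] = value
        self.cache.pop(key, None)
```

Because the write path evicts the key itself, the next read reloads fresh data instead of waiting for a time-based expiry. Doing the same thing across regions and data centers, as the comment describes, is where the real difficulty lies.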

    • by xQx ( 5744 )
      You misinterpreted the heading - Facebook has the hardest information technology problem on the planet.

      That information technology problem has nothing to do with servers and storage.

      The hardest information technology problem on the planet is: How do the Facebook execs stop the company going the way of Silicon Graphics (NYSE: SGI) - oh wait, no, (DELISTED by NYSE because the share price couldn't stay above $1: SGI); since the company creates no real value, and has done nothing but drop its price since IPO.
  • With the way Facebook runs, surely it doesn't take much more than a six-hour lecture to learn.

  • Facebook's 1 Billion Users is very simple -- There aren't 1 Billion (unique) Living, Breathing, Computer Using Humans on Facebook.

    If you believe there are actually 1 Billion, completely unique users on Facebook, then I need to ask that each of you turn over your Internet Licenses, power down your computers, and find a new hobby. You are just too dangerous to be allowed on the Internet without adult supervision.

    Ever hear of bot nets?

    You know...all those virus infected zombie computers that have been u
  • by macbeth66 ( 204889 ) on Friday October 05, 2012 @09:28PM (#41565301)

    How do they count these 'users'? I have six accounts myself and most people I know have at least two. Now there's a poll for Slashdot: how many Facebook accounts do you have?

    • by dzfoo ( 772245 )

      The article does not clearly say how they count them. However, it does suggest that it is not a real, accurate quantification of actual live accounts--more like a statistical figure.

      In the article, the figure is compared to the United Nations announcing the population on earth, so I guess it involves a lot of extrapolation based on subscription rates and usage loads.

      If you read the article, it's a bit comforting that they have absolutely no idea how many real people are actively using the system, nor which

  • Facebook actually thinks these are one billion distinct humanoids? Zuck is stoopider than his investors look. As anyone who ever posted a signup sheet in a college dorm for IM softball can attest, at least a third of people who sign up for anything, anywhere, ever, are fake.

  • It just means it doesn't work well enough.

    Facebook is the worst performing and most opaque large scale site with the worst interface that I use regularly.

    Browsing photos, the most basic Facebook activity, is still a pain and buggy as hell on a slowish connection, and they keep changing the damn interface just when you figured out the previous unintuitive change. The mobile website sucks, their Android app sucks, I don't know what the new iOS app is like. The interface has gone from simplicity to being clutte

  • engineers it takes to keep such massive infrastructure up and running. If all it takes today is 2000 people, to manage the data of a billion people, then I really can't see a very __large__ need for software developers in the future.
  • by kriston ( 7886 ) on Saturday October 06, 2012 @01:07AM (#41566207) Homepage Journal

    It's not really one billion users. As any developer in any online service knows, the real figure is around 30% of the actual reported total. Still, it's no small challenge.

  • I would be interested in learning more about the software and hardware side of Facebook. But after 15 seconds of scrolling I hadn't seen any ... just a lot of tedious "gotta do this" journalism ... and gave up. LOOOOOONG BOOOORING

  • .. hearing day after day about Facebook or Zuckerberg, seeing Zuckerberg's face in some *cough* "creative" way or hearing him heralded as some business guru (let's just say I disagree).

    I think the face is the worst. I can live with the claims of him being an innovator, I got inured to that after decades worth of Microsoft marketing.

    Hell, I may switch back to a text only browser for my news - speeds things up as well.
