Programming The Internet

Scaling Facebook To 140 Million Users 178

Posted by CmdrTaco
from the that's-a-lotta-load dept.
1sockchuck writes "Facebook now has 140 million users, and in recent weeks has been adding 600,000 new users a day. To keep pace with that growth, the Facebook engineering team has been tweaking its use of memcached, and says it can now handle 200,000 UDP requests per second. Facebook has detailed its refinements to memcached, which it hopes will be included in the official memcached repository. For now, their changes have been released to github."


  • Impressive (Score:5, Interesting)

    by txoof (553270) on Wednesday December 17, 2008 @12:18PM (#26146513) Homepage

    It's pretty impressive that Facebook has been able to grow so quickly and handle so much traffic. Their downtime has been pretty insignificant relative to the sheer number of requests that blow through their servers every day.

    There's probably a thing or two that can be learned from their developers and IT folks. I just wish I knew more about the whole underlying structure so I could appreciate exactly what they've done.

  • by pintpusher (854001) on Wednesday December 17, 2008 @12:21PM (#26146571) Journal

    at least for me being a 38yo undergrad.

    We had one of their engineers give a talk a couple of weeks ago. The most recent number he had was 120 million members (who've logged on in the last 30 days) and over 65 billion page views per month. And they do it with 200 or so engineers.

    I was fully expecting (being interested primarily in verifiable systems and functional programming) to be annoyed by this talk, but they have some pretty interesting problems to solve over there. The fact that they're doing it with OSS, and giving back to boot, really made my day.
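    A rough average implied by the figures quoted in this comment, assuming a 30-day month and ignoring the peak-to-average ratio (real peaks will be considerably higher):

```latex
\frac{65\times10^{9}\ \text{page views}}{30 \times 86\,400\ \text{s}}
  = \frac{65\times10^{9}}{2.592\times10^{6}\ \text{s}}
  \approx 2.5\times10^{4}\ \text{page views/s},
\qquad
\frac{65\times10^{9}\ \text{page views}}{120\times10^{6}\ \text{members}}
  \approx 542\ \text{page views per member per month}
```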

  • It's pretty impressive that Facebook has been able to grow so quickly and handle so much traffic. Their downtime has been pretty insignificant relative to the sheer number of requests that blow through their servers every day.

    There's probably a thing or two that can be learned from their developers and IT folks. I just wish I knew more about the whole underlying structure so I could appreciate exactly what they've done.

    Well, call me cynical but the things that interest me about Facebook are what has gone wrong. Like hackers selling account details for pennies [dailymail.co.uk]. This is the end result:

    The scam works by a victim clicking on a spam link that appears to be coming from one of their Facebook friends or someone in their address book which lodges spyware in their machine. This then records all the information, including passwords, when they log in to various sites.

    The passwords can then be sent on to money-laundering gangs who use them to infiltrate users' bank accounts.

    While this is true of any other networking site, I think this severe security issue needs to be addressed successfully one of these days.

    All I've seen Facebook do to remedy this is explain how to clean it off your computer [facebook.com].

    I fear for the millions of homes where a kid logs onto Facebook, gets mail from Timmy, clicks the link, finds nothing and leaves. Mom and dad log into their online banking/credit card statement later that night and ... it's only a matter of time.

  • "PHP Doesn't Scale" (Score:5, Interesting)

    by 0100010001010011 (652467) on Wednesday December 17, 2008 @12:51PM (#26147097)

    Whether you like or hate social networking, Facebook has gone a long way in showing how well PHP can be made to scale. They also contribute quite a bit back to the PHP project and PHP-related projects.

    Five years ago, if anyone came along saying they were going to build a website in PHP, /. would be up in arms calling them idiots of all sorts and saying they NEED to go with compiled C or Perl.

  • by gnick (1211984) on Wednesday December 17, 2008 @12:55PM (#26147151) Homepage

    Facebook would do well to proactively encourage users to prevent such attacks by securing their systems. For example, by installing this simple application, you can ensure that your computer will never fall victim to malware:
    http://not-malware.i-promise.org/magic-bullet.htm [i-promise.org]
    Just enable scripts and click OK whenever it tells you to. It's that easy.

    Now, if /. allowed me to post the (fake) link above, how are they any more at fault than facebook is for allowing potentially dodgy links to be shared via their service? They even went the extra step of helping users remove the malware from their PCs. I'd imagine that most conduits for malicious links (IM, social networking, e-mail, online forums, etc) wouldn't have even gone that far. Their users were being targeted and exploited, so they helped them avoid being taken advantage of - Good on 'em.

    Were I malicious, I could grab the e-mail address you share in your title line, look through your /. 'friends' list for other accounts with posted addresses, and e-mail you a malicious link "From" one of them. How would that be different?

  • by Animats (122034) on Wednesday December 17, 2008 @01:01PM (#26147269) Homepage

    Amazon and Google faced similar problems, and dealt with them in ways that are roughly equivalent - by adding a tuple store to their system.

    If the data behind your web site is mostly accessed via one primary key, a tuple store, something that stores name/value pairs, beats a general-purpose relational database. Both Amazon and Google have such a mechanism in their "cloud" systems. Facebook has a somewhat low-rent solution; they're front-ending MySQL with a tuple store cache. This only works if all the queries contain some ID that has to match exactly, like user ID. Effectively, instead of one big database, the problem consists of a large number of tiny databases, all somewhat independent. Problems like that can be scaled up without much trouble.

    Tuple stores distribute nicely - you can spread them over as many machines as you want, just by cutting up the keyspace into conveniently sized shards. There are distributed relational DBMS systems, but they have to be able to do inter-machine joins, which is a hard problem. (That's what you pay the big bucks to Oracle for.)
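    A minimal sketch of the pattern described in this comment, assuming a range-partitioned user-ID keyspace: a cache keyed by exact user ID sits in front of per-shard databases, and a miss is routed to the one shard that owns the ID. The shard bounds, `shard_for`, `ShardedCacheAside`, and the `load_row` callback are hypothetical; a plain dict stands in for memcached and the callback stands in for a MySQL query.

```python
import bisect
from typing import Callable, Dict, List

# Hypothetical layout: shard i holds user IDs below SHARD_UPPER_BOUNDS[i]
# (and at or above the previous bound).
SHARD_UPPER_BOUNDS: List[int] = [1_000_000, 2_000_000, 3_000_000]


def shard_for(user_id: int) -> int:
    """Range-partition the user-ID keyspace: return the index of the owning shard."""
    i = bisect.bisect_right(SHARD_UPPER_BOUNDS, user_id)
    if i >= len(SHARD_UPPER_BOUNDS):
        raise KeyError(f"user_id {user_id} is outside every shard")
    return i


class ShardedCacheAside:
    """Cache-aside lookup keyed by exact user ID, in front of per-shard databases."""

    def __init__(self, load_row: Callable[[int, int], dict]):
        # load_row(shard_index, user_id) stands in for a query on that shard
        self._load_row = load_row
        self._cache: Dict[int, dict] = {}  # stands in for memcached

    def get_user(self, user_id: int) -> dict:
        row = self._cache.get(user_id)
        if row is None:  # miss: query only the shard that owns this ID
            row = self._load_row(shard_for(user_id), user_id)
            self._cache[user_id] = row
        return row

    def invalidate(self, user_id: int) -> None:
        # drop the cached copy after a write so the next read refetches it
        self._cache.pop(user_id, None)
```

    Exact-match keys are what make this cheap: a query such as "all users in one city" has no single ID to route on, which is where the inter-machine joins mentioned above come back in.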

  • Re:Blaming Linux... (Score:3, Interesting)

    by inKubus (199753) on Wednesday December 17, 2008 @01:05PM (#26147347) Homepage Journal

    Then there was this:

    Another issue we saw in Linux is that under load, one core would get saturated doing network soft interrupt handling, throttling network IO. In Linux, a network interrupt is delivered to one of the cores; consequently, all receive soft interrupt network processing happens on that one core.

    Likewise, I thought irqbalance [irqbalance.org] already handles this? It's fairly commonly installed in 64-bit distros, probably most others by now. Not to mention you could go to TOE (TCP offload engine) cards for the machines with the most traffic, offloading the TCP stack to the network cards and minimizing the amount of work the CPU has to do. You can max out a current processor with 10Gb Ethernet on overhead alone.

  • by Anonymous Coward on Wednesday December 17, 2008 @01:12PM (#26147457)

    User is sent link, directed to website with malware payload, such as a 0-day IE exploit.

    Funny you should say that... I find it hilarious that the groupthink here is that when a tool like IE is being raped by malicious people, it's IE's fault. When a product like Facebook is the target of the same malicious users, it's the malicious users' fault? How do I know when it's the hackers' fault and when the tool's maker should be protecting me?

  • Re:Impressive (Score:3, Interesting)

    by madhurms (736552) on Wednesday December 17, 2008 @01:15PM (#26147537)
    Here is a presentation which discusses how Facebook handles billions of photos. That should give an idea about how they handle massive load in other areas: http://www.flowgram.com/f/p.html#2qi3k8eicrfgkv [flowgram.com]
  • by guruevi (827432) <(eb.ebucgnikoms) (ta) (ive)> on Wednesday December 17, 2008 @01:17PM (#26147579) Homepage

    PHP is good for all types of projects. It's the use of PHP that makes the difference. If you write clear, intelligent and documented code, it runs fine. It's even better if you use good function design and definitions. It's plenty fast, too, and can be pre-compiled or cached. It's also good at scaling because the programmer has only minimal interaction with threading, locking and similar issues, and PHP leaves most of that to the libraries (Apache, IIS, MySQL).

    Programming in PHP is a lot like programming in Java: if you have a bad developer, your code will run as slow as hell and will be difficult to maintain. Coding is simple and the optimization is minimal because it's quite a high-level language. There are of course a lot of inherited problems in PHP (magic quotes and safe mode to start off with), but with PHP5 and PHP6 they are slowly being phased out. But if you do it well, you can write very secure and fast applications in PHP.

  • by Azarael (896715) on Wednesday December 17, 2008 @01:25PM (#26147717) Homepage
    I believe there are some clever tricks you can use when generating tuple keys to make things fuzzier. It's not easy, but if you customize your approach and know enough about the data, it should be possible.

    You're right about the keyspace splits; there's an add-on for memcached clients called libketama that uses consistent hashing to do exactly that.
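    A minimal consistent-hashing sketch in the spirit of what libketama does for memcached clients. Assumptions: MD5 truncated to 32 bits as the point hash and 100 points per server are illustrative choices, not libketama's exact point layout or API, and the class name `HashRing` is hypothetical. Each server is hashed onto many points of a ring, and a key goes to the server owning the first ring point at or after the key's own hash, wrapping around at the end.

```python
import bisect
import hashlib
from typing import Dict, List


class HashRing:
    """Hypothetical consistent-hash ring: each server owns many points on a 32-bit circle."""

    def __init__(self, servers: List[str], points_per_server: int = 100):
        self._owner: Dict[int, str] = {}  # ring point -> server that owns it
        for server in servers:
            for i in range(points_per_server):
                point = self._hash(f"{server}-{i}")
                # on a (rare) collision, the server added earlier keeps the point
                self._owner.setdefault(point, server)
        self._points: List[int] = sorted(self._owner)

    @staticmethod
    def _hash(text: str) -> int:
        # assumption: first 4 bytes of MD5, read as an unsigned 32-bit integer
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "big")

    def server_for(self, key: str) -> str:
        """Return the server owning the first ring point at or after hash(key)."""
        if not self._points:
            raise LookupError("ring has no servers")
        i = bisect.bisect_left(self._points, self._hash(key))
        if i == len(self._points):
            i = 0  # wrapped past the largest point: go back to the start
        return self._owner[self._points[i]]
```

    The payoff is in resizing: adding a fourth server only moves the keys whose nearest ring point now belongs to the new server (roughly a quarter of them), whereas plain `hash(key) % n_servers` going from 3 to 4 servers remaps about three quarters of all keys.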
  • by jcarkeys (925469) on Wednesday December 17, 2008 @01:31PM (#26147821) Homepage
    Actually, they recently created a "go-between" page for all external links, I believe. It repeats what URL is being requested and then has a button that says "go there anyway". The ones that are known viruses are completely blocked.
    That sounds pretty proactive to me.
  • by Bill, Shooter of Bul (629286) on Wednesday December 17, 2008 @02:00PM (#26148297) Journal
    And if the rumors of Microsoft eventually buying a majority stake in them are true, that's exactly what they'll have.

    It would be hotmail all over again, but even stupider.
  • by StandardDeviant (122674) on Wednesday December 17, 2008 @02:24PM (#26148619) Homepage Journal

    If I understand the grand-parent post and this space in general correctly, think things like BigTable [google.com] at google or open-source implementations like Hypertable [hypertable.org] or HBase [apache.org].

  • by Anonymous Coward on Wednesday December 17, 2008 @02:35PM (#26148795)

    at least for me being a 38yo undergrad.

    Wow, and I thought my taking six years to graduate from college was bad. Hopefully you're getting some good tail from confused 20 year olds (okay -- 'confused' is redundant here). Once you're too enfeebled by age to be able to lift your own 40, it's probably time to graduate.

    There is nothing "bad" about it. I'm 35 and back in school after a 13+ year career in software engineering (no schooling, so this is my first real stint in college). I grew bored with computers so I decided to get a degree in math and econ. I'm having a great time and learning so much. The best part about it is that I know why I'm in school - I didn't go because that's just what you do after high school. ;)

    I wish my fellow "old person" the best of luck!

  • Re:Impressive (Score:3, Interesting)

    by CFrankBernard (605994) <cfrankb@gm[ ].com ['ail' in gap]> on Wednesday December 17, 2008 @03:00PM (#26149147)
    I'm not surprised considering who has a vested interest in Facebook profiling: http://albumoftheday.com/facebook/ [albumoftheday.com]
  • by Anonymous Coward on Wednesday December 17, 2008 @04:34PM (#26150463)
    From the article by Paul Saab:
    "We discovered that under load on Linux, UDP performance was downright horrible. This is caused by considerable lock contention on the UDP socket lock when transmitting through a single socket from multiple threads. Fixing the kernel by breaking up the lock is not easy. Instead, we used separate UDP sockets for transmitting replies (with one of these reply sockets per thread). With this change, we were able to deploy UDP without compromising performance on the backend..."

    He mentions at least 3 other problems which (to anyone wanting to get the job done well) read as "Linux is not the best OS for this job!", but they're still struggling with Linux and trying to hack up some kind of ad hoc solution. Why not just use FreeBSD instead?

    No, this is not flamebait, I'm being serious.
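    A minimal Python sketch of the workaround Paul Saab describes in the quote above: every worker thread still receives requests from one shared socket, but sends replies through its own private UDP socket, so no two threads ever transmit through the same socket. The function name `start_udp_workers` and the `handle` callback are hypothetical, and Python's GIL means this shows the socket layout, not memcached's C-level performance. In this sketch the reply sockets are unbound, so replies leave from a different source port than the one the client sent to; the client therefore must not `connect()` its UDP socket to the server's address, because a connected UDP socket only accepts datagrams from that exact peer.

```python
import socket
import threading
from typing import Callable, List


def start_udp_workers(
    recv_sock: socket.socket,
    n_workers: int,
    handle: Callable[[bytes], bytes],
    stop: threading.Event,
) -> List[threading.Thread]:
    """All workers read requests from one shared socket, but each sends
    replies through its own private socket, so transmits never share a socket."""
    recv_sock.settimeout(0.1)  # lets workers notice `stop` periodically

    def worker() -> None:
        # private to this thread: no other thread ever sends through it
        reply_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            while not stop.is_set():
                try:
                    data, addr = recv_sock.recvfrom(65535)
                except socket.timeout:
                    continue
                reply_sock.sendto(handle(data), addr)
        finally:
            reply_sock.close()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads
```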
  • by Anonymous Coward on Wednesday December 17, 2008 @07:48PM (#26152793)

    Facebook has gone a long way in showing how well PHP can be made to scale.

    I know someone who works at Facebook. PHP causes a world of pain for Facebook. They are scaling out IN SPITE OF the horrible performance of PHP.

    They are finding ways to mitigate some of the pain and they are sharing those ways. But I'll bet they wish they could go back in time and build their site on something else.

