Scaling Facebook To 140 Million Users
1sockchuck writes "Facebook now has 140 million users, and in recent weeks has been adding 600,000 new users a day. To keep pace with that growth, the Facebook engineering team has been tweaking its use of memcached, and says it can now handle 200,000 UDP requests per second. Facebook has detailed its refinements to memcached, which it hopes will be included in the official memcached repository. For now, their changes have been released to github."
Thank goodness (Score:5, Funny)
Re: (Score:2, Insightful)
Well, I think it's kind of cool that they are putting back, so to speak. If they can use that tweak, so can everyone else. If your requirements all fit on one host server, then that server might now be able to do much more. Perhaps the next changes should be to allow a setting that penalizes retail advertisements by adding some arbitrary delay of greater than 10 seconds?
Re: (Score:2)
Re:Thank goodness (Score:4, Funny)
I know what you mean, but I don't have that trouble much. Using FF with plugins I don't see much advertising at all. Sometimes, when I'm feeling nostalgic, I'll surf using the SeaMonkey browser because I left it default bare. That way I can see all those ads from doubleclick et al if I want to.
Sad but true, I don't get nostalgic much :-)
Re: (Score:3, Informative)
I was amazed at how many sites I regularly frequent that are now plastered in ads and horrible to use.
Re: (Score:3, Funny)
I did my bit a while ago by closing my Facebook account. If you care about Facebook, vote with your, um, mouse!
Support Facebook! Close your account!
[Unintelligible] Facebook [Unintelligible] (Score:3, Funny)
Re: (Score:2, Informative)
Re:[Unintelligible] Facebook [Unintelligible] (Score:5, Funny)
You're wrong, that's five word.
Re:[Unintelligible] Facebook [Unintelligible] (Score:4, Funny)
If you want to be *quite* technical (and I think it's quite hilarious we're being modded "informative" and "insightful"), the string "140 millions" would be broken into only four words in correct English: One hundred forty millions.
I presume the "five words" comes from the usual way to say it, one hundred and forty millions, which is technically incorrect as the "and" should refer to the decimal point, as in thirty-two and five one-hundredths.
I am unsure about the hyphen between one and hundred, though...
Re: (Score:2)
Where does this idea come from? I remember being told it in grade school, but the practice of saying "and" between hundreds and tens has a long pedigree in English.
Re: (Score:2)
It's also used in older English when writing numbers such as "three score and four".
Re: (Score:3, Informative)
Wow (Score:1)
Re:Wow (Score:5, Informative)
Re:Wow (Score:4, Funny)
"Your business sounds more important with VMware!"
Re:Wow (Score:5, Informative)
Source: http://frro.net/blog/2008/04/26/just-how-big-is-facebooks-infrastructure/ [frro.net]
Re: (Score:2)
What can I say: What they know about you can fill a warehouse.
Re: (Score:3, Insightful)
What they know about you can fill a warehouse.
What they know about you is only what you tell them.
Re: (Score:2)
No, that's the problem with Facebook: what they end up knowing about you is much more than you tell them. As soon as you add a page, people can find you and add information about you without your consent. It's not just the bare fact that you're friends with these psychos; they will also tag photos they have taken of you at various events you might not want potential employers or mates to know about, etc.
Re: (Score:3, Funny)
From hardware perspective, Facebook uses 10,000 web servers and 1800 database servers to handle the massive traffic.
That's funny because the Russian Business Network uses a 250,000 strong zombie botnet to create the Facebook accounts and massive traffic...
Impressive (Score:5, Interesting)
It's pretty impressive that Facebook has been able to grow so quickly and handle so much traffic. Their downtime has been pretty insignificant relative to the sheer number of requests that blow through their servers every day.
There's probably a thing or two that can be learned from their developers and IT folks. I just wish I knew more about the whole underlying structure so I could appreciate exactly what they've done.
... And Yet Very Lacking From a Security Angle (Score:2, Interesting)
It's pretty impressive that Facebook has been able to grow so quickly and handle so much traffic. Their downtime has been pretty insignificant relative to the sheer number of requests that blow through their servers every day.
There's probably a thing or two that can be learned from their developers and IT folks. I just wish I knew more about the whole underlying structure so I could appreciate exactly what they've done.
Well, call me cynical but the things that interest me about Facebook are what has gone wrong. Like hackers selling account details for pennies [dailymail.co.uk]. This is the end result:
The scam works by a victim clicking on a spam link that appears to be coming from one of their Facebook friends or someone in their address book which lodges spyware in their machine. This then records all the information, including passwords, when they log in to various sites.
The passwords can then be sent on to money-laundering gangs who use them to infiltrate users' bank accounts.
While this is true of any other networking site, I think this severe security issue needs to be addressed successfully one of these days.
All I've seen Facebook do to remedy this is explain how to clean it off your computer [facebook.com].
I fear for the millions of homes where a kid logs onto Facebook, gets mail from Timmy. Clicks the link, finds n
That is *not* a Facebook problem (Score:5, Insightful)
It's just a standard trojan with an unusual delivery method of using fake Facebook profiles run by trojan bots. I can't see how this is Facebook's problem any more than it's your email program's fault that you clicked on a dodgy link without checking it.
Re:... And Yet Very Lacking From a Security Angle (Score:5, Insightful)
It can't be addressed... because it's not a security issue with the site. It's an issue that the user needs to be trained on how to spot, and good luck getting that to happen.
I mean, come on, banks have the "problem" you described, and most banks aren't what we'd call insecure.
Re: (Score:1, Funny)
It can't be addressed
Are you daft? Not only did he provide a link where Facebook was addressing it, addressing it is the only way it can be combated!
Re:... And Yet Very Lacking From a Security Angle (Score:5, Interesting)
Facebook would do well to proactively encourage users to prevent such attacks by securing their systems. For example, by installing this simple application, you can ensure that your computer will never fall victim to malware:
http://not-malware.i-promise.org/magic-bullet.htm [i-promise.org]
Just enable scripts and click OK whenever it tells you to. It's that easy.
Now, if /. allowed me to post the (fake) link above, how are they any more at fault than facebook is for allowing potentially dodgy links to be shared via their service? They even went the extra step of helping users remove the malware from their PCs. I'd imagine that most conduits for malicious links (IM, social networking, e-mail, online forums, etc) wouldn't have even gone that far. Their users were being targeted and exploited, so they helped them avoid being taken advantage of - Good on 'em.
Were I malicious, I could grab the e-mail address you share in your title line, look through your /. 'friends' list for other accounts with posted addresses, and e-mail you a malicious link "From" one of them. How would that be different?
Re:... And Yet Very Lacking From a Security Angle (Score:5, Funny)
I really need that application. I get so many viruses.
Re: (Score:2)
It would be no different. I think the more interesting problem here is that while social engineering attacks are pretty damn easy to pull off with complete strangers (I speak from experience; I did some harmless stuff ages ago just to see), they move into the realm of tr
Re:... And Yet Very Lacking From a Security Angle (Score:4, Interesting)
That sounds pretty proactive to me
Since it's that time of the year... (Score:2, Funny)
...I thought I should make a Christmas carol about what we see on the net everyday.
Smashing through the door,
comes Firefox three,
browsing sites we go,
laughing at IE all the way, ha ha ha!
Steve Ballmer yells on YouTube,
making children cry.
Oh what fun it is to see
that stupid Windows guy. Hey!
Jingle bells, Digg smells,
Slashdot all the way!
Oh what fun it is to post
on Facebook every day, yay!
Pretty impressive operation (Score:5, Interesting)
at least for me being a 38yo undergrad.
We had one of their engineers give a talk a couple of weeks ago. The most recent number he had was 120 million members (who've logged on in the last 30 days) and over 65 billion page views per month. And they do it with 200 or so engineers.
I was fully expecting (being interested primarily in verifiable systems and fp) to be annoyed by this talk, but they have some pretty interesting problems to solve over there. The fact that they're doing it with OSS, and giving back to boot, really made my day.
Re:Pretty impressive operation (Score:5, Funny)
Yea, but if they could do it with Windows, now that would be a challenge!
Re: (Score:3, Interesting)
It would be hotmail all over again, but even stupider.
Re:Pretty impressive operation - NOT! (Score:2, Interesting)
"We discovered that under load on Linux, UDP performance was downright horrible. This is caused by considerable lock contention on the UDP socket lock when transmitting through a single socket from multiple threads. Fixing the kernel by breaking up the lock is not easy. Instead, we used separate UDP sockets for transmitting replies (with one of these reply sockets per thread). With this change, we were able to deploy UDP without compromising performance on the backend..."
He
Blaming Linux... (Score:5, Insightful)
We discovered that under load on Linux, UDP performance was downright horrible. This is caused by considerable lock contention on the UDP socket lock when transmitting through a single socket from multiple threads. Fixing the kernel by breaking up the lock is not easy. Instead, we used separate UDP sockets for transmitting replies (with one of these reply sockets per thread). With this change, we were able to deploy UDP without compromising performance on the backend.
I bolded the quote to show what their real problem was. They had a shitload of threads trying to use a single socket, and of course there was huge overhead involved due to the mutex lock (a semaphore on the kernel side) on a shared resource (the socket). So they blame Linux instead of themselves for such a half-assed implementation of sending out packets from multiple threads with a single socket. They would have gotten the exact same result if they had tried it with a single TCP connection socket and attempted to have multiple threads firing off packets with that. If you want multiple threads sending out packets, use multiple sockets... Wow, what a concept!
Sorry for my ranting, but it just pisses me off when moron programmers blame the operating system for their own stupidity.
Anyway, haven't nearly all MMOs gone with using UDP internally within the game cluster network and TCP externally to reduce latency and network overhead? So this is nothing new to me.
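For readers who want to see the workaround from the quoted passage in concrete terms, here is a minimal sketch of "one reply socket per thread". It is an illustration only, not the memcached patch itself: the function name `send_replies`, the loopback addresses, and the barrier used to keep all sockets open at once are assumptions made for the example.

```python
import socket
import threading


def send_replies(dest, num_threads=4, replies_per_thread=5):
    """Send UDP datagrams from several threads, one socket per thread.

    Returns the local port used by each thread's socket so a caller can
    confirm that no socket was shared between threads.
    """
    local_ports = [None] * num_threads
    # Hypothetical helper for the demo: makes every thread bind its socket
    # before any thread sends or closes, so all sockets coexist.
    all_bound = threading.Barrier(num_threads)

    def worker(index):
        # Each thread owns its socket, so the threads never queue up on
        # one shared socket when transmitting.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
            local_ports[index] = sock.getsockname()[1]
            all_bound.wait()
            for seq in range(replies_per_thread):
                sock.sendto(f"reply {index}:{seq}".encode(), dest)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return local_ports
```

The sketch only shows the structure (socket ownership per thread). Whether it helps throughput depends on the kernel and the workload, which is exactly what the thread is arguing about.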
Re: (Score:2, Insightful)
Linux is pretty terrible for performance multi-threading, that's a fact. It features unreliable file IO too, but I digress..
In the case of Facebook, it's true that it's not the OS's fault, since mutexes are always slow anyway.
There are lockless libraries that lock the CPU(s) for one cycle so that the program doesn't need to lock a mutex to increment a counter, for example. Thousands of times faster...
But these wouldn't have helped there. Like you said, it just seems like a design problem in the software. S
Re:Blaming Linux... (Score:4, Informative)
Mutexes aren't always slow. In the uncontended case they don't require a system call (although they do require an atomic operation which involves some inter-processor signalling).
Lockless algorithms are generally harder to get right, from what I've seen. It's not just locking the CPUs for a cycle; you also need to worry about using memory barriers (generally written in assembly) to enforce correct visibility across all CPUs in the system.
There are guys on comp.programming.threads that spend a *lot* of time trying to perfect them, and there are often subtle errors that pop up later on. Given the number of problems that regular lock-based algorithms cause, I'd only use lockless if it's absolutely necessary.
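As a baseline for the "use locks unless you really can't" advice above, here is a minimal sketch of a lock-protected counter shared by several threads. It uses Python's `threading.Lock` as a stand-in for a mutex; the names `LockedCounter` and `count_in_parallel` are made up for the example, and it demonstrates correctness only, not the performance claims made in this thread.

```python
import threading


class LockedCounter:
    """A counter whose increments are serialized by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # The read-modify-write below is not atomic on its own;
        # holding the lock makes it safe across threads.
        with self._lock:
            self.value += 1


def count_in_parallel(num_threads=8, increments_per_thread=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = LockedCounter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

A lock-free version would replace the `with self._lock` block with an atomic instruction plus the memory barriers mentioned above, which is where the subtle bugs tend to come from.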
Re: (Score:2)
Excuse me?
Linux is pretty terrible for performance multi-threading, that's a fact. It features unreliable file IO too, but I digress..
Which part of your sentence do you digress? What facts do you own that the rest of us don't have?
My SBC 486 class ELAM chip with 16Meg of RAM running a 2.4.16 Linux kernel says you're full of shit. The SBC board sitting next to me is currently handling 203 simultaneous threads/sockets and responding with the less than 1ms response time required by the hardware manufacturer(se
Re: (Score:1)
Re:Blaming Linux... (Score:5, Insightful)
Re:Blaming Linux... (Score:5, Insightful)
This statement is just downright disingenuous and wrong. UDP performance in general on Linux is comparable or better than other Operating Systems. What he found out is that accessing a single UDP socket on Linux requires a lock, and that when trying to share that lock over multiple threads you have a performance issue. Welcome to intro level operating systems.
This has nothing to do with UDP performance, which I define as either throughput or in some cases packets per second. He then goes on to imply that he worked around some issues in Linux, when in actuality he attacked the problem from the wrong angle and through trial and error found the obvious solution. Why would you even think to use the same socket in a connectionless protocol like UDP in the first place?
I do agree that in general the article was written in more or less praise of Linux, but reading that sentence makes my blood boil.
Re:Blaming Linux... (Score:4, Insightful)
Too often the people that are left to explain the problem in detail to the press are not the engineers that worked on the solution for that problem. If we had a discussion with one of them, we would hear a totally different story!
Re: (Score:2)
If we had a discussion with one of them, we would hear a totally different story!
Then again, a lot of times "software engineers" are fumbling around and gaining experience on the job. There's just too much to know and too little standardized knowledge. Having read the article, the author sounds like he was involved.
Re: (Score:3, Insightful)
Wow, you're uninformed on multiple levels with this post.
1. "They" didn't write memcached. LiveJournal did, and then they open sourced it. "They" didn't provide a half-assed implementation. They pushed a piece of open source software further than it had been pushed before, and found problems.
2. If you'd read the next sentence right after your bold line, you'd notice they were talking about a kernel lock. Not a lock in memcached. That's a totally valid reason to blame Linux.
If you bothered to actually spend some t
Re:Blaming Linux... (Score:4, Informative)
2. If you'd read the next sentence right after your bold line, you'd notice they were talking about a kernel lock. Not a lock in memcached. That's a totally valid reason to blame Linux.
How do you hope to architect a fix for this? Though I don't know the specifics, they said that they were using the same UDP socket to transmit from multiple threads. That means you have one kernel space data structure across the entire UDP/IP stack being shared by multiple threads. Therefore you need a lock around updates to that data structure.
Until we see some atomic sendto() operations this is not going to change.
Re: (Score:2)
How do you hope to architect a fix for this? Though I don't know the specifics, they said that they were using the same UDP socket to transmit from multiple threads. That means you have one kernel space data structure across the entire UDP/IP stack being shared by multiple threads. Therefore you need a lock around updates to that data structure.
No idea, I haven't reviewed the kernel either. But from this line:
Fixing the kernel by breaking up the lock is not easy.
It would appear that they did. It is not impossible to write a lockless queue mechanism.
Re: (Score:2)
2. If you'd read the next sentence right after your bold line, you'd notice they were talking about a kernel lock. Not a lock in memcached. That's a totally valid reason to blame Linux.
If you had bothered to even read my entire post, you would see that I acknowledged the fact they were talking about the kernel lock on the socket being the problem, but I also mentioned the reason why it was happening (the socket is a shared resource: buffer management, FIFO, etc.) and why it is realistically unavoidable in the kernel. Instead, the only reasonable way to fix it is to use multiple sockets, which they did afterward to resolve the issue, and which should have been a no-brainer to begin with.
My p
Re: (Score:3, Interesting)
Then there was this:
Likewise, I thought irqbalance [irqbalance.org] already handles this? It's fairly commonly installed in 64-bit distros, probably most others by now. Not to mention you could go to TOE for the machines you have the most
Re: (Score:2)
It's just you thinking that they're blaming Linux. They built their system, found some roadblocks in memcached and the Linux kernel, and fixed or worked around them. Then they publicized their fixes like good OSS users should.
It's only "blaming" Linux if you think Linux is perfect and can do no wrong.
Re: (Score:2)
Are there other OSes (FreeBSD, Solaris?) which would have been able to handle multiple threads using the same socket better?
Re:Blaming Linux... (Score:5, Insightful)
[...] So they blame Linux instead of themselves for such a half-ass implementation of sending out packets from multiple threads with a single socket.[...]
Sorry for my ranting, but it just pisses me off when moron programmers blame the operating system for their own stupidity.
The point is that it wasn't their own stupidity. They took someone's open source project and improved it so it could better handle high loads. I don't see them blaming Linux, I see them recognising the limitations of the system they are using and coming up with a solution and then sharing it. Normally, this is cause to say "Yay! Open source!" rather than calling them "moron programmers".
Re: (Score:2)
Part of the problem is that it's not better at handling higher loads reliably. Much of what they are doing is basic stuff to improve performance. They could improve performance even more if they stored required data in Varnish instead of memcached and used the remaining memory for other more important things. Of course it sounds like they are now learning about parallel programming, so it'll be a while before they get there.
Re: (Score:2)
multiplexing? (Score:1)
Why not just multiplex memcached requests on a single connection at the web host level?
I'm Jenny seeing Forrest Gump @ Wash Monument (Score:1)
I went to high school with the guy who wrote that post at facebook!
"PHP Doesn't Scale" (Score:5, Interesting)
Like or hate social networking, Facebook has gone a long way in showing how well PHP can be made to scale. They also contribute quite a bit back to the PHP project and PHP-related projects.
5 years ago, if anyone had come along saying they were going to build a website in PHP, /. would have been up in arms calling them idiots of all sorts and saying they NEED to go with compiled C or Perl.
Re:"PHP Doesn't Scale" (Score:4, Interesting)
PHP is good for all types of projects. It's the use of PHP that makes the difference. If you write clear, intelligent and documented code it runs fine. It's even better if you use good function design and definitions. It's plenty fast too and can be pre-compiled or cached. It's also good at scaling because the programmer only has minimal interaction with threading, locking and similar issues and PHP leaves most of it over to the libraries (Apache, IIS, MySQL).
Programming in PHP is a lot like programming in Java: you have a bad developer and your code will run as slow as hell and will be difficult to maintain. Coding is simple and the optimization is minimal because it's a quite high level language. There are of course a lot of inherited problems in PHP (magic quotes and safe mode to start off with) but with PHP5 and PHP6 they are slowly being phased out. But if you do it well, you can write very secure and fast applications in PHP.
Re: (Score:2)
PHP has the same problems that Basic and VB have: it is easy to write very bad code and relatively difficult to write fast, easily maintainable code.
It is possible to write bad code in any language, but it is easier in some, and PHP has a (well-justified) reputation for making it easy or even encouraging bad coding practices.
PHP is improving, and is vastly better than it was (mainly due to its use in large websites), but there are still languages that are intrinsically better...
Re: (Score:2)
Re: (Score:2)
Anything can be made to scale if you have millions of dollars worth of servers providing terabytes of memcached instances. Scalability is an architecture problem, not a language problem.
Re: (Score:2)
No... 5 years ago they still could have made PHP scale better and then used it. (Which they did, albeit with help from the rest of the people who also helped make PHP better over the past 5 years.)
They built a tuple store. (Score:5, Interesting)
Amazon and Google faced similar problems, and dealt with them in ways that are roughly equivalent - by adding a tuple store to their system.
If the data behind your web site is mostly accessed via one primary key, a tuple store, something that stores name/value pairs, beats a general-purpose relational database. Both Amazon and Google have such a mechanism in their "cloud" systems. Facebook has a somewhat low-rent solution; they're front-ending MySQL with a tuple store cache. This only works if all the queries contain some ID that has to match exactly, like user ID. Effectively, instead of one big database, the problem consists of a large number of tiny databases, all somewhat independent. Problems like that can be scaled up without much trouble.
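To make "front-ending MySQL with a tuple store cache" concrete, here is a minimal sketch of the look-aside pattern, keyed on an exact user ID as described above. Plain dictionaries stand in for both MySQL and memcached; the class name `CachedUserStore`, the `user:<id>` key format, and the invalidate-on-write policy are assumptions for the example, not a description of Facebook's actual code.

```python
class CachedUserStore:
    """Look-aside cache in front of a slower primary store."""

    def __init__(self, database):
        self._database = database  # stand-in for MySQL: {user_id: row}
        self._cache = {}           # stand-in for memcached: {key: row}
        self.database_reads = 0    # counts how often the cache was missed

    @staticmethod
    def _key(user_id):
        return f"user:{user_id}"

    def get(self, user_id):
        key = self._key(user_id)
        if key in self._cache:
            return self._cache[key]        # cache hit: the database is not touched
        self.database_reads += 1           # cache miss: fall through to the database
        row = self._database.get(user_id)
        if row is not None:
            self._cache[key] = row         # populate the cache for the next reader
        return row

    def update(self, user_id, row):
        self._database[user_id] = row
        # Drop the stale entry; the next get() reloads it from the database.
        self._cache.pop(self._key(user_id), None)
```

This works because every lookup carries the exact key. A query such as "all users named Smith" has no single cache key to look up, which is the limitation the comment points out.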
Tuple stores distribute nicely - you can spread them over as many machines as you want, just by cutting up the keyspace into conveniently sized shards. There are distributed relational DBMS systems, but they have to be able to do inter-machine joins, which is a hard problem. (That's what you pay the big bucks to Oracle for.)
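"Cutting up the keyspace into shards" can be sketched in a few lines. The example below hashes each key and uses the result modulo the shard count to pick a shard; in-process dictionaries stand in for separate machines. The class name `ShardedStore` and the choice of MD5 as the hash are assumptions made for illustration.

```python
import hashlib


class ShardedStore:
    """Key/value store that spreads keys across a fixed number of shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one machine holding part of the keyspace.
        self._shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key so that placement does not depend on key ordering.
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key, value):
        self._shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self._shards[self._shard_for(key)].get(key, default)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]
```

No operation here ever needs two shards at once, which is why this distributes so easily. One known weakness of plain modulo placement is that changing the number of shards moves most keys to a different shard.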
Re: (Score:3, Interesting)
You're right about the key space splits, there's an addon to memcached called libketama that uses consistent hashing to do exactly that.
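For anyone unfamiliar with consistent hashing, here is a minimal sketch of the general technique. It is not libketama's actual algorithm or API: the class name `HashRing`, the number of points per node, and the MD5-based hash are all assumptions chosen for the example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that maps keys to nodes via points on a circle."""

    def __init__(self, nodes=(), points_per_node=100):
        self._points_per_node = points_per_node
        self._ring = []  # sorted list of (hash_value, node_name) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def add(self, node):
        # Several points per node smooth out the share of keys each node gets.
        for i in range(self._points_per_node):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [point for point in self._ring if point[1] != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First point at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (self._hash(key),))
        if index == len(self._ring):
            index = 0
        return self._ring[index][1]
```

The useful property is that removing a node only reassigns the keys that lived on that node; every other key keeps its server, so the rest of the cache stays warm.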
Re: (Score:2)
Re: (Score:3, Interesting)
If I understand the grand-parent post and this space in general correctly, think things like BigTable [google.com] at google or open-source implementations like Hypertable [hypertable.org] or HBase [apache.org].
Re: (Score:2)
Right. The term "key/value pair" is generally used by "cloud" people. The term "tuple store" is a more generic term from academia.
140 million (Score:2)
Re: (Score:2)
That's not active users. Many people register and never go back. Many people register several user accounts. For me, I registered a Facebook account a year or two ago, looked around and have never been back. Never will. There's nothing of value nor interest to me on Facebook. Yet they are presumably counting my id in that 140 million like I'm
Re: (Score:3, Insightful)
According to a poster further up, the figure is based on the number of users that have logged in in the last 30 days. While that number will still be a bit high it shouldn't be awful.
Biggest takeaway from this story? (Score:2, Funny)
Yes, (Score:4, Insightful)
Being able to find old friends you haven't been able to contact in years.
Having a central pull information spot rather than the push model of spamming every email address you have with pics of the new baby, house, car, toaster.
A central and standardized organization spot for arranging informal gatherings with friends, like parties.
Re: (Score:3, Funny)
Re: (Score:2, Insightful)
You left out 'providing a commercial data mine for companies to exploit'.
Re: (Score:2)
Er, in Soviet Amerika, HTML validates you?
Yeah, but how many REAL users are there? (Score:2, Funny)
And 150 million of those users are bots.
Either that or facebook has tonnes of supermodels that have only two or three friends. ...not that I've been searching ;)
Upgrading.... (Score:4, Funny)
Re: (Score:2)
the reverse would be more likely (Score:2)
The traffic levels [google.com] aren't even close.
Re: (Score:2)
Re:I have been wondering for a while... (Score:5, Informative)
MySpace used to run on ColdFusion but switched to .NET. Facebook runs on LAMP, though they have a customized MySQL and a customized Linux kernel with support for the hierarchical page pinning algorithm.
Re: (Score:2)
[citation needed]
v.2
Re: (Score:2)
I wonder how long that will last, with MS now being a 5% stakeholder.
Re: (Score:2)
By 5%, I of course mean the 1.6% that was successfully negotiated in 2007.
Re:I have been wondering for a while... (Score:5, Informative)
Re: (Score:2)
The number of accounts is not relevant. What matters is the traffic: double for Facebook, with MySpace slowly decreasing:
http://trends.google.com/websites?q=facebook.com%2C+myspace.com&geo=all&date=all&sort=0 [google.com]
Re: (Score:2)
That's what he said. I knew that reading the article and summary were considered taboo here, but not even reading the comments now? This place really has gone to shit.
Re: (Score:2)
Now I get the mundane details of everyone's life, such as "Getting a haircut, yea!" on the rare occasions I check it. At least people can't bug me to be on it anymore.
Re: (Score:2)
I generally like looking through my friends' new pictures and sometimes their notes (if the note shows up in the feed and looks interesting).
Re: (Score:1)
There are some people from work who I added as friends, before I knew them really well. Now I get all their exciting updates like "So and So just joined the group 'Whereever you go, there is a Jew' or 'Jews are the nicest people'". This person is really nice at work, but I'd really like to sever this Facebook relationship. Not because they are Jewish, mind you, but because they wear their religion on their sleeve, have some strong religious views (they could easily be Hindu, Muslim, Christian and I'd think
Re: (Score:2)
Re: (Score:3, Informative)
Yes, you can delete your account... not sure if Facebook purges the data from their servers, but it shouldn't be accessible to anyone else after you delete your profile.
You can also set it so that only certain groups of people (or no one at all) can see your profile, customizable on an item-by-item basis (including various things like phone, address, profile picture, status, birthday, birth year, friends list, bio, wall posts, videos, pictures) and/or comment on your wall, pictures/videos, or send you messa
Re: (Score:2)
Simple solution: don't tell Facebook, and then Facebook will not know.
If your phone number, address, work history, and educational history are on Facebook, then you are foolish. Your friends and family already know this information (or don't care), and Facebook does not need to know.
The security on Facebook should be assumed to be flawed, since it is unlikely to be perfect, so you should not put any more information on it than you would be willing to let everyone on Facebook see.
Re: (Score:3, Informative)
Advertising, I assume.
Re: (Score:2)
No assumptions necessary
Well, I'm positive of the advertising, and I'm assuming that's how they make money. They might have other revenue sources that I'm unaware of...
just visit the site
'course, I use adblock plus, so visiting the site wouldn't prove much in my case. heh.