
The Environmental Impact of PHP Compared To C++ On Facebook

Posted by Soulskill
from the efficiency-is-overrated dept.
Kensai7 writes "Recently, Facebook provided us with some information on their server park. They use about 30,000 servers, and not surprisingly, most of them are running PHP code to generate pages full of social info for their users. As they only say that 'the bulk' is running PHP, let's assume this to be 25,000 of the 30,000. If C++ would have been used instead of PHP, then 22,500 servers could be powered down (assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code), or a reduction of 49,000 tons of CO2 per year. Of course, it is a bit unfair to isolate Facebook here. Their servers are only a tiny fraction of computers deployed world-wide that are interpreting PHP code."
  • by theNetImp (190602) on Sunday December 20, 2009 @12:01PM (#30504286)

    You mean kind of like Road Send

    http://www.roadsend.com/home/index.php?pageID=compiler [roadsend.com]

  • by LizardKing (5245) on Sunday December 20, 2009 @12:24PM (#30504486)

... because I never thought I'd be defending PHP.

However, it is a much better choice for a web application than C or C++ - and I say that as someone who codes C, C++ and Java for a living. There are no decent web frameworks for C++, memory management is still an issue despite the STL, and the complexity of the language means both staff costs and development time are inflated. Peer review is harder, as the language is fundamentally more difficult to master than PHP. Compared to Java, the development tools are poorer, and things like unit testing are more complicated despite the availability of tools like CppUnit. There are no "standard" libraries for things like database access, and no literature that I am aware of describing how you would go about designing a web framework for C++. You'd most likely end up porting something like Spring to C++, and even if you published your code on the web, I doubt much of a community would build up around it.

If you want a less contentious argument, and one which can be backed up with hard evidence, then argue that PHP should be replaced with Java. A well-written Java web application, using a lightweight framework such as Spring or PicoContainer, should outperform ad-hoc C++ code.

  • by pjr.cc (760528) on Sunday December 20, 2009 @12:33PM (#30504560)

Seriously, years ago I started working on a C++ version of J2EE (not just servlets, the whole kit) - and I mean providing similar functions, not identical methods of execution, obviously. It wasn't terribly hard, actually. But it all falls apart really quickly for several reasons:

1) Platform architecture - the dependence here, even between different versions of the same distribution, was a pain and essentially spelt the end of my work. I was stuck with "do I make web apps C++ source, or shared library binaries?", to which there is only one real answer for portability - source.
2) It's a systems language - dear god, that makes it painful for so many reasons.

There are caveats to both of those, but the reality is that PHP exists because it fulfils a need, and it does it quite well. To compare the two (C++ and PHP) is a little ridiculous, and ultimately this article just reeks of "please everyone advertise my C++ web toolkit for me!". Sure, Facebook (and trillions of others) MIGHT move to a C++ web toolkit, but find me a dev that knows how to code an app in it. Now find me 2. Now find me 200, because that's how many I'd need to write and maintain Facebook apps in C++.

Even taking the OP's assumption that C++ is 10 times more efficient at what PHP does, that you could actually code Facebook in it, and that PHP vs C++ is a one-to-one relationship for things like code maintenance, you're still stuck with "how many APIs am I going to have to rewrite, and how many PHP APIs do I use that don't even exist in C++?". It's ludicrous to assume that you could drop-in replace PHP with Wt without ending up coding tonnes of C++ just to do things that PHP already provided. Not to mention the zillions of little extensions that revolve around PHP to accelerate its web abilities (memcached, for example). The number of things that can be used alongside PHP for web-related tasks, and the number of APIs built into PHP, just mean Wt is never even going to be viable as an alternative. Let's also not forget there are millions of people around the globe using PHP for web work - which ultimately leads to PHP being a good web language (i.e. security problems being found, optimizations, etc.).

Of course, wouldn't Facebook be using something like Zend to compile PHP pages? I mean, seriously, if those 25,000 servers are running PHP and not running Zend, the waste here just in the cost of servers would be unbelievable - sheer idiocy on Facebook's part (if it were true, and I'd very much doubt it) - and I imagine Zend would have almost given it away for free just so Facebook could say "we got an x% improvement using the Zend compiler".

So, I wonder how many people are now learning about Wt for the first time (which seems like the only real reason for the article to begin with). Better advertising than AdWords!

  • by postbigbang (761081) on Sunday December 20, 2009 @12:39PM (#30504614)

    Some optimized assembler would make a difference (ducks).

    But network latencies, number of sustainable TCPs per session, db latency, weird table lookups (even arp drags a server down when you have 20K+ connects) are all at issue. Add in various dirty caches, file locks/unlocks and other OS machinations, and life can be tough for any app written in anything.

    Then there are the backup servers, the availability servers, the DNS servers, the coffee servers, it just gets bogged down. A 10:1 efficiency claim is probably just language fanboy-ing..... or a consulting job looking for a spot marked X.

    Certainly it's nice to be green... but using better optimization tricks (like GCD) for multi-cores is bound to help.... tickless kernels..... SSDs..... C++ wouldn't be my first pick.

  • by Colin Smith (2679) on Sunday December 20, 2009 @12:45PM (#30504664)

    It's a phenomenon we have also noted.

    Sure C++ would be faster running but not necessarily more efficient in terms of dollars.

    I think you'll find that the servers come out of the operational budget, not the development one. So the costs of running 10x more servers don't factor into development effort. The costs should of course be charged back to the dev teams.
     

  • Re:Umm... no. (Score:5, Interesting)

    by mariushm (1022195) on Sunday December 20, 2009 @12:58PM (#30504762)

The author is pulling numbers out of his ass and has no clue about what consumes most of the time (mostly waiting for database results), about PHP accelerators, or about caching systems like memcached.
    He's comparing the performance of a PHP script running on a raw PHP installation versus a C++ version of the same script, doing calculations that almost never apply to real-world scenarios.

I don't see how any company would use C++ to develop their whole system, except maybe for some CGI scripts. Not even Google does it; afaik they use Perl and Python a lot.

    Anyway, the number of servers has no direct correlation to the programming language. Out of those thousands of systems, lots of them are read only database servers in a cluster, lots are only serving static files (thumbnails, images used in CSS files on people's pages and so on), some servers are used solely for memcached instances and content used very rarely, some are load balancers....

    Basically, the author has no clue.

I always found LiveJournal's presentation about scaling very insightful, especially as it's a pretty big site, just like Facebook and other big-time sites. The second link gives a lot of detail about how they fine-tune MySQL and other parts of the system, which just goes to show how the apparent speed advantage of C++ over PHP can actually be insignificant overall.

    http://video.google.com/videoplay?docid=-8953828243232338732&ei=3VUuS5-hLaKi2ALXqanJBQ&q=livejournal# [google.com]
    http://www.danga.com/words/2004_mysqlcon/mysql-slides.pdf [danga.com]
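    On the memcached point: the win comes from the cache-aside pattern, which the sketch below illustrates with a plain in-memory map standing in for a real memcached server (all names here are illustrative, not from any actual codebase):

    ```cpp
    #include <cassert>
    #include <functional>
    #include <string>
    #include <unordered_map>

    // Stand-in for memcached: the real thing is a networked key/value store.
    static std::unordered_map<std::string, std::string> cache;
    static int db_hits = 0;  // counts falls-through to the "database"

    // Cache-aside: check the cache first; only run the slow query on a miss.
    std::string fetch(const std::string& key,
                      const std::function<std::string(const std::string&)>& slow_query) {
        auto it = cache.find(key);
        if (it != cache.end()) return it->second;   // cache hit: no DB work
        std::string value = slow_query(key);        // cache miss: expensive path
        cache[key] = value;                         // populate for next time
        return value;
    }

    int main() {
        auto db = [](const std::string& k) { ++db_hits; return "row-for-" + k; };
        fetch("user:42", db);
        fetch("user:42", db);          // second lookup served from cache
        assert(db_hits == 1);          // the database was queried only once
    }
    ```

    With most page loads hitting the cache, the interpreter's per-request cost matters far less than the summary's 10:1 figure implies.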

  • by digsbo (1292334) on Sunday December 20, 2009 @01:04PM (#30504820)

Unfortunately, the C++ programmer who writes bad C++ is more common than the one who writes good C++, and bad C++ is probably harder to rework than bad PHP.

I once rewrote a bit of software that some MIT grads did. Theirs was 20K lines of C++, used 110 MB of RAM (constantly newing and deleting), used dozens of threads (constantly spawning and harvesting), and drove the system to its knees (90% system, 10% user load). My 2K (yes, one-tenth) lines of straight C used 5 threads (preallocated) and a configurably preallocated ring buffer (about 100K in practice), and used less than 5% of user time with no measurable system load. And I was able to do this while adding functionality and improving reliability. Very few defects in 2000 lines of C.

    The moral of the story is that C++ is complex, and even really smart people can do awful things with it. And the awful things really smart people do are worse than the awful things average or below-average people do. The programmer should use the tool he knows best, and if the tool isn't the right one, learn it, or let someone else do it.

  • by Anonymous Coward on Sunday December 20, 2009 @01:15PM (#30504906)

    "Sure C++ would be faster running but not necessarily more efficient in terms of dollars."

I guess that's the question: is there a net economic benefit to switching from PHP to C (or C++) for the sake of runtime efficiency, given that running that many servers does cost a fair chunk of money? If you are writing your code for a personal server or a few dozen, obviously it doesn't pay: development and maintenance costs dominate. And the performance ratio being assumed here (10:1) is probably ridiculous, in part because the runtime costs of PHP are mitigated by caching. But a few thousand servers? It's hard to be certain any more. Even a 10% saving might be worth quite a bit.

    How much does a typical developer cost versus a potential saving of 1000 servers? What is the human to server runtime cost ratio? And how many developers would it take to actually replace and maintain php with C/C++ code? (It would probably take more than the number of PHP programmers, and they'd probably cost more per person, and I question whether it would be as maintainable)

    In any case, it wouldn't be as simple an analysis as the article implies. That much is certain.
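    The breakeven question above can be put in code. Every figure below is a hypothetical placeholder chosen only to show the shape of the calculation, not real server or salary data:

    ```cpp
    #include <cassert>

    // All figures are hypothetical, for illustration only.
    constexpr double server_cost_per_year    = 2000.0;    // power + amortized hardware
    constexpr double developer_cost_per_year = 150000.0;  // fully-loaded salary

    // Number of developer-years per year that retiring `servers` machines pays for.
    constexpr double developers_funded(int servers) {
        return servers * server_cost_per_year / developer_cost_per_year;
    }

    int main() {
        // Under these assumptions, shutting down 1,000 servers funds
        // roughly 13 developer-years per year -- far fewer than a
        // ground-up C++ rewrite of a large PHP codebase would need.
        assert(developers_funded(1000) > 13.0 && developers_funded(1000) < 14.0);
    }
    ```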

  • by mebrahim (1247876) on Sunday December 20, 2009 @01:36PM (#30505054) Homepage
    Your C++ info is so '90s. FYI:
    A decent C++ web framework: Wt [webtoolkit.eu]
    Memory management is a non-issue with RAII, smart pointers, the Boehm [hp.com] garbage collector, and finally Valgrind [valgrind.org].
    Qt or LiteSQL [sourceforge.net] don't need to be "standard" to do their job.
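    A minimal sketch of the RAII point (generic C++, not Wt code): tying ownership to scope means cleanup happens without any explicit delete, which removes the leak bookkeeping the grandparent worries about. The live-object counter below exists only to make the automatic cleanup observable:

    ```cpp
    #include <cassert>
    #include <memory>

    // Track live Session objects so the deterministic destruction is visible.
    static int live_sessions = 0;

    struct Session {
        Session()  { ++live_sessions; }
        ~Session() { --live_sessions; }
    };

    int main() {
        {
            // unique_ptr owns the Session; no explicit delete anywhere.
            auto s = std::make_unique<Session>();
            assert(live_sessions == 1);
        }   // scope ends -> destructor runs, even on early return or exception
        assert(live_sessions == 0);
    }
    ```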
  • by tomhudson (43916) <barbara.hudsonNO@SPAMbarbara-hudson.com> on Sunday December 20, 2009 @01:39PM (#30505072) Journal

    The biggest bottleneck is probably data access, in which case the language really doesn't make much, if any, difference.

    Wrong - the language makes a huge difference. Try using the C API with CLIENT_MULTI_RESULTS and CLIENT_MULTI_STATEMENTS, concatenating 10,000 queries into one request, then using mysql_next_result() to get the next result set (no, not the next row - the next result set: 0 or more rows).

One connection. Not 10,000. A BIG difference in execution time. Testing showed that the optimum number of strcat()'d or sprintf()'d queries was between 10,000 and 20,000 on hardware with limited resources (half a gig of RAM, single CPU).

    If each page requires 50 hits on the database, you're going to see a big difference.

    Now imagine this on a machine with much more ram and more than one core.

    More reading: http://dev.mysql.com/doc/refman/5.0/en/mysql-next-result.html [mysql.com]
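    A sketch of the batching idea only - joining many statements into one request string. Actually executing the batch would use mysql_real_connect() with the CLIENT_MULTI_STATEMENTS flag, then mysql_real_query() and a mysql_next_result() loop, which is omitted here because it needs a live server:

    ```cpp
    #include <cassert>
    #include <string>
    #include <vector>

    // Join many SQL statements into one request string for multi-statement mode.
    std::string batch_queries(const std::vector<std::string>& queries) {
        std::string batch;
        // Reserve up front so appending 10,000+ statements doesn't reallocate.
        std::size_t total = 0;
        for (const auto& q : queries) total += q.size() + 1;
        batch.reserve(total);
        for (const auto& q : queries) {
            batch += q;
            batch += ';';  // statement separator in multi-statement mode
        }
        return batch;
    }

    int main() {
        std::vector<std::string> qs(3, "SELECT 1");
        assert(batch_queries(qs) == "SELECT 1;SELECT 1;SELECT 1;");
    }
    ```

    The point of the parent's numbers: one round trip carrying thousands of statements amortizes connection and protocol overhead that would otherwise be paid per query.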

  • by Xtravar (725372) on Sunday December 20, 2009 @01:44PM (#30505130) Homepage Journal

    It's more like you decide you want a whole new room dedicated to watching movies, but in order to add that to your current house you'd have to spend tens of thousands of dollars and get approval from city hall and your homeowner's association. Just for a fairly small addition.

    So instead you decide to go build a new house the way you like it, from the ground up, and while you're at it you add ethernet outlets into the planning because you always wanted that in your old house but you would have had to take down the drywall in order to get them where you wanted.

  • by Cassini2 (956052) on Sunday December 20, 2009 @02:55PM (#30505628)

I have done projects like this, and received massive speedups and performance increases. The issue is that you need to understand the real reasons why rewriting a program in C and/or assembly gives a massive performance increase. Inevitably, the reason the C program is so much faster is that a programmer has gone through and rethought the application. The programmer eliminated string copies, string manipulations, data communication overheads, and data manipulation/translation overheads by rethinking the program's design.

    For example, imagine a very simple application designed to take a digital input, and display a red/green indicator to a user depending on the input state. Count every time a major string overhead, data communication overhead, or data translation overhead occurs in each of the proposed solutions.

    Web Solution
    1. Input digital input via PLC (Data Overhead #1)
    2. Upload data from input via PLC communications protocol to PC (Data Overhead #2)
    3. Make data available to other programs, for example RSSQL makes real-time I/O appear as SQL database queries (Data Overhead #3)
    4. Use PHP or ASP to generate a web page based on a SQL query for the real-time input (Data Overhead #4)
    5. Use a web browser to query the relevant web page. (Data Overhead #5)
    Web Solution performance: it might be able to update the display screen every 1/5 second.

    Embedded C Solution
    1. Input a data point using real-time I/O
    2. Paint a computers display screen accordingly. (Data Overhead #1)
    C Solution Performance: 1/60 second, limited by the refresh rate of the monitor.

    Assembly / Microcontroller Solution
    1. Input the data point, with INP , AX
    2. Output the data point to a Red/Green LED, with OUT AX,
    Note: the assembly implementation doesn't have any string manipulation, so it doesn't have any significant data overhead.
Assembly Execution Time: less than 1 microsecond.

    The crucial concept from the above example is that the programmer reduced overhead and execution time, by simplifying program operation. The problem was solved in 3 different ways, and the fastest solution wiped out all the communication/string/data management overhead. If you want to make a computer program very fast, it is necessary to reduce data communication, string manipulation, and complex data structure overhead.

    Which languages do this and why:
    Level 1 - Simplest: Assembly is the best at wiping out string overhead, because engineers willingly migrate complex functionality to hardware before implementing it in assembly. In this case, the display screen was eliminated in favour of a direct output to an LED.
    Level 2 - Low-Level: C is remarkably quick at string manipulation programs, because programmers minimize the amount of string manipulation. String manipulation in C sucks, and is difficult to get correct. As such, programmers attempt to minimize it, or use optimized tools like lex/flex or yacc/bison that automate the difficult problems.
Level 3 - Garbage Collected: Java and .NET encourage carefree string and data structure use. They have automatic garbage collection, so minimal penalties exist for the programmer to use strings.
    Level 4 - Scripted: PHP, Perl, Python are higher level languages focused on easy programming for high-level tasks. They pretty much assume the programmer doesn't care about the overhead of processing strings or complex data structures. Instead, they make it easy for the programmer to program the complex data structures.

An application like Facebook has to have some complex data structures to do its job. In that case, a migration from PHP to C will likely not produce great benefits, because the C program still has to do all the same work the PHP program does. The old rule was that interpreters were very slow. With modern techniques, just about any language can be sufficiently compiled to
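    The layering argument above can be condensed into a toy example: each format/parse step is a "data overhead" in the parent's sense, and the direct path simply skips them. The names are illustrative, not from the original:

    ```cpp
    #include <cassert>
    #include <string>

    enum class Lamp { Red, Green };

    // "Web-stack" style: serialize the state to text, then parse it downstream.
    Lamp via_strings(bool input_high) {
        std::string wire = input_high ? "GREEN" : "RED";     // overhead: format
        return (wire == "GREEN") ? Lamp::Green : Lamp::Red;  // overhead: parse
    }

    // "Embedded" style: the value flows through untranslated.
    Lamp direct(bool input_high) {
        return input_high ? Lamp::Green : Lamp::Red;
    }

    int main() {
        // Both agree on the answer; the second simply skips two
        // translation steps -- which is where the real speedup lives,
        // regardless of the implementation language.
        assert(via_strings(true)  == direct(true));
        assert(via_strings(false) == direct(false));
    }
    ```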

  • Re:people use PHP? (Score:4, Interesting)

    by moosesocks (264553) on Sunday December 20, 2009 @03:31PM (#30505912) Homepage

    Actually, both parent and GP are right. PHP is wonderful for web development, but has more than a few annoying quirks with regard to consistency.

On the flipside, it has hands-down some of the best documentation on the planet, which makes the quirks tolerable and is a big part of the reason the language is so popular (especially with new programmers).

    I'm seriously hoping that a new PHP release finally clears up all of the inconsistencies in the main namespace once and for all. It'll be painful at first, but a very-good-thing in the long term. Updating old scripts could even be a semi-automated process, given that the necessary changes are extremely superficial.

  • This logic is crap (Score:4, Interesting)

    by Giant Electronic Bra (1229876) on Sunday December 20, 2009 @04:35PM (#30506386)

It would take a really serious amount of in-depth analysis of the server application to even approach knowing what the efficiency impact of using a compiled language vs an interpreter would be on any specific stack - or even on stacks in general. Plus, we don't even know what it really means to be "using PHP". What is PHP doing? Is it processing templates? Doing just some pre- or post-processing with some kind of XML pipeline in the middle? How is the PHP deployed?

It is simply ridiculous to make any assertions and claim accuracy for them. I'm no PHP fanboy by a LONG shot, but I know from hard experience that often a higher-level tool optimized for a particular job can get the job done quite a lot MORE efficiently than a lower-level one that isn't.

  • by sopssa (1498795) * <sopssa@email.com> on Monday December 21, 2009 @02:15AM (#30509516) Journal

Contrast that with PHP, where every script has to be loaded, interpreted, then flushed out of the system so it leaves a clean memory footprint for the next script, and where tons of variables that your script may never use have to be initialized on each run. Obviously, compiling only what you need and loading it once is more efficient :-)

You're forgetting all the PHP optimizers, bytecode caching of scripts and chunks of code, and RAM caching of scripts. These make a major difference, but are probably only used on larger websites (like Facebook).
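    What an opcode cache (APC, the Zend accelerators, etc.) buys can be sketched generically: pay the parse/compile cost once per script, then serve every later request from the cached result. The "compile" below is a stand-in for real PHP bytecode generation:

    ```cpp
    #include <cassert>
    #include <string>
    #include <unordered_map>

    // Toy opcode cache: script path -> "compiled" form.
    static std::unordered_map<std::string, std::string> opcode_cache;
    static int compiles = 0;

    std::string run_script(const std::string& path) {
        auto it = opcode_cache.find(path);
        if (it == opcode_cache.end()) {
            ++compiles;  // slow path: parse + compile, once per script
            it = opcode_cache.emplace(path, "bytecode:" + path).first;
        }
        return it->second;  // fast path on every subsequent request
    }

    int main() {
        run_script("index.php");
        run_script("index.php");
        run_script("index.php");
        assert(compiles == 1);  // compiled once, served from cache twice
    }
    ```

    This is exactly why the parent's load/interpret/flush description overstates the per-request cost on an accelerated deployment.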
