
Facebook Rewrites PHP Runtime For Speed

VonGuard writes "Facebook has gotten fed up with the speed of PHP. The company has been working on a skunkworks project to rewrite the PHP runtime, and on Tuesday of this week, they will be announcing the availability of their new PHP runtime as an open source project. The rumor around this began last week when the Facebook team invited some of the core PHP contributors to their campus to discuss some new open source project. I've written up everything I know about this story on the SD Times Blog."
  • by Daengbo ( 523424 ) <daengbo@gmail.com> on Sunday January 31, 2010 @09:27AM (#30969866) Homepage Journal

    Try the "Lite" [facebook.com] version. It's much faster, and doesn't have that annoying chat bar.

  • by Anonymous Coward on Sunday January 31, 2010 @09:44AM (#30969946)

    From TFA: UPDATE: After sifting through the comments here and elsewhere, I'm inclined to agree with the folks who are saying that Facebook will be introducing some sort of compiler for PHP.

    Not a fork. Not as newsworthy as implied.

  • by diretalk ( 1712478 ) on Sunday January 31, 2010 @09:47AM (#30969966)
    This PHP compiler project was revealed three weeks ago by a Facebook employee. Read it at http://therumpus.net/2010/01/conversations-about-the-internet-5-anonymous-facebook-employee/?full=yes [therumpus.net]
  • by Anonymous Coward on Sunday January 31, 2010 @10:05AM (#30970046)

    Not true. Once you start using memcache (which FB relies on heavily), the latency of data retrieval drops below the latency of executing PHP opcodes.

    PHP has a big problem: it executes scripts from scratch on every request. Every request has to load the same configuration data over and over again, even if it comes from memcached -- and then there is TCP latency involved.

    Applications that use regex-based URL mapping to controllers suffer heavily in PHP because on each request the interpreter has to compile the regex and then do the matching.

    There is no way to prepare all the app-specific configuration once at startup and keep it as process-local data (the routing sketch below shows the pattern).
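
    A minimal sketch of the pattern being described, with hypothetical routes and controller names -- every line of this runs again on every single request, because a PHP process keeps no application state between requests:

        <?php
        // Hypothetical route table: rebuilt from scratch on every request.
        $routes = array(
            '#^/users/(\d+)$#'          => 'UserController::show',
            '#^/photos/(\d+)/tags$#'    => 'PhotoController::tags',
            '#^/groups/([a-z0-9_-]+)$#' => 'GroupController::view',
        );

        $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
        foreach ($routes as $pattern => $handler) {
            // Work a long-running process could do once at startup;
            // here it is repeated for every incoming request.
            if (preg_match($pattern, $path, $matches)) {
                echo "dispatching to $handler";
                break;
            }
        }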

  • by paziek ( 1329929 ) on Sunday January 31, 2010 @10:05AM (#30970048)

    If that's so, then they are reinventing the wheel, since there is already a PHP compiler available, which is also open source: http://www.roadsend.com/home/index.php [roadsend.com]

  • by Anonymous Coward on Sunday January 31, 2010 @11:11AM (#30970454)

    You should check your facts first.

    I did not say PHP had to PARSE each script per request -- that is what would happen without an opcode cache. It has to EXECUTE the script, opcode-cached or otherwise, from scratch on each request.

    So there is no way for PHP to hold application data in memory between requests, except by using shared memory or a memcache (sketched at the end of this comment). Other languages like Python, Java, etc. allow you to instantiate your classes and their data upon startup and then call methods per request, without having to instantiate all the classes over and over again on each request.

    Of course, sometimes you do have to re-instantiate per request, but PHP does not even give you the option of pre-instantiating the classes that don't need it.

    So on each request, the application has to load its configuration, even if it is stored as a PHP array in some config.inc.php. It has to re-evaluate the arrays (construct the array, build hash keys, etc.) even when opcode cached.

    Granted, certain important extensions do keep pools of resource handlers between requests, like PDO, memcached, etc...

    Also, I did not mention Apache. I said (PHP) applications that map URL patterns to controllers -- i.e. a central index.php that dispatches URLs to controllers and their methods. That is different from Apache rewrite rules that map URL patterns to PHP scripts.
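
    A sketch of the shared-memory workaround mentioned above, using APC (apc_fetch/apc_store are real APC calls; the key name and config file are hypothetical). Note that even a cache hit still copies the array into the current request, which is exactly the per-request cost being described:

        <?php
        // Cache the config array in shared memory so it is not rebuilt
        // from config.inc.php on every request.
        $config = apc_fetch('app_config', $hit);
        if (!$hit) {
            $config = include 'config.inc.php';    // file returns an array
            apc_store('app_config', $config, 300); // cache for 5 minutes
        }
        // Even on a hit, the array is re-materialized in this request's
        // memory -- cheaper than re-parsing the file, but not free.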

  • by Lennie ( 16154 ) on Sunday January 31, 2010 @11:21AM (#30970514)
    Facebook added to memcache the ability to use UDP instead of TCP. They also changed MySQL so that a replication command from one datacenter to the next would also invalidate the corresponding memcache entries at that location.

    At some point they had so much traffic from their web servers to their backend systems that they saturated their internal network and were dropping UDP packets.

    That's the kind of problem and scale they deal with; I'm surprised PHP wasn't their biggest bottleneck before (they have done some work on PHP already, but nothing like this).

    After all, Facebook is the second-busiest site after the www.google.com search page (which handles 'just' one task), and Google runs on a pretty much custom-built platform.
  • by Anonymous Coward on Sunday January 31, 2010 @11:38AM (#30970614)

    That's the kind of problem and scale they deal with; I'm surprised PHP wasn't their biggest bottleneck before (they have done some work on PHP already, but nothing like this).

    It is easy to add new front end servers; they all connect to the same pool of backend resources. Backend coordination is the most fragile point in the life of a request. You can't just add backends, as they realized with memcached: adding more did not help.

    However, if they used a faster language, which is the point of TFA, they could get by with FEWER front end servers and save money that way.

    The backend is already running as fast as possible in terms of the programming language used: native code on the CPU. I guess the backend could only get faster if they reimplemented MySQL and memcached, taking out the functionality that is not required or adding critical new functionality. Oh, wait... they did.

    So that leaves removing PHP from the chain.

  • by TheRaven64 ( 641858 ) on Sunday January 31, 2010 @12:22PM (#30970924) Journal
    The interpreter we have uses direct AST interpretation, which is pretty slow. On a simple test program (a parser), it took 0.96 seconds of CPU time in the interpreter, 0.023 seconds in JIT-compiled code (pretty primitive so far, doesn't use any profiling info) and something a bit less than that in statically compiled code. Since running that benchmark, I've made a few improvements to the compiler, so it's probably a bit faster now.

    For a recursive Fibonacci calculation, implementing the same algorithm in C, Objective-C and Smalltalk took 2.35, 6.60, and 5.69 seconds, respectively (calculating fib(30) 100 times), with about a 50% variation margin on successive runs. The Smalltalk version was not always faster than the ObjC version (it was most of the time; not sure why, probably some weirdness with the Smalltalk code happening to line up with cache boundaries better), so it's safe to consider them roughly the same speed.

    It's worth noting that the Smalltalk version, unlike the C and Objective-C versions, will never suffer from integer overflow. Tweaking the benchmark so that it computes fib(47) in the three versions, the timing results are: 50, 130, and 280 seconds, respectively.

    The difference is that the Smalltalk version generates the correct answer, while none of the others do. Personally, I'd rather have slow code generating the correct answer, but maybe that's just me. It is, of course, possible to write code in C that checks for overflow (in this case it's relatively easy: you can just test whether the sign bit flipped, because you're only adding two positive integers), but returning something that is either an integer or an arbitrary-precision value in C is a bit harder, and you'd end up with at least four times as much code for the C version, and a lot more potential for bugs.

    By the way, calculating fib(47) with a sensible algorithm in Smalltalk takes a tiny fraction of a second, highlighting the fact that good algorithms are usually more important than good compilers (a PHP sketch of the same point follows at the end of this comment).

    The compiler targets the (GNU) Objective-C ABI, so Smalltalk and Objective-C classes can be used interchangeably (you can, for example, subclass an Objective-C class with Smalltalk and then call the Smalltalk methods from Objective-C). Some of the improvements I've recently made to the Objective-C runtime mean that the compiler can now emit code to do polymorphic inline caching and speculative inlining. It doesn't yet do either, but in benchmarks these reduce the cost of a message send to within a hair of the cost of a C function call. For most uses, Objective-C is already fast enough, so I'll probably only implement these as profile-driven optimisations and enable them for hot code paths where the message sending overhead is actually important.

    I'm giving a talk about this at FOSDEM in the GNUstep developer room next weekend.
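
    On the algorithm point above, here is the same contrast in this thread's own language, PHP -- a minimal sketch; absolute timings will obviously differ from the Smalltalk numbers:

        <?php
        // Naive recursive Fibonacci: exponential time -- what the
        // benchmarks above measure.
        function fib_naive($n) {
            return $n < 2 ? $n : fib_naive($n - 1) + fib_naive($n - 2);
        }

        // Iterative Fibonacci: linear time -- the "sensible algorithm".
        function fib_iter($n) {
            $a = 0; $b = 1;
            for ($i = 0; $i < $n; $i++) {
                list($a, $b) = array($b, $a + $b);
            }
            return $a;
        }

        echo fib_iter(47), "\n"; // 2971215073, in a tiny fraction of a second
        // Caveat: on a 32-bit PHP build the sum silently overflows to
        // float; a 64-bit build is needed for an exact integer here.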

  • Re:Resin Quercus (Score:3, Informative)

    by parryFromIndia ( 687708 ) on Sunday January 31, 2010 @12:35PM (#30971028)
    Here is the URL in case people are interested in checking this out - http://www.caucho.com/resin-4.0/doc/quercus.xtp [caucho.com] .
    In summary:
    It is open source, 100% Java, and it brings all the advantages of the JVM to PHP: performance (JIT), safety, scalability (clustering/load balancing), and quality tooling (development tools, profilers). One can use most Java technologies from PHP to ease development even further -- XA transactions, JNDI, connection pooling, and object caching, for example (a sketch of the Java interop follows at the end of this comment).

    Besides, improving the performance of this pure-Java PHP implementation ought to be easier than improving the PHP runtime. (From Java 6 onwards, the tools available to debug and optimize Java applications have made significant progress: jmap/jhat, easy heap dumps on OutOfMemoryError, the Object Query Language, etc. already come bundled with the JVM, and then there are the Eclipse and NetBeans GUI profilers.)

    Also worth checking out Dr. Cliff Click's extensive Java vs. C performance blog post - http://blogs.azulsystems.com/cliff/2009/09/java-vs-c-performance-again.html [azulsystems.com] .
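
    On the Java interop point above: Quercus documents an import syntax for using Java classes directly from PHP scripts. A rough sketch, recalled from the Quercus docs (verify there; this runs only under Quercus, not the standard PHP runtime):

        <?php
        // Quercus-only: import a Java class and use it as a PHP object.
        import java.util.Date;

        $now = new Date();            // an actual java.util.Date instance
        echo $now->toString(), "\n";  // dispatches to Date.toString()
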
  • by rpetre ( 818018 ) on Sunday January 31, 2010 @12:37PM (#30971046)
    I'd like to point out that long before xkcd there was userfriendly [userfriendly.org], and that in my circle we still like to end this sort of joke by saying "magnets" and giggling. The "Edward Lorenz, the butterfly and chaos theory" punchline seems a bit forced (unless you go for the 'M-x butterfly' twist to make the emacs guy get the attention ;) )
  • by rel4x ( 783238 ) on Sunday January 31, 2010 @01:00PM (#30971230)
    Facebook has a few problems: overuse of Ajax, combined with an absolutely bizarre habit of including dynamic JavaScript at random points in the page. These lead to slower page loads, especially with older browsers, which (upon encountering a JS file) completely stop doing everything else in order to execute it.
  • by raju1kabir ( 251972 ) on Sunday January 31, 2010 @03:44PM (#30972812) Homepage

    Just scale your photos down to 1024x768

    Scale your photos down to 604x453, which is the size Facebook displays them at, and you will get to control the sharpness and image quality.

    Upload at any other size, and Facebook will re-sample them with some very cheap algorithm and apply aggressive compression, and they will look like ass.

    Try it, you'll be amazed how much better your photos suddenly look.

    I normally use "convert -strip -sharpen 0.3 -quality 85 -geometry 604x604" before uploading - it just takes a second, and makes a huge difference.
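
    For anyone who would rather do this in PHP than with ImageMagick, a rough GD equivalent of that resize step (GD's resampling and JPEG encoder differ from convert's, and there is no direct -strip/-sharpen analog here):

        <?php
        // Scale a JPEG to fit within 604x604 at quality 85 using GD.
        function resize_for_facebook($src, $dst, $max = 604, $quality = 85) {
            list($w, $h) = getimagesize($src);
            $scale = min($max / $w, $max / $h, 1); // never upscale
            $nw = (int) round($w * $scale);
            $nh = (int) round($h * $scale);

            $in  = imagecreatefromjpeg($src);
            $out = imagecreatetruecolor($nw, $nh);
            imagecopyresampled($out, $in, 0, 0, 0, 0, $nw, $nh, $w, $h);
            imagejpeg($out, $dst, $quality);
            imagedestroy($in);
            imagedestroy($out);
        }

        resize_for_facebook('photo.jpg', 'photo_fb.jpg');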

  • by thetoadwarrior ( 1268702 ) on Sunday January 31, 2010 @04:12PM (#30973098) Homepage
    I'm sure a lot of intensive stuff is done in a systems language, but Amazon still uses Perl, and Google uses Perl and Python throughout its sites.

    There's no need to use a systems language for everything. Facebook is probably using PHP on its own, though, and that's just not wise for a site like that.

"Life begins when you can spend your spare time programming instead of watching television." -- Cal Keegan

Working...