On PHP and Scaling 245
jpkunst writes "Chris Shiflett at oreillynet.com summarizes (with lots of links) a discussion about scalability, brought about by Friendster's move from Java to PHP. Chris argues that PHP scales well, because it fits into the Web's fundamental architecture. 'I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.' (The article is also available on Chris' own website.)"
Re:A few things that could lead to scalability (Score:1, Informative)
Another article (Score:4, Informative)
http://www.onjava.com/pub/a/onjava/2003/10/15/p
jsp is a bad idea, but Java is not (Score:5, Informative)
Re:Author seems to live in a vacuum (Score:5, Informative)
The article doesn't mention it, but Smarty [php.net] is an excellent PHP library that implements, among other things, caching. I have used it extensively with excellent results.
Re:What's Really Going On Here... (Score:3, Informative)
Also, what's "Database.java" -- if it's part of the MySQL/Java interface layer, this would be perfectly appropriate behaviour.
Re:A few things that could lead to scalability (Score:2, Informative)
What is a PHP "server"... it is the combination of Apache and PHP and a request being served. Since the web is stateless with simple session IDs tying things together it's not really necessary to share memory or resources between requests... hence Rasmus Lerdorf's "share nothing architecture."
It doesn't make sense to write an Olympic-sized web crawling script, and certainly not to invoke it during a web request. It makes more sense to write a script that is spawned by cron, probably with multiple instances that divvy up the task of doing the search and creating the index.
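One way to picture "multiple instances that divvy up the task" is a stable hash that assigns each URL to exactly one worker. This is only a sketch in Python rather than PHP, and the function names (`shard_for`, `urls_for_worker`) and the worker-index scheme are assumptions for illustration, not something from the article.

```python
import hashlib
from typing import Iterable, List


def shard_for(url: str, num_workers: int) -> int:
    """Map a URL to a worker index with a hash that is stable across processes."""
    # Python's built-in hash() is salted per process, so use a digest instead.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_workers


def urls_for_worker(urls: Iterable[str], worker_id: int, num_workers: int) -> List[str]:
    """Return only the URLs that this cron-spawned instance is responsible for."""
    return [u for u in urls if shard_for(u, num_workers) == worker_id]
```

Each instance would be started by cron with its own `worker_id`, so no two instances crawl the same URL and none of them needs to talk to the others.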
Re:Yahoo. (Score:5, Informative)
Yahoo is very much a C/C++ shop first and foremost - PHP is used as a template system (alongside several proprietary systems) to allow easy modification of high level behaviour.
Re:jsp is a bad idea, but Java is not (Score:3, Informative)
rebuttal (Score:4, Informative)
I will start with mandatory links to the great series of articles that Ace's Hardware ran, describing their server scenario and their migration from PHP to Java/J2EE:
The PHP Scalability Myth starts off by defining three types of server architectures. The first, two-tier, and the last, logical-three-tier, are the same conceptually (there is the slight distinction between whether display and business logic code is "mingled", but this is typically not a performance issue, just an aesthetic or design issue). This two-tier/logical-three-tier architecture is the only one PHP supports natively. The article then proceeds to compare a two-tier PHP architecture against the most elaborate full three-tier Java architecture, which is used rarely in practice, and extremely rarely in the same domain in which a PHP solution is feasible. Instead of comparing apples and oranges (if PHP supported a full three-tier architecture, I would imagine two-tier PHP vs. three-tier PHP would have the same performance discrepancies), let's simply compare the only architecture PHP supports natively, two-tier, against JSP talking directly to a database, as this scenario is the most analogous to the PHP one. Let's also discard any caching as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state. And let's assume the database is the largest bottleneck.
The article states:
I'm not sure what "stub" the article is referring to, but I will assume it means an Apache module which talks a "native" protocol to the servlet engine. The first such module was mod_jserv, which could run the servlet engine both in-process and over a compact protocol called AJP (Apache Java Protocol), which essentially represents a pre-parsed HTTP request. This module, as well as the AJP protocol itself, has gone through several revisions, from mod_jk to mod_jk2. I cannot quite recall, but I think some version of mod_jk might have lost the ability to run in-process. Every other version, including the most current, can, if I recall correctly. This is beside the point, because as far as I know, AJP has always had a trivial performance overhead (I believe recent versions can run over Unix domain sockets). In fact, Apache is routinely used in production as the front-end web server, instead of the servlet engine's built-in web server, simply because it is faster at serving static content and because the AJP overhead is negligible. If the "stub" referred to in the quote is not the AJP module, then this may not be relevant; nevertheless, AJP has always been highly efficient and its performance cost typically negligible (the same typical connection min/max/idle count configurations apply as do to Apache itself).
The article goes on to proclaim the complexities of caching and data object persistence which we have eliminated from our comparison. Let's move on to the real bottleneck - the database. The article says "PHP's connectivity to the database consists of either a thin layer on top of the C data access functions, or a database abstraction layer called PEAR::DB. There is nothing to suggest tha
Re:jsp is a bad idea, but Java is not (Score:2, Informative)
Re:Scalability and Maintainability go hand in hand (Score:5, Informative)
For the most part though, I would say that PHP is slightly better equipped for web development, just like Perl is better equipped for general scripting tasks... I'm a Python man myself though.
Re:jsp is a bad idea, but Java is not (Score:4, Informative)
just as Velocity on its own would be a bad idea.
Write your business logic in plain Java, use servlets to manage the flow of control and to call your Java API to create value objects (beans) to place in the request, and then use JSP to format the data.
You only run into problems if you try to do everything with JSP, which is always a bad idea.
and JSP 2.0 is even better with the JSTL expression language built in.
Re:rebuttal (Score:5, Informative)
I'm not sure what you're on, but you can build however-many-tiers-you-like applications with PHP. In fact, PHP supports a number of technologies specifically designed to communicate with additional tiers, including CORBA, JavaBeans and SOAP.
Let's also discard any caching as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state
PHP supports persistent state through shared memory blocks trivially. The implementation of data caching schemes that use this feature is not hard.
17 child threads attempt to connect, one will not be able to. If there are bugs in your scripts which do not allow the connections to shut down (such as infinite loops), a database with only 32 connections may be rapidly swamped
Why would you limit your database to serving fewer connections than you have limited your web server to?
PHP supports an option to kill runaway scripts and reclaim their resources after a time limit has elapsed, which handily prevents the infinite loop problems mentioned.
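The sizing question above comes down to simple arithmetic. Here is a rough sketch in Python; the function names and the assumption that each web server child holds a fixed number of connections are mine, not figures from the article.

```python
def peak_db_connections(web_servers: int, max_children: int,
                        connections_per_child: int = 1) -> int:
    """Worst-case number of simultaneous database connections.

    Assumes every child process on every web server holds
    `connections_per_child` connections at the same moment.
    """
    return web_servers * max_children * connections_per_child


def connection_shortfall(db_max_connections: int, web_servers: int,
                         max_children: int, connections_per_child: int = 1) -> int:
    """How many children could be refused a connection in the worst case."""
    peak = peak_db_connections(web_servers, max_children, connections_per_child)
    return max(0, peak - db_max_connections)
```

If the database limit were 16 (an assumed figure), `connection_shortfall(16, 1, 17)` would give 1, which is the "one will not be able to" case in the quote. Raising the database limit to at least the peak makes the shortfall zero.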
Ok, so now we have a bunch of "persistent" connections that hang around with the process. How long do they hang around?
Until the database closes them or the PHP server process is killed.
What if two threads in the same process want to use a connection?
The connection is locked from the moment a thread acquires it (using the *_pconnect function) until the script using it terminates.
In the worst case, persistent connections make your problem much much worse, because now you have many more connections open to your database.
What does an inactive open connection to the database cost? Not very much, in my experience.
Your arguments have a little merit, but please try to do your research before ranting about a system.
Re:Yahoo. (Score:1, Informative)
Re:scalability is a dead issue (Score:3, Informative)
http://www.php.net/manual/en/ref.sem.php [php.net] -- System V shared memory. See specifically the functions shm_put_var() and shm_get_var().
Re:Let's find out. (Score:2, Informative)
One year of PHP at Yahoo [yahoo.com]
Making the case for PHP at Yahoo [yahoo.com]
Re:Maintaining State (Score:3, Informative)
As your application scales beyond one server, you then need to find a way to share your sessions between servers. This can be done in PHP via NFS with the default file-based session driver (I think SourceForge does this), or with a database session driver.
If you had stored sessions in memory, then you would encounter problems with having to route requests based on session, or migrate to a method for sharing session data between machines.
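To make the database-backed option concrete, here is a small sketch in Python, with SQLite standing in for the shared database. The class name `DbSessionStore` and the table layout are assumptions for illustration; this is not PHP's session driver interface.

```python
import json
import sqlite3
from typing import Any, Dict


class DbSessionStore:
    """Keeps session data in a database so that any web server can read it."""

    def __init__(self, path: str) -> None:
        # Every web server opens its own connection to the same database.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, session_id: str, data: Dict[str, Any]) -> None:
        """Write (or overwrite) the serialized session for this ID."""
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.conn.commit()

    def load(self, session_id: str) -> Dict[str, Any]:
        """Return the stored session, or an empty one if the ID is unknown."""
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}
```

Because the session lives in the shared store rather than in one machine's memory, a request can land on any server without session-based routing.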
PHP frontend and Java Backend (Score:2, Informative)
The specification will describe mechanisms allowing scripting language programs to access information developed in the Java Platform and allowing scripting language pages to be used in Java Server-side Applications. JSR 223 [jcp.org]
Re:Implementing a site in PHP... (Score:3, Informative)
This also allows me to move code blocks between different platforms without issue. It also allows some of our beginning programmers to make changes and updates to these systems without having to know 5+ different languages. Most of them took C classes in school and the transition to PHP is fairly easy. We have an online documentation server (PHP/PostgreSQL) where we also keep a list of no-nos for programming in PHP, so a lot of those new to PHP don't make common mistakes. I have found PHP to be invaluable. Sure, it doesn't fit every job you come up with, but it makes system automation a snap.
Anyway, it's made my job much easier. Perl can do everything that CLI PHP can, but PHP is far less cryptic to those who are new to it, which means far less training time and far less debugging on my part after someone new to the language drops syntactic monkey wrenches or logical errors into our code.
Working link (Score:2, Informative)
Here's an article from Jack Herrington on PHP's scalability
And here is an actual link to the article [onjava.com].
Re:Author seems to live in a vacuum (Score:5, Informative)
Personally, I find the lighter-weight Savant [phpsavant.com] to be a better choice, since it's straight PHP (no syntax to learn either -- bonus!). That removes the need for Smarty's "compile into PHP" step entirely, which has given me MUCH better performance than when I was using Smarty. In my humble opinion and experience, at least.
(And if you want caching, it can be done at the PHP engine level rather than in your templating engine -- see any of the PHP accelerators out there.)
Scalability has little to do with language (Score:4, Informative)
- The skill of the developers implementing the system
- The foresight of the original plan/architecture design
- Understanding of where bottlenecks/growth problems will occur
Any project that doesn't plan the scalability in from day one will likely struggle to fix the problem when scalability does become an issue.
IMHO scalability is a design and architectural problem. The language used (within reason) makes no difference; it's the quality and structure of the design itself which will make or break the system.
Re:Sorry buddy... (Score:4, Informative)
See their explanation on why they use PHP [yahoo.com]
Re:Author seems to live in a vacuum (Score:2, Informative)
Personally I use and love both Java and PHP for web apps, horses for courses certainly, but I would be far more comfortable with Java for a large webapp any day.
Re:Author seems to live in a vacuum (Score:4, Informative)
But if you are running a site that can use the output caching that Smarty offers and the code is done properly, you will see huge speed increases, as you can skip everything in the page, including opening a db connection. That gives very close to flat HTML performance.
As to using PHP accelerators, they don't handle output caching by themselves. You can code your own, but my time is better spent doing other things.
Using Smarty and Turck together is pretty impressive.
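For anyone unsure what output caching buys you, here is a minimal sketch in Python. It is not Smarty's API or implementation; the `OutputCache` class, its in-memory dictionary, and the `render` callback are assumptions made purely for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class OutputCache:
    """Caches fully rendered pages so repeat requests skip the page logic."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def fetch(self, key: str, render: Callable[[], str]) -> str:
        """Return cached HTML for `key`, calling `render` only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: no database connection, no template work
        html = render()  # miss: do the expensive work once
        self._entries[key] = (now + self._ttl, html)
        return html
```

On a hit, nothing inside `render` runs at all, which is why the result gets close to serving a flat HTML file. An accelerator, by contrast, only saves the cost of recompiling the script; the script still executes on every request.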
Re:Scalability and Maintainability go hand in hand (Score:3, Informative)
You're kidding, right?
urpmi php-mysql php-pgsql php-curl php-xml php-sockets
service httpd restart
See any "make; make install" commands in there?
How is that not modular?
Nearly everything in PHP is a module (or, in PHP's terms, an extension) that can be installed or removed without recompiling.
Re:Author seems to live in a vacuum (Score:1, Informative)
Re:Sorry buddy... (Score:1, Informative)