PHP Programming IT Technology

On PHP and Scaling

jpkunst writes "Chris Shiflett at oreillynet.com summarizes (with lots of links) a discussion about scalability, brought about by Friendster's move from Java to PHP. Chris argues that PHP scales well, because it fits into the Web's fundamental architecture. 'I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.' (The article is also available on Chris' own website.)"
This discussion has been archived. No new comments can be posted.

  • by Dozix007 ( 690662 ) on Saturday July 03, 2004 @09:44AM (#9599565)
    *Will Inherently Lead to Scalability* (Damn, can't type this early)
  • Another article (Score:4, Informative)

    by Anonymous Coward on Saturday July 03, 2004 @09:49AM (#9599585)
    Here's an article from Jack Herrington on PHP's scalability.

    http://www.onjava.com/pub/a/onjava/2003/10/15/php_scalability.html
  • by ahmetaa ( 519568 ) on Saturday July 03, 2004 @09:50AM (#9599588)
    If someone wants to produce a high-performance web site in Java, JSP is a bad choice. Use Velocity, pure Java objects, and a decent DB abstraction mechanism (Hibernate, iBatis). Plus, I have used PHP; OK, it is easy to use and can be preferred for small to medium-size web sites. But call me biased, it is nowhere near the elegance of Java.
  • by lamz ( 60321 ) * on Saturday July 03, 2004 @09:57AM (#9599625) Homepage Journal
    I don't see any part of the article addressing how PHP can benefit the developer facing real issues of large scale web development (such as the need for caching systems on high volume websites, or the maintenance challenge of larger code bases on complex sites).

    The article doesn't mention it, but Smarty [php.net] is an excellent PHP library that implements, among other things, caching. I have used it extensively with excellent results.

  • by julesh ( 229690 ) on Saturday July 03, 2004 @10:05AM (#9599659)
    What do you mean "calling mysql directly"? I can assure you that isn't actually possible in Java. MySQL is a C application, Java can't call C code without some kind of intermediate layer.

    Also, what's "Database.java" -- if it's part of the MySQL/Java interface layer, this would be perfectly appropriate behaviour.
  • by Anonymous Coward on Saturday July 03, 2004 @10:12AM (#9599687)
    You're not thinking in a PHP architecture.... thinking Java style J2EE does not apply to using PHP.

    What is a PHP "server"... it is the combination of Apache and PHP and a request being served. Since the web is stateless with simple session IDs tying things together it's not really necessary to share memory or resources between requests... hence Rasmus Lerdorf's "share nothing architecture."

    It doesn't make sense to do an olympic-sized web crawling script, and certainly not to invoke it in the time of a web request. It makes more sense to write a script that is spawned by cron, probably with multiple instances that divvy up the task of doing the search and creating the index.
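
    One way to picture the "multiple instances that divide up the task" idea is the minimal sketch below. It is written in Python rather than PHP purely for illustration, and the function names, the worker index/count arguments, and the choice of CRC32 as the partitioning hash are all assumptions, not anything the comment above specifies.

```python
import zlib

def shard_for(url: str, worker_count: int) -> int:
    """Map a URL to a worker index using a stable hash (CRC32).

    A stable hash matters here: every cron-spawned instance must agree
    on which instance owns which URL, even across separate processes.
    """
    return zlib.crc32(url.encode("utf-8")) % worker_count

def urls_for_worker(urls, worker_index, worker_count):
    """Return only the URLs that this (hypothetical) instance should crawl."""
    return [u for u in urls if shard_for(u, worker_count) == worker_index]
```

    Under these assumptions, cron would start each instance with its own index (0, 1, 2, ...) and the shared total count; no instance needs to talk to another, which matches the share-nothing approach described in the comment.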
  • Re:Yahoo. (Score:5, Informative)

    by Anonymous Coward on Saturday July 03, 2004 @10:17AM (#9599710)
    Actually that's only partially true. Yahoo uses C/C++ for almost all backend development. PHP is used mostly for what it's good at: Simple web frontends that call on extensions written in C and C++ to do most of the heavy lifting, or access backend systems written in C/C++.

    Yahoo is very much a C/C++ shop first and foremost - PHP is used as a template system (alongside several proprietary systems) to allow easy modification of high level behaviour.

  • by Decaff ( 42676 ) on Saturday July 03, 2004 @10:17AM (#9599711)
    The problems with JSP are to do with writing maintainable code, not speed. There is a principle of software development that suggests that it is a bad idea to embed software logic in presentation code, as this does not allow for easy modification. If you support this principle, JSP (and some ways of using PHP) are not a good idea. However, JSP is not slow: the JSP pages are translated into Java Servlet source code and then compiled. This can result in very fast websites.
  • rebuttal (Score:4, Informative)

    by Anonymous Coward on Saturday July 03, 2004 @10:20AM (#9599724)

    I will start with mandatory links to the great series of articles that Ace's Hardware ran, describing their server scenario and their migration from PHP to Java/J2EE:

    1. Building a Better Webserver [aceshardware.com]
    2. Building a Better Webserver in the 21st Century [aceshardware.com]
    3. Scaling Server Performance [aceshardware.com]

    The PHP Scalability Myth starts off by defining three types of server architectures. The first, two-tier, and the last, logical-three-tier, are the same conceptually (there is the slight distinction between whether display and business logic code is "mingled", but this is typically not a performance issue, just an aesthetic or design issue). This two-tier/logical-three-tier architecture is the only one PHP supports natively. The article then proceeds to compare a two-tier PHP architecture against the most elaborate full three-tier Java architecture, which is used rarely in practice, and extremely rarely in the same domain in which a PHP solution is feasible. Instead of comparing apples and oranges (if PHP supported a full three-tier architecture, I would imagine two-tier PHP vs. three-tier PHP would have the same performance discrepancies), let's simply compare the only architecture PHP supports natively, two-tier, against JSP talking directly to a database, as this scenario is the most analogous to the PHP one. Let's also discard any caching, as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state. And let's assume the database is the largest bottleneck.

    The article states:

    At the time when the first versions of the JSP and EJB standards were released, the prevalent web server was (and still is) Apache 1.x, which had a process model that was not compatible with Java's threading model. This meant that a small stub was required on the web server side to communicate with the servlet engine. This remains a non-trivial performance overhead for those that decide to pay it, and was a significant performance overhead when the first scalability comparisons were made.

    I'm not sure what "stub" the article is referring to, but I will assume it means an Apache module which talks a "native" protocol to the servlet engine. The first such module was mod_jserv, which could run the servlet engine both in-process and over a compact protocol called AJP (Apache Java Protocol), which represents essentially a pre-parsed HTTP request. This module, as well as the AJP protocol itself, has gone through several revisions, from mod_jk to mod_jk2. I cannot quite recall, but I think some version of mod_jk might have lost the ability to run in-process. Every other version, including the most current, can, if I recall correctly. This is beside the point, because as far as I know, AJP has always been a trivial performance overhead (I believe recent versions can run over Unix domain sockets). In fact, Apache is routinely used in production as the front-end web server, instead of the built-in servlet engine web server, simply because it is faster at serving static content and the overhead of the AJP protocol is negligible. If the "stub" referred to in the quote is not the AJP module, then this may not be relevant; nevertheless, AJP has always been highly efficient and typically negligible with regard to performance (the same typical connection min/max/idle count configurations apply as do to Apache itself).

    The article goes on to proclaim the complexities of caching and data object persistence which we have eliminated from our comparison. Let's move on to the real bottleneck - the database. The article says "PHP's connectivity to the database consists of either a thin layer on top of the C data access functions, or a database abstraction layer called PEAR::DB. There is nothing to suggest tha

  • by javab0y ( 708376 ) on Saturday July 03, 2004 @10:25AM (#9599739)
    No...you are not correct. Your point about JSPs is only true under a model 1 implementation/design. A model 2 implementation (where your business logic is done in Java objects) utilizes JSPs exactly like Velocity...only as a template. See Struts and other MVC frameworks that embrace a model 2 implementation of JSPs.
  • by iamdrscience ( 541136 ) on Saturday July 03, 2004 @10:40AM (#9599795) Homepage
    You sound like somebody who didn't use PHP long enough. Large PHP projects become plenty maintainable once you start using handy stuff like the Smarty templating engine (which IIRC is included by default now). There are also a myriad of great PEAR classes and PECL extensions. As for a module architecture that doesn't require you to recompile, that would be nice; however, I would bet that most PHP programmers have never recompiled their installation or needed to do so. You're right though, it would be nice.

    For the most part though, I would say that PHP is slightly better equipped for web development, just like Perl is better equipped for general scripting tasks... I'm a python man myself though ;-)
  • by mabinogi ( 74033 ) on Saturday July 03, 2004 @10:43AM (#9599808) Homepage
    JSP on its OWN is a bad idea.

    Just as Velocity on its own would be a bad idea.
    Write your business logic in plain Java, use servlets to manage the flow of control and to call your Java API to create value objects (beans) to place in the request, and then use JSP to format the data.

    You only run into problems if you try to do everything with JSP, which is always a bad idea.

    And JSP 2.0 is even better with the JSTL expression language built in.
  • Re:rebuttal (Score:5, Informative)

    by julesh ( 229690 ) on Saturday July 03, 2004 @10:50AM (#9599851)
    This two-tier/logical-three-tier architecture is the only one PHP supports natively.

    I'm not sure what you're on, but you can build however-many-tiers-you-like applications with PHP. In fact, PHP supports a number of technologies specifically designed to communicate with additional tiers, including CORBA, JavaBeans and SOAP.

    Let's also discard any caching, as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state

    PHP supports persistent state through shared memory blocks trivially. The implementation of data caching schemes that use this feature is not hard.

    17 child threads attempt to connect, one will not be able to. If there are bugs in your scripts which do not allow the connections to shut down (such as infinite loops), a database with only 32 connections may be rapidly swamped

    Why would you limit your database to serving fewer connections than you have limited your web server to?

    PHP supports an option to kill runaway scripts and reclaim their resources after a time limit has elapsed, which handily prevents the infinite loop problems mentioned.

    Ok, so now we have a bunch of "persistent" connections that hang around with the process. How long do they hang around?

    Until the database closes them or the PHP server process is killed.

    What if two threads in the same process want to use a connection?

    The connection is locked from the moment a thread acquires it (using the *_pconnect function) until the script using it terminates.

    In the worst case, persistent connections make your problem much much worse, because now you have many more connections open to your database.

    What does an inactive open connection to the database cost? Not very much, in my experience.

    Your arguments have a little merit, but please try to do your research before ranting about a system.
  • Re:Yahoo. (Score:1, Informative)

    by Anonymous Coward on Saturday July 03, 2004 @11:03AM (#9599924)
    That's only partially true as well -- Yahoo uses Perl for tons of their backend stuff. But yes, PHP is only the final delivery bit, not the actual applications at Yahoo.
  • by julesh ( 229690 ) on Saturday July 03, 2004 @11:03AM (#9599925)
    My main issue with PHP scalability is the lack of a global context for app-level caching.

    http://www.php.net/manual/en/ref.sem.php [php.net] -- system V shared memory. See specifically the functions shm_put_var() and shm_get_var().
  • Re:Let's find out. (Score:2, Informative)

    by selkirk ( 175431 ) on Saturday July 03, 2004 @11:11AM (#9599954) Homepage
    The answer is Yahoo. Here are a couple of talks on the issue:

    One year of PHP at Yahoo [yahoo.com]
    Making the case for PHP at Yahoo [yahoo.com]

  • Re:Maintaining State (Score:3, Informative)

    by selkirk ( 175431 ) on Saturday July 03, 2004 @11:29AM (#9600020) Homepage
    PHP sessions are NOT stored in a memory block shared between PHP processes. The default is to store session information in a file on disk. This is the point of the debate. "Shared nothing" means that there are no memory blocks shared between PHP processes.

    As your application scales beyond one server, you then need to find a way to share your session between servers. This can be done in PHP via NFS with the default file based session driver (I think sourceforge does this), or with a database session driver.

    If you had stored sessions in memory, then you would encounter problems with having to route requests based on session, or would have to migrate to a method for sharing session data between machines.
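
    The file-based approach described above can be sketched roughly as follows. This is Python for illustration only, not PHP's actual session driver: the class name, the `sess_` file prefix, and JSON as the storage format are all assumptions. The point is only that any server which can see the same directory (for example over a shared mount) can read a session written by another.

```python
import json
import os
import tempfile

class FileSessionStore:
    """Hypothetical session store backed by one file per session."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, session_id):
        # Reject anything that could escape the session directory.
        if not session_id.isalnum():
            raise ValueError("invalid session id")
        return os.path.join(self.directory, "sess_" + session_id)

    def save(self, session_id, data):
        target = self._path(session_id)
        # Write to a temporary file, then rename it over the target so
        # that a concurrent reader never sees a half-written session.
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(data, handle)
        os.replace(tmp, target)

    def load(self, session_id):
        try:
            with open(self._path(session_id)) as handle:
                return json.load(handle)
        except FileNotFoundError:
            return {}  # unknown session: start empty
```

    No state lives in the process itself, so any web server can handle any request. The rename step is atomic on a local POSIX filesystem; how it behaves on a particular network filesystem is something to verify rather than assume.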

  • by proudlyindian ( 781206 ) on Saturday July 03, 2004 @11:46AM (#9600083) Homepage
    JSR 223: Scripting Pages in Java™ Web Applications
    The specification will describe mechanisms allowing scripting language programs to access information developed in the Java Platform and allowing scripting language pages to be used in Java Server-side Applications. JSR 223 [jcp.org]
  • by C_Kode ( 102755 ) on Saturday July 03, 2004 @12:00PM (#9600151) Journal
    I don't think it's inefficient. I use it. I have an extensive CLI PHP scripting system set up that does it all. It connects to FTP systems to download data for updates, runs updates on several databases, generates plain text reports and CSV (Excel type) reports, and most of all, combined with crontab-driven calls from other systems, it allows me to share data between two systems that previously were unable to do so.

    This also allows me to move code blocks between different platforms without issue. It also allows some of our beginning programmers to make changes and updates to these systems without having to know 5+ different languages. Most of them took C classes in school and the transition to PHP is fairly easy. We have an online documentation server (php/postgresql) where we also keep a list of no-nos for programming in PHP, so a lot of those new to PHP don't make common mistakes. I have found PHP to be invaluable. Sure, it doesn't fit every job you come up with, but it makes system automation a snap.

    Anyway, it's made my job much easier. Perl can do everything that CLI PHP can, but PHP is far less cryptic to those who are new to it, which means far less training time and far less debugging on my part after someone new to the language drops syntactic monkey wrenches or logical errors into our code.
  • Working link (Score:2, Informative)

    by Uninen ( 746304 ) on Saturday July 03, 2004 @01:04PM (#9600481) Homepage

    Here's an article from Jack Herrington on PHP's scalability

    And here is an actual link to the article [onjava.com].

  • by claar ( 126368 ) on Saturday July 03, 2004 @01:09PM (#9600505)
    Um, this is an article about scaling, and therefore performance. Mentioning Smarty in such context is almost off-topic ;-)

    Personally, I find the lighter weight Savant [phpsavant.com] to be a better choice, since it's straight PHP (No syntax to learn either -- bonus!). That removes the need for Smarty's "compile into php" step entirely, which has given me MUCH better performance than when I was using Smarty. IMHO&experience, at least.

    (And if you want caching, it can be done at the PHP engine level rather than in your templating engine -- see any of the PHP accelerators out there)
  • by PhotoBoy ( 684898 ) on Saturday July 03, 2004 @01:25PM (#9600578)
    Having developed systems in Java and PHP I think it's wrong to try discussing how well either of them scales without considering the main factors that affect the scalability of projects, namely:

    - The skill of the developers implementing the system
    - The foresight of the original plan/architecture design
    - Understanding of where bottlenecks/growth problems will occur

    Any project that doesn't plan the scalability in from day one will likely struggle to fix the problem when scalability does become an issue.

    IMHO scalability is a design and architectural problem; the language used (within reason) makes no difference. It's the quality and structure of the design itself which will make or break the system.
  • Re:Sorry buddy... (Score:4, Informative)

    by hotgazpacho ( 573639 ) on Saturday July 03, 2004 @01:38PM (#9600653) Homepage Journal
    scaleable enterprise systems just AREN'T written in PHP
    Tell that to Yahoo!

    See their explanation on why they use PHP [yahoo.com]
  • by tolan-b ( 230077 ) on Saturday July 03, 2004 @01:44PM (#9600707)
    Quite. The advantage of Java when combined with a database (and as you rightly point out how often is a webapp *not* combined with a database?), is that you can take advantage of in memory caching, improving scaling up to a point by reducing load on the database, which is typically the slowest part of a web app transaction.

    Personally I use and love both Java and PHP for web apps, horses for courses certainly, but I would be far more comfortable with Java for a large webapp any day.
  • by justMichael ( 606509 ) on Saturday July 03, 2004 @02:14PM (#9600894) Homepage
    Yes Smarty compiling the templates into PHP causes some overhead. Compiling templates only happens once (unless you modify the template) so I'm not sure why your performance numbers were so much better with Savant, maybe the config?

    But if you are running a site that can use the output caching that Smarty offers and the code is done properly, you will see huge speed increases as you can skip everything in the page including opening a db connection. Which gives very close to flat HTML performance.

    As to using PHP accelerators, they don't handle output caching by themselves. You can code your own, but my time is better spent doing other things ;)

    Using Smarty and Turck together is pretty impressive.
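
    The output caching idea mentioned above (serve a stored copy of the rendered page and skip everything else, including the database connection) can be sketched as below. This is a language-neutral illustration in Python, not Smarty's API: the class, the method names, and the in-memory dictionary are all assumptions. A real PHP setup would keep the cached output in files or another store shared between processes.

```python
import time

class OutputCache:
    """Hypothetical cache of fully rendered pages with a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # key -> (expires_at, html)

    def fetch(self, key, render):
        """Return cached HTML for key, calling render() only on a miss."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # hit: no template work, no DB access
        html = render()             # miss: do the expensive work once
        self._entries[key] = (now + self.ttl, html)
        return html
```

    The expensive work (queries, template rendering) lives entirely inside the `render` callable, so a cache hit costs little more than a dictionary lookup, which is where the "close to flat HTML" behaviour comes from.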
  • by IpSo_ ( 21711 ) on Saturday July 03, 2004 @02:36PM (#9601014) Homepage Journal
    "a module architecture that doesn't require you to recompile"

    You're kidding, right?

    urpmi php-mysql php-pgsql php-curl php-xml php-sockets
    service httpd restart

    See any "make; make install" commands in there?

    How is that not modular?

    Nearly everything in PHP is a module (or, in PHP's terms, an extension) that can be installed or removed without recompiling.

  • by Anonymous Coward on Saturday July 03, 2004 @03:58PM (#9601391)
    Actually, last time I checked (yesterday), Zend's Performance Suite Enterprise Edition includes both a PHP accelerator and does handle script output caching. Yes you can code your own, but then it depends how you like to spend your spare time... ;-)
  • Re:Sorry buddy... (Score:1, Informative)

    by Anonymous Coward on Saturday July 03, 2004 @04:16PM (#9601481)
    Executive summary:
    • ASP/ColdFusion are ugly and expensive.
    • Java threading on FreeBSD sucks.
    • mod_perl has poor sandboxing.
    • PHP isn't particularly good, but it does meet our criteria.
