There are seven chapters in this volume, each dealing with real-world problems. Many problems are those you've seen solved on sites you admire and wondered "How did they do that?" Others are frameworks that allow your site to run smoothly, with nobody getting accidentally logged out or having to wait too long while your script gluttonously pulls the same data out of the database for the Nth time. At the end, Fuecks goes back to the beginning, to show how proper design and development can save you time when you start your next project.
Chapter 1: Access Control
Authentication is the process by which users identify themselves. This is difficult in HTTP, a stateless protocol in which the server handles one request at a time and instantly forgets you. Luckily HTTP allows cookies, which are bits of data the server sends to the client for to reveal upon revisiting. At first cookies were used only to annoy ("Hello, Steve! You have visited this page 3 times"), but a cookie can hold the ID of a session record in a database, which contains any state-information that you like.
You can authenticate without sessions via HTTP server configuration, as long as you like the dull dialog box the browser pops up when users enter a restricted area. Oh, and you don't mind the fact that users won't be able to "log out" without quitting their browser, nor can you force a logout after a certain timeout value. Nor can you allow users to register themselves... these are all existing, solved problems and the author shows some of the best solutions. Common tasks like allowing users to change their passwords, recover their passwords if (I mean when) they forget them, and arranging users in groups to which you assign common permissions are also covered.
My favorite example from this chapter is the humans-only registration application. Remember when online voting for the Major League Baseball All-Star Game first started? Anyone who knew how to write a web client could have automated a task to vote as many times as the server could handle, and have his favorite players be the all-star team.* To bring it closer to home, what if somebody decides to bog down your site by automatically registering a huge number of times and filling up your database? You can keep these things from happening by making users look at images which contain text but are hard for computers to "read." PHP is in use at all stages of this game, from writing the registration form's HTML to generating the obscured image on-the-fly.
Chapter 2: XML
XML is a fact of life and, hype aside, is a great way to store and transmit machine-readable data. One of the most visible applications is the thousands of bloggers and news sites providing XML feeds of their headlines. You can write portal sites that grab these headlines, parse them all and present them on your site with links to the full text at the source.
There are two ways to parse XML: with events, or by using the Document Object Model (DOM). The methodologies are similar to reading a plain-text file line-by-line or all at once. Using events you can implement a finite-state machine based on which tags and text come down the pike. Or you can slurp the whole document into memory and find any part of it with ease. The built-in library for the former is based on the popular Simple API for XML (or SAX; don't you like those nested acronyms?), while the latter often uses Xpath to find the particular document nodes you want.
The author shows how to parse RSS feeds with both SAX and the DOM, and how to render a feed with DOM. Further, you can use Extensible Stylesheet Language Transformations (known as XSLT) to transform XML -- whether it's to XHTML for regular browser reading, WML (Wireless Markup Language) for viewing on mobile phones, or even SQL to communicate with a database.
Another exciting XML application is in the area of web services, in which agents (often but not necessarily web servers) communicate with each other over an XML-based protocol built on top of PHP. The two most popular protocols are XML-RPC (the RPC stands for "Remote Procedure Calls") and SOAP (which used to stand for "Simple Object Access Protocol" but now is just a name). Often-changing information such as stock prices and weather are often offered through web services, but they can also be used as an object API between agents over the network. What's cool about using SOAP is you can publish to clients exactly what services you offer and how they can call them using the Web Services Description Language (WSDL).
Chapter 3: Alternative Content Types
If you've ever printed out a web page that was designed for browser viewing, you know the less-than-desired effect. The navigational elements, search boxes, and banners, while necessary for the web page, are useless once a static copy is printed. Furthermore, you need to extend your site to include users with less-featured browsers, such as mobile phones.
Fortunately, PHP has been taught many languages. PDF is the standard for print-quality documents, and there are several libraries (free and non-free) which allow you to generate them. WML is the HTML of cell phone browsers, in which screen space is at a premium and bandwidth scarce. SVG is an XML application which allows vector-based images like PostScript does. The coolest example, however, uses XUL (the XML User interface Language, not to be confused with Zool) to make full GUI applications that you run through Mozilla. This isn't useful for the outside world where you can't force your users to use Mozilla (sigh), but works well for intranet applications that run on a variety of platforms.
The author also brings up in this chapter an HTML SAX parser he has written. You can process HTML pages chunk-by-chunk and extract the pieces you want. I hadn't known about such a class until I read the book and I'm very excited I know about it now. For sometimes it's necessary to parse a web page meant for humans to read (perhaps to pretend to be a user and automate your all-star voting), and most HTML pages won't validate as HTML, let alone XML.
A good point here is that a well-designed, tiered application will allow you to swap out different presentation classes with little code rewrite. Separating the tasks of extracting the data from the database and presenting to the user in variety of formats is a common task that when done right becomes subsequently easier.
Chapter 4: Stats and Tracking
Once your site is up and running, you'll be interested to know which parts are the most active, and how much traffic you're getting. Into a dynamic page you can obviously insert any logging mechanism, but a great place to put it is inside your site's logo. PHP can send binary data as easily as text. Why would you want to do this?
- The logo is usually on every page (or it should be). You don't have to cut-and-paste code.
- You can serve the image, then use the flush command to send the output on and do extra processing. This way logging doesn't get in the way of page rendering.
There are lots of packages available to collect and analyze data. The author goes through phpOpenTracker which is quite rich in features. There are also ways to collect data on what links users follow to leave your site, and to keep requests from search engines from cluttering your log files.
Chapter 5: Caching
Another possible knock against PHP is that, while it's good to have dynamic pages, some pages are unnecessarily so. This is a waste of server resources to keep rendering the same page anew. There are different ways to conserve.
On the server side, as can be expected, you have a greater level of control. You can use output buffering to delay sending of output to the browser, then save a copy of the output locally. On subsequent requests, you can serve the file rather than generate the HTML all over again. This can be implemented on a chunk (or block) level, so that you can keep some parts ultra-time sensitive and others not so much. The package PEAR::Cache_Lite can help with this.
Chapter 6: Development Technique
The last two chapters were my favorites of the two-volume set. They are on a higher level of abstraction than the features of PHP's library of functions, or previous five chapters on real-world solutions. After you've reached a certain level of expertise in PHP coding, you being to wonder about the "right" way to do things. The author shows how to use Xdebug to find bottlenecks in your code, as well as a few quick optimization tips (for instance, design your flow control so that the first choice is the one most often taken).
He then discusses the principles of N-tiered design. N is usually 5, but the data layer (usually a database or file system) and presentation layer (usually the browser) are most often handled outside of PHP, so you normally have three levels to worry about:
- Data Access: Getting data from the outside world into your application
- Application Logic: Doing whatever unique thing your application is supposed to do
- Presentation Logic: Forming a response in a format acceptable to your client
Keeping these layers separate and restricting them to communicating through well-defined interfaces allows you maximum flexibility. If you need to change databases (say you just got venture capital money and can afford Oracle now), you can do so only changing one layer. If you want to serve different flavors of HTML, or different markup languages altogether, or binary data, you can do so by only changing one layer. You can even strive for maximum distributability by enabling your layers to "live" on physically independent machines and communicate with XML-RPC or SOAP.
Documenting your code is essential. Anybody who's been programming for over a year has gone back to code he or she's written and thought, "Now what the heck was this supposed to do?" It's even more essential when you write something and wish to distribute it for the benefit of others. You can expect them to grok your code at an even lower rate since they didn't write it the first time.
Luckily, scripted languages like PHP are excellent at parsing text files, including PHP scripts themselves. Using well-defined documentation formats akin to JavaDoc, you can embed documentation in your code inside comments, and use tools like phpDocumentor to extract these documentation blocks and format them as nice, cross-reference HTML. In fact, writing doc blocks before your code is a good way to think ahead about how you want your classes and methods to work.
Unit Testing, one of the most digestible dogmas of Extreme Programming, is an awesome way to test your code for logic errors. You build up tiny test cases (using mock objects to isolate the class you're testing) and build as many as you like. Once you do this (PHPUnit and SimpleTest are two rich frameworks), you keep your tests and each time you add features, you run your test to make sure you haven't added bugs as well.Chapter 7: Design Patterns
Design Patterns is one of the modern classics in information technology. After having done OOP for a while, you will inevitably get the feeling of deja vu that you've solved a problem before. Not so concretely as "I need a database abstraction layer," or "I need a templating system," but as in "I need a way to create objects without specifying exactly what class they belong to," or "I'm tired of writing so many if statements." Design patterns are common object architectures which can be used to solve common (though unique) problems.
Many design patterns are more suited to state-equipped applications with GUIs, but there are plenty to assist the PHP coder. The Factory Method is a pattern through which an object can create other objects of varying classes. So instead of writing mysql_connect everywhere, then having to change every occurrence of that function, you can abstract all database interaction to a class, then instantiate a database connection through a class method of another class: $db = MyApp::getDatabaseConnection(). This is useful when the connection (not just the RDBMS, but the actual database) you want varies depending on whether you are developing, testing, or going live with your application. Factory methods are also a good way to avoid global configuration variables.
The Iterator Pattern and the Observer Pattern are two others mentioned in this chapter. Iterators are used often in paging through database results. Observers are used to let objects notify other objects of changes in their state. This chapter will make you want to go read the whole Gang-of-Four book if you haven't already.
My biggest beef with the book is that this wasn't presented earlier on, perhaps at the beginning of Volume II. As a climax, it leaves me flat, wondering how the rest of the volume could have been derived from this very cool concept. But most PHP books conclude with chapters on how to extended PHP on the C level, or giant case studies involving massive code dumps, and I'm often not satisfied with them. This is a nice philosophical note to go out on. And there's something to be said for the argument that books like these aren't written to be read cover-to-cover.
The book closes with the same indices as in Volume I. Since I don't know the URL of my review of that volume, I'll just copy: You can read about which configuration directives you're probably most interested in (the complete list you can get on PHP's web site), some common security breaches, and how to install PEAR, PHP's version of CPAN. My favorite appendix is the "Hosting Provider Checklist," a great reference for evaluating whether kewlhosting.com is going to give you the freedom and support you need to make a great hosted web site.
This volume was informative, well-written, and inspirational in that it made me want to go out and add cool and useful features to my web sites. Check it out if you can.
*Not really (not that I tried or anything), but they've always been a little bit smarter about it. You get my point, though. This did happen on an ESPN.com Page 2 mascot popularity contest, but they noticed through request headers that millions of votes were coming from the same place, and invalidated all those votes.
In real life, Matthew Leingang is Preceptor in Mathematics at Harvard University. He promises to review any book sent to him for free, and sometimes actually does it. Both volumes of The PHP Anthology are available from SitePoint. Slashdot welcomes readers' book reviews; to see your own review here, carefully read the book review guidelines, then visit the submission page.