XML and Perl

Posted by timothy on Thursday January 30, 2003 @12:30PM from the texty-bits dept.

davorg writes "One of Perl's great strengths is in processing text files. That is, after all, why it became so popular for generating dynamic web pages -- web pages are just text (albeit text that is supposed to follow particular rules). As XML is just another text format, it follows that Perl will be just as good at processing XML documents. It's therefore surprising that using Perl for XML processing hasn't received much attention until recently. That's not saying that there hasn't been work going on in that area -- many of the Perl XML processing modules have long and honourable histories -- it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." Read on to see how well Davorg thinks this book introduces XML text processing with Perl to the wider world.

XML and Perl
author	Mark Riehl, Ilya Sterin
pages	378
publisher	New Riders
rating	8
reviewer	Davorg
ISBN	0735712891
summary	Good introduction to processing XML with Perl

XML and Perl is written by two well-known members of the Perl XML community. Both are frequent contributors to the "perl-xml" mailing list, so there's certainly no doubt that they know what they are talking about, which is always a good thing in a technical book.

The book is made up of five sections. The first section has a couple of chapters which introduce you to the concepts covered in the book. Chapter one introduces you separately to XML and Perl and then chapter two takes a first look at how you can use Perl to process XML. This chapter finishes with two example programs for parsing simple XML documents.

Section two goes into a lot more detail about parsing XML documents with Perl. Chapter three looks at event-driven parsing, using XML::Parser and XML::Parser::PerlSAX to build example programs before going on to talk in some detail about XML::SAX, which is currently the state of the art in event-driven XML parsing in Perl. It also looks at XML::Xerces, which is a Perl interface to the Apache Software Foundation's Xerces parser. Chapter four covers tree-based XML parsing and presents examples using XML::Simple, XML::Twig, XML::DOM and XML::LibXML. In both of these chapters the pros and cons of each of the modules are discussed in detail so that you can easily decide which solution to use in any given situation.

Section three covers generating XML documents. In chapter five we look at generating XML from text sources using simple print statements and also the modules XML::Writer and XML::Handler::YAWriter. Chapter six looks at taking data from a database and turning that into XML using modules like XML::Generator::DBI and XML::DBMS. Chapter seven looks at miscellaneous other input formats and contains examples using XML::SAXDriver::CSV and XML::SAXDriver::Excel.

Section four covers more advanced topics. Chapter eight is about XML transformations and filtering. This chapter covers using XSLT to transform XML documents. It covers the modules XML::LibXSLT, XML::Sablotron and XML::XPath.

Chapter nine goes into detail about Matt Sergeant's AxKit, the Apache XML Kit which allows you to create a website in XML and automatically deliver it to your visitors in the correct format.

Chapter ten rounds off the book with a look at using Perl to create web services. It looks at the two most common modules for creating web services in Perl - XML::RPC and SOAP::Lite.

Finally, section five contains the appendices which provide more background on the introductions to XML and Perl from chapter one.

There was one small point that I found a little annoying when reading the book: each example was accompanied by a sample of the XML documents to be processed, together with both a DTD and an XML Schema definition for the document. This seemed to me to be overkill. Did we really need both DTDs and XML Schemas for every example? I would have found it less distracting if one (or even both) of these had been moved to an appendix.

That small complaint aside, I found it a useful and interesting book. It will be very useful to Perl programmers (like myself) who will increasingly be expected to process (and provide) data in XML formats.

You can purchase XML and Perl from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

You lost me on the incredible leap of logic... (Score:0, Insightful)

by rand.srand() ( 243903 ) writes: on Thursday January 30, 2003 @12:50PM (#5189678)

As XML is just another text format, it follows that Perl will be just as good at processing XML documents.

Since my pasta maker is good at making pasta, and ice cream and pasta are both foods, it follows my pasta maker will be just as good at making ice cream.

XML is NOT just text! (Score:5, Insightful)

by Anonymous Coward writes: on Thursday January 30, 2003 @12:51PM (#5189685)

The whole point of XML is that it is NOT just a string of text. That's why Perl isn't particularly any better than Java or C++ or VB or whatever for processing XML - you're going to be using a library that gives you SAX or DOM access to your XML, and you'll never need to know that there's a text representation being serialized onto some wires somewhere.

This was a review? (Score:4, Insightful)

by Syris ( 129850 ) writes: on Thursday January 30, 2003 @01:03PM (#5189749)

I'm sorry, but this just wasn't a terribly deep review and well below par for /. Listing contents of a book and then nitpicking a detail don't a book review make.

How effective were the examples? How easy to read and understand were the general concepts? Were the descriptions of libraries and API's clear? Was the writing generally readable?

Would this book even make a good reference?

Jeez, anyone want to follow up the post with a real review?

Re:XML is NOT just text! (Score:4, Insightful)

by consumer ( 9588 ) writes: on Thursday January 30, 2003 @01:31PM (#5189894)

Let's see...
- Editable in emacs (or vi). Check.
- Grep-able. Check.
- Diff-able. Check.
- Understandable to the naked eye. Check.
Sure smells like text to me.

Re:XML is NOT just text! (Score:3, Insightful)

by Anonymous Coward writes: on Thursday January 30, 2003 @01:39PM (#5189951)

What you're looking at there is one possible representation of an XML document. What you can see is NOT XML. XML is an idea - a hierarchical data structure. If you're manipulating some XML programmatically, you should be manipulating this hierarchical data structure, and you'll be using some sort of API (SAX or DOM, probably) to do so. You should emphatically NOT be manipulating text strings. Any code of the form

tag = tag + "</" + tagname + ">"

means you're doing it wrong.

So, no, XML is not editable in emacs (or vi), grep-able, diff-able or understandable to the naked eye. Go and think about it again.
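To make the objection to string concatenation concrete, here is a minimal Perl sketch of how it goes wrong. The helper names `naive_element`, `escape_text` and `escaped_element` are made up for illustration, and the escaping covers only the five predefined XML entities; an XML library would normally take care of this (along with encodings, namespaces and the rest) for you.

```perl
use strict;
use warnings;

# Hypothetical helper: builds an element by plain string concatenation,
# in the style the comment above warns against.
sub naive_element {
    my ($tagname, $text) = @_;
    return "<" . $tagname . ">" . $text . "</" . $tagname . ">";
}

# Hypothetical helper: replaces the five characters that have
# predefined entities in XML.
sub escape_text {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;',   '>' => '&gt;',
        '"' => '&quot;', "'" => '&apos;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Same concatenation, but with the character data escaped first.
sub escaped_element {
    my ($tagname, $text) = @_;
    return naive_element($tagname, escape_text($text));
}

my $title = 'Perl & XML <draft>';
print naive_element('title', $title), "\n";    # not well-formed: bare & and <
print escaped_element('title', $title), "\n";
```

The first line printed is not well-formed XML, because the data happened to contain markup characters. The second is, but only because the caller remembered to escape; an API that builds the tree for you removes that burden.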

Re:You lost me on the incredible leap of logic... (Score:4, Insightful)

by sheriff_p ( 138609 ) writes: on Thursday January 30, 2003 @01:45PM (#5189984)

Ah no, see, you forgot to read the first line:

"One of Perl's great strengths is in processing text files."

Perl is good at handling text files. XML is a text file. Therefore, Perl is good at handling XML.

As opposed to:

My pasta maker is good at making pasta. Pasta is a type of food. Ice-cream is also food. Therefore, my pasta maker is good at making ice-cream.

Does that help?

Re:XML frees us from Perl (Score:5, Insightful)

by glwtta ( 532858 ) writes: on Thursday January 30, 2003 @01:45PM (#5189992) Homepage

how do you tell when a regexp has a false positive match?
A what? You (or rather the brilliant person being quoted) either mean that it matches a string that the expression isn't supposed to, which would be a serious bug in the language (and I am not aware of any such bugs); or you mean that it matches correctly, but matches things you didn't expect it to, in which case you tell, by (gasp!) testing your code. In any case, how do you tell a "false positive" regexp match in Java?
but you can't write an elegant, maintainable program that becomes an asset to both you and your employer
Perhaps you can't. I have, and I do.
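As an illustration of the second reading above (the regexp matches correctly, just not what you expected), here is a small Perl sketch. The sample string and the sub names `titles_greedy` and `titles_nongreedy` are hypothetical; the point is only that a very short test exposes the over-eager match.

```perl
use strict;
use warnings;

# Hypothetical task: pull the text of each <title> element out of a string.
# The greedy .* behaves exactly as the regexp rules say, but it runs from the
# first <title> to the LAST </title>, which is not what was intended.
sub titles_greedy {
    my ($xml) = @_;
    return $xml =~ m{<title>(.*)</title>}g;
}

# The non-greedy .*? stops at the first closing tag it can.
sub titles_nongreedy {
    my ($xml) = @_;
    return $xml =~ m{<title>(.*?)</title>}g;
}

my $xml = '<title>XML and Perl</title><title>Perl and XML</title>';

my @greedy    = titles_greedy($xml);
my @nongreedy = titles_nongreedy($xml);

# Counting the matches is enough to show the difference.
print scalar(@greedy),    " match(es) with .*\n";
print scalar(@nongreedy), " match(es) with .*?\n";
```

Neither result is a bug in the language; the greedy pattern simply answers a different question from the one being asked, which is what testing is for.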

So, where's the review? (Score:4, Insightful)

by mattdm ( 1931 ) writes: on Thursday January 30, 2003 @01:45PM (#5189994) Homepage

I see the table of contents explained in paragraph form. And then one complaint about the organization of the book. And then I expect to read the review, but it's already on to "you can buy this book here", and user comments.

I know complaining about slashdot stories is like shooting those proverbial barreled fish, but sheesh.

Re:XML is NOT just text! (Score:3, Insightful)

by EvlG ( 24576 ) writes: on Thursday January 30, 2003 @02:10PM (#5190125)

I think it is interesting to note that this is precisely the reason that XML is poorly suited for any task that requires human intervention.

Re:You lost me on the incredible leap of logic... (Score:4, Insightful)

by IpalindromeI ( 515070 ) writes: on Thursday January 30, 2003 @02:53PM (#5190351) Journal

Except that your syllogism is faulty, whereas his is not.

His:
1. (from earlier in his post) Perl is well suited for processing all text formats.
2. XML is a text format.
3. Therefore, Perl is well suited for processing XML.

Yours:
1. Your pasta maker is good at making pasta.
2. Pasta is a type of food.
3. Therefore, your pasta maker is good at making all types of food (for example, ice cream).

You can see that he went from general to specific, whereas you went from specific to general. He argues that being able to do all things in a given set (process all text formats) gives the ability to do one of the things in that set (process a particular text format). You argue that being able to do one thing in a set (make a particular food) gives the ability to do all things in the set (make all foods).

You could save your argument by changing your middle point to be "All foods are a type of pasta," and then your conclusion becomes trivially true. But you'd also have to get everyone to agree that ice cream is pasta.

Re:i hate perl... (Score:2, Insightful)

by etcshadow ( 579275 ) writes: on Thursday January 30, 2003 @03:51PM (#5190628)

"I once rewrote a Perl parser in Java and went from 9hrs to 45mins"

Well, shit. I once rewrote a Perl parser in *Perl* and went from 9hrs to 45mins. What the hell kind of flame-bait shit is this!?

It is true that extremely well-written C code can outperform perl code at anything. It is also true that for things that perl is made for (like ripping through tons of text-data), a typical Perl program will *most likely* do it better than a typical C program, simply because it is making use of more optimized underlying algorithms (even though the actual execution structure is slightly more bloated than C... double-dereferencing pointers, compile-time immediately before run-time, etc). ... However, Java is just as goddamn interpreted as Perl, if not more so! Perl compiles to *native* byte-code prior to execution, unless you are talking about eval'd strings, whereas Java sits in non-native byte-code that has to be interpreted real-time by the VM. Best case: you have a good just-in-time compiler that pulls Java up to even with Perl (that is, compiled immediately prior to run-time into native byte-code).

Also, Java has all the same disadvantages with respect to C... that is more insulation from the *actual* memory (no such thing as a real pointer in either, garbage-collection, etc).

Anyway, bottom-line is this. If what you say is at all true, then you had a shittily-written Perl program. I promise you that I can write just as shitty a program in Java... does that mean that we should trash Java?!?!? Abso-f*cking-lutely not! I'll do you one better, too: I'll write just as shitty and slow of a parser in Java that doesn't even *look* that bad to someone who doesn't understand the subtleties behind such simple abstractions as strings, lists and arrays.

I'm very serious with what I said originally. I have, in fact, taken a Perl parser (a super-light-weight XML parser, actually) and reduced the parse-time by several orders of magnitude. The idiot who wrote it originally (myself) went walking through the string or stream looking for '<'s (with a regexp), at the highest level. It is *terribly* slow to strip leading characters off of a long string in Perl (I'm pretty sure that it copies the whole goddamn string, minus those 10 (or however many) characters on the front). I made a *very* simple change, namely this:

# split on positive lookahead assertion of a '<'
# then we just deal individually with blocks of text that all start
# with a '<'... should save time
my @xml = split(/(?=<)/,' '.$xml);
shift @xml;

And, you'll note that I f*cking commented it (something which people just don't seem to understand when they trash perl). Bang! Many orders of magnitude in speed improvement. Simple.
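For readers who want to try the trick, here is a self-contained sketch of it. It assumes the character being split on is `<`; the sub name `chunks_on_tag_start` and the sample input are made up for illustration, and this is a way of chopping text into pieces, not a conforming XML parser.

```perl
use strict;
use warnings;

# Split a document into chunks that each begin with '<', in one pass,
# instead of repeatedly stripping characters off the front of a long string.
sub chunks_on_tag_start {
    my ($xml) = @_;
    # The prepended space guarantees a first chunk that does NOT start
    # with '<' (it also absorbs any text before the first tag), so the
    # shift below always discards that chunk and never a real tag.
    my @chunks = split /(?=<)/, ' ' . $xml;
    shift @chunks;
    return @chunks;
}

my @chunks = chunks_on_tag_start('<a><b>text</b> tail</a>');
print "$_\n" for @chunks;
```

The lookahead `(?=<)` is zero-width, so the `<` stays at the front of each chunk rather than being consumed as a separator. Each chunk is then a tag followed by whatever character data came after it, which can be handled on its own.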

Anyway, pull your head out of your ass.

Re:XML frees us from Perl (Score:3, Insightful)

by scrytch ( 9198 ) writes: <chuck@myrealbox.com> on Thursday January 30, 2003 @03:57PM (#5190636)

Maybe the author was unable to write anything but hacks, and couldn't make anything elegant or maintainable. I've written programs with multiple subsystems, and put them well into maintenance without a lick of trouble, all in perl.

Yes, $dd->updsp( 1,3, @ad ) looks worse than $Driver->update_displays( $Display::LOBBY, $Display::CUSTSERV, @additional ), and boy it's just a shame that perl doesn't let me use meaningful identifiers or document API's or forward declare functions for arg checking ahead of time. Oh wait... Really. The argument is dead, continuing to raise it is just trolling.

I switched to python because I got tired of leaning on my shift key. Tcl has probably the prettiest syntax for me, but as a language it's braindead beyond belief (not to mention slow)

Re:XML is NOT just text! (Score:3, Insightful)

by cygnus ( 17101 ) writes: on Thursday January 30, 2003 @05:02PM (#5191148) Homepage

you're doing it wrong.
...
So, no, XML is not editable in emacs (or vi), grep-able, diff-able or understandable to the naked eye. Go and think about it again.
yes it is.. just because you claim that "you're doing it wrong," doesn't mean it's impossible.
xml is text just as much as html is.. are you going to tell me that html isn't editable in emacs or human-readable? how is html different from DocBook, for example?

Re:XML is NOT just text! (Score:3, Insightful)

by orcrist ( 16312 ) writes: on Thursday January 30, 2003 @05:41PM (#5191493)

The whole point of XML is that it is NOT just a string of text. That's why Perl isn't particularly any better than Java or C++ or VB or whatever for processing XML - you're going to be using a library that gives you SAX or DOM access to your XML, and you'll never need to know that there's a text representation being serialized onto some wires somewhere.

I'll respond to you though many others are making similar arguments. First of all, when you say "XML is NOT just text!" do you mean "XML is NOT merely text" or "XML is not solely text"? I'll agree with the first, but the second is generally not true.

What no one seems to be mentioning is what you get out of those libraries: you get the entire structure in nodes thanks to the library's parser, but what are the contents of those nodes? Text! You might argue that the element names and most of the attributes are either defined by the dtd/schema, etc. but at least CDATA will often be arbitrary text. And, at least in my experience (mostly web-based applications), there will often be a need to process some of that text, e.g. extract links which are embedded in the text, convert newlines to <br>s, and many other things. And then, isn't it handy when the language reading the contents of those nodes has strong text-handling abilities?
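The two tasks mentioned here can be sketched in a few lines of Perl. In this sketch `$text` stands in for the character content a parser would hand back for one node; the sample value, the sub names `extract_links` and `newlines_to_br`, and the deliberately simple URL pattern are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Return every http/https URL found in a piece of node text.
# The pattern is intentionally simple: a URL runs until whitespace,
# an angle bracket or a quote.
sub extract_links {
    my ($text) = @_;
    return $text =~ m{(https?://[^\s<>"']+)}g;
}

# Return a copy of the text with each line break turned into <br>.
sub newlines_to_br {
    my ($text) = @_;
    (my $html = $text) =~ s{\r?\n}{<br>\n}g;
    return $html;
}

my $text = "See http://www.w3.org/XML/ for the spec.\n"
         . "Also http://www.cpan.org/ for modules.";

print "$_\n" for extract_links($text);
print newlines_to_br($text), "\n";
```

In a real web application the text would also need HTML-escaping before the `<br>` tags are added; that step is left out here to keep the sketch short.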

Just a thought.

-chris

Re:You lost me on the incredible leap of logic... (Score:3, Insightful)

by Golias ( 176380 ) writes: on Thursday January 30, 2003 @06:08PM (#5191749)

As XML is just another text format, it follows that Perl will be just as good at processing XML documents.
Since my pasta maker is good at making pasta, and ice cream and pasta are both foods, it follows my pasta maker will be just as good at making ice cream.
That only correlates if ice cream is a type of pasta, because XML is a text format.
This is a lot more like saying "since my pasta maker is good at making Ziti, Rigate, Macaroni, etc., all pastas really, and Spaghetti is a type of pasta, my pasta maker should be good at making Spaghetti."

Re:XML is NOT just text! (Score:2, Insightful)

by grantm ( 531986 ) writes: on Thursday January 30, 2003 @09:04PM (#5192962)

What you're looking at there is one possible representation of an XML document.

I couldn't agree less. In fact, XML is one possible representation of the abstract hierarchical data structure you described. Furthermore, XML is in fact a text representation. There are many other ways you could represent that data structure (eg: a custom binary format, records in a relational or hierarchical database, an object serialised to a binary stream etc) but none of them are XML.

The W3C themselves say that "XML is text [w3.org]" and then go on to point out that advantages of being a text format include:
- you can look at data without needing the program that produced it
- you can read it with your favourite text editor
- it's easier for developers to debug
They also say: "Like HTML, XML files are text files that people shouldn't have to read, but may when the need arises".

In parallel with the development of XML, our notion of the definition of 'text' has also moved forward. Through the adoption of standards like Unicode and bridging facilities like encoding declarations, we have moved past 7-bit ASCII as being the one true text.

To claim that an XML file is not "editable in emacs (or vi), grep-able, diff-able or understandable to the naked eye" is demonstrably untrue. You'll obviously need a text editor that understands whichever encoding the file uses (both emacs and vim fit that bill) but a text editor is a perfectly serviceable tool for viewing and editing XML (obviously not the best tool in many cases, but acceptable nonetheless).
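The encoding declaration mentioned above is itself plain text at the top of the file, so a program can read it the same way an editor does. The following Perl sketch uses only the core Encode module; the sub names `declared_encoding` and `xml_bytes_to_text` are hypothetical, and the sketch ignores byte-order marks and UTF-16, which a real parser has to handle.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return the encoding named in the XML declaration, falling back to
# UTF-8 when the document does not declare one.
sub declared_encoding {
    my ($bytes) = @_;
    if ($bytes =~ /\A<\?xml[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/) {
        return $2;
    }
    return 'UTF-8';
}

# Turn the raw bytes of a document into a Perl character string.
sub xml_bytes_to_text {
    my ($bytes) = @_;
    return decode(declared_encoding($bytes), $bytes);
}

# The word "cafe" with an e-acute, stored as ISO-8859-1: the accented
# letter is the single byte 0xE9.
my $bytes = qq{<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\xE9</word>};
my $text  = xml_bytes_to_text($bytes);

printf "declared encoding: %s\n", declared_encoding($bytes);
printf "decoded length: %d characters\n", length $text;
```

Once the bytes are decoded, the usual text tools (regular expressions, `split`, `length`) operate on characters rather than bytes, whichever encoding the file arrived in.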
