
Perl & XML

dooling writes: "Perl & XML is a well-written book that accomplishes what it sets out to do. It states in the preface that it is written for Perl programmers who want to learn about XML and what is available in Perl for XML processing. It achieves this goal, but little else. When you are done reading this book you will have been given an overview of Perl and XML, know where to begin to attack an XML document, and know where to look to find more information." For dooling's more complete review, read on below.
Perl & XML
author: Erik T. Ray & Jason McIntosh
pages: 202
publisher: O'Reilly and Associates
rating: 6
reviewer: dooling
ISBN: 059600205X
summary: Good introduction to XML for Perl programmers.

The book starts out with a brief explanation of why XML and Perl are well-suited for each other. It then provides a teaser of things to come: an explanation of how to use the XML::Simple module. The first chapter concludes with some warnings and gotchas that seem a little premature, since the authors have not yet explained XML at that point. Fortunately, most of these gotchas are covered in context later in the book.
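
For readers who have not seen the "slurp the XML into a nested data structure" style that XML::Simple is known for, here is a rough sketch of the idea. It is written in Python with the standard library's `xml.etree.ElementTree`, not Perl, and it is not XML::Simple's actual API or behavior; the `element_to_dict` helper is a hypothetical name made up for illustration, and the sample document is built from the book details listed above.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Fold an element into nested dicts (hypothetical helper, not XML::Simple)."""
    # Attributes become plain keys; child elements are added by tag name.
    result = dict(elem.attrib)
    for child in elem:
        if len(child) or child.attrib:
            value = element_to_dict(child)       # nested structure
        else:
            value = (child.text or "").strip()   # leaf: just the text
        if child.tag in result:
            # A repeated tag collapses into a list of values.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


SAMPLE = """<book isbn="059600205X">
  <title>Perl &amp; XML</title>
  <author>Erik T. Ray</author>
  <author>Jason McIntosh</author>
  <pages>202</pages>
</book>"""

book = element_to_dict(ET.fromstring(SAMPLE))
print(book["title"])
print(book["author"])
```

The appeal of this style is that the rest of the program only touches ordinary dicts and lists. The cost is that ordering, mixed content, and attribute/element name clashes are lost or blurred, which is presumably the kind of gotcha the chapter warns about.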

The second chapter provides a whirlwind overview of XML -- covering its structure, DTDs, schemas, and XSLT (transformation). The discussion of XML in general, its history, and parts of an XML document are well done. They give someone who is familiar with static HTML the needed background to understand the structure of an XML document and the vocabulary used to describe it. Unfortunately, the discussion of where XML begins to distinguish itself from HTML, namely with DTDs, the new replacement for DTDs called schemas, and the transformation language XSLT, is too brief. They gloss over these topics with little explanation and few examples. That said, there are other books that do provide more in-depth coverage of XML (this book only promises an introduction).

The next five chapters cover Perl modules designed to process XML, starting with simple parsers and writers. Only methods and syntax relating to XML processing are explained. Therefore, if you are considering reading this book, you should be fairly comfortable with Perl and object-oriented (OO) interfaces to CPAN modules (nearly all the modules discussed provide OO APIs). Again, there are other books and perldoc documentation that cover Perl and its OO features, so read them first if you are not familiar with OO Perl. If you are familiar with OO Perl, these chapters provide a good overview of the different ways XML can be processed (stream- and tree-based approaches), the advantages and disadvantages of each, and the Perl modules best suited for each approach. These chapters are the biggest strength of this book. The modules discussed in these chapters are by no means an exhaustive list of XML-related modules available from CPAN, nor do the explanations of each module cover everything the module does. These chapters do, however, provide the reader with enough information that she can begin to process XML documents intelligently and know where to turn when she needs more information.
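
To make the stream-versus-tree distinction concrete, here is a minimal sketch that reads the same small document both ways. It uses Python's standard library (`xml.etree.ElementTree` for the tree, `xml.sax` for the event stream) as a stand-in for the Perl modules the book covers; the `ChapterHandler` class and the sample document (chapter titles taken from the table of contents below) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
import xml.sax

DOC = (
    '<book>'
    '<chapter n="4">Event Streams</chapter>'
    '<chapter n="5">SAX</chapter>'
    '<chapter n="6">Tree Processing</chapter>'
    '<chapter n="7">DOM</chapter>'
    '</book>'
)

# Tree-based: parse everything into memory, then navigate the structure.
root = ET.fromstring(DOC)
tree_titles = [chapter.text for chapter in root.findall("chapter")]


# Stream-based: react to events as the parser walks through the document.
class ChapterHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # collects text only while inside <chapter>

    def startElement(self, name, attrs):
        if name == "chapter":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "chapter":
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = ChapterHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
stream_titles = handler.titles

print(tree_titles)
print(stream_titles)
```

The tree version is shorter and allows random access, but holds the whole document in memory. The stream version keeps only the state it chooses to keep, at the price of writing a small state machine by hand.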

The next chapter, Chapter 8, covers XML tree iterators, XPath, XSLT, and XML::Twig. All of these topics are covered in a span of 16 pages (with only slightly over two pages dedicated to XSLT). Indeed, after reading the chapter, you may get the feeling that it was only included so the authors could cram more trite colloquialisms into the book. The short shrift given to these topics creates the impression, which is strengthened in the chapters that follow, that this book was rushed a bit to press.
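
For a sense of what an XPath query buys you, here is a short sketch using the limited XPath subset built into Python's `xml.etree.ElementTree`. This is not one of the Perl modules from the chapter, and the subset is far smaller than full XPath; the sample document is a hypothetical fragment assembled from the table of contents below.

```python
import xml.etree.ElementTree as ET

TOC = """<book>
  <chapter n="3" title="XML Basics: Reading and Writing">
    <section>XML::Parser</section>
    <section>XML::LibXML</section>
  </chapter>
  <chapter n="8" title="Beyond Trees: XPath, XSLT, and More">
    <section>Tree Climbers</section>
    <section>XPath</section>
    <section>XSLT</section>
    <section>Optimized Tree Processing</section>
  </chapter>
</book>"""

root = ET.fromstring(TOC)

# Select by attribute value, then step down to the children.
chapter8_sections = [s.text for s in root.findall("./chapter[@n='8']/section")]

# Select chapters that contain a child element with a given text.
chapters_with_libxml = [
    c.get("n") for c in root.findall("./chapter[section='XML::LibXML']")
]

print(chapter8_sections)
print(chapters_with_libxml)
```

A single path expression replaces the nested loops and conditionals you would otherwise write to walk the tree, which is why the topic arguably deserved more than a few pages.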

Chapter 9 discusses applications of XML, including RSS and SOAP, and Chapter 10 is mostly example code. These chapters are intended to give you a feeling for what is possible without really giving you enough information to make it happen. The main problem with these chapters is the examples: they are long and the explanations are short. Thus, they are more useful as templates or a quick reference than for learning these topics in detail. Of course, the authors never promised you would be programming SOAP applications when you were done reading this book. And again, there are other books out there which discuss these topics in more detail. So the authors stay true to their promise throughout the book: they will introduce you to XML and tell you how to interact with XML using Perl, no more.

Personally, I found this book did, in general, give me enough information to get started using XML and pointed me where I needed to go to get more information. I am an experienced Perl programmer who is new to XML and comfortable with on-line documentation. This book seems to be written for people who fit this profile and who want to learn by doing (finding the answers to the "hard" questions as they arise). It does introduce a wide variety of XML-related topics and the Perl modules used to interact with them, which is what the authors promised to do in the preface. While it is by no means an authoritative text on Perl and XML, there is something to be said for keeping promises ...

Index: As with most first-edition books, the index was adequate but not complete. For example, XML::Twig, which has an entire section covering it, does not appear in the index at all.

Contents
Preface

  1. Perl and XML
    • Why Use Perl with XML?
    • XML Is Simple with XML::Simple
    • XML Processors
    • A Myriad of Modules
    • Keep in Mind ...
    • XML Gotchas
  2. An XML Recap
    • A Brief History of XML
    • Markup, Elements, and Structure
    • Namespaces
    • Spacing
    • Entities
    • Unicode, Character Sets, and Encodings
    • The XML Declaration
    • Processing Instructions and Other Markup
    • Free-Form XML and Well-Formed Documents
    • Declaring Elements and Attributes
    • Schemas
    • Transformations
  3. XML Basics: Reading and Writing
    • XML Parsers
    • XML::Parser
    • Stream-Based Versus Tree-Based Processing
    • Putting Parsers to Work
    • XML::LibXML
    • XML::XPath
    • Document Validation
    • XML::Writer
    • Character Sets and Encodings
  4. Event Streams
    • Working with Streams
    • Events and Handlers
    • The Parser as Commodity
    • Stream Applications
    • XML::PYX
    • XML::Parser
  5. SAX
    • SAX Event Handlers
    • DTD Handlers
    • External Entity Resolution
    • Drivers for Non-XML Sources
    • A Handler Base Class
    • XML::Handler::YAWriter as a Base Handler Class
    • XML::SAX: The Second Generation
  6. Tree Processing
    • XML Trees
    • XML::Simple
    • XML::Parser's Tree Mode
    • XML::SimpleObject
    • XML::TreeBuilder
    • XML::Grove
  7. DOM
    • DOM and Perl
    • DOM Class Interface Reference
    • XML::DOM
    • XML::LibXML
  8. Beyond Trees: XPath, XSLT, and More
    • Tree Climbers
    • XPath
    • XSLT
    • Optimized Tree Processing
  9. RSS, SOAP, and Other XML Applications
    • XML Modules
    • XML::RSS
    • XML Programming Tools
    • SOAP::Lite
  10. Coding Strategies
    • Perl and XML Namespaces
    • Subclassing
    • Converting XML to HTML with XSLT
    • A Comics Index
Index


You may also want to check out Erik T. Ray's home page, Jason McIntosh's home page, or O'Reilly's page for the book. You can purchase Perl & XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

  • my opinion (Score:5, Informative)

    by larry bagina ( 561269 ) on Thursday July 11, 2002 @11:23AM (#3864843) Journal
    I am a professional developer, working mostly with Perl. I work in the field of biology and bioinformatics, but have spent the last 8 years working as a web and database Internet developer. And, I own practically every O'Reilly Perl book ever published (not that I necessarily think they're all worth buying). So, now that you know where I'm coming from...

    If you are preparing to do a serious amount of XML development, and you're in the process of determining a) which Perl XML modules on CPAN you want to use, and b) how to use them; and, you don't have a whole lot of time to spend tracking down the sometimes-hard-to-find documentation on these modules; then buying this book is a no-brainer. It covers all the major XML modules, how to use them, and really helps you figure out when to use the different modules.

    Even if you're not new to XML and Perl, this book would serve as an excellent refresher course on what XML tools are available out there for you... Maybe you haven't looked at your code in a while, or want to update it to use a newer module from CPAN? Or, maybe you're looking for a better way to do it? Then, this book would definitely help you out.

    While a fan of O'Reilly books in general, I'll be the first to admit some of them are more useful than others. I highly recommend this book, though, as it's actually useful, comprehensive and very well presented. I find myself cracking it open all the time, especially as my utilization of XML has grown more complicated. It has definitely earned its place in my Aqua Perl book collection.

    • I think that if you are making a large enough (enterprise?) website that requires a serious amount of XML development, you should ditch perl altogether and write it in Java. Java already handles XML efficiently, and is a better choice for large apps.

      If you are writing a smaller website that uses XML, sure, perl is a nice choice. But is the XML necessary?

      And I have used (and still use) both perl and Java. I just view Java as a better choice for large web apps, and perl for small web apps and scripting.

      Before I get hit with flames, please understand this is my opinion, not fact.
      • by Matts ( 1628 ) on Thursday July 11, 2002 @11:52AM (#3865028) Homepage
        You are correct it is a biased opinion, but inquiring minds want to know why.

        You can't seriously be suggesting that because Java "already handles XML efficiently" developers should switch to it, or that suddenly it's the holy grail of languages that can use XML? At least not without backing up your statements. There's an enormous amount of work involved in dumping a language for another one. Witness the number of dot-coms who tried it when they bought another company only to fail (whether trying to make everything "Java" or "Enterprise" is a contributing factor or not is a judgement call).

        Perl has incredibly efficient libraries for processing XML. For example XML::LibXSLT is faster than every Java XSLT module out there according to freely published benchmarks, so it's hard to see where you find your bias.
        • sure, migrating languages is costly. so, if you've got legacy code in perl, leave it alone. but... if you're putting together a new web based "application", you have the choice to use java, .NET, or legacy technologies such as perl/php/asp, etc. My first choice would be java. There wouldn't be a second choice.
          • Why would your first choice be Java, when the above poster already mentioned that Perl's XML libraries are faster?

            Perl is by no means a "legacy" language - it's still in development, and if I dare say, it's developing faster than Java. Look at the upcoming Perl 6.

            I'm not saying that you should use Perl; I'm just curious as to why you would use Java and refuse to use anything else (when something else may be the best tool for the job).

            Don't forget that some of the largest web apps out there (*cough* Slashcode *cough*) are in Perl.
            • Why would your first choice be Java, when the above poster already mentioned that Perl's XML libraries are faster?

              How many times would you need XML loaded vs. say a DB, where J2EE has an enormous advantage over perl. How about comparing the overall speed of perl's webapp (including its faster XML loading) vs. the speed of the Java webapp (mod_perl vs a good java webcontainer is no contest).

              Perl is by no means a "legacy" language - it's still in development, and if I dare say, it's developing faster than Java. Look at the upcoming Perl 6.

              Compare the technologies in Perl 6 vs. the technology in Java1.4, and the newest tech in J2EE.
              Just because Perl is on a higher version doesn't mean it's more advanced, nor does it mean it's moving faster.

              I'm not saying that you should use Perl; I'm just curious as to why you would use Java and refuse to use anything else (when something else may be the best tool for the job).

              I'm DEFINITELY not saying Java is great for everything. BUT, the best tool for large enterprise websites in most cases is Java (well, J2EE).

              Don't forget that some of the largest web apps out there (*cough* Slashcode *cough*) are in Perl.

              First, have you read slashcode, before? Yick!
              Second, have you seen anything similar written in Java (yeah, nothing exists, though I've been tempted to write a slash-like logger in Java)? I think you'd be surprised at the speed increase and the improved maintainability/fixability.
              Third, large apps (IMHO) should use OOP. Perl 5's version of OOP is horrible (can't wait for Perl6, though).
              Slashcode started off small, so it was great as a perl/cgi site. When it increased in size, it shoulda went to MVC patterned J2EE webapp, complete with taglibs. It, unfortunately, didn't go that direction.
              • " (mod_perl vs a good java webcontainer is no contest)."

                This surprises me, as the only benchmarks I've seen showed mod_perl as faster than java - have you seen benchmarks that show otherwise (this *was* over a year ago, however)? Please post a URL - I'd like to see them.
              • How many times would you need XML loaded vs. say a DB, where J2EE has an enormous advantage over perl.

                That is complete bull. Perl's database interfaces are at least as fast as Java's if not faster. For most databases they are a thin layer over the C library provided by the database company. They fly.

                How about comparing the overall speed of perl's webapp (including its faster XML loading) vs. the speed of the Java webapp (mod_perl vs a good java webcontainer is no contest).

                Actually, it's very close, but mod_perl is a little faster. I benchmarked Resin against mod_perl doing a select from an Oracle database and displaying it. Both were fast, but mod_perl was faster. The Resin guys have some benchmarks on their page where they claim that Resin is faster, but the mod_perl code they used in that test has some mistakes in it that slow things down.

                I'm DEFINITELY not saying Java is great for everything. BUT, the best tool for large enterprise websites in most cases is Java (well, J2EE).

                There are some very large sites out there who would disagree with you. Yahoo and Amazon use a lot of Perl. I don't know of any J2EE sites that handle anywhere near that amount of traffic.

                There's nothing terribly wrong with Java. I use it at work almost every day. However, it has no real technical advantage over Perl for building web applications.

      • I'd be the first to flame on this topic (see my previous posts...) - but I'm curious - why is java *better* in your opinion for larger apps than perl? Is this because OO is built into java (rather than added to perl) or is it something else? Also, speaking as someone who does *not* code in Java, what do you mean by "Java already handles XML efficiently"? Does this mean that the functionality provided by the XML perl modules is built into java?

        Not that I'm thinking of changing (I'm pretty pig-headed about this), but I am interested in why people pick java over perl...
      • absolutely. while not knowing who "larry bagina" is, what he is doing or who he is doing it for, FortKnox is right on target in setting mr. bagina straight.

        it's rare that someone so completely ignorant of requirements, methods, or even the people involved can give such insightful comments to wayward programmers. FortKnox, you are gold indeed. Tell me, do you think I should use java in my project?


  • I wonder if this conforms to the dtd
    <!-- a bit of xml i wrote -- xml comments look something like this
    -->

  • XML sucks (Score:1, Flamebait)

    by micromoog ( 206608 )
    XML is a giant leap backwards. It makes us convert data from a binary format, to a terribly verbose text format, then back to a binary format for use. This leads to:
    • conversion complexity
    • conversion errors
    • vast storage requirements
    • vast bandwidth requirements
    The only benefit AFAIK is that applications from different vendors can use it to "talk" to each other. However, the applications still have to understand the same set of XML tags to begin with, then must conform to this hideous standard with all of its verbosity. Said applications would have been better off using a proprietary (read: efficient) binary storage format in the first place.

    Ban XML!

    • Right... (Score:5, Insightful)

      by alexhmit01 ( 104757 ) on Thursday July 11, 2002 @11:36AM (#3864915)
      You realize that if I get an XML file, I can figure out what it is saying and decide what to do with it. With your ideal (binary) files, I need to reverse engineer the format.

      With binary, I need permission to interoperate. With XML, I need a text editor (or print-out) and some common sense.

      You worry all you want about the computer's efficiency. I use my machines to make my life easier. I don't jump through hoops to make the computer's life easier...

      Taking troll bait,
      Alex
      • Re:Right... (Score:2, Insightful)

        by jkroll ( 32063 )
        With XML, I need a text editor (or print-out) and some common sense.

        Actually you still need the same thing you would need with the binary format. You need the documentation as to what the tags mean. Try looking at XML produced by commercial software sometime. Just because SAP calls the tag "ITEM" doesn't mean it refers to a part number or something you can pick up at the store. And looking at the schema or DTD probably won't help you figure out what the tags mean either.

        Finally, parsing XML is easy when the document produced by an external system exactly mirrors your internal representation; unfortunately, if you are interfacing XML documents with external systems this will almost never be the case.

        Don't get me wrong, XML has its uses - just the exchange of data between disparate systems isn't one of them.
        • Finally, parsing XML is easy when the document produced by an external system exactly mirrors your internal representation; unfortunately, if you are interfacing XML documents with external systems this will almost never be the case.

          This is true. It is also why XSLT (and other XML transformation languages before anyone flames about the shiteness of XSLT) were invented. Of course, you do still need the documentation to understand the XML to be able to write the stylesheet. If the design of the XML schema (or *shudder* DTD) is badly thought out and hard to understand, well, you can't really legislate for that without taking the "extensible" bit out of XML.

      • You're implying that XML was designed to make data more human-readable? Bollocks! That's what documentation is for!

        So, other than making it easier to reverse-engineer a data format that you're not supposed to be reverse-engineering (complete with bugs that come in because you guessed wrongly that <co> means Country and not County), what other benefits are there?

      • Actually, I for one am quite concerned about the computer's efficiency. When you have 10,000 concurrent users, those minute speed improvements will pay huge gains.
    • C sucks (parody) (Score:4, Insightful)

      by FortKnox ( 169099 ) on Thursday July 11, 2002 @11:44AM (#3864978) Homepage Journal
      C is a giant leap backwards. It makes us convert data from a verbose ASCII format, translate and compile it into a binary format. This leads to:
      • Conversion/translation complexity
      • Syntax Errors
      • Conversion errors
      • Storage requirements (object files)
      The only benefit AFAIK is that people can read the code better. However, the applications still have to understand the standard coding syntax, which comprises of a hideous amount of keywords and styles. Said applications would have been better off using Assembly (read: efficient) code in the first place.

      Ban C!


      Please note the extremely sarcastic tone of this post.

      Your complaints are old fashioned. Maintainability is a major overlooked flaw in Computer Science.
      • I say we ban computers. That way we eliminate the problem. :)
      • If I ever wrote or read my stored data manually, your post might be relevant. But we're not talking about computer/human interface here. No, we're talking about machine representation of data.

        Of course programming languages need to be human-readable; that's what they're for. Stored data does not; it needs to be efficient. XML is bloat of the worst kind.

        • Of course programming languages need to be human-readable; that's what they're for. Stored data does not; it needs to be efficient.

          Do you mean like, say, Microsoft's Word binary format? It is much more efficient than plain text, and much more compact. Of course, if you decide to process all of your documents with another application, you are at the mercy of the vendor. Writing conversion tools for some arcane binary format is a pain in the ass. Plain-text, while space-consuming, will always be accessible.

          • Of course, if you decide to process all of your documents with another application, you are at the mercy of the vendor.

            That's the vendor's problem, not the format's. If Microsoft wants you to be able to reverse-engineer their document format, they'll give you documentation of it. If they don't want you to reverse-engineer their format, they won't (and they sure as hell won't make it XML).

            Plain-text, while space-consuming, will always be accessible.

            That's true. But we're not talking about plain text; we're talking about XML (which is a bloated "self-documenting" structure that happens to be stored in plain text). If you want plain text, save your Word docs as text.

    • Re:XML sucks (Score:3, Insightful)

      by sporty ( 27564 )
      Ok, i'll reply to this. I'm proly being trolled :P

      Conversion complexity, granted. It does take a bit of work. But would you recommend describing each record with individual lines? That's a bigger pain than ever. What XML gives isn't just a structure for your data, but a language to describe it. It also allows for non-2d data. By this, we can have people with subsets of data, with subsets of different types. This is great, as now we can have a language that describes data in a logical manner and be completely portable.

      Conversion errors, please be more specific. If you convert to a comma delimited format, you are screwed if you do it wrong. If you do it straight to binary, you have to worry about how many bits represent any given data. Why do you think that pack has so many different switches for converting data?

      Just because you use XML doesn't mean you must store data in XML form. Hell, it's stupid if you have gigs of data to use XML to relate it all. DBs don't use xml except for expression of data back to the user/software it talks to (if asked for).

      If you are worried about bandwidth, on a simplistic level, gzip it. Yes, compress it. Hell, do a gzip stream which is supported by many browsers.

      If your program plans on sharing data, you'll want to use XML. If you never want to share your data, fine, use binary. It's not terrible. But once you wanna share it between two machines or processes, now you have to worry about deciphering the binary format. That is... unless you work by yourself and have documentation on everything.
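
The "just gzip it" point is easy to try out. Below is a minimal sketch in Python using the standard `gzip` module; the `build_xml` helper and its record layout are made up for illustration, and the exact sizes will depend on the data, so no particular ratio is claimed here.

```python
import gzip


def build_xml(n):
    """Build a deliberately repetitive XML document (hypothetical sample data)."""
    rows = "".join(
        f"<record><id>{i}</id><name>user{i}</name></record>" for i in range(n)
    )
    return f"<records>{rows}</records>".encode("utf-8")


raw = build_xml(1000)
packed = gzip.compress(raw)

# Compression is lossless: the original bytes come back unchanged.
restored = gzip.decompress(packed)

print(len(raw), len(packed), restored == raw)
```

Because the same tag names repeat over and over, markup of this kind tends to compress well, which takes much of the sting out of the bandwidth and storage objections, though not the parsing cost.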

      • Wow!
        The one other slashdotter that realises you don't have to actually store the data in XML form!

        And I was beginning to feel lonely... ;)
        • Of course. XML is mostly useful for sharing data outside the realm of one person. It's MUCH faster to store a C struct in binary form and reload it than parsing XML. No parsing.
          • Faster for the computer sure.
            But what about later when you need to support
            different languages, or change your struct,
            or change to a different architecture?
            And still need to make it backward compatible?

            • Change in structure? No problem. What you do is use XSLT or rewrite stuff using DOM or SAX. SAX being the biggest hack, DOM being the biggest pain. XSLT being the quickest solution. XSLT would allow you to transform any single file. Write a quick bit of code to loop the transformation on all supplied files and poof, you are done.
              • :)

                Exactly! Easy in XML, but how would you do this if you saved the files as a dump of a C data object...

                • Simple. you load up the data in one struct, create a new struct, do a 1-1 mapping and output the new one. Don't taunt the systems programmers ;)
                  • And if you forgot to indicate a version number?
                    Or forgot to indicate the language encoding
                    or endianness...

                    • The same could be said of XML. An XML document doesn't have to validate against a dtd.

                      As for language encoding.. and that silliness, you can do a binary->binary conversion. You just have to know the language you are using. Like ints are 32 bits, signed ones are 31 + the sign bit. Etc etc.. so it's not terrible. :)
                    • Sure, but how many people specify the size of an int, the endianness etc in the actual file when they save a binary file?
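
For what "specifying it in the actual file" could look like, here is a minimal sketch using Python's standard `struct` module. The magic bytes, the version number, and the record layout (one 32-bit integer and one 64-bit float) are all hypothetical choices for illustration, not any real file format.

```python
import struct

MAGIC = b"REC1"   # hypothetical file signature
VERSION = 1       # hypothetical format version

# "<" fixes little-endian byte order and standard sizes with no padding.
HEADER = struct.Struct("<4sH")   # 4-byte magic + unsigned 16-bit version
RECORD_V1 = struct.Struct("<id") # signed 32-bit int + 64-bit float


def dump_record(count, price):
    """Serialize one record, prefixed by a self-describing header."""
    return HEADER.pack(MAGIC, VERSION) + RECORD_V1.pack(count, price)


def load_record(blob):
    """Check the header before trusting the bytes that follow it."""
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a record file")
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    return RECORD_V1.unpack_from(blob, HEADER.size)


blob = dump_record(42, 9.5)
print(len(blob), load_record(blob))
```

The header costs six bytes, and it is exactly the bookkeeping that a raw dump of an in-memory struct leaves out: a later reader can tell what it is looking at and refuse versions it does not understand.
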
                    • Ah, but remember, in a controlled environment, where source and everything is documented, it's not a "bad thing". Oracle works like this of course. Do they have problems migrating their binary data format? Doubtfully.

                    • Oh definitely!
                      If you plan and document etc, anything you do will probably be good.
                      I was thinking more along the lines of a small hobbyist programmer, or the average company, where they knock out something quickly, then 6 months down the line run into problems because they forgot to specify the size of their ints in their file format, or what have you.
                      XML just allows the novice-to-medium programmer and company to make fewer mistakes at that stage.
                      Particularly early on where the data they are trying to save may vary a lot.
                    • Then are we agreeing on the same thing? :)

                      Note: added you as a friend. I'm glad slashdot actually added messaging when someone replies to you. Actually nice to have conversations.
                    • The messaging thing is pretty neat yeah..

                      I'm working on a load of xml apps, and although they use and rely on xml a lot, they hardly ever have anything in the xml format.
                      Everything is stored in databases, and database files, the data transfer is negotiated to find the optimum format, the xml schemas are turned into actual code, and so on.

                      However it does mean that I can plan and design it with xml in mind - meaning i get all the advantages of schemas etc, but without the disadvantages of verbosity, slowness, etc.

                    • Heh, wish where I worked was like that. We have stupid things going on such as our Database doing direct connections to 3rd party web services. We have about 2000 tables in our database. It's flippin' stupid.
    • Re:XML sucks (Score:2, Interesting)

      by Trilaka ( 172371 )
      Ok, granted it is generally bad form to respond to trolls, but this one reminded me of a good story that I thought I would share.

      Problem: Given a document in Word format containing a table on which various operations must be performed, resulting in an HTML page with a consistent format.

      Now, first off, simply saving the document as HTML from within Word was far from sufficient. So, what to do? We tried various methods using Microsoft products to do the requisite transformations, all to no avail. We simply didn't have the control we needed.

      Solution: Import the file into OpenOffice.org's Writer, save in OOo format (XML based), write a quick one-page perl script using XML::Twig (even though I had never examined OOo XML format prior to this exercise), and voila, problem solved.

      This was a great example to me of the power of XML. Sure, XML is verbose, but remember, it is all ASCII text, and compressing ASCII text is basically a solved problem in computer science, so the verbosity needn't create much of a storage hit.

      Hooray for adoption of XML file formats!
    • www.xmlsucks.org (Score:4, Insightful)

      by alispguru ( 72689 ) <bob.bane@me.PLANCKcom minus physicist> on Thursday July 11, 2002 @12:00PM (#3865072) Journal
      For a more detailed, and more depressing, take on the above, see http://www.xmlsucks.org/but_you_have_to_use_it_anyway/ [xmlsucks.org].

      Yes, it's a PDF. Unroll it - it's worth the effort.
      • XML parsing error
        fatal parsing error: the document is not in the correct file format in line 12, column 13
        fatal parsing error: error while parsing element in line 12, column 13
        fatal parsing error: error while parsing content in line 12, column 13
        fatal parsing error: error while parsing element in line 12, column 13
        fatal parsing error: error while parsing content in line 12, column 13
        fatal parsing error: error while parsing main element in line 12, column 13
        <sub><title>or: why XML is technologically terrible, but you have to use it
        ^

        sucks to be xml
    • Why is the parent modded as flamebait? It's a valid point of view, and completely accurate for many uses.

      However, I'm not as 100% opposed as the poster above. Specifically, I have a minor SGML background as well as XML and so can see the uses a little better.

      Basically, the data is nothing without the associated transform. I'm talking about the literal meaning of the word transform here - whether transformed into memory structures within a program through parsing, or whether merely transformed into other document types via XSLT.

      For example, I use XML as the format in which my servlet-based reports are generated. JAXP then handles transforming into my desired output format. The user wants a web page as their result? Fine - here's the XSLT stylesheet to transform into HTML. They want Excel? Well...almost but here's the XSLT to transform into .csv.

      You get the idea. Automatic, reflex use of XML is over the top and unnecessary. Use of XML where portable source data is transformed into one of a variety of options however is quite useful.

      Cheers,
      Ian
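
The "one source, several output formats" pattern above can be sketched without XSLT. The example below is not XSLT or JAXP: it uses hand-written Python functions over `xml.etree.ElementTree` (the Python standard library has no XSLT processor), and the report layout, `to_csv`, and `to_html` are hypothetical names and data invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical report document; the element names are made up.
REPORT = """<report>
  <row><name>Widgets</name><total>12</total></row>
  <row><name>Nuts &amp; bolts</name><total>7</total></row>
</report>"""


def rows(root):
    """Yield each report row as a [name, total] list of strings."""
    for row in root.findall("row"):
        yield [row.findtext("name", ""), row.findtext("total", "")]


def to_csv(root):
    """Render the report as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "total"])
    writer.writerows(rows(root))
    return out.getvalue()


def to_html(root):
    """Render the report as a bare HTML table, escaping cell text."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows(root)
    )
    return f"<table>{body}</table>"


report = ET.fromstring(REPORT)
print(to_csv(report))
print(to_html(report))
```

The report generator only ever produces the XML; each new output format is one more small transform, which is the division of labor the stylesheet approach gives you declaratively.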

  • Parsing XML indeed. I mean seriously, have any of you ever actually tried to implement XML parsing? It's an order of magnitude slower than accessing a database, ten zillion times slower than reading a flat file ASCII database, and a trillion times more expensive (well, I'm exaggerating a bit) than reading in a text file with nested variable=value pairs.

    Interoperability is great and all, but I think XML is nothing but hype.

    Programmers, hear my cry! Spend your precious hours working on your program interface, your error-checking, your overall design and modularity, don't spend time worrying about a scheme with a fancy name that saves data like this: value.

    Don't mod me up or down, I just want to foster a discussion about this. I mean, as a standalone programmer using Perl for a majority of their web application products, what benefit does XML give you other than buzzword compliance?

    ----
    • ...A Six Figure Salary
    • "Programmers, hear my cry! Spend your precious hours working on your program interface, your error- checking, your overall design and modularity, don't spend time worrying about a scheme with a fancy name that saves data like this: value."

      Argh! Slashdot cut out my pseudo-tags, in my original post I meant <variable>value</variable>.

      I bet I won't be the only one to make that mistake today. If you're posting XML, be sure to save the post as "Extrans (html tags to text)" instead of "Plain old text" or "HTML formatted" to save your braces from being truncated.

      This has been a public service announcement.

      Now I'm depressed, I'm going to go work on my latest server. At least I have some control there.
      -----
    • Having done some Java/XML work last year, then stumbling back into a homogenous Windows environment, I can honestly say I think XML has strengths and weaknesses.

      Speed, however, is a primary problem, or at least it was when we were using XML to store/parse tens of thousands of elements for a Financial Services app in Java. It had the advantage of not being tied to a particular platform, and needing no database of any form distributed with it, and being read-compatible with some existing applications. So it was worth it to the client.

      In my current environment, it makes no sense, other than "buzzword compliance".

      So it's got its uses, but it's not a magic bullet. Then again, I think Client/Server is a better solution than going web-based for a lot of things, so what do I know? :)

    • We just dumped XML for our primary storage format for an application we're about to release. We're still using it as a backup/restore format though.

      The problem was that while our program will generate probably only a few K of data per day in binary format, once you translate that binary data to XML with descriptive tag names and attribute names, it would get to be 100K per day due to the number of attributes per record. After a couple months you may be looking at 10 megs or more. Load/save times were several minutes. We switched to the BDE (I know it sucks) and load times are instant due to being able to load records on demand instead of everything at once.
    • why xml? (Score:3, Interesting)

      by paranoic ( 126081 )
      XML is for moving information from one machine to another, much like HTML. Heck, XML is just HTML where you get to make your own tags. The usefulness comes when you, as a programmer, have to decipher a document that comes from somewhere else, so you can do something with the information contained in it. The documentation is built in. It's the next generation past using key=value pairs. How do you nest those in a generalized way, so that the machine you are sending information to can decipher it?

      As far as parsing it, there are libraries for that.
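
      As a rough illustration of the nesting point, here is a minimal sketch using Python's standard-library parser (an assumption; the thread is about Perl). The document and its tag names are made up for the example.

```python
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Recursively convert an element into nested dicts; leaves become strings.

    Simplification: repeated sibling tags would overwrite each other here.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}

# Hypothetical settings document; tag names are made up for the example.
DOC = "<server><host>example.org</host><limits><cpu>2</cpu><mem>512</mem></limits></server>"

settings = element_to_dict(ET.fromstring(DOC))
```

      The receiving side needs no format-specific grammar: the nesting comes out of the generic parser, which is the advantage over ad hoc key=value files.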

    • The trouble is that people use it wrongly. (IMHO)

      If it is done right, most of the time you should never have any actual XML text!

      Store the data you are using in DOM, and manipulate it there. Store the data in DOM format on the disk, with a way to dump to XML if you have to (which you rarely will). Compile your Schemas, etc.

      I'm also not sure why you imply it is costing time using XML - one of the ideas is that it is supposed to save time - and it does.
      I can write schemas to validate the output and input, write XSL's to transform the data to anything, and so on.

    • I have spent my Junior and Senior years in college working with XML (as a personal project). I just graduated and I am still working with XML. I will agree that in some places XML is being used in ways it was never intended. That is why they call it extensible. It will fit almost anywhere but is not the best solution for most problems.

      One good example of XML use is in Open Office. I believe the Open Office file format will end up being the most important contribution that Sun made in the office application arena.

      Another good place to use XML is in configuration files. The advantages are obvious.

      Parsing XML takes resources, so in most applications you should not do it in real time. An example of this is in a dynamic web environment. Try implementing Slashdot with an XML-based backend. But with browsers becoming XML aware, you can offload this parsing to the browser.

      The worst place to use XML IMHO is to describe logic. Some people have tried this - like XSP or JSP. There are some advantages to it but I think it ends up being a mess. XSLT got away with being a mess because it was one of few solutions to the problem of XML transformations.

      An argument about the evils of XML is akin to saying Perl is a nasty language. A professor who taught Perl actually told me that. It is a nasty language but it solves some problems elegantly.
      XML on the other hand is quite pretty and it solves several problems elegantly.

      It is when you fall for the hype and use XML because you want to advertise it as a "feature" that it fails. XML is not a feature - it is a solution to certain problems.

      I could care less if your program uses XML in some obscure place that I can't see. If you can give me a way to export my data to XML, I will be happy. If I can write a config file as XML, I will be happy. But if you say, I use XML in this application to implement the Help feature which is only accessible through the Help button, I could care less.

      P.S: I hate it when interesting stuff gets posted in the middle of the day when I am at work

    • I use XML as the interchange format for a web publishing system which publishes our internet web site (http://www.bms.com), but the data is actually stored in an oracle database. I have a perl object which handles all the fuss of getting/putting xml into the database.

      As an interchange format XML is ideal; think of it as comma-separated files on steroids. When all your data can be serialized to XML you get the following benefits:

      1. XML has rich data structures for complex info.
      2. XML can be self describing.
      3. XML is 100% portable.

      Like HTML, people will discover uses for your XML files that you never thought of. Also, if you lose all the docs, you can read the XML in a standard text or Unicode editor and figure it out. This is even better than comma-separated files, since most CSV files don't bother to include a first row of field descriptions.

      Like CSV, you can parse XML files with standard command line tools like grep. And in 100 years, all those Oracle tablespaces will require a lot of reverse engineering to get the data out of them, while your text-based XML files will still be parsable.

      I agree though, with the general notion of the parent. Definitely don't do XML because it sounds cool. Use the best process for the job, and for many data related jobs, relational tables and SQL are best.

      One thing you can do to improve speed; serialize your DOM objects using the Perl Storable module, and save along with your plain text versions. Then when you need to access the data, all you need to do is unserialize the object, which is a lot faster than reparsing.
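
      The caching trick above, sketched in Python rather than with Perl's Storable (an assumption made to keep the example self-contained): `pickle` plays the role of Storable, and the file names and document content are invented. Whether the pickled copy really loads faster than a reparse depends on the document and would need measuring.

```python
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET

def load_cached(xml_path, cache_path):
    """Return the parsed root, reusing a pickled copy when it is fresh."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(xml_path)):
        with open(cache_path, "rb") as fh:
            # Only unpickle files you wrote yourself; pickle is not safe
            # for untrusted input.
            return pickle.load(fh)
    root = ET.parse(xml_path).getroot()
    with open(cache_path, "wb") as fh:
        pickle.dump(root, fh)
    return root

# Demonstration with a throwaway file; the document content is invented.
workdir = tempfile.mkdtemp()
xml_path = os.path.join(workdir, "page.xml")
cache_path = xml_path + ".pickle"
with open(xml_path, "w", encoding="utf-8") as fh:
    fh.write("<page><title>Home</title></page>")

first = load_cached(xml_path, cache_path)   # parses and writes the cache
second = load_cached(xml_path, cache_path)  # served from the pickle
```

      The plain-text XML stays the authoritative copy; the serialized object is a disposable cache that is rebuilt whenever the XML file is newer.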

    • by The Pim ( 140414 ) on Thursday July 11, 2002 @04:36PM (#3867151)
      Interoperability is great and all, but I think XML is nothing but hype.

      Heck, let me give this my crack... :-)

      Ok, obviously the biggest reason for XML's popularity is hype. That's just the way the industry works; it doesn't make XML good or bad.

      There are several legitimate technical benefits to XML that might be persuasive in one context or another.

      • It looks like HTML, so everyone intuitively "gets" it.
      • It's textual (not binary)--but of course, many formats are textual.
      • It's reasonably easy for humans to understand without a spec, provided the tag and attribute names are not obfuscated, and the relationships are relatively simple. Note this does not make it easy for programs to understand!
      • You don't have to write your own parser. You don't even have to write a grammar--just throw in a tag and the corresponding code to read and write it. This advantage is not as big as some make it out to be: many languages have easy-to-use features for parsing, and those that don't can make use of easy-to-use parser-generator tools.
      • There are lots of libraries and tools. Of course, this is self-reinforcing (tools -> popular -> more tools -> more popular -> ...).

      Many XML proponents, including some in this thread, would add to this list that XML is a good data storage and/or interchange format. Some "insightfully" note that it is better for data interchange than data storage. This is the biggest delusion over XML: XML is a rotten format for data.

      Remember what XML was back before the hype machine was in overdrive? It was a better HTML, and a simpler SGML. HTML and SGML have always been formats for documents, and XML was intended to be the same. XML is indeed a pretty good match for documents. (This is debated of course: documents are complex things, and modeling them is non-trivial. Embedded Markup Considered Harmful [xml.com], by Ted Nelson, is a good introduction.)

      But XML is a poor match for data. This is because an XML document is a tree, and most data are not hierarchical. Consider that the database industry abandoned hierarchical databases many years ago (ok, abandoned is a little strong: we still use LDAP). Hierarchical data formats force you to pick which relationships will define the hierarchy, and any other relationships have to be kluged in.

      Take a simple example of the sort of thing people use XML for: address book entries. Say you start out with a person element (I'm not going to write out the examples in XML syntax because it's too painful on slashdot) containing a name element and an address element. Now, you realize that multiple people may live at the same address, and you don't want to duplicate the address (data formats should be normalized). You either have to turn things inside out, putting the person element inside the address element, or make person and address both top-level elements, and link them somehow. In the former case, you have chosen an awkward hierarchy, and have "used-up" your ability to group people. What if you want a different grouping in the future? In the latter case, you have given up a lot of simplicity and read/writability (since now names and their corresponding addresses are in different places) by forcing non-hierarchical data into a hierarchical format.
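
      For readers who want to see the second option spelled out, here is a small sketch of the "both top-level, linked somehow" layout. The `id`/`address` attribute convention is invented for this example, and Python's standard library is used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book: people and addresses are both top-level
# elements, linked by an invented "address" attribute that refers to an id.
BOOK = """
<addressbook>
  <address id="a1"><street>1 Main St</street></address>
  <person address="a1"><name>Alice</name></person>
  <person address="a1"><name>Bob</name></person>
</addressbook>
"""

root = ET.fromstring(BOOK)

# The link has to be resolved by hand: build an index of addresses by id.
addresses = {a.get("id"): a for a in root.findall("address")}

residents = {}
for person in root.findall("person"):
    street = addresses[person.get("address")].findtext("street")
    residents.setdefault(street, []).append(person.findtext("name"))
```

      The address is stored once, but the tree structure no longer carries the relationship; the application has to rebuild it with a lookup table, which is essentially a join done by hand.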

      What is the solution? Well, I won't assert that it is the best data model that will ever exist, but the database industry has settled (roughly) on the relational model. So I think we should create a format describing relations, combined with the other advantages of XML: extensible, textual, readable, and most of all, standards-based. Yes, this would mean we would have to learn two technologies, one for documents and one for data. But the technology for data would be so much simpler--and as a bonus, integrate easily into our databases--that it would be a huge win overall. I don't have time to defend this model in depth. But think about it.

      By the way, another example of the bad match between XML and data is the great debate over when you should use elements, and when you should use attributes. The fact that there is an arbitrary decision to be made shows that XML has degrees of complexity that only get in the way when you use it as a data format. (If you're going to use XML for data, at least have the decency to eschew attributes except for an id attribute.)

      • You don't even get into its real ugliness...

        If you have to parse a bad xml file you will get into two states, one requires an infinite amount of memory, the other infinite amount of memory.

        how do you get these broken xml files? Either a closing tag went missing or the program that created them has a bug or your program has a bug or...

        Dr Knuth wrote about these problems because TeX suffers from them too. His solution is to bail out to the user and ask them to type stuff that will fix it. Anyone that has used TeX for a while knows that when they see that prompt, the only thing to do is let TeX do auto cleanup which means dropping quite a bit of its internal stack and trying to rebuild.

        XML is one of a family of nested formats, and they all have this problem when you can nest to any depth and most tags can appear at any depth. The problem is that your parser can't know when it sees something out of place, because most tags can fit anywhere.

        At work I deal with hundreds of companies' IT departments sending me data that involves real money. Many of these people can't get a standard ASCII file in the right format, so how are they going to get XML done right? Our experiments show that it takes nearly two weeks of full-time support per client for XML and about two hours for a plain ASCII file.

        I think the only place in the real world for XML is buzzword bingo.
        • If you have to parse a bad xml file you will get into two states, one requires an infinite amount of memory, the other infinite amount of memory.

          At least you're no worse off if you hit both bad states at once. ;-)

  • Comparisons? (Score:2, Interesting)

    by Anonymous Coward
    That TOC reminded me of the Python/XML FAQ [sourceforge.net], which I'm more familiar with.

    Any Perl/Python bilingual folks out there care to comment how the XML abilities of the two compare nowadays?

  • by Wee ( 17189 ) on Thursday July 11, 2002 @11:41AM (#3864951)
    I bought this book not too long ago. I was getting really tired of looking all over the Net for information on XML perl modules. Predictably, there are about 5 ways to get anything done with Perl and XML, so just examining all of your options is a hard enough task. Do you use XML::Simple or XML::Writer to make that XML schedule document? You can look online for a week straight or grab this book, which essentially condenses all the Perl XML docs into one place.

    The book is a little sparse, though. It's about the same thickness as Using csh and tcsh, so don't expect more than an overview of anything. In fact, it might be a little small for US$35.00 (although Bookpool has it for US$21.50 [bookpool.com]). Another small gripe was that it covered parsing XML in far greater detail than generating XML (which was my task at the time I bought the book). Admittedly, parsing XML is typically what most people tend to do and is far more difficult than creating new XML, but I thought a little more coverage was warranted.

    If you are faced with doing something involving XML and you're not sure what software bits are up to the task, then this is a good place to find out where to start. You could wind up looking elsewhere if you need lots of nitty-gritty details, but getting off on the right foot is a hard enough task and might be worth the price of the book.

    -B

    • > Another small gripe was that it covered parsing XML in far greater detail than generating XML (which was my task at the time I bought the book).
      I haven't read the book, but that is also my concern (also at the moment).

      Why is it that libxml2 has several parser options, including DOM and SAX, but only one writing option - generating a complete tree in main memory before writing it out to a file?!?

      Like XML::Writer can, it must be possible to create an interface that, while still validating stuff like closing the correct elements, doesn't have to keep it all in memory.
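
      A rough sketch of the kind of interface being asked for, written in Python as an illustration (it is not libxml2's or XML::Writer's API). The `StreamingWriter` class and its method names are hypothetical; it wraps the standard library's `XMLGenerator`, writes each piece as soon as it is given, and keeps only a stack of open tag names to check nesting. It checks well-formed nesting only, not validity against a DTD or schema.

```python
import io
from xml.sax.saxutils import XMLGenerator

class StreamingWriter:
    """Write XML incrementally, remembering only the stack of open tags."""

    def __init__(self, out):
        self._gen = XMLGenerator(out, encoding="utf-8")
        self._open = []

    def start(self, tag, attrs=None):
        self._gen.startElement(tag, attrs or {})
        self._open.append(tag)

    def text(self, content):
        # XMLGenerator escapes &, < and > in character data.
        self._gen.characters(content)

    def end(self, tag):
        if not self._open or self._open[-1] != tag:
            raise ValueError(f"cannot close <{tag}> here")
        self._open.pop()
        self._gen.endElement(tag)

    def close(self):
        if self._open:
            raise ValueError(f"unclosed elements: {self._open}")

out = io.StringIO()
writer = StreamingWriter(out)
writer.start("schedule")
for day in ("Mon", "Tue"):
    writer.start("day", {"name": day})
    writer.text("R&D meeting")
    writer.end("day")
writer.end("schedule")
writer.close()
result = out.getvalue()
```

      Memory use grows with the nesting depth rather than with the size of the document, since nothing already written is retained.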
  • by Dr. Awktagon ( 233360 ) on Thursday July 11, 2002 @11:42AM (#3864955) Homepage

    ...know where to begin to attack an XML document...

    I can tell you from personal experience, you want to attack the soft, weak center of each element, or, even better, any undefended #PCDATA.

    You'll want to avoid attacking the sharp angle brackets present on every element. Your sword blows will simply glance off, and then the XML document will jab you with the sharp corner.

    Entities are another hidden danger. The ampersand prefix character is very quick and wily, and even though it appears smooth and undefended, it can quickly turn on you, showing its offensive nature and bristling an array of pointy teeth. (Note, this depends on your screen font).

    In short, attacking XML documents is risky, but with the proper strategy, can yield a nearly limitless supply of delicious data.

    Ahem.

    Does anybody know of any Perl XSLT module that allows Perl functions to be called from the templates? I.e., to format dates or stuff like that.

    • XSLT has the format-number() function, which could be used to format a date. I personally don't recommend mixing XSLT with anything. If you need your date formatted in a specific way, I recommend storing the data in a different structure and use pure XSLT. Rick
    • Does anybody know of any Perl XSLT module that allows Perl functions to be called from the templates? I.e., to format dates or stuff like that.

      I'm looking into it for XML::LibXSLT, but it's non-trivial due to lack of docs and lack of context. Keep watching CPAN is all I can suggest!
  • TMTOWTDI (Score:4, Informative)

    by Pinball Wizard ( 161942 ) on Thursday July 11, 2002 @11:44AM (#3864974) Homepage Journal
    and using Perl with XML is no different. If you are just getting started with using Perl to process or write XML files there are an array of libraries out there for you to use.

    I personally didn't want a handholding book as I've worked with XML in other languages, but something that cut through the confusion of all the different ways to do the same thing.

    This little book was perfect for me as it's a nice overview of what is out there and how to pick the right library for the job. Don't expect a complete enterprise application in this book - it's for programmers who already know Perl and the basics of XML and just need a jumpstart in using the libraries available.

  • by mir ( 106753 ) <mirod@xmltwig.com> on Thursday July 11, 2002 @12:14PM (#3865162) Homepage

    I found this book an excellent introduction for Perl programmers who want (or have) to start processing XML. It cuts through the long list of XML modules on CPAN [cpan.org] (485 results!) and gives you the basic techniques and tools you can use.

    XML is really not that difficult to deal with but it can be a little intimidating. "Perl & XML" is written in a simple and direct style that gives the reader enough information to start writing code, and pointers to find more specific information once they have chosen the tools they need.

    Armed with this book, The Perl-XML FAQ [perlxml.net] and Kip Hampton's column on XML.com [xml.com], any Perl programmer can start working confidently with XML.

  • I am in the same boat as the author of this article - an experienced Perl programmer who needed to learn some XML stuff.

    I found this book to be an outstanding resource that got me up to speed very quickly on both XML in general and the variety of ways that Perl deals with it.

    Don't let the small size fool you, it is packed with useful information and well worth the price.
  • Here are some other useful books on the subjects:

    Learning XML [barnesandnoble.com] - also an O'Reilly book

    Perl in a nutshell [barnesandnoble.com] - a good starter book on perl
  • I have recently learned XML/XSLT. I have been able to completely stop writing HTML code. I have been able to write all of my static web pages in XML and use XSLT (via Xalan) to generate the HTML. This is great... but I have only been successful with static web pages, not dynamic ones.

    I would like to extend on this flexibility to dynamic web pages. I would like to have some type of CGI script that generates XML instead of HTML... then transforms the XML to HTML via XSLT; all on the fly. I have looked into XML::XSLT. Anyone have any other good solutions?
    • If you are looking for a complete interface to Perl/XML/XSLT then you should look into AxKit [axkit.org].

      AxKit is a complete dynamic pipeline management system for mod_perl.

      I would agree with a number of the comments here concerning the use of XML as a data storage format. We use a relational database to store our information, output it as XML and format it with XSLT all using the AxKit pipeline. This allows us to output the information in any way that we see fit.

    • Using XML::LibXML and XML::LibXSLT you can generate your XML and apply an XSL transformation very easily from a Perl CGI program. One thing I've done is read in data from a relational database, build an XML DOM on the fly and then apply the XSL. This way I don't have to make any formatting decisions in my program. How the page will display is all handled in the XSL. This is a great way to implement the Model View Controller thing where the data, the programming, and the presentation are all separate.
      Paul
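
      The first half of that pipeline (database rows to an XML tree built on the fly) might look something like the sketch below. It is written in Python with `sqlite3` and ElementTree instead of XML::LibXML, purely as an illustration; the `articles` table is invented, and the XSL step is left out because Python's standard library has no XSLT processor.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database with an invented "articles" table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "Perl & XML"), (2, "Learning XML")],
)

def rows_to_xml(cursor, root_tag, row_tag):
    """Build an element tree with one child element per result row.

    Assumes the column names are valid XML element names.
    """
    root = ET.Element(root_tag)
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(item, column).text = str(value)
    return root

cursor = conn.execute("SELECT id, title FROM articles ORDER BY id")
xml_text = ET.tostring(rows_to_xml(cursor, "articles", "article"), encoding="unicode")
conn.close()
```

      The program only decides what data goes into the tree; how it is displayed would be left entirely to the stylesheet applied afterwards.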
  • PERL is an old dinosaur. It's so filled with modules it's hardly even PERL anymore.

    Instead, use PHP. It's a lot easier, and more web-based. It's also faster, and more widely accepted now.

    There are two awesome books on the subject, I happen to own both (ordered via Amazon a week ago!). They are as follows: Both are pretty good. I like PHP and XML better. It seemed like Wrox just wanted to have a book on the subject to keep up with O'R.
  • by kellan1 ( 23372 )
    I can't believe "Perl & XML" is only getting a 6; that's about an F- on the /. scale of book reviews.

    Not much larger than a pamphlet, the book packs an amazing amount of info into its svelte form. It covers standards, tools, thought process, programming tips, and history in an effortless, breezy tone. In the best tradition of O'Reilly books (particularly the Perl ones) you can sit down and read the book cover to cover and enjoy it, or jump in here and there for quick reference.

    The authors manage to steer clear of the problem that plagues so many XML books, the endless reams of theory without application. E.g., who the hell deals with PIs on a regular basis when parsing XML? And yet every book drones on and on about them, but when the time comes to actually parse a little XML, the example will be a cop-out, the XML equivalent of "hello world": parse these simple, one-level-deep key-value pairs in XML.

    Not so with "Perl & XML". The authors cover the theory of XML, but are much more interested in getting you coding and producing than in being pedantic. The W3C has already got the monopoly on pedantry anyway.

    I particularly liked the walkthrough of XML::RSS late in the book, as an example of how to build something very much real-world and useful without being overly complicated.

    And, at least for right now, the book is up-to-date, miracle of miracles, chronicling important new changes in the Perl XML parsing story. (like the new Perl SAX work being done)

    Contrast Perl & XML with New Riders' "XML & PHP", which I almost abandoned in the first 20 pages, when they tried to tell me that expat was a compliant SAX parser. Expat is important, and confusing, and it's understandable for the authors to feel defensive about PHP's XML toolset, but the solution isn't to lie, nor to be blithely ignorant. The book continues on from there, totally disorganized with no sense of building upon what you've just learned. Also, an entire chapter is dedicated to WDDX? Who uses WDDX? And the authors contribute yet another half-assed PHP RSS parser to the world; is it possible to get negative karma for sharing source?

    The reviewer mentions:

    Unfortunately, the discussion of where XML begins to distinguish itself from HTML, namely with DTDs, the new replacement for DTDs called schemas, and the transformation language XSLT, is too brief.
    This seems to me to show a lack of understanding about much of the real work being done with XML. It's been my experience that most XML parsing being done, particularly in a scripting environment, does not check against a DTD, assuming one even exists. Plus, covering DTDs, the proposed W3C Schemas, the increasingly popular challenger RELAX, plus Schematron, and others could easily have added another 100 pages to the book. And XSLT is a book unto itself (and in fact has an O'Reilly book to itself).

    The reviewer suggests that the XPath coverage is included for the purpose of "trite colloquialisms", and while I'm not sure what that means, I think the fact that Perl has high-quality tools supporting standards like XPath is awesome, and very gratifying. Without that sort of work being done, Perl simply wouldn't be a competitive choice with Python and Java as an XML processing language.
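
    For anyone who hasn't seen why XPath support matters, a tiny sketch of what it buys you. This uses the limited XPath subset in Python's standard library rather than a Perl module (an assumption for the sake of a self-contained example), and the catalogue document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; element and attribute names are invented.
CATALOG = """
<catalog>
  <book lang="perl"><title>Perl &amp; XML</title></book>
  <book lang="php"><title>XML &amp; PHP</title></book>
  <book lang="perl"><title>Perl in a Nutshell</title></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Select the titles of all books whose lang attribute is "perl".
perl_titles = [t.text for t in root.findall("./book[@lang='perl']/title")]
```

    One path expression replaces a hand-written loop with nested conditionals, which is why having solid XPath tooling in a language counts for a lot.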

    And finally " it is by no means an authoritative text on Perl and XML,", there are good authoritative books on Perl (lots of them), and good authoritative books on XML (a handful), this book bridges the gap, does it nicely in my view, and I personally love the shortness, the focus, and the form factor.
