
XML in a Nutshell

The indefatigable chromatic wrote this review of what sounds like another solid offering from the hard workers at O'Reilly & Associates. If you're in the market for dead-tree references to XML, it probably belongs on your list of candidates.
XML in a Nutshell
author Elliotte Rusty Harold & W. Scott Means
pages 480
publisher O'Reilly & Associates
rating 8.5
reviewer chromatic
ISBN 0-596-00058-8
summary A solid and useful reference for XML developers.

The Scoop

While one of the original goals of XML was to create a specification simple enough that a computer science student could produce a working parser in a week, a few new developments have complicated things slightly. The sea of W3C-recommended acronyms includes namespaces, XPath, XSL, XPointers, schemas, and dozens of specific XML applications. Adopting the simple rules of well-formed data helps, but the quickly growing stable of related technologies is enough to make the sturdiest information architect weep. The specifications aren't as easy to read as, say, the latest Terry Pratchett novel, either.

XML in a Nutshell covers just the most important concepts. Cleanly written, it walks through the XML aspects likely to be used in most projects. As it assumes existing familiarity with the subjects, it does not spend much time in tutorial mode. Instead, these are the guts of the subjects, arranged nicely in dissection jars.

The first section covers XML basics. This includes the ubiquitous grove of angle brackets, the semantic intent and implication, a good chapter on DTDs, as well as internationalization concerns. The short discussion of namespaces is the clearest explanation this author has yet encountered.

Part two delves further into the reasons for using XML, exploring documents that use the structure to explain semantic relationships. DocBook and XHTML appear as extended examples. Further, it explores the assistive technologies of XSL, XPath, XLinks, and XPointers. Again, the discussions of XSL and XPath compare very favorably to longer works intended as tutorials. A brief examination of CSS and XSL Formatting Objects rounds out the section.

Part three explores the use of XML as a data transport. In this section, programming languages come into play. There's a strong hint of Java in the air, though most of the discussion follows a language-neutral path. Both the DOM and SAX parsing models have a dedicated chapter. They're short, but the essential pieces are described simply and effectively.
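To make the contrast between the two parsing models concrete, here is a minimal sketch in Python rather than the book's Java. The sample document and the function names are made up for illustration and are not taken from the book. The DOM version loads the whole tree and then queries it; the SAX version reacts to events as the parser streams through the text.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
DOC = "<library><book>XML in a Nutshell</book><book>Professional XML</book></library>"


def titles_dom(text):
    """DOM style: build the full tree in memory, then walk it."""
    tree = minidom.parseString(text)
    return [node.firstChild.data for node in tree.getElementsByTagName("book")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: collect text between <book> start and end events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not inside a <book> element"

    def startElement(self, name, attrs):
        if name == "book":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles
```

Both functions return the same list of titles; the practical difference is that the SAX handler never holds the whole document in memory, at the cost of tracking its own state.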

The final section makes or breaks the book. Luckily, XML in a Nutshell won't have much chance to gather dust. The two-hundred-page reference section includes the most useful information. There's an annotated copy of the XML 1.0 Reference, arranged logically. The XSL reference, in particular, is quite good. DOM and SAX programmers will also enjoy their respective chapters. Finally, it's nice to have a large set of printed character tables handy.

What's to Consider

The parsing examples don't go much beyond DOM or SAX, and there's more than a strong Java flavor. (Of course, the models are very similar in most modern languages.) As well, some of the class interfaces in the SAX reference are hard to read. This is probably due to the complexity of the information instead of any editorial decision. There's also little discussion of actual XML applications. Instead, the book covers the principles behind perhaps 90% of XML usage. Again, this is not a complaint, just a clarification of the intended audience.

The Summary

The value of XML in a Nutshell should be readily apparent to XML developers. The material is well-organized and concise. It's a quintessential Nutshell book, upholding a tradition of utility and quality. Readers who've already been exposed to the presented material will likely keep this book close at hand.

Table of Contents

  1. XML Concepts
    1. Introducing XML
    2. XML Fundamentals
    3. Document Type Definitions
    4. Namespaces
    5. Internationalization
  2. Narrative-Centric Documents
    1. XML as a Document Format
    2. XML on the Web
    3. XSL Transformations
    4. XPath
    5. XLinks
    6. XPointers
    7. Cascading Stylesheets (CSS)
    8. XSL Formatting Objects (XSL-FO)
  3. Data-Centric Documents
    1. XML as a Data Format
    2. Programming Models
    3. Document Object Model (DOM)
    4. SAX
  4. Reference
    1. XML 1.0 Reference
    2. XPath Reference
    3. XSLT Reference
    4. DOM Reference
    5. SAX Reference
    6. Character Sets

You can purchase this book at Fatbrain.

  • by ( 114827 ) <> on Thursday September 13, 2001 @12:07PM (#2292511) Homepage
    I'm sorry. Really.
    • <EXPRESSION TYPE="amused"/>


      <REPLY TYPE="response to troll">How can you say that Windows NT is better at running Broadcast 2000 than Linux? It doesn't even run under Windows! RTFM!</REPLY>

      <EXPRESSION TYPE="angry">Damn lameness filter!</EXPRESSION>
  • by rkischuk ( 463111 ) on Thursday September 13, 2001 @12:09PM (#2292528)
    Take information you want to store and sandwich it between <{name}> and </{name}> where {name} describes the information in between. Mimic the structure of the data, and sprinkle in <{name} otherData="{neatStuff}"> every once in a while. Congratulations, that's XML.
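The recipe in the comment above can be sketched in a few lines of Python using the standard library's ElementTree module. The helper name `record_to_xml` and the sample values are assumptions chosen for illustration; only the sandwich-between-tags idea comes from the comment.

```python
import xml.etree.ElementTree as ET


def record_to_xml(name, fields, **attributes):
    """Wrap each field in <field>...</field> under a <name> element.

    Keyword arguments become attributes on the outer element, which is the
    otherData="..." sprinkling the comment describes.
    """
    element = ET.Element(name, attributes)
    for child_name, value in fields.items():
        child = ET.SubElement(element, child_name)
        child.text = str(value)
    return ET.tostring(element, encoding="unicode")


# Hypothetical record built from the review's info box.
xml_text = record_to_xml(
    "book",
    {"title": "XML in a Nutshell", "pages": 480},
    rating="8.5",
)
# xml_text is:
# <book rating="8.5"><title>XML in a Nutshell</title><pages>480</pages></book>
```

Letting a library do the serializing also takes care of escaping characters such as `&` and `<`, which the by-hand version of the recipe tends to forget.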
  • Great Book... (Score:2, Insightful)

    The only problem is that, although I learned a lot of the concepts, I usually learn a lot faster with code examples. Anyway, the SAX and DOM areas have a little bit of code, but do not go into huge parsing examples. (Maybe I read it wrong...) Good book; O'Reilly usually doesn't put out bad ones. Hopefully there will be a Java / XML Cookbook. (I know there already is a Java Cookbook.) I love those...
  • Useless (Score:1, Funny)

    by Anonymous Coward
    I want to learn plain XML, not XML in a bash shell, XML in a csh shell, or any other shell for that matter. What were they thinking?
  • by Anonymous Coward
    Unfortunately the book doesn't cover the successor to DTDs: XML Schema.

    Some people are under the misapprehension that XML's role is as the successor to HTML; that's a very limited viewpoint. Far more important and interesting is the role of XML as a language and host independent way of specifying data, particularly with respect to relational databases, and to type in conventional languages.
  • by L-Wave ( 515413 )
    Does anyone have any *good* links to online XML references? Whenever I look, all I find are things like "What is XML? ...It's not HTML"
  • by pubjames ( 468013 ) on Thursday September 13, 2001 @12:16PM (#2292582)
    XML is not likely to succeed

    We had dumb comments like this last time XML was discussed here.

    Let me make this clear now, before we get too many more comments like this. HTML is a formatting language for displaying information in web browsers. XML is a data storage toolkit, a configurable vehicle for any kind of information. It is completely different to HTML - the majority of uses for XML have nothing to do with displaying information in a browser.

    XML is an extremely important standard and I urge everyone to learn it.

    And please, don't make comments on Slashdot about technologies you don't know much about.
    • XML is an extremely important standard and I urge everyone to learn it

      I couldn't give a damn if people learn it. Those who haven't already learned it are just grooming themselves for being the next generation of COBOL programmers. We'll always need someone to be the industry's under-achievers.

      HTML is a formatting language for displaying information in web browsers

      Formatting language for display ? 8-) That's fighting talk ! Get over to news:c.i.w.a.h and try saying that rendering behaviour is implicit in HTML semantics.

    • It is completely different to HTML

      You can't be different to something, but you can be indifferent to it, and that just goes to show how different from different indifferent really is.

  • XML !=HTML (Score:5, Informative)

    by digital_freedom ( 453387 ) on Thursday September 13, 2001 @12:17PM (#2292596)
    XML is not going to replace HTML and that's great because XML is better suited to data than display.

    I have used XML on several projects not to send to Browsers to display, but to transfer data between disparate systems. Finally there is a way that two computers can exchange data & meta data without worrying about memory use, big/little endian, EDI formats, and character positions. XML is great in that almost everyone agrees to use it to transfer information. HTML is great for formatting display to a degree (PostScript people please don't flame me! ;) ). I have worked with EDI formats before and it is a pain in the butt to set up message positions for all of your data and to work with nested lists of information. XML makes that so much easier and lets you use DTDs to enforce stuff. I also like the fact that XML was made to be read by a human being. We can actually look at the data file and tell what a field is by looking at the tag. This is why XML is going to be ubiquitous.

    Don't expect it to be a browser language, it's just data. With nicely structured data you can use that to generate HTML, WML, anything...

    The future of data transfer looks bright.
    • Re:XML !=HTML (Score:2, Informative)

      Here's some links to some good info & tutorials on XML

      W3C School -- excellent
      Anti-christ XML school -- MSDN site
      Sun's Java/XML school
      Crash Course in XML

      Hope these help!
    • XML is not going to replace HTML and that's great because XML is better suited to data than display.

      Well, I think XML is a generalization of HTML because of the repetition of HTML extension. The W3C committee designed it so that future extensions wouldn't be as painful. However, this XML thingie has become such an unprecedented hit that everything gets encoded in that form, albeit not always efficiently. Because of this, XML is now used to represent databases, and so on.

      Just my 2c.

      • Well, I think XML is a generalization of HTML because of the repetition of HTML extension.
        That's not precisely correct. XML is an extension of SGML, which means that XML is more like HTML's younger brother, or a cousin, than its descendant. It's probably accurate to say that XML is the more anal of the two, retaining more of the "no, there really is a right and a wrong way to do it" sense of SGML, but managing to avoid the unbelievable complexity.

        XHTML, on the other hand, is what happens when you marry HTML's document types to XML's rulebase. This is an exceedingly rare example of how inbreeding isn't necessarily a bad thing.

        • which means that XML is more like HTML's younger brother, or a cousin

          I hope they are cousins and not brothers, because they have been married fairly well in Whitebeam. As it happens, the marriage is a polygamous one, as it includes Apache and JavaScript (SpiderMonkey) too...

          <ob:disclaimer>Yes I'm involved on the Whitebeam project</ob:disclaimer>
    • But,
      if you learn XSLT, it's quite easy to generate a style sheet that will output XHTML from your XML document. Also, if you ensure that your HTML is well formed, you can handle it as if it were XML.
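The second point in the comment above, that well-formed HTML can be handled as if it were XML, can be illustrated with a short Python sketch. The page content and the `extract_links` helper are invented for the example, and the sample deliberately omits the XHTML namespace declaration to keep the tag lookups simple.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page: every tag is closed, attributes are quoted.
PAGE = """<html>
  <body>
    <p>See <a href="http://example.org/one">the first link</a> and
       <a href="http://example.org/two">the second link</a>.</p>
    <br/>
  </body>
</html>"""


def extract_links(markup):
    """Parse the page with an ordinary XML parser and list (href, text) pairs."""
    root = ET.fromstring(markup)
    return [(a.get("href"), a.text) for a in root.iter("a")]
```

The same call raises a parse error on typical tag-soup HTML (an unclosed `<br>`, for instance), which is exactly the discipline the comment is asking for.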
    • if you have trouble dealing with nested lists in EDI, you're using the wrong tools to parse the data.

      Honestly, is the solution to exchanging business data making up new standards to convey the very same information for which there exist internationally recognized standards? (I know XML can be plenty useful in situations in which standards don't yet exist).
    • XML WILL replace HTML.

      More specifically, a dialect of XML known as XHTML will replace it.

      Go talk to the w3c if you think it shouldn't

      • XML WILL replace HTML.
        More specifically, a dialect of XML known as XHTML will replace it.
        Go talk to the w3c if you think it shouldn't

        XML is not replacing HTML... XHTML is really just taking HTML and making it XML compliant. The fundamental difference between XML and HTML, or if you prefer, XML AND XHTML is that XHTML represents a set of already defined tags. XML only defines how those tags should be represented, *not* what those tags are. For example, XHTML sees the <a> tag and says "I know what that is" but sees the <asdf> tag and says "I don't know what that is." XML looks at both of those tags and says "I don't care what you call yourself, just as long as you have a closing tag!" (oversimplified, but you get the picture)

        XHTML is an application of XML. XHTML should be compared to VoiceXML, MathML or the various other *XML-based* markup languages. These are things that follow the XML syntax but define a set of tags appropriate for the specific application. In XHTML's case, that application is display in the browser.

        I take issue with the comment about XML replacing HTML because if you write XHTML-compliant sites (which I'm sure a lot of us do) you know that you're really just fixing the HTML syntax so that it conforms to XML. XML is not replacing anything here... it cannot. It's the semantics that do the work... it's knowing that <a> means link and <b> means bold - and that's all [X]HTML.

        • Apparently I wasn't clear in stating that XHTML, which is a dialect (or application as you say) of XML IS replacing HTML. Which I think you are trying to convince me of anyways.
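The distinction drawn in this sub-thread, that XML itself only cares whether tags are properly closed and nested and not what they are called, can be checked with a small Python sketch. The function name `is_well_formed` is made up for the example; the parser behaviour it relies on is that of the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, whatever the tag names are."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# A made-up tag is fine as far as XML is concerned...
known_tag = is_well_formed('<a href="x">link</a>')
unknown_tag = is_well_formed("<asdf>anything</asdf>")
# ...but a missing or mis-nested closing tag is not.
unclosed = is_well_formed("<p>unclosed")
crossed = is_well_formed("<b><i>x</b></i>")
```

Knowing that `<a>` means a link is the job of the XHTML vocabulary layered on top; a plain XML parser like this one has no opinion on it.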
    • Right on!

      There are still people hung up on the perceived XML == ++HTML thing.

      I realised the importance as you have, and use it *constantly*. I use it for all stored files, data interchange, and I even stick XML-RPC into everything now.

      While it still is a format, I realised it was better to think of it as a protocol. It, for some reason, made more sense to me.
    • And HTML != CSS.

      HTML is also a data markup language. If you want to make it look pretty, then CSS is the thing to use.

      A comparison along the lines of "XML is to HTML, what XSL is to CSS".

      Sign on a wall: "All content to the left. All layout to the right."

  • You can purchase this book at Fatbrain.

    The link: sp ?theisbn=0596000588&from=MJF138

    Not a bad idea - using a slashdot posting to drive sales through a referral link. I'll be back later - I'm off to find some books to review...
    • Great. I look forward to reading them. Sorry I don't have any other convenient way to pay you for writing it.
    • As someone else called it, this is a racket: most every Slashdot review does this. Why do you think so few of the books reviewed here get generally positive reviews ? If I really feel like buying a book recommended by Slashdot, or even finding out a bit more, I type in the URL to the main site myself.
      • As someone else called it, this is a racket: most every Slashdot review does this.

        I'm not saying it's a racket - they have every right to get some cash for the massive amounts of time/cash/bandwidth they put into letting us use this site for free. I just found it amusing.

      • Why would you do this? Does it cost you any more to purchase the book when you give someone credit for it? Then why bother?

        Why not let someone make a bit of money off of it. You're just being petty.
    • Just to be clear, this is a hobby for me. I receive no financial remuneration for book reviews. That's right -- no money from OSDN, no money from referral links. I've never even joined any sort of affiliate program. Hemos (and others) have sent me free review copies, though I've also purchased books on my own to review.

  • by DGolden ( 17848 ) on Thursday September 13, 2001 @12:25PM (#2292627) Homepage Journal
    Take LISP, make the syntax twice as annoying, and hey presto, XML!

    XML is just an annoyingly verbose way of representing s-expressions, data structures that lisp was designed around.

    So much so, in fact, that it's possible to do a 1:1 mapping of XML into Scheme - see this site for the most sensible way of processing XML - translate it into the equivalent scheme representation.

    This allows you to use all the LISPy tricks in the book to munge your XML data.
    • How the XML is constructed is just like the usual context-free language. Any context-free grammar language (C/C++, Java, Pascal, etc) can easily be parsed by any functional language, such as Scheme, LISP, ML, or OCAML. Because context-free language is based on recursive grammar, it is pretty direct to translate it into the functional language. Manipulating and constructing the AST are also very easy.

      Mapping 1:1 from XML to functional language representation is highly exaggerated. In ML, for example, one would have to build the table data structure -- even though this thing can be easily made. There are still some idiosyncrasies that you have to handle too, albeit not as intricate as the ones in imperative languages like Java or C/C++.

      Mapping to AST itself does NOT yield the full usable extent of XML. XML itself is used to describe tuples of data. How can you flatten the AST out into records/structs/classes that are directly usable by the subsequent program? It's not that easy in a functional language either. Moreover, the resulting records are better suited to imperative languages than to functional ones.

      • What? Who said mapping from XML to a functional language in general? Who said you need to flatten things out? (although there are techniques to do so if you feel the need to.) And how is the post product of records highly suitable to imperative languages?

        Have you ever used LISP or Scheme? The basic data structure is the list. Hint: What does the "LIS" portion of "LISP" mean? Hint2: A list is a tuple, and you admitted that XML itself is used to describe tuples. As far as the tree nature of XML, a LISP list element may be itself a list, sort of like an XML element may itself be a tree. The "post product of records" is the native data of LISP. I'd call that highly suitable to the functional language of LISP.

        Of course this is just a quick post that I didn't put a whole lot of thought into. I haven't actually used LISP since school. I think in imperative terms now, not functional terms, but I think DGolden was right. I'd been meaning to get back up to speed with Scheme, and XML may just be the excuse that I need.
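The mapping argued over in this thread can be sketched without Lisp at all. The Python function below, whose name `to_sexp` and list layout are assumptions made for this example (they are not the representation used by the Scheme site mentioned above), turns an element into a nested list of the form `[tag, attributes, child, child, ...]`. It drops whitespace-only text, so it is a convenience view rather than a strict 1:1 encoding.

```python
import xml.etree.ElementTree as ET


def to_sexp(element):
    """Return [tag, attribute-dict, *children]; text nodes become strings."""
    node = [element.tag, dict(element.attrib)]
    if element.text and element.text.strip():
        node.append(element.text.strip())
    for child in element:
        node.append(to_sexp(child))
        # Text that follows a child element belongs to the parent's content.
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node


SAMPLE = '<book rating="8.5"><title>XML in a Nutshell</title><pages>480</pages></book>'
tree = to_sexp(ET.fromstring(SAMPLE))
# tree is:
# ["book", {"rating": "8.5"},
#     ["title", {}, "XML in a Nutshell"],
#     ["pages", {}, "480"]]
```

Once the data is in this shape, ordinary list operations (indexing, recursion, comprehensions) do the work that would otherwise need a DOM API, which is the point DGolden was making.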

  • by wangi ( 16741 )
    On a related note - O'Reilly's 'Java & XML' book by Brett McLaughlin was eventually released this week after sliding from its original July release date.
  • A very helpful book (Score:4, Informative)

    by sben ( 71467 ) on Thursday September 13, 2001 @12:30PM (#2292658)
    Highly useful, and highly recommended.

    When I was between jobs earlier this year, I decided to learn XML, and bought this book after perusing several others in the bookstore. I'd had a vague introduction to it at my previous job, and understood the basic ideas behind it. The book gave me a thorough understanding, and I was able to talk about it intelligently (and correctly) at subsequent job interviews. I now work with it on a nearly-daily basis, and the book is a big source of my knowledge.

  • by Genom ( 3868 ) on Thursday September 13, 2001 @12:32PM (#2292673)
    As someone already said, XML is the ultimate replacement for the comma-delimited file. For the purposes of storing human readable/modifiable data, it's great, and does fill many of the roles a comma separated file used to fill. XML itself is pretty darned easy to pick up.

    That's not the problem.

    The problem is with the description technologies - most of which just add a layer of abstraction to the XML data, and try to pass a secondary version of the data back to an HTML template.

    That's all well and good - but quite frankly, the current incarnation of XSL stinks. It's tough to comprehend, easy to butcher, and half the time doesn't make sense.

    Much easier (and more useful, I would think) are the parsers which transform an XML document into a data structure you can use in an existing language like Perl or PHP (for the web), or C, or whatever you want. Once you're in a native data format, you're set, and can manipulate the data just as you normally would.

    That's the way to leverage the strength of XML. Ditch XSL for now, until it can be made clearer - and use some existing backend technology to format the data once it's in a data structure.

    My 2 cents, anyway =)
    • I spend most of every day working with XSL and XML, and continually have to listen to people complain how hard XSL is. It's not. Though it's a different method of writing code than some people are used to, most people I work with have no problem with it once they break out of the C-type syntax of coding. Once you comprehend the template concept of development, XSLT is actually rather easy.
      Don't get me wrong, there are limitations to the language, and hopefully, we'll see those limitations removed in 2.0.
      But, if you can make the conceptual jump in coding styles, it can be very effective.
    • There is a perl module that will read in an XML file into a big hash for you, so you can treat it like a normal perl data structure. It is called XML::Simple.

      The problem I have with it is that it doesn't respect a DTD. This places too much dependence on a specific XML file. If I have a node that is allowed to have more than one child, XML::Simple will return different results depending on how many children are in the node. If it is just one, then the data in that node is placed as a scalar. If there are more than one, then the data is put into an array(an arrayref, actually).

      Personally, I think it should always be an array if there is a possibility for more than one element. If there is just one thing in there, then it should have just one element. But you can't tell if there would ever be more than one element inside a node just by looking at the XML file, because that is just one instance. You have to look at the DTD.

      XML::Simple would be extremely useful if it returned the same data structure for the same DTD, every single time. Each XML instance would have different data filled out, of course, but the structure of the data would be the same. Maybe this isn't quite possible in perl.

      I think a lot of the XML development I have seen really ignores the usefulness of DTDs. If you want to make nicely structured data, XML is great. But if you really want to provide something robust and extensible, you have to provide a DTD and test that you will be able to handle anything that DTD provides. Otherwise you are just kidding yourself.
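The scalar-versus-array complaint in the comment above is easy to reproduce and to avoid in any language. The Python sketch below is not XML::Simple and does not read a DTD; it assumes a hand-written set, `REPEATABLE`, standing in for what the DTD would declare as "may occur more than once". Element names in that set always come back as lists, so one author and two authors produce the same shape.

```python
import xml.etree.ElementTree as ET


def to_dict(element, repeatable):
    """Convert child elements to a dict; names in `repeatable` always map to lists."""
    result = {}
    for child in element:
        # Elements with children recurse; leaf elements contribute their text.
        value = to_dict(child, repeatable) if len(child) else (child.text or "")
        if child.tag in repeatable:
            result.setdefault(child.tag, []).append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical stand-in for the information a DTD would supply.
REPEATABLE = {"author"}

one = to_dict(
    ET.fromstring("<book><title>T</title><author>A</author></book>"),
    REPEATABLE,
)
two = to_dict(
    ET.fromstring("<book><title>T</title><author>A</author><author>B</author></book>"),
    REPEATABLE,
)
# one["author"] is ["A"] and two["author"] is ["A", "B"]: a list both times.
```

A version that derived `REPEATABLE` from the DTD's content models would give the behaviour the comment asks for; this sketch only shows that the data-structure side of it is straightforward.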


    • the current incarnation of XSL stinks.

      It doesn't stink, it just smells different. It's a functional language, not a procedural language, and those of us who didn't grow up at MIT still find that a bit weird. There's certainly a culture shock, but once you start to get it, then it's no harder than anything else.

      There's nothing wrong with variables that you can't change the value of ! You just need to lose that innate fear of recursion most of us procedural people still carry around.

      XPath does look a bit like Martian, granted, but it's no worse than regexes.

      A really good text on XSLT needs to go beyond the reprinted standard level, and Michael Kay's is pretty good for that. Lots of useful cookbook stuff, and the 2nd edition is also well up to date.

      Now xmlns:xsl="" used to pong a bit... Nested templates ? Blaurgh !

    • You're right. XSL sucks hard.

      Use your fave language and load the XML into your own data structure.

      I did a project using DOM and, while that's all well and good for C and Java (I recommend it highly for those languages), I was using Python and was spoiled by the way Python works with regards to large, complicated data structures, which is, quite well.

      Later, I found this article about a module called xml_objectify, which transforms XML into a data structure that Python people (and probably LISP and even Perl people as well) would feel more comfortable with. Remember that we could care less about index numbers half the time. :-)*

      Whether you use Python or not, I highly recommend the article for its discussion on the topic of converting XML into complex data structures in your fave language.
      • You rule! I was just about to develop configuration saves and loads, w/ DOM, for a medium size python project I am working on. This will help immensely.

        • It's good to know I helped someone. I found IBM DeveloperWorks to be a very good resource for both XML and Python info, among many other languages.

          I probably should have mentioned a couple of things:

          Using Expat in xml_objectify speeds up processing and decreases memory usage by a few orders of magnitude. I learned this while loading a 1.8MB file with no PCDATA larger than about 1k (TV listings)...

          In xml_objectify, UTF-8 is the default encoding. I had to change it, in, to ISO-8859-1 for an app I was writing, that used data containing many French characters (I'm in Canada).

          Hope that helps.
  • I often buy these books with a few questions in mind that I need answered. I always find that such questions are hard to find the answers to even when the book contains the answers. This was the worst case of it.

    When do I use an attribute and when do I nest a sub-element? Any "leaf" could be either. The pathetic answer was "duh, nobody's made up their mind about this." Oh well, so much for the genius of O'Reilly and the w3c. (hint, how about coming up with a good reason to use one or the other.)

    The worst thing you can do is have a programmer write a programming manual. The second worst thing you can do is organize these books like school text books.
    • "When do I use an attribute and when do I nest a sub-element? "

      Typically it's a matter of taste. However, there are several cases where you must use one or the other.


      If you wish to maintain ordering you must use child elements.

      If you wish to use duplicate names you must use sub elements.

      If you wish to enforce certain value types you must use attributes. (from your DTD...get a book for more info)

      there are more...sign up for my class, I'll give you all the gory details :)
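The first two rules in the reply above can be checked directly against a parser. This Python sketch uses made-up sample markup; it shows that a repeated attribute name is rejected outright, while repeated child elements are accepted and come back in document order. The third rule (constraining value types through the DTD) needs a validating parser and is not demonstrated here.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical markup: the same two authors, once as attributes, once as children.
duplicate_attrs = '<book author="Harold" author="Means"/>'
repeated_children = "<book><author>Harold</author><author>Means</author></book>"

attrs_ok = parses(duplicate_attrs)        # False: attribute names must be unique
children_ok = parses(repeated_children)   # True: elements may repeat

# Child elements keep their order, so "first author" is a meaningful question.
authors = [a.text for a in ET.fromstring(repeated_children).findall("author")]
```

Attribute order, by contrast, carries no meaning in XML, so anything where sequence matters belongs in child elements.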

  • Java & XML (Score:4, Interesting)

    by Anonymous Coward on Thursday September 13, 2001 @12:45PM (#2292770)
    Everyone seems to agree that XML is important, but a book simply about XML may not be as useful as a book with an explanation of XML and some examples of real-life usage.

    Possibly a better book (also an O'Reilly title) is O'Reilly's Java & XML (ISBN: 0-596-00016-2 or EAN: 9780596000165). I have read this book and found it to be excellent. Although it is Java-centric, it discusses concepts that could be easily applied to other languages. The book has good coverage of XML as well as usage of SAX, DOM, and JDOM, and using XML with databases, as configuration files, and in wireless devices. It also covers XSL/T and focuses on Apache XML projects.

    A GOOD READ for anyone interested in using XML.
  • XML/XSL Confusion (Score:2, Insightful)

    by Kallahar ( 227430 )
    I've had trouble with the implementation side of XML. While the concept behind XML is extremely simple, getting it to display is quite another. XSL chose some extremely hard to understand syntax for a data structure designed to be human-readable.

  • Schemas? (Score:2, Insightful)

    by Malc ( 1751 )
    The book review mentioned a chapter on DTDs, but what about schemas? Aren't schemas the way we're supposed to go? Without coverage of Schemas, I will stick with the ageing but excellent "Professional XML" book from the Wrox Press.
    • I own the O'Reilly book, and considering when it came out, it should've had way more on Schemas than it did. (That is, Schemas weren't a W3C recommendation yet, but enough was known to be able to give more coverage in this book.) Instead, the coverage is tilted way toward DTDs.

      In fact, even though I think the O'Reilly book did an excellent job covering the most important XML-related standards in this book, the future importance of Schemas keeps me from recommending this book. If they covered the subject as well as they covered everything else, I'd easily say that this was one book that should be on every XML-monkey's shelf. I can't say that now, so hopefully they have a second edition in the works where they fix this gaping hole. As for now, I'd probably stick to recommending Holzner's Inside XML, but note that I'm not familiar with the Wrox book that I've seen other people recommend.

    • Disclaimer: I was one of the two tech editors for this book.

      We decided not to include schemas coverage because the Nutshell books cover not just a description of the technologies, but also best practices. Schemas best practices are only just becoming clear, as can be seen on the xml-dev mailing list. Along with that, Schemas were not yet ratified when the book went to tech review, so we could have only covered an old draft.

      Rest assured though, W3C Schemas (and if I can persuade Elliotte, RELAX too) will be covered in the second edition, which I believe is being worked on already.
  • Apple seems to have utilized XML in a rather remarkable fashion in OS X... makes all those annoying .plist's quite easy to understand.
  • Our application uses J2EE. Configuring the app so that web developers without extensive Java experience can make things happen was made MUCH easier by configuring a single flexible display Servlet (rather than many disparate function-specific servlets), which calls methods and returns JSPs, basically connecting the front end to worker beans and EJBs. The displayServlet in turn is configured using a single xml file which defines every service, every entry point into the app, and is remarkably simple to understand, use and maintain. And it is so much more powerful than using a straight text config file.

    The point is, there are many incredibly useful places for XML to contribute to web dev and app dev without XML ever being sent to a browser.

    Yes, XSLT is a hassle, and no, web developers are not likely to move quickly to a technology that requires strict adherence to syntax rules and well-formed code, and no, browsers are not likely to have decent support for XML display anytime soon... BUT XML and Java are excellent together, and this doesn't even touch on data feeds which are exponentially more reliable and configurable and maintainable using XML than any other format...

    Ok I'm meandering now. Just a big fan, having used XML extensively over the last year or so.
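The single-config-file idea in the comment above translates to any language. The sketch below is a loose Python analogue, not the poster's J2EE setup: the `<service>` element, its `name`, `handler`, and `page` attributes, and the two handler functions are all invented for illustration. It reads the XML once into a lookup table and then dispatches requests by service name.

```python
import xml.etree.ElementTree as ET

# Hypothetical config file content; element and attribute names are assumptions.
CONFIG = """<services>
  <service name="login" handler="show_login" page="login.jsp"/>
  <service name="search" handler="run_search" page="results.jsp"/>
</services>"""


def show_login(params):
    return "login form"


def run_search(params):
    return "results for " + params.get("q", "")


# Maps the handler names used in the config to real callables.
HANDLERS = {"show_login": show_login, "run_search": run_search}


def load_services(text):
    """Build {service name: (handler function, page)} from the XML config."""
    table = {}
    for node in ET.fromstring(text).iter("service"):
        table[node.get("name")] = (HANDLERS[node.get("handler")], node.get("page"))
    return table


def dispatch(table, name, params):
    """Run the handler configured for `name` and report which page to render."""
    handler, page = table[name]
    return page, handler(params)


services = load_services(CONFIG)
result = dispatch(services, "search", {"q": "xml"})
```

Adding an entry point then means adding one line of XML and one handler, which is the maintainability win the comment describes.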

    • Isn't this part of the idea behind the Apache Struts project? It seems like a great idea, although I haven't yet developed anything using the framework. Still the servlet/service mapping process looked pretty easy to do since it was all in XML.

      I'm also a big fan of XML. I have been pulling XML from a variety of news sites (including Slashdot) for display on my own site. I just built a Java class which gets run every 30 minutes and pulls the latest headlines and then rendering is all done with XSL. The XSL was a pain to generate since there are all sorts of different implementations of the RSS/RDF frameworks, but once you figure it out it isn't too bad. In the end I had about 4 different XSL files for the 14 feeds I pull. The benefit is that once you have the basic XSL framework it is relatively easy to tweak it to appear different. If you build in CSS support you can even change the look and feel by modifying the CSS and not the XSL. Plus, if you use JRun it has a built in XSLT taglib for doing the conversion (although it is only about 3 lines using Xerces/Xalan). Too bad my site is down right now since my site (DSL) re-initialized my IP and I don't know what it is.

      I also found the XML Pocket Reference book pretty handy to have around (although a bit slim on explanations).

      Also, anyone know what is up with the slashdot.xml file? It doesn't seem to be updating as often lately. I thought this was automated in the code. Am I wrong?
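
      For the XSL rendering step mentioned above, here is a rough sketch of the "about 3 lines" of conversion code, written against the standard JAXP transform API (which Xalan implements). The class name `HeadlineRenderer` is made up, and the sketch assumes the feed and the stylesheet are already in memory as strings; fetching the feeds every 30 minutes is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies one XSL stylesheet to one feed document.
final class HeadlineRenderer {
    static String render(String feedXml, String stylesheet) {
        try {
            // Compile the stylesheet, then run the feed through it.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(feedXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Covers both a broken stylesheet and a malformed feed.
            throw new IllegalArgumentException("transform failed", e);
        }
    }
}
```

      Each feed flavor would get its own stylesheet string, which matches the "4 XSL files for 14 feeds" arrangement: the Java side stays the same and only the stylesheet passed to `render` changes. Namespaced RSS 1.0/RDF feeds need namespace-aware match patterns in the stylesheet, which this sketch does not show.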

  • The purpose of XML is not as an HTML replacement. Those who use XML to generate HTML are doing one moderately interesting thing with some powerful technologies. But the real power of XML is that everyone is speaking the same language.

    When you see technologies like SOAP and ebXML, you really start to understand the value of this common language. Don't judge XML as an HTML replacement.
  • If it should be possible for a CS student to write an XML parser, are there any good texts that people know of that go through this very process? I think it would be an interesting side project, but since I am not a computer scientist (or a student), it's not easy to get a start.
    • That all depends on exactly why you are doing this. If you are doing this just to get practice on building a basic parser, then you probably want to look at some basic compiler books, or the documentation on the common lexical and parser generators (e.g. Flex and Bison). While that may be useful, remember that correct XML requires a little more work than just parsing (opening and closing tag names must match exactly, etc.). You probably want to read the W3C recommendation, or some annotated version of it.

      Alternatively, if you just want to be able to read in XML, there are several free or GPL libraries out there already. The one I'm most familiar with is Xerces, the XML parser from the Apache project.

      If you are not a CS student, you probably want to make sure you're familiar with some of the basics (a set of languages, basic data structures, etc.) before taking on this sort of project. I'd recommend C++ and its Standard Template Library, but there are many other viable alternatives out there (e.g. C, Python, Java, etc.). There are lots of books which cover this, though none come to mind offhand. If any other reader would like to help, I'd be much obliged.

      I hope some of this info helps, and I wish you luck.
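
      To make the "opening and closing tag names must match exactly" point concrete, here is a small sketch of just that one check, using a stack. It is nowhere near a real XML parser: the class name `TagBalanceChecker` is made up, and the regular expression is a simplification that ignores comments, CDATA sections, processing instructions, and attribute values containing `>`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker: verifies tag nesting only, not full well-formedness.
final class TagBalanceChecker {
    // group(1): "/" for a closing tag; group(2): tag name; group(3): "/" for <empty/> tags.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)[^<>]*?(/?)>");

    static boolean isBalanced(String xml) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(xml);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            String name = m.group(2);
            if (closing) {
                // A closing tag must match the most recently opened tag, case included.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else if (!selfClosing) {
                open.push(name);
            }
        }
        // Anything still on the stack was opened but never closed.
        return open.isEmpty();
    }
}
```

      With this sketch, `isBalanced("<a><b/></a>")` accepts the input, while overlapping tags such as `<a><b></a></b>` or a case mismatch such as `<B></b>` are rejected. A grammar-based parser built with a generator does the same bookkeeping, plus everything else the recommendation requires.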
    • You can find simple parsers on most source-sharing sites.

  • XML is hard to learn, and easy to remember. Nutshell guides are best for complex lists of obscure settings in little-used config files. I have a bunch of similar Nutshell guides, and they see much hard and useful service.

    This book isn't a good tutorial (it isn't meant to be) and I see no need for a "handy quick reference" guide to the parts of XML that are covered here. It's not a bad book, but I see no real useful purpose to it.

    Sometimes I need to read the XML Spec. This is only ever for really obscure and bizarre minutiae, and in those cases I have to go back to the W3C original. Fortunately that's on-line and already on my desk in a well-thumbed paper copy. I've never felt the slightest need for an XML Nutshell.

    Omitting Schema is a real drawback. The Schema spec is one of the very few XML-related specs that's at all large and can't easily be memorised.

  • by thomis (136073) on Thursday September 13, 2001 @01:57PM (#2293279)
    I've accumulated a wide variety of links to resources that have 1 or 2 useful items... but I need the equivalent of an O'Reilly 'Definitive Guide' for XSL. Something that's heavy on XPath, code examples and other red meat.
    I agree with the first post flamebait to an extent; XML is all well and good, nice way for my database guy to get me the goods for Web presentation, but I need to DO something with that data.
    The answer is XSL, but I've had to blunder around for what works. There isn't even a decent FAQ anywhere, that I know of. Suggestions anyone? Following is a list of links I've found useful; please don't send me to any of those...


    http :// --very good!


    • O'Reilly have just published an XSLT book. I've not read it yet, but will hopefully pick it up soon. It does include a chapter and an appendix on XPath.

    • I've had the Wrox book (Kay's "XSLT : Programmer's Reference") for a while and am reading the O'Reilly book now (literally, as in it's sitting here on my desk).

      As chromatic said, the O'Reilly book includes a chapter on XPath and an XPath reference as an appendix, which is great. Additionally, XPath functions are covered (along with XSLT functions) in an "XSLT and XPath Functions Reference" appendix. While "XSLT : Programmer's Reference" is well-written and very useful (my copy is dog-eared), the absence of any separate discussion of XPath in Kay's book is, IMO, a significant flaw.

      Kay tells readers near the outset that his book is written as though XSLT and XPath were one language. Since XPath acts as a sort of "sub-language" in an XSLT stylesheet, I can understand why he chose to cover the material in this way, but ... I still would have preferred to see a separate introductory discussion of XPath somewhere near the beginning of the book. XPath isn't rocket science, but covering XPath concepts as they arise in examples muddles things quite a bit. If you've read pretty far into the book and are wondering how/why a given XPath expression was written in a certain way, you can't easily "flip back" to the section on XPath to answer your question because the information that you need is scattered throughout the book.

      Right now, I'm about halfway through Tidwell's "XSLT" (the O'Reilly book). Based on my impressions so far, I would definitely recommend it. Back to my book.
  • I more or less agree with the review, but I found an inordinate number of typos, particularly in the XML examples (where it matters).

    If I can spot them on a more-or-less casual read, how many more did I miss? What about the others who might not catch them?

    O'Reilly needs to step up their technical reviewing; it's been lacking lately.
  • Speaking (loosely) of O'Reilly XML books, my local library messed up recently and actually got some current, useful, tech books. One I picked up there was "Learning XML", and I am finding it a very good read. And I am not a neophyte, XML-wise (no expert either, mind you).
  • How does this book compare with Elizabeth Castro's book on XML?

    There are many good books on most computer topics. A review that says "this book is good" is useless. What we need is a comparison of the book being reviewed, with other books which cover the same material.
  • I use this book a lot at work, and it generally gives out the straight facts, unadorned by interpretation or comment.

    There are odd bits of editorializing, for instance a bizarre rant about how useless unparsed external entities are, and how we should replace them all with HREFs. Harold and Means's little rant suggests that they think unparsed external entities are a bad idea simply because they've never found a use for them. Our clients use XML for document markup, with the documents being produced in multiple languages - they think unparsed external entities are fab.

    The book's a good reference. Be a bit wary of the opinion.
  • I think the best book I've read is Essential XML. Good coverage of XML, XSL, XML Schemas, and SOAP. Examples in JScript for MS and Java for the rest. ISBN is 0-201-70914-7.
