Authoring Schemas With XSD

Dare Obasanjo points to his own "article on O'Reilly's XML.com that specifies a set of guidelines for authoring schemas using the W3C XML Schema Definition language, commonly abbreviated as XSD. The primary theme is embracing simplicity by showing how to avoid the more complex and esoteric features of the language."

  • by jacquesm ( 154384 ) <j@NoSpam.ww.com> on Monday November 25, 2002 @02:17PM (#4752606) Homepage
    I am impressed with the amount of press coverage and hype that has been surrounding XML and related technologies for the longest time now, but where is this stuff REALLY used?

    Does anybody have an example of high-volume (so mainstream) websites using XML? Wherever I look, all I see is good old HTML, a div or two (mostly tables still, though) and JavaScript stuff.

    • by Anonymous Coward
      I've found XML to be useful for data transfer and syndication. I haven't looked into building sites with it, but it's handy to have Company A export their data so Company B and Company C can use it. I work on a site that displays pricing and availability of houses using data exported from a third party in XML. It's also widely used in weblogs (save the flames) and news sites so people can syndicate the content.
    • XML's reach extends beyond just websites. The business savvy amongst you will of course recognise its 2 major uses in projects...

      1. "Extracting more money from the customer" - We must implement these system interfaces in XML.

      2. "Protecting the money you have already extracted from the customer" - We have problems implementing these system interfaces in XML. This tactic can only be used once the customer is dependent on you.

      Fuck - I've become a jaded old bastard :)
    • by Anonymous Coward
      Take a look at www.gentoo.org. The whole site uses XML.
    • XML has a few practical applications, but only in human-readable file storage and data exchange. Basically, anything that involves raw text, XML is a good thing to use to add metadata to that text.
    • by Anonymous Coward
      You won't see XML show up in web browsers for a while; HTML still does the job on the client side. I have been writing applications in the XML/Java space for 2+ years. The uses of XML are countless, and XML Schemas provide the meta information needed by applications that use XML. Most of the XML apps I've written are in the enterprise application-to-application area, but I've found use for it almost everywhere else. Any application that needs its data to be descriptive and portable can use XML. Many high-end user apps store information primarily in XML; the ones that don't probably will soon. The problem remains that many people don't understand the need for XML and descriptive data.
    • by Anonymous Coward on Monday November 25, 2002 @02:42PM (#4752732)
      The product I work on (billing software) uses XML to define the API we export for external systems. A client that wants functionality not provided by our stock system is free to develop its own programs (front end and back end) that use our XML interface to get the data.

      A client will call an API such as getFOO, which will have an input type of InputGetFoo and an output type of OutputGetFoo defined in an xsd file. Because both our system and the client use the same xsd files, there is very little problem with synchronization. Using XML allows our clients to have a heterogeneous environment; anything that can deliver XML over TCP/IP can use our interface.
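
      A minimal Python sketch of what a client of such an interface might look like. Only the names InputGetFoo and OutputGetFoo come from the post; the child elements (AccountId, Balance, Currency) are hypothetical, and the standard library does not validate against the xsd, so a real client would add a schema-aware validator on top.

```python
import xml.etree.ElementTree as ET


def build_get_foo_request(account_id: str) -> bytes:
    """Serialize a hypothetical InputGetFoo message (AccountId is an assumed field)."""
    root = ET.Element("InputGetFoo")
    ET.SubElement(root, "AccountId").text = account_id
    return ET.tostring(root, encoding="utf-8")


def parse_get_foo_response(payload: bytes) -> dict:
    """Read a hypothetical OutputGetFoo message into a plain dict of child elements."""
    root = ET.fromstring(payload)
    if root.tag != "OutputGetFoo":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {child.tag: child.text for child in root}


# The serializer escapes the ampersand, so the message stays well-formed.
request = build_get_foo_request("A&B-42")

# A canned response standing in for what the server would send back.
response = b"<OutputGetFoo><Balance>12.50</Balance><Currency>USD</Currency></OutputGetFoo>"
result = parse_get_foo_response(response)
```

      Because the payload is plain text, the same request could be produced by any language with string handling, which is the heterogeneity argument made above.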

      • And that's better than good ole' CORBA IDL, how?
        • And that's better than good ole' CORBA IDL, how?

          Perhaps because it is more widely accessible to all languages?

          Not all languages can use CORBA, but all that I've ever worked with can read and write simple text.

          • CORBA maps to most conceivable languages out there (name one that doesn't have an IDL mapping). XML is also less efficient as a way of implementing B-2-B communication (extra parsing time required for each invocation and general verbosity of the text sent). IDL is fast, efficient, standardised and now with all this new schema nonsense coming, it also appears simpler than SOAP.
            • Ok, tell me, how do I call something from Visual Basic (or even better, MS Access) that has a CORBA interface? I'm sure there are ways of doing it, but the simplicity of communicating through simple text is the benefit of using XML. Yes, there are trade-offs in performance, but in most of the places I've worked, the time for development and simplicity of support is more important than the extra few milliseconds you might gain doing things the most efficient way possible.

            • by Anonymous Coward
              Which was very important in the days when business machines were linked by 24k modems. With that limitation mostly removed from b2b and a2a, XML looks like a better solution in most cases. A little parsing overhead is not going to impact most apps; XML parsers can be very efficient.
        • by Anonymous Coward
          Among other reasons, using xml for an API is much easier to maintain and debug than CORBA. The data being passed around on the network is always human readable. I can see what the client is sending and the server is receiving far easier than with CORBA.

          Not to mention testing is much easier. I can capture an input call to the API created by whatever client program is being used and hack it with my favorite text editor to test.

          And as another person mentioned, xml can be used by a far larger range of programming languages and tools than CORBA bindings.

          I'd also contend that xsd files are far easier to understand than IDL.

          • You have a point with debugging but I'd definitely argue about the ease of XSD vs IDL. I find IDL far more natural and obvious than XSD. Personal preference perhaps, but I know more than one person who also shares that view. Also IIOP is much more efficient in terms of bandwidth and processing power required to marshal/unmarshal invocations. I know in this day and age that doesn't count for squat, but it explains why we constantly need more and more powerful machines to accomplish essentially the same stuff we did ten years ago.
            • ... but I'd definitely argue about the ease of XSD vs IDL

              Never looked at IDL, but I definitely prefer working with an XML schema described as a DTD rather than in XSD. The DTD (I think an EBNF?) is much more easily readable for a start. You can glean all the information you need about an element at a glance. In XSD you have to check through the various levels of containment etc. to work out what is what. Not that it's a major challenge, but it certainly lacks the immediacy of reading a DTD. Against that, of course,

              XSD is, however, extremely useful. I regard it not as a tool for "authoring" or editing schemas, but as a standard XML representation of schemas. With a DTD -> XSD (or, I guess, an IDL -> XSD) translator, you have a way of bringing your schema into the XML world, where it can be processed by standard XML tools. Very useful indeed.

              In response to the question of where XML is used, from my perspective it looks like, everyfuckinwhere! That's because I'm working on an XML based project, mind. But seriously, it looks to me like XML is going to be to the '00s and '10s what ASCII was to the '80s and '90s, and then some.

              • Ooops!

                it certainly lacks the immediacy of reading a DTD. Against that, of course, ... XSD can express schemas which a DTD cannot describe (e.g. data types).

          • It's RPC; you should have a client library whose job is to marshal requests and unmarshal responses. If using it for its intended purpose is so painful you're better off constructing requests manually, something is seriously wrong with your environment.
        • 1. Please tell me how you version a CORBA interface so that it simultaneously supports multiple protocols.
          2. Please tell me how you extend a CORBA interface without recompiling both the server and client.
          3. Please tell me how you perform CORBA over HTTP, FTP, or any other lightweight network protocols.

          The only answer I know is you can't. Perhaps you don't care about those capabilities, but I need them all and several more. CORBA is great where it's great, but it hardly reduces the need for XML.
          • 1. Not sure what you mean. Please elaborate.

            2. You use the Dynamic Invocation Interface.

            3. You can do CORBA over HTTP without any problems. There are a number of tools to do just that. By the way, HTTP is not lightweight compared to IIOP by any stretch of the imagination.

            SOAP is the same soup reheated and served over and over again. It solves nothing that CORBA wasn't able to solve. Initially it was simpler than CORBA because of its incompleteness. Now that it is catching up to CORBA in terms of supported functionality its complexity has grown exponentially.

          • Modifying an interface leaves clients and objects disagreeing about which invocations are valid! Make a new interface, inheriting the old one if you want to remain compatible (that's what the major and minor version numbers in an "IDL:" RepositoryId are for). Objects can implement both (probably sharing most of the code). Clients can reject objects that don't implement the new interface, or fall back on the old one.

            If you don't have a real IIOP proxy or a tunnel for arbitrary TCP (like HTTP's CONNECT method), you can wrap an IIOP request in HTTP and pass it through a HIOP proxy. Callbacks may not work, but that's also a defect in the HTTP binding for SOAP.

    • by Kragg ( 300602 ) on Monday November 25, 2002 @02:47PM (#4752760) Journal
      XML is good for industry standards bodies. It's open, there are open implementations, and you can irrefutably lay down the syntactic and semantic law in a schema without any ambiguity.

      FpML [slashdot.org], ArApXML [slashdot.org], MDML [idealliance.org] are good examples of industry-specific XML standards. Going into the wider space, you get ebXML [ebxml.org], SOAP [w3.org] and more.

      XML is the new-world replacement for EDI (Electronic Data Interchange) and its biggest uses are B2B and company-internal, with a small B2C following starting up for things like weather data, news feeds etc. It's not surprising you've not come across it... and until you go and work for a megalithic corporation on the IT side, you probably won't.
    • I don't know how much traffic it gets (probably quite a bit), but the Gentoo site is all done in xml.

      http://www.gentoo.org

    • by kelzer ( 83087 ) on Monday November 25, 2002 @02:53PM (#4752800) Homepage

      I'm not sure how much traffic they get these days, but the InfoWorld [infoworld.com] website is XML based. I believe it uses server-side XSLT transforms to turn XML into HTML.

      Also, don't assume that just because the URLs don't have ".xml" in them that the site isn't using XML - it's often transparent, such as when using Apache Cocoon [apache.org]

    • Dark Age of Camelot [camelotherald.com] uses XML to display the character stats.
      Here [camelotherald.com] is the display page and here [camelotherald.com] is the XML source.
      This page [camelotherald.com] explains how they do it. Very nifty for making guild web pages using their data.
      • by Coz ( 178857 )
        The current Everquest UI also uses XML for customizing the user interface - there are megs of documentation and several nice-looking ones available for download at the various EQ fan sites.
    • Right here on Slashdot (IIRC)! XML is very often used on websites to truly separate content from style. However, when you go to look at the HTML source, you won't see it. Why? Because the XML is being transformed into HTML by an XSLT engine on the server side, and the resulting HTML is sent to you.

      Moreover, XML itself is intended only to express content. It is standards built on top of XML (such as XHTML), that define how that content should appear. Many websites use XHTML, but the casual observer would not notice -- the source code of the page is nearly identical.

      In all of the three programming internships I've held (I'm still in college), XML was the underlying technology.

    • SRW is a Search and Retrieve Web Service that makes full use of XML and XPath. This is backed by the Library of Congress, and version 1.0 is released today:

      http://www.loc.gov/srw/ [loc.gov]

      This is from the Z39.50 Implementor's Group, an attempt to bring the experience of the last 20+ years with the bibliographic protocol that supports 99% of library searching into the mainstream.

      Check it out!

      --Azaroth
    • It's great for complex config files. The project I'm working on uses two of them, one for configuring the charting part and the other for configuring the user-controlled configuration (i.e., what options does the user get? What are the defaults? What are the fixed values, which may vary between versions, that the user doesn't get to change?). It's nice to have access to automatic syntax-checking, the ability to refer to other parts of the file, nested sections, etc. It's also relatively easy for people to modify it.

      XML and HTML aren't really competing standards; the future versions of HTML will be XHTML (HTML based on XML instead of SGML), which mostly makes it more convenient to embed SVG, MathML, etc. in your web page, and is nearly identical, except that all of the close tags are (supposedly) required. You're unlikely to see the sorts of XML streams that have been hyped on the web, because they require interpretation by the client, which your browser doesn't know how to do; unless you've got something like a stock-ticker application, you actually want the HTML version, which is presented for display.
    • You'd be amazed how many sites & companies use it, but don't expect to see it if you look in the browser source: it's all behind the scenes.
    • Check out tldp.org. All the documents are created using XML/SGML. The idea, of course, is to ultimately create structured content that can be queried and linked in future.
    • Transformations. That is the part of XML that I like the most. At work I develop a product that has a lot of raw data. Its representations can be vast and numerous. As an example, we can represent the data in an XML format, then using XSLT transform it into HTML, ASCII text, CSV, PDF, and even formats that will import into Quicken and such.

      Unfortunately XML has become a buzzword. So when someone claims they use XML (as people yack about MS with their new file format) it doesn't actually mean anything. XML is not the proverbial hammer to whack nails, screws, and hooks with, but it does have its uses. For us, it is best used for its transformation ability with XSLT. When we first started work with it, however, we found that it did not help with many of the things we do.

      Personally I have come up with my own little XML definition to help separate content and display. I am putting together the classic "see my cats" page at home, and also having other applications like webmail and a budget database on my website for myself. One thing that is really nice is to be able to change just the XSL stylesheet for processing the XML to change the entire feel of the site. If I change the forms handler, for example, I just change the transformation for the forms and all forms will look similar, instead of having some with a particular feel and others with a different feel.

      Of course, YMMV, but XML is not cursed simply because it is a buzzword.
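
      A rough Python sketch of the "one XML source, several output formats" idea described above. This is not XSLT (the Python standard library has no XSLT engine); it swaps in plain Python renderers to show the same separation of data from presentation. The records/record/name/qty vocabulary is made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source data; the element names are assumptions for this sketch.
SOURCE = """<records>
  <record><name>Widget</name><qty>3</qty></record>
  <record><name>Gadget, large</name><qty>1</qty></record>
</records>"""


def rows(xml_text):
    """Extract (name, qty) pairs once; every renderer below reuses them."""
    root = ET.fromstring(xml_text)
    return [(r.findtext("name"), r.findtext("qty")) for r in root.iter("record")]


def to_csv(xml_text):
    """Render the records as CSV, letting the csv module handle quoting."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "qty"])
    writer.writerows(rows(xml_text))
    return out.getvalue()


def to_html(xml_text):
    """Render the same records as an HTML table, escaping cell text."""
    cells = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(qty)}</td></tr>"
        for name, qty in rows(xml_text)
    )
    return f"<table>{cells}</table>"
```

      Changing the look of the output means touching only one renderer, which is the same benefit the stylesheet swap gives in the XSLT setup.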

    • Mac OS X. A lot of metadata in OS X is kept in XML. Aren't .plist files all XML?
    • Here's an example I worked on that sort of helped me see the light for one use of XML.

      I did contract work for the state for a while and they wanted to do some credit card processing but didn't want to deal with the liability, so we contacted a large company to handle that. They provided an XML interface where we sent them descriptions of the items we wanted charged, along with some other goodies including a return address to our site so we could handle their response XML.

      So we accepted some HTML form input, built a server side XML document and sent it to them, they processed it, and then sent us an XML result.

      While there are a lot of ways this could be accomplished, this was pretty easy and platform independent since XML is really just string data.
      It's going to be used for backend processing mostly, and then for the UI you render it with XSL, or custom server side code in our case.
    • If you're rolling your own tools, XML is a life saver.

      I've built a shopping cart system for my wife's store, and XML forms a key part of several pieces of the internals.

      For instance, UPS, US Postal Service and FedEx all have XML-based APIs that allow you to get shipping rates, track packages, generate labels, etc. In the case of rate retrieval, the XML is simple enough that I don't need XML code libraries: the weights and zipcodes get interpolated into a simple Perl string on the way out, and what I get back is easy enough to parse with a pattern (gotta remember non-greedy patterns!).

      use LWP::UserAgent;
      use HTTP::Request;

      # Ask the USPS rate API for Express and Priority domestic rates.
      # Returns the raw XML response body on success, -1 on failure.
      sub getUSPSDomCost {
          my ($orig, $dest, $lbs, $oz, $user, $pass, $test) = @_;
          my $server = $test
              ? "testing.shippingapis.com/ShippingAPITest.dll"
              : "production.shippingapis.com/ShippingAPI.dll";

          my $ua = LWP::UserAgent->new;
          $ua->agent("MyShipFinder/0.1 " . $ua->agent);

          # Create a request
          my $req = HTTP::Request->new(POST => "http://$server");
          $req->content_type('application/x-www-form-urlencoded');

          # The values are interpolated into the XML as-is, with no escaping.
          my $stuff = qq|<RateRequest USERID="$user" PASSWORD="$pass">
            <Package ID="0">
              <Service>EXPRESS</Service>
              <ZipOrigination>$orig</ZipOrigination>
              <ZipDestination>$dest</ZipDestination>
              <Pounds>$lbs</Pounds><Ounces>$oz</Ounces>
              <Container>None</Container>
              <Size>Regular</Size>
              <Machinable></Machinable>
            </Package>
            <Package ID="1">
              <Service>PRIORITY</Service>
              <ZipOrigination>$orig</ZipOrigination>
              <ZipDestination>$dest</ZipDestination>
              <Pounds>$lbs</Pounds>
              <Ounces>$oz</Ounces>
              <Container>None</Container>
              <Size>Regular</Size>
            </Package>
          </RateRequest>|;

          $req->content('API=Rate&XML=' . $stuff);

          # Pass request to the user agent and get a response back
          my $res = $ua->request($req);

          # Check the outcome of the response
          if ($res->is_success) {
              return $res->content;
          } else {
              return -1;
          }
      }

      Obtaining a username and password is left as an exercise for the reader, as is parsing the result.

      Her primary distributor, Ingram Books [ingramcust...ystems.com], is implementing inventory checks over XML also -- but it's not quite working yet. Their previous tools required FTP-ing a request file to a site, and then waiting around for a response file to show up in the same folder!

      So in that respect, B2B is where XML is dominating.

      • Build vs. buy is enjoyable but rarely worthwhile. That code will create a malformed request if certain characters (quotes, ampersands) appear unencoded, and an ad hoc pattern may fail to match if the response is extended in certain ways--does it handle namespace scoping, marked sections, and mustUnderstand headers? If you're paranoid enough it's possible to do it right, but even then all you have is an unmaintainable substitute for SOAP::Lite.
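
        A minimal Python sketch of the escaping problem raised here, assuming a cut-down version of the rate request (element and attribute names are borrowed from the post; the sample values are made up). Attribute values go through quoteattr, element text through escape, and the finished document is form-encoded so that its own ampersands cannot split the POST body.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode
from xml.sax.saxutils import escape, quoteattr


def build_rate_request(user: str, password: str, origin: str, dest: str) -> str:
    """Build a simplified rate request with every interpolated value escaped."""
    return (
        # quoteattr adds the surrounding quotes itself and picks a safe quote style.
        f"<RateRequest USERID={quoteattr(user)} PASSWORD={quoteattr(password)}>"
        f"<ZipOrigination>{escape(origin)}</ZipOrigination>"
        f"<ZipDestination>{escape(dest)}</ZipDestination>"
        "</RateRequest>"
    )


# Deliberately awkward credentials: a double quote, an ampersand and a '<'.
xml_doc = build_rate_request('bob"s & co', "p<ss", "10001", "94105")

# Form-encode the whole document before it goes into the request body.
body = urlencode({"API": "Rate", "XML": xml_doc})
```

        With plain string interpolation the same credentials would produce a document that no XML parser accepts; here the values survive a round trip unchanged.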
        • I don't disagree with you on a general case.

          This just happens to be a terribly trivial request, which could have been cheaply implemented by the Post Office as a standard web query. The only parameters are the weight in pounds and ounces (simple integers), Zip codes (5 digit numbers), and username and password, which are established strings. No muss, no fuss, no unsightly odors.

          I only implemented this myself because it was so blinkin' easy, and because there isn't a CPAN module for it (yet).
  • XSD is not a "language" any more than XML is. XSLT I suppose can be considered one.

    I think misrepresentations like these from story submitters ultimately detract from the overall quality of the articles and I'm sure that the author wouldn't be all too thrilled about it.

    It's akin to those Costco salespeople who tell me that the "shweet" HP computer over there has 20GB of "RAM". It looks a bit dumb given the target audience.

    • > XSD is not a "language" any more than XML is. XSLT I suppose can be considered one.

      Actually both XSL and XSD are languages. A mathematical language is a set of strings made of symbols. Programming languages are mathematical languages, but there are many other languages that fit this definition.
    • by Daniel Dvorkin ( 106857 ) on Monday November 25, 2002 @02:39PM (#4752717) Homepage Journal
      XSD is not a "language" any more than XML is. XSLT I suppose can be considered one.

      ...

      It's akin to those Costco salespeople who tell me that the "shweet" HP computer over there has 20GB of "RAM". It looks a bit dumb given the target audience.
      Um ... you do know what the "L" in "XML" stands for, right?

      XML is a language. So is HTML. So is SQL. Just because a language isn't Turing-complete doesn't make it not a language.

      Actually, I agree with you that XSD isn't a language -- it's a specific set of rules for using a language, XML; it would be better to call it a grammar. But saying "___ isn't a language" because ___ doesn't do everything C does is as silly as the "MySQL isn't a database (management system)" crap that floats around here every so often.

      I think the target audience is sophisticated enough to understand the difference between a language that's Turing-complete and one that isn't, and also to know that markup languages are still languages by any reasonable definition of the word.
      • XML is not a language, in the sense that it doesn't define any semantics; only syntax. It's up to the person using XML to create a language out of its constructs (elements and attributes and such).

        It's best to think of XML as a language construction toolkit. In the same manner, SOAP is a protocol construction toolkit, more than a protocol in its own right.
    • by Anonymous Coward
      XSD is a language for describing XML documents. What's not a language about that? It's more of a grammar, so maybe you have a point. Schemas don't actually DO anything (neither does XSLT on its own); they just describe to you (or your application) the semantics and structure of an XML document, so "grammar" would be more appropriate. The use of "language" confuses newbies into thinking that XSD is a programming language and has the same set of responsibilities as a PL.
    • Depends on your definition of a language, which in turn depends on your community. Common definitions:

      • Language: a set of strings (drawn from some set called an alphabet).
      • Programming language: a language with an operational semantics (a description of the runtime behaviour of those strings, could be defined by an implementation).

      XML and XML Schema are both languages according to this definition. In addition, XML Schema has a semantics that associates a subset of the set of all XML documents to every XML Schema.

    • XML is a meta-language -- that is, a language for creating markup languages (such as XHTML). XSD is a schema language, which defines the structure of a class of XML documents (the class that conforms to the structural rules of the schema). XSLT is a Turing-complete programming language -- yes, you could write an operating system in XSLT, in theory if not in practice -- so it is definitely a language, under any definition.
  • Article Motivation (Score:5, Informative)

    by Carnage4Life ( 106069 ) on Monday November 25, 2002 @02:26PM (#4752659) Homepage Journal
    When the W3C XML Schema recommendation was first released, there were certain parties who, overwhelmed by its newness, complexity and buggy implementations, began to advocate using as few features as possible, which culminated in the article W3C XML Schema Made Simple [xml.com] by Kohsuke Kawaguchi [xml.com]. However, a year later, with parser implementations getting up to speed and more people using the technology, it is clear that a number of the earlier misgivings about using some parts of the technology were misguided.

    This is very similar to the situation with Mozilla and C++. In 1998, a few months after the ISO standard was ratified, a set of guidelines for using C++ was specified by the Mozilla team [mozilla.org], which included rules like don't use templates [mozilla.org], don't use exceptions [mozilla.org], and don't use namespaces [mozilla.org]. Since then the Mozilla team has looked back at their decision and realized that some of the decisions they made were unwise [mozilla.org]; specifically listed as mistakes were avoiding exceptions and templates. I truly commend the Mozilla team for making their post mortem available online for other [C++ or otherwise based] software development projects to learn from.

    This article aims to do the same thing for the XML community and the W3C XML Schema recommendation.
    • by Anonymous Coward
      .. there were certain parties who, overwhelmed by its newness, complexity and buggy implementations..

      Kind of like Microsoft, who, instead of working with the spec at all, decided to create their own schema implementation called the "BizTalk Schema", which is all but useless outside the BizTalk Mapper app. I think their motivation for this was that implementing an XSLT generator that could map two W3C schemas together would be too useful to the public, and would involve too many non-M$ proprietary technologies.
      • Actually it was called "data reduced schema" (XDR) and it was created by the SOAP/BizTalk folks because the W3C hadn't approved XSD yet when the 2.5 version of MSXML was released.

        As of version 4.0 of their parser XDR is still supported but deprecated, and they're recommending everyone use (or upgrade/convert [microsoft.com] to) XSD instead. Full support for XSD has yet to make it to the full BizTalk server product, though.

        If you're going to FUD, get your facts straight.


      • This is not in the least bit "insightful". It's pure FUD. Microsoft implemented a schema type language in XML called XDR (XML Data Reduced). They stated upfront that this was simply an intermediary until the full XSD spec was finished. As of version 4.0, their parser supports both schema types, with the recommendation that you use the newer, standard schema type (XSD).

        I do not use BizTalk, so I'm not sure of the status of that product using XML Schemas, but I'm sure support for them is not far behind.

    • by Anonymous Coward
      I'm sorry, but I am not at all convinced by the Mozilla recommendations. I have worked with aCC under HP-UX and it has none of the problems he attributes to "HP-UX".

      Also, I noted a recommendation based on Visual C++ 1.5. I'm sorry, but that one is stone age! If you are so desperate to support every C++ ever written I imagine you will have major trouble porting newer C++ features, but that really is not necessary if you use recent versions of the various compilers.

    • by Ed Avis ( 5917 )
      I haven't yet seen convincing explanations of why the new schema languages like XML Schema are needed at all. Or at least, I can't see that 'nice to have' features like specifying restrictions on textual element content outweigh the huge extra complexity compared to its predecessor.

      There is already an XML schema description language, called DTD. It is less powerful than XML Schema or Relax NG or DSD or a dozen other edifices of overengineering, but by the 80/20 rule it's likely to do what you need. The tools to validate against a DTD (nsgmls) are already included with most Linux systems. And you can learn all you need to know in about one page.

      XML Schema might be useful for some applications, and being a W3C standard does give it some extra clout (DTD is a W3C standard too, part of the XML specification), but it looks like second-system syndrome to me.
      • I can't see that 'nice to have' features like specifying restrictions on textual element content outweigh the huge extra complexity compared to its predecessor

        The ability to specify these restrictions is a huge advantage. For example...

        <ip-address>foobar!</ip-address>

        ... is almost certainly not valid input. You can easily use schema to validate that this should appear in IPv4 dotted decimal form, for example. Take any moderate size document, and suddenly schema validation saves you hundreds of lines of validation logic inside your own code. This kind of validation is also a huge improvement for XML remote procedure calls (SOAP, etc.)

        I'll agree that the W3C schema language is complex, and I prefer something like Relax NG. However, I think both are a huge improvement over DTD for many applications, and if you don't need the extra features, you can stick with DTD.

        • But an error like an IP address 'foobar' is (IMHO) less likely to occur, easier for a human to spot, and easier to give an error message for in the application than a badly-structured document. It must depend on the application, but I feel that DTD does a good job of warning about those mistakes that are likely to occur. It doesn't catch them all, depending on the details of your file format, but then neither does XML Schema. The question is how much extra complexity you are prepared to accept to catch the additional few percent of errors. I feel I would rather use DTD and write program code to check IP addresses; it is certainly easier to write some code in your favourite programming language than to learn the monstrosity which is XML Schema. YMMV.

          My point was that many apps don't really need the extra features. In all the hype over schemas, plain old DTD doesn't really get the coverage it deserves.
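
          For comparison, a small Python sketch of the kind of hand-written check being argued for here. The function name is made up for illustration. Roughly the same regular expression could instead be attached to the element as a pattern restriction in a schema, which is the trade-off the two posts above are debating.

```python
import re

# One octet: 0-255 with no leading zeros.
_OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Four octets separated by dots.
IPV4_PATTERN = re.compile(_OCTET + r"(\." + _OCTET + r"){3}")


def is_ipv4_dotted_decimal(text: str) -> bool:
    """Return True only if the whole string is an IPv4 address in dotted decimal form."""
    return IPV4_PATTERN.fullmatch(text) is not None
```

          A DTD-validated document would still accept any text inside the element, so this check has to run in the application after parsing.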
      • I believe that the main impetus behind XML Schema was to get a description language that was also XML so that one parser could operate on both the document and that document's description (meta information).
        • Having the schema description itself be XML is one motivation, but it wasn't the only one - otherwise it would have been better to just code up a simple DTD-like schema language in XML markup.
  • Darn DTD's (Score:5, Interesting)

    by HillClimber ( 530465 ) on Monday November 25, 2002 @02:27PM (#4752668)
    As someone who's just downloading the XML Mind editor and about to write an .xsd for my data -- this is great timing! Thanks, Dare.

    I also want to gripe a bit about the complexity of XML Schema. DTD has all the restrictions I'd typically want to use (the main thing I want to do is just specify element names, contents, and attributes). The *only* problem with DTD's is that they are totally namespace-challenged!

    You can't combine two DTD's for different namespaces into a combo document. You can't even allow arbitrary other elements in a DTD element declaration -- every element must be declared and local. Even worse, you have to pick and stick with a namespace prefix in your DTD -- defeating the whole point of globally unique namespaces.

    What I *really* want is just DTD with a smidge of namespace smarts and the ability to combine DTD's for one document. Anyone want to give it a shot?
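To make the prefix complaint concrete, here is a small sketch, assuming Python's standard `xml.etree.ElementTree`. The namespace URI and the two prefixes are invented for illustration. A namespace-aware parser treats both documents as the same element, whereas a DTD that declares `inv:item` matches only that literal prefixed name.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same namespace URI, different prefixes.
DOC_A = '<inv:item xmlns:inv="http://example.com/inventory"/>'
DOC_B = '<stock:item xmlns:stock="http://example.com/inventory"/>'


def qualified_name(xml_text: str) -> str:
    """Return the root element's name in {namespace-uri}local form."""
    return ET.fromstring(xml_text).tag
```

Both `qualified_name(DOC_A)` and `qualified_name(DOC_B)` come out as `{http://example.com/inventory}item`, which is the comparison a namespace-aware validator would need to make.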
    • Re:Darn DTD's (Score:2, Informative)

      by bloo9298 ( 258454 )

      Try Relax NG [oasis-open.org]. Relax [xml.gr.jp] was developed in Japan and is quite popular there. TREX [thaiopensource.com] is James Clark's attempt at a type system that is more flexible than DTDs but less complicated than XML Schema. Relax NG is the merger of those two systems. It looks appealing but I have not used it in anger yet.

    • Re:Darn DTD's (Score:3, Interesting)

      by bay43270 ( 267213 )

      What I *really* want is just DTD with a smidge of namespace smarts and the ability to combine DTD's for one document. Anyone want to give it a shot?

      I think this is what everyone wants. The problem is, aside from Microsoft, no one can just make up a standard that contradicts the W3C and expect it to be accepted.
      • I posted that a bit quickly, let me clarify: implementation of verification isn't as much an issue as turning it into an accepted standard. I don't think people should be able to just make up their own standards. I think in this case, W3C should have put a bit more thought into DTD. I heard people were still in disagreement over namespace support in DTDs when it was released (not a good way to start a standard).
        • Re:Darn DTD's (Score:3, Insightful)

          by Theatetus ( 521747 )

          I don't think people should be able to just make up their own standards.

          Isn't that the whole point of XML to begin with? That my company and the company we're partnering with can write a simple data-exchange standard without locking both development teams in a conference room for two weeks?

          So, in your case, you and whoever you're dealing with could pretty easily nail down what kind of namespace smarts you want in your DTD validations and implement it without having to RFC the whole world.

          • The entire point of this thread is that DTDs have no namespace smarts. It's simply a flaw with DTDs.

            I agree that companies should be able to create their own XML formats... that's what it's all about. My comment was that there should be a limited number of ways to validate those formats. There's a simple reason for this: extendability. Once a set of XML tags has been restricted by DTD, it cannot be extended using schema (at least without repeating ALL the validations). DTDs (because of the namespace issues mentioned earlier) can only be extended by other DTDs in special situations. If everyone and their brother created a new validation format, these extendability issues would get even worse.

            IMHO, the W3C should start working on DTD version 2 right now. We need namespace support in a validation format that's simple enough for everyone to use.
    • the schema is really not that difficult to use. go check out http://www.w3schools.com - they have some great tutes over there.

      -- james
  • by T-Kir ( 597145 ) on Monday November 25, 2002 @02:30PM (#4752678) Homepage

    ...no seriously, then they might actually follow the standards that are out there (i.e. their supposed use of 'proper' XML in Longhorn [winsupersite.com]).

    Actually, maybe not. If they do have an O'Reilly Zoo, maybe the animals/books have been re-engineered into abominations... and then they follow what has been rewritten to produce anything.

    Well I suppose Long"Horn" could be an animal derivative; the bit that MS chops off and gives to the customers.

  • Just Say NO! (Score:5, Insightful)

    by gurnb ( 80987 ) on Monday November 25, 2002 @02:42PM (#4752731) Homepage
    James Clark is fighting XSD and pushing his RELAX NG. RELAX NG is *not* W3C. Let me recap the background.
    XML (a markup language created by the W3C) is a subset of SGML (a markup language created by ISO).
    XML was created by a few smart marginals from the SGML world plus some MS politicians.
    Now those MS politicians (and the like) run the show at the W3C, and the smart marginals have left the W3C and work for ISO (OASIS).
    XSD is the XML schema language from the W3C. RELAX NG is the XML schema language from ISO (OASIS). So far, RELAX NG is the first visible XML application that does not belong to the W3C.
    Now that we have the big picture written down, I would recommend reading the letter from James Clark:
    http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html
    The RELAX NG formalism has a solid basis in tree automata theory. W3C XML Schema has no such basis.

    etc.
    BTW, even though RELAX NG is definitely better than XSD, RELAX NG itself is not perfect either. The 'perfect' solution could be based on regular expressions. Never mind. In the next few years, nothing interesting (except political battles) will happen in the world of XML Schema.
    • Re:Just Say NO! (Score:2, Interesting)

      by Anonymous Coward
      I can't think of a 'perfect' solution for anything that involved regular expressions to a large degree, unless you are a dedicated Perl addict. The perfect solution would be one that everyone would adopt regardless of political standing or implementation details, like:
      "Now that RELAX NG is an ISO/IEC DIS, I think that we should prefer RELAX NG to XML Schema."
      (http://www.imc.org/ietf-xml-use/mail-archive/msg00231.html)

      Hmm, glad I decoupled the use of W3C schemas in my application early on, or I would certainly be in a pickle now! Oh wait, I didn't, DAMN!
      • I don't think he means perl-style regexps. I think he just means that the schema should incorporate the formalism of regexps so that you can prove how large and how limited your language is.

        Not that I know the first thing about this whole topic. Maybe you understood perfectly what he meant.
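One way to read the "formalism of regexps" idea is that an element's allowed children are a regular expression over tag names. The sketch below assumes Python's standard `re` and `xml.etree.ElementTree`; the `host`/`name`/`ip`/`port` vocabulary and the helper `children_match` are hypothetical, and this is not how RELAX NG or XSD is actually implemented.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model: one <name>, an optional <ip>, one or more <port>.
# Each tag name is followed by a single space so names cannot run together.
HOST_MODEL = re.compile(r"name (ip )?(port )+")


def children_match(element: ET.Element, model: re.Pattern) -> bool:
    """Check the sequence of child tag names against a regular expression."""
    sequence = "".join(child.tag + " " for child in element)
    return model.fullmatch(sequence) is not None
```

With this model, `<host><name/><ip/><port/></host>` passes and `<host><port/><name/></host>` does not. Because the model is a plain regular language, you can reason about exactly which child sequences it accepts.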
    • I'm not sure I follow you here.

      For one thing, the nastier companies seem to have a lot more say in OASIS and ISO than in W3C. Nor can I look upon XML as a subset of SGML, but maybe I'm totally wrong about that.

      In ISO, it seems that cross-licensing of patents is done to shut smaller companies out of the process. OASIS too has RAND licensing [oasis-open.org] and as usual, makes no attempt to define what is "reasonable" or "non-discriminatory":

      OASIS.IPR.3.3 Determination of Reasonable and Non-discriminatory Terms The OASIS Board of Directors will not make any explicit determination that the assurance of reasonable and non-discriminatory terms for the use of a technology has been fulfilled in practice. It will instead use the normal requirements for the advancement of OASIS specifications to verify that the terms for use are reasonable.

      This is contrary to how hard W3C has worked to ensure royalty-free patents.

      I'm not saying that James Clark's stuff isn't better, it may well be, he is certainly among the foremost in this field. But this bashing of W3C seems highly undeserved.

      • Re:Patents (Score:3, Interesting)

        by __past__ ( 542467 )
        Nobody (in their right mind) bashes the W3C because they are a bunch of evil corporate drones. People bash it because the W3C has created too many overengineered, half-working, overly complex and generally crappy specs recently, without listening to valid complaints from their actual users.

        XSD is one example that is particularly nasty because the W3C seems to plan forcing it in every other spec they create (for example XSLT 2.0, XQuery, RDF/OWL), making them very hard to implement (how many conformant XSD implementations are there? For languages other than Java?)

      • XML is a strict subset of WebSGML. They designed it carefully so that you could produce a DTD for any well-formed XML document and process it using any up-to-date SGML tool that implements the WebSGML TC extensions.

        Thanks; I never noticed the potential legal minefields.

    • There's no such thing as a perfect schema language in a general sense. It all depends on the application at hand. Actually, trying to be perfect is one of the main reasons for the W3C XML Schema bloat; they simply try to squeeze too much into a single specification.

      RELAX NG is much more streamlined; it focuses on specifying grammars for XML structures. Nothing more. It doesn't try to glue on concepts like object orientation (which is another reason for the W3C XML Schema blur). It's just very pure and hence easy and intuitive to learn and use.

      Also, with the recent addition of the Compact Syntax [oasis-open.org], editing and reading schemas has never been easier. Utilities for working with the compact syntax can be found here [pantor.com] and here [thaiopensource.com].

      So even though there is no perfect schema language, I'd say RELAX NG is far more perfect than W3C XML Schema in many situations. If you have applications that require W3C XML Schema, you can use Trang [thaiopensource.com] to convert your RELAX NG schemas.

  • by ellem ( 147712 ) <{moc.liamg} {ta} {25melle}> on Monday November 25, 2002 @02:55PM (#4752807) Homepage Journal
    I don't have hopes or dreams (yeah)
    I don't have plans or schema's
    I can't author anything
    Since I don't have XSD

    I looked on Google
    And I checked out ActiveState
    And I tried MacMall
    They don't carry XSD

    Microsoft says just use .Net
    Redhat says there's no RPMs for it yet
    And Ellen Feiss says hers disappeared
    I don't have XSD
  • The article said something about months to fully understand. Months? This is a standard? Anyone remember XSL formatting language? Anyone use it? Nope, because it was huge. XSL-T was reasonably sized and it got used.

    How anything in XML-land takes months to understand when XML docs are little more than glorified hyped text files is beyond me. The only time this should happen is understanding what human/cultural aspect XML is trying to represent, not a technology.
    • Then maybe you just shouldn't be doing such "complex" programming. XSD should not take more than a couple days to understand, another few days to iron out the details, and by the end of 30-40 hours you should have produced a good sample of whatever you're modelling with XSD.

      Too many "how to" documents and books are written for people who just don't "get it". While I sympathise that they need to learn, you have to grasp the concept of abstraction before some of these "languages" make sense. XSD, ERDs, SQL-DDL, etc. are just different ways of describing data structures and organization. Each is "complex" if you try to make it do more than it was intended for, but only takes a couple weeks to understand well enough to get by.

  • Mix XTC and LSD together and what do you get??

    XSD!!!

    Yes, that's right kids, this is really nothing more than a wild hallucinogenic for your computer! Mix this in to your web pages, and look at all the `trippy' designs and patterns that it creates.

    Available by the tab, strip or sheet.

    (for the humor impaired, this is a joke. Maybe you need to take this stuff.)
  • A while back I ran into a rather stupid bug in RedHat [redhat.com]. After some debugging I figured out the problem was due to a "description" field missing from one of the config files (/boot/module-info).

    At the top of this file in the "comments" was a brief note about the format that this file was supposed to follow. According to this, there should never have been an entry without a description field. The code for one of the GUI programs ASS-U-MEd this would be the case, didn't do any checking and of course crapped out when it found a malformed entry (someone obviously hadn't read the format info in the comments when they made changes).

    Now of course the code should have been robust enough to handle such an error, but ideally there would have been some checking of the config file to determine whether the file actually conformed to the listed format rules before it was sent out into production.

    These config files are where XML's strengths can be quite useful. If this file had a formal schema definition and was created in XML, it would be quite easy to check this config file for correctness (syntactical at least) as part of the build process. The code which reads these files could be changed to use a simple XML parser to read through it, or hell you could even write an XSLT transform to turn it into the "non-XML" format (i.e. use the XML for developing the file only) and not have to change any of the existing code that reads that file!

    There's been a big push towards XML configuration files in some segments of the industry (the Java J2EE camp for example). I for one hope this trend will continue.
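As a rough idea of what a build-time check could look like, here is a minimal sketch assuming Python's standard `xml.etree.ElementTree`. The `<modules>`/`<module>`/`<description>` layout and the function name are hypothetical; the real /boot/module-info format is not XML and is not reproduced here. This is a hand-written rule, not schema validation; the standard library has no XSD validator, though third-party libraries such as lxml can validate against an actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML version of a module list; entries are illustrative only.
SAMPLE = """<modules>
  <module name="3c59x"><description>3Com EtherLink</description></module>
  <module name="eepro100"/>
</modules>"""


def modules_missing_description(xml_text: str) -> list:
    """Return names of <module> entries lacking a non-empty <description>."""
    root = ET.fromstring(xml_text)
    missing = []
    for module in root.iter("module"):
        description = module.findtext("description")
        if description is None or not description.strip():
            missing.append(module.get("name"))
    return missing
```

A build step could call this on the config file and fail the build whenever the returned list is non-empty, so a malformed entry never reaches the GUI code.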
