This discussion has been archived. No new comments can be posted.

XML Schema a W3C Recommendation

  • thanks for the info....I think I have a grasp of it now :)
  • <whining>Maybe it would've helped you if they would've posted it when I submitted it yesterday afternoon, or maybe when they rejected it yesterday at 6:44 PM. :) </whining>
  • Eh.. No. You need a parser to handle Schemas just as much as you need it for DTDs

    A Schema is an XML document, thus an XML parser will parse a Schema. A DTD has a completely different syntax, hence requiring a different parser.

    What you probably are thinking of is the semantic interpretation which follows that parsing. But that would hold for any XML document used, whether a Schema, an EJB deployment descriptor, an XSLT transformation etc.
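A minimal sketch of the point above, assuming Python's standard `xml.etree.ElementTree` module and a tiny made-up schema: because a schema is itself XML, the same general-purpose parser that reads documents can read it. The element names and the helper `declared_elements` are illustrative only, and no schema validation is performed here.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A tiny, made-up schema used only for illustration.
SCHEMA_TEXT = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="customer" type="xsd:string"/>
  <xsd:element name="balance" type="xsd:decimal"/>
</xsd:schema>
"""

def declared_elements(schema_text):
    """Return (name, type) pairs for the top-level element declarations."""
    root = ET.fromstring(schema_text)  # an ordinary XML parse, nothing schema-specific
    return [(el.get("name"), el.get("type"))
            for el in root.findall(f"{{{XSD_NS}}}element")]

print(declared_elements(SCHEMA_TEXT))
```

Interpreting those declarations (the semantic step the comment mentions) is still separate work; the parse itself needs no second parser.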

  • I'm rather clueless when it comes to XML, but I thought that a DTD did what the schemas seem to do. What exactly is the difference between them?

    Basically, the setup with XML + DTD + CSS means you need parsers for three different data formats.

    With a new setup using XML + XML Schema + XSL you only need one. A major advantage.

  • Oddly enough, Microsoft helped write the standard.

  • > This is very much akin to having database schema for databases.

    In fact there are many people [rpbourret.com] proposing to store data as XML. I see a big danger here. Lots of people are encoding data with XML to enable exchange, and that's OK. But when people start thinking about storing that XML representation in a database, they fail to realize all the benefits that a relational database gives, both today with semi-relational SQL [frick-cpa.com] and in the future with fully-relational Tutorial D [acm.org].


    --
    Leandro Guimarães Faria Corsetti Dutra
    DBA, SysAdmin
  • > relational databases will be around for a long time to come

    That is, whenever they arrive they will be around for a long time. Up to now I know of no fully relational database or DBMS, apart from BS12 [mcjones.org] and perhaps Quel [berkeley.edu] and Leap [sourceforge.net].


    > it's clear that no automated solution exists that will optimize performance in every case

    Performance is not the only issue, not even the biggest one: data access path independence and data integrity are bigger ones, and more fundamental.

    The problem is that up to this day no one fully implemented relational theory in a modern system, nor proposed a better theory than the relational one.


    --
    Leandro Guimarães Faria Corsetti Dutra
    DBA, SysAdmin
  • the problem with schemas is that just because a customer id is 14 digits long does not mean it is a valid id. just because someone's last name is an uppercase character, followed by less than 20 lowercase characters does not make it a valid customer. while schemas solve the problem of specifying the formatting of the field, this syntactic problem is only half the necessary information needed to exchange data. this doesn't solve the semantic problem, and therefore it's still just as easy to get bad data in your database.
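The gap between format and meaning described above can be sketched roughly as follows. The 14-digit rule comes from the comment; the set `KNOWN_CUSTOMER_IDS` and both function names are hypothetical stand-ins for whatever lookup a real system would use.

```python
import re

# Format rule from the comment above: a customer id is exactly 14 digits.
CUSTOMER_ID_PATTERN = re.compile(r"[0-9]{14}")

# Hypothetical stand-in for the real customer table.
KNOWN_CUSTOMER_IDS = {"00000000000042"}

def is_well_formatted(customer_id):
    """Syntactic check: the kind of constraint a schema can express."""
    return CUSTOMER_ID_PATTERN.fullmatch(customer_id) is not None

def is_known_customer(customer_id):
    """Semantic check: requires data that no schema carries."""
    return is_well_formatted(customer_id) and customer_id in KNOWN_CUSTOMER_IDS

print(is_well_formatted("12345678901234"), is_known_customer("12345678901234"))
```

The second id passes the format check yet fails the lookup, which is exactly the case a schema alone cannot catch.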
  • Although most of us here don't approve of Microsoft's business practices (and it's true they can seem almost evil at times) you CANNOT forget OR deny that they DO make some good pieces of software
    Yep, that Micro$oft Works 2.0 sure kicks major ass!!!
    --
    You think being a MIB is all voodoo mind control? You should see the paperwork!
  • Well, I think a lot of the arguments between OODB and RDB camps echo here. Mapping the hierarchical data model of a set of classes (it's there, buried among all the methods) to a relational model has generally been a manual and tedious process. The big problem OODBs have had is their type systems do not map seamlessly with that of a given programming language, forcing post- or pre-processing of the codebase.

    Where it gets interesting is where each language develops a type map to a common data model, such as one determined by XML Schema. All of a sudden, the mapping problems to be dealt with diminish in number.

    There remains the problem of correlating hierarchical structures to relational ones, however. I'm not knowledgeable enough to proffer whether relational models are inherently superior to hierarchical models for a large realm of applications. What I do know is that relational databases will be around for a long time to come, so XML RDB mapping will remain an issue, as I think it's clear that no automated solution exists that will optimize performance in every case.
  • There are about a dozen datatypes that a DTD supports. There are 45 that XML Schema supports. That means that for XML documents used to transport large amounts of data, much tighter constraints can be imposed that can be checked up front in the document processor, without the application having to do so.

    XML Schema also supports the ability to define your own datatypes through inheritance, so the type space for XML Schema is practically unlimited.
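As a rough illustration of a user-defined datatype, the sketch below declares a made-up `percentage` type by restricting `xsd:integer`, then applies its two range facets by hand. This is not a schema validator (Python's standard library does not include one); `integer_range` and `accepts` are hypothetical helpers, and `int()` is slightly more lenient than the real `xsd:integer` lexical rules.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Made-up derived type: an integer restricted to the range 0..100.
SCHEMA_TEXT = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="percentage">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
"""

def integer_range(schema_text, type_name):
    """Read the minInclusive/maxInclusive facets of a named simple type."""
    root = ET.fromstring(schema_text)
    for simple_type in root.findall(XSD + "simpleType"):
        if simple_type.get("name") == type_name:
            restriction = simple_type.find(XSD + "restriction")
            low = int(restriction.find(XSD + "minInclusive").get("value"))
            high = int(restriction.find(XSD + "maxInclusive").get("value"))
            return low, high
    raise KeyError(type_name)

def accepts(schema_text, type_name, lexical_value):
    """Hand-rolled facet check; a real validator does far more than this."""
    low, high = integer_range(schema_text, type_name)
    try:
        number = int(lexical_value)
    except ValueError:
        return False
    return low <= number <= high

print(accepts(SCHEMA_TEXT, "percentage", "42"), accepts(SCHEMA_TEXT, "percentage", "101"))
```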
  • W3C is not a standards-setting body, so they say. A Recommendation is the highest level endorsement that W3C gives to a protocol proposal. It may become a standard if ISO or IETF picks it up and runs with it, like they already have done with a competing XML schema language called RELAX.

    There are also several other competing schema proposals out there for XML, none of which seem to be getting much of a hearing at W3C: in addition to RELAX there's Schematron, TREX, SOX, Examplotron, and more. I think it's hard to say at this point which is better suited for most applications processing XML as data, which is why it is of concern to some people that W3C will be pushing to incorporate XML Schema support in other existing and proposed XML protocols.
  • Read the press release:

    "The third part is a primer [w3.org], which explains what schemas are, how they differ from DTDs, and how someone builds a schema."

  • A big advantage of XML schemas over DTDs, besides providing a richer language to express the structure of a document, is this: a schema is a valid XML document itself. DTDs have their own syntax that does NOT match XML; schemas ensure that you only need 1 type of parser, an XML parser.

  • It just dawned on me that it wasn't accepted yet. Seems like I've been writing this stuff forever. Maybe it's the job, not the XML.

    I remember an XML conference in '98 or '99 I went to in Chicago; it was interesting to listen to a salesperson try to explain the differences between a relational database and an object-oriented one. Comedy.

    propane

    • DTDs suck.
    • XML Schema doesn't suck.

    As to why DTDs are suck-worthy, then:

    • They're expressed in their own syntax, which no-one understands.
      XML Schema is expressed in XML.
    • Because XML Schema is in XML, it's machine-processable by a whole bunch of simple tools. It's trivial to write a database data-model -> XML Schema export tool.
    • DTDs are poor on complex structures.
    • DTDs don't do data-typing
    • DTDs are very poor when complexity meets data typing. XML Schema can support structured types in a manner that's akin to a struct.
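The "data-model to XML Schema export tool" mentioned in the list above might look roughly like this sketch. The column-type mapping `SQL_TO_XSD`, the function `table_to_schema`, and the sample table are all hypothetical; a real exporter would have to handle keys, nullability, and many more types.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical mapping from SQL column types to XML Schema built-in types.
SQL_TO_XSD = {"INTEGER": "xsd:integer", "VARCHAR": "xsd:string", "DATE": "xsd:date"}

def table_to_schema(table_name, columns):
    """Emit a schema with one element per table row and one child per column."""
    ET.register_namespace("xsd", XSD_NS)
    schema = ET.Element(f"{{{XSD_NS}}}schema")
    row = ET.SubElement(schema, f"{{{XSD_NS}}}element", name=table_name)
    complex_type = ET.SubElement(row, f"{{{XSD_NS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XSD_NS}}}sequence")
    for column_name, sql_type in columns:
        ET.SubElement(sequence, f"{{{XSD_NS}}}element",
                      name=column_name, type=SQL_TO_XSD[sql_type])
    return ET.tostring(schema, encoding="unicode")

print(table_to_schema("invoice", [("id", "INTEGER"), ("issued", "DATE")]))
```

Because the output is ordinary XML, it is built with the same tree API used for any other document, which is the convenience the comment is pointing at.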

    XML Schema is quite a good TR for datatyping within XML, but it's still very limited at expressing large-scale structure (this is an XML limitation, not an XML Schema one) and does nothing for semantics or ontologies.

    We still need RDF, and RDF still needs schema expression languages that are smarter than XML Schema. Current practice seems to be that RDF Schema is dead, DAML [daml.org] is the way forward and DAML gets its low-level data-typing from XML Schema. Incidentally, this is a very good example of why XML Schema's structured types are dead handy, and usefully different from the normal XML structure expression. There are a couple of interesting papers by Jane Hunter [dstc.edu.au] and Carl Lagoze [cornell.edu] on XML, RDF and various schemas that describe some of the issues involved in different schema requirements, different expression languages, and how to compare them.

    Microsoft have done what they usually do with XML: Build an excellent implementation before the W3C got there, get flamed to hell by the Slashdot weenies, then bring it quickly back into line once there's a standard worth using. We all like to beat on Bill, but for a few things (and XML is a big one) Microsoft deserve a lot of credit for some really good work.

  • The Semantic Web doesn't need XML Schema, I've never heard TBL say that it does [w3.org], and XML Schema is certainly not a sufficient condition for the SW !

    TBL seems to have an obsession with XML namespaces as a solution to everything (look at the big SVG "roadmap" in his WWW10 conference slides [w3.org]) and lately he seems to be taking an "Agents with Everything" line.

  • People seem to think that "self-describing data" is going to save the world in the same way that XML was supposed to eliminate the need for parsing and interpretation of information by a computer program.

    Self describing data is going to "save the world" (for small value of "save") - but XML never did that, and XML Schema barely begins to either. If you're going to communicate, then you don't just need self describing data, you need a shared vocabulary of description -- and this is a problem for DAML+OIL et al., not just XML Schema.

    XML has removed the need to parse, or at least the need to write new parsers. XML Schema allows structural comprehension (which isn't the same as an ontological understanding) and validation to be similarly automated once and for all by a common toolset with a single API.

    you still have to write programs to interpret the contents of the XML information in pretty much the same way as with data exchanged in any format.

    You still need to "interpret", but you no longer need to parse. If your documents fit into the RSS 1.0 model, then interpretation becomes trivial for all documents expressible in RSS 1.0, because RSS 1.0 has a shared ontology behind it that's implicit in the use of the protocol. If you use RDF in conjunction with something like DAML+OIL, then you gain the same advantages, but over a much wider range of application than that of a "newsfeed".

    XML alone can't remove the interpretation burden for you, but it can remove the parsing burden (which isn't trivial) and some of the protocols on top of XML, such as RDF, remove further horizontal slices of this "interpretation" workload for you.

    most functioning protocols out there are able to exchange information without the need for a formal validation model

    All protocols that work have a validation model -- but sometimes it's implicit and informal. You either do this (you "trust" the information provider) or you formalise it. Formalising it has two advantages; it removes some of the "trustworthiness" issues about unknown providers, but it also allows the "allocation of trust" to be deferred to individual documents. Rather than pre-agreeing with a known and trustworthy provider that they'll send you documents, they'll be valid documents, and you know the format for which they'll be valid, you can do this for each document as it arrives. This is great, because it now means you can receive document formats you've never seen before, and you can still build a useful level of trust upon them. Implicit trust is great, but it limits you to providers you know about, and to formats that you know about.

    Not that you would really want to use one [a validation schema] on either the generation or consumption side of a real system, since it just slows things down.

    It doesn't slow things down, it speeds them up. Think of network protocols and the layer stack model. The lower level you can perform an operation at, then the dumber that operation becomes and the less work it is to do it.

    Ignoring the trivial "I don't care if this goes wrong" applications, then all applications need to validate data that they receive from external systems. If I can validate an invoice as being structurally valid by expressing useful constraints in a low-level schema (like XML Schema, but more likely DAML), then that's a lot quicker to work on than interpreting the document to be a data structure or object state for an invoice (which may have been so broken so as to raise an exception when I unserialized it) and then validating that object's internal state. Someone needs to do this validation, and it's quicker to do it low-down and dumb (and it's no less valid or reliable to do it that way).

    Another thing that bugs me is the fiercely defended text-only approach used in XML.

    Text only is good. It's fundamental to XML (XML Schema cannot change this) and there are many good arguments presented from back in the SGML days onwards as to why this is the right way to do it.

    First, there's no way to directly include binary data

    Don't need it. Encode it instead. Yes, there's an overhead (both in processing, and in volume) but that's minor, the advantages outweigh it, and DUMB ENCODING IS WHAT COMPUTERS ARE FOR !.

    There's an argument that a DOM should be binary-aware, but no reason at all that the serialized XML document should be.
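"Encode it instead" can be sketched with Base64, one common text encoding for binary payloads. The element name `attachment`, its `encoding` attribute, and both helper functions are made up for illustration; the roughly one-third size overhead of Base64 is the cost the comment refers to.

```python
import base64
import xml.etree.ElementTree as ET

def wrap_binary(payload):
    """Serialize arbitrary bytes as Base64 text inside an XML element."""
    element = ET.Element("attachment", encoding="base64")
    element.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(element, encoding="unicode")

def unwrap_binary(document):
    """Recover the original bytes from the element's text content."""
    element = ET.fromstring(document)
    return base64.b64decode(element.text or "")

document = wrap_binary(bytes(range(256)))
print(unwrap_binary(document) == bytes(range(256)))
```

The serialized document stays plain text throughout, so it survives whatever character-set handling sits between sender and receiver.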

    This is pretty strangely limited given that XML data is generally exchanged over an 8-bit clean pipe (i.e., the Web).

    The web is not 8 bit clean, and the usage of character sets mean that anything in the top bit is already exposed to mis-interpretation errors. I guess you're American, because developers in most other countries hit this problem on a regular basis.

    Also, XML is 16 bit clean (sic, in some views), because it's perfectly OK to Unicode that CDATA. I really do NOT want a binary route through XML that makes my application worry about whether the XML document had passed through an 8 bit, UTF-16 or even EBCDIC (!) transport on its travels.
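The transport-independence argument can be illustrated with a small sketch: the same character data is serialized once as UTF-8 and once as UTF-16, and the parser hands back identical text either way. The `note` element and the `serialize` helper are assumptions made for the example; the sample text is borrowed from the fragment quoted just below.

```python
import xml.etree.ElementTree as ET

TEXT = "kjiu õéçäá"

def serialize(encoding):
    """Build the same document in a given character encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><note>{TEXT}</note>'
    return doc.encode(encoding)

utf8_bytes = serialize("utf-8")
utf16_bytes = serialize("utf-16")

# The byte streams differ, but the application sees the same characters.
print(utf8_bytes != utf16_bytes)
print(ET.fromstring(utf8_bytes).text == ET.fromstring(utf16_bytes).text)
```

The application never learns how many bytes each character occupied on the wire, which is the hiding the comment wants to preserve.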

    <xml:binary size=10>kjiu õéçäá</xml:binary>

    would be quite reasonable, with "size" octets placed directly between the closing '>' of the opening tag and the opening '</' of the close tag.

    First of all, your XML fragment sucks; xml as a namespace local identifier already has implicit connotations - i.e. xml:lang, and you ought to be quoting attributes properly if you're trying to lecture on syntactic changes.

    Secondly, what's an "octet" ? There are no octets in XML - An octet is a well understood term in low-level comms meaning, "We have no idea what a word length is on this crate, or if a char is 7, 8 or 5 bit clean, but an octet is going to have exactly 8 bits in it". In XML, you just don't know this. It's hidden from you, and it's done deliberately so that your apps don't need to worry about it. If you change this, and make it visible, then that breaks a whole lot of i18n text-processing code.

    They should have included a mechanism in XML Schema to declare this.

    They can't. Even if it was a good idea, that level of change would have to be in XML itself, not XML Schema.

    What is needed is a simple, widely acceptable binary encoding of exactly the information included in XML text, which uses lookup tables to optimize handling tag names.

    We already have this. It's Huffman coding, and any network protocol will already be doing it for you, low-down in the stack.

    The third problem with exchanging raw text XML encoded data is that it explodes the information you want to ship

    See above. Bloat is bad, but it's dealt with low-down.
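To get a feel for how much of the tag overhead generic compression removes, here is a small sketch using `zlib`, whose DEFLATE format combines LZ77 matching with Huffman coding. The document shape and the `build_document` helper are invented, and whether a given transport actually applies such compression is an assumption that has to be checked per deployment.

```python
import zlib

def build_document(rows):
    """Generate a repetitive, tag-heavy XML document as bytes."""
    body = "".join(
        f"<customer><id>{i}</id><status>active</status></customer>"
        for i in range(rows)
    )
    return f"<customers>{body}</customers>".encode("utf-8")

raw = build_document(1000)
packed = zlib.compress(raw)

# Print both sizes; the exact ratio depends on the data.
print(len(raw), len(packed))
```

Repeated tag names are exactly the kind of redundancy this style of compressor is good at, so the verbose markup costs much less on the wire than its raw size suggests.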

    The MIME tags really need to be updated too

    They have been. See RFC 3023 [isi.edu].

  • First of all, what's the difference between Open Source and the W3C ? I have complete faith that many Slashdoterati will be able to explain exactly where they diverge 8-), but this is likely to involve Stallmannesque hair-splitting.

    What can this chimerical "Open Source" movement offer that will suddenly fix all of the W3C's ills ? Who are these masked coders ? In what way do they differ from the people who already write the specs at the W3C ? The W3C isn't the Bilderberg Group ! It's not some shadowy neo-Illuminati cabal with arcane membership rituals and a split devotion to the forces of satan and global capitalism (that's ICANN).

  • What would happen if you tried to use a schema to validate itself?

    I wonder if people will use DTDs to validate their schemas. :-)

    have some flash [thekristo.com]

  • I had a hard time figuring out how to express hierarchy. If I have a type of time, how do I express that other types such as date, hour, minute are subtypes of time? -Willy
  • Come on, that's just asking Microsoft to break the standard...
  • XML is text, so your data gets to be 100 times bigger than storing it in binary. Only 10 times bigger if you compress the xml file. Two orders of magnitude ain't negligible, even with today's fast computers.
    Also, I have yet to see an XML parser that actually parses correctly even simple xml files. For example, the MS XmlReader class (in .NET) does not always parse correctly the output of the MS XmlWriter class. Which is giving me enormous headaches at work currently.
    Those are just the things I got off the top of my head.

  • Though that parser just got a whole lot more complicated because it now has to understand XML schema, rather than well-formedness and DTDs.
  • Today: XML "embraced" by Microsoft.... heck they don't have to spend a dime on R&D for a wonderful thing that makes life easier for them. Tomorrow: Redmond notices that XML is being written on non-Microsoft text editors, being used in non-Microsoft apps, and that data is being pulled from MySQL databases & formatted in XML by PHP. Next Week: mXML (microsoftXML) is introduced. It's just like XML, except if you look at it in Internet Explorer 6.5, it strokes you off while it stabs you in the back =)
  • Does the XML Schema replace the need for a DTD? Can you do all of the validation with an XML Schema that you can with a DTD?

    Last thing I want is to have to write both a DTD and an XML Schema for my documents ....
  • So you don't think M$ can find a way to make 'just their product' compatible with XML? I'd wait a bit before I cheer. And trust me, it won't be well-formed ;D
    /Smuffe
  • Am I the only one who didn't get anything from that headline this early in the morning?
    Nope, I didn't understand a word either. I'm just here for the games.
    /Smuffe
  • Or you could read the linked press release and find out for yourself.
  • World Wide Web Consortium has officially given its Stamp of Approval

    This means that I can use it now? Cuz I don't use anything that doesn't come with a *Stamp of Approval*. I also prefer products that are "As Seen on TV"
    ----

  • I haven't looked closely at schema yet, but as it seems the key differences are that schema lets you specify constraints on the content in much more detail than a DTD could (e.g. you can specify that the content must be numeric) and a schema is itself an XML document, which makes parsing and validating easier.

    It's a good thing.
  • This is not meant as flamebait; however, I'm quite sick of this kind of attitude about Microsoft. It's always Microsoft sucks this, Microsoft sucks that... bla bla bla... it's becoming harder and harder for me to read a lot of this and take it seriously.

    Although most of us here don't approve of Microsoft's business practices (and it's true they can seem almost evil at times) you CANNOT forget OR deny that they DO make some good pieces of software and sometimes can do GOOD.

    So pretty please, with sugar on top, get a grip!
  • XML is getting noticed by other standards developing organizations, such as the IETF. XML Schema provides features for protocol design, such as strong data typing, that just weren't available with DTDs, giving application layer protocol designers a viable alternative to other specification languages such as ASN.1.
  • w3c is definitely a good place to be devising standards, but i was under the impression their concerns were only with html and the other experimental or rare sgml languages which were starting to become evident in the 'web.

    then, i see an abiword schema and some other things on the w3c site. it makes me wonder whether one of two things is happening: either the w3c is expanding their expertise to other areas of programming (or scripting), or the web is expanding to be able to handle all data types.

    of course, the latter has already happened to some degree, but i guess xml is acting as the shuttle bus which is just running a little late.

    ---
    if the sun shines, they run and hide their heads. they might as well be dead.
  • Am I the only one who didn't get anything from that headline this early in the morning?

    Maybe Michael should have shed a little light on what this is and what it means.

    "just connect this to..."
    BZZT.

  • w3c is definitely a good place to be devising standards, but i was under the impression their concerns were only with html and the other experimental or rare sgml languages which were starting to become evident in the 'web.

    Well, XML is also a rare SGML language.
  • FYI, XML has nothing much to do with cross-platform, cross-browser compatibility.
  • Yes! This is awesome. It allows for something I've wanted for quite a while... introspection. This means that an application can actually use document structure information to do things like build user interfaces and write automated SQL to backup arbitrary XML repositories.

    I have been thinking about building a generic content management solution for a while, and schema was the only missing piece of the puzzle. With schema, I can build a document type that makes sense for a client, and an HTML content management tool for entering data can be automatically generated from the schema document! How cool would that be?
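The "generate an entry form from the schema" idea could be prototyped along these lines. The sample schema, the type-to-widget mapping `INPUT_TYPES`, and `form_from_schema` are all hypothetical; the sketch only handles leaf elements whose type appears in the mapping and skips everything else.

```python
import xml.etree.ElementTree as ET
from html import escape

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical mapping from schema types to HTML input types.
INPUT_TYPES = {"xsd:string": "text", "xsd:integer": "number", "xsd:date": "date"}

SCHEMA_TEXT = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="article">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="published" type="xsd:date"/>
        <xsd:element name="pages" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

def form_from_schema(schema_text):
    """Walk the schema and emit one input field per simply-typed element."""
    root = ET.fromstring(schema_text)
    lines = ["<form>"]
    for el in root.iter(XSD + "element"):
        xsd_type = el.get("type")
        if xsd_type not in INPUT_TYPES:
            continue  # skip elements with complex or unmapped types
        name = escape(el.get("name"), quote=True)
        lines.append(
            f'  <label>{name} <input name="{name}" type="{INPUT_TYPES[xsd_type]}"></label>'
        )
    lines.append("</form>")
    return "\n".join(lines)

print(form_from_schema(SCHEMA_TEXT))
```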



    Well, your fingers weave quick minarets; Speak in secret alphabets;
  • 2001: WireSL, XYesPath, XNoPath, XMaybePath, XDontKnowPath, XWhatTheHellIsPath
    2002: BookSL, XButtonPath, XButtonPhone, XQueryTaxes
    2003: ToasterML, XThatDirtyThingOnTheButton, XBetterVoice, XSuperStuff
    2004: XAppProtocol, XMLSTLK89762KK828

    2004: 10000 acronyms and W3C realizes they don't need (and can't) create acronyms for every possible use of XML.
    ------------------------------------------------
    You think Bill Gates is evil?
  • Now that there is a fully endorsed spec from the W3C, how long till it gets full support?

    Is XML what we have all been looking for?
    I think it is. I can't wait until I can define everything about something with minimal effort of crossing paths with a black cat (so to speak)... I don't see any downsides to XML, does anyone?

    Long live XML! Down with propaganda of proprietary RTML and OTHER CRAP!


    Are you on the Sfglj [sfgoth.com] (SF-Goth EMail Junkies List) ?
  • by Kraft (253059)
    > Sure you can represent documents, and with stylesheets rewrap them into a design of your choice

    Well, this is exactly what I use XML for, and so does IBM for the entire ibm.com (sorry, can't find the link, but read an entire article a few months ago). There are also one or two [ibm.com] other megacorps using XML.

    In the XML solution I am working on now, all the XML is generated on the fly and converted with an XSLT engine (Sablotron and PHP [phpbuilder.com]) so I don't really have a use for XMLquery, do I? I mean, I can just use SQL to get whatever I want.

    -Kraft
  • Amen....these front end bastards piss me off. When will they learn that it's about the exchange [hr-xml.org] of data?
  • This is woefully inadequate, given the current structure of the Internet. Again, Open Source raises its hand, but the teacher refuses to call on it... -HooD
  • Best as I can tell.. there are different sections with different colors.. like BSD, Apache.. etc.. etc..
  • Not break...."embrace and extend" is the slogan, isn't it?

    The nice thing about XML is there IS nothing defined...just the structure of the document. The user can make up whatever they want basically, so long as it is well-formed

    -Scott
  • XML: a standardized framework for creating
    incompatible data formats.
  • Schemas are much more powerful than DTDs. They do not only allow you to specify the structure of the tags in a very flexible way, they also make it possible to do type checking on attributes, make the substructure of a tag dependent on an attribute value, etc.
    So schemas are what DTDs never were: A really useful tool to check your XML, not just some simple sanity checks on the coarse structure ...
    But you really need to read at least the primer (part 3) to appreciate what you can do with Schemas. They are *very* complex.

  • <!ATTLIST languages
    another CDATA #IMPLIED
    2learn CDATA #IMPLIED
    e-dtype CDATA #FIXED
    "enough-is-enough"
    a-dtype CDATA #FIXED
    "pubdate date
    binding length">

    After following Schema from its introduction [w3.org] a while back I just briefly looked at it and said "Another HTML Markup Language" and tossed it to the back of my mind. I had worked at a company that built a product exclusively using XML which had been hacked up to make it useful enough for the company, and found most of it lacking for the Unix side of things where programming was concerned, not interactive webpages, strictly lacking as a complete portable solution.

    Often I wonder when I hear these news stories about new protocols appearing, just how long will they last, and how much of an impact would they have in reality, ePerl, PHP, etc., and often I hear of one "standard" coming out only to be overshadowed by another one in the making. So not to troll but how many people are actually looking forward to this becoming a standard? Aren't the current available languages enough?

    I guess it depends on what someone wants to do, but in all honesty I feel the market for things are becoming so saturated with so many different variations claiming to be the best thing, yet from what I see many people often use the standard norms available just fine.

    So how exactly is this beneficial to achieve what you already can using the standards? Sometimes the language can be so confusing when you're in the midst of nailing "the next best protocol" which was overshadowed just a second ago, and now you have to tweak what you already know to jump on this latest 'technology' all because it's been endorsed, or recommended. Maybe it's me not being innovative enough to really look at Schema for its face value, but all I see is another language. Not a big deal.

    Please don't flame this, don't think I'm being arrogant, or trollish, just posting an honest thought to see some insightful replies. Sure I joke here and there, but I would like some enlightenment.


  • I could be wrong, but this declares DTDs to be part of XML now.

    A DTD defines what is allowable in an XML document. XML schema is just a different way to do this. DTD's have always been available to describe XML documents.

    An XML document is well-formed if it adheres to the XML syntax specification; it is valid if it adheres to a DTD. XML documents do not have to be valid, i.e. they are not required to have a DTD.
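The well-formed half of that distinction is easy to demonstrate with a non-validating parser such as Python's `xml.etree.ElementTree`; the helper name `is_well_formed` is made up. Checking validity against a DTD or schema would need a validating parser, which this sketch deliberately does not attempt.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text parses as XML; says nothing about validity."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b/></a>"))   # properly nested
print(is_well_formed("<a><b></a>"))    # <b> is never closed
```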

  • This doesn't come as a great surprise. In the release, Tim Berners-Lee is the W3C director that gets quoted saying how great XML schema is. Since his newfangled Semantic Web relies on the mainstream acceptance of XML schema what else is he going to say?
  • I think he meant that XML, XMLSchema and XSL are all XML format so you can use a single XML parser with them. DTD and CSS files are not XML.
  • Microsoft will create a proprietary version

    All M$ has to do is add a few characters to the start and end of every file format they currently use. For example, something like this:

    <xml //ms//dtd windowsmedia 9.0> <asf> FVRT&*&@#$ERDFHh678$#D%TGW3 [big stream of binary deleted] VFBTY*&^%$$#@WEDFGHGG&^43F#%@w </asf> </xml>

    Then they can hold a press conference to proudly announce: "Microsoft Office is the only suite that is 100% XML compliant!" The word XML is like the word consultant -- it can hold so many meanings that it's pretty much meaningless.

  • I'm rather clueless when it comes to XML, but I thought that a DTD did what the schemas seem to do. What exactly is the difference between them?


    Suppose you were an idiot. And suppose that you were a member of Congress. But I repeat myself.
  • have W3C standards ever meant that I can get solid cross-platform, cross-browser compatibility on my (correctly coded) web pages, six months, a year, two years down the line?

    The major fifth-generation web browsers (Mozilla, IE 5.x, Konqueror, Opera, etc.) support most of CSS1 and CSS2. If a page crashes 4.x browsers, that's the fault of the 4.x browser user for not installing a 5.x browser. 5.x browsers don't use that much more resources than 4.x browsers; see also Galeon [sourceforge.net] and K-Meleon [kmeleon.org].

    If people use shitty browsers, that's their problem.

  • So you don't think M$ can find a way to make 'just their product' compatible with XML?

    They won't E&E XML too soon. They're still working on embracing and extending TCP/IP and ZIP codes.

    After all, XML, including Schema, is just a way to format your data to make it easy for other machines to parse. It doesn't help you understand what the data means.

    What MS is likely to do is to send data while not documenting the meaning of the data, and then claim that they're "Standards Compliant." I can just see the schema, defining fields such as

    <xsd:element name="reserved" type="ObfuscatedType"/>

    <xsd:complexType name="ObfuscatedType">

    <xsd:sequence>

    <xsd:element name="undoc1" type="xsd:integer"/>

    <xsd:element name="undoc2" type="xsd:boolean"/>

    </xsd:sequence>

    <xsd:attribute name="BillsSecretCode" type="xsd:string"/>

    </xsd:complexType>

    It may be standards compliant, but without Microsoft Secret Decoder Ring 2001, it won't do you much good.

  • >>Is this what you're waiting for?

    Well, not exactly. Kweelt is a development from Quilt, and so is XQuery. That means they probably have a lot in common. But I still wait for a "standard" to evolve, that's what W3C is for :)

    Even if it supports all requirements, as long as it's not a standard, it's not really useful in the long run.

  • >>I don't see any downsides to XML, does anyone?

    Well there's always one thing... there is no way to make good use of it yet ;)
    XML (at present) doesn't have a query language, which means you don't have that much to use XML for. Sure you can represent documents, and with stylesheets rewrap them into a design of your choice, but large-scale uses are yet to come.

    What we are waiting for is XQuery [w3.org], that will hopefully make a big difference :-)
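For a sense of what querying XML looks like without a full query language, the sketch below uses the limited XPath subset that Python's `xml.etree.ElementTree` supports. This is not XQuery; the catalog data and both helper functions are invented, and the price filter is done in ordinary Python because that XPath subset has no numeric comparisons.

```python
import xml.etree.ElementTree as ET

CATALOG = """\
<catalog>
  <book lang="en"><title>XML Primer</title><price>30</price></book>
  <book lang="de"><title>Schema Handbuch</title><price>45</price></book>
  <book lang="en"><title>Query Basics</title><price>50</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

def titles_in_language(lang):
    """Select by attribute value using an XPath-style predicate."""
    return [book.findtext("title") for book in root.findall(f"./book[@lang='{lang}']")]

def titles_cheaper_than(limit):
    """Numeric filtering falls back to plain Python."""
    return [book.findtext("title") for book in root.findall("./book")
            if int(book.findtext("price")) < limit]

print(titles_in_language("en"))
print(titles_cheaper_than(40))
```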

  • I ran across this company awhile back and have found their tools extremely useful. Primarily they provide a tool that will compile a Schema into a Java object model. It provides built in functions for marshalling/unmarshalling/validate. I have used it in one project so far and it made dealing with XML data very easy.

    http://castor.exolab.org [exolab.org]
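To show what "marshalling/unmarshalling" means in general terms, here is a hand-written Python sketch. It is not Castor's API and generates nothing from a schema; the `Invoice` class and the `marshal`/`unmarshal` functions are hypothetical stand-ins for the kind of code such a tool would produce.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Invoice:
    number: int
    customer: str

def marshal(invoice):
    """Object to XML text."""
    root = ET.Element("invoice")
    ET.SubElement(root, "number").text = str(invoice.number)
    ET.SubElement(root, "customer").text = invoice.customer
    return ET.tostring(root, encoding="unicode")

def unmarshal(document):
    """XML text back to an object, converting types along the way."""
    root = ET.fromstring(document)
    return Invoice(number=int(root.findtext("number")),
                   customer=root.findtext("customer"))

original = Invoice(number=7, customer="ACME & Sons")
print(unmarshal(marshal(original)) == original)
```

The appeal of a schema compiler is that this boilerplate, including the type conversions, is derived from the schema instead of being written by hand for every document type.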

  • I could be wrong, but this declares DTDs to be part of XML now.


    Are you on the Sfglj [sfgoth.com] (SF-Goth EMail Junkies List) ?
  • Eh.. No. You need a parser to handle Schemas just as much as you need it for DTDs, and just as DTD parsing is to a large extent built into many modern XML parsers, expect schema parsing to be too. Also, XSL still adds tons of code.

    The advantage doesn't come in the need for or reduced need for parsers, but in that schemas can specify the structure of an XML document with much more detail than a DTD can. As for XSL, I don't like it at all - it's a lot more complex than needed.

  • A nice troll... But the equivalent of SGML and DTDs would be XML and DTDs or XML and schemas.

    All the other stuff you mention provides extra functionality that you don't need to do the same stuff you can do with SGML and DTDs. In fact XML + schemas already provides lots of useful validation of the structure that you won't get with SGML + DTDs.

    Thanks to the simplicity of XML it got the widespread usage that SGML never managed, which is what has resulted in all the other stuff you mention: standards that, if it weren't for the popularity of XML, most likely would have been represented in tons of disparate formats instead of using XML as the common representation.

    With XML schemas too, now you'll be able to properly validate documents for a lot of the standards you mention without writing a separate validator - just specify the types with a schema.

  • First, for XML itself. What is XML? A standard way to store and describe data in a manner that is readily addressable by virtually any computing platform. [...] What else offers that?

    There are many existing textual representations that are equivalent in power to XML but a lot simpler. The simplest example would be Lisp's textual representation. Lisp's textual representation is a lot easier to define and parse than XML. In fact, any collection of functions and type constructors in a programming language, together with the syntax of that programming language, define such a representation. A Schema corresponds to a type system in such a representation.

    I don't think XML has been very well thought out. It's a standard for data representation, but it's based, through historical accident, on a standard for text markup, and that causes all sorts of problems. Still, despite its failings and shortcomings, at least XML gets the industry away from junk like OLE structured storage, Bento, or ad-hoc binary formats. For that, I'm willing to live with XML's messy syntax and semantics.
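
    To make the Lisp comparison above concrete, here is a minimal Python sketch that rewrites a small XML tree as a Lisp-style S-expression. The function name to_sexpr and the output conventions (attributes as :name "value" pairs, text as quoted strings) are assumptions chosen for illustration, not an established mapping; mixed content, tail text, and quote escaping are ignored.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Render an element as (tag :attr "value" ... children-or-text)."""
    parts = [elem.tag]
    # Attributes become :name "value" pairs, in document order.
    for name, value in elem.attrib.items():
        parts.append(f':{name} "{value}"')
    children = list(elem)
    if children:
        # Nested elements become nested lists.
        parts.extend(to_sexpr(child) for child in children)
    elif elem.text and elem.text.strip():
        # Leaf elements carry their text as a quoted string.
        parts.append(f'"{elem.text.strip()}"')
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring(
    '<Company id="1"><Name>Apple</Name><Profit>31337</Profit></Company>'
)
print(to_sexpr(doc))  # (Company :id "1" (Name "Apple") (Profit "31337"))
```

    The whole conversion fits in one short recursive function, which is roughly the point being made about how little syntax such a representation needs.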

  • Thanks to the simplicity of XML it got the widespread usage that SGML never managed,

    XML's popularity probably has little to do with its design. SGML was marketed as a text markup language. XML is being marketed as a universal data representation. The market for the latter is several orders of magnitude larger than for the former, and that accounts for XML's popularity. In terms of its design, XML is neither particularly simple (compared to alternatives), nor particularly clean.

  • by Anonymous Coward on Friday May 04, 2001 @06:37AM (#246821)
    I have been doing XML for the last two years (XML Schema, Namespaces, XSL Transforms, plus some other misc stuff) and while I think this stuff is way cool, it is a bitch to explain to folks. The best analogy I have found is libraries. I choose libraries because a card catalogue is a great example of "meta data" in use and most folks know about card catalogues and why they are useful (OK, necessary).

    So what is the big deal with XML Schema? XML Schema is important because it provides the widgets to define a "card catalogue" for your library of data, be it airplane parts, phone bills, hotel reservations, or porn.

    Now metadata has been with us since the mud tablet libraries of Mesopotamia (they had indexes of stuff so they could find how many cows were traded in the Xth year of SomeRulerDude), however the printing press is what made all the difference. You see, before the printing press books were so expensive and time-consuming to write that there were not that many of them. The general strategy to manage a library was an index of all the books. As long as the book population is not too big, this works. For example, when you search on Google for "McCain", you get a congressman, porn sites, and damn near everything in between. Search engines today are just really, really big indexes of stuff. Still in the stone ages, aye?

    The printing press changed that and forced libraries to find an EXTENSIBLE way to keep up with books. The Dewey Decimal System is a great example. So I pose to you the following question, "When was the last time the DDS was updated?" Well, how long have they been publishing books on computer science, biogenetics, or nanotechnology? The DDS is an extensible system to classify knowledge. So I leave you with the following statement...

    HTML was the functional equivalent of the printing press, which is just an electronic version of fast, cheap publication. HTML forced us to follow down the path of XML, just like the printing press forced Mister Dewey to put on his thinking cap. The only difference is that the printing press took a few hundred years to do its thing where HTML only took a few years to do its thing.

    Now, all the other XML specs out there (SAX & DOM, RDF, XSLT, XHTML, XPointer, etc.) are just tools to work with your (library of) data. Better to have many specialized tools that can evolve independently than one big honking tool, aye? Use only the tools you need.

    So does TBL's dream of a semantic web make more sense now...?

    If you want some links, try...

    Danny Hillis - The big picture [wired.com]

    Roger Costello's XML Schema Tutorials [xfront.com]

    "You can drive a car by looking in the rear view mirror as long as nothing is ahead of you. Not enough software professionals are engaged in forward thinking." - Bill Joy

  • by csbruce (39509) on Friday May 04, 2001 @04:26AM (#246822)
    While somewhat important, I think that people give data validation far too high of a priority. People seem to think that "self-describing data" is going to save the world in the same way that XML was supposed to eliminate the need for parsing and interpretation of information by a computer program. I've been involved in using XML to exchange information and make remote invocations of services in a Web environment, and you still have to write programs to interpret the contents of the XML information in pretty much the same way as with data exchanged in any format.

    So you can automatically validate it. So frikking what! The rabid theoreticians in the consortium of people that I work with get all hung up on this without realizing that most functioning protocols out there are able to exchange information without the need for a formal validation model. Not that you would really want to use one on either the generation or consumption side of a real system, since it just slows things down. All you need is a clear spec for the protocol.

    Another thing that bugs me is the fiercely defended text-only approach used in XML. For some reason, XML fans seem to think that computers cannot exchange and understand binary data, or that editing tools would be unable to allow people to see it.

    The text-only approach has three major limitations. First, there's no way to directly include binary data. There are lots of binary-encoded objects out there, like image or sound file formats, but you have to encode them in BASE64 or something. This is pretty strangely limited given that XML data is generally exchanged over an 8-bit clean pipe (i.e., the Web). Something like:

    <xml:binary size=10>kjiu õéçäá</xml:binary>

    would be quite reasonable, with "size" octets placed directly between the closing '>' of the opening tag and the opening '<' of the close tag. They should have included a mechanism in XML Schema to declare this.
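
    For comparison, this is roughly what the BASE64 workaround mentioned above looks like in practice. It is a minimal Python sketch using only the standard library; the element name "attachment" and its "encoding" attribute are made up for illustration, and the payload is arbitrary octets standing in for an image or sound file.

```python
import base64
import xml.etree.ElementTree as ET

# 768 arbitrary octets, including values that are not legal XML characters.
payload = bytes(range(256)) * 3

# Sender: encode the octets as Base64 text so they can sit inside an element.
elem = ET.Element("attachment", {"encoding": "base64"})
elem.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(elem, encoding="unicode")

# Receiver: parse the XML and reverse the encoding to recover the octets.
decoded = base64.b64decode(ET.fromstring(xml_text).text)

# Base64 emits 4 characters for every 3 octets.
print(len(payload), len(elem.text))  # 768 1024
```

    The round trip is lossless, but the encoded form is a third larger than the raw octets, which is the overhead being complained about.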

    The second problem with text is the high cost of parsing it. Probably the majority of time spent in a system that processes a large bulk of XML data is spent in the lexical analysis stage of consuming the XML stream. They had their big chance with binary-WAP-XML, or whatever they called it, but that seems to be kind of screwed up and includes patented technology. What is needed is a simple, widely acceptable binary encoding of exactly the information included in XML text, which uses lookup tables to optimize handling tag names.

    The third problem with exchanging raw text XML encoded data is that it explodes the information you want to ship over the Web by a factor of about 20 times. It needs to become commonly accepted practice to, at least, exchange this information in a compressed format, such as GZIP. The MIME tags really need to be updated too, to allow a nesting of formats, to say "this is a gzip-compressed stream of Bob's fabulous graphics markup format encoded in XML".
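
    The compression point is easy to try out. The sketch below, in Python with only the standard library, gzips a deliberately repetitive XML document and prints the sizes before and after. The document is synthetic, so the ratio it shows says nothing about any particular real feed.

```python
import gzip

# Build a repetitive XML document; the tag names repeat on every record,
# which is exactly the kind of redundancy a compressor removes.
rows = "".join(
    f"<Company><Name>Company {i}</Name><Profit>{i * 100}</Profit></Company>"
    for i in range(1000)
)
xml_bytes = f"<data>{rows}</data>".encode("utf-8")

compressed = gzip.compress(xml_bytes)
restored = gzip.decompress(compressed)

# Sizes in bytes before and after compression.
print(len(xml_bytes), len(compressed))
```

    On the transport side, HTTP's Content-Encoding: gzip header is one existing way to say "this body is gzip-compressed" while Content-Type still names the underlying format.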
  • by j-w (73063) on Thursday May 03, 2001 @11:05PM (#246823) Homepage

    What we are waiting for is XQuery, that will hopefully make a big difference :-)

    You might want to have a look at Kweelt [upenn.edu], which claims (and I quote) that it "implements a query language for XML that satisfies all the requirements from the W3C query-language-requirements" [w3.org]

    Is this what you're waiting for?

    --
    jw
  • by Y-Leen (84208) on Friday May 04, 2001 @01:55AM (#246824)
    doesn't have a query language, which means you don't have that much to use XML for

    There's a host of languages you can use to pull subsets of XML data out. Everything from XPath expressions with XSLT to building DOM trees or SAX parsers to manipulate the data with your favorite programming language. That's as powerful as you can get.
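
    As a rough illustration of pulling subsets out of a document, here is a Python sketch using the limited XPath support in the standard library's xml.etree.ElementTree, with ordinary code taking over where the path syntax stops. The sample data is made up for the example.

```python
import xml.etree.ElementTree as ET

DATA = """<data>
  <Company><Name>Lastminute.com</Name><Profit>-12345678</Profit></Company>
  <Company><Name>Apple</Name><Profit>31337</Profit></Company>
</data>"""

root = ET.fromstring(DATA)

# Path expression: every Name element directly under a Company.
names = [n.text for n in root.findall("./Company/Name")]

# Predicate: the Company whose Name child has exactly the text 'Apple'.
apple = root.find("./Company[Name='Apple']")

# Numeric comparisons are not part of ElementTree's XPath subset,
# so that filter is written in the host language instead.
profitable = [
    c.findtext("Name")
    for c in root.findall("Company")
    if int(c.findtext("Profit")) > 0
]

print(names)                      # ['Lastminute.com', 'Apple']
print(apple.findtext("Profit"))   # 31337
print(profitable)                 # ['Apple']
```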

    large-scale use is yet to come.

    Reuters produces all their news in XML format. There's a constant stream that comes in at a few MB an hour. That's massive-scale use if you ask me.

  • by PicassoJones (315767) on Thursday May 03, 2001 @11:36PM (#246825)
    To all you stating that now this is a "standard" organizations will start "breaking" it:

    It is not a standard, it is an official W3C recommendation. And part of the process of making it a standard is for developers to experiment with it to see what works and what doesn't. So whereas some proprietary extensions die out, some survive and become part of the standard.

  • by joenobody (72202) on Thursday May 03, 2001 @11:15PM (#246826)

    So now that it's a standard:

    • Microsoft will create a proprietary version
    • XML will stop being the current "hot" buzzword
    • Browser companies will start getting more creative about why they don't support it
    • Every XML book out there ("XML in 21 Hours!" "XML for Joe Sixpack!") will have a second edition printed "packed" with "new information" - ie. a reprint of the spec in the appendices
    • Current web professionals will groan and start thinking seriously about getting around to learning it
    • Jakob Nielsen will write a column about how it's the worst thing a designer could ever consider using
    • Ziff-Davis will write a glowing review (I mean, c'mon, they do it for everything)

  • by Ergo2000 (203269) on Friday May 04, 2001 @05:53AM (#246827) Homepage

    Microsoft already supports XSD schemas in the MSXML 4 preview release. Microsoft has been more of a force in pushing the implementation of XML than any other company, so to fault them unjustly seems quite silly.

  • by The_Runcible (447043) on Thursday May 03, 2001 @10:14PM (#246828)
    Just look at how unified html has become!
  • by Anonymous Coward on Thursday May 03, 2001 @10:29PM (#246829)
    Well, you see, in the beginning oh, about 15 years ago, there was SGML and DTDs. But the powers that be decided that this was far too complicated. So they decided that they would replace it with a much simpler framework. This new system currently consists of XML, DTDs, XML Schema, CSS, DOM, SAX, SOAP, UDDI, WDDS, WSDL, RDF, RSS, URIs, URLs, URNs, XForms, XHTML, XLink, XML Signature, XPath, XPointer, XSL, XSLT, JAXP, JAXM, TrAX and a few hundred other acronyms and abbreviations which I shall omit for brevity.

    As you can clearly see, the old system was just far too unwieldy and complex. I am glad that they have made things so much simpler.

  • by jlowery (47102) on Friday May 04, 2001 @01:07AM (#246830)
    XML has needed a truly powerful schema language to enforce data constraints in data-heavy documents. This is very much akin to having database schema for databases. With a declarative language and a common processor enforcing primary constraints on data, you free each application from having to do its own consistency checks.

    XML Schema has a lot of powerful features, including the separation of types from structure, two kinds of type inheritance, modularization, default values for attributes and simple elements, and the flexibility to be as strict or as lax as the situation dictates for validation.

    Having said that, the big battle brewing is whether XML Schema is going to be shoehorned into all the other XML protocols that need a data model description before there's been a wide base of practical experience developed. There's already a divide between data modelers and application developers because of the specialized knowledge that SQL and relational database design impose; I think XML Schema does nothing to narrow that gap, which is unfortunate since class hierarchies and the hierarchical data model of XML seem a natural fit.
  • by divec (48748) on Friday May 04, 2001 @02:27AM (#246831) Homepage

    Not to detract from the humour value of your post, let me give a simple example of everyday XML usage where schemas are essential for XML.

    You've got a database, with a 2 column table. Say "Company name"(char[40]) and "Net profit this year"(int). Ywanna get data to go in this table, in XML format, from another company. That XML's gonna look something like this:

    <data>
    <Company>
    <Name>Lastminute.com</Name>
    <Profit>-12345678</Profit>
    </Company>
    <Company>
    <Name>Apple</Name>
    <Profit>31337</Profit>
    </Company>
    [...]
    </data>

    Ok. Now how do you specify that the Company name should be <= 40 characters and the profit should be an integer? A DTD gives no way of doing this, it just says what order the tags can come in. Without XML schema, you're reduced to sending emails saying "Please make the Company name at most 40 chars and please make Profit a signed integer". Which is evil, cos you might have to do that for a 200 table database, and also there's no way of using that email to automatically check that XML file.

    OTOH a schema lets you specify exactly what you want in a precise, even fairly simple, machine-readable format.

    Now do you believe me that schemas are really important? :-)
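
    For the curious, here is one way the two constraints could be written down and used. The schema text is only a sketch of what such a schema might look like, not something from the post. Python's standard library has no XML Schema validator, so the code below simply reads the two limits out of the schema with an ordinary XML parser and applies them by hand; the function names read_constraints and check are made up, and a real project would hand the schema to a proper validator (third-party libraries such as lxml offer one).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Sketch of a schema for the <data> document above: Name is a string of
# at most 40 characters, Profit is a signed integer.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="data">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Company" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:maxLength value="40"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="Profit" type="xs:int"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def read_constraints(schema_text):
    """Read the Name length limit and the Profit type from the schema."""
    max_len, profit_type = None, None
    for el in ET.fromstring(schema_text).iter(XS + "element"):
        if el.get("name") == "Name":
            max_len = int(el.find(".//" + XS + "maxLength").get("value"))
        elif el.get("name") == "Profit":
            # Literal attribute value; relies on the 'xs' prefix used above.
            profit_type = el.get("type")
    return max_len, profit_type


def check(doc_text, max_len, profit_type):
    """Toy check of the two constraints; not a conforming validator."""
    errors = []
    for company in ET.fromstring(doc_text).findall("Company"):
        name = company.findtext("Name", default="")
        profit = company.findtext("Profit", default="")
        if len(name) > max_len:
            errors.append(f"Name too long: {name!r}")
        if profit_type == "xs:int":
            try:
                int(profit)  # looser than xs:int, which is limited to 32 bits
            except ValueError:
                errors.append(f"Profit is not an integer: {profit!r}")
    return errors


GOOD = "<data><Company><Name>Apple</Name><Profit>31337</Profit></Company></data>"
BAD = ("<data><Company><Name>" + "x" * 41 + "</Name>"
       "<Profit>lots</Profit></Company></data>")

limits = read_constraints(SCHEMA)
print(check(GOOD, *limits))  # []
print(check(BAD, *limits))   # two error messages
```

    Because the schema is itself XML, the same parser that reads the data can read the rules, which is what makes the "machine-readable" part cheap.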

  • by SnakeStu (60546) on Thursday May 03, 2001 @11:37PM (#246832) Homepage
    I can answer that from the perspective of someone who is looking forward to XML Schema acceptance on a large scale. But first I'm wondering if you're addressing XML in general, or just the Schema specification, because my answer depends on what you're not seeing the sense of. Thus, I'll answer both, as briefly as I can.

    First, for XML itself. What is XML? A standard way to store and describe data in a manner that is readily addressable by virtually any computing platform. I could write Vic20 programs that handle XML (to a limited degree, 4K ain't much to work with). What else offers that? Let's examine a couple alternative data formats that, while not a comprehensive sample, illustrate the problem with non-XML formats. First, a comma-delimited format is pretty well standardized and can be addressed on virtually any computing platform -- but the data is not described. A database in Visual FoxPro provides column names that describe the data -- but it's not readily addressable on a wide variety of platforms (at least not directly). Thus, XML provides the data and the description, even including the relationship among data (i.e., the 'name' is a component of the 'customer').
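
    A small Python sketch of that contrast, using only the standard library. The 'customer' and 'name' labels come from the paragraph above; the 'orders' field and the values are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

csv_text = "Acme Ltd,42\n"
xml_text = "<customer><name>Acme Ltd</name><orders>42</orders></customer>"

# CSV: values only; the reader has to know what each column means.
row = next(csv.reader(io.StringIO(csv_text)))

# XML: each value carries its own name and its place in the hierarchy.
customer = ET.fromstring(xml_text)
described = {child.tag: child.text for child in customer}

print(row)                      # ['Acme Ltd', '42']
print(customer.tag, described)  # customer {'name': 'Acme Ltd', 'orders': '42'}
```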

    So what's the Schema big deal? Well, with XML alone, you can't give someone a data format to follow which provides type checking, length restrictions, etc. If you're trading data with someone, you not only want to know the names and relationships of the data fields, and the data itself, you also want to know how the data will be formed. Is it an integer? Is it a 20 character field? You could presumably build a proprietary extension to XML that would allow you to describe those constraints, but why go through that trouble to get an end result that works only for you, when you can take a pre-built language for describing those constraints which works for everyone?

    If you want to just store your own data, and you're certain that you'll never change your software, then XML doesn't offer much. It's not the most compact format. But if you exchange data with others, and/or if you are likely to change your data management software, XML becomes a valuable tool, and the Schema spec strengthens it considerably.

    (Caveat: I'm relatively new to XML and am definitely in learning mode. The above describes the benefits I see from the viewpoint of someone who has several very messy data exchanges to clean up.)

  • by dgrage (214118) on Thursday May 03, 2001 @10:17PM (#246833)
    DTDs are rules on how the document is to be "formatted". In other words, where certain elements and tags are to be placed within a document. This refers to the document's structure. Liken it to HTML .. most HTML files have <html><body> then </body></html> tags (in that order). So long as those are in the correct order (as specified by the DTD), the document is "verified" correct by a validating parser. Yet, this has nothing to do with the data between those tags.

    This is where schemas come in. They represent a validation against not only the document's structure, but also the data it contains (i.e., the data between the tags). You could liken it to the constraint on a database table's field. I.e., CustomerType = V or I (Valid, or Invalid). To continue the example from above, you could specify a schema that restricts the content of the data between the html and body tags.

    Hope this helps.
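
    The structure-versus-content distinction can be sketched in a few lines of Python. This is hand-written checking rather than real DTD or schema processing, and the Customer and Name element names are made up; only CustomerType and its V/I values come from the example above. In XML Schema itself, a fixed list of allowed values like this would be written with the xs:enumeration facet.

```python
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"V", "I"}  # the two permitted CustomerType values


def structure_ok(customer):
    """Structure check: the right child elements in the right order."""
    return [child.tag for child in customer] == ["Name", "CustomerType"]


def content_ok(customer):
    """Content check: a constraint on the text between the tags."""
    return customer.findtext("CustomerType") in ALLOWED_TYPES


doc = ET.fromstring(
    "<Customer><Name>Bob</Name><CustomerType>X</CustomerType></Customer>"
)

# The tags are in the right places, but 'X' is not an allowed value.
print(structure_ok(doc), content_ok(doc))  # True False
```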
