DTD vs. XML Schema
AShocka writes "The W3C XML Schema Working Group has released the first public Working Draft of Requirements for XML
Schema 1.1. Schemas are technology for specifying and constraining
the structure of XML documents. The draft adds functionality and
clarifies the XML Schema Recommendation Part 1 and Part 2. The XML Schema Valid FAQ
highlights development issues and resources using XML Schema. This article at webmasterbase.com addresses the
XML DTDs Vs XML Schema issue.
Also see the W3C Conversion Tool from DTD to XML Schema
and other XML Schema/DTD Editors."
Power (Score:5, Insightful)
XML Schemas are much more flexible and powerful.
They're also about 100 times more difficult and confusing.
Re:Power (Score:4, Insightful)
The phrase "difficult and confusing" goes hand-in-hand with any flexible or powerful computer utilities.
Full utilization of XML (and myriad programming languages) takes time.
They call them "languages" for a reason. You can't write a sonnet in French if you have only studied it for a year; and you can't write a full-featured browser suite if you started coding a month ago.
Re:Power (Score:3, Insightful)
Ironic, no, really... (Score:4, Interesting)
The back-end curmudgeons are right, XML stinks for a universal wire format. But for loosely-coupled, message-based, semantically-rich systems it is hard to beat. And document-oriented systems which don't use XML barely deserve notice any longer.
I gently refer s-expression trolls to paul [prescod.net] and oleg [okmij.org]
Re:Power (Score:5, Informative)
[ibm.com]
Comparing W3C XML Schemas and Document Type Definitions (DTDs)
This is a bit old, but still correct. Not a lot has changed in either spec.
I am currently working on a series of articles on RELAX NG. In most ways, I think RELAX NG really is the best of all worlds. It is more powerful than W3C XML Schemas, while being a natural extension of the semantics of DTDs. Moreover, if you choose to use the compact syntax (non-XML), you get something very easy to read and edit by hand.
David...
Re:Power (Score:2)
I am old, and I am wary of the ways of hype. But after reading this and other comments on this thread, I had a look at the RELAX NG tutorial. [oasis-open.org] All I can say is: wow. Given that this stuff is already known to be formally correct, I am finding it very hard to believe that the W3C should not just punt on XML Schema and just adopt RELAX NG instead. It seems to have every advantage: You can understand it, it is powerful, James Clark endorses it, the tutorial is helpful...what's not to like?
Re:Power (Score:3, Interesting)
I also just read through the RELAX NG tutorial and I am now looking at Bali (for generating Java RELAX NG validators).
Good stuff! I agree with the other poster that W3C should punt on XML Schemas.
That said, I think that for the foreseeable future, simply
using DTDs works well because all the hooks are already
in place for the popular XML parsers.
I suppose the next step would be to get Xerces and other
XML parsers to natively support RELAX NG (I have to look
to see if Clark has such a parser already).
- Mark Watson
- Free web books: www.markwatson.com
Use both! (Score:5, Insightful)
Absolutely. All the possible attributes, and kids of any element are there in one (OK, two) place(s) and you can garner the information about any element in a matter of seconds. With XML Schema you have to keep track of the levels of nesting and rifle through a series of name/value pairs to get the same information. It is in its greater expressiveness that the advantage of XSD is seen to lie. And there might be applications where this expressiveness necessitates the use of XSD.
However, XML Schema has, besides this expressiveness, one other great advantage: it is XML. As such, it can be processed with the same XML tools one uses elsewhere in an XML application.
As an example, in one application, I take a DTD, translate it into XSD, and then run an XSL stylesheet over the XSD file to generate some base code used in my application. In this way I can ensure that my code will automatically be changed to reflect any minor changes made to my Schema.
So while I continue to write DTDs, I look on XML Schema as a way to translate, and bring my DTD into the XML universe, with all its attendant advantages.
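The schema-driven code generation described above can be sketched roughly as follows. This is only an illustration: it uses Python's ElementTree in place of the XSL stylesheet the poster mentions, and the schema text, the `generate_stub` helper and the generated class layout are all hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema, standing in for one translated from a DTD.
XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="last" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def generate_stub(xsd_text):
    """Emit Python class source for each top-level element in the schema."""
    root = ET.fromstring(xsd_text)
    lines = []
    for element in root.findall(f"{{{XS}}}element"):
        # Nested element and attribute declarations become constructor fields.
        fields = [e.get("name") for e in element.iter(f"{{{XS}}}element")
                  if e is not element]
        fields += [a.get("name") for a in element.iter(f"{{{XS}}}attribute")]
        lines.append(f"class {element.get('name').capitalize()}:")
        lines.append(f"    def __init__(self, {', '.join(fields)}):")
        if not fields:
            lines.append("        pass")
        for name in fields:
            lines.append(f"        self.{name} = {name}")
    return "\n".join(lines)

stub = generate_stub(XSD_TEXT)
print(stub)
```

Because the stub is regenerated from the schema, adding a field to the schema adds it to the generated class on the next run, which is the property the poster is after.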
Who needs XML when you got PXML? (Score:2, Informative)
believe me, you won't use XML anymore if you once tried PXML [pault.com]
Re:Who needs XML when you got PXML? (Score:5, Interesting)
There are tons of parsers available.
markup is simple:
(this_is_the_tag
this is all data
(except_this_is_a_nested_tag with still more data))
Even better still, there are customizable parsers available that can treat these S-Expression as data OR interpret them as program OR a combination of both. One such parser is called "Lisp". Once again, several implementations are available.
Note that things like S-Expressions and Lisp have only been around for 40 years so you might want to give these technologies some time to mature.
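For readers who have not met the notation, a reader for markup of the shape shown above fits in a few lines. This is a minimal sketch in Python rather than Lisp; `parse_sexpr` is a hypothetical helper that treats every whitespace-separated word as a token and does not handle quoting, comments or unbalanced input.

```python
def parse_sexpr(text):
    """Parse one S-expression into nested lists of string tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing parenthesis
            return items
        return token

    return read()

tree = parse_sexpr(
    "(this_is_the_tag this is all data "
    "(except_this_is_a_nested_tag with still more data))"
)
print(tree)
```

The first item of each list plays the role of the tag name and the rest is its content, nested lists included.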
Re:Who needs XML when you got PXML? (Score:2)
If it was simple and standard, it would be useful, even though it's slow, but if it's complicated and incompatible, it's part of the problem, not part of the solution.
Re:Who needs XML when you got PXML? (Score:4, Interesting)
There are tons of parsers available.
How does one specify the character set in some, imagined or real, S-Expression markup? Do these "tons of parsers" support Unicode at least? Where to put processing instructions? Character entities? External entities? "Raw data" sections with markup suppressed? How does one specify the document type identifier? Namespaces? All these things fulfill important tasks for XML to be a universal, yet concise, markup language, and all this can make your dreamt-up S-Expression language as contrived as XML is sometimes perceived to be. Attributes, I presume, are out of our concern? You note that the means for syntactic description of data trees have been around for 40 years. Yet there was yearning for something more... handy, or something. Doesn't it give any hint to you?
Re:Who needs XML when you got PXML? (Score:2)
You need unicode for internationalization, you want namespaces for differentiation of data, you want comments to make.. comments. Troll elsewhere.
Re:Who needs XML when you got PXML? (Score:2)
The hell I won't (Score:5, Insightful)
How do you embed MathML in another document (like XHTML)? Currently it's with namespaces. How do you propose to do that without namespaces? Just the prefixes? What happens when two different markups use the same prefix? Wups! You're screwed!
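The prefix collision is easy to demonstrate. In this sketch, two documents bind the same prefix `m` to different namespace URIs; a namespace-aware parser (Python's ElementTree here) tells them apart because it resolves the prefix to the URI. The second URI and the `first_child_tag` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
OTHER = "http://example.com/other-markup"  # hypothetical second vocabulary

# Both documents use the prefix "m", but bound to different namespaces.
doc_a = f'<p xmlns:m="{MATHML}"><m:math><m:mi>x</m:mi></m:math></p>'
doc_b = f'<p xmlns:m="{OTHER}"><m:math>not MathML at all</m:math></p>'

def first_child_tag(xml_text):
    """Return the fully qualified tag of the first child element."""
    return ET.fromstring(xml_text)[0].tag

tag_a = first_child_tag(doc_a)
tag_b = first_child_tag(doc_b)
print(tag_a)
print(tag_b)
```

With prefixes alone, both children would just be `m:math` and a consumer could not tell which vocabulary was meant.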
No comments? This is supposed to make a better alternative to XML? It won't help readability, and it certainly isn't a major bottleneck during parsing.
Don't want the "bloat" of namespaces and comments? Wait for it... Wait for it... Don't use namespaces and comments in your documents! Wow! What a concept!
Maybe no Unicode in PXML, huh? So much for interoperability for any kind of data. You don't ever want your pet project used in East Asia (or Russia or Greece or most other places in the world) do you? Unicode too bloated? Why not just use ISO-8859-15 (basically ASCII w/ a Euro character, which incidentally isn't available in ASCII)? Oh wait! That's right. You don't want to allow processing instructions, which in XML tell you what encoding is used.
What happens if you want to change some of the basic syntax of PXML? Because you've nuked processing instructions, you can't specify a markup version like you can in XML.
Yes, yes. We've all seen your little pet project. I hope it was just a class assignment.
Re:The hell I won't (Score:2)
Second feature I'd consider removing would be CDATA sections. It is nifty when manually modifying XML, but otherwise it's just a pain (not a huge one for XML-parser, but additional bloat).
For other people list would look different I'm sure. :-)
I agree with you on entities (Score:3, Informative)
'<' takes up less space than '&lt;'. Assuming you have more than three or four of these in your text node, a CDATA section reduces the size of your document. For the parser, after the CDATA section is begun, only the character sequence ']]>' can end it. This means the parser only has to check for ']]>' and not '<', '&', '<?', '<!', etc.
And yes, there is such a beast called XInclude, but it's currently only a candidate release. It's used like this:
<foo>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="bar.xml" parse="xml">
    <xi:fallback>
      <para>This text goes in if bar.xml cannot be found or has an error</para>
    </xi:fallback>
  </xi:include>
</foo>
Hopefully most entities can go the way of the dodo.
Re:I agree with you on entities (Score:2)
Re:I agree with you on entities (Score:2)
Well, CDATA is nothing like entities no matter what, and parsing them is not THAT much of a problem. But there are some subtleties, when programmatically accessing CDATA blocks (when manually editing there are no problems; it's just automatic processing that's trickier). In any case, CDATA is an extra feature that's not really "needed"; normal quoting can be used instead... it's a convenience feature.
However, doing quote/unquote when parsing/outputting is a breeze as well, and outputting CDATA sections automatically and reliably is tricky. At least if you want to do it 100% foolproof (granted, need to include ]]> token anywhere is a small, but still a possibility). You need to check for ]]> and split contents into two. When parsing contents the problem (minor I guess) is that whereas text segments are usually normalized (ie. when reading a new doc without mods, you never get 2 text segments in a row), you can get combination of CDATA blocks and text segments; there's no way for parser to combine them on the fly (well, DOM-parsers do have the normalize method that may do combination? Or perhaps that's not allowed by specs?)
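The "check for ]]> and split" step can be sketched as below, next to plain entity quoting for comparison. `to_cdata` and `round_trips` are hypothetical helpers; `escape` is the standard library's `xml.sax.saxutils.escape`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_cdata(text):
    """Wrap text in CDATA, splitting wherever the terminator ']]>' occurs."""
    # "]]>" becomes "]]" + end of section + start of new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def round_trips(markup, original):
    """Check that a parser recovers the original text from the markup."""
    return ET.fromstring(f"<t>{markup}</t>").text == original

sample = "if (a[b[0]]> c && d < e)"  # happens to contain "]]>"

as_cdata = to_cdata(sample)
as_entities = escape(sample)
print(as_cdata, round_trips(as_cdata, sample))
print(as_entities, round_trips(as_entities, sample))
```

ElementTree hands back a single text string here; a DOM-style parser may expose the two CDATA sections as separate nodes, which is the normalization wrinkle mentioned above.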
But just like entities, CDATA is meant for manual quick-quoting of blocks, and makes it easier for humans to quickly understand the contents. For programs it doesn't make big difference.
(streamability, ie. reading only what is needed currently while still managing some structure unlike SAX, was needed to handle > 100 meg XML export files... implemented both read-only and write-only versions for internal use).
Who needs PXML when you got HTML? (Score:4, Funny)
believe me, you won't use XML (and those pesky XSLTs) anymore if you once tried HTML [w3.org]
AND (most importantly) in virtually every single web browser that you can find, support for viewing this format over the internet is available and built into the browser itself!
Vs.? What is this, another poll? (Score:4, Funny)
One is derided, one is end-of-life'd (Score:4, Interesting)
While the W3 continues to push Schema, they are also forming working groups for RELAX after pressure from XML luminaries such as James Clark.
Re:One is derided, one is end-of-life'd (Score:5, Informative)
I think James Clark's RELAX NG and W3C XML Schema [imc.org] is the best description (if slightly biased ;--) of the relative strengths of the two technologies. Note that James Clark also just released a new version of Trang [thaiopensource.com], a tool that converts between RELAX NG, W3C XML Schema and DTDs.
WTF!!!! (Score:3, Funny)
Re:WTF!!!! (Score:2)
I learned XML a while back, and we learned Schemas and DTDs. While I can write a DTD in 10 seconds, it takes literally hours for me to write a useful XML Schema that is dynamically populated. But it's been around.
All this hype about XML (Score:4, Insightful)
On the other hand, the one thing that I did find XML useful for is easy parsing. If you use XML to develop a lower level protocol you end up with bloated 10k messages. But for high-level protocols or for configuration files it's great for only one reason: there are lots of ready-made tools. If you want to parse XML in Windows, just load the IXMLDocument interface and it works at lightning speed. If you want to parse the messages in a web browser, throw together a quick DOM parser or even use the built-in DOM one! If you want to parse XML in Perl or C/C++, there are great libs. The only reason XML is good is that all the hype got people developing very neat tools. In one of my latest projects, which needs to pass information between two programs written in different languages, I used a home-made SOAP and designed a base class that persists using XML. I developed it in both languages in under an hour!
So although it wastes bandwidth and there really isn't anything neat about it, it is comfortable I'll give it that.
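A base class that persists itself as XML, in the spirit of the one described above, might look something like this sketch. The poster's actual design is not shown in the thread, so `XmlPersistent`, `Message` and the one-element-per-attribute layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

class XmlPersistent:
    """Hypothetical base class: stores instance attributes as child elements."""

    def to_xml(self):
        root = ET.Element(type(self).__name__)
        for name, value in vars(self).items():
            ET.SubElement(root, name).text = str(value)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, xml_text):
        obj = cls.__new__(cls)  # bypass __init__; fields come from the document
        for child in ET.fromstring(xml_text):
            setattr(obj, child.tag, child.text or "")
        return obj

class Message(XmlPersistent):
    def __init__(self, sender, body):
        self.sender = sender
        self.body = body

wire = Message("alice", "hello <world>").to_xml()
restored = Message.from_xml(wire)
print(wire)
print(restored.sender, restored.body)
```

Every value comes back as a string in this sketch; a real version would need some type information, which is exactly the kind of detail SOAP and friends spend their pages on.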
Re:All this hype about XML (Score:5, Insightful)
Great thing about XML, is if you need to convert your communications, you can write XSLT against it to convert it while you convert your XML source.. easily. For instance, one vendor I worked with decided that the old protocol didn't work well anymore, and a new one would be better. Forget the reasons for the change, good or bad.
I plopped an XSLT processor in front of it. Took minutes to implement. In the mean time, I was able to properly rewrite the XML producing code. So I had some flexibility in terms of patching the protocol quickly, while taking the weeks I needed to fix things right.
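The shape of such a translation shim is roughly the following. Note the substitution: Python's standard library has no XSLT processor, so this sketch does the renaming with ElementTree instead of a stylesheet, and the element names in `RENAMES` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the old protocol's element names to the new ones.
RENAMES = {"msg": "message", "from": "sender", "txt": "body"}

def upgrade(old_xml):
    """Rewrite an old-format message into the new format."""
    root = ET.fromstring(old_xml)
    for element in root.iter():
        element.tag = RENAMES.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

new_xml = upgrade("<msg><from>alice</from><txt>hi</txt></msg>")
print(new_xml)
```

The producer keeps emitting the old format while the shim sits in front of the consumer, which buys time to fix the producer properly.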
As for self describing, what is more self describing than HTML? You see a bold and italics tag around an element, you can easily figure out what style the text would be in. Yes, I know about CSS, but the point is, XML IS descriptive, so long as you use good names. Naming elements a, b and c is just developer fault.
If in today's age of gigabit ethernet and cheap parts, you really really need to squeeze that extra bit through, compress the line. Seriously. Simplest case, is using ssh. Hell, it auth's AND encrypts. If you are worried about anonymous access, there are other tools.
Re:All this hype about XML (Score:4, Interesting)
Great thing about Lisp, is if you need to convert your communications, you can write Lisp against it to convert it while you convert your Lisp source.. easily.
I plopped an XSLT processor in front of it. Took minutes to implement. In the mean time, I was able to properly rewrite the XML producing code. So I had some flexibility in terms of patching the protocol quickly, while taking the weeks I needed to fix things right.
I plopped a Lisp processor in front of it. Took minutes to implement. In the mean time, I was able to properly rewrite the Lisp producing code. So I had some flexibility in terms of patching the protocol quickly, while taking the weeks I needed to fix things right.
the point is, XML IS descriptive, so long as you use good names.
the point is, Lisp IS descriptive, so long as you use good names.
If you use XML to develop a lower level protocol you end up with bloated 10k messages.
If you use Lisp S-expressions to develop a lower level protocol you don't end up with bloated 10k messages.
Besides, in Common Lisp [elwoodcorp.com] you'll really appreciate MOP [mini.net] - Meta-Object Protocol. Much better than SOAP.
Trust me, I know well, actively use and actually love both Lisp *AND* XML.
Re:All this hype about XML (Score:3, Interesting)
It is not just a matter of using good, descriptive names. Whatever code is reading the xml is going to have to know what the names mean. A program reading xml could care less if the name is "a" or "AVeryMeaningfulName"
Re:All this hype about XML (Score:2)
I suggest re-reading my post.
And don't make judgements on my efficiency. XML is a technology that can be used for great things. If you don't know how to apply it properly, it's no one else's fault other than your own.
Re:All this hype about XML (Score:2)
Re:All this hype about XML (Score:5, Informative)
Gzip uses the Lempel-Ziv algorithm used in zip and PKZIP.
The amount of compression obtained depends on the size of
the input and the distribution of common substrings.
Typically, text such as source code or English is reduced by
60-70%. Compression is generally much better than that
achieved by LZW (as used in compress), Huffman coding (as
used in pack), or adaptive Huffman coding (compact).
Mind you, XML is highly repetitive in its tag use on long documents. Long as in multiple records, not necessarily byte length.
Now let's take a larger file, 'cause after all, modem users can download 5k of HTML really quickly. I've taken the SOAP distribution from Apache (or was it Sun), took all the XML files in there, and concatenated them together. A 22k XML file. Not huge, but big enough for this example.
Here's my findings:
[caligraphy:~] spencerp% ls -al o.xml
-rw-r--r-- 1 spencerp staff 22118 Jan 23 21:21 o.xml
[caligraphy:~] spencerp% gzip o.xml
[caligraphy:~] spencerp% ls -al o.xml.gz
-rw-r--r-- 1 spencerp staff 3021 Jan 23 21:21 o.xml.gz
[caligraphy:~] spencerp% gzip -l o.xml.gz
compressed uncompr. ratio uncompressed_name
3021 22118 86.4% o.xml
Not bad for taking non-repetitive text with random XML schemas and getting 86.4%. Now imagine a larger one with a consistent schema. Compression goes even higher. Granted, it will be slightly larger than a binary. But even a 100meg file can be moved across a 100megabit network in well under 5 minutes. And THAT is a lot of data.
Btw, there is a fallacy in your math. If I get 50% compression of an XML file, which could have been implemented in binary format, it doesn't mean the binary format would be 49 times smaller.
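The same kind of measurement can be repeated in-process. This sketch uses Python's `gzip` module on made-up records that share one consistent structure (the record layout is hypothetical, not the SOAP files measured above), and adds the back-of-the-envelope transfer time for the 100 MB case, ignoring protocol overhead.

```python
import gzip

# Hypothetical data: 1000 records sharing one consistent structure.
records = "".join(
    f"<record><id>{i}</id><name>user{i}</name><active>true</active></record>"
    for i in range(1000)
)
document = f"<records>{records}</records>".encode("utf-8")

compressed = gzip.compress(document)
saved = 1 - len(compressed) / len(document)
print(f"{len(document)} -> {len(compressed)} bytes ({saved:.1%} saved)")

# Idealized transfer time: 100 megabytes over a 100 megabit/s link.
megabits = 100 * 8
seconds = megabits / 100
print(f"{seconds:.0f} seconds at line rate")
```

The exact ratio depends on the data, so no figure is quoted here; run it against your own documents.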
Re:All this hype about XML (Score:2)
Re:All this hype about XML (Score:5, Insightful)
Amen to that. Sad to say, but certain parts of the IT industry (and in particular, anything to do with one computer (or piece of code) magically talking to another one owned by a different organization) are constantly buying into the bogus claims of snake-oil salesmen with silver-bullet technologies. XML is merely the latest in a long line.
The only new thing about XML, IMHO, is that it has spawned more sub-specifications than any previous pretender to the crown.
Anyone remember CORBA? Or any of the other zillions of RPC-type mechanisms that people have jumped on the bandwagon of?
I'm not blaming the people who push these agendas. I too would love to spend my weekends sunning on the beaches of exotic European tourist destinations and chugging beers on my expense account. Sitting through a few stiflingly boring and pointless standards meetings seems a small price to pay. All large IT companies employ 2 or 3 people whose job it is to front up to these meetings. Typically these people are articulate and highly versed ex-programmers, but architecturally challenged and with little understanding of the real nature of building complex IT systems.
Ultimately, these RPC mechanisms all end up as nothing - or rather, as only perhaps 1% of the eventual solution.
All that XML is, is an easy-to-parse, text based data transfer mechanism. And as the parent posting says, there are some nice tools around for it. Big deal. Probably you'd be silly to use anything else if designing a data transfer. But is it ever going to change the world ? Or even rock it a little ? No.
Re:All this hype about XML (Score:5, Insightful)
As for SOAP and XML-RPC, what's so hard about compressing it before sending the message? The whole point about XML is that you don't need to write a new parser. You don't need to write a new broadcaster. Your project is about getting a task done, not micromanaging implementation details.
If (and only if) your higher level API/transport is insufficient for the task do you roll your sleeves up and dig in. Do you write everything in assembly? Why not? It would be faster than whatever language you are using now. The reason you don't is that you have better things to do with your time. The goal is important, not the tool. Everyone has standardized on and is optimizing this one particular tool and it works well. So many people have done work so that you don't have to.
Will it change the world? Of course not. It's just a markup language. Will any other computing tool change the world? Of course not. The end users have never cared how you got to the solution. They cared only if you got to the solution faster than the other guy.
Re:All this hype about XML (Score:2, Insightful)
As for SOAP and XML-RPC, what's so hard about compressing it before sending the message?
Well, that it is hard. Try forking a few thousand gzip processes and you'll see what I mean.
Your project is about getting a task done, not micromanaging implementation details.
Um, you're the one suggesting we should use compression to manage the SOAP and XML-RPC overhead. That sure sounds like micromanagement of implementation detail to me.
Do you write everything in assembly?
Well, in fact, for years, I didn't, but I recently picked it up again, and the speed gains are tremendous, in just a few dozen lines of code.
The end users have never cared how you got to the solution. They cared only if you got to the solution faster than the other guy.
So then how does that explain why developers all over the world are suffering through hundreds (thousands) of pages of documentation just to send a message across the room? Standards are good and XML is progress, but
Re:All this hype about XML (Score:2)
But in truth I've done CORBA and XML-RPC. I far prefer the simplicity of XML-RPC over the complexity of the CORBA specification. I found XML-RPC to be more reliable as well, but I'm probably making the mistake of judging the technologies of CORBA and XML-RPC relative to the abilities of the implementations I had on hand.
Great article! (Score:2)
I tend to agree, but don't dig out the ORB yet... (Score:3, Insightful)
Basically, they are saying - use HTTP as it was intended to be used, not abusing it in a way it was not meant to be abused.
Where to begin... (Score:3, Insightful)
What happens when you want to have an Alpha box, a Pentium box, a handheld device, and an UltraSPARC box talk to one another? Simple, right? After all, an int is always 32 bits...err...umm...and everything is big endian...err...ummm...and all architectures use the same data structure padding... Well, at least your program took care of the padding issue...for that one data structure.
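The byte-order and padding traps are easy to reproduce with Python's `struct` module. This is a small sketch: the format strings are standard `struct` codes, and the value 1 is arbitrary.

```python
import struct

value = 1
little = struct.pack("<i", value)  # 4-byte int, little-endian
big = struct.pack(">i", value)     # 4-byte int, big-endian
print(little.hex(), big.hex())

# Reading bytes with the wrong byte order silently yields a different number.
misread = struct.unpack(">i", little)[0]
print(misread)

# A char followed by an int: native layout may insert padding,
# while the standard ("=") layout never does.
native_size = struct.calcsize("@ci")
packed_size = struct.calcsize("=ci")
print(native_size, packed_size)
```

The native size is platform-dependent, which is the whole problem: two machines dumping the same C structure to the wire need not agree on its layout.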
Wups! We've got a core dump waiting to happen. Okay, so we'll just make sure that everyone is using the same sizes and padding all around for any data structure I may need to pass over the network. Of course, this requires a mapping layer so I don't have to do this for every app and data structure that I write. I know: it's for interfaces and defines the general structure. I'll call it Interface Definition Language or IDL for short. Now I'll make sure that all of this information serializes to the network correctly and decodes on the other end without errors. This will be kind of like a stock broker in that I tell it what I want, and it translates it into something usable but more complex than I need to deal with for each app. I think I'll call it a broker too...an object broker...wait...missing something...messages going back and forth...asking for resources...aha! Object Request Broker! Yeah! Oh wait, but people may have different implementations and I want to be able to work with others. Let's agree on this. We'll call it the Common Object Request Broker...ummm...Architecture! Yeah, that's it!
Hmmm...now I need to make a configuration file for my program. I'll make it plain text. Hmmm...but it needs some kind of structure. I'll make it key/value pairs -- just put in a few equal signs and I'm done. Uh oh. My program is fairly modular, but I want to keep all of the settings in one place. If it's just key/value pairs, everything will get jumbled together. I know! I'll use an INI file. Microsoft used to use those to group items together. Now I can just use those nifty GetPrivateProfileString calls, specify the group and the key, and away I go. But uh oh! I have this subcomponent that requires a group within a group. Let me hack something together... Argh! This data file is getting tougher and tougher to parse. I want to finish writing my program that does something useful, not fiddle away at a dumb configuration file parser. What I need is a standardized, hierarchical format that is still plain text and human readable. Hmmm...what's this "XML" thing? I can have the configuration all in memory or read it in piecemeal? Parsers are already written? If I don't like the parser I'm using, I can just plug in another one? I can read the file from any programming language out there? Sign me up!
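The "group within a group" case that breaks the INI approach is a plain nested lookup in XML. Here is a minimal sketch with Python's ElementTree; the `<config>`/`<module>`/`<setting>` vocabulary and the `lookup` helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration with a module nested inside another module.
CONFIG = """
<config>
  <module name="network">
    <setting key="port">8080</setting>
    <module name="proxy">
      <setting key="host">proxy.example.com</setting>
    </module>
  </module>
</config>
"""

def lookup(root, path, key):
    """Walk nested <module> elements by name, then read one setting."""
    node = root
    for name in path:
        node = node.find(f"module[@name='{name}']")
        if node is None:
            return None
    setting = node.find(f"setting[@key='{key}']")
    return None if setting is None else setting.text

root = ET.fromstring(CONFIG)
port = lookup(root, ["network"], "port")
host = lookup(root, ["network", "proxy"], "host")
print(port, host)
```

No hand-written parser is involved, and the nesting can go as deep as the program needs.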
FYI: This binary vs. "plain text" tripe needs to go away. All text files are binary files. What is the letter 'B' but a 0x42 (66 in decimal, 01000010 in binary)? It's a piece of translation software that turns that 0x42 into the character 'B' on our screens. It just so happens that <foo/> is clearer to the human eye -- after the preliminary software translation step -- than a serialized C data structure. Clearer to the human eye means that the human fixing bugs can see the error faster. CPUs are hovering around the 3GHz range now, but the human mind seems to be falling further and further behind Moore's Law. Perhaps we should help the human mind out a bit and give a bit more work to the CPU.
Yes...I know...I'm a dick. I'm comfortable with that.
Re:All this hype about XML (Score:2)
Corba is more than just a data format. It's an architecture. XML is not an architecture, it's just a data format.
You give these people 0 credit. Really. They probably have real jobs doing real things, while for the company's benefit help in creating these standards.
Ultimately, these RPC mechanisms all end up as nothing - or rather, as only perhaps 1% of the eventual solution.
All that XML is, is an easy-to-parse, text based data transfer mechanism. And as the parent posting says, there are some nice tools around for it. Big deal. Probably you'd be silly to use anything else if designing a data transfer. But is it ever going to change the world ? Or even rock it a little ? No.
Disclaimer:Not reviewed for relevance or accuracy - results may vary.
Re:All this hype about XML (Score:2)
Re:All this hype about XML (Score:2, Informative)
Sub-specifications?
You mean like MathML, SMIL, SVG, XHTML, et al.?
These are all modular languages that use XML.
The XML client application uses one or more DTDs or schemas to determine how to interpret the various elements in the XML file, and you can intermingle e.g. MathML and XHTML and so forth all in the same XML file.
Unless I'm grossly misinterpreting your comment (in which case I apologize), I can safely say that you didn't understand the article, since these "sub-specifications" you mentioned are exactly what DTDs/Schema are for, and exactly what makes XML a Good Thing.
They didn't call it "Extensible" just so they could put a nice pretty "X" in "XML". (Though in all fairness, I must wonder if anyone could take something called "EML" seriously... ;)
Re:All this hype about XML (Score:2, Insightful)
Well, that statement is on par with saying that ASCII is just an OVERHYPED binary format for storing text. It's not, and neither is XML, for the same reasons.
XML allows me to stamp out robust document schemas in minutes or hours, instead of months or even years if working from scratch. Because of the rich set of tools you mention, I don't have to write a metric ass-load of documentation on my formats, either. XML spec + my extensions == all the client needs. Because XML is a stable standard, things like MathML, ChemML, DocBook, DOM, etc. can exist, and proceed to maturity faster than otherwise.
Yes, there are some who want to XMLify everything, but that's no more an intrinsic fault of XML than it would be a fault of Linux if some dumb programmer wanted to redesign the kernel to use ASCII-based API calls...
XML is Great of Content Syndication and much more (Score:5, Informative)
In my experience, many benefits of XML come when dealing with the presentation layers of many application architectures, with the ability to repurpose syndicated data at will. Here are a few examples:
Effective use of XML and XSLT allows you to easily aggregate informational data from one or multiple sources and "repurpose" for an infinite variety of business and technological goals.
One of the main benefits of XML is that it offers an effective, textual representation of "structured data" that can be conveniently accessed and manipulated according to a slew of surrounding standards such as XPath [w3.org], DOM [w3.org], XSLT [w3.org], and namespaces [w3.org].
Re:XML is Great of Content Syndication and much mo (Score:2)
It would be the exact same thing- except it would be faster, use less bandwidth, be more secure, have session level security (which HTTP lacks). But it wouldn't be buzzword compliant.
Re:XML is Great of Content Syndication and much mo (Score:4, Informative)
Well, that's why you'd use HTTPS with certificates, no? And nothing is wrong with the port. If you meant HTTP, then yeah, it's plaintext.
Mind you, I don't have a choice of OSes at work. We use Solaris and Linux. Now Amazon, being a Windows shop (I'm guessing), only gives out DLLs. Great, now I'm not supported. So fine, we use Java. Did you know Java class files (binaries) are versioned? I'm stuck with 1.3.1 ATM and a 1.4 JDK is in the works. Problem is, some JDKs use one version of the binary while another uses.. another. I always hoped it was a universal format. Sadly let down.
That's why technologies like JAXB and translets are popping up. With JAXB, you can bind particular classes to particular schemas/DTDs. It speeds up processing. Translets are just compiled XSLT. Really fast, since your XSLT can be compiled/interpreted once, run anywhere. Kind of a chain technology: translet->xslt->java->machine language.
And mind you, nothing is more secure about a binary format. It's just obfuscated. Hell, I hacked the Renegade BBS user database format so I could write a user deletion tool. Were they going for security? Probably not. Point is, binary is just obfuscated.
As for your session-level security, that's not the job of your data format. Your data format and transport layer should be independent. It's why you can do SOAP over HTTP, SMTP/mail and possibly anything else that has a function()-like response format: request->response. It's probably why ssh is so great. All it is, is a way of authentication, communication and encryption. You can create ssh tunnels for HTTP as a proxy.
Re:XML is Great of Content Syndication and much mo (Score:2)
Re:XML is Great of Content Syndication and much mo (Score:3, Insightful)
Open data formats are a good thing.
Re:All this hype about XML (Score:2)
Another point- XML is not more open. It's only as open as the developer wants it to be. He can use a weird XML schema made to obfuscate (or use an XML schema, ignore the parser, and have the real fields in the data for the tags. Ooh, that'd be evil. Watch MS do it with Office) and it becomes as bad as binary. Meanwhile, if a binary format has its format published, it becomes as open as XML claims to be.
So where exactly is the gain? I'm missing it. Oh, wait- XML is buzzword compliant. And it has an X, and X is cool, look at all the xtreme sports. Bleh.
Re:All this hype about XML (Score:3, Insightful)
We would have a standard binary format of information exchange that is small and much easier to create and parse(from a performance standpoint). You can still edit the xml by hand with a decompiler, which would be a VERY trivial editor. Hell... even verification of the data would be trivial. Someone will make one to improve performance of XML-RPC some day by setting up proxies, and you will be able to achieve DSL results on a modem.
Re:All this hype about XML (Score:2)
Now. I know a lot of people are going to moan about how slow XML is compared to any DB, and they'd be right, at the moment. But there is one thing that XML has that DBs don't, and that's flexibility. You can add new elements to XML as you go. You can't do that with a DB (well, you can, but any DB admin/designer would shoot you for it, and it would be hard); DBs are designed to be strict and uniform.
Some data types need flexibility, and this is where XML benefits.
You said that the tools for XML are great, don't you think that could be because of the way XML was designed?
Re:All this hype about XML (Score:2, Informative)
Re:All this hype about XML (Score:3, Insightful)
That's a matter of opinion. XML on its own isn't too impressive. It's the other technologies that accompany XML, such as XSLT, Schema, XInclude, XPath, SOAP, RelaxNG, XML-RPC and SVGML, which really make XML a big deal.
If
<PERSON>
<NAME>
<FIRST>BOB</FIRST>
<LAST>MARTIN</LAST>
</NAME>
</PERSON>
isn't descriptive, I don't know what is.
Re:All this hype about XML (Score:2)
Wow, that's insightful!
In other news, implementing a clustered relational database system to store preferences for an email program is killer in terms of expense and complexity.
Re:All this hype about XML (Score:2)
Oh yeah? Wait until you have been approached by the management to have your data viz. output all jazzed up in XML.
Dude(ette?), it might be obvious, but I speak from unfortunate first hand experience.
Almost all the DBs that I've worked to export into XML have ended up having weird schemas. That is a fact. Take a look at most database schemas in real-life projects. Even by themselves, they are a big pain. Is your DB all normalised and all redundancies removed? Have fun!
Go try exporting any half-decent Natural Language or Gene Data. XML would not even budge. Kiss your data and app goodbye.
What I meant was that XML does not scale for most real-life data apps, which would almost always tend to have large, nested and complex hierarchies. It defeats its purpose, except for maybe in an academic environment or certain exceptional cases, or perhaps when your need far outweighs the payoff.
Re:All this hype about XML (Score:2)
In shorter words than my first post: DUH. Who the hell is trying to store large complex databases in XML?
Your problem isn't that XML doesn't scale, your problem is that you have no clue where it is appropriately used.
Re:All this hype about XML (Score:2)
I have worked on such a project. We had very hierarchical data and decided to go with XML as the main data format. We used Perl as a native interface. Hey, Perl has great XML libraries. We used XSL and FOP for report writing. It gave us huge gains in extensibility and maintainability over our previous human-readable format (human readability is a requirement for our line of business).
This was not a small project. Several 10K per database, the XML ended up being 1K's of lines long, but in the long run, the calculations we did on the data took far longer than the actual XML manipulations. Just because you cannot fathom a large project being successful with XML does not mean it cannot exist.
The time spent on an alternate binary solution probably would have taken another man year to implement.
Re:All this hype about XML (Score:2)
It sounds like whoever architected your DTD or schema fouled up. Or whoever is generating the data you need is giving you too much.
So you are blaming a buggy server on XML? What if it was... a large CSV file? Or some other format? Hell, photoshop uses binary internal storage for working with their native format. It even uses scratch disk.
Besides, you are making a large generalization. I've written XSLT over large XML documents that take a while to translate that work quite well. Maybe you have bad ram?
Then you are using the wrong technology. If you are trying to deal with gig-sized files, you should prolly be using a SAX processor, since they take up relatively little RAM. DOM is great for when you don't have JAXB (Java XML binding) available and you want your data in memory, in an OO/structured format.
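To make the SAX point concrete, here is a minimal sketch using Python's standard xml.sax module. The element names (record, amount) and the summarize() helper are made up for illustration; the point is only that the handler sees one event at a time and never holds the whole document in memory.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts <record> elements and sums <amount> text without building a tree."""

    def __init__(self):
        super().__init__()
        self.records = 0
        self.total = 0.0
        self._in_amount = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "record":
            self.records += 1
        elif name == "amount":
            self._in_amount = True
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        if self._in_amount:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "amount":
            self.total += float("".join(self._buffer))
            self._in_amount = False


def summarize(stream):
    """Stream-parse a file-like object and return (record count, amount total)."""
    handler = RecordCounter()
    xml.sax.parse(stream, handler)
    return handler.records, handler.total


# Hypothetical sample data; a real input would be a large file opened in binary mode.
sample = (
    b"<records>"
    b"<record><amount>1.5</amount></record>"
    b"<record><amount>2.5</amount></record>"
    b"</records>"
)
print(summarize(io.BytesIO(sample)))
```

Memory use here depends on the handler's own state, not on the size of the input, which is the whole reason to pick SAX over DOM for very large files.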
RelaxNG (Score:2, Interesting)
Schemas are often a bad idea (Score:3, Insightful)
One very important use is in creating interfaces between heterogeneous systems. A readable character set and meaningful tags are very handy for developers. The hierarchical structure is extremely powerful. And, of course, the fact that it is a standard with common tools is invaluable.
However, one useful principle of such interfaces is "if you don't understand it, ignore it." In other words, when you get a message, look for what you want in it and use it. Ignore anything that isn't what you want. XML is ideally suited for this approach - especially if you use path based access rather than DOM tree traversal.
This approach to interfaces allows systems to interchange messages without exact version consistency, and without requiring a tight congruence of the applications. It allows a system to "tell what it knows" and another system to "read what it needs" without further ado.
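A small sketch of "read what you need, ignore the rest", using path-based lookups from Python's standard xml.etree.ElementTree. The message layout (order, header/id, summary/total, summary/currency) and the "USD" fallback are invented for the example, not taken from any real interface.

```python
import xml.etree.ElementTree as ET


def read_order(xml_text):
    """Pull out only the fields this reader cares about, by path."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("./header/id"),
        "total": root.findtext("./summary/total"),
        # Older senders may not include a currency; fall back to an assumed default.
        "currency": root.findtext("./summary/currency", default="USD"),
    }


# Version 1 of the (hypothetical) message.
v1 = (
    "<order><header><id>42</id></header>"
    "<summary><total>9.99</total></summary></order>"
)

# A later version with extra elements this reader has never heard of.
v2 = (
    "<order><header><id>42</id><priority>high</priority></header>"
    "<loyalty points='12'/>"
    "<summary><total>9.99</total><currency>EUR</currency></summary></order>"
)

print(read_order(v1))
print(read_order(v2))
```

The same reader handles both versions: unknown elements such as priority and loyalty are simply never looked at. The flip side, which the replies get into, is that nothing here tells you when a sender has broken something you do depend on.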
Unfortunately, the use of schemas goes against this idea. It is IMHO a more old fashioned approach of rigidly constraining the messages to an exact specification. This can make interfaces far less robust and flexible, and increase the amount of work.
Schema processing may also be promoted to "verify" message integrity before processing. However, it only does so in the most primitive ways. Real world messages, especially in the business world, tend to have integrity rules that go far beyond what can be expressed in anything short of a complex computer program or equivalent declarations.
I am sure there are plenty of places where schemas make sense, but in the areas of commercial message interchange, they take a powerful and flexible construct and hobble it.
Not really (Score:5, Insightful)
Unfortunately, the use of schemas goes against this idea. It is IMHO a more old fashioned approach of rigidly constraining the messages to an exact specification. This can make interfaces far less robust and flexible, and increase the amount of work.
If you're talking about using XML for data messaging, not using schemas is just lazy. XML Schema allows optional elements and attributes and/or default values. So if it isn't required, then just make it optional. If you want multiversion interfaces, you have a different XML Schema for each version. Then each side knows explicitly what the messaging protocol is.
While it's probably true that things mostly kinda work if the versions don't match, you shouldn't be relying on this. There's lots of software out there that does this but that doesn't mean it's the ideal.
If you're using XML for markup of documents, schemas are somewhat less useful, since the underlying semantics of the tags is usually more important.
Re:Not really (Score:3, Insightful)
Lazy in this circumstance is often good. What you just described is a bunch of work, which translates into *money*. The important question to ask is what is the utility of creating this schema, vs what is the cost of doing so. The answer varies from case to case.
After all, do I really care that much that a message passes a schema validation? It doesn't tell me that it is valid, since most of the validity is determined by far more complex criteria than can be expressed in a schema. IOW, what you assert about underlying semantics of documents is even more true with business transactions. A schema doesn't *document* those details of the "protocol".
Furthermore, XML messages (with the exception of configuration files where schema may actually be quite useful) are normally generated by computers, not people. The rules to generate those messages are then embedded in code (or tables, which is code by another name). Once it works, it will usually continue to work. So again, the schema has offered no advantage, while adding bureaucracy.
As an analogy, consider a schema to be like a syntax checker. It can tell you if the niggling details are right, but it can't tell you whether the program will work. Since in many cases of message exchange the niggling details are not even important, this is often a waste of time!
Lazy != Good (Score:2)
Work does translate into *money*, not doing work doesn't translate into *saving money* except maybe in the extremely short term.
Furthermore, XML messages (with the exception of configuration files where schema may actually be quite useful) are normally generated by computers, not people. The rules to generate those messages are then embedded in code (or tables, which is code by another name). Once it works, it will usually continue to work. So again, the schema has offered no advantage, while adding bureaucracy.
It's true that XML Schema provides syntactic rather than semantic constraints. But that's *really* useful information. For example, XML Schema allows type checking. Sure, you can just treat everything as a string and ignore the problem. You can also use it to constrain the valid values for something with regular expressions. This allows you to do assertions at the protocol level. Again, I can get away with not using them, but in the long term that's just stupid.
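For anyone wondering what "assertions at the protocol level" buys you, here is a rough hand-rolled Python analogue. This is NOT XML Schema and not a validator; the RULES table, the element names (quantity, sku) and the SKU pattern are all hypothetical. A schema would state the same kind of thing declaratively, as a built-in type plus a pattern restriction.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: element name -> (type converter, optional regex for the text).
RULES = {
    "quantity": (int, None),
    "sku": (str, re.compile(r"[A-Z]{3}-\d{4}")),
}


def check(xml_text):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    errors = []
    for path, (convert, pattern) in RULES.items():
        text = root.findtext(path)
        if text is None:
            errors.append(f"{path}: missing")
            continue
        try:
            convert(text)  # the "type check": does the text parse as this type?
        except ValueError:
            errors.append(f"{path}: not a valid {convert.__name__}")
            continue
        if pattern is not None and not pattern.fullmatch(text):
            errors.append(f"{path}: does not match {pattern.pattern}")
    return errors


good = "<item><quantity>3</quantity><sku>ABC-1234</sku></item>"
bad = "<item><quantity>three</quantity><sku>abc-1</sku></item>"

print(check(good))
print(check(bad))
```

With a schema, the two rules live in one shared document that both sides can validate against, instead of being re-implemented (slightly differently) in every reader.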
And if your schema is generated by computer doesn't that make it more useful, not less? It's like saying that COM/CORBA interfaces are nice but IDL is just pointless niggling...
As an analogy, consider a schema to be like a syntax checker. It can tell you if the niggling details are right, but it can't tell you whether the program will work. Since in many cases of message exchange the niggling details are not even important, this is often a waste of time!
Yes, you could consider an XML Schema as a kind of type checking and syntax checking for your XML. It's been my experience that most real problems are niggling details (unless you're doing demoware). Given the broad spectrum of programming tasks out there using XML these days, it would be careless to say that they *all* need Schema (and/or schema validation), which I didn't. But saying that Schemas are always (or, for that matter, often) a waste of time is IMO a lazy attitude.
No they don't (Score:3, Insightful)
And finally, Schemas don't force any of that on you. If you don't need schema support, then don't turn it on in your parser. You can still grab what you need out of the tree. Although you might not be able to throw just anything into it, that's probably a good thing. The last thing the world needs is thousands of tiny, ill-conceived exotic extensions to various Datatypes. It would make achieving universal compatibility a nightmare.
If your app doesn't need schemas, don't use 'em. If you don't need to validate, don't check 'em. If you need to put more data into your tree, maybe you should rethink what you're doing, or maybe rewrite your schema.
Re:Schemas are often a bad idea (Score:2)
I think you miss the point. When I am designing an interface, an important question is *whether to have a schema* in the first place!
And frankly, the less structure that is enforced on the XML file, the better! The important information is whether the XML file contains the necessary information, and a schema cannot provide you with that information because it cannot represent the semantic relationships except in the most trivial (CS 101) cases!
The purpose of an interface is to transmit information, not make people feel good because it conforms to some arbitrary structure!
If or When? (Score:2, Funny)
Validating with XML Schemas (Score:4, Interesting)
It's occurred to me maybe we are being too diligent in actually validating the schema itself, but I'm wondering what others think?
Re:Validating with XML Schemas (Score:2)
Maybe. See, at our shop we're a bit lazy and often times our apps don't check validity at all. I think none of our apps really goes beyond the local realm of the validation chain which has its advantages.
Besides, you should keep a cached copy of the W3C master docs around. They are not changed very often, so you could just as well keep them locally forever without needing an internet connection (which also slows everything down).
Re:Validating with XML Schemas (Score:2)
Validate at system boundaries. Once in the system, you no longer need to validate, as it's already been done.
Re:Validating with XML Schemas (Score:5, Informative)
In most cases, if you are doing schema validation, you already know what schemas you can expect, so they should be not only locally available, but also cached in memory...
As for the
Re:Validating with XML Schemas (Score:2)
XML Schemas are in XML (Score:4, Informative)
One of the greatest things about XML schemas is that they themselves are well-formed XML documents. This makes it a breeze to parse and create XML Schemas. I've just started using XML Schemas in development for the past few months, and they are fantastic. A huge improvement over both DTD and XDR (Microsoft's temporary schema format until XML Schemas came out).
SDAI (Score:2)
web masterbase.com, hrm... (Score:2)
XML Schemas aren't just for validation (Score:5, Interesting)
I can't believe nobody's mentioned this yet. Microsoft has a tool [microsoft.com] that will do several things:
This makes writing your XSD almost trivial. The code-generation capabilities are very powerful, as well, as you can generate runtime classes for serialization/deserialization or classes derived from DataSet so you can treat XML files like any other database, etc. It's very useful if you're doing any
I'd be very surprised if there weren't other tools out there doing similar things. I simply mentioned xsd.exe because that's what I'm familiar with.
Re:XML Schemas aren't just for validation (Score:2)
Shouldn't that be "schemata"? (Score:3, Funny)
When I was in school, the plural of "schema" was "schemata".
</>
I've already selected "No Karma Bonus". Beyond that I can't mod myself downward.
They already addressed this issue (Score:3, Interesting)
The Schema WG decided on "schemas" so as not to add unexpected obscurity to the specification.
See this message [w3.org].
Expected obscurity is of course just fine.
Parsing without a DTD (Score:4, Informative)
The key to robust parsing is deferring the decision as to whether a tag has a closing tag until you've seen enough input to know. You have to read the whole document in, build a tree, then work on the tree, but for anything serious you want to do that anyway.
This parser is in Perl. If anyone would like to take it over and put it on CPAN, let me know.
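For the curious, one way to read the "defer the decision" idea, sketched in Python rather than Perl. This is not the parser described above, just a toy under stated assumptions: it reads the whole input first, notes which tag names are ever closed anywhere, and only then builds the tree, treating never-closed tags as empty elements. The parse_lenient name and the tuple-based tree are made up for the example, and it ignores attributes, comments, CDATA and entities entirely.

```python
import re

# One token is either a tag (optional slash, name, anything up to '>') or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w:-]*)[^>]*>|([^<]+)")


def parse_lenient(text):
    """Build a (name, children) tree from markup with no DTD to consult."""
    tokens = list(TOKEN.finditer(text))

    # Pass 1: read everything, recording which tag names ever get a closing tag.
    closed = {m.group(2) for m in tokens if m.group(1) == "/"}

    # Pass 2: build the tree, now that we know which tags can contain content.
    root = ("#root", [])
    stack = [root]
    for m in tokens:
        slash, name, chars = m.group(1), m.group(2), m.group(3)
        if chars is not None:
            if chars.strip():
                stack[-1][1].append(chars.strip())
        elif slash:
            # Close back to the nearest matching open tag; ignore stray closers.
            if any(open_name == name for open_name, _ in stack):
                while stack[-1][0] != name:
                    stack.pop()
                stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            if name in closed:
                # Only tags that are closed somewhere may hold children.
                stack.append(node)
    return root


tree = parse_lenient("<p>one<br>two</p><p>three</p>")
print(tree)
```

Here br is never closed in the input, so it becomes a childless node and "two" stays inside the first p, without a DTD ever having to say that br is empty.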
only partially agree (Score:3, Interesting)
I don't agree with you that schema validation is useless. In many cases the documents are fully processed for business rules much later, but you want acknowledgement that your document has arrived correctly and passes at least the most basic validation (e.g. DTD or schema validation). XML Schemas do a wonderful job at that. In our case, we always keep schema validation on for new doc types until the system is stable and bug-free, and then remove validation for efficiency (for internal docs). We have discovered many subtle bugs in the system which would have been extremely hard to track down by looking at application errors but were easier to find by looking at parser errors.
Re:only partially agree (Score:2)
One of the benefits of schema validation is that it is not a "yes/no" result like DTD validation is. When properly using the PSVI (Post Schema Validation Infoset) you can achieve exactly the results that you want - you will know if the parts of the XML instance that you are interested in are there, constrained by the partial schema that you provide...
Relax-NG is a Draft ISO Standard (Score:3, Interesting)
It would be somewhat unfortunate if both end up popular, because it will be more work to maintain both sets of tools than either one alone. That's probably what will happen, though, at least in the short term.
Links (Score:2)
Official?? site of RELAX [xml.gr.jp] (RELAX earthling! we come in peace!)
OASIS on Relax-ng [oasis-open.org] (much more dry).
I'm not sure it would be so bad if both standards came to be popular. A few years ago at an XML conference one of the speakers described the XML world being split into three camps - data modelers (who would be backing XML-Schema), Document-centric folks (who would back RELAX), and one other group (whose leanings I forget but I guess they don't care about typed XML documents!!). Having a data-centric and document-centric approach to XML might not be so bad, each having good uses in different scenarios.
XML and Schemas (Score:2, Interesting)
Check out Relax NG (RNG) (Score:3, Informative)
What's more, there's a fantastic tool, dtdinst [thaiopensource.com], that converts DTDs into Relax NG. There are also tools to convert back and forth between WXS and RNG. So if I ever need to provide someone with a WXS schema I can just run it off automatically.
Now I'm working on a system using AxKit [axkit.org] to parse out the RNG schema, generate HTML forms for completion, roundtrip the data back to the server, assemble an instance document using DOM and display it using XSLT and CSS. But that's another story. People who don't "get" XML should really check out AxKit.
simon
DTDs are broken (Score:4, Interesting)
XML Schema is also kinda whacked. It shows all the signs of being a committee specification.
The big problem with schema is that you actually have two type systems going. Element definitions are types for elements. Type definitions are actually types for types for elements. I saw a hopelessly confused attempt by some UML people to express XML schema in UML; they simply could not understand that there was no way it could ever work. UML has completely different semantics.
There are a bunch of schema proposals that folk have said good things about. Eve keeps telling me I should look at Relax. But for the time being XML schema is going to be the basis for standards in W3C and OASIS.
There might be an opportunity to do a clean up job on XML schema in 4 or 5 years but that will only happen if it is causing real problems.
Re:DTDs are broken (Score:2)
The thing that bores me about all of the acronyms churned out by marketing is that these are all systems for representing information.
Sure, the semantic power of the systems vary. Yet, if the DTD works, Don't Touch Dat!
There is some trick of the mind whereby we develop religious attachment to the acronym du jour, and seek salvation in spiffy new technologies.
Gotta spend that budget before the money evaporates!
Yet, behind the high-frequencies emitted by the hype box is the bass hum of simple systems plugging along, producing the correct answers without all of the shiny chrome...
What will be the next variation on the theme that will save us from the (retrospectively obvious) weaknesses of XML?
Stay tuned.
</curmudgeon>
Re:DTDs are broken (Score:2)
Find myself doubting the Three Amigos would agree with you.
I know they agree.
XML is about describing data and the universe of variants in which that data may be presented.
E.g. you can describe that two element types, lets call them and may be used inside of the body of the element
In XML you can define that the tag needs to be in front of the tag.
In UML you can't describe that.
That means you describe in XML how the data needs to look that you want to process.
In UML you describe what abilities the processor needs to have to be able to process the data.
Hence you can describe in UML how an XML parser needs to be implemented, but you can not describe the data the parser will parse with UML.
At least not in general, in some cases you might be able to do it, however.
angel'o'sphere
Re:DTDs are broken (Score:2)
insert HEAD, BODY and HTML at the white places in my post
What the heck does plain text mean except: this is plain text, don't do anything with it!!
AGAIN:
Find myself doubting the Three Amigos would agree with you.
I know they agree.
XML is about describing data and the universe of variants in which that data may be presented.
E.g. you can describe that two element types, lets call them HEAD and BODY may be used inside of the body of the element HTML.
In XML you can define that the HEAD tag needs to be in front of the BODY tag.
In UML you can't describe that.
That means you describe in XML how the data needs to look that you want to process.
In UML you describe what abilities the processor needs to have to be able to process the data.
Hence you can describe in UML how an XML parser needs to be implemented, but you can not describe the data the parser will parse with UML.
At least not in general, in some cases you might be able to do it, however.
angel'o'sphere
Re:DTDs are broken (Score:2)
& g t ; & l t ; (sans whitespace) are the keys.
Last time I looked at it in detail (1.0?), UML had an extension mechanism.
Surely, Rational will take effective action to protect market share if the Xxx onslaught becomes too fearsome.
To do otherwise would be...(adjective).
Re:DTDs are broken (Score:2)
Yeah but will there be any point?
UML is already bodged, further extensions are not going to help much.
I never saw the value of graphical methods until I became a consultant. Now I understand that the difference between a $500 a day consultant and a $5,000 a day consultant is the ability to use powerpoint and visio to confuse and confound.
They say a picture is worth a thousand words. If you are the customer, running code, debugged or not, is worth a thousand stupid pictures in any graphical programming methodology you choose.
Re:DTDs are broken (Score:2)
Oh, let's differentiate between the information itself and the tools used to describe it (the heart of my argument).
There is probably a high-end consultant who uses these digital crayons to do masterpieces for the corporate refrigerator and seriously earns that loot.
Your remark certainly applies to a large population of consultants, though. OTOH, I'm a relatively modestly priced consultant, and I've walked in and _demonstrated_ simpler, better (from the theoretical standpoint of orthogonal data, logic, and presentation), more robust code and been told _by the client_ to go back and do hideous things to it. So where is the justice?
Sure, the money, but, given a finite lifetime, why piss it away on crap?
UML is already bodged, further extensions are not going to help much.
Even God gets negative feedback. What will be amusing with XML is when somebody figures out that in a sizeable application, offloading a lot of the textual fluff that makes it so flexible will buy considerable speed, and markets an XML svelte-ifier of some sort.
Re:DTDs are broken (Score:2, Funny)
Re:DTDs are broken (Score:2)
Since the individual concerned was working for OMG I very much doubt it. I have had ten years experience of UML and its antecedents such as OMT.
I saw the proposals OMG made, they simply do not understand the data model of XML Schema.
Even if they did, UML has become a grotesque caricature. It is even more of a committee spec than XML Schema. You have a bit of object-orientedness and a bit of entity-relationalness and a hodgepodge of finite state theories, and then the use cases stuff thrown in on top. That's hardly surprising, since it's just the earlier work of Booch, Rumbaugh and co smashed together for the benefit of the company selling the graphical design tool.
I put together a graphical notation for XML Schema that I used in some of the SAML meetings, and it seemed to help discussions. But that notation was very carefully chosen to illustrate a few carefully chosen aspects of the schema.
The big mistake with graphical languages is attempting to use them as substitutes for code. By the time the notation has enough decorations for that, it has become so complex that it is unreadable.
The involvement of OMG group does not impress me in the least. Those are the same turkeys who gave us CORBA and took more than ten years to realise that maybe it might not be taking on as fast because the idea people would rip out their legacy systems and migrate them all to an ORB was fundamentally clueless.
Object Modling Group? (Score:2)
Re:DTDs are broken (Score:2)
And yes, the OMG defined besides UML, the modeling language, also the meta model for UML. They used UML to define the meta model for storing UML data. And they defined a portable interchange format using DTDs and they defined how the UML meta model "contents"!!! the data
However
You try to make a point like: because cars can be used to transport parts of cars to assemble cars they can also be used to describe how cars are to be assembled.
angel'o'sphere
Re:Blah, blah, blah (Score:2, Funny)
And XHTML really bites. You can tell the w3c doesn't listen.
Re:OT: how do you correctly embed flash (Score:3, Informative)
Yup. Even in XHTML. Check out this article [alistapart.com] on A List Apart for a useful method.
Since when... (Score:3, Insightful)
If it was called a programming language that would be wrong, but it's certainly a language.