
HTML V5 and XHTML V2

Posted by CmdrTaco
from the battle-of-the-markup dept.
An anonymous reader writes "While the intention of both HTML V5 and XHTML V2 is to improve on the existing versions, the approaches chosen by the developers to make those improvements are very different. With differing philosophies come distinct results. For the first time in many years, the direction of upcoming browser versions is uncertain. This article uncovers the bigger picture behind the details of these two standards."
  • by TheLink (130905) on Sunday December 16, 2007 @02:02PM (#21718176) Journal
    You have to hand it to the W3C, they keep supplying web designers with rope.

    I've been trying to get them (and browser people) to include a security oriented tag to disable unwanted features.

    Why such tags are needed:

    Say you run a site (webmail, myspace (remember the worm?), bbs etc) that is displaying content from 3rd parties (adverts, spammers, attackers) to unknown browsers (with different parsing bugs/behaviour).

    With such tags you can give hints to the browsers to disable unwanted stuff between the tags, so that even if your site's filtering is insufficient (doesn't account for a problem in a new tag, or the browser interprets things differently/incorrectly), a browser that supports the tag will know that stuff is disabled, and thus the exploit fails.

    I'm suggesting something like:

    <restricton lock="Random_hard_to_guess_string" except="java,safe-html" />
      [browser now ignores all features except java and safe-html]
    unsafe content here, but rendered safely by browser
    <restrictoff lock="wrong_string" />
      [wrong lock string: more unsafe content here, but still rendered safely]
    <restrictoff lock="Random_hard_to_guess_string" />
      [correct lock string: all features re-enabled]

    safe-html = a subset of HTML that we can be confident popular browsers can render without being exploited (e.g. <em>, <p>).

    It doesn't have to be exactly as I suggest - my main point is HTML needs more "stop/brake" tags, and not just "turn/go faster" tags.

    Before anyone brings it up, YES we must still attempt to filter stuff out (use libraries etc), the proposed tags are to be a safety net. Defense in depth.

    With this sort of tag a site can allow javascript etc for content directly produced by the site, whilst being more certain of disabling undesirable stuff on 3rd party content that's displayed together (webmail, comments, malware from exploited advert/partner sites).
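    The "Random_hard_to_guess_string" part of the proposal can be sketched server-side. This is a sketch only: the <restricton>/<restrictoff> tags are hypothetical (they exist in no standard); only the token handling is real.

```python
# Sketch of emitting the hypothetical <restricton>/<restrictoff> tags
# proposed above. The tags are not part of HTML; the per-response
# unguessable lock token is the real mechanism being illustrated.
import secrets

def wrap_untrusted(fragment: str) -> str:
    # A fresh, unguessable lock per response: embedded third-party content
    # cannot re-enable features because it cannot know the token.
    lock = secrets.token_urlsafe(16)
    return (f'<restricton lock="{lock}" except="safe-html" />'
            f'{fragment}'
            f'<restrictoff lock="{lock}" />')

print(wrap_untrusted('<script>alert(1)</script>'))
```

    Even if an attacker injects their own <restrictoff>, the lock string will not match, so a browser that honored such a tag would keep features disabled.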
  • Why not ditch HTML? (Score:4, Interesting)

    by forgoil (104808) on Sunday December 16, 2007 @02:21PM (#21718302) Homepage
    Why not just go with XHTML all the way? I always thought that the best way of "fixing" all the broken and horribly written HTML out there on the web would be to build a proxy that could translate from broken HTML to nicely formed XHTML and then send that to the browser, cleaning up this whole business of double rendering paths in the browsers (unless I misunderstood something). XHTML really could be enough for everyone, and having two standards instead of one certainly isn't working in anyone's favor.
  • by wizardforce (1005805) on Sunday December 16, 2007 @02:22PM (#21718316) Journal
    Good idea, although in the case of myspace it wasn't a technical problem that prevented them from keeping pages "safe" [eg. preventing the execution of malicious code]. It had to do with the fact that myspace, by default, allows everything *on purpose*. They could have built the system such that certain tags would/could be disabled [slashdot is an example], and as big as myspace is, resources are not a problem - apathy, and the need to incorporate everything user-generated into pages [to hell with security! we want to build our pages any which way we like!], is.
  • by BlueParrot (965239) on Sunday December 16, 2007 @02:33PM (#21718378)
    There is the object tag. It can be used as a client-side include. All it really needs is a "permissions" attribute or something like that:

    <object permissions="untrusted" codetype="text/html" codebase="foo.html">
    </object>

  • by GrouchoMarx (153170) on Sunday December 16, 2007 @02:57PM (#21718562) Homepage
    As a professional web developer and standards nazi, I'd agree with you if it weren't for one thing: User-supplied content.

    For content generated by the site author or a CMS, I would agree. Sending out code that is not XHTML compliant is unprofessional. Even if you don't want to make the additional coding changes to your site to make it true XHTML rather than XHTML-as-HTML, all of the XHTML strictness rules make your code better, where "better" means easier to maintain, faster, less prone to browser "interpretation", etc. Even just for your own sake you should be writing XHTML-as-HTML at the very least. (True XHTML requires changes to the mime type and to the way you reference stylesheets, and breaks some Javascript code like document.write(), which is properly left in the dust bin along with the font tag.)

    But then along comes Web 2.0 and user-supplied content and all that jazz. If you allow someone to post a comment on a forum, like, say, Slashdot, and allow any HTML code whatsoever, you are guaranteed to have parse errors. Someone, somewhere, is going to (maliciously or not) forget a closing tag, make a typo, forget a quotation mark, overlap a b and an i tag, nest something improperly, forget the / in a self-closing tag like hr or br, etc. According to strict XHTML parsing rules, that is, XML parsing rules, the browser is then supposed to gag and refuse to show the page at all. I don't think Slashdot breaking every time an AC forgets to close his i tag is a good thing. :-)

    While one could write a tidy program (and people have) that tries to clean up badly formatted code, they are no more perfect than the "guess what you mean" algorithms in the browser itself. It just moves the "guess what the user means" algorithm to the server instead of the browser. That's not much of an improvement.

    Until we can get away with checking user-submitted content on submission, rejecting it then and there, and telling the user "No, you can't post on Slashdot or on the Dell forum unless you validate your code", browsers will still have to have logic to handle user-supplied vomit. (And user, in this case, includes a non-programmer site admin.)

    The only alternative I see is nesting "don't expect this to be valid" tags in a page, so the browser knows that the page should validate except for the contents of some specific div. I cannot imagine that making the browser engine any cleaner, though, and would probably make it even nastier. Unless you just used iframes for that, but that has a whole host of other problems such as uneven browser support, inability to size dynamically, a second round-trip to the server, forcing the server/CMS to generate two partial pages according to god knows what logic...

    As long as non-programmers are able to write markup, some level of malformed-markup acceptance is necessary. Nowhere near the vomit that IE encourages, to be sure, but "validate or die" just won't cut it for most sites.
  • by pikine (771084) on Sunday December 16, 2007 @03:36PM (#21718968) Journal
    From the conclusion of TFA:

    If you're more interested in XHTML V1.1 than HTML V4, looking for an elegant approach to create documents accessible from multiple devices, you are likely to appreciate the advantages of XHTML V2.

    The author apparently has no experience with rendering XHTML on mobile devices. First of all, since the screen is smaller, it's not just about restyling things in a minimalist theme. It's about prioritizing information and removing the unnecessary parts so the more important information becomes accessible in limited display real estate.

    For example, anyone who has accessed the Slashdot homepage on their mobile phone knows the pain of having to scroll down past the left and right columns before reaching the stories. You can simulate this experience by turning off page style and narrowing your browser window to 480 pixels wide. The story summaries are less accessible because they're further down a very long, narrow page.

    Another problem is memory. Even if you style the unnecessary page elements to "no display", they're still downloaded and parsed by the mobile browser as part of the page. Mobile devices have limited memory, and I get "out of memory" errors on some sites. For reading long articles on mobile devices, it is better to break content into more pages than you would on a desktop display, both for presentation and memory-footprint reasons.

    For these two reasons, a site designer generally has to design a new layout for each type of device. The dream of "one page (and several style sheets) to rule them all" is a fairytale.

  • by hey! (33014) on Sunday December 16, 2007 @04:03PM (#21719206) Homepage Journal
    Well, according to TFA, because XHTML, while terrific for certain kinds of applications, doesn't solve the most pressing problems of most of the people working in HTML today. It can do, of course, in the same way any Turing-equivalent language is "enough" for any programmer, but that's not the same thing as being handy.

    At first blush, the aims of XHTML 2.0 and HTML 5 ought to be orthogonal. Judging from the article, I'd suspect it is not the aims that are incompatible, but the kinds of people who are behind each effort. You either think that engineering things in the most elegant way will get things off your plate more quickly (sooner or later), or you think that concentrating on the things that are on your plate will lead you to the best engineered solution (eventually).

    I'm guessing that the XHTML people might look at the things the HTML 5 folks want to do and figure that they don't really belong in HTML, but possibly in a new, different standard that could be bolted into XHTML using XML mechanics like name spaces and attributes. Maybe the result would look a lot like CSS, which has for the most part proven to be a success. Since this is obviously the most modular, generic and extensible way of getting the stuff the HTML 5 people worry about done, this looks like the perfect solution to somebody who likes XHTML.

    However, it would be clear to the HTML 5 people that saying this is the best way to do it doesn't mean anything will ever get done. It takes these things out of an established standard that is universally recognized as critical to support (HTML) and puts them in a newer weaker standard that nobody would feel any pressure to adopt anytime soon. A single vendor with sufficient clout (we name no names) could kill the whole thing by dragging its feet. Everybody would be obliged to continue doing things the old, non-standard way and optionally provide the new, standardized way for no benefit at all. Even if this stuff ideally belongs in a different standard, it might not ever get standardized unless it's in HTML first.

    Personally, I think it'd be nice to have both sets of viewpoints on a single road map, instead of in two competing standards. But I'm not holding my breath.
  • Re:reboot the web! (Score:4, Interesting)

    by MyDixieWrecked (548719) on Sunday December 16, 2007 @04:30PM (#21719442) Homepage Journal
    I agree with you about some things you're saying...

    You need to realize that the markup language shouldn't be used for layout. Your comment about "making UIs as easy as drag and drop" can be done with a website development environment like Dreamweaver. You need a base language for that.

    Personally, I think that XHTML/CSS is going the right way. It can be extended easily, and it's simple enough that basic sites can be created by new users relatively quickly; complex layouts still require some experience (yeah, it's got a learning curve, but that's what Dreamweaver is for).

    The whole point of XHTML/CSS is that it's not designed to render the same way in all browsers. It's designed so that you can take the same "content" and render it for different devices/media (ie: home PC, cellphone, paper, ebook) simply by either supporting a different subset of the styling or different stylesheets altogether.

    Have you ever tried to look at a table-based layout on a mobile device? Have you ever tried to look at a table-based layout on a laptop with a tiny screen or a tiny window (think one monitor, web browser, terminal, and code editor on the same 15" laptop screen)? Table-based layouts are hell in those scenarios. Properly coded XHTML/CSS pages are a godsend, especially when you can disable styles and still get a general feel for what the content on the page is.

    I'm not sure if I 100% agree with this XHTMLv2 thing, but I think XHTMLv1 is doing great. I just really wish someone would make something that was pretty much exactly what CSS is, but make it a little more robust. Not with more types of styles, but with ways of positioning or sizing an element based on its parent element, better support for multiple classes, variables (for globally changing colors), and ways of adjusting colors relative to other colors. I'd love to be able to say "on hover, make the background 20% darker or 20% more red". I'd love to be able to change my color in one place instead of having to change the link color, the background color of my header and the underline of my h elements each time I want to tweak a color.
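    The "20% darker" wish is simple arithmetic; here is a quick sketch (in Python, since CSS as discussed here has no such facility) of what a stylesheet engine would have to compute:

```python
# Hypothetical illustration of "on hover, make the background 20% darker":
# scale each RGB channel toward zero. Not real CSS, just the math.
def darken(hex_color: str, amount: float = 0.2) -> str:
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    r, g, b = (round(c * (1 - amount)) for c in (r, g, b))
    return f"#{r:02x}{g:02x}{b:02x}"

print(darken("#336699"))  # → #29527a
```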

    I'd also love it if you could separate form validation from the page. Doing validation with JS works, but it's not optimal. Having a validation language would be pretty awesome, especially if you could implement it server-side. If the client could grab the validation code, validate the form before sending, and handle errors (by displaying errors and highlighting fields), and then the server could also run that same code and handle errors (security... it would be easy to modify or disable anything on the client side), that would be great. All you'd really need is a handful of cookiecutter directives (validate the length and format/regex, and also have some built-in types like phone numbers and emails).
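    A rough sketch of what such a validation language might look like: rules expressed as plain data, so the same spec could in principle drive both client-side hints and the authoritative server-side check. The field names and rule keys here are invented for illustration, not any real standard.

```python
# Declarative validation rules as data; the server runs them
# authoritatively, and the same spec could be shipped to the client.
import re

RULES = {
    "email": {"regex": r"^[^@\s]+@[^@\s]+\.[^@\s]+$", "max_len": 254},
    "phone": {"regex": r"^\+?[0-9\-\s]{7,15}$", "max_len": 20},
}

def validate(field: str, value: str) -> list:
    """Return a list of human-readable errors (empty means valid)."""
    rule, errors = RULES[field], []
    if len(value) > rule["max_len"]:
        errors.append(f"{field}: too long (max {rule['max_len']})")
    if not re.match(rule["regex"], value):
        errors.append(f"{field}: does not match expected format")
    return errors

print(validate("email", "user@example.com"))  # → []
```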

    I also think that it's about time for JS to get an upgrade. Merge Prototype.js into javascript. Add better support for AJAX and make it easier to create rich, interactive sites.

    If we're not careful, Flash is going to become more and more prominent in casual websites. The only advantage the current standards have is that they're free and don't require a commercial solution to produce.

    XSS is a side effect of trusting the client too much, and one that won't be solved by anything you've suggested.

    And why does something need to be "compiled" to be faster? What needs to be faster? Rendering? Javascript? Or are you talking about server-side? Why don't we start writing all our websites in C? Let's just regress back to treating our desktop machines as thinclients. We'll access websites like applications over X11. It'll be great. ;)
  • by coryking (104614) * on Sunday December 16, 2007 @04:40PM (#21719538) Homepage Journal
    I remember when rusty and friends rolled out Dynamic Comments on Kuro5hin/Scoop. They did it with an iframe that chucked out a bunch of onload() crap that wrote into the parent document. Pretty slick for the time.

    Way ahead of its time though... most javascript was either for homework assignments or popup ads. All of it was copy/paste hackjobs that the web author found on super-mega-awesome-javascript.com or something. The result was that "most people" hated javascript. You could browse 99% of the interweb with it disabled and all you'd miss were popups. Kuro5hin was one of the first reasons to actually turn on javascript, because dynamic threaded comments were 100% better than the non-dynamic ones.

    Now that javascript is starting to come of age and real programmers are writing cool things with it (and really, javascript is a kinda cool programming language once you get past super-mega-awesome-javascript.com and the differing implementations), almost anything that is useful on the internet uses javascript in some way. In a way, javascript has crossed the chasm from early adopters like kuro5hin to mainstream adoption and that nice beefy 80% of the market.

    What I find funny is that only the tech people are the laggards of this bell curve, and all 10% of them seem to hang out on Slashdot pining for the days of yore. What a world we live in when the supposed alpha geeks are the laggards of a technology bell curve!!
  • by coryking (104614) * on Sunday December 16, 2007 @05:01PM (#21719728) Homepage Journal
    WYSIWYG is impossible if you are using templates. You gotta visualize how the chunks come together!

    If you want traditional graphic design, make a PDF.
    PDF is for printing, dummy :-)

    I've got a better idea anyway... How about a way to take our centuries of knowledge about "traditional graphic design" and apply it to a web-based medium? Do we have to chuck out everything we know about good design just because of the silly constraints of HTML/CSS? How about we improve or replace HTML/CSS with something that incorporates all we know about "traditional graphic design", all we know about good semantic markup, all we know about good programming, all we know about accessibility and all we know about usability, and create something better?

    "Use a PDF, jackass" is an open invitation to fuck all ya'll and use Silverlight or Flex. Who knows... maybe Adobe and Microsoft understand us better then "the experts"?
  • by Bogtha (906264) on Sunday December 16, 2007 @05:48PM (#21720088)

    Not saying you are wrong, but why are there so many XSS issues if it is easy?

    A combination of ignorance, apathy, and poor quality learning materials.

    is there a "here is how to let your users make their comment pretty and link to other websites and not get hosed" FAQ?

    Well the real answer to this is to point them to the sanitising features available for their particular platform/language/framework/etc. Generic advice is low-level by its very nature, for example XSS (Cross Site Scripting) Cheat Sheet [ckers.org] or perhaps OWASP [owasp.org].

    I'm a pretty smart guy, I think... at least open minded or something. I mean, at least I seem to know enough to worry about XSS issues but yet I dont find it easy at all. What am I missing here?

    You're trying to do it yourself. Don't. Hand it off to a library.

    Slashdot doesn't even do HTML filtering "elegantly". How can I type in those two fake tags as a comment AND quote you without escaping the brackets myself? I dont think this is as easy of a problem to solve as you think it is :-)

    Slashdot is a mess all around, a lot of their problems are because their design strategy seems to be "accumulate features over time, never refactor, offer options instead of taking away obsolete features or being non-backwards-compatible". I mean, they have three different commenting systems, three different display systems and three different comment formats. That's hardly something to emulate. Having said that, they are probably one of the highest targets around for crapflooders, and they won that battle conclusively, which is clear evidence that it's not impossible to sanitise input. Slashcode is open-source, if there were a gap in its sanitation procedure, then Slashdot would quickly be overrun by trolls screwing up every page.

    If you want to handle situations like this, then normalise the code, and escape every tag not on the whitelist. But the feature itself isn't really ideal because the user expectation of their comments being markup-but-not-in-some-cases is confusing.
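    A minimal sketch of "normalise the code, and escape every tag not on the whitelist" using only Python's standard library; a production site should hand this off to a vetted sanitising library rather than roll its own.

```python
# Whitelist sanitiser sketch: allowed tags pass through bare (attributes
# dropped), everything else is escaped so it renders as text.
from html import escape
from html.parser import HTMLParser

WHITELIST = {"b", "i", "em", "strong", "p", "a"}

class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in WHITELIST:
            self.out.append(f"<{tag}>")  # drop attributes entirely
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in WHITELIST:
            self.out.append(f"</{tag}>")
        else:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(html_in: str) -> str:
    s = Sanitizer()
    s.feed(html_in)
    s.close()
    return "".join(s.out)

print(sanitize('<b>hi</b><script>alert(1)</script>'))
```

    The script tag comes out escaped as visible text instead of executing, while the whitelisted bold tag survives.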

  • by grumbel (592662) <grumbel@gmx.de> on Sunday December 16, 2007 @06:00PM (#21720188) Homepage
    ### Pretty much the opposite of WYSIWYG actually.

    That might be the theory, but it simply is not true in reality. HTML is pretty much a WYSIWYG format with additional support for different font sizes and page width. The second you add a tag you are tied to a specific display DPI, the second you add a navigation bar, you no longer have a document that can adjust to different output devices easily. I mean just look at the web today, nobody is using HTML for writing documents. If people want to write a book, they use TeX, if people want to do something else, they stuff their content into a DB and render it to HTML when the user requests it. If the user wants to have a printable version, they rerender the DB content. Ever seen a manual being downloadable as 'single page HTML' vs. 'multipage HTML'? This is only needed because HTML isn't flexible enough to handle both styles of viewing with a single document. A flexible format would allow you to render the document in multiple different ways, but HTML doesn't allow that. You have to change the HTML code to change the rendering result in a significant way.

    Not all of this is of course to blame on HTML itself; the browser takes its share of the blame too for not offering additional ways to render the HTML. But HTML by design is really closely tied to its output device and I doubt that will ever change.
  • by Anonymous Coward on Sunday December 16, 2007 @08:32PM (#21721172)
    There are at least two really good options here:

    a. Don't make us write HTML. If you want to let me write rich text, give me a rich text editor [webkit.org]. I can buy stuff on the web (which consists of database queries/updates) without typing in SQL, so why can't I write stuff on the web without typing in HTML? (Cue the army of geeks to say "But that's completely different! It's OK to make one hard on the user, but not the other...")

    a'. Use a different (non-HTML) text-based language, like Markdown or Restructured Text. I don't know if these can generate XHTML, but if not it wouldn't be that hard to add. A ton of blogs use processors like this already.

    b. Have slashdot validate it. Obviously there are plenty of HTML parsers out there that can take tag soup, and make it into a DOM, and from there it's trivial to filter tags you don't want users to use, and output valid XHTML. Slashdot (and every other website) must do something like this already, so what's so hard about fixing it to generate XHTML? This is really no different from a CMS, except instead of getting data from an RDBMS record, you're getting it from a DOM tree.
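    Option (b) can be sketched with the standard library's lenient parser: accept tag soup, keep only a whitelist, and auto-close anything left open so the output is always well-formed. The class and whitelist here are illustrative, not what Slashdot actually does.

```python
# Tag-soup rebalancer sketch: whitelisted tags are tracked on a stack
# and any tag left open is closed at the end, yielding well-formed output.
from html import escape
from html.parser import HTMLParser

ALLOWED = {"b", "i", "em", "strong", "p"}

class Rebalancer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out, self.stack = [], []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            self.out.append(f"<{tag}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # close intervening tags too, so nesting stays proper
            while self.stack:
                top = self.stack.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        self.out.append(escape(data))

    def finish(self) -> str:
        self.close()
        while self.stack:  # e.g. an AC forgot to close his <i>
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)

r = Rebalancer()
r.feed("<i>oops, never closed")
print(r.finish())  # → <i>oops, never closed</i>
```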

    I really don't see how your scare scenario could be a problem, or rather, any more of a problem in XHTML than HTML. No website in the world "allows someone to post a comment on a forum, like, say, Slashdot, and allow any HTML code whatsoever". This problem was solved already. All we need to do is tighten up the output a bit.
  • by maxume (22995) on Sunday December 16, 2007 @09:58PM (#21721682)
    One of the main goals of html5 is to formalize error handling. It accounts for many edge cases that html4 didn't specify, mostly by looking at what browsers do to handle html4. There are already parsers available for several different languages. There are enough existing broken pages out there that this might work pretty well.

    The problem is that the effort of dealing with the strict language often yields no benefit over just using something non-strict. Given that IE (at least 6; I don't know about 7) doesn't properly handle XHTML, there is really no way of saying whether the current situation between HTML4 and XHTML has anything to do with preferences on the deployment side, as it doesn't work to deploy XHTML. My guess is that people with money on the line would prefer to show a (probably somewhat) broken page rather than an error message, but this is just a guess.

    There isn't anything stopping anyone from validating html4, it just has a relaxed idea of what to do if an error ends up in some output. Hopefully we can agree that there is room to differ.
  • by chromatic (9471) on Monday December 17, 2007 @02:32AM (#21723028) Homepage

    How can you not do it with a regex (or two)?

    In the past eight years or so, I haven't seen a single regex which can parse HTML correctly and completely. The closest variant failed when it encountered CDATA sections.
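    One small demonstration of the problem (attribute values are just one failure mode, alongside the CDATA sections mentioned above):

```python
# The classic naive tag-stripping regex, and one input that defeats it:
# a '>' inside an attribute value ends the "tag" match too early.
import re

TAG_RE = re.compile(r"<[^>]+>")

print(TAG_RE.sub("", '<a href="x" title="a > b">link</a>'))
# → ' b">link' - attribute text leaks through instead of just 'link'
```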

  • by Daniel K. Attling (1003208) on Monday December 17, 2007 @03:36AM (#21723242) Homepage
    Myspace is basically a hack job of static tables. The inclusion of CSS classes is a later graft upon the code to make it themeable (a.k.a. the blinding horror of red text on a pink background). From what I've understood, the reason myspace has been so unwilling/unable to move forward is that doing so would break almost all themed user pages that depend upon said table structure.
  • by l0b0 (803611) on Monday December 17, 2007 @03:44AM (#21723262) Homepage

    You can include HTML inside XHTML, by changing the namespace for that content in the container element or using includes. The browser should then parse the contents as HTML, and you can get the best of both standards.

    Another option is to make sure comments cannot be submitted until they contain valid XHTML. You could use a WYSIWYG editor, fall back to /. mode when JavaScript is disabled, and help the user along by auto-correcting (when using the WYSIWYG editor) or hinting (e.g., "You need to end the strong tag by adding '</strong>'") when validation fails.

  • by Dikeman (620856) on Monday December 17, 2007 @05:48AM (#21723562) Homepage
    Recently, I've had the privilege to work with people who served on both ISO and W3C committees. What struck me was their tendency to create standards on a high academic level, while at the same time any pragmatic argument failed to have any influence on the standard.
    Although this leads to standards that are a pleasure to those who like the philosophical aspect of representation of and interaction with information - and I'm certainly one of them - it also leads to standards that will never be used.

    In the real world outside ISO and W3C, mundane arguments, like cost of implementation, degree of skill needed to work with those standards, ease of transition, etc, etc. *are* of importance and will influence the standard that will prevail in the end.

    Although I can enjoy the academic approach to a new standard, I have to say that as owner of an IT company my hopes are on the pragmatic approach of HTML V5.

    BTW: The job I did for those ISO guys (they didn't work full-time for ISO) was to map the ISO standard they had developed to a practical implementation in the organisation they worked for, after they had failed to do so themselves. So go figure.
  • by nuzak (959558) on Monday December 17, 2007 @11:45AM (#21725632) Journal
    > A suggested implementation is for the disabling bit to be at the parser level.

    The natural tag for controlling the parsing would be a processing instruction.

    <?secure on key:hkwh45kdfhgkjwh45?>
    blah blah blah blah
    <?secure off key:hkwh45kdfhgkjwh45?>

    Good luck getting that into a standard, but heck, you don't really even need the cooperation of the W3C to do this.
