
HTML V5 and XHTML V2

An anonymous reader writes "While the intention of both HTML V5 and XHTML V2 is to improve on the existing versions, the approaches chosen by the developers to make those improvements are very different. With differing philosophies come distinct results. For the first time in many years, the direction of upcoming browser versions is uncertain. This article uncovers the bigger picture behind the details of these two standards."
  • by TheLink ( 130905 ) on Sunday December 16, 2007 @01:02PM (#21718176) Journal
    You have to hand it to the W3C, they keep supplying web designers with rope.

    I've been trying to get them (and browser people) to include a security oriented tag to disable unwanted features.

    Why such tags are needed:

    Say you run a site (webmail, myspace (remember the worm?), bbs etc) that is displaying content from 3rd parties (adverts, spammers, attackers) to unknown browsers (with different parsing bugs/behaviour).

    With such tags you can give hints to the browsers to disable unwanted stuff between the tags, so that even if your site's filtering is insufficient (doesn't account for a problem in a new tag, or the browser interprets things differently/incorrectly), a browser that supports the tag will know that stuff is disabled, and thus the exploit fails.

    I'm suggesting something like:

    <restricton lock="Random_hard_to_guess_string" except="java,safe-html" />
    browser ignores features except for java and safe-html.
    unsafe content here, but rendered safely by browser
    <restrictoff lock="wrong_string" />
    more unsafe content here but still rendered safely by browser
    <restrictoff lock="Random_hard_to_guess_string" />
    all features re-enabled

    safe-html = a subset of html that we can be confident that popular browsers can render without being exploited (e.g. <em>, <p>).

    It doesn't have to be exactly as I suggest - my main point is HTML needs more "stop/brake" tags, and not just "turn/go faster" tags.

    Before anyone brings it up, YES we must still attempt to filter stuff out (use libraries etc), the proposed tags are to be a safety net. Defense in depth.

    With this sort of tag a site can allow javascript etc for content directly produced by the site, whilst being more certain of disabling undesirable stuff on 3rd party content that's displayed together (webmail, comments, malware from exploited advert/partner sites).
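
    (A sketch of the server side of this, purely to illustrate the lock idea. Python here for illustration only, and the <restricton>/<restrictoff> tags are the hypothetical ones proposed above, not anything browsers actually implement:)

    import secrets

    def wrap_untrusted(filtered_html):
        # A fresh, unpredictable lock per response, so an attacker who sees one
        # page can't pre-compute a matching <restrictoff> for the next.
        lock = secrets.token_hex(16)
        return ('<restricton lock="%s" except="safe-html" />' % lock
                + filtered_html  # already run through the usual filters; the tag is the safety net
                + '<restrictoff lock="%s" />' % lock)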
    • Re: (Score:3, Insightful)

      Why not just simplify your entire comment:

      Content from a 3rd party runs in a more restrictive context than the primary site (this includes frames etc).
      You are then not held at the whim of a web admin to ensure these tags are included.

      Or you could just use the NoScript add-on right now and choose which sites you trust at your discretion.
      • Re: (Score:3, Insightful)

        by TheLink ( 130905 )
        Can't do that. That's because often the website you visit is the one sending the 3rd party data.

        Think webmail (yahoo, gmail etc), when you receive spam, your webmail provider is the one sending you the data.

        Usually they will try to filter the content to make it safe. BUT as history shows it's not always 100%.

        The W3C or browser maker might also make a new tag/feature that your filtering libraries aren't aware of (e.g. old sites with guestbooks that might not filter out the "latest and greatest stuff").

        With m
    • Re: (Score:3, Interesting)

      good idea, although in the case of myspace, it wasn't a technical problem that prevented them from keeping pages "safe" [eg. preventing the execution of malicious code]. It had to do with the fact that myspace, by default, allows everything *on purpose*. They could have built the system such that certain tags would/could be disabled [slashdot is an example], and as big as myspace is, resources are not a problem- apathy and the need to incorporate everything user generated into pages [to hell with security! we w
    • Re: (Score:3, Insightful)

      by throup ( 325558 )
      Could you not get around that by injecting code like:

      </restriction> <!-- closes the existing restriction zone. Might not pass as valid XML, but HTML browsers work with tag soup. -->
      Something evil!!!
      <restriction lock="I don't really care here" except="everything"> <!-- This bit is purely optional -->

      Obviously I need to work on something more destructive than "Something evil!!!" before I attempt to conquer the planet...
      • Re: (Score:3, Insightful)

        by TheLink ( 130905 )
        No because the closing tag has to have a lock string that matches the lock on the opening tag.

        My attempts to change the world (albeit by a little bit) aren't going very well either - it's been more than 5 years since I first proposed the tags, but so far the W3C and Mozilla bunch have preferred to make other "more fun" stuff instead...

        Maybe Microsoft has subverted the W3C too :).
        • I think the problem is that it's still too easy to exploit. I'm sure if you thought hard, you could write some evil HTML to route around it and run your javascript. You'd just have to somehow get hold of the big_key thing in your proposal.

          The only real secure way is to isolate the untrusted bits into their own block.... like how you do multipart mime documents in email or something. You'd need a tag to reference the "external" untrusted bits and have the browser render them in a sandbox. Even in this case, you can e
    • by Bogtha ( 906264 ) on Sunday December 16, 2007 @01:32PM (#21718370)

      even if your site's filtering is insufficient (doesn't account for a problem in a new tag

      Why would your site let through new tags that it doesn't recognise? Use a whitelist.
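
      (For illustration, a bare-bones whitelist filter sketched with Python's standard html.parser; the allowed set here is just an example policy, and a real filter would also need to balance tags:)

      from html import escape
      from html.parser import HTMLParser

      ALLOWED = {"em", "strong", "p"}  # example whitelist; anything else is dropped

      class Whitelist(HTMLParser):
          def __init__(self):
              super().__init__()
              self.out = []
          def handle_starttag(self, tag, attrs):
              if tag in ALLOWED:
                  self.out.append("<%s>" % tag)   # attributes are discarded entirely
          def handle_endtag(self, tag):
              if tag in ALLOWED:
                  self.out.append("</%s>" % tag)
          def handle_data(self, data):
              self.out.append(escape(data))       # everything else becomes plain text

      def sanitize(markup):
          p = Whitelist()
          p.feed(markup)
          p.close()
          return "".join(p.out)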

      the browser interprets things differently/incorrectly

      This usually only occurs if you let through malformed HTML. Use tidy or similar to ensure you only emit valid HTML. Not to mention the fact that the whole problem is caused by lax parsing — something the W3C has been trying to get people to give up on with the parsing requirements for XML.

      safe-html = a subset of html that we can be confident that popular browsers can render without being exploited (e.g. <em>, <p>).

      You could define such a subset using the modularised XHTML 1.1 or your own DTD.

      Before anyone brings it up, YES we must still attempt to filter stuff out (use libraries etc), the proposed tags are to be a safety net. Defense in depth.

      Yes, but it won't be actually used that way. If browsers went to the trouble of actually implementing this extra layer of redundancy, all the people with lax security measures would simply use that as an alternative and all the people who take security seriously will use it, despite it not being necessary. I think the cumulative effect would be to make the web less secure.

      • You could define such a subset using the modularised XHTML 1.1 or your own DTD.

        Or monkeys could fly out of our asses :-)

        The idea of modular XHTML is a nice one, but unless I'm missing something, this new XHTML modular thingy we are talking about would still need to be supported by the browser, right? In other words, it will not be supported and is a waste of time.

        Modular XHTML is a nice idea in theory, but honestly... nobody will use a module unless it is implemented by Firefox and IE. Can you name any existing XHTML modules implemented by both browsers?

        Er.. atom or rss?

        • by Bogtha ( 906264 )

          The idea of modular XHTML is a nice one

          It's not an idea, it's been a published Recommendation [w3.org] for over six years.

          this new XHTML modular thingy we are talking about would still need to be supported by the browser, right?

          No. If the server validates the untrusted data, what's the point in the browser doing it too? Validation is deterministic, you don't get double the security by doing it twice.
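
          (A sketch of what that server-side validation could look like, using the third-party lxml library; "safe-xhtml.dtd" is a hypothetical stand-in for whatever restricted DTD or module you define:)

          from lxml import etree

          safe_dtd = etree.DTD(open("safe-xhtml.dtd"))    # your own restricted DTD

          def is_valid_fragment(fragment):
              try:
                  root = etree.fromstring("<div>%s</div>" % fragment)
              except etree.XMLSyntaxError:
                  return False                            # not even well-formed
              return safe_dtd.validate(root)              # conforms to the declared subset?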

          Can you name any existing XHTML modules implemented by both browsers?

          All of them. XHTML 1.1

          • While it is true that the server should validate the inbound (and really outbound) HTML, you have to admit it isn't easy. In perl land, there are some CPAN modules to help, but none of them ever feel right. I'm not sure what the answer is really. But I think the OP's idea was for a couple of extra "hey Mr. Browser, yeah, we suck about validation... please don't trust this crap here.. we tried our best on the server, but really, you shouldn't trust it either".

            Those are not XHTML.

            That was what I thought.. just guessing. MathML

            • by Bogtha ( 906264 )

              While it is true that the server should validate the inbound (and really outbound) HTML, you have to admit it isn't easy.

              On the contrary, it's very easy. There's plenty of tools out there to do this for you.

              In perl land, there are some CPAN modules to help, but none of them ever feel right.

              What do you mean by "feel right"?

              I think the OP's idea was for a couple extra "hey Mr. Browser, yeah, we suck about validation... please dont trust this crap here.. we tried our best on the server, but r

              • by coryking ( 104614 ) * on Sunday December 16, 2007 @02:55PM (#21719136) Homepage Journal

                On the contrary, it's very easy. There's plenty of tools out there to do this for you.
                Cow Crap!

                You want easy? SQL injections are easy to handle. Just use a parameterized query so you don't have to mix tainted data with your trusted SQL.

                Back in the stone age, before PHP thought parameterized queries were more than enterprise fluffery, you were forced to mix your user data with your SQL. And oh, were the results hilarious! It took three tries (and three fucking functions) for PHP/mysql to get their escape code right, and I'm sure you can still inject SQL with "mysql_real_escape_string()" in some new, unthought-of way.
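
                (The parameterized-query point, as a minimal sketch in Python's built-in sqlite3; the same idea applies to DBI placeholders in Perl or PDO/mysqli prepared statements in PHP. The table and the sample input are made up for the example:)

                import sqlite3

                conn = sqlite3.connect("forum.db")
                conn.execute("CREATE TABLE IF NOT EXISTS comments (body TEXT)")

                tainted = '<script>alert("pwned")</script> nice post!'  # pretend this came straight from the user

                # The driver keeps the data separate from the SQL text, so there is
                # nothing to escape and nothing to get subtly wrong:
                conn.execute("INSERT INTO comments (body) VALUES (?)", (tainted,))
                conn.commit()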

                There is no "parameterized query" with HTML. You are *forced* to mix hostile user data with your trusted HTML. If it was that hard to sanitize an "easy" language like SQL, how hard is it to sanitize a very expressive language like HTML?

                You are telling me all those CPAN modules handle the hundreds of ways you can inject HTML into the dozens of different browsers? How many ways can you make an angle bracket and have it interpreted as a legit browser tag? How many ways can you inject something at the end of a URL to close the double quote and inject your javascript? How many ways, including unicode, can you make a double quote? Don't forget, your implementation cannot strip out the Unicode like I've seen some filters do - I need the thing to handle every language! I would guess there are thousands of known ways to inject junk into your trusted HTML.

                I promise you that even the best CPAN module is still exploitable in some way not considered by the author. And I'd be insane to roll my own, as I'm not as smart as she is.

                Don't kid yourself into thinking filtering user-generated content is easy. It is very, *very* hard.
                • by Bogtha ( 906264 )

                  It took three tries (and three fucking functions) for PHP/mysql to get their escape code right and I'm sure you can still inject SQL with "mysql_real_escape_string()" in some new unthought-of way.

                  Escaping SQL isn't even close to the same problem. In that case, you virtually always want the user-submitted data to be treated as opaque data. The analogous situation with HTML would be escaping all the HTML and displaying it as raw code to the end user. The problem being talked about here is when you do

                  • It's a hell of a lot simpler if you normalize to a valid subset of HTML.

                    True.dat. But you gotta know how to normalize it down first. Not saying you are wrong, but why are there so many XSS issues if it is easy? Poor education? How do we educate good programmers to do the right thing? I mean that seriously... like is there a "here is how to let your users make their comment pretty and link to other websites and not get hosed" FAQ? I think I see your take though... it helps if you give the user a wysiwyg editor that spits you a known set of HTML. Anything outside tha

                    • Re: (Score:3, Insightful)

                      by coryking ( 104614 ) *
                      bah! see? slashdot's filter system just fucked me over too and I swear I previewed to see if it kept all my paragraphs.

                      It ain't easy as you say bro... :-)
                    • Re: (Score:3, Interesting)

                      by Bogtha ( 906264 )

                      Not saying you are wrong, but why are there so many XSS issues if it is easy?

                      A combination of ignorance, apathy, and poor quality learning materials.

                      is there a "here is how to let your users make their comment pretty and link to other websites and not get hosed" FAQ?

                      Well the real answer to this is to point them to the sanitising features available for their particular platform/language/framework/etc. Generic advice is low-level by its very nature, for example XSS (Cross Site Scripting) Cheat [ckers.org]

                • It's not hard at all. Slurp it up into a DOM. At this point, it becomes an object and ceases to be a stupid string. You can then walk the tree removing nodes that are not allowed (example: in a forum post you can remove the script tag while ignoring bold and italics.)

                  I don't know why people are stupid about this. It's true that you probably can't do it with a regex. That's why $GOD gave us the DOM.
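
                  (Roughly what that tree walk looks like, sketched with the third-party BeautifulSoup library; the allow-list is only an example policy, not a complete sanitizer:)

                  from bs4 import BeautifulSoup

                  KEEP = {"b", "i", "em", "strong", "p", "br"}

                  def scrub(markup):
                      soup = BeautifulSoup(markup, "html.parser")
                      for tag in soup.find_all("script"):
                          tag.decompose()          # drop scripts and their contents first
                      for tag in soup.find_all(True):
                          if tag.name not in KEEP:
                              tag.unwrap()         # keep the text, lose the tag itself
                          else:
                              tag.attrs = {}       # no attributes (onclick, style, ...) survive
                      return str(soup)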
              • What do you think the browser is going to do that you can't?

                Your implication is basically that web-developers are more competent in terms of security than those who design the clients, and thus the client should just swallow the stuff without even bothering. In reality there are MANY people who make web pages who would probably trust the browser developers a lot more than they trust themselves not to make a mistake.

                Also, you're not looking at this from the point of view of the user. I might want to tell my

                • by Bogtha ( 906264 )

                  Your implication is basically that web-developers are more competent in terms of security than those who design the clients

                  Not at all. I expect the web developers in both cases to hand off the problem to third-party code. I just think that server-side code that has been maturing for a decade fills the role better than non-existent client-side code.

                  Also, you're not looking at this from the point of view of the user. I might want to tell my browser to trust John Smith not to put malicious stuff

    • Re: (Score:3, Interesting)

      by BlueParrot ( 965239 )
      There is the object tag. It can be used as a client-side include. All it really needs is a "permissions" attribute or something like that:

      <object permissions="untrusted" codetype="text/html" codebase="foo.html">
      </object>

    • Re: (Score:3, Insightful)

      This is a novel technique (the unique, hard to guess string, which easily could be a hash of the document and a secret salt the website has) I have not seen before, but this merely punts the issue to the browsers. It cannot be solved there (as you mention); in fact, it does not even begin to solve it: think about the legacy browsers floating around the web. I don't even trust browser vendors to lock down all of this code: they also have their own security bugs.

      There is also the minor point that your method
    • This is silly. (Score:3, Insightful)

      by uhlume ( 597871 )
      <restricton lock="Random_hard_to_guess_string" except="java,safe-html" />

      Doesn't really matter how "hard to guess" your string is if you're going to transmit it cleartext in the body of your HTML document, does it?

      "But wait!" you say, "We can randomize the string every time the document is served, thus defeating anything but an embedded Javascript with access to the DOM." Perhaps so, but now you're talking about server-side behavior — something clearly beyond the purview of the HTML specificat
    • by curunir ( 98273 ) * on Sunday December 16, 2007 @06:49PM (#21720920) Homepage Journal
      Why go through all the hassle of "random hard to guess string" which, if implemented improperly could be guessed? Plus, as others have pointed out, HTML is not a dynamic language. Your random, hard to guess string could be observed and used by an attacker.

      Wouldn't something like:

      <sandbox src="restrictedContent.html" allow="html,css" deny="javascript,cookies"/>

      ...be a whole lot simpler? Just instruct the browser to make an additional request, but one in which it's expected to fully sandbox the content according to rules that you give it. This makes it much harder for application developers to screw up and a lot harder for malicious code to bypass the sandboxing mechanism.

  • That's a very good article - as always IBM give a well-written introduction to the subject. But exactly what is the state of implementation of these? As far as I can gather, no browser maker has started to implement support for either. Is that correct? It would be useful to have some idea of the time scales we can expect for both. Anyone know more about the state of play?
  • by gsnedders ( 928327 ) on Sunday December 16, 2007 @01:18PM (#21718278) Homepage
    All the browser vendors have already said they will support HTML 5 (yes, that includes MS) and all but MS have said they won't support XHTML 2 (MS hasn't made much of an effort to suggest they will support it either).

    As it stands, with both XHTML 5 and XHTML 2 using the same namespace, it is only possible to support one of the two.
    • MS was part of the W3C and at one time said they would support CSS. We all know where that has gotten us.
    • Can you explain why identifying the markup version with a dtd would not allow them to support both?
    • by DrYak ( 748999 )

      and all but MS have said they won't support XHTML 2

      Given their past efforts in "supporting" previous standards, it's not hard for them to claim "XHTML2" support.
      Just enable an additional DOCTYPE to be recognised, and throw in the exact same broken "quirks-mode" parser as before.
      Most of the new XHTML v2 tags that differ from XHTML v1's will fail to be recognized and displayed properly, but that won't be a big change from their traditional support of standards....
      {/sarcasm}

      More seriously :

      As it stands, with bot

    • As it stands, with both XHTML 5 and XHTML 2 using the same namespace, it is only possible to support one of the two.

      Please clarify, because I don't understand this.

      Since XHTML will continue to require a specific declaration and doctype, similar to
      <!-- always line 1 --> <?xml version="1.0" encoding="UTF-8"?>
      <!-- always line 2 --> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">

      will this not be enough so that the client (browser) will be able to distinguish any version of XHTML from anything else? Isn't that sufficient?

  • Why not ditch HTML? (Score:4, Interesting)

    by forgoil ( 104808 ) on Sunday December 16, 2007 @01:21PM (#21718302) Homepage
    Why not just go with XHTML all the way? I always thought that the best way of "fixing" all the broken and horribly written HTML out there on the web would be to build a proxy that could translate from broken HTML to nicely formed XHTML and then send that to the browser, cleaning up this whole double-rendering-path situation in the browsers (unless I misunderstood something), etc. XHTML really could be enough for everyone, and having two standards instead of one certainly isn't working in anyone's favor.
    • by GrouchoMarx ( 153170 ) on Sunday December 16, 2007 @01:57PM (#21718562) Homepage
      As a professional web developer and standards nazi, I'd agree with you if it weren't for one thing: User-supplied content.

      For content generated by the site author or a CMS, I would agree. Sending out code that is not XHTML compliant is unprofessional. Even if you don't want to make the additional coding changes to your site to make it true XHTML rather than XHTML-as-HTML, all of the XHTML strictness rules make your code better, where "better" means easier to maintain, faster, less prone to browser "interpretation", etc. Even just for your own sake you should be writing XHTML-as-HTML at the very least. (True XHTML requires changes to the mime type and to the way you reference stylesheets, and breaks some Javascript code like document.write(), which are properly left in the dust bin along with the font tag.)
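
      (On the mime-type point: true XHTML is served as application/xhtml+xml rather than text/html, usually gated on the Accept header since IE won't take it. A minimal WSGI sketch, with PAGE standing in for your rendered document:)

      PAGE = b'<?xml version="1.0" encoding="utf-8"?>\n<html xmlns="http://www.w3.org/1999/xhtml"><head><title>x</title></head><body/></html>'

      def app(environ, start_response):
          accept = environ.get("HTTP_ACCEPT", "")
          if "application/xhtml+xml" in accept:
              ctype = "application/xhtml+xml"   # browser advertises real XHTML support
          else:
              ctype = "text/html"               # fall back to XHTML-as-HTML
          start_response("200 OK", [("Content-Type", ctype + "; charset=utf-8")])
          return [PAGE]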

      But then along comes Web 2.0 and user-supplied content and all that jazz. If you allow someone to post a comment on a forum, like, say, Slashdot, and allow any HTML code whatsoever, you are guaranteed to have parse errors. Someone, somewhere, is going to (maliciously or not) forget a closing tag, make a typo, forget a quotation mark, overlap a b and an i tag, nest something improperly, forget a / in a self-closing tag like hr or br, etc. According to strict XHTML parsing rules, that is, XML parsing rules, the browser is then supposed to gag and refuse to show the page at all. I don't think Slashdot breaking every time an AC forgets to close his i tag is a good thing. :-)

      While one could write a tidy program (and people have) that tries to clean up badly formatted code, they are no more perfect than the "guess what you mean" algorithms in the browser itself. It just moves the "guess what the user means" algorithm to the server instead of the browser. That's not much of an improvement.

      Until we can get away with checking user-submitted content on submission and rejecting it then, and telling the user "No, you can't post on Slashdot or on the Dell forum unless you validate your code", browsers will still have to have logic to handle user-supplied vomit. (And user, in this case, includes a non-programmer site admin.)

      The only alternative I see is nesting "don't expect this to be valid" tags in a page, so the browser knows that the page should validate except for the contents of some specific div. I cannot imagine that making the browser engine any cleaner, though, and would probably make it even nastier. Unless you just used iframes for that, but that has a whole host of other problems such as uneven browser support, inability to size dynamically, a second round-trip to the server, forcing the server/CMS to generate two partial pages according to god knows what logic...

      As long as non-programmers are able to write markup, some level of malformed-markup acceptance is necessary. Nowhere near the vomit that IE encourages, to be sure, but "validate or die" just won't cut it for most sites.
      • I don't think Slashdot breaking every time an AC forgets to close his i tag is a good thing. :-)

        That's one reason I always try to preview before I post - no, actually I preview so I can edit before posting. However I still let some mistakes slip by.

        While one could write a tidy program (and people have) that tries to clean up badly formatted code, they are no more perfect than the "guess what you mean" algorithms in the browser itself. It just moves the "guess what the user means" algorithm to the server

      • Re: (Score:3, Interesting)

        by l0b0 ( 803611 )

        You can include HTML inside XHTML, by changing the namespace for that content in the container element or using includes. The browser should then parse the contents as HTML, and you can get the best of both standards.

        Another option is to make sure comments cannot be submitted until they contain valid XHTML. You could use a WYSIWYG editor, fall back to /. mode when JavaScript is disabled, and help the user along by auto-correcting (when using WYSIWYG editor) or hinting (e.g., in "You need to end the strong

      • Re: (Score:3, Insightful)

        by mcvos ( 645701 )

        But then along comes Web 2.0 and user-supplied content and all that jazz. If you allow someone to post a comment on a forum, like, say, Slashdot, and allow any HTML code whatsoever, you are guaranteed to have parse errors. Someone, somewhere, is going to (maliciously or not) forget a closing tag, make a typo, forget a quotation mark, overlap a b and an i tag, nest something improperly, forget a / in a self-closing tag like hr or br, etc. According to strict XHTML parsing rules, that is, XML parsing rules,

    • by hey! ( 33014 ) on Sunday December 16, 2007 @03:03PM (#21719206) Homepage Journal
    Well, according to TFA, because XHTML, while terrific for certain kinds of applications, doesn't solve the most pressing problems of most of the people working in HTML today. It can do, of course, in the same way any Turing equivalent language is "enough" for any programmer, but that's not the same thing as being handy.

      At first blush, the aims of XHTML 2.0 and HTML 5 ought to be orthogonal. Judging from the article, I'd suspect it is not the aims that are incompatible, but the kinds of people who are behind each effort. You either think that engineering things in the most elegant way will get things off your plate more quickly (sooner or later), or you think that concentrating on the things that are on your plate will lead you to the best engineered solution (eventually).

      I'm guessing that the XHTML people might look at the things the HTML 5 folks want to do and figure that they don't really belong in HTML, but possibly in a new, different standard that could be bolted into XHTML using XML mechanics like name spaces and attributes. Maybe the result would look a lot like CSS, which has for the most part proven to be a success. Since this is obviously the most modular, generic and extensible way of getting the stuff the HTML 5 people worry about done, this looks like the perfect solution to somebody who likes XHTML.

      However, it would be clear to the HTML 5 people that saying this is the best way to do it doesn't mean anything will ever get done. It takes these things out of an established standard that is universally recognized as critical to support (HTML) and puts them in a newer weaker standard that nobody would feel any pressure to adopt anytime soon. A single vendor with sufficient clout (we name no names) could kill the whole thing by dragging its feet. Everybody would be obliged to continue doing things the old, non-standard way and optionally provide the new, standardized way for no benefit at all. Even if this stuff ideally belongs in a different standard, it might not ever get standardized unless it's in HTML first.

      Personally, I think it'd be nice to have both sets of viewpoints on a single road map, instead of in two competing standards. But I'm not holding my breath.
    • Re: (Score:3, Insightful)

      by jonbryce ( 703250 )
      I think the problem is that (x)html is trying to be two very different things. It is trying to be a universal document format for presenting information. It is also trying to be a universal presentation manager for thin client applications. The technical requirements for these are very different, and it may well be that two different standards are appropriate.
  • reboot the web! (Score:5, Insightful)

    by wwmedia ( 950346 ) on Sunday December 16, 2007 @01:22PM (#21718318)
    am I the only developer that's sick of this html / css / javascript mess??

    people/companies are trying to develop rich applications using a decade-old markup language that's improperly supported by different browsers (even firefox doesn't fully support css yet) and is a very ugly mix right now. It's like squeezing a rectangular plasticine object through round, triangular and star-shaped holes at the same time.



    the web needs a reboot


    we need a programming language that:
    *works on the server and the client
    *something that makes making UIs as easy as drag and drop
    *something that does not forgive idiot html "programmers" who write bad code
    *something that doesn't suffer from XSS
    *something that can be extended easily
    *something that can be "compiled" for faster execution
    *something that's implemented the same way in all browsers (or, even better, doesn't require a browser and works on a range of platforms)
    • Re: (Score:2, Informative)

      by Anonymous Coward
      There are a lot of people who think that web, Ajax and Flash applications are a very bad thing. Not just users, but also noted developers and usability experts.

      More thoughts on why Ajax is bad for web applications [zdnet.com]: this is about how Ajax apps are often very fragile and usually don't work as expected.

      Ephemeral Web-Based Applications [useit.com]: usability guru Jakob Nielsen writes this great article that goes into depth about how most web apps are complete failures when it comes to usability. Even something as basic as
    • I've just got a message from God this morning: he is working on reinventing the whole universe to plug those black holes now, and the request for a better web will be saved in a bugzilla database and be fulfilled several billion years later.
    • You can't WYSIWYG-author semantic content.
      • Semantic is awesome for "make the navigation column have the colour of Mt Everest on a cool summer day". Semantic is awesome for "make all the links in my header have an icon in front of them". Semantic is great for "Make my pull quotes use comic sans and set them in a box with a drop shadow and a reflection under them". But you still need to address basic presentation!!! I still need to make the three column grid in a straightforward way!! Where is my grid tag? Where is my "flow content between the
    • Re:reboot the web! (Score:4, Interesting)

      by MyDixieWrecked ( 548719 ) on Sunday December 16, 2007 @03:30PM (#21719442) Homepage Journal
      I agree with you about some things you're saying...

      You need to realize that the markup language shouldn't be used for layout. Your comment about "making UIs as easy as drag and drop" can be done with a website development environment like Dreamweaver. You need a base language for that.

      Personally, I think that XHTML/CSS is going the right way. It can be extended easily, and it's simple enough that basic sites can be created by new users relatively quickly; however, complex layouts still require some experience (yeah, it's got a learning curve, but that's what Dreamweaver is for).

      The whole point of XHTML/CSS is that it's not designed to be implemented the same way in all browsers. It's designed so that you can take the same "content" and render it for different devices/media (ie: home PC, cellphone, paper, ebook) simply by either supporting a different subset of the styling or different stylesheets altogether.

      Have you ever tried to look at a table-based layout on a mobile device? Have you ever tried to look at a table-based layout on a laptop with a tiny screen or a tiny window (think one monitor, web browser, terminal, and code editor on the same 15" laptop screen)? Table-based layouts are hell in those scenarios. Properly coded XHTML/CSS pages are a godsend, especially when you can disable styles and still get a general feel for what the content on the page is.

      I'm not sure if I 100% agree with this XHTMLv2 thing, but I think XHTMLv1 is doing great. I just really wish someone would make something that was pretty much exactly what CSS is, but make it a little more robust. Not with more types of styles, but with ways of positioning or sizing an element based on its parent element, better support for multiple classes, variables (for globally changing colors), and ways of adjusting colors relative to other colors. I'd love to be able to say "on hover, make the background 20% darker or 20% more red". I'd love to be able to change my color in one place instead of having to change the link color, the background color of my header and the underline of my h elements each time I want to tweak a color.

      I'd also love it if you could separate form validation from the page. Doing validation with JS works, but it's not optimal. Having a validation language would be pretty awesome, especially if you could implement it server-side. If the client could grab the validation code and validate the form before sending and handle errors (by displaying errors and highlighting fields), and then the server could also run that same code and handle errors (security... it would be easy to modify or disable anything on the client side...), that would be great. All you'd really need is a handful of cookie-cutter directives (validate the length, format/regex, and also some built-in types like phone numbers and emails).

      I also think that it's about time for JS to get an upgrade. Merge Prototype.js into javascript. Add better support for AJAX and make it easier to create rich, interactive sites.

      If we're not careful, Flash is going to become more and more prominent in casual websites. The only advantage the current standards have is that they're free and don't require a commercial solution to produce.

      XSS is a side effect of trusting the client too much, and a side effect that won't be solved by anything you've suggested.

      And why does something need to be "compiled" to be faster? What needs to be faster? Rendering? Javascript? Or are you talking about server-side? Why don't we start writing all our websites in C? Let's just regress back to treating our desktop machines as thinclients. We'll access websites like applications over X11. It'll be great. ;)
    • I'm not THAT upset with it. Javascript + DOM is a good tool, but I feel the real problem is that the designers of these technology don't listen to previous solutions to the problems encountered on the web.

      Why did it take until CSS 3.0 to get easy-to-use columns? The New York Times has been using columns for 150+ years; why did the CSS implementers feel they should just dump all that publishing experience in the toilet and do things their own way?

      Likewise, CSS which is supposed to free us from table-based la
    • If you're going to shill for Silverlight (which you clearly are, given your Scoble [scobleizer.com]-soundbite title and your previous post here [slashdot.org]), at least be honest about it. This reads like one of those "evaluation guides" that sales put out for lazy journalists: "An XYZ app should be judged on features A, B, and C; by coincidence, our new XYZalizer product does A, B and C..."

      I fully sympathize with your desire for a better way, but not at the cost of throwing away the Web and replacing it with the $VENDOR Network, which i
    • by Z34107 ( 925136 )

      .NET / Silverlight?

      ducks

  • by alexhmit01 ( 104757 ) on Sunday December 16, 2007 @01:26PM (#21718334)
    Most of the web is not well-formed, so it's variations of HTML 4 with non-standard components. An HTML 5 that remains a non-XML language presents a reasonable way forward for "web sites." Without the need to be well-formed, the tools to create are easier and can be sloppy, particularly for moderately admined sites. Creating a new HTML 5 might succeed in migrating those sites. If you avoid most breaks with HTML 4, beyond the worst offenders, browsers could target an HTML 5, and webmasters would only need to change 5%-10% of the content to keep up. That would mean a less degrading "legacy" mode than the HTML 4 renderers we have now.

    So while the HTML 4 renderers floating around wouldn't be trashed, they could be ignored, left as is, and focus on an HTML 5 one. Migrating to XHTML is non-trivial for people with out-dated tools and lack of knowledge. You can't ignore those sites as a browser maker, but HTML 5 might give a reasonable path to modernizing the "non-professional" WWW.

    XHTML has some great features, by being well-formed XML, you can use XML libraries for parsing the pages. This makes it much easier to "scrape" data off pages and handle inter-system communication, which HTML is not equipped for.
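
    (That is the practical payoff: a well-formed page can be loaded with any stock XML parser. A sketch with Python's standard xml.etree; the sample document is made up and assumed to declare the usual XHTML namespace:)

    import xml.etree.ElementTree as ET

    XHTML = "{http://www.w3.org/1999/xhtml}"
    page = """<html xmlns="http://www.w3.org/1999/xhtml">
      <body><p><a href="http://example.org/">a link</a></p></body>
    </html>"""

    root = ET.fromstring(page)   # blows up loudly if the markup isn't well-formed

    # Pull every link target out of the document:
    hrefs = [a.get("href") for a in root.iter(XHTML + "a")]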

    It's interesting in that HTML and XHTML look almost identical (for good reasons, XHTML was a port of HTML to XML) but are technically very different, HTML being an SGML language, and XHTML an XML language. Both languages have their uses: HTML is "easier" for people to hack together because if you do it wrong, the HTML renderer makes a best guess; XHTML is easier to use professionally, because if there is a problem, you can catch it as being an invalid XML document. Professionals worry about cross-browser issues, amateurs worry about getting it out there.

    XHTML "failed" to replace HTML because it satisfies the needs of professionals to have a standardized approach to minimize cross-browser issues, but lacks the simplicity needed for amateurs and lousy professionals.

    Rev'ing both specs would be a forward move that might simplify browser writing in the long term while giving a migration path. XHTML needs a less confusing and forward looking path, and HTML needs to be Rev'd after being left for dead to drop the really problematic entries and give people a path forward.
    • by Bogtha ( 906264 )

      HTML 5, that remains a non-XML language

      HTML 5 has two serialisations, a quasi-HTML serialisation and an XML serialisation.

      XHTML "failed" to replace HTML because it satisfies the needs of professionals to have a standardized approach to minimize cross-browser issues, but lacks the simplicity needed for amateurs and lousy professionals.

      XHTML failed to replace HTML because a browser with a dominating market share doesn't support it and using it in a backwards-compatible way confers very few adva

      • XHTML failed to replace HTML because a browser with a dominating market share doesn't support it [...]
        Right.

        [...] and using it in a backwards-compatible way confers very few advantages over HTML and none whatsoever for typical developers.
        Wrong -- or at least it depends on what you mean by "typical." Technologies like SVG and MathML are XML-based, so there is a big advantage to having xhtml support in browsers: it lets you use inline SVG and MathML according to the w3c standards. Because MS doesn't su

        • by Bogtha ( 906264 )

          Technologies like SVG and MathML are XML-based, so there is a big advantage to having xhtml support in browsers

          Yes, but the advantage is only there if you give up on Internet Explorer compatibility or put in a lot of extra work by coding an additional Internet Explorer version without SVG and MathML, i.e. the version you are supposedly skipping by using XHTML.

          Because MS doesn't support xhtml, SVG and MathML have basically been killed as practical browser technologies.

          Yes, so you can't really c

  • Seriously, at this point, having a single standard for web pages is going to be passe. All it will take is a good open source implementation for the browser, critical mass, and eventually, the big players will follow.
  • ... all over again. It seems to me that at some point one will become more popular than the other. The question is which one. Then the other will go away. So far though I do not see anything being really improved upon. IMHO there should be certain built-ins to the browser to make it worth it.

    Here is what I would suggest: 1) multi-column drop-down, with sort capabilities - this is something that is available in desktop applications; 2) built-in browser menu; 3) better scripting modal window, I should ha

  • by ikekrull ( 59661 ) on Sunday December 16, 2007 @02:21PM (#21718828) Homepage
    The worst thing about W3C standards is the lack of a reference implementation. If you can't produce a computer program that implements 100% of the specification you are writing in a reasonable timeframe, your standard is too complex.

    It doesn't matter if the reference implementation is slow-as-molasses or requires vast quantities of memory; at least you have proven the standard is actually realistically implementable. On the other hand, if your reference implementation was easy to build and is really good, then that will foster code re-use and massively jump-start the availability of standardised implementations from multiple vendors. It might also show that you have a really good standard there.

    If you don't do this, you get stuff like SVG - I don't think there is even one single 100% compliant SVG implementation anywhere, and there may never be.

    There aren't any fully compliant CSS, or HTML implementations either, to my knowledge.

    The same goes for XHTML and HTML5. If you, as a standards organisation, are not in a position to directly provide, or sponsor the development of an open reference implementation, then personally, I think you should be restricting your standard to a smaller chunk of functionality that you are actually able to do this with.

    There is no reason a composite standard, with a bunch of smaller, well defined components, each with reference implementations, can't be used to specify 'umbrella' standards.

    Now, I am also aware that building a reference application tends to make the standard as written overly influenced by shortcomings in the reference implementation, but I really can't believe this would be worse than the debacle surrounding WWW standards we've had for the last 10+ years. Without a conformant reference implementation, HTML support in browsers is dictated by the way Internet Explorer and Netscape did things anyway.

    I'm also aware that smaller standards tend to promote a rather piecemeal evolution of those standards, when what is often desired is an 'across the board' update of technology.

    But this 'let's define monster standards that will not be fully implemented for years, if at all, and hope for the best' approach seems to be obviously bad, allowing larger vendors to first play a large role in authoring a 'standard' that is practically impossible to fully implement, and then to push their own hopelessly deficient versions of these 'standards' on the world and sit back and laugh because there is no way to 'do better' by producing a 100% compliant version.

    • I agree with you completely. I think that *every* standard should come with a reference implementation. I can't even comprehend why standards bodies don't do this. It is the single most effective way to ensure that your standard is adopted. And it proves, as you said, that the standard is reasonably implementable - the code will demonstrate how easily implemented the standard is, and certainly the standards body would modify the standard where it's egregiously difficult to implement instead of sinking lot
    • Re: (Score:3, Informative)

      by Bogtha ( 906264 )

      The worst thing about W3C standards is the lack of a reference implementation.

      For a few years now, the W3C publication process has included an additional final step. It is not possible for a specification to reach final Recommendation stage unless it has two complete interoperable implementations.

  • by pikine ( 771084 ) on Sunday December 16, 2007 @02:36PM (#21718968) Journal
    From the conclusion of TFA:

    If you're more interested in XHTML V1.1 than HTML V4, looking for an elegant approach to create documents accessible from multiple devices, you are likely to appreciate the advantages of XHTML V2.

    The author apparently has no experience with rendering XHTML on mobile devices. First of all, since the screen is smaller, it's not just about restyling things in a minimalist theme. It's about prioritizing information and removing the unnecessary parts so that the more important information becomes more accessible in limited display real estate.

    For example, anyone who has accessed the Slashdot homepage on their mobile phone knows the pain of having to scroll down past the left and right columns before reaching the stories. You can simulate this experience by turning off page style and narrowing your browser window to 480 pixels wide. The story summaries are less accessible because they're further down a very long narrow page.

    Another problem is the memory. Even if you style the unnecessary page elements to "no display", they're still downloaded and parsed by the mobile browser as part of the page. Mobile devices have limited memory, and I get "out of memory" error on some sites. For reading long articles on mobile devices, it is better to break content into more pages than you would on a desktop display, both for presentation and memory footprint reasons.

    For these two reasons, a site designer generally has to design a new layout for each type of device. The dream of "one page (and several style sheets) to rule them all" is a fairytale.

  • by Animats ( 122034 ) on Sunday December 16, 2007 @02:46PM (#21719042) Homepage

    The current situation is awful.

    • Major tools, like Dreamweaver, generate broken HTML/XHTML. Try creating a page in Dreamweaver in XHTML or HTML 4.01 Strict. It won't validate in Dreamweaver's own validator, let alone the W3C validator. The number of valid web pages out there is quite low. I'm not talking about subtle errors. There are major sites on the web which lack even proper HTML/HEAD/BODY tags.
    • The "div/float/clear" approach to layout was a terrible mistake. It's less powerful than tables, because it isn't a true 2D layout system. Absolute positioning made things even worse. And it got to be a religious issue. This dumb but heavily promoted article [hotdesign.com] was largely responsible for the problem.
    • CSS layout is incompatible with WYSIWYG tools. The fundamental problem with CSS is that it's all about defining named things and then using them. That's a programmer's concept. It's antithetical to graphic design. Click-and-drag layout and CSS do not play well together. Attempts to bash the two together usually result in many CSS definitions with arbitrary names. Tables mapped well to WYSIWYG tools. CSS didn't. (Does anybody use Amaya? That was the W3C's attempt at a WYSIWYG editor for XHTML 1.0.)
    • The Linux/open source community gave up on web design tools. There used to be Netscape Composer and Nvu, but they're dead.
    • The sad thing about broken web code is that it's browsers that enable it.

      If people know they can be lazy and write crap code that the browser will somehow manage to render anyway, they will since it's easier than writing correct code.
    • by shutdown -p now ( 807394 ) on Sunday December 16, 2007 @03:34PM (#21719484) Journal
      Drag'n'drop is simply not a working approach to designing a proper UI (i.e. one that automatically scales and reflows to any DPI / window size / whatever).

      As for "defining named things" - the concept of HTML is all about semantic markup. That's why using tables for layout is frowned upon, not because they are bad as such.

    • by ceoyoyo ( 59147 ) on Sunday December 16, 2007 @03:36PM (#21719498)
      HTML isn't supposed to be WYSIWYG. If you want traditional graphic design, make a PDF.

      HTML is supposed to be a document format that can be flexibly rendered. Pretty much the opposite of WYSIWYG actually.
      • Re: (Score:3, Interesting)

        by coryking ( 104614 ) *
        WYSIWYG is impossible if you are using templates. You gotta visualize how the chunks come together!

        If you want traditional graphic design, make a PDF.

        PDF is for printing, dummy :-)

        I've got a better idea anyway... How about a way to take our centuries of knowledge about "traditional graphic design" and apply it to a web-based medium? Do we have to chuck out everything we know about good design just because of the silly constraints of HTML/CSS? How about we improve or replace HTML/CSS with something that incorporates all we know about "traditional gr

        • Re: (Score:3, Insightful)

          by ceoyoyo ( 59147 )
          What's wrong with a PDF? It's got exactly what you seem to want -- total control over your layout. It also supports hyperlinks. Safari certainly renders PDFs inline as if they were somewhat retarded web pages. I'm not sure why you think it's just for printing.

          HTML has its purpose. It's time to stop trying to pervert it to yours. Either invent a fixed document format for the web or use one of the ones that's already widely supported (ie PDF). But guess what? There's a REASON people hate web links th
      • Re: (Score:3, Interesting)

        by grumbel ( 592662 )
        ### Pretty much the opposite of WYSIWYG actually.

        That might be the theory, but it simply is not true in reality. HTML is pretty much a WYSIWYG format with additional support for different font sizes and page width. The second you add a tag you are tied to a specific display DPI, the second you add a navigation bar, you no longer have a document that can adjust to different output devices easily. I mean just look at the web today, nobody is using HTML for writing documents. If people want to write a book, t
    • by zmotula ( 663798 )
      This is going to sound like a troll, but for me the situation is awful precisely and only because of Internet Explorer. I am a web designer, and my job would be infinitely easier and more fun if I could write for sane browsers only. I know how to work around most of the bugs now, but usually that means sticking to basic, dumb solutions (or testing like a madman). I do not need major tools, I am perfectly fine with Vim and the Unix toolbox. I am happy with the div-float-clear approach as implemented by dece
  • by wikinerd ( 809585 ) on Sunday December 16, 2007 @06:01PM (#21720634) Journal

    I thank the HTML 5 guys for their attempts, but I prefer XHTML v2

    From TFA:

    XHTML V2 isn't aimed at average HTML authors

    XHTML is for intelligent human beings, you know, people who can actually understand what separation of concerns is.

    [HTML v5] propose features that might simplify the lives of average Web developers

    So HTML v5 is for people who don't understand separation of concerns.

    Unfortunately, that's 99% of the web kiddies out there.

    The standards will appeal to different audiences.

    One standard for smart people who know programming and actually work with an engineering mindset, another for those who see the web as a big graffiti and work with an "anything goes" mindset. No thanks, I prefer ONE standard for smart people, XHTML v2, and just to kick out everyone who isn't qualified.

    • by Dracos ( 107777 ) on Sunday December 16, 2007 @07:12PM (#21721080)

      Agreed, this article is HTML5 apologist rhetoric. I thought it was rather well-balanced until the author got to HTML5, where his preference is subtly revealed.

      XHTML2's universal src attribute is mentioned (confusingly called a tag), but the universal href attribute, which allows any element to be transformed into a link, is not. Nor is the role attribute mentioned, which allows a tag to be assigned a semantic meaning (like menu or header) without expanding the tag set.

      TFA even admits in a roundabout way that HTML5 exists because the majority of so called "web developers" are ignorant of the current standards and incapable of effectively using them. If you need to be "clever" to use XHTML2, then perhaps no one will have to reach for the eye-bleach every time they wander into places like MySpace (where page skins are based on an exploit where browsers interpret <style> tags outside the document head, which is illegal).

      I tell people "Writing web pages is easy. Writing them well is hard." This is proven by the amount of junk documents on the web that don't validate as anything but pretty, even if beauty is in the eye of the beholder.

      The author wisely avoided any discussion of the silly new tags (some of which are presentational, not semantic) HTML5 includes. He does mention XHTML5, which is "optional"... why should we take that step backwards?

      The anti-XML-compliance people like to complain that XML is too verbose. If they don't like it, they can use something else, like RTF. Cars have gotten verbose too over the years. Those people can put their money where their mouths are by buying an antique that doesn't have a radio, GPS, seat belts, padded dashboards, windows, crumple zones, suspension, electric engine starters, or any number of improvements that could be argued to be bloat.

      XHTML2 is the way we should go.
