HTML V5 and XHTML V2
An anonymous reader writes "While the intention of both HTML V5 and XHTML V2 is to improve on the existing versions, the approaches chosen by the developers to make those improvements are very different. With differing philosophies come distinct results. For the first time in many years, the direction of upcoming browser versions is uncertain. This article uncovers the bigger picture behind the details of these two standards."
Re:Bet there still isn't a decent "Stop!" button (Score:3, Insightful)
Content from a 3rd party runs in a more restrictive context than the primary site (this includes frames etc).
You are then not held at the whim of a web admin to ensure these tags are included.
Or you could just use the noscript addin right now and choose which sites you trust at your discretion.
Browser vendors choice (Score:4, Insightful)
As it stands, with both XHTML 5 and XHTML 2 using the same namespace, it is only possible to support one of the two.
Re:I bet my ass.. (Score:3, Insightful)
This also seems to be the case whenever somebody bitches about web designers changing fonts, using JavaScript, or doing something to make their page look nice. You visit the websites created by the "changing the font at all, even in the stylesheet, is evil" or the classic "why are you trying to use two columns? two columns are evil" religious zealots and all their pages look really dull and boring. Long streams of Times New Roman. I guess this is our future, eh?
reboot the web! (Score:5, Insightful)
People/companies are trying to develop rich applications using a decade-old markup language that's improperly supported by different browsers (even Firefox doesn't fully support CSS yet) and is a very ugly mix right now. It's like squeezing a rectangular plasticine object through round, triangular and star-shaped holes at the same time.
the web needs a reboot
we need a programming language that:
*works on the server and the client
*something that makes making UIs as easy as drag and drop
*something that does not forgive idiot html "programmers" who write bad code
*something that doesn't suffer from XSS
*something that can be extended easily
*something that can be "compiled" for faster execution
*something that's implemented the same way in all browsers (or even better, doesn't require a browser and works on a range of platforms)
Different directions -- Need Both (Score:5, Insightful)
So while the HTML 4 renderers floating around wouldn't be trashed, they could be ignored, left as is, and focus on an HTML 5 one. Migrating to XHTML is non-trivial for people with out-dated tools and lack of knowledge. You can't ignore those sites as a browser maker, but HTML 5 might give a reasonable path to modernizing the "non-professional" WWW.
XHTML has some great features: because it is well-formed XML, you can use XML libraries for parsing the pages. This makes it much easier to "scrape" data off pages and handle inter-system communication, which HTML is not equipped for.
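To make the scraping point concrete, here is a minimal sketch in Python using only the standard library's XML parser. The page content and the `extract_links` helper are made up for illustration; the only real thing assumed is the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page, used only for illustration.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Price list</title></head>
  <body>
    <ul>
      <li><a href="/widgets">Widgets</a></li>
      <li><a href="/gadgets">Gadgets</a></li>
    </ul>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, link text) pairs from a well-formed XHTML document."""
    root = ET.fromstring(xhtml_text)
    links = []
    # Elements live in the XHTML namespace; attributes such as href do not.
    for anchor in root.iter("{%s}a" % XHTML_NS):
        links.append((anchor.get("href"), anchor.text))
    return links


if __name__ == "__main__":
    print(extract_links(PAGE))
```

No HTML-specific library is involved: any generic XML toolkit can walk the tree, which is the whole attraction.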
It's interesting that HTML and XHTML look almost identical (for good reason: XHTML was a port of HTML to XML) but are technically very different, HTML being an SGML language and XHTML an XML language. Both have their uses. HTML is "easier" for people to hack together because if you do it wrong, the HTML renderer makes a best guess. XHTML is easier to use professionally because if there is a problem, you can catch it as an invalid XML document. Professionals worry about cross-browser issues; amateurs worry about getting it out there.
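A rough sketch of that difference, in Python. The standard library's `html.parser` is only a tokenizer, not a browser's error-recovery algorithm, so treat it as a stand-in for "lenient"; the `TagCollector`, `lenient_tags` and `is_well_formed` names are invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately malformed: the <b> element is never closed.
BROKEN = "<p>Hello <b>world</p>"


class TagCollector(HTMLParser):
    """Records start tags; html.parser does not reject bad nesting."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Tokenize markup the forgiving way and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(lenient_tags(BROKEN))    # the lenient side carries on regardless
    print(is_well_formed(BROKEN))  # the strict side refuses the document
```

The lenient path hands you something to work with no matter what; the strict path tells you at once that the document is broken.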
XHTML "failed" to replace HTML because it satisfies the needs of professionals to have a standardized approach to minimize cross-browser issues, but lacks the simplicity needed for amateurs and lousy professionals.
Rev'ing both specs would be a forward move that might simplify browser writing in the long term while giving a migration path. XHTML needs a less confusing and forward looking path, and HTML needs to be Rev'd after being left for dead to drop the really problematic entries and give people a path forward.
Re:Bet there still isn't a decent "Stop!" button (Score:3, Insightful)
</restriction> <!-- closes the existing restriction zone. Might not pass as valid XML, but HTML browsers work with tag soup. -->
Something evil!!!
<restriction lock="I don't really care here" except="everything"> <!-- This bit is purely optional -->
Obviously I need to work on something more destructive than "Something evil!!!" before I attempt to conquer the planet...
Re:Bet there still isn't a decent "Stop!" button (Score:3, Insightful)
Think webmail (yahoo, gmail etc), when you receive spam, your webmail provider is the one sending you the data.
Usually they will try to filter the content to make it safe. BUT as history shows it's not always 100%.
The W3C or browser maker might also make a new tag/feature that your filtering libraries aren't aware of (e.g. old sites with guestbooks that might not filter out the "latest and greatest stuff").
With my proposal, users can enable javascript+flash for stuff like youtube, and youtube can be more certain that the comments about the video will be treated as plain html by browsers that support the security tag. Stuff that slips through the filters would likely still be rendered inactive by those browsers.
Re:Bet there still isn't a decent "Stop!" button (Score:5, Insightful)
Why would your site let through new tags that it doesn't recognise? Use a whitelist.
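For what "use a whitelist" might look like, here is a minimal sketch in Python built on the standard library's `html.parser`. The `ALLOWED` policy, the `WhitelistFilter` class and `sanitize` are hypothetical names for this example, and it is nowhere near a complete sanitizer: it does not rebalance tags, for one.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: tag name -> attributes permitted on that tag.
ALLOWED = {"b": set(), "i": set(), "p": set(), "a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")


class WhitelistFilter(HTMLParser):
    """Re-emits only whitelisted tags and attributes; escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # unknown or new tag: drop it, keep its text content
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript: and other schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(markup):
    """Return markup containing only whitelisted tags and attributes."""
    markup_filter = WhitelistFilter()
    markup_filter.feed(markup)
    markup_filter.close()
    return "".join(markup_filter.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <blink>there</blink> '
                   '<a href="javascript:alert(1)">x</a></p>'))
```

A tag the filter has never heard of simply does not appear in the output, so a new tag from the W3C or a browser vendor is dropped by default rather than let through.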
This only usually occurs if you let through malformed HTML. Use tidy or similar to ensure you only emit valid HTML. Not to mention the fact that the whole problem is caused by lax parsing — something the W3C has been trying to get people to give up on with the parsing requirements for XML.
You could define such a subset using the modularised XHTML 1.1 or your own DTD.
Yes, but it won't be actually used that way. If browsers went to the trouble of actually implementing this extra layer of redundancy, all the people with lax security measures would simply use that as an alternative and all the people who take security seriously will use it, despite it not being necessary. I think the cumulative effect would be to make the web less secure.
Re:I bet my ass.. (Score:2, Insightful)
I don't think I've ever seen anybody say this. Example?
In actual fact, their pages don't look boring at all. Your default browser setup looks boring.
Remember, a web design doesn't look like anything until it is realised with the combination of hardware, browser defaults and personal settings. If you think a site that uses your preferences looks boring, then your preferences are to blame.
Re:Bet there still isn't a decent "Stop!" button (Score:3, Insightful)
My attempts to change the world (albeit by a little bit) aren't going very well either - it's been more than 5 years since I first proposed the tags, but so far the W3C and Mozilla bunch have preferred to make other "more fun" stuff instead...
Maybe Microsoft has subverted the W3C too
Re:Bet there still isn't a decent "Stop!" button (Score:3, Insightful)
There is also the minor point that your method is almost completely incompatible with DOM, but I'll overlook that for now.
Re:Where is Microsoft? (Score:2, Insightful)
ms ain't the devil for development, sometimes they drive new features and functionality that would take forever to incorporate otherwise. do they always do it in the best of ways, no, but they do bring out good things from time to time...
No standard without reference implementation (Score:5, Insightful)
It doesn't matter if the reference implementation is slow as molasses or requires vast quantities of memory; at least you have proven the standard is actually realistically implementable. On the other hand, if your reference implementation was easy to build and is really good, then that will foster code re-use and massively jump-start the availability of standardised implementations from multiple vendors. It might also show that you have a really good standard there.
If you don't do this, you get stuff like SVG - I don't think there is even one single 100% compliant SVG implementation anywhere, and there may never be.
There aren't any fully compliant CSS, or HTML implementations either, to my knowledge.
The same goes for XHTML and HTML5. If you, as a standards organisation, are not in a position to directly provide, or sponsor the development of an open reference implementation, then personally, I think you should be restricting your standard to a smaller chunk of functionality that you are actually able to do this with.
There is no reason a composite standard, with a bunch of smaller, well defined components, each with reference implementations, can't be used to specify 'umbrella' standards.
Now, I am also aware that building a reference implementation tends to make the standard as written overly influenced by shortcomings in that implementation, but I really can't believe this would be worse than the debacle surrounding WWW standards we've had for the last 10+ years. Without a conformant reference implementation, HTML support in browsers is dictated by the way Internet Explorer and Netscape did things anyway.
I'm also aware that smaller standards tend to promote a rather piecemeal evolution of those standards, when what is often desired is an 'across the board' update of technology.
But this 'let's define monster standards that will not be fully implemented for years, if at all, and hope for the best' approach seems to be obviously bad, allowing larger vendors to first play a large role in authoring a 'standard' that is practically impossible to fully implement, and then to push their own hopelessly deficient versions of these 'standards' on the world and sit back and laugh because there is no way to 'do better' by producing a 100% compliant version.
Re:Where is Microsoft? (Score:4, Insightful)
Re:Where is Microsoft? (Score:2, Insightful)
Ajax-like techniques are possible without XMLHttpRequest and I don't believe Google Maps uses XMLHttpRequest anyway. If any organisation is responsible for the popularity of Ajax, it's Google, as it was when they started using it extensively that it really took off.
Re:I bet my ass.. (Score:3, Insightful)
Please re-read the original comment. It was saying that you can use JavaScript without being backwards-incompatible. You seem to have confused this with avoiding JavaScript altogether. Every single point you make is good against an argument that JavaScript should be avoided, but completely irrelevant to somebody asking for it to degrade gracefully, which is the distinction BlueParrot was trying to explain to you.
The current situation is awful. (Score:5, Insightful)
The current situation is awful.
Re:Bet there still isn't a decent "Stop!" button (Score:5, Insightful)
You want easy? SQL injections are easy to handle. Just use a parameterized query so you don't have to mix tainted data with your trusted SQL.
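For anyone who hasn't seen one, a parameterized query looks roughly like this. This is a sketch using Python's built-in `sqlite3` module rather than PHP/MySQL, and the `users` table, `make_demo_db` and `find_user` are made up for the example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",)])
    return conn


def find_user(conn, name):
    """Look up users by name; the tainted value never becomes SQL text."""
    # The ? placeholder is filled in by the driver, so the value is passed
    # as data and is not spliced into the statement.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_user(conn, "alice"))
    print(find_user(conn, "' OR '1'='1"))  # treated as an odd name, nothing more
```

The trusted statement and the hostile value travel separately, so there is nothing to escape in the first place.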
Back in the stone age, before PHP thought parameterized queries were more than enterprise fluffery, you were forced to mix your user data with your SQL. And oh, were the results hilarious! It took three tries (and three fucking functions) for PHP/MySQL to get their escape code right, and I'm sure you can still inject SQL with "mysql_real_escape_string()" in some new, unthought-of way.
There is no "parameterized query" with HTML. You are *forced* to mix hostile user data with your trusted HTML. If it was that hard to sanitize an "easy" language like SQL, how hard is it to sanitize a very expressive language like HTML?
You are telling me all those CPAN modules handle the hundreds of ways you can inject HTML into the dozens of different browsers? How many ways can you make an angle bracket and have it interpreted as a legit browser tag? How many ways can you inject something to the end of a URL to close the double quote and inject your javascript? How many ways, including unicode, can you make a double quote? Don't forget, your implementation cannot strip out the Unicode like I've seen some filters do - I need the thing to handle every language! I would guess there are thousands of known ways to inject junk into your trusted HTML.
I promise you that even the best CPAN module is still exploitable in some way not considered by the author. And I'd be insane to roll my own, as I'm not as smart as she is.
Don't kid yourself into thinking that filtering user-generated content is easy. It is very, *very* hard.
Re:I bet my ass.. (Score:3, Insightful)
There is a very strong business case for good degradation too... Last I checked, Google doesn't interpret your javascript. You want good SEO, you better make sure the content flows right in lynx (which is the best way to think about how google sees the page).
Sadly, screen readers are pretty much like Google too, but I really think we aren't feeding screen readers enough information for them to properly read a page. I really don't know the answer to screen readers. I've never played much with it, but in the Windows world, if you were doing a WinForms app you can sprinkle your form with metadata to help screen readers. But again, even the WinForms solution is a bit like an alt tag.
When I took a usability class, we watched some video I wish I could find of somebody using a screen reader. Talk about intense. Imagine reading a web page, or any document for that matter, while looking through a straw that is only one word wide. That is about what it is like. Now read it with the voice cranked to "hyper fast talk mode" and that is how the blind experience the web. Very interesting and eye opening.
Whatever the future holds (silverlight/flex), we need to make sure the standard has some good, juicy metadata to help out screen readers (and google, really).
Where was I now?
Re:Why not ditch HTML? (Score:3, Insightful)
Re:Where is Microsoft? (Score:2, Insightful)
Re:The current situation is awful. (Score:5, Insightful)
As for "defining named things" - the concept of HTML is all about semantic markup. That's why using tables for layout is frowned upon, not because they are bad as such.
Re:The current situation is awful. (Score:4, Insightful)
HTML is supposed to be a document format that can be flexibly rendered. Pretty much the opposite of WYSIWYG actually.
This is silly. (Score:3, Insightful)
Doesn't really matter how "hard to guess" your string is if you're going to transmit it cleartext in the body of your HTML document, does it?
"But wait!" you say, "We can randomize the string every time the document is served, thus defeating anything but an embedded Javascript with access to the DOM." Perhaps so, but now you're talking about server-side behavior — something clearly beyond the purview of the HTML specification.
If you think about it clearly, there's only one place that it makes any sense to address hostile embedded content, and it is server-side, with the growing battery of techniques already in service. Insisting that the HTML spec and browsers should be addressing this issue is asinine.
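For reference, the "randomize the string every time the document is served" idea would amount to something like the following on the server. This is only a sketch: `new_lock_token` and `open_restriction` are invented names, and the `restriction`/`lock` markup is the hypothetical tag proposed earlier in the thread, not anything a browser implements.

```python
import secrets


def new_lock_token(num_bytes=16):
    """Return a fresh, unguessable hex string for a single response."""
    return secrets.token_hex(num_bytes)


def open_restriction(token):
    """Build the opening tag of the hypothetical restriction element."""
    return '<restriction lock="%s">' % token


if __name__ == "__main__":
    # A new token is generated for each served document.
    print(open_restriction(new_lock_token()))
```

Note that every line of this runs on the server, which is exactly the point: none of it is something an HTML specification can mandate.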
Re:The current situation is awful. (Score:3, Insightful)
HTML has its purpose. It's time to stop trying to pervert it to yours. Either invent a fixed document format for the web or use one of the ones that's already widely supported (i.e. PDF). But guess what? There's a REASON people hate web links that go to PDFs. It's because the web itself was wisely intended NOT to be WYSIWYG because I don't want to have my monitor set the same way as yours is.
Re:Bet there still isn't a decent "Stop!" button (Score:3, Insightful)
It ain't easy as you say bro...
Re:I bet my ass.. (Score:3, Insightful)
The Web is not for the developers. It's for the people who want and need the data, the clients who in the end actually pay the bills and view the pages. If it's a games site for people to play Flash games, great: otherwise, get out of the dancing bears business and let me look up what I need.
I prefer XHTML 2, thanks (Score:5, Insightful)
I thank the HTML 5 guys for their attempts, but I prefer XHTML v2
From TFA:
XHTML is for intelligent human beings, you know, people who can actually understand what separation of concerns is.
So HTML v5 is for people who don't understand separation of concerns.
Unfortunately that's 99% of the web kiddies out there.
One standard for smart people who know programming and actually work with an engineering mindset, another for those who see the web as a big graffiti and work with an "anything goes" mindset. No thanks, I prefer ONE standard for smart people, XHTML v2, and just to kick out everyone who isn't qualified.
Re:The current situation is awful. (Score:3, Insightful)
Semantic markup languages like HTML break down because the web isn't for print. Semantic markup is the holy grail in the print world because it works so well for linear documents. The web is an interactive, non-linear medium that doesn't get printed.
The web is a two-way, interactive, non-linear medium that is evolving to almost real-time interaction between the client and server. Books, which are written in semantic languages like LaTeX, don't have client-server interaction. Books don't have forms. Books don't have real-time data. Books are none of these things. Books only have headings, tables of contents, footnotes, indexes and other easy to describe things. These are all very easy things to handle in semantic markup languages. In fact, you are insane *not* to use semantic markup for a 300 page book, because without it changing the layout is difficult.
You *cannot resize a book with a mouse*. You *cannot order an iPod* from a book. You *cannot post a comment shared across the globe* in a book. You *don't print the book in different sizes* (for example, you couldn't take Programming Perl and use the same content for a pocket sized book). You *don't have a programming language running inside the book*. Books don't have programmers designing significant chunks of their architecture.
The web is more than a book. The web has some things that are book-like that make sense for semantic content (all H1 should be this font) but lots that don't make sense (make the page 100% high so there are no scroll bars and inlay a second grid for scrollable content... think Gmail). You think it makes sense to have a language that is only semantic for creating web applications? How could it even begin to describe Google Maps?
Even more damning is a book, which is described semantically, HAS A FIXED OUTPUT DEVICE LIKE A PDF FILE!!! Book authors can "cheat" with their semantic markup and layout because they already know what the target output device is!! They know what inks they can use, what fonts they can use, what the margins are, what the DPI of the printer is, and what the page dimensions are! They all output pixel perfect books using a semantic markup language! We HTML authors know NONE OF THIS and yet you expect us to design our web pages with the same semantic markup abstraction as a book author!?
Can't you see the irony of recommending I use PDF when the main way to generate a PDF is with software using a semantic language!
Can't you see we can achieve the same goal of "making it easy to change the layout" in ways besides a stylesheet? Ever heard of a template language like the one used by Ruby on Rails or Template::Toolkit? Isn't it easier and cheaper to swap out "big layout" bits like columns by swapping out a template than it is a stylesheet? You think all it takes to target a mobile phone is just swapping out the stylesheet? No sir! I have a template system that changes *the entire fucking document* to suit mobile phones and their limitations! Isn't that the better way when you consider how different the two devices are?
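A rough sketch of the template-swapping approach, in Python. The standard library's `string.Template` stands in here for a real template language such as Template::Toolkit or the one in Rails, and the two layouts, the `TEMPLATES` table and `render` are invented for illustration.

```python
from string import Template

# Hypothetical layouts: the same content, arranged per device class.
TEMPLATES = {
    "desktop": Template(
        "<div class='columns'>"
        "<div class='nav'>$nav</div><div class='main'>$body</div>"
        "</div>"
    ),
    # The mobile layout changes the document structure itself:
    # a single column, with content first and navigation last.
    "mobile": Template(
        "<div class='main'>$body</div><div class='nav'>$nav</div>"
    ),
}


def render(device, nav, body):
    """Render the same content with the template chosen for the device."""
    template = TEMPLATES.get(device, TEMPLATES["desktop"])
    return template.substitute(nav=nav, body=body)


if __name__ == "__main__":
    print(render("desktop", "Home | About", "Hello"))
    print(render("mobile", "Home | About", "Hello"))
```

The content is written once; what changes per device is the whole document skeleton, which a stylesheet swap alone cannot rearrange as freely.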
So stop treating the web like a damn book! The web is not a book and semantic markup breaks down as an abstraction with modern development. This is very obvious to anybody who has done real web application development. Either help invent a better language to abstract what the web is or get left in the dust while you preach to a shrinking congregation.
Re:No standard without reference implementation (Score:1, Insightful)
Re:Bet there still isn't a decent "Stop!" button (Score:4, Insightful)
Wouldn't something like:
<sandbox src="restrictedContent.html" allow="html,css" deny="javascript,cookies"/>
Where's the databinding? (Score:2, Insightful)
What the web is crying out for is a standard that supports a rich data hierarchy, a rich presentation hierarchy, and a databinding mechanism to connect these two (preferably without using CSS, but that's another debate).
That's exactly where the next-gen UI frameworks have gone (Flex from Adobe, XAML from Microsoft). These frameworks represent the wave of the future and that's where the web needs to go too.
Meanwhile, the web standards community spouts all this rhetoric of "separating presentation and semantics" in HTML/CSS, which is nonsense. Both HTML and CSS are precisely concerned with presentation. And they are not at all separate. You need to know and love both to coax good looking pages out of a browser. All this huffing and puffing, yet the best they can offer for application-specific data models is microformats!
As far as I can tell, both HTML 5 and XHTML 2 are icing on the cake, and missing the main course altogether.
Re:I prefer XHTML 2, thanks (Score:5, Insightful)
Agreed, this article is HTML5 apologist rhetoric. I thought it was rather well-balanced until the author got to HTML5, where his preference is subtly revealed.
XHTML2's universal src attribute is mentioned (confusingly called a tag), but the universal href attribute, which allows any element to be transformed into a link, is not. Nor is the role attribute mentioned, which allows a tag to be assigned a semantic meaning (like menu or header) without expanding the tag set.
TFA even admits in a roundabout way that HTML5 exists because the majority of so called "web developers" are ignorant of the current standards and incapable of effectively using them. If you need to be "clever" to use XHTML2, then perhaps no one will have to reach for the eye-bleach every time they wander into places like MySpace (where page skins are based on an exploit where browsers interpret <style> tags outside the document head, which is illegal).
I tell people "Writing web pages is easy. Writing them well is hard." This is proven by the amount of junk documents on the web that don't validate as anything but pretty, even if beauty is in the eye of the beholder.
The author wisely avoided any discussion of the silly new tags (some of which are presentational, not semantic) HTML5 includes. He does mention XHTML5, which is "optional"... why should we take that step backwards?
The anti-XML-compliance people like to complain that XML is too verbose. If they don't like it, they can use something else, like RTF. Cars have gotten verbose too over the years. Those people can put their money where their mouths are by buying an antique that doesn't have a radio, GPS, seat belts, padded dashboards, windows, crumple zones, suspension, electric engine starters, or any number of improvements that could be argued to be bloat.
XHTML2 is the way we should go.
Re:The current situation is awful. (Score:3, Insightful)
Layout is just as important to understanding content as the content itself. If you went into a $100USD per dish restaurant dressed in a tuxedo with your hot chick date and the menu is all in comic sans, what do you think about the quality of the food you are about to be served? Those guys who march around downtown areas might have really good compelling content, but nobody reads it because it is always done in permanent marker and twenty different colors. You know, the time cube guy might be right, but his site design makes him look like a joke. People argue that Kerry lost the 2004 election because they did a poor job with the presentation of their logo [nytimes.com].
The thing that upsets me about these debates is people think that the colour scheme used, the fonts used, the line spacing, the margins, the proportion between elements, or any other fundamental unit of design is just pretty window dressing around content. Those people also tell you looks don't matter and first impressions aren't important. They are wrong. Very, very wrong. Layout matters, even more on the internet than in print. We need powerful tools in our language to help us express layout. Dismissing layout as a trivial afterthought is a great way to ensure our future is nothing but Flash apps.
Missing the point? (Score:2, Insightful)
All you "standards nazis" out there, please don't forget that. The web is for everyone, yes, even those who can't write HTML "properly".
Hopefully browsers will always render badly formed HTML, otherwise the web will be a poorer place for it.
Re:The current situation is awful. (Score:4, Insightful)
Drag'n'drop works fine if it is manipulating a proper UI API. OS X's Interface Builder, with its springs and struts system, comes to mind.
Re:Missing the point? (Score:2, Insightful)
Re:Why not ditch HTML? (Score:3, Insightful)
Use Tidy, and suddenly you've got perfectly fine XHTML again.
You don't have to write one yourself, because W3C provides a perfectly good one, and there already is a large number of open source clones of Tidy. Writing it yourself would be stupid and prone to error. The existing ones are as good as "guess what you mean" can get, and that is an improvement, because you're not trusting wacky, unreliable browsers to turn the crap on your site into something valid, you're doing it yourself. You are in control! That is always an improvement.
PS: My Slashdot comments are perfectly valid XHTML snippets. (Not valid XML, because they don't have a root element. I'm trusting the Slashdot server to handle that for me.)