
News for nerds, stuff that matters

A Statistical Review of 1 Billion Web Pages

Posted by ScuttleMonkey on Wed Jan 25, 2006 03:41 PM
from the demanding-a-recount dept.
chrisd writes "As part of a recent examination of the most popular HTML authoring techniques, my colleague Ian Hickson parsed through a billion web pages from the Google repository to find out which class names, elements, attributes, and related metadata are most popular. We decided that publishing this would be of significant utility to developers. It's also a fascinating look into how people create web pages. For instance, one thing that surprised me was that <title> is more popular than <br>. The graphs in the report require a browser with SVG and CSS support (like Firefox 1.5!). Enjoy!"
This discussion has been archived. No new comments can be posted.
  • and all I got was Britney Spears.

    Sheesh.
  • We've come a long way (Score:4, Funny)

    by suso (153703) * on Wednesday January 25 2006, @03:42PM (#14561604)
    (http://suso.suso.org/ | Last Journal: Tuesday March 09 2004, @12:03AM)
    if the tag isn't on the top elements list.
  • <title> is more popular than <br> (Score:5, Funny)

    by InsideTheAsylum (836659) on Wednesday January 25 2006, @03:46PM (#14561636)
    well when people talk like this and dont bother using punctuation spacekeys or any of the skills that they have been taught in school its no wonder why webpages turn out like this not to mention those long runon sentences and also all that broken code that are the fist attempt at a webpage by a twelve year old kid who tried to steal someone elses layout and replaced the word with his own then you start to look at all of those dynamically generated webpages and the layouts and the style sheets and its no wonder why the good old br tag never get a work out.
  • Finally... (Score:5, Funny)

    by RandoX (828285) on Wednesday January 25 2006, @03:46PM (#14561637)
    An un-slashdottable server.
  • BR tag? (Score:5, Insightful)

    by p0 (740290) on Wednesday January 25 2006, @03:46PM (#14561638)
    (http://www.primary0.com/)
    With the power of CSS you really don't need br; maybe that is the reason for the tag's small usage stats?
    • Re:BR tag? (Score:4, Interesting)

      by masklinn (823351) <{slashdot.org} {at} {masklinn.net}> on Wednesday January 25 2006, @03:53PM (#14561711)

      Small stat? Are you joking?

      This is about the number of pages that use the tag, not the number of tags out in the wild, and <br> is used on more pages than <table>. There are as many pages with at least one <br> as there are pages with at least one <img> tag.

      That's freaking huge, for a tag that should almost never be used.

      • Re:BR tag? by poot_rootbeer (Score:2) Wednesday January 25 2006, @04:46PM
        • Re:BR tag? by Bogtha (Score:3) Wednesday January 25 2006, @05:11PM
          • Re:BR tag? by Iron E (Score:1) Wednesday January 25 2006, @06:52PM
        • Re:BR tag? by ScottyH (Score:1) Wednesday January 25 2006, @05:24PM
          • Re:BR tag? by Gonoff (Score:2) Wednesday January 25 2006, @06:43PM
        • Re:BR tag? by masklinn (Score:2) Wednesday January 25 2006, @06:01PM
          • Re:BR tag? by kchrist (Score:1) Wednesday January 25 2006, @07:59PM
        • Re:BR tag? by Blakey Rat (Score:2) Wednesday January 25 2006, @07:12PM
          • Re:BR tag? by Red Alastor (Score:1) Wednesday January 25 2006, @08:28PM
            • Re:BR tag? by mdecarle (Score:1) Thursday January 26 2006, @10:34AM
              • Re:BR tag? by Red Alastor (Score:2) Thursday January 26 2006, @01:33PM
            • Re:BR tag? by Domo-Sun (Score:2) Thursday January 26 2006, @12:48PM
      • Re:BR tag? by Metasquares (Score:2) Wednesday January 25 2006, @04:53PM
        • Re:BR tag? by Luyseyal (Score:2) Wednesday January 25 2006, @05:16PM
        • Re:BR tag? by masklinn (Score:2) Wednesday January 25 2006, @06:04PM
          • Re:BR tag? by Metasquares (Score:3) Wednesday January 25 2006, @08:54PM
      • Re:BR tag? by Kelson (Score:2) Wednesday January 25 2006, @08:14PM
        • Re:BR tag? by masklinn (Score:2) Thursday January 26 2006, @06:17AM
    • Re:BR tag? by Eightyford (Score:1) Wednesday January 25 2006, @03:56PM
      • Re:BR tag? by crumley (Score:2) Wednesday January 25 2006, @04:05PM
        • Re:BR tag? by TubeSteak (Score:2) Wednesday January 25 2006, @05:58PM
      • Re:BR tag? by CRCulver (Score:2) Wednesday January 25 2006, @04:25PM
        • Re:BR tag? by crabpeople (Score:2) Wednesday January 25 2006, @04:33PM
        • Re:BR tag? by Blakey Rat (Score:2) Wednesday January 25 2006, @07:15PM
          • Re:BR tag? by mrchaotica (Score:2) Wednesday January 25 2006, @08:48PM
      • Re:BR tag? by Just Some Guy (Score:2) Wednesday January 25 2006, @06:18PM
        • Re:BR tag? by Bogtha (Score:2) Wednesday January 25 2006, @06:53PM
          • Re:BR tag? by Just Some Guy (Score:2) Thursday January 26 2006, @12:39AM
    • Re:BR tag? by torunforever (Score:1) Wednesday January 25 2006, @04:05PM
      • Re:BR tag? by MyHair (Score:2) Wednesday January 25 2006, @05:07PM
      • Re:BR tag? by hixie (Score:1) Wednesday January 25 2006, @09:05PM
    • Re:BR tag? CSS, duh! by conJunk (Score:2) Wednesday January 25 2006, @04:50PM
    • by TekGoNos (748138) on Wednesday January 25 2006, @06:14PM (#14562967)
      (Last Journal: Thursday February 12 2004, @03:17AM)
      The summary got it wrong.

      The study states that more pages use title than use br, NOT that more title tags are used than br tags.

      Approximately 98% of all pages have a title tag, and approximately 7 out of 8 pages have at least one (probably more) br tag.
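      TekGoNos's distinction between "pages that use a tag" and "total uses of a tag" can be made concrete with a small counter. This is a minimal sketch using Python's standard html.parser on two made-up pages; tag_stats is a hypothetical helper, and nothing here reflects how the Google study was actually run.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts the start tags seen in a single HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; self-closing tags such as <br/>
        # also arrive here via the default handle_startendtag.
        self.tags[tag] += 1


def tag_stats(pages):
    """Return (pages_using, total_uses) for a list of HTML strings."""
    pages_using = Counter()  # number of pages containing the tag at least once
    total_uses = Counter()   # number of occurrences across all pages
    for html in pages:
        parser = TagCounter()
        parser.feed(html)
        parser.close()
        total_uses.update(parser.tags)
        pages_using.update(set(parser.tags))
    return pages_using, total_uses


pages = [
    "<html><head><title>A</title></head><body>one<br>two<br>three<br></body></html>",
    "<html><head><title>B</title></head><body><p>no line breaks here</p></body></html>",
]
pages_using, total_uses = tag_stats(pages)
print("pages using title/br:", pages_using["title"], pages_using["br"])  # 2 1
print("total title/br tags:", total_uses["title"], total_uses["br"])     # 2 3
```

      On this toy input, title "wins" on page count while br wins on raw count, which is exactly the ambiguity in the summary.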
  • No GOTOs? (Score:1, Redundant)

    by slashbob22 (918040) on Wednesday January 25 2006, @03:47PM (#14561647)
    I was expecting a few GOTO commands.

    For Example:
    IF browser="IE" GOTO Spyware
  • Not complete (Score:5, Funny)

    by Anonymous Coward on Wednesday January 25 2006, @03:47PM (#14561653)
    It didn't have everything of course. Some elements were censored on behalf of the Chinese government.
    • Re:Not complete by onedotzero (Score:1) Wednesday January 25 2006, @04:06PM
  • by ecklesweb (713901) on Wednesday January 25 2006, @03:49PM (#14561674)
    I have to ask, what's the purpose of a 1-BILLION page sample? That's the beautiful thing about statistics. If you can say something about the distribution of characteristics within a population, you don't have to survey the entire population to get meaningful results. Are the study authors proposing that no standard distribution can be applied to the entire universe of web pages? If that's the case, then do the statistics they apply to their sample of one billion really say anything predictive about the entire population?

    Aside from the cool factor of saying they sampled a billion pages, I don't see what extra benefits are gained from that extra effort.
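    ecklesweb's point can be illustrated numerically: for a simple random sample, the margin of error on an estimated proportion depends on the sample size, not on the size of the population. A minimal sketch, assuming simple random sampling and the usual normal approximation (z = 1.96 for roughly 95% confidence, no finite-population correction); estimate_share and the synthetic population in which 7 of every 8 "pages" contain br are hypothetical.

```python
import math
import random


def estimate_share(population, sample_size, predicate, seed=0):
    """Estimate the share of items satisfying predicate from a simple random sample.

    Returns (estimate, margin), where margin is the half-width of an approximate
    95% confidence interval (normal approximation, z = 1.96).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    hits = sum(1 for item in sample if predicate(item))
    p = hits / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


# Hypothetical population of one million "pages": True means the page uses br.
population = [i % 8 != 0 for i in range(1_000_000)]  # exactly 7 in 8 are True
p, margin = estimate_share(population, 10_000, bool)
print(f"estimated share: {p:.3f} +/- {margin:.3f}")
```

    Quadrupling the sample size roughly halves the margin, whether the population is a million pages or a billion.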
  • dude (Score:2)

    by dotpavan (829804) on Wednesday January 25 2006, @03:50PM (#14561680)
    (http://dotpavan.googlepages.com/home)
    I'm still on the 22nd page, a lot more to go (1 billion? OMG!). See you all there.
  • Cool statistics (Score:1)

    by mendaliv (898932) on Wednesday January 25 2006, @03:50PM (#14561683)
    Their study on the <img> [google.com] element is quite interesting.

    3/4 of the parsed pages use alt text with their <img> tags, and about 10% use image maps... which I find a little scary. I haven't seen an image map in years.
  • well this is new (Score:1, Funny)

    by Abstract_Me (799786) on Wednesday January 25 2006, @03:52PM (#14561694)
    (Last Journal: Thursday March 17 2005, @09:14AM)
    We haven't slashdotted the Google server... but it would appear that the Firefox extensions download site is.
  • by digitaldc (879047) * on Wednesday January 25 2006, @03:52PM (#14561701)
    The 'br' element [google.com]

    The br element is a simple one, yet used on so many pages that it is the 8th most-used element. It is used more than the p element.

    clear, style, class, soft, id, and \.


    Wow! I never knew you guys were that popular.
  • by hey (83763) on Wednesday January 25 2006, @03:53PM (#14561709)
    (Last Journal: Thursday December 08 2005, @04:33PM)
    Not just non-evil. This is useful and interesting stuff.
  • by Benanov (583592) on Wednesday January 25 2006, @03:53PM (#14561710)
    (http://suen.ed.psu.edu/~bkemp/ | Last Journal: Thursday January 26 2006, @10:46AM)
    From TFA, the classes page:

    The rest of the top 20 classes are either presentational or otherwise meaningless (msonormal, for example, which is one of the classes that Microsoft Office uses in its "HTML" output).
  • by Tackhead (54550) on Wednesday January 25 2006, @03:55PM (#14561723)
    > As part of a recent examination of the most popular html authoring techniques, my colleague Ian Hickson parsed through a billion web pages from the Google repository to find out what are the most popular class names, elements, attributes, and related metadata.

    "Unfortunately, it was also of significant interest to the DOJ, who wanted to know how many times the word 'boobs' appeared in the first 50 characters after the string "IMG SRC". Because we didn't actually look for this data, and because the DOJ folks didn't believe us when we told them so, we're now enjoying a taxpayer-funded vacation in sunny Cuba."

    > We decided that to publish this would be of significant utility to developers

    whom we would encourage to send lawyers, guns and money; the blink tag now encloses the rotating ad banner.

  • Some of these results... (Score:4, Insightful)

    by Dracos (107777) on Wednesday January 25 2006, @03:55PM (#14561727)
    (http://www.fylo.net/)

    Prove that most people (and WYSIWYGs) don't know how to produce valid and accessible markup. The img alt attribute (an accessibility requirement) was found significantly less often than width, height, and border.

    I'm working on a site now where the project owner is continually reducing the usability and accessibility of the entire site (never mind that he secretly had a third party come up with an ugly design and ambushed the dev team with it).

    I keep telling everyone to deconstruct the adage "form follows function". It means function comes first. He doesn't care what anything *is* or how it *works*, only what it looks like. And, of course, that it's ugly.

  • SVG, uh. (Score:2)

    by Janek Kozicki (722688) on Wednesday January 25 2006, @03:56PM (#14561737)
    (Last Journal: Tuesday May 10 2005, @03:47PM)
    So I'm using Debian sarge, and, oh well, flame away about dozens of other distros, but currently I'm too lazy[1] to upgrade to etch or anything else. Sarge ships Firefox 1.0.4, which has no SVG support. Does anyone know of backported debs for sarge that provide SVG support?

    [1] Everything is about priorities: I spend some time reading /., but in fact I have some work to do, and this work is not switching Linux distros around.
    • Re:SVG, uh. by Janek Kozicki (Score:1) Wednesday January 25 2006, @04:07PM
  • Ad for anti-IE (Score:5, Insightful)

    by jamienk (62492) on Wednesday January 25 2006, @04:01PM (#14561779)
    It looks like a subtle push against IE: many mentions of the HTML 5 spec (which is being written by the WHATWG, a working group that includes many browser companies but not MS); use of SVG; written by a major FF developer.

    Way to go Google! Pour on the pressure!
    • Re:Ad for anti-IE (Score:4, Informative)

      by Bogtha (906264) on Wednesday January 25 2006, @04:48PM (#14562251)

      written by a major FF developer

      I don't believe Ian Hickson has been involved with Firefox; if I remember correctly, he used to hack on Mozilla, but then started work at Opera before Firefox took off.

      I don't think it's a jab at Internet Explorer, it's just that he knows that the target audience is likely to have a decent browser, so he's used the features likely to be available.

  • Benford's Law (Score:2)

    by SIGFPE (97527) on Wednesday January 25 2006, @04:01PM (#14561782)
    (http://www.cygwin.co...999-06/msg00074.html)
    I'm curious to see how closely Benford's Law [wikipedia.org] is followed by these pages. It should be easy for Google to run the stats.
    • Re:Benford's Law (Score:4, Interesting)

      by EvanED (569694) <evaned AT gmail DOT com> on Wednesday January 25 2006, @04:38PM (#14562140)
      I had an interesting run-in with Benford's law a bit ago. I had this typed up already, so here goes (description of the law omitted; read the Wikipedia link in the parent -- it's really cool):

      You see, my hard drive crashed about two weeks ago. It had three partitions on it, and two of them are still perfectly readable. The third is pretty well shot. (Fortunately, it was the most useless partition; its main content was Windows itself. This does mean ANOTHER Windows installation -- after having to do one a few weeks before -- but really that's no biggie compared with my actual data. And while I'm on that subject, I had two hard drives; when I got the newer one, I put all my work stuff on it as well as a new Linux installation specifically because it was less likely to fail, and I look back at that decision now with great happiness, because it is that foresight that has made this no big deal at all.)

      I've been trying to recover data off of the third partition, and it seems that if you do a full scan of the partition it appears as if the data was just deleted. Most of the time it's able to recover information, but not always: folder names are often lost. They show up in the recovery programs I tried as just Folder2393 for example. (Numbers ranged from 2 to 5 digits.)

      The folder numbers approximately follow Benford's law.

      Here is the approximate distribution:
      Most significant digit   % of folders   Ideal Benford %
      1                        32             30.1
      2                        15             17.6
      3                        12             12.5
      4                        12              9.7
      5                        19              7.9
      6                         3              6.7
      7                         3              5.8
      8                         2              5.1
      9                         2              4.6
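      The "Ideal Benford %" column above comes from the formula P(d) = log10(1 + 1/d). A short sketch that recomputes it and lines it up against EvanED's observed shares; the observed numbers are copied from the table, and benford_percent is a hypothetical helper name.

```python
import math

# Observed leading-digit shares (%) of the recovered folder numbers, from the table above.
observed = {1: 32, 2: 15, 3: 12, 4: 12, 5: 19, 6: 3, 7: 3, 8: 2, 9: 2}


def benford_percent(d):
    """Benford's law: P(leading digit = d) = log10(1 + 1/d), as a percentage."""
    return 100 * math.log10(1 + 1 / d)


for d in range(1, 10):
    expected = benford_percent(d)
    print(f"{d}: observed {observed[d]:2d}%  Benford {expected:4.1f}%  "
          f"difference {observed[d] - expected:+5.1f}")
```

      The expected shares sum to exactly 100% because the terms telescope: the sum of log10((d + 1)/d) for d = 1..9 is log10(10) = 1. In this small sample, digit 5 (19% observed vs. 7.9% expected) is the clear outlier.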
    • Markov Chains by ImaLamer (Score:2) Wednesday January 25 2006, @05:53PM
  • Good God in Heaven (Score:1)

    by Run4yourlives (716310) on Wednesday January 25 2006, @04:03PM (#14561796)
    Some choice tidbits FTA:

    For example, looking at what HTML ids and classes are most common, and at how many sites validate (and yes, we know that we're not leading the way in terms of validation).

    There are more elements (from Microsoft Office) on the Web than there are elements.

    If someone can explain why so many pages would use a <table> tag and then not put any cells in it, please let us know.

    Web "professionals" (and I am one of that group) have got a long, long, long way to go before we're actually taken seriously, it seems, as coders.
  • With all of this talk of the justice department requesting records from Google.

    Why could they not just use this method to get their data?
  • by MonkeyBoyo (630427) on Wednesday January 25 2006, @04:09PM (#14561850)
    One thing that screws up web page studies is that some sites duplicate pages hundreds or thousands of times.

    Oliver Steele did a cute study on how to spell aargh. [osteele.com]

    Unfortunately, much of his data is screwed up because he counted pages for each spelling, not unique pages.

    For this study, I don't see this problem occurring.
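    One common fix for the duplication problem MonkeyBoyo describes is to de-duplicate pages before counting. A minimal sketch, assuming exact duplicates are the main issue (near-duplicates that differ only in ads or timestamps would slip through a plain hash); spelling_counts and the sample pages are hypothetical, and this is not how Oliver Steele's study was done.

```python
import hashlib
import re
from collections import Counter


def spelling_counts(pages, spellings):
    """Count how many distinct pages contain each spelling as a whole word.

    Exact duplicates are dropped by hashing the page text, so a page mirrored
    thousands of times is counted once.
    """
    seen = set()
    counts = Counter()
    for text in pages:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact copy of a page already counted
        seen.add(digest)
        words = set(re.findall(r"[a-z]+", text.lower()))
        for spelling in spellings:
            if spelling in words:
                counts[spelling] += 1
    return counts


pages = ["Aargh!", "Aargh!", "Aargh!", "Arrgh, said the pirate.", "aaargh"]
counts = spelling_counts(pages, ["aargh", "arrgh", "aaargh"])
print(dict(counts))  # {'aargh': 1, 'arrgh': 1, 'aaargh': 1}
```

    A naive per-page count would have credited "aargh" three times here, purely because one page was copied.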
  • by Baldrson (78598) * on Wednesday January 25 2006, @04:10PM (#14561857)
    (http://www.geocities.com/jim_bowery | Last Journal: Tuesday September 19 2006, @10:20PM)
    A lot of work has been done on the power laws of (possibly misnamed) "scale free" networks. The simplest is the law that says the frequency of a symbol is inversely proportional to its frequency rank. In other words, the most frequently occurring symbol appears twice as often as the second most frequent and three times as often as the third, and so on.

    Most of the work on this, in the case of the WWW, concerns the frequency with which pages are hyperlinked. A lot of work has been done on hyperlinking without access to the exhaustive database used by Google. I know that Google's business model started with rank-ordering pages in their results by how often they were href'ed elsewhere, so the data is obviously there, and it wouldn't be a serious imposition on their proprietary information to publish an analysis of the href power law.
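    The rank-frequency law described above (Zipf's law with exponent 1, where frequency is proportional to 1/rank) can be checked against any table of counts by fitting a straight line on a log-log scale. A minimal sketch; ordinary least squares on log-log data is a quick diagnostic rather than a rigorous power-law test, and zipf_exponent is a hypothetical helper.

```python
import math


def zipf_exponent(counts):
    """Estimate s in frequency ~ 1 / rank**s by least squares on log-log data.

    counts: positive occurrence counts, one per distinct symbol, in any order.
    """
    if len(counts) < 2:
        raise ValueError("need at least two counts to fit a slope")
    freqs = sorted(counts, reverse=True)  # rank 1 = most frequent
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope  # Zipf's law predicts a slope of about -1


# Ideal Zipf data: rank 1 occurs twice as often as rank 2, three times as often as rank 3...
counts = [1200 / r for r in range(1, 11)]
print(round(zipf_exponent(counts), 3))  # 1.0
```

    Applied to real link counts, an estimate well away from 1 would suggest the simple form of the law does not hold.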

  • <title> is NOT more popular than <br> (Score:1, Insightful)

    by Anonymous Coward on Wednesday January 25 2006, @04:14PM (#14561891)
    Whilst <title> may appear on more distinct pages, <br> surely is used more frequently in the aggregate; that is, the multiplicity of occurrences of <br> on many pages far exceeds the single(?) occurrence of <title> on most pages.
  • Opera also supports SVG (Score:5, Informative)

    by TheJavaGuy (725547) on Wednesday January 25 2006, @04:15PM (#14561901)
    (http://operawatch.com/)
    FYI, Opera also supports SVG. I'm surprised that Ian Hickson didn't have Opera also mentioned on that Google page, after all he worked at Opera until a few months ago.
  • TITLE vs. BR (Score:2)

    by HTH NE1 (675604) on Wednesday January 25 2006, @04:20PM (#14561936)
    For instance one thing that surprised me was that the <title> is more popular than <br>

    I'm not surprised. The TITLE container is required for every HTML page to be considered valid across all versions and is the most important text on the page, used by search engines to link to the page. Though browsers will accept pages without it, you'd be a damn fool not to use it.

    BR is optional and generally unnecessary when P handles your general hard line-breaking needs. Even with TITLE appearing once, only once, and no less than once per page while there can be several BR tags on a page, BR is generally omissible. I'd expect overuse of BR to be more common on blogs that don't bother to detect paragraphs.

    Now if it were TITLE vs. TR there'd be no contest.
  • Heh (Score:4, Interesting)

    by Z0mb1eman (629653) on Wednesday January 25 2006, @04:20PM (#14561944)
    (http://www.clutterme.com/)
    This reminds me of the old joke that there only ever was one 'make' script, and everyone else modified it.

    I wonder how much of what they found is influenced by how people learned to write HTML - which in all likelihood was to copy code from existing pages... might explain parts of what they found, such as:

    Most people (roughly 98%) include head, html, title and body elements. This is somewhat ironic, since three of those four elements are optional in HTML
    • Re:Heh by Blink Tag (Score:2) Wednesday January 25 2006, @04:56PM
      • Re:Heh by Kunta Kinte (Score:2) Wednesday January 25 2006, @05:54PM
    • Re:Heh by Bogtha (Score:2) Wednesday January 25 2006, @05:31PM
      • Re:Heh by hixie (Score:1) Wednesday January 25 2006, @09:34PM
    • Re:Heh by icepick72 (Score:2) Sunday January 29 2006, @12:47PM
  • Font still popular (Score:3, Interesting)

    by superflippy (442879) on Wednesday January 25 2006, @04:22PM (#14561961)
    (http://www.superflippy.net/ | Last Journal: Monday October 29, @09:54AM)
    In their list of the 19 most popular elements, the font tag was #16. This element was deprecated when, back in 2000 or so?

    Of course, there may have been a lot of old pages in the sample, or pages built with older versions of HTML. But I've seen first-hand people using font tags to make an error message red, for example, even in a page that's using XHTML 1.0. I try to explain to the developers I work with why they shouldn't use them. I remove the font tags when those same developers add them to pages I've laid out for them. Zombie-like, they refuse to die.
  • table with no (Score:5, Informative)

    by saigon_from_europe (741782) on Wednesday January 25 2006, @04:32PM (#14562070)
    From the article:
    If someone can explain why so many pages would use a <table> tag and then not put any cells in it, please let us know.
    I don't know if they counted dynamic pages, but I guess they did. In dynamic pages, an empty table is quite normal.

    Your code usually goes like this:
    <table>
    <% For Each element In collection %>
    <tr><td><%= element %></td></tr>
    <% Next %>
    </table>

    So it is quite easy to get the empty table if the collection is empty.
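    The same loop, sketched as a hypothetical Python helper, shows how an empty collection produces a cell-less table, plus one possible guard. render_table and render_table_or_nothing are illustrative names, not part of any real templating library.

```python
from html import escape


def render_table(rows):
    """Naive template: always emits <table>, even when rows is empty."""
    cells = "".join(f"<tr><td>{escape(str(row))}</td></tr>" for row in rows)
    return f"<table>{cells}</table>"


def render_table_or_nothing(rows):
    """Variant that skips the <table> element entirely for an empty collection."""
    if not rows:
        return ""
    return render_table(rows)


print(render_table([]))                   # <table></table>
print(render_table(["a", "b"]))           # <table><tr><td>a</td></tr><tr><td>b</td></tr></table>
print(repr(render_table_or_nothing([])))  # ''
```

    Whether to emit nothing or a "no results" message instead is a design choice; the point is that the naive loop silently yields exactly the empty tables the study found.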
  • Button class (Score:1)

    by Sky Cry (872584) on Wednesday January 25 2006, @04:38PM (#14562142)
    The button class baffles us. We can't really tell what what it is used for. Similarly, the link class, which is apparently very popular, seems strange. Why would authors label something with that class?

    Button class is usually used when people want some links (<a href>) look like a button. (Light top and left borders, dark bottom and right borders, different background, inverted on hover, etc.)
  • GoLive (Score:1)

    by gmerideth (107286) <gmeridethNO@SPAMuclnj.com> on Wednesday January 25 2006, @04:41PM (#14562171)
    (http://www.uclnj.com/)
    GoLive's footprints are all over the Web. A scary number of pages use , not to mention the multitude of , , and elements.


    Didn't need a billion page analysis to point out that horrible fact.
  • What about plugins? (Score:3, Insightful)

    by AndrewStephens (815287) on Wednesday January 25 2006, @04:43PM (#14562190)
    (http://sandfly.net.nz/)
    I would be interested in seeing how many web pages use Java applets, Flash, Shockwave, Quicktime, ActiveX controls etc, etc. Sadly the authors did not include this information.
  • Script attributes (Score:2)

    by Stan Vassilev (939229) on Wednesday January 25 2006, @05:07PM (#14562444)
    Among the top 15 attributes used in the [script] tag are the following:

    "langauge"
    "langugage"
    "languaje"

    Link to that page in the stats:
    http://code.google.com/webstats/2005-12/scripting.html [google.com]

    I just have no comment on this.
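    Typos like these are easy to flag automatically. A minimal sketch using Python's standard difflib to map a misspelled attribute name to the closest known one; the KNOWN_SCRIPT_ATTRS list and the 0.8 similarity cutoff are assumptions chosen for illustration, not values from the study.

```python
import difflib

# Attribute names expected on <script>; a small, hypothetical whitelist.
KNOWN_SCRIPT_ATTRS = ("language", "type", "src", "charset", "defer")


def suggest_fix(attr, known=KNOWN_SCRIPT_ATTRS, cutoff=0.8):
    """Return the known attribute that attr was probably meant to be,
    or None if attr is already valid or nothing is similar enough."""
    name = attr.lower()
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for typo in ["langauge", "langugage", "languaje", "type"]:
    print(typo, "->", suggest_fix(typo))
```

    A lower cutoff catches more creative spellings at the cost of more false positives.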
  • Poor style by Google (Score:2, Redundant)

    by Jugalator (259273) on Wednesday January 25 2006, @05:20PM (#14562562)
    (Last Journal: Monday February 13 2006, @07:11PM)
    Web developers shouldn't aim for writing for one browser, but as many as possible.

    They're doing the exact opposite of what they should be doing.

    They're doing what led us into this shitty IE situation in the first place: targeting specific browsers instead of the public.

    Can anyone tell me what's here that can't be visualized with GIFs?

    Even if it'd mean fewer features for the user, they should at least gracefully fall back to a more basic technology than SVG.

    How do these pages look on IE, Opera, Safari, or Konqueror under default configurations?

    If this is what Google sometimes wishes to do, design pages to push a specific browser, they're no better than Microsoft.
  • by xxxJonBoyxxx (565205) on Wednesday January 25 2006, @05:25PM (#14562600)
    The author's never worked with proxy servers, has he?

    I laughed when I read this... "The \ "attribute" is almost certainly the result of people writing markup like (br\) when intending to do (br). Of course, neither is particularly useful to browsers when the page is sent as text/html (as all these pages were)."

    (OK, for those who don't get it, one reason that so much content is sent with an "incorrect" text/html header is that many proxy servers will dump content on the floor unless it has a text/html header.)

  • Questionable value (Score:1)

    by ergowa (854000) <shinobi@speakeasy . n et> on Wednesday January 25 2006, @05:26PM (#14562608)
    (http://www.eligiusstudio.com/)
    Between the questionable conclusions and the sometimes poor quality of the writing (not to mention, are there graphs and charts? I didn't see any.), I wonder about the usefulness of such an analysis.

    Take, for example, the commentary on the element. Abuse is in the eye of the beholder. A number of pages don't follow standards or use deprecated elements. In some cases, that's not entirely the fault of the authors. If I'm developing a corporate site that demands backwards compatibility to Netscape 4.x or an ancient version of IE, I'm certainly not going to jump through all sorts of hoops with layered CSS hacks when I can just use a deprecated element.

    And which specifications are we talking about? If I include those elements and validate my document, which elements will fail? At present six by my count (and not five) of those attributes in are deprecated by the W3C for HTML 4.0.

    Regarding the use of classes, I wonder how much HTML coding the authors do. I have had countless opportunities to style an element using a "copyright" class (rather than something like "small"). In some ways, it's a better practice since it describes the element rather than the style being applied to that element. It's still not ideal, but in the real world, I can remember that this element, like footer, appears in a certain place on the page and style it accordingly. Using a element is not a substitute; it's not meta-data, it's a display element the user sees.

    Similarly, "The button class baffles us. We can't really tell what what it is used for. Similarly, the link class, which is apparently very popular, seems strange. Why would authors label something with that class?" How about I have a submit button and a link side-by-side and I want them to look the same (that is, both appear as buttons)? If it makes sense from a user experience standpoint, then I'll use it. I can certainly see using a link class to style certain links on a page (say in a left navigation or the body) differently from others. It's sloppy, but it gets the job done and, even though I'd avoid it whenever possible, I'm not going to slam someone else for doing so.

    And on it goes, "onmouseover on a elements is a little worrying; presumably those are mostly cases of the status bar being overridden". How about image rollovers for navigation? Empirically, I've seen fifty sites with image rollovers for every site that changes the status line. The authors then state (in the next section) that the relative few uses on the element is the assumption that few people are using rollovers. Since they typically are applied to the anchor, this is an erroneous assumption. Of course, why bother with scripting events when you can use CSS (apart from pesky backwards compatibility)?

    In general, the tone of the article seems to be that many people should not be allowed on the web because they can't follow standards (and are illiterate, in many cases). Nothing is said about browsers being inconsistent in following standards, nor about how many of those pages are legacy pages from who knows when. The general attitude seems to be that HTML is as rigorous as a programming language. If that were the case, browsers would only display pages that conformed to the 4.01 Strict standard or maybe the XHTML 1.0 Strict DTD. I mean, if you really want to slam users for not caring what the standards say, see how many of those documents are properly formed according to the XHTML standards. I don't even have to do an "analysis" to know the number would be very much on the low side.

  • Window-Target (Score:2)

    by d-e-w (173678) on Wednesday January 25 2006, @05:40PM (#14562709)
    There are pages that use the Window-Target header, and even some that use the Link header (though we haven't yet checked what for!). There are even some pages that include the Content-Style-Type header.

    Wasn't sending a Window-Target HTTP header a trick for always breaking out of other people's frames (if someone linked to your site and framed your content within their own)? I thought it was more reliable (back in 1999/2000) than the various JS tricks for breaking out of frames.

  • by Ilgaz (86384) on Wednesday January 25 2006, @05:43PM (#14562733)
    (http://www.noooxml.org/petition)
    http://www.adobe.com/svg/viewer/install/main.html [adobe.com] got suitable plugins for browsers/OS of choice.

    Note that although I've had the SVG plugin installed for ages, Safari didn't display the graphs. Is it because I am not using "a browser with CSS"? Well, never mind really...

    This is why I and others have negative views of Firefox, SVG and even .ogg. Rootless promotion of this kind...

  • Wisdom (Score:3, Interesting)

    by AeroIllini (726211) <aeroilliniNO@SPAMgmail.com> on Wednesday January 25 2006, @05:44PM (#14562737)
    They've really hit on some wisdom here.

    There are several statistics they quoted which I have suspected for a long time, but only now can confirm with numbers.

    more than half of pages use the target attribute on the a element somewhere.


    I can't begin to describe the frustration I feel when I'm forced to use Internet Explorer and clicking links causes pages to fire up in a million new windows. Whether or not a link opens in a new window, a new tab, or the current window/tab really should be a client-side choice. Webmasters think they're being helpful by letting you separate your workspace into many windows, but they're really just slowing people down. Thank God for Firefox.

    It seems most pages use presentational attributes: the fourth most used attribute across all elements is the table element's border attribute, followed by the height and width attributes on img, followed by <table width="">, <table cellspacing="">, <img border="">, and <table cellpadding="">. Interestingly, though, the most frequently used attribute on the body element (namely bgcolor) is only used on around half of pages, with all the other presentational attributes on body being used even less. One possible explanation is that on average, colors are mostly done using CSS, while layout is mostly done using HTML tables.


    This makes perfect sense. While colors, fonts and styles are pretty much standard in a cross-browser environment, the many differing interpretations of the CSS box model make coding layout purely in CSS a terrible chore. It's usually much quicker to do a few simple layouts in tables (header, sidebar, content) and use CSS for pretty much everything else.
    • Re:Wisdom by ergowa (Score:1) Wednesday January 25 2006, @06:13PM
  • You'll probably need Opera, and its Zoom feature, to be able to actually READ anything on those charts. The headers are microscopic, and the charts themselves not much bigger.
  • by goynang (680067) on Wednesday January 25 2006, @05:59PM (#14562847)
    TITLE is more popular than BR as it's used to create the title of the page - the bit that appears in the browser's title bar. Just about every HTML document will have it. BR is just for line breaks and not necessarily needed (or even ideal in the days of CSS).

    So it's no real surprise that it is more popular.
  • Set-Cookie2 insecure? (Score:3, Interesting)

    by tedhiltonhead (654502) on Wednesday January 25 2006, @07:03PM (#14563304)
    The linked site claims the Set-Cookie header is "considered insecure":
    The Set-Cookie header (which is one of the ten most-used headers) is present on about two orders of magnitude more pages than the Set-Cookie2 header (despite the former being considered insecure).
    After glancing over the RFC [ietf.org] for Set-Cookie2, I can't see where it says Set-Cookie is "insecure". Google turns up nothing useful. Does anybody know more about this?
  • some handy titbits (Score:2)

    by pbhj (607776) on Wednesday January 25 2006, @07:49PM (#14563648)
    (http://alicious.com/ | Last Journal: Wednesday October 31, @07:36PM)
    This review is quite interesting (from a web dev's POV).

    There are also some handy little bits of info: Lists of most used attributes and tags could give an indication as to which tags Google will use and which will just be thrown out.

    Statements like "More pages use the completely worthless <meta name="revisit-after"> than use the <em> element!" appear to be dropped in on purpose as hints for less experienced devs. Similarly, "Next we have two name values: keywords, which these days is mostly useless" on http://code.google.com/webstats/2005-12/metadata.html [google.com] suggests that I can stop worrying that perhaps Google finds even a smidge of value in this data.

    Then there's bits like "One area of future study would be to see what these attributes are used for: is onunload used mostly by Web applications for legitimate purposes, or is it used more by hostile sites to show pop-unders?" which suggest that if you're using onunload legitimately your pagerank is about to take a nose dive!!?

    I'd not come across pingback and "link rev" before.

    Thanks for all the fish.
  • Fix for Firefox 1.5 (Score:3, Informative)

    by bigbadbuccidaddy (160676) on Wednesday January 25 2006, @08:10PM (#14563792)
    If your Firefox 1.5 doesn't display the graphs, or crashes, do the following as suggested by the Google webstats author:

    Apparently there's a problem in Firefox 1.5 regarding SVG images if you
    had SVG in the registry. Try following the steps described here:

          https://bugzilla.mozilla.org/show_bug.cgi?id=303581#c3 [mozilla.org]

  • I'm feeling violated (Score:3, Insightful)

    by Sontas (6747) on Friday January 27 2006, @01:23AM (#14576587)
    1 billion pages! Talk about a violation of privacy! The Justice Department is only asking for a random sample of 1 million addresses and the search results for any 1 week period. This guy gets access to 1 billion pages via the Google repository (whatever that is), conducts detailed analysis of the contents of those pages, and nary a word of dissent from the vast Slashdot audience.

  • Re:Strangely... (Score:1)

    by onedotzero (926558) on Wednesday January 25 2006, @04:08PM (#14561845)
    (http://www.thedigitalfeed.co.uk/)
    They showed up fine for me. I had to upgrade (installed version was 1.07) but they certainly loaded.
    [ Parent ]
  • Re:Strangely... (Score:1)

    by Maskull (636191) on Wednesday January 25 2006, @04:19PM (#14561927)
    (http://twicetwo.com/)
    Same here. A few show up, but most are blank. Suggestions, anyone?
    [ Parent ]
  • Re:Strangely... (Score:1)

    by jimwelch (309748) <.moc.liamg. .ta. .kohclewmij.> on Wednesday January 25 2006, @04:36PM (#14562117)
    (http://slashdot.org/ | Last Journal: Monday July 11 2005, @11:30AM)
    Working fine here (WinDoze version).
    [ Parent ]
  • Re:Firefox 1.5 (Score:1)

    by bigbadbuccidaddy (160676) on Wednesday January 25 2006, @04:40PM (#14562166)
    And IE6 + ASV6 (http://www.adobe.com/svg/viewer/install/beta.html [adobe.com]) doesn't work either. All the graphs are blank, and if I go directly to svg by url, I get a big black rectangle.

    I vote this as the worst use of svg on the internet.
    [ Parent ]
    • Re:Firefox 1.5 by LWATCDR (Score:2) Wednesday January 25 2006, @04:56PM
  • Re:Strangely... (Score:2)

    by Billosaur (927319) * <wgrotherNO@SPAMoptonline.net> on Wednesday January 25 2006, @04:52PM (#14562292)
    (Last Journal: Tuesday November 13, @10:52AM)
    (Score:2, Troll)

    Talk about knee-jerk moderation...

    [ Parent ]
    • Re:Strangely... by bigbadbuccidaddy (Score:1) Wednesday January 25 2006, @07:50PM
  • Re:Firefox 1.5 (Score:1)

    by bigbadbuccidaddy (160676) on Wednesday January 25 2006, @05:05PM (#14562427)
    The latest Opera shows the graphs as black rectangles as well.

    As does Squiggle, the Batik project's SVG browser.

    The only way I've successfully seen a graph is to view the source in IE, manually build the link to the svg, and go directly to the svg in the Firefox browser.
    [ Parent ]
    • Re:Firefox 1.5 by Kelson (Score:2) Wednesday January 25 2006, @06:07PM
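    The manual workaround above (view source, build the SVG link by hand, open it directly) can be automated with a short script. This is a sketch under assumptions: it guesses that the charts are embedded via <object data>, <embed src> or <img src> pointing at .svg files, the names are hypothetical, and the base URL in the test is a placeholder, not the report's real address.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Which attribute holds the resource URL for each embedding element (assumed set).
URL_ATTRIBUTE = {"object": "data", "embed": "src", "img": "src"}


class SvgLinkFinder(HTMLParser):
    """Collects absolute URLs of .svg files referenced from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_name = URL_ATTRIBUTE.get(tag)
        if attr_name is None:
            return
        ref = dict(attrs).get(attr_name)
        # Check the extension on the path only, ignoring any query string or fragment.
        if ref and urlsplit(ref).path.lower().endswith(".svg"):
            self.urls.append(urljoin(self.base_url, ref))


def find_svg_links(html, base_url):
    """Return absolute URLs for every .svg resource embedded in html."""
    finder = SvgLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.urls
```

    Each returned URL can then be opened directly in Firefox 1.5, which is the step that worked for the commenter.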
  • Re:Firefox 1.5 (Score:2, Interesting)

    by bigbadbuccidaddy (160676) on Wednesday January 25 2006, @05:19PM (#14562552)
    The black box is caused by them not using type="text/css" in the <?xml-stylesheet?> declaration. type is a required attribute. If I add it, the graphs render properly in all the SVG viewers I tried.
    [ Parent ]
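    The fix described above, adding type="text/css" to the xml-stylesheet declaration, can be applied to a saved copy of one of the chart files with a few lines of Python. This is a regex-based sketch, assuming the declaration is written as a single <?xml-stylesheet ...?> processing instruction; a sturdier tool would use a real XML parser, and the function name is hypothetical.

```python
import re

# One <?xml-stylesheet ...?> processing instruction; group 1 holds its pseudo-attributes.
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet(\s.*?)\?>", re.DOTALL)
# A "type=" pseudo-attribute at the start or after whitespace.
HAS_TYPE = re.compile(r"(?:^|\s)type\s*=")


def add_css_type(svg_text):
    """Insert type="text/css" into xml-stylesheet declarations that lack a type."""

    def fix(match):
        pseudo_attrs = match.group(1)
        if HAS_TYPE.search(pseudo_attrs):
            return match.group(0)  # already declares a type; leave it alone
        return '<?xml-stylesheet type="text/css"' + pseudo_attrs + "?>"

    return STYLESHEET_PI.sub(fix, svg_text)
```

    Declarations that already carry a type are returned unchanged, so running it twice over the same file is harmless.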
  • Re:Dumb (Score:5, Insightful)

    by Spad (470073) <slashdot.spad@co@uk> on Wednesday January 25 2006, @05:24PM (#14562594)
    (http://www.spad.co.uk/)
    It's even dumber to state that someone is presenting pictures with Flash when they're actually using SVG.
    [ Parent ]
  • Re:Worst use of SVG ever (Score:3, Funny)

    by jamesots (214246) on Wednesday January 25 2006, @05:45PM (#14562744)
    (http://jamesots.com/)
    Yeah, and what's the point of using HTML? They could have posted an image of the text to the same effect.
    [ Parent ]
  • Re:Firefox 1.5 (Score:2)

    by Kelson (129150) * on Wednesday January 25 2006, @05:51PM (#14562792)
    (http://www.hyperborea.org/journal/ | Last Journal: Tuesday September 11, @05:30PM)
    Works for me. Firefox 1.5 and Opera 9 preview both display the graphs.
    [ Parent ]
    • Re:Firefox 1.5 by bigbadbuccidaddy (Score:1) Wednesday January 25 2006, @07:08PM
      • Re:Firefox 1.5 by Kelson (Score:2) Wednesday January 25 2006, @07:15PM
        • Re:Firefox 1.5 by bigbadbuccidaddy (Score:1) Wednesday January 25 2006, @07:53PM
  • by hixie (116369) <ian@hixie.ch> on Wednesday January 25 2006, @09:08PM (#14564207)
    (http://ln.hixie.ch/)
    lol.
    [ Parent ]
  • by hixie (116369) <ian@hixie.ch> on Wednesday January 25 2006, @09:44PM (#14564439)
    (http://ln.hixie.ch/)
    It did. It's third on the "name" chart, fourth on the combined chart. Or did I misunderstand your question?
    [ Parent ]