Databases Programming Software IT

A New Data Model for the Web 54

An anonymous reader writes "Adam Bosworth delivered what could be considered a seminal lecture (mp3) at the last MySQL conference about a new data model for the web, why the plain HTML web succeeded, and why XQuery or the Semantic web are failures. He is emphatic that RSS 2.0/Atom are the next big thing and represent the new data model for the web. The audio is rather long at forty plus minutes and there are a few places where the talk has been covered."
This discussion has been archived. No new comments can be posted.

  • *sigh* (Score:3, Interesting)

    by Anonymous Coward on Wednesday July 27, 2005 @05:20AM (#13174563)
    Do we take two steps back every week in this industry or what? RSS is a text file format. It's not a "data model".

    What are the operators for manipulating this data? What is the type system? How is integrity guaranteed? How do I build a distributed database system with it?

    There is only one complete data model: the relational model. Demonstrate to me how this "new" data model is not either 1) some subset of the relational model or 2) a bunch of nonsense, not a data model at all.

    He's got one thing right: XQuery (return to the hierarchic databases of yesterday) and RDF (return to the network model, but with a fixed 3-value schema) are nothing to waste your time on.

    To me his assertions are like saying, for example, the fundamental theorems of electromagnetism no longer apply to cell phones because they can now play MP3s, or something. Makes no sense.

    Unfortunately, there is nobody left in this industry that has any clue about databases.
    • Do we take two steps back every week in this industry or what? RSS is a text file format. It's not a "data model".

      I think "data model" refers to distribution and the scalability of distribution in this case.

    • an aggregation model (Score:4, Interesting)

      by ear1grey ( 697747 ) on Wednesday July 27, 2005 @07:27AM (#13174909) Homepage

      The slashdot story mis-sells the content of the speech. For me [boakes.org] it was just AB talking about how it would be useful to have a simple system of aggregation that goes beyond subscribing to an RSS feed.

      It's not a new data model, and the semantic web has not failed; in fact, it's more important when considering how to work with the diverse resulting data.

    • Re:*sigh* (Score:3, Interesting)

      by malachid69 ( 306291 )

      There is only one complete data model: the relational model.

      I have an issue with this statement. It could be because I *hate* SQL, but, let's see what others think... According to http://www.google.com/search?hl=en&lr=lang_en&c2coff=1&oi=defmore&q=define:data+model [google.com] there are a few definitions for data model... among them are:

      • A data model is a collection of descriptions of data structures and their contained fields, together with the operations or functions that manipulate them.
      • A data
      • by Anonymous Coward
        SQL was/is an industry attempt at a language for relational databases. Codd (father of relational database theory) criticized weaknesses in the SQL implementation. Do not make the mistake of thinking "SQL = Relational".

        The language Tutorial-D in the article you refer to [techworld.com] is yet another language for relational databases! Darwen and Date are critics of SQL implementations; they are NOT critics of the relational database as you imply. They are instead the strongest relational database proponents.

        Indeed the rel

        • The language Tutorial-D in the article you refer to is yet another language for relational databases! Darwen and Date are critics of SQL implementations; they are NOT critics of the relational database as you imply. They are instead the strongest relational database proponents.

          Yes, I understand that Tutorial-D is an attempt to make a more-correct implementation. I was mostly referring to these two comments from said article:

          Relational databases as we know them today are, however, far from optimal - at

      • Re:*sigh* (Score:1, Informative)

        by Anonymous Coward
        SQL has very little to do with the relational *model*. Put it out of your mind completely when discussing database theory. I'm not talking about database *products*.

        The relational model specifies a set of relational operators, much like mathematical operators. SQL looks nothing like this: just to get the value of a single relvar you have to type "SELECT * FROM Foo" instead of the simpler and clearer "Foo". That's just the tip of the iceberg.
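To make the "operators, much like mathematical operators" point concrete, here is a minimal sketch, assuming a relation can be modelled as a Python set of rows. The helper names (`relation`, `restrict`, `project`, `join`) and the sample data are made up for illustration; this is not Tutorial D or any real product's API.

```python
# Hypothetical sketch: a relation is a set of rows, and each row is a
# frozenset of (attribute, value) pairs so that duplicate rows collapse.

def relation(*rows):
    """Build a relation from plain dicts."""
    return {frozenset(r.items()) for r in rows}

def restrict(rel, predicate):
    """Restriction (selection): keep the rows that satisfy the predicate."""
    return {row for row in rel if predicate(dict(row))}

def project(rel, attrs):
    """Projection: keep only the named attributes of every row."""
    return {frozenset((k, v) for k, v in row if k in attrs) for row in rel}

def join(left, right):
    """Natural join: pair rows that agree on every shared attribute."""
    result = set()
    for lrow in left:
        for rrow in right:
            ld, rd = dict(lrow), dict(rrow)
            shared = ld.keys() & rd.keys()
            if all(ld[k] == rd[k] for k in shared):
                result.add(frozenset({**ld, **rd}.items()))
    return result

# Made-up sample data.
foo = relation({"id": 1, "name": "ann"}, {"id": 2, "name": "bob"})
orders = relation({"id": 1, "item": "book"}, {"id": 1, "item": "pen"})

# The value of the relvar is just `foo`; operators compose like functions.
names = project(foo, {"name"})
anns_items = project(
    join(restrict(foo, lambda r: r["name"] == "ann"), orders), {"item"}
)
```

Because every operator takes relations and returns a relation, expressions nest freely, which is the property the comment says SQL obscures.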

        Anyway there are at least two definitions of "data model". One is
        • XML is a text file format.

          While true, that argument really has little merit. That is like saying MDB is a binary format. So what.

          Sometimes "customer" is the root of the tree, but sometimes you want "order item".

          Yes, and anyone who has worked with XML or object-oriented databases will show you that it is very simple to do. You have a Customer schema, and an Order schema. Big deal. You do the same thing with relational tables.

          Sometimes you want many-to-many relationships

          When you move away from the idea t

      • Some would even suggest that relational databases are NOT a good or optimal solution.

        Actually, the point behind Tutorial D, Rel, etc is that the current *implementations* of relational databases are broken ... from a mathematical view point. The relational concepts have been deviated from and that is the downfall of SQL as an implementation of relational databases.

        Of course, SQL survives (although broken and inconsistent) because it works today and it puts bread on the table... as opposed to a theoretical se
    • Strictly in the buzzword sense, RSS is becoming the new XML, in that people make it out to be a much bigger deal than it really is. Right around 2000-2001, business people were raving about XML being the "dawn of a new era", where technical people were thinking "it's just a damn text file with some markup". RSS is much the same, non-technical types think it's an entire new technology instead of what it is: a new way of doing essentially the same old stuff.
    • Yeah, I came away with a similar impression: Bosworth, or at least those touting his speech, are ignoring the truly significant differences between web searching and DB queries in these areas
      • performance...he talks about scaling as if DB size and number of users were the only dimensions of scaling issues...time is an important dimension as well, especially when you speak of concurrent update attempts.
      • databases support writing, not just searching, records and that entails levels of privilege beyond what the
  • Not really (Score:5, Interesting)

    by Linus Torvaalds ( 876626 ) on Wednesday July 27, 2005 @05:20AM (#13174565)

    He is emphatic that RSS 2.0/Atom are the next big thing and represent the new data model for the web.

    Here's the thing: RSS 2.0 and Atom really don't have a revolutionary data model. They are just file formats that list short descriptions, in a sequential order, with a bit of meta data, that get polled on a regular interval. That's all.
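As a rough illustration of how little machinery that description implies, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The feed text, titles, and URLs are placeholders, and `read_items` / `new_items` are hypothetical helper names; a real reader would fetch the document over HTTP on a timer.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document: a channel holding a sequential list of items.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Placeholder channel</description>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Newest entry</description>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Older entry</description>
    </item>
  </channel>
</rss>"""

def read_items(xml_text):
    """Return the feed's items, in document order, as plain dicts."""
    channel = ET.fromstring(xml_text).find("channel")
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]

def new_items(xml_text, seen_links):
    """One polling step: report items not seen before, then remember them."""
    fresh = [it for it in read_items(xml_text) if it["link"] not in seen_links]
    seen_links.update(it["link"] for it in fresh)
    return fresh

seen = set()
first_poll = new_items(FEED, seen)   # everything is new on the first poll
second_poll = new_items(FEED, seen)  # nothing changed, so nothing is reported
```

The whole "model" here is a list plus a remembered set of links, which fits the point that the usage pattern, not the format, is what is new.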

    They are only popular because the use pattern is different from that of normal web pages. The tech itself is pretty mundane. Internet Explorer 4.0 had something similar with "channels", way back in the 90s.

    You could have done the same thing with a subset of HTML 2.0 in the 90s. The main reasons people didn't are that they didn't think of it and the need wasn't as great.

    The Semantic Web, on the other hand, is doing new stuff. Some of it we don't know how to do yet. Some of it is immediately practical, some of it isn't. The Semantic Web is more of an idea than a tangible product.

    By saying that RSS and Atom somehow "beat" the Semantic Web, he's comparing apples to oranges. It just doesn't make sense.

    The reason the web took off so well was because it was built from a few simple principles that could be generalised. Resources that could be addressed. Simple, text-based markup. Simple, text-based protocol.

    The Semantic Web will probably take off in the same way, with various bits already being used to varying degrees of success (e.g. Mozilla already uses RDF). But it's a much bigger problem, so expecting it to take off just as quickly is naive.

    • Re:Not really (Score:4, Informative)

      by MoonFog ( 586818 ) on Wednesday July 27, 2005 @05:44AM (#13174628)
      Semantic web is perhaps best described as a framework. I totally agree that it's a pointless comparison. RDF/RDFS and OWL build upon XML so it would make more sense to say that RSS could be a building block for further extensions on the semantic web using for example OWL to represent data.

      One of the reasons it appears to move along so slowly now is that the research is handling a lot of issues and as van Harmelen has said, they're afraid to enter the same pitfalls as the research in artificial intelligence where there has been a lot of buzz, but not many concrete results. That's not to say that there aren't any issues with the semantic web, but it's still coming along. OWL is being extended with OWL-S and OWL-QL and the issues of security and privacy are being looked at. Besides, even though ontologies are a new development on the web, they are nothing new overall, something I guess AI researchers can testify to.

      Recommended book for those who want to extend their knowledge of the SW: A Semantic Web Primer [amazon.com]
      • In any case there is a lot of work going on in creating OWL models for Atom [bblfish.net]. The two are not at all incompatible - and with a little more work they could have been nearly indistinguishable.
        Another really good book that covers all the bases is Service Oriented Computing [bblfish.net] which gives a very good view as to how the Semantic Web, Agents, Web Services and RESTful apis fit together. This is a really serious book, but it helps get an understanding of the problems that are attempting to be solved.
      • Therein lies the problem: the SW is a framework, which means what, exactly? Sounds good; nerds love it. What does it do for me?

        RSS/Atom is a product. I can see immediately that it is, or is not for me. The SW is just ideas. Good ideas, but nothing in the sack.

        • Therein lies the problem: the SW is a framework, which means what, exactly? Sounds good; nerds love it. What does it do for me?

          Currently? Not much, it's still fairly new and being developed all the time. XML is an integral part of the semantic web and you use that no?

          RSS/Atom is a product. I can see immediately that it is, or is not for me. The SW is just ideas. Good ideas, but nothing in the sack.

          Unfortunately, it's still in the starting blocks, but the plans have always been to take it step by ste
          • XML has nothing to do with the semantic web, except as one possible implementation. The semantic web is the idea of putting machine parsable meta-data into documents, so that machines can parse it out and understand what type of content is in the data. XML is a way of doing this. But calling it an integral part of the SW is like calling HTTP an integral part of it. It's a technology it can use, but is just one way of implementing it.

            As for using XML- actually, I don't think I've looked at any XML in wel
    • Nevertheless, for the masses in their office cubicles RSS feeds are the next "big thing".

      Try telling the masses that the next big thing is a new data model for the web, based on semantics, and 99% of them will ask you what "semantic" means, never mind the intangible data model that is the real underlying improvement.

      Show them a little program that sits on their desktop and feeds them the latest from CNN, the BBC etc and they understand that.

      Web development and IT in general is running a real risk of falling
      • "Try telling the masses that the next big thing is a new data model for the web, based on semantics, and 99% of them will ask you what "semantic" means, never mind the intangible data model that is the real underlying improvement"

        Actually, it's a quite logical question to ask. Research projects without any discernable end or application are often indistinguishable from Bullshit.
        • Especially for non-specialists in a technical area. I'm not saying esoteric research rarely has value; in our field it's usually the most valuable.
          Now the IT field is "mature"(-ish), it's getting a lot more exposure. As developers we need to be presenting simple (not trivial) but elegant little demos to people built on top of whatever the latest great "model" is and then asking for money and contributions.
          A lot of otherwise promising projects are doing it the other way round...
          "We/I've got this great frame
  • by Felonius Thunk ( 168604 ) on Wednesday July 27, 2005 @05:38AM (#13174612) Journal
    I'm downloading the speech now, but if it's anything like this great speech [adambosworth.net] he gave last year, it will be well worth listening to. That one changed my mind about what great things might look like. I've realized the great and wonderful content management system that my group is building is utterly doomed, for example, and I already have a new job in hand. It's all about the sloppiness.
  • Heh heh... (Score:1, Funny)

    by Anonymous Coward
    Heh heh heh... He said 'seminal'...
  • by Vo0k ( 760020 ) on Wednesday July 27, 2005 @06:56AM (#13174814) Journal
    There's way more to successful formats than the structure. But let me name two essentials.
    What use is a format of data if the data itself is useless?
    How can a format take off when only few have access to publishing in it?
    That's the way Gopher went. Only admins could add pages. Meanwhile, most people with access to the net were able to create their own ~/public_html.
    Now RSS is the big thing. People add RSS to everything. Where are MSIE's "channels"? Spamvertisement available to the chosen few. Revolutionary video tape technologies competitive with VHS: None in shops, few movies available. And so on, and so on...
  • by astrashe ( 7452 ) on Wednesday July 27, 2005 @07:36AM (#13174929) Journal
    This is a great talk, and I really enjoyed it, but I'm not sure I buy it.

    I haven't really digested the talk, so maybe that's why. But this is my gut reaction against what he's saying.

    I don't think that geeks fully acknowledge the role of what I think of as bibliography in the web ecosystem.

    I was an English major. Let's say that you want to learn about Faulkner. If you go to the card catalogue, and search for books about Faulkner, you get a lot of hits -- more books than you could ever read. It's essentially useless.

    What you really need is a bibliography -- something written by a Faulkner scholar who says "these are the really important and groundbreaking books about Faulkner." That's one of the cool things about Encyclopedia Britannica -- at the end of their articles, they tend to give you a run down of some of the key books on the subject.

    So if you want to read a biography of George Washington, EB will let you find the right one. That's important, because there are so many biographies of George Washington out there.

    That's my key point. If you go to a university library and use the catalogue to do a mechanical search for books about George Washington, the results aren't very useful. But if you read the bibliography at the end of the Encyclopedia Britannica article, it's extremely useful.

    I'm trying to draw a distinction between mechanical searches, on one hand, and selections based on human judgement on the other.

    Google is useful in large part, I think, because page rank lets you find what are essentially good bibliography pages. You use a dumb mechanical search to put you in touch with people who know their subjects and who have good judgement (hopefully).

    The other day, for example, I was thinking about an old programming language called APL. I searched for it, and found a couple of pages that seemed to have collected just about everything APL -- anecdotes, personal histories, tutorials, implementations, pictures of the goofy APL keyboards, etc.

    The Google powered web is cool because it combines the mechanical and the bibliographic so well. Google gets me to the bibliography -- it pulls that needle out of the haystack. But it's the bibliography that lets me drill down.

    This is important. The really good stuff I read about APL didn't come directly from the actual google result page. There was a link in between -- the google result page took me to the APL bibliography page, and from there I was able to hit the meat of the matter.

    We've seen, over the past decade, an explosion in what mechanical searching can do. Because it's been getting so much better so quickly, it's dominating the way we think about how we find information. It's causing us to give bibliography -- the judgement of experts -- short shrift.

    But bibliography is absolutely key to the google ecosystem.

    My problem with attempts to impose more structure on data is that it always breaks things. It's beefing up mechanical searches, which are already very good, and it does it at the expense of bibliography.

    I buy the argument in this lecture more than the guy making it does. He complains about heavier structures, and how the complexity will prevent people from producing and consuming information. I think that almost any move away from what we have now will do the same thing. The more you structure information, the harder it is for people to provide bibliography.

    The point is that the ideal medium for bibliography is free form -- one person saying, "this is what I think" to another.

    The genius of google is that page rank gives you a mechanical way to uncover the best bibliographies. The best ones tend to show up at the top of the results.

    In the old days, there was alta vista, and there was yahoo. Yahoo used human beings to categorize data manually. They'd put sunglasses next to the best sites in many categories -- flag something as a "cool site". Alta vista was pure mechanical searching, with no human judg
    • I wholeheartedly agree.

      Honestly, I don't know much about the Semantic Web, but I have my doubts. In addition to its mechanical nature, I suspect the Semantic Web may eventually be plagued by abuse: search engine optimization. HTML is presentable so even if it's being abused with SEO, the human can verify whether it's crap or not. Can the machine do that?

      That's where the search engine comes in. Something that sifts thru the available data and presents the tidbits that are ripe for picking.

      I predict

    • Don't you think that Google itself is functioning like a bibliography? The important pages, the ones most worth seeing, are likely to be the most linked-to, and so appear at the top of the list. The rating is done by every web site creator, and the collation by Google; doesn't that make PageRank effectively a bibliographic tool?
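The "most linked-to pages rise to the top" idea can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not Google's actual implementation: the damping factor of 0.85, the iteration count, and the tiny link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: three pages all cite one bibliography page.
links = {
    "apl-bibliography": ["tutorial", "history"],
    "tutorial": ["apl-bibliography"],
    "history": ["apl-bibliography"],
    "blog": ["apl-bibliography"],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

In this toy graph the page that everyone links to ends up with the highest score, which is the sense in which each link acts as a small bibliographic vote.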

      • The important pages, the ones most worth seeing, are likely to be the most linked-to...
        In the days before comment and referral spam, that might have been true. It remains to be seen if rel="nofollow" (a semantic annotation of sorts) proves successful in re-invigorating the importance of the simple link.
  • Great (Score:4, Funny)

    by Mensa Babe ( 675349 ) on Wednesday July 27, 2005 @07:38AM (#13174935) Homepage Journal
    A new data model?

    Couldn't we please focus on implementing the old [wikipedia.org] data [wikipedia.org] model [wikipedia.org] correctly [google.com] first?
  • by tod_miller ( 792541 ) on Wednesday July 27, 2005 @07:58AM (#13174989) Journal
    or is it just me? I know it is hard to predict the way technology is going, the only reason HTML still is around is because it works, and was widely adopted, and nothing else gives any [real] benefits (for now).

    as far as I am concerned, however you split up content, style, updates, 'sitefiles' (my collective analogue for rss and related technologies) the fact is one coherent, styled document must be the end result.

    Too much is being read into content management and RSS. Yes RSS is cute, I use it to have a BBC and CNN link in my firefox, and I just need one click to read articles, not go to the site.

    RSS and podcasting is the worst combination of not-new hype ever. Downloading a file through the web, wow new! :-)

    Seriously, pod casting should be renamed downloading audio.
    • or is it just me? I know it is hard to predict the way technology is going, the only reason HTML still is around is because it works, and was widely adopted, and nothing else gives any [real] benefits (for now).

      Most developers who used to build "rich client" apps generally agree that web-based GUIs are a pain in the ass and lack decent cross-browser widgets such as editable data grids, collapsable outline/trees, combo boxes, and others. Most companies like web apps because they are much easier to depl
  • I didn't understand why he said that? I've always heard it was good to put all your logic into the DB.

    Would anyone care to explain that a little? And please dumb it down a lot, I'm not that smart in databases.
    • It's more a case of maintainability.

      If "doesn't scale" simply means "I need more proc/mem/disk" you can always throw more horsepower at the problem, but that shifts the solution to a question of how much money you have to spend on toys. That's not what I'm guessing he's referring to, though.

      Without listening to or reading the presentation, I assume he's talking about the standard n-tier development/deployment model. Keep your presentation layer, business logic layer and data layer separate so that you

      • I suspect it depends on what you mean by "business logic".

        Some argue that you shouldn't even put foreign key constraints in your database... the app can handle that for you and it'll make it faster.

        Others argue that it is key to maintain the integrity of your data. If this means putting lots of logic in the database in the form of procs, views, triggers, etc... that's what you need to do. Better to normalize and have accurate data than to denormalize and have speed.

        It all depends on what your needs are.
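For the "integrity in the database" side of that trade-off, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and the `try_orphan_order` helper are made up for illustration; the point is only that the database itself rejects the bad row, whatever the application does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'ann')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")  # valid reference

def try_orphan_order(connection):
    """Try to insert an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO orders (id, customer_id) VALUES (11, 999)")
    except sqlite3.IntegrityError:
        return False  # the constraint refused the row
    return True

orphan_accepted = try_orphan_order(conn)
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Dropping the `REFERENCES` clause would let the orphan row in and move the check into every application that writes to the table, which is the speed-versus-accuracy choice being argued above.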
      • If "doesn't scale" simply means "I need more proc/mem/disk" you can always throw more horsepower at the problem, but that shifts the solution to a question of how much money you have to spend on toys.

        Let's give a concrete example to this. Consider a hypothetical 2 tier system with a thin client talking to a centralized database where all the business logic is being handled. This system gets deployed on a top-of-the-line Sun Fire server running Oracle. The system is successful and its usage grows rapidly.

    • As I understand him, putting all your eggs in the database basket limits you to the speed of your database server.

      This is, of course, pure bunk because Google does exactly this and Google scales well. Difference is the money available to you, Web programmer, and Google, Web moneybags. It's bunkum, but very wise bunkum nonetheless, unless you have a billionaire uncle who signs documents without reading them.

      The workaround for the limits of your database is, Adam claims, to share your data in RSS/Atom fee

    • I think he meant that a centralized approach doesn't support the logic of *all users*. You can put *your* logic in your Flickr-web-images DB, but would you also merge in it all the latest scripts developed by your users on top of your web app?

      No, you publish an API and other people use that API to access your contents. The logic of *their* web applications is in their sites, not in your DB.
  • The average Slashdot story links to a 2-5 minute article, and most people don't even bother to read that before they post a comment. Since this story links to a 40-minute MP3 that no one will bother listening to, the comments page should be an interesting read...
  • Very skeptical (Score:2, Insightful)

    by Exaton ( 523551 )
    Sorry, I trust Sir Tim Berners-Lee more than I trust "Adam Bosworth".

    That guy can start by learning how to add some <br />'s in what he writes (go check out his blog -- horrendous !) before pretending to talk about Web fundamentals.
    • I'm sure if you asked Tim Berners-Lee about Web fundamentals he would tell you not to use <br /> as it is not semantic markup and to use paragraphs instead - like the blog in question does.

      What does understanding Web fundamentals have to do with someone using excessive paragraph lengths? That's bad writing rather than bad markup.
      • OK, score one for you :-) I never use line breaks apart from what PHP's nl2br() will generate, so I wasn't careful with my remark.

        Mea culpa, and all that ^_^;

  • by Anonymous Coward
    but with
    • no standardized means of replying/interacting,
    • no means of maintaining topicality,
    • no means of adding attachments,
    • no bonafide archive,
    • poor performance,
    • egos that want to control responses to their posts.

    NNTP is an irreplaceable source of technical information. In contrast the world wouldn't skip a beat if all RSS feeds stopped tomorrow.

  • In the speech, Adam Bosworth predicted that "RSS 2.0 and Atom will be the lingua franca that will be used to consume all data from everywhere" because they "are simple formats that are sloppily extensible."
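One way to read "sloppily extensible" is that a consumer can ignore elements it does not recognise. The sketch below assumes a made-up extension namespace (`http://example.com/ns/ext`) and made-up element names (`ext:resultCount`, `ext:score`); it is not the OpenSearch or Findory format, only an illustration of the idea using Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

EXT_NS = "http://example.com/ns/ext"  # hypothetical extension namespace

# A made-up RSS 2.0 feed carrying extra, namespaced elements.
FEED = f"""<rss version="2.0" xmlns:ext="{EXT_NS}">
  <channel>
    <title>Search results</title>
    <link>http://example.com/search</link>
    <description>Placeholder results feed</description>
    <ext:resultCount>1</ext:resultCount>
    <item>
      <title>Result one</title>
      <link>http://example.com/1</link>
      <ext:score>0.9</ext:score>
    </item>
  </channel>
</rss>"""

def core_titles(xml_text):
    """A plain RSS consumer: reads the elements it knows and skips the rest."""
    channel = ET.fromstring(xml_text).find("channel")
    return [item.findtext("title") for item in channel.findall("item")]

def extension_values(xml_text):
    """An extension-aware consumer: also reads the hypothetical ext:* elements."""
    ns = {"ext": EXT_NS}
    channel = ET.fromstring(xml_text).find("channel")
    count = channel.findtext("ext:resultCount", namespaces=ns)
    scores = [
        item.findtext("ext:score", namespaces=ns)
        for item in channel.findall("item")
    ]
    return count, scores

titles = core_titles(FEED)
count, scores = extension_values(FEED)
```

Both consumers work against the same document, so a publisher can add fields without breaking readers that have never heard of them.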

    It's true that many seem to be moving in this direction. For example, A9's OpenSearch [a9.com] is a simple extension to RSS. The Findory API [findory.com] offers simple, RSS-based access to news and blog search results. Yahoo offers a few services through the more complex Yahoo APIs [yahoo.net], but offers many more through Yahoo R [yahoo.net]
  • Is he also played by Brent Spiner? I hope he doesn't have that puss chip.

    Is there a Dweeb mod point?
