A New Data Model for the Web
An anonymous reader writes "Adam Bosworth delivered what could be considered a seminal lecture (mp3) at the last MySQL conference about a new data model for the web, why the plain HTML web succeeded, and why XQuery or the Semantic web are failures. He is emphatic that RSS 2.0/Atom are the next big thing and represent the new data model for the web. The audio is rather long at forty plus minutes and there are a few places where the talk has been covered."
*sigh* (Score:3, Interesting)
What are the operators for manipulating this data? What is the type system? How is integrity guaranteed? How do I build a distributed database system with it?
There is only one complete data model: the relational model. Demonstrate to me how this "new" data model is not either 1) some subset of the relational model or 2) a bunch of nonsense, not a data model at all.
He's got one thing right: XQuery (return to the hierarchic databases of yesterday) and RDF (return to the network model, but with a fixed 3-value schema) are nothing to waste your time on.
To me his assertions are like saying, for example, the fundamental theorems of electromagnetism no longer apply to cell phones because they can now play MP3s, or something. Makes no sense.
Unfortunately, there is nobody left in this industry that has any clue about databases.
Re:*sigh* (Score:1)
I think "data model" refers to distribution and the scalability of distribution in this case.
an aggregation model (Score:4, Interesting)
The slashdot story mis-sells the content of the speech. For me [boakes.org] it was just AB talking about how it would be useful to have a simple system of aggregation that goes beyond subscribing to an RSS feed.
It's not a new data model, and the Semantic Web has not failed; in fact, it becomes more important when you consider how to work with the diverse data that results.
Re:*sigh* (Score:3, Interesting)
I have an issue with this statement. It could be because I *hate* SQL, but let's see what others think... According to http://www.google.com/search?hl=en&lr=lang_en&c2coff=1&oi=defmore&q=define:data+model [google.com] there are a few definitions for data model... among them are:
SQL Is Not Relational ( & incorrect citation) (Score:2, Informative)
The language Tutorial-D in the article you refer to [techworld.com] is yet another language for relational databases! Darwen and Date are critics of SQL implementations; they are NOT critics of the relational database as you imply. They are instead the strongest relational database proponents.
Indeed the rel
Re:SQL Is Not Relational ( & incorrect citatio (Score:2)
Yes, I understand that Tutorial-D is an attempt to make a more-correct implementation. I was mostly referring to these two comments from said article:
Re:*sigh* (Score:1, Informative)
The relational model specifies a set of relational operators, much like mathematical operators. SQL looks nothing like this: just to get the value of a single relvar you have to type "SELECT * FROM Foo" instead of the simpler and clearer "Foo". That's just the tip of the iceberg.
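To make the "operators, much like mathematical operators" point concrete, here is a minimal sketch in Python, not Tutorial D and not any real DBMS. It assumes a relation can be modelled as a set of rows; the helper names (relation, restrict, project, join) and the sample data are made up for illustration.

```python
# Hypothetical sketch: a relation is a frozenset of rows, and each row is a
# frozenset of (attribute, value) pairs. None of these helper names come from
# the thread or from any real DBMS.

def relation(*rows):
    """Build a relation from dicts; duplicate rows collapse, as in a set."""
    return frozenset(frozenset(row.items()) for row in rows)

def restrict(rel, predicate):
    """Keep only the rows for which predicate(row_as_dict) is true."""
    return frozenset(row for row in rel if predicate(dict(row)))

def project(rel, *attrs):
    """Keep only the named attributes; duplicates collapse again."""
    return frozenset(
        frozenset((k, v) for k, v in row if k in attrs) for row in rel
    )

def join(left, right):
    """Natural join: merge rows that agree on every shared attribute."""
    out = set()
    for l_row in left:
        l = dict(l_row)
        for r_row in right:
            r = dict(r_row)
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.add(frozenset({**l, **r}.items()))
    return frozenset(out)

foo = relation({"id": 1, "name": "a"}, {"id": 2, "name": "b"})
bar = relation({"id": 1, "city": "x"})

everything = foo                      # the value of the relvar is just "foo"
named_b = restrict(foo, lambda row: row["name"] == "b")
names = project(foo, "name")
foo_with_city = join(foo, bar)
```

Every operator takes relations and returns a relation, so expressions nest freely, which is roughly the property the comment says SQL obscures.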
Anyway there are at least two definitions of "data model". One is
Re:*sigh* (Score:2)
Re:*sigh* (Score:2)
Actually, the point behind Tutorial D, Rel, etc is that the current *implementations* of relational databases are broken
Of course, SQL survives (although broken and inconsistent) because it works today and it puts bread on the table... as opposed to a theoretical se
Re:*sigh* (Score:2)
Re:*sigh* (Score:2)
Not really (Score:5, Interesting)
He is emphatic that RSS 2.0/Atom are the next big thing and represent the new data model for the web.
Here's the thing: RSS 2.0 and Atom really don't have a revolutionary data model. They are just file formats that list short descriptions, in a sequential order, with a bit of meta data, that get polled on a regular interval. That's all.
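For what it's worth, that description fits in a few lines of code. The sketch below uses Python's standard xml.etree.ElementTree on a made-up feed; the feed contents, the example.org URLs, and the function names read_items and new_items are assumptions for illustration only, and a real reader would fetch the document over HTTP on a timer.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document: a channel with a bit of metadata and a
# sequential list of short item descriptions.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.org/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
      <description>Short description of the second post.</description>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
      <description>Short description of the first post.</description>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Return the feed's items, in document order, as plain dicts."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]

def new_items(feed_xml, seen_links):
    """One polling step: return unseen items and remember their links."""
    fresh = [i for i in read_items(feed_xml) if i["link"] not in seen_links]
    seen_links.update(i["link"] for i in fresh)
    return fresh

seen = set()
first_poll = new_items(SAMPLE_FEED, seen)   # both items are new
second_poll = new_items(SAMPLE_FEED, seen)  # nothing changed, nothing new
```

The only state the client keeps is the set of links it has already seen, which is about as mundane as a data model gets.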
They are only popular because the use pattern is different to normal web pages. The tech itself is pretty mundane. Internet Explorer 4.0 had something similar with "channels", way back in the 90s.
You could have done the same thing with a subset of HTML 2.0 in the 90s. The main reasons people didn't are that they didn't think of it and the need wasn't as great.
The Semantic Web, on the other hand, is doing new stuff. Some of it we don't know how to do yet. Some of it is immediately practical, some of it isn't. The Semantic Web is more of an idea than a tangible product.
By saying that RSS and Atom somehow "beat" the Semantic Web, he's comparing apples to oranges. It just doesn't make sense.
The reason the web took off so well was because it was built from a few simple principles that could be generalised. Resources that could be addressed. Simple, text-based markup. Simple, text-based protocol.
The Semantic Web will probably take off in the same way, with various bits already being used to varying degrees of success (e.g. Mozilla already uses RDF). But it's a much bigger problem, so expecting it to take off just as quickly is naive.
Re:Not really (Score:4, Informative)
One of the reasons it appears to move along so slowly now is that the research is handling a lot of issues and, as van Harmelen has said, they're afraid to enter the same pitfalls as the research in artificial intelligence, where there has been a lot of buzz but not many concrete results. That's not to say that there aren't any issues with the semantic web, but it's still coming along. OWL is being extended with OWL-S and OWL-QL, and the issues of security and privacy are being looked at. Besides, even though ontologies are a new development on the web, they are nothing new overall, something I guess AI researchers can testify to.
Recommended book for those who want to extend their knowledge of the SW: A Semantic Web Primer [amazon.com]
Re:Not really (Score:1)
Another really good book that covers all the bases is Service Oriented Computing [bblfish.net] which gives a very good view as to how the Semantic Web, Agents, Web Services and RESTful apis fit together. This is a really serious book, but it helps get an understanding of the problems that are attempting to be solved.
Re:Not really (Score:2)
RSS/Atom is a product. I can see immediately that it is, or is not for me. The SW is just ideas. Good ideas, but nothing in the sack.
Re:Not really (Score:2)
Currently? Not much, it's still fairly new and being developed all the time. XML is an integral part of the semantic web, and you use that, no?
RSS/Atom is a product. I can see immediately that it is, or is not for me. The SW is just ideas. Good ideas, but nothing in the sack.
Unfortunately, it's still in the starting blocks, but the plans have always been to take it step by ste
Re:Not really (Score:2)
As for using XML- actually, I don't think I've looked at any XML in wel
Does it matter (Re:Not really)? (Score:1)
Try telling the masses that the next big thing is a new data model for the web, based on semantics, and 99% of them will ask you what "semantic" means, never mind the intangible data model that is the real underlying improvement.
Show them a little program that sits on their desktop and feeds them the latest from CNN, the BBC etc and they understand that.
Web development and IT in general is running a real risk of falling
Re:Does it matter (Re:Not really)? (Score:2)
Actually, it's a quite logical question to ask. Research projects without any discernable end or application are often indistinguishable from Bullshit.
Re:Does it matter (Re:Not really)? (Score:1)
Now the IT field is "mature"(-ish), it's getting a lot more exposure. As developers we need to be presenting simple (not trivial) but elegant little demos to people built on top of whatever the latest great "model" is and then asking for money and contributions.
A lot of otherwise promising projects are doing it the other way round...
"We/I've got this great frame
A war between the humans and computer scientists (Score:4, Insightful)
Re:Just listened to the whole thing (Score:3, Insightful)
Use appropriate character-encoding and -decoding at I/O-borders.
Finished.
Everyone who is not able to do these things correctly by hand or to make his script output correct XML should continue flipping burgers and does not belong in this industry.
What kind of Kindergarten is IT turning into?
Fuck.
Re:Just listened to the whole thing (Score:1)
Re:Just listened to the whole thing (Score:5, Insightful)
Miss a tag in XML, sorry, no rendering today. The result? No-one writes XML by hand
Actually, it works the other way around. Because syntax errors are immediately obvious when writing XML, it's a lot easier to write by hand, because when you make a mistake, you notice it straight away.
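A small illustration of "you notice it straight away", using Python's standard xml.etree.ElementTree rather than whatever parser either poster had in mind. The function name check_xml and the two sample documents are made up for this sketch.

```python
import xml.etree.ElementTree as ET

def check_xml(text):
    """Return None if text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return (line, column, str(err))
    return None

good = "<foo><bar>My XML Document</bar></foo>"
bad = "<foo><bar>My XML Document</foo>"  # the closing </bar> is missing

good_result = check_xml(good)
bad_result = check_xml(bad)
```

Here good_result is None, while bad_result carries the line and column where parsing stopped, so the mistake is pinpointed instead of being silently rendered around.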
The reason why so many people use libraries with XML is because it's a standard format with libraries for practically every language. Using a library often saves time compared with writing stuff by hand.
that means your average Perl, Python, PHP coder will actually have to read some docs or a specification to remember how to output this stuff so they just won't bother.
Rubbish. They'll do exactly what they did to learn how to generate HTML - look at a few examples and make their own that looks like the example. <?php echo('<foo>My XML Document</foo>'); ?> is no harder than <?php echo('<h1>My HTML Document</h1>'); ?>
Bosworth says that's why RSS 2.0 beats the pants off RSS 1.0, anyone can create these files and the freely available libraries that handle this stuff are really really fault tolerant.
Both RSS 1.0 and RSS 2.0 use XML syntax and have freely available libraries anybody can use. But didn't you just say that nobody will bother using XML formats because people won't read the documentation that tells them how to use such libraries?
Re:Just listened to the whole thing (Score:2)
My own XML parser has error reporting and recovery (if hardly anything else). Sorry to hear that yours is so broken. Try a different one.
Java dies on you if you miss a curly brace, yet people keep on using it.
Heh heh... (Score:1, Funny)
Content, Availability... (Score:4, Insightful)
What use is a format of data if the data itself is useless?
How can a format take off when only few have access to publishing in it?
That's the way Gopher went. Only admins could add pages. Meanwhile, most people with access to the net were able to create their own ~/public_html.
Now RSS is the big thing. People add RSS to everything. Where are MSIE's "channels"? Spamvertisement available to the chosen few. Revolutionary video tape technologies competitive with VHS: none in shops, few movies available. And so on, and so on...
Re:oblg. extra linkage (Score:1)
Put your money where your ears are... and consider donating. [itconversations.com] Registration for IT Conversations is free it seems, but bandwidth isn't.
Really enjoyed, but not sure I buy (Score:5, Insightful)
I haven't really digested the talk, so maybe that's why. But this is my gut reaction against what he's saying.
I don't think that geeks fully acknowledge the role of what I think of as bibliography in the web ecosystem.
I was an English major. Let's say that you want to learn about Faulkner. If you go to the card catalogue, and search for books about Faulkner, you get a lot of hits -- more books than you could ever read. It's essentially useless.
What you really need is a bibliography -- something written by a Faulkner scholar who says "these are the really important and groundbreaking books about Faulkner." That's one of the cool things about Encyclopedia Britannica -- at the end of their articles, they tend to give you a rundown of some of the key books on the subject.
So if you want to read a biography of George Washington, EB will let you find the right one. That's important, because there are so many biographies of George Washington out there.
That's my key point. If you go to a university library and use the catalogue to do a mechanical search for books about George Washington, the results aren't very useful. But if you read the bibliography at the end of the Encyclopedia Britannica article, it's extremely useful.
I'm trying to draw a distinction between mechanical searches, on one hand, and selections based on human judgement on the other.
Google is useful in large part, I think, because page rank lets you find what are essentially good bibliography pages. You use a dumb mechanical search to put you in touch with people who know their subjects and who have good judgement (hopefully).
The other day, for example, I was thinking about an old programming language called APL. I searched for it, and found a couple of pages that seemed to have collected just about everything APL -- anecdotes, personal histories, tutorials, implementations, pictures of the goofy APL keyboards, etc.
The Google powered web is cool because it combines the mechanical and the bibliographic so well. Google gets me to the bibliography -- it pulls that needle out of the haystack. But it's the bibliography that lets me drill down.
This is important. The really good stuff I read about APL didn't come directly from the actual google result page. There was a link in between -- the google result page took me to the APL bibliography page, and from there I was able to hit the meat of the matter.
We've seen, over the past decade, an explosion in what mechanical searching can do. Because it's been getting so much better so quickly, it's dominating the way we think about how we find information. It's causing us to give bibliography -- the judgement of experts -- short shrift.
But bibliography is absolutely key to the google ecosystem.
My problem with attempts to impose more structure on data is that it always breaks things. It's beefing up mechanical searches, which are already very good, and it does it at the expense of bibliography.
I buy the argument in this lecture more than the guy making it does. He complains about heavier structures, and how the complexity will prevent people from producing and consuming information. I think that almost any move away from what we have now will do the same thing. The more you structure information, the harder it is for people to provide bibliography.
The point is that the ideal medium for bibliography is free form -- one person saying, "this is what I think" to another.
The genius of google is that page rank gives you a mechanical way to uncover the best bibliographies. The best ones tend to show up at the top of the results.
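To show what "mechanical" means here, this is a toy power-iteration sketch of the published PageRank idea, not Google's actual system. The four-page link graph, the page names, and the damping value of 0.85 are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of page -> outlinks."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical mini-web: three pages all link to one bibliography page.
toy_web = {
    "apl-bibliography": ["apl-tutorial", "apl-history"],
    "apl-tutorial": ["apl-bibliography"],
    "apl-history": ["apl-bibliography"],
    "random-blog": ["apl-bibliography"],
}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

In this toy graph the page that everyone links to comes out on top, which is the sense in which free-form human linking feeds the mechanical ranking.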
In the old days, there was alta vista, and there was yahoo. Yahoo used human beings to categorize data manually. They'd put sunglasses next to the best sites in many categories -- flag something as a "cool site". Alta vista was pure mechanical searching, with no human judg
Re:Really enjoyed, but not sure I buy (Score:1)
Honestly, I don't know much about the Semantic Web, but I have my doubts. In addition to its mechanical nature, I suspect the Semantic Web may eventually be plagued by abuse: search engine optimization. HTML is presentable so even if it's being abused with SEO, the human can verify whether it's crap or not. Can the machine do that?
That's where the search engine comes in. Something that sifts thru the available data and presents the tidbits that are ripe for picking.
I predict
Re: Really enjoyed, but not sure I buy (Score:3, Insightful)
Spam killed the simple pagerank (Score:1)
Great (Score:4, Funny)
Couldn't we please focus on implementing the old [wikipedia.org] data [wikipedia.org] model [wikipedia.org] correctly [google.com] first?
a case of capt. obvious? (Score:5, Insightful)
As far as I am concerned, however you split up content, style, updates, 'sitefiles' (my collective analogue for RSS and related technologies), the fact is that one coherent, styled document must be the end result.
Too much is being read into content management and RSS. Yes, RSS is cute: I use it to have a BBC and a CNN link in my Firefox, and I need just one click to read articles, without going to the site.
RSS plus podcasting is the worst combination of not-new hype ever. Downloading a file through the web, wow, new!
Seriously, pod casting should be renamed downloading audio.
Re:a case of capt. obvious? (Score:1)
Most developers who used to build "rich client" apps generally agree that web-based GUIs are a pain in the ass and lack decent cross-browser widgets such as editable data grids, collapsible outline/trees, combo boxes, and others. Most companies like web apps because they are much easier to depl
Centralized database logic doesn't scale? (Score:2)
Would anyone care to explain that a little? And please dumb it down a lot, I'm not that smart in databases.
Re:Centralized database logic doesn't scale? (Score:1)
If "doesn't scale" simply means "I need more proc/mem/disk" you can always throw more horsepower at the problem, but that shifts the solution to a question of how much money you have to spend on toys. That's not what I'm guessing he's referring to, though.
Without listening to or reading the presentation, I assume he's talking about the standard n-tier development/deployment model. Keep your presentation layer, business logic layer and data layer separate so that you
Re:Centralized database logic doesn't scale? (Score:2)
Some argue that you shouldn't even put foreign key constraints in your database... the app can handle that for you and it'll make it faster.
Others argue that it is key to maintain the integrity of your data. If this means putting lots of logic in the database in the form of procs, views, triggers, etc... that's what you need to do. Better to normalize and have accurate data than to denormalize and have speed.
It all depends on what your needs are.
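As a rough sketch of the "integrity in the database" side of that trade-off, here is a foreign key constraint doing the checking so the application does not have to. It uses Python's standard sqlite3 module with an in-memory database; the author and post tables and the try_insert_orphan helper are made up for illustration, and other databases enable and report constraints differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE post ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES author(id),"
    " title TEXT NOT NULL)"
)
conn.execute("INSERT INTO author (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO post (author_id, title) VALUES (1, 'valid post')")

def try_insert_orphan():
    """Try to insert a post whose author does not exist; report success."""
    try:
        conn.execute("INSERT INTO post (author_id, title) VALUES (99, 'orphan')")
    except sqlite3.IntegrityError:
        return False  # the database itself refused the bad row
    return True

orphan_accepted = try_insert_orphan()
post_count = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
```

With the constraint in place the bad row is rejected no matter which application wrote it; dropping the constraint moves that check, and the cost of getting it wrong, into every client.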
Re:Centralized database logic doesn't scale? (Score:2)
Let's give a concrete example to this. Consider a hypothetical 2 tier system with a thin client talking to a centralized database where all the business logic is being handled. This system gets deployed on a top-of-the-line Sun Fire server running Oracle. The system is successful and its usage grows rapidly.
Re:Centralized database logic doesn't scale? (Score:2)
This is, of course, pure bunk because Google does exactly this and Google scales well. Difference is the money available to you, Web programmer, and Google, Web moneybags. It's bunkum, but very wise bunkum nonetheless, unless you have a billionaire uncle who signs documents without reading them.
The workaround for the limits of your database is, Adam claims, to share your data in RSS/Atom fee
Re:Centralized database logic doesn't scale? (Score:2)
No, you publish an API and other people use that API to access your contents. The logic of *their* web applications is in their sites, not in your DB.
Oh boy... now you've done it (Score:1)
Very skeptical (Score:2, Insightful)
That guy can start by learning how to add some <br
Re:Very skeptical (Score:2)
What does understanding Web fundamentals have to do with someone using excessive paragraph lengths? That's bad writing rather than bad markup.
Re:Very skeptical (Score:1)
OK, score one for you :-) I never use line breaks apart from what PHP's nl2br() will generate, so I wasn't careful with my remark.
Mea culpa, and all that ^_^;
RSS Same As NNTP Newsgroups... (Score:2, Insightful)
NNTP is an irreplaceable source of technical information. In contrast the world wouldn't skip a beat if all RSS feeds stopped tomorrow.
Web services moving to RSS (Score:2)
It's true that many seem to be moving in this direction. For example, A9's OpenSearch [a9.com] is a simple extension to RSS. The Findory API [findory.com] offers simple, RSS-based access to news and blog search results. Yahoo offers a few services through the more complex Yahoo APIs [yahoo.net], but offers many more through Yahoo R [yahoo.net]
NEW Data Model (Score:1)
Is there a Dweeb mod point?