Slashdot
Is RSS Doomed by Popularity?
Posted by samzenpus on Wed Dec 08, 2004 07:55 PM
from the more-than-you-can-chew dept.
Ketchup_blade writes "As RSS is becoming better known to mainstream users and the press, the bandwidth issue reported by many sites (Eweek, CNet, InternetNews) related to feeds is becoming a reality. Stats from sites like Boing Boing are showing a real concern regarding feed bandwidth usage. Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (event-driven syndication). RSScache seems to offer a realistic solution to the problem, but can this be enough to help RSS as it reaches an even bigger user base in the upcoming year?"
Push (Score:5, Insightful)
How much bandwidth does Slashdot's RSS feed use?
It looks like the RSS feed on my home page has a small handful of subscribers. Neat.
Re:Push (Score:5, Insightful)
Most of the problems come from a few older RSS readers that don't support Conditional GET, gzip, etc. With modern readers, there's essentially no problem (I've measured it on a few sites I run). Yes, they poll every hour or two, but the bandwidth is a tiny, tiny fraction of what we get from, say, putting up a small QuickTime.
There seem to be lots of people who freak out way too quickly about a few bytes. RSS sends too much unnecessary data, but if you've configured things correctly, it's much smaller than lots of other things we do on our networks...
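For readers wondering what "supporting Conditional GET and gzip" looks like on the client side, here is a minimal sketch using only the Python standard library. The function names and the idea of passing the saved ETag and Last-Modified values in by hand are assumptions made for illustration; a real reader would store those validators alongside its cached copy of the feed.

```python
import gzip
import urllib.error
import urllib.request


def build_feed_request(url, etag=None, last_modified=None):
    """Build a polite feed request: ask for gzip and send saved validators."""
    req = urllib.request.Request(url)
    req.add_header("Accept-Encoding", "gzip")
    if etag:
        # Validator returned by the server on the previous fetch.
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if nothing changed."""
    req = build_feed_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
            if resp.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
            return body, resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: the server sent headers only, keep the cached copy.
            return None, etag, last_modified
        raise
```

With this shape, an unchanged feed costs one small header exchange per poll instead of a full download.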
Actually, this is a more general xml problem (Score:3, Interesting)
See Roedy Green's (one-time comp.java.lang FAQ maintainer) excellent essay [mindprod.com] on why XML causes these problems.
Re:Push (Score:3, Insightful)
Re:Push (Score:3, Interesting)
Re:Push (Score:3, Interesting)
On the other hand it seems like everyone and their dog can do P2P.
A P2P-ish RSS system that:
* Attempts to make each client capable (but not always used) of functioning as a caching server for the feed
* Has a top-level owner of a feed who has sole rights to update the feed. Perhaps passing public/private keys with the feed to ensure no tampering. Anyone who wanted to subscribe to the fee
They just need to follow ./'s lead (Score:5, Insightful)
Re:They just need to follow ./'s lead (Score:3, Insightful)
I don't know much about RSS, but it seems kind of silly to have the user refresh. Doesn't that defeat the purpose? Why not just have the server send out new news as it gets it?
Re:They just need to follow ./'s lead (Score:4, Informative)
The server would also need to have a list of clients to send the refresh to, which means you'd need to "sign up" so the server puts you on the list.
Nevermind the difficulties that dynamic IP addresses would cause. It's generally easier if the user initiates things.
Re:They just need to follow ./'s lead (Score:5, Informative)
Re:They just need to follow ./'s lead (Score:3, Informative)
But the gist of it is that push-media and multicast are either a thankfully-dead-fad, or are a technology whose time has yet to come. Push media, in particular, was salivated over quite a bit in the late 90's (eg. see Wired's 1997 cover article on it [wired.com]), so it's not as if it's a new idea. Despite this, push and multicast haven't gained wide success yet. Lots of people have various reasons why, and some of them are actu
Re:They just need to follow ./'s lead (Score:5, Informative)
Re:They just need to follow /.'s lead (Score:5, Funny)
Slashdot's RSS blocking policy (Score:5, Informative)
Every complaint about this that I've investigated has turned out to be either a broken RSS reader or an IP that's proxying a ton of traffic (which we usually do make an exception for).
Oh, and if you want to read sectional stories in RSS, then:
Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money. We were one of the first sites to do this but (as this story suggests) you'll see a lot more sites doing it in the future. I think our policy is fair.
Slashdot's RSS blocking policy-$$$$ Kaching. (Score:4, Insightful)
So's using correct HTML, and CSS.
Re:Slashdot's RSS blocking policy (Score:4, Insightful)
Not really. Our cache hit rate would be about zero. We update the homepage about once a minute, and the same goes for any page that any reader would be likely to reload within a reasonable time.
Re:Slashdot's RSS blocking policy (Score:3, Informative)
Re:Slashdot's RSS blocking policy (Score:3, Interesting)
That's OK, I'm a subscriber... still don't see how the custom RSS works. From my RSS reader how does Slashdot know I'm a subscriber? Special URL?
Welcome to the internet (Score:2, Informative)
Whee!
And the funny thing here is, if RSS had-- at its conception-- included caching and push-based update notification and all the other smart features that would have prevented this sort of thing from becoming a problem now, it would never have been adopted, because the only reason RSS succeeded where the competing standards to do the sa
Re:Welcome to the internet (Score:2, Insightful)
you're interpreting it from the client perspective, which is not where the name came from.
You're talking application-level (Score:5, Interesting)
If the web started with HTTP 1.1, it would never have gone anywhere because it's too complicated. There are parts of 1.0 that probably aren't implemented very well.
If you want to improve things, adopt an RSS reader project and add those features.
RSS readers don't cache! (Score:5, Insightful)
Re:RSS readers don't cache! (Score:5, Insightful)
The problem is, of course, server-side. For instance, the GPL blog software WordPress doesn't do ANY caching. Its RSS is a PHP script. So if you get 10,000 requests for that RSS, then you're running a script 10,000 times. That's ridiculous and poor planning. Other RSS generators are guilty of this crime too.
Yes, there is a plug-in (which doesn't work at nerdfilter nor at the other WordPress site I run) and a savvy person could just make a cron job and redirect RSS requests to a static file, but that's all beside the point. This should all be done "out of the box." This is a software problem that should be addressed server-side first, client-side later.
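The cron-job approach mentioned above might look roughly like the following sketch. The function name, the `render_feed` callable, and the output path are all hypothetical; the only idea taken from the comment is "generate the feed once, then let the web server hand out a static file."

```python
import os
import tempfile


def publish_static_feed(render_feed, path):
    """Render the feed once and atomically replace the static file at path.

    render_feed is a hypothetical callable returning the feed as bytes,
    e.g. whatever the blog software would otherwise run on every request.
    """
    data = render_feed()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so the rename is atomic
    # and readers never see a half-written feed.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return len(data)
```

Run from cron, or whenever a post is published, this means 10,000 requests cost one script run plus 10,000 static file reads.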
Not to mention, a lot of these RSS readers are big sites like Bloglines, NewsGator, etc., which should be respecting bandwidth limits but really have no incentive to do so. RSS really doesn't scale too well for big sites. What they should be doing is denying connections from IPs that hit the feed too often, or changing the RSS format to give clients instructions like "Don't request this more than x times a day" in the header for them to obey. x would be a low number for a site not updated often and high for a site updated very often.
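Denying IPs that hit a feed too often can be sketched as a small sliding-window counter. The class name, the limits, and the injectable clock are assumptions for illustration, not anything a particular site is known to run.

```python
import time
from collections import defaultdict, deque


class FeedRateLimiter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # caller would answer with an error or a ban page
        recent.append(now)
        return True
```

On the "tell clients how often to ask" idea, RSS 2.0 does define an optional <ttl> channel element (a cache lifetime in minutes), though nothing forces a reader to honor it.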
Re:RSS readers don't cache! (Score:5, Informative)
Technically true but misleading. WordPress allows user agents to cache the RSS/Atom feeds, and will only serve a newer copy if a post has been made to the blog since the time the user agent says it last downloaded the feed. Otherwise it sends a 304. This is in 1.3-alpha5. I dunno what 1.2.1 does.
Not coincidentally, these are the egregious worst offenders I mentioned. Bloglines grabs my RSS2 and Atom feeds hourly, and doesn't cache or even pretend to. Firefox Live Bookmarks appears to cache feeds, but your aggregator plugins might not. I can't (yet) tell the difference from the server logs between Firefox and the various aggregator plugins.
The best ones are the syndication sites that only grab my feeds after being pinged. Too bad I can't ping everybody. That could solve the problem if there was some way to do that.
Re:RSS readers don't cache! (Score:5, Informative)
When you request the feed, you first get sent your normal HTTP header. If properly configured, the server will return a 304 if you have the most recent version. However, as many feeds are generated in PHP[1], this behavior is off by default, and you'll end up with your standard 200, or go-ahead, code. This single-handedly wastes a metric tonne of bandwidth needlessly.
Even if you're hammering a feed, you'll only be wasting a few hundred bytes at most every half hour, rather than the whole 50K or whatever size it is.
See here [pastiche.org] for a more detailed explanation.
[1] This is not a PHP specific issue; a lot of dynamic content, and even static content, fails to do this properly. But this is what it's there for, after all.
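The server-side half of that exchange can be sketched in a few lines. This is an illustration under assumptions, not how any particular blog package does it: the function name and the plain-dict request headers are made up, and only If-Modified-Since is handled (a fuller version would also check If-None-Match).

```python
from email.utils import format_datetime, parsedate_to_datetime


def respond_to_feed_request(request_headers, feed_body, last_post_time):
    """Return (status, headers, body) for a feed request.

    last_post_time must be a timezone-aware UTC datetime for the newest post.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_post_time = last_post_time.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_post_time, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            client_time = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            client_time = None  # unparseable date: fall through to a full 200
        if (
            client_time is not None
            and client_time.tzinfo is not None
            and last_post_time <= client_time
        ):
            # Nothing new since the client's copy: headers only, no body.
            return 304, headers, b""
    headers["Content-Type"] = "application/rss+xml"
    return 200, headers, feed_body
```

The 304 path never touches the feed body, which is where the savings over a blanket 200 come from.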
rsstorrent will solve it all (Score:4, Interesting)
Doomed? It's barely got off the ground... (Score:5, Insightful)
Take the BBC News website for example. On September 11th 2001 its traffic was way beyond anything it had experienced to that point. Within a year or so, it was comfortably serving more requests and seeing more traffic every day. Proof, if it were needed, that capacity isn't the issue when it comes to Internet growth, and won't be for the foreseeable future.
RSS is in its infancy. Just because people didn't anticipate it being adopted as fast as it has been that doesn't make it "doomed". By that rationale, the Internet itself, DVDs, digital photography, etc are all "doomed" too.
Limit download to new content (Score:5, Interesting)
About time for asynchronous (Score:3, Informative)
Not a problem with RSS.. just humans. (Score:5, Interesting)
RSS feeds are meant as a way to strip all the nonsense from a site and offer easy syndication, right? Basically, present the relevant news from a full-fledged webpage in a smaller file size? If such is the case, this isn't an RSS issue, really. I see it more as a bandwidth issue. I mean, people are going to get their news one way or the other.. either with a bunch of images and lots of markup via HTML or with just the bare minimum of text and markup via RSS. I would prefer RSS over HTML any day of the week! But perhaps RSS makes syndication TOO simple. Thus everyone does it and that eats additional bandwidth that normally would be reserved for those browsing the HTML a site offers.
And you could implement bans on people who request the RSS feed more than X times per hour as someone suggested (Doesn't /. do this?), but I don't think that gets around the bandwidth issue. I mean, those who want the news will either go with RSS or simply hit the site. Again, RSS is the preferred alternative to HTML.
So here's my suggestion.. go to nothing but RSS and no HTML!
Re:Not a problem with RSS.. just humans. (Score:3, Insightful)
I have to point out how much I love "Sage", the Mozilla Firefox plugin for RSS - you can even rightclick on that XML thing that tries to tell you to save the page and bookmark it under "Sage Feeds" and then Alt-S and you have your RSS.
I started using Sage for
Pop Fly (Score:5, Funny)
"Is Instant Messaging Doomed by Popularity?"
"Is E-Mail Doomed by Popularity?"
"Is Usenet Doomed by Popularity?"
"Is The Internet Doomed by Popularity?"
"Is Linux Doomed by Popularity?"
"Is Apple Doomed by Popularity?"
"Is Netcraft Doomed by Popularity?"
"Is Sex with Geeks Doomed by Popularity?"
Solutions (Score:5, Informative)
A simple fix (Score:3, Informative)
New subscribers would receive the initial copy of the feed via traditional unicast TCP, because that would be the least CPU-intensive way of handling a few requests at a time.
A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US) the costs would be prohibitive and malicious plaintiffs rarely ever get asked to pay costs.
The main problem with the multicast solution is that although multicasting is enabled across the backbone, most ISPs disable it - for reasons known only to them, because it costs nothing to switch it on. Persuading ISPs to behave intelligently is unlikely, to say the least.
Solved, move on (Score:4, Informative)
As another poster has pointed out, banning users who check too frequently is an excellent fallback. A tiny site won't know to install the software, but it won't be an issue for a tiny site.
RSS + Bittorrent -- works for Podcasts... (Score:3, Interesting)
The Podcasters need it too. I'm subscribed to a couple dozen feeds and have well over 4GB of files in my cache right now.
The biggest problem with Bittorrent and podcasts is that the RSS aggregators need to be Bittorrent aware. Unfortunately, few are.
Re:RSS + Bittorrent -- works for Podcasts... (Score:3, Insightful)
Bittorrent (Score:3, Insightful)
what's wrong with the old subscription model? (Score:3, Interesting)
An alternate approach would be to do the same thing with a news server. Why keep refreshing a feed for updates instead of letting it notify you when it has updates?
This issue was previously discussed elsewhere (Score:5, Insightful)
Slashdot user GaryM [slashdot.org] posted a related question elsewhere [advogato.org] about 20 months ago. At that time, in that forum, commenters dismissed his proposed solution, the use of NNTP, on the grounds that NNTP is deficient, but others continue to see NNTP as a possible solution [methodize.org] nevertheless.
Solution! (Score:3, Funny)
Now if only they'd bring back the $$$ from the mid 90s too.... :)
Solution: RSS over Usenet news (Score:5, Interesting)
Create a new top-level hierarchy (like alt, comp, talk, etc.) named "rss" and use an extra header to identify the originating RSS feed URL. That header could be used by the RSS/NNTP reader to select which article bodies to download and to verify each RSS entry in order to identify fake posts.
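To make the idea concrete, here is a rough sketch of turning one feed item into a news article with an extra header naming the source feed. The header name X-RSS-Feed-URL, the group name, and the function itself are hypothetical; no such convention is known to exist, and a real article would also need From and Message-ID headers before a server would accept it.

```python
from email.message import EmailMessage


def rss_item_to_article(feed_url, title, link, summary, newsgroup):
    """Wrap one RSS item as a news-style article (illustrative only)."""
    msg = EmailMessage()
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = title
    # Hypothetical header carrying the originating feed URL, so a reader
    # can filter by feed and check where an entry claims to come from.
    msg["X-RSS-Feed-URL"] = feed_url
    msg.set_content(f"{summary}\n\n{link}\n")
    return msg
```

A reader could then fetch only the headers for a group and download bodies just for the feeds it subscribes to.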
Swarming (Like BitTorrent) is the answer (Score:5, Interesting)
A content creator (say Slashdot) has webpages and it has an RSS feed. They create a torrent for each page. They sign the RSS file and each torrent (and its content) with a private key. They post their public key on their homepage.
Now, you can cache the RSS file on other sites that support you yet the users can still be confident that it really came from you. Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first. When the page loads locally in your browser, it could still go out and get ads if you are an ad sponsored site.
If you are a popular site and have a "fan base", you should have no problem implementing something along these lines. If you are a site that has these problems, you are probably popular and have a fan base. Given the right software and the buy-in from users, the problem solves itself.
Re:Swarming (Like BitTorrent) is the answer (Score:4, Informative)
No, it "can't". Or at least, it can't serve it with any benefit. Tracker overhead swamps any gains you might make. BitTorrent is unsuitable for use with small files, unless the protocol has radically changed since I last looked at it. In the limiting case, like 1K per file, it can even be worse or much worse than just serving the file over HTTP.
Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first.
Oh, here's the problem, you don't know what you're talking about or how these technologies work. When an RSS file has been retrieved, there is nothing remotely like "get the webpage" that takes place in the rendering. The images are retrieved but those are typically too small to be usefully torrented too.
Regrettably, solving the bandwidth problem involves more than invoking some buzzwords; when you're talking about a tech scaling to potentially millions of users you really have to design carefully. Frankly, the best proof of my point is that if it were as easy as you say it is, it'd be done by now. But it's not, it's hard, and will probably require a custom solution... which is what the article talks about, coincidentally.
If-Modified-Since, User-Agent (Score:4, Insightful)
Perhaps particularly offending User-Agents should be denied access to feeds. If I saw particular User-Agents consistently sending requests without If-Modified-Since, I'd ban them.
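Spotting those User-Agents is mostly a counting exercise. A minimal sketch follows, assuming the request log has already been parsed into (user_agent, sent_if_modified_since) pairs; that input format, the function name, and the threshold are assumptions, since standard access logs usually need extra configuration to record request headers at all.

```python
from collections import Counter


def find_unconditional_agents(requests, min_requests=10):
    """List User-Agents that never sent If-Modified-Since.

    requests: iterable of (user_agent, sent_if_modified_since) pairs.
    Agents with fewer than min_requests hits are ignored as too rare to judge.
    """
    total = Counter()
    conditional = Counter()
    for agent, sent_validator in requests:
        total[agent] += 1
        if sent_validator:
            conditional[agent] += 1
    return sorted(
        agent
        for agent, count in total.items()
        if count >= min_requests and conditional[agent] == 0
    )
```

The result is a candidate list to review by hand rather than an automatic ban list, since a first-time fetch legitimately has no validator to send.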
corporate caching (Score:3, Insightful)
Chip H.
Compression (Score:3, Insightful)
51894b boingboing.rss.xml
17842b boingboing.rss.xml.gz
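Going by the two sizes listed, the gzipped feed is roughly a third of the original. Anyone wanting to repeat the measurement on their own feed could use something like this sketch; the function name is made up, and the numbers will depend on the feed and the compression level.

```python
import gzip


def compression_report(feed_bytes):
    """Return (raw_size, gzipped_size, fraction_saved) for a feed body."""
    compressed = gzip.compress(feed_bytes)
    saved = 1 - len(compressed) / len(feed_bytes)
    return len(feed_bytes), len(compressed), saved
```

Feeds tend to compress well because the same XML tags repeat for every item.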
RSS hits that directly hit databases are flawed (Score:3, Insightful)
Re:Usenet? (Score:2)
I agree with you. (Score:3, Informative)
HTTP compression will work even better here than it does for regular pages - RSS is basically all text so every response is going to be compressible. Looking at a handful of my feeds, some