Technology

Is RSS Doomed by Popularity? 351

Ketchup_blade writes "As RSS becomes better known to mainstream users and the press, the bandwidth issues reported by many sites (eWeek, CNet, InternetNews) related to feeds are becoming a reality. Stats from sites like Boing Boing show real concern about feed bandwidth usage. Possible solutions to this problem are slowly emerging, such as RSScache (a feed caching proxy) and KnowNow (event-driven syndication). RSScache seems to offer a realistic solution to the problem, but can this be enough to help RSS as it reaches an even bigger user base in the upcoming year?"
This discussion has been archived. No new comments can be posted.

  • Push (Score:5, Insightful)

    by Phroggy ( 441 ) <slashdot3@@@phroggy...com> on Wednesday December 08, 2004 @08:56PM (#11037867) Homepage
    Remember all the hype about "push" technology back in the mid-nineties? Nobody was interested, but RSS feeds are being used in much the same way now. I'm thinking there are two significant differences: 1) with RSS, the user feels like they're in control of what's going on; with push, users felt like they were at the mercy of whatever money-grabbing corporations wanted to throw at them, and 2) a hell of a lot of people now have an always-on Internet connection with plenty of bandwidth to spare. When you've got a 33.6kbps dialup connection, you use the Internet differently than when you've got DSL or cable.

    How much bandwidth does Slashdot's RSS feed use?

    It looks like the RSS feed on my home page has a small handful of subscribers. Neat.
    • Re:Push (Score:5, Insightful)

      by Anonymous Coward on Wednesday December 08, 2004 @09:12PM (#11037987)
      Pointcast sent way too much data at the time, and now we all have orders of magnitude more bandwidth.

      Most of the problems come from a few older RSS readers that don't support Conditional GET, gzip, etc. With modern readers, there's essentially no problem (I've measured it on a few sites I run). Yes, they poll every hour or two, but the bandwidth is a tiny, tiny fraction of what we get from, say, putting up a small QuickTime movie.

      There seem to be lots of people who freak out way too quickly about a few bytes. RSS sends some unnecessary data, but if you've configured things correctly, it's much smaller than lots of other things we do on our networks...
    • Re:Push (Score:3, Insightful)

      by sploxx ( 622853 )
      Yes, maybe this way 'feels technically different', but if you have an RSS aggregator/news ticker applet or whatever on your desktop, it usually hides the implementation details completely from the user. Do you really think "OK, now my client makes an HTTP request, that travels through the call hierarchy of the libraries, gets a TCP socket open, gets a kernel call to the driver to send a SYN packet"? Even if I may have detailed knowledge about the inner workings of an application, I usually don't care about
      • Ahh, and I forgot: Multicast is also a very nice idea for such applications.
        And, did I forget to mention that IPv6 should be implemented ASAP?

        There are sometimes reasons besides DRM and user control for new protocols, standards and formats :-)

    • Re:Push (Score:3, Interesting)

      by ikewillis ( 586793 )
      http://beacon.sf.net/ [sf.net] tries to do this using UDP and filesystem monitoring. It waits for the RSS document to change then sends a UDP datagram to notify everyone that a new version is available. It's better than everyone polling the server via HTTP anyway.
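
      As a rough illustration of that "notify over UDP, then fetch over HTTP" pattern, here is a minimal Python sketch. It is not Beacon's actual protocol; the feed path, subscriber addresses, port and message format are all invented for the example.

      # Sketch of UDP change notification: watch a feed file, ping subscribers
      # when it changes. Not Beacon's real wire protocol; names are made up.
      import os
      import socket
      import time

      FEED_PATH = "feed.xml"                    # hypothetical local feed file
      SUBSCRIBERS = [("192.0.2.10", 5005)]      # hypothetical subscriber addresses
      NOTIFY_MSG = b"feed-updated http://example.org/feed.xml"

      def watch_and_notify(poll_seconds=1.0):
          """Watch the feed's mtime; on change, send one datagram per subscriber."""
          sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
          last_mtime = os.path.getmtime(FEED_PATH)
          while True:
              time.sleep(poll_seconds)
              mtime = os.path.getmtime(FEED_PATH)
              if mtime != last_mtime:
                  last_mtime = mtime
                  for addr in SUBSCRIBERS:
                      sock.sendto(NOTIFY_MSG, addr)

      def listen_for_updates(port=5005):
          """Subscriber side: block until a notice arrives, then fetch the feed over HTTP."""
          sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
          sock.bind(("", port))
          while True:
              msg, _ = sock.recvfrom(1024)
              print("update notice:", msg.decode())  # a real client would now GET the feed
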
      • I never understood why they didn't use IPs or DNS entries to tell if a new article was ready.

        For instance,
        They could use a dynamic DNS entry. The client would poll the IP of some domain until the IP changed. After a change, the client would go get the new article from some other IP. This wouldn't be very good for small timespans between articles since, looking at my no-ip domain, it takes about 5 mins between updates.

        Or, the RSS client gets the current "waiting" ip at the first poll. Then, it tries to co
        • Re:Push (Score:2, Interesting)

          by jasonwea ( 598696 )
          This seems far better than the UDP notification idea. Port forwarding for an RSS feed? No thanks.

          There is almost always a DNS cache at the ISP, so the polling interval can be completely controlled by the TTL of the record. That leverages the existing distributed caching of DNS, whereas a large percentage of users are not behind HTTP caches.

          I see two potential problems with this idea:

          1. A lot of people are stuck behind HTTP proxies with limited or no DNS. This isn't too bad as they could fallback to the
          • Re:Push (Score:3, Interesting)

            by Jahf ( 21968 )
            The problem with many of these mechanisms is that (as you mention) smaller sites may not have the facilities to do it.

            On the other hand it seems like everyone and their dog can do P2P.

            A P2P-ish RSS system that:

            * Attempts to make each client capable (but not always used) of functioning as a caching server for the feed

            * Has a top-level owner of a feed who has sole rights to update the feed. Perhaps passing public/private keys with the feed to ensure no tampering. Anyone who wanted to subscribe to the fee
    • Re:Push (Score:2, Interesting)

      by rlanctot ( 310750 )
      My suggestion is to revamp RSS to use a P2P format of publishing, so you spread out the load.
  • by Neil Blender ( 555885 ) <neilblender@gmail.com> on Wednesday December 08, 2004 @08:57PM (#11037870)
    And institute jackboot banning policies if you access them more than x times per y hours.
    • And institute jackboot banning policies if you access them more than x times per y hours.

      I don't know much about RSS, but it seems kind of silly to have the user refresh. Doesn't that defeat the purpose? Why not just have the server send out new news as it gets it?
      • by NeoSkandranon ( 515696 ) on Wednesday December 08, 2004 @09:13PM (#11037993)
        If the server initiated the connection then RSS would be useless to nearly everyone who's behind a router or firewall that they do not administer.

        The server would also need to have a list of clients to send the refresh to, which means you'd need to "sign up" so the server puts you on the list.

        Nevermind the difficulties that dynamic IP addresses would cause. It's generally easier if the user initiates things.
        • It could be done in a more efficient way, however. The first few bytes could tell you if there's something new (a number that increments each time something changes), and you would only fetch the whole file if there's something new.
          • by Electroly ( 708000 ) on Wednesday December 08, 2004 @09:38PM (#11038157)
            HTTP 1.1 already supports this. A conditional HTTP request can be made which basically asks the server whether the file has been updated. The server can then respond with a 304 Not Modified and avoid sending the entire RSS file again. Unfortunately, poorly written RSS aggregators don't implement this, and those aggregators are the real problem here. They're typically also the ones with the default 5-minute update time.
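
            As a sketch of what such a conditional request looks like from the client side (standard-library Python, with a placeholder URL; not any particular aggregator's code):

            # Conditional GET: send If-Modified-Since / If-None-Match and treat
            # 304 Not Modified as "nothing new". The URL is only a placeholder.
            import urllib.request
            import urllib.error

            def fetch_if_changed(url, last_modified=None, etag=None):
                """Return (body, last_modified, etag); body is None if unchanged."""
                req = urllib.request.Request(url)
                if last_modified:
                    req.add_header("If-Modified-Since", last_modified)
                if etag:
                    req.add_header("If-None-Match", etag)
                try:
                    with urllib.request.urlopen(req) as resp:
                        return (resp.read(),
                                resp.headers.get("Last-Modified", last_modified),
                                resp.headers.get("ETag", etag))
                except urllib.error.HTTPError as err:
                    if err.code == 304:   # not modified: keep using the cached copy
                        return None, last_modified, etag
                    raise

            # An aggregator would remember the two validators between polls:
            body, lm, et = fetch_if_changed("http://example.org/index.rss")
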
      • This question has been asked many times, and has been answered better than I'm able to.

        But the gist of it is that push-media and multicast are either a thankfully-dead-fad, or are a technology whose time has yet to come. Push media, in particular, was salivated over quite a bit in the late 90's (eg. see Wired's 1997 cover article on it [wired.com]), so it's not as if it's a new idea. Despite this, push and multicast haven't gained wide success yet. Lots of people have various reasons why, and some of them are actu

      • Yahoo News uses a ping mechanism. They only update your feed every once in a while, but if you want it updated faster, you ping them when you have an update.
    • by interiot ( 50685 ) on Wednesday December 08, 2004 @09:30PM (#11038113) Homepage
      You know what happens then? The same thing they do when you hamper your RSS feed in any other way: they scrape your HTML and create their own feeds [paperlined.org]. Slashdot doesn't monitor their front page as closely as they do their RSS page, so you can get away with quite a bit of abuse, at least for a while. They've blacklisted my IP occasionally when I got overzealous, though.
    • by jamie ( 78724 ) <jamie@slashdot.org> on Wednesday December 08, 2004 @09:43PM (#11038185) Journal
      Slashdot blocks your IP from accessing RSS if you access our site more than fifty times in one hour. I think that's reasonable, don't you? Especially since our FAQ tells you to request a feed only twice an hour [slashdot.org].

      Every complaint about this that I've investigated has turned out to be either a broken RSS reader or an IP that's proxying a ton of traffic (which we usually do make an exception for).

      Oh, and if you want to read sectional stories in RSS, then:

      • create a user [slashdot.org] if you haven't already,
      • edit your homepage [slashdot.org] to include sectional stories you like (and exclude those you don't),
      • then reload the homepage [slashdot.org] and copy that "rss" link at the very bottom of the page. It will be customized to your exact specs!

      Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money. We were one of the first sites to do this but (as this story suggests) you'll see a lot more sites doing it in the future. I think our policy is fair.

      • then reload the homepage [slashdot.org] and copy that "rss" link at the very bottom of the page. It will be customized to your exact specs!

        OK, that's completely cool. Kudos to whoever implemented that. Now I don't have to bitch about it on this thread. :)

        But when I follow your link, I get

        http://developers.slashdot.org/index.rss

        and if I go to the normal homepage I still have

        http://slashdot.org/index.rss

        I'd expect there to be a ?user= or something. How does the RSS generator know it's me? My R
      • by Anonymous Coward on Wednesday December 08, 2004 @10:23PM (#11038412)
        "Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money."

        So would using correct HTML and CSS.
      • Don't normal web proxies work just fine for caching RSS traffic? I just looked at Slashdot's feed and it seems to cache well - much better than the rest of Slashdot apparently. [Is there a reason Slashdot doesn't cache better? I'd think that'd save a lot of bandwidth.]

        So, to me, it looks like there is no need for a RSS proxy. RSS readers just need to learn to use regular web proxies and users need to be convinced that using such proxy servers is to their benefit. Good luck given the low number of users tha
      • Jamie, I'm not sure if you're mistaken or if something has been changed in the last month or two, but your IP blocking provisions certainly were kicking in WAY before 50 accesses in one hour.

        I had a Slashdot RSS feed live bookmark in Firefox (supposedly gets checked once an hour, or when the browser is started up), and that got me temporarily banned (perhaps I had restarted the browser several times in an hour for some reason, but it certainly wasn't 50 times!).

        Like I said, hopefully you have upped the ba
  • by Anonymous Coward
    Where we use "push" technologies for everything that is functionally pulling information, and "pull" technologies for everything that is functionally pushing information.

    Whee!

    And the funny thing here is, if RSS had-- at its conception-- included caching and push-based update notification and all the other smart features that would have prevented this sort of thing from becoming a problem now, [i]it would never have been adopted[/i], because the only reason RSS succeeded where the competing standards to do the sa
    • Depends on your perspective. If I imagine myself to be a server, I'm pushing information to a client and pulling information from a client, like the name implies.

      you're interpreting it from the client perspective, which is not where the name came from.
    • by mveloso ( 325617 ) on Wednesday December 08, 2004 @09:58PM (#11038269)
      Well, RSS was simple, and everything you're talking about (caching, push-based update, etc) are application-level issues. Even though that stuff is defined in HTTP 1.1, it took years for HTTP 1.1 to come out.

      If the web started with HTTP 1.1, it would never have gone anywhere because it's too complicated. There are parts of 1.0 that probably aren't implemented very well.

      If you want to improve things, adopt an RSS reader project and add those features.
  • by IO ERROR ( 128968 ) <errorNO@SPAMioerror.us> on Wednesday December 08, 2004 @08:58PM (#11037887) Homepage Journal
    One thing that would help immensely is if RSS readers/aggregators would actually cache the RSS feed and not download a new copy if they already have the most current one. I could go through my server logs and point out the most egregious problem aggregators if anyone's interested.
    • I am interested in which aggregators you think are the worst offenders. The only true solution to the bandwidth problem, even with well-behaved aggregators, is moving away from a polling framework. Syndication should be pub/sub event-based to solve the problem. Q.E.D.

      ----
      Dynamic DNS [thatip.com] from ThatIP.

    • This is a big problem and it would be substantially mitigated with such a simple solution.

      Following along the same line of reasoning, why not have the RSS reader send one request, and then changes are pushed to the reader after that? The reader can cache the change so if the user hits reload they get the most recent cache rather than hitting the server again.

      • >why not have the RSS reader send one request, and then changes are pushed to the reader after that?

        Well, they tried this way back when. I think they called it webcasting. RSS is really just a lo-fi form of webcasting. You don't need to have any open ports on your machine, no special service running on the web server, just a flat file in the RSS format.

        Webcasting may replace RSS, but then we would probably have the opposite problem. "Why is slashdot slashdotting me!!"
      • Following along the same line of reasoning, why not have the RSS reader send one request, and then changes are pushed to the reader after that?

        The problem I see with that is network users who are behind firewalls. You can't very well push RSS data to them now, can you?
    • by gad_zuki! ( 70830 ) on Wednesday December 08, 2004 @09:11PM (#11037976)
      Sometimes you can't tell if you have the newest file, depending on the web server/config.

      The problem is, of course, server-side. For instance, the GPL blog software WordPress doesn't do ANY caching. Its RSS is a PHP script. So if you get 10,000 requests for that RSS, you're running a script 10,000 times. That's ridiculous and poor planning. Other RSS generation software is guilty of this crime.

      Yes, there is a plug-in (which doesn't work at nerdfilter nor at the other WordPress site I run), and a savvy person could just make a cron job and redirect RSS requests to a static file, but that's all beside the point. This should all be done "out of the box." This is a software problem that should be addressed server-side first, client-side later.

      Not to mention, a lot of these RSS readers are big sites like Bloglines, NewsGator, etc. that should be respecting bandwidth limits, but really have no incentive to do so. RSS really doesn't scale too well for big sites. What they should be doing is denying connections for IPs that hit it too often, or changing the RSS format to give server instructions like "Don't request this more than x times a day" in the header for the clients to obey. x would be a low number for a site not updated often and high for a site updated very often.
      • by IO ERROR ( 128968 ) <errorNO@SPAMioerror.us> on Wednesday December 08, 2004 @09:17PM (#11038024) Homepage Journal
        For instance, the GPL blog software WordPress doesn't do ANY caching.

        Technically true but misleading. WordPress allows user agents to cache the RSS/Atom feeds, and will only serve a newer copy if a post has been made to the blog since the time the user agent says it last downloaded the feed. Otherwise it sends a 304. This is in 1.3-alpha5. I dunno what 1.2.1 does.

        Not to mention, a lot of these RSS readers are big sites like Bloglines, NewsGator, etc. that should be respecting bandwidth limits, but really have no incentive to do so.

        Not coincidentally, these are the most egregious offenders I mentioned. Bloglines grabs my RSS2 and Atom feeds hourly, and doesn't cache or even pretend to. Firefox Live Bookmarks appears to cache feeds, but your aggregator plugins might not. I can't (yet) tell the difference in the server logs between Firefox and the various aggregator plugins.

        The best ones are the syndication sites that only grab my feeds after being pinged. Too bad I can't ping everybody. That could solve the problem if there was some way to do that.

      • Even if the file is generated on the fly, you can still avoid having to retransmit it by using the ETag and the If-None-Match header. Basically this is a hash of the file contents that overrides the If-Modified-Since header. Simple solution: make WordPress generate an ETag for the file and then compare it.
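
        A rough sketch of that ETag comparison, written as a tiny Python WSGI app rather than WordPress's PHP (build_feed() is a stand-in for whatever generates the feed on the fly):

        # ETag for a dynamically generated feed: hash the generated body and
        # answer 304 when the client already has that exact content. This still
        # costs CPU to build the feed, but skips retransmitting it.
        import hashlib

        def build_feed():
            return b"<rss version='2.0'>...</rss>"   # placeholder feed body

        def feed_app(environ, start_response):
            body = build_feed()
            etag = '"%s"' % hashlib.md5(body).hexdigest()
            if environ.get("HTTP_IF_NONE_MATCH") == etag:
                start_response("304 Not Modified", [("ETag", etag)])
                return [b""]
            start_response("200 OK", [("Content-Type", "application/rss+xml"),
                                      ("ETag", etag)])
            return [body]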

        Anyway, you're right, it's not a bandwidth issue; for the most part it's a software issue. I'm tracking some weblogs for research and crawl the RSS feeds once a day. Most sites only update their fe
    • by maskedbishounen ( 772174 ) on Wednesday December 08, 2004 @09:15PM (#11038013)
      To some extent, this could be blamed on the feed itself. Ideally, it works like this..

      When you request the feed, you first get sent your normal HTTP header. If properly configured, it will return a 304 if you have the most recent version -- however, as many feeds are generated in PHP[1], this behavior is off by default, and you'll end up with your standard 200, or go-ahead, code. This single-handedly wastes a metric tonne of bandwidth needlessly.

      Even if you're trying to hammer a feed, you'll only be wasting a few hundred bytes at most every half hour, rather than the whole 50K or whatnot size it is.

      See here [pastiche.org] for a more detailed explanation.

      [1] This is not a PHP specific issue; a lot of dynamic content, and even static content, fails to do this properly. But this is what it's there for, after all.
    • Everybody writing an RSS client or server script should read this [pastiche.org] and make it one of their main priorities.

      I imagine even more bandwidth could be saved if the next version of the RSS or ATOM standards mandated rsync support.
  • by RangerWest ( 711789 ) on Wednesday December 08, 2004 @09:00PM (#11037899)
    rsstorrent -- distributed RSS, echoing BitTorrent?
    • I don't think a BitTorrent-type solution would work very well, because BT was designed for the transfer of relatively large files. The fixed cost of having to negotiate a connection with peers would probably be larger than the tiny individual RSS feeds themselves.
  • by WIAKywbfatw ( 307557 ) on Wednesday December 08, 2004 @09:02PM (#11037913) Journal
    What you're seeing right now are teething troubles. Nothing more, nothing less. The bandwidth consumption experienced right now will be laughed off a couple of years from now as minuscule.

    Take the BBC News website, for example. On September 11th, 2001, its traffic was way beyond anything it had experienced to that point. Within a year or so, it was comfortably serving more requests and seeing more traffic every day. Proof, if it was needed, that capacity isn't the issue when it comes to Internet growth, and won't be for the foreseeable future.

    RSS is in its infancy. Just because people didn't anticipate it being adopted as fast as it has been doesn't make it "doomed". By that rationale, the Internet itself, DVDs, digital photography, etc. are all "doomed" too.
  • by zoips ( 576749 ) on Wednesday December 08, 2004 @09:04PM (#11037930) Homepage
    Instead of downloading the entire RSS feed every time, why not have aggregators indicate to the server the timestamp of the last time the RSS feed was downloaded, or the timestamp of the last item in the feed the aggregator knows about, and then the server can dynamically generate the RSS with only new content for that client. Increases processing load while reducing bandwidth, but processing time is what most servers have lots of, not to mention it's far cheaper to increase than bandwidth.
    • Because that's way too obvious to anyone with a second-grade education ;)

      Everyone admits (now) that RSS was a really stupid protocol even as protocols go.

      Oh, and the timestamp thing adds far less processing overhead than the reduction in traffic would save.
    • This is what conditional GET does. The client sends If-Modified-Since: $last_timestamp and If-None-Match: $e_tag_of_last_download headers and the server can respond with Not Modified as it sees fit.
      • That's not quite what he's talking about.

        If you previously had a copy of the feed with items A through X, and now A has dropped off and Y has been added, you'll pull the entire thing, B through Y, when all you really need is:

        remove A
        add Y, data follows

        Think more along the lines of "diff since $last_timestamp".
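
        Nothing like this exists in the RSS spec, but a sketch of a "diff since $last_timestamp" response could look like the following (the item structure and the since parameter are invented for illustration):

        # Hypothetical "only what changed since your last fetch" endpoint.
        # ITEMS and the since parameter are invented for illustration.
        ITEMS = [
            {"title": "Older story",  "published": 1102500000},   # Unix timestamps
            {"title": "Newest story", "published": 1102550000},
        ]

        def items_since(since_epoch):
            """Return only items published after the client's last fetch."""
            return [item for item in ITEMS if item["published"] > since_epoch]

        def render_partial_feed(since_epoch):
            entries = "".join("<item><title>%s</title></item>" % i["title"]
                              for i in items_since(since_epoch))
            return "<rss version='2.0'><channel>%s</channel></rss>" % entries

        # A client that last fetched at 1102540000 would get only "Newest story".
        print(render_partial_feed(1102540000))
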
  • by benow ( 671946 ) on Wednesday December 08, 2004 @09:05PM (#11037931) Homepage Journal
    Asynchronous event-driven models are the way to go for changing content. They're trickier to code, but require less bandwidth and are more responsive. Perhaps a bit of a privacy issue at some level (registration with the source), but easy-to-implement, failure-resistant distributed asynchronous networks have much applicability, not just to RSS.


  • There we go. You now have version control.

    Keep copies of the RSS on the server for 30 days.

    http://www.mysite.com/requestfeed?myversion=200412061753

    diff the new version from the old version. Send what's changed.

    How fucking hard is that, people?
    • I think the biggest reason people are offering RSS feeds is because it's a standard XML file on the webserver. No need to make additional scripts, no need to set up additional services -- just upload the XML file. When you start complicating the "Really Simple Syndication" model you start making it less simple. In my opinion the easiest way to limit bandwidth is to supply the XML file on servers that support gzip compression and the "ETag" header function. This way RSS readers will only download a compre
  • by dustinbarbour ( 721795 ) on Wednesday December 08, 2004 @09:06PM (#11037938) Homepage

    RSS feeds are meant as a way to strip all the nonsense from a site and offer easy syndication, right? Basically, present the relevant news from a full-fledged webpage in a smaller file size? If such is the case, this isn't an RSS issue, really. I see it more as a bandwidth issue. I mean, people are going to get their news one way or the other.. either with a bunch of images and lots of markup via HTML or with just the bare minimum of text and markup via RSS. I would prefer RSS over HTML any day of the week! But perhaps RSS makes syndication TOO simple. Thus everyone does it and that eats additional bandwidth that normally would be reserved for those browsing the HTML a site offers.

    And you could implement bans on people who request the RSS feed more than X times per hour as someone suggested (Doesn't /. do this?), but I don't think that gets around the bandwidth issue. I mean, those who want the news will either go with RSS or simply hit the site. Again, RSS is the preferred alternative to HTML.

    So here's my suggestion.. go to nothing but RSS and no HTML!

    • /. will temporarily block you if you pull the feed more than 48 times in a day or something like that. It works out to once a half hour.

      And excuse me, but lose HTML? The whole web as RSS feeds? You must be kidding. There's way too much content out there that simply can't be put into an RSS feed. It's static, it represents downloadable files, or documentation, or useless marketing hype, or whatever.

    • I wonder if advertising has anything to do with it - if you go to a news site just to see "what's up", you might get banner ads, google ads, so on and so forth - but RSS just makes a nice neat webpage for you or something similar.

      I have to point out how much I love "Sage", the Mozilla Firefox plugin for RSS - you can even rightclick on that XML thing that tries to tell you to save the page and bookmark it under "Sage Feeds" and then Alt-S and you have your RSS.

      I started using Sage for /., Groklaw, and a c
    • So here's my suggestion.. go to nothing but RSS and no HTML!

      Except it'll be an hour before someone implements images in RSS feeds, and then it's 1990 all over again.
  • Pop Fly (Score:5, Funny)

    by Anonymous Coward on Wednesday December 08, 2004 @09:07PM (#11037943)
    "Is RSS Doomed by Popularity?"

    "Is Instant Messaging Doomed by Popularity?"

    "Is E-Mail Doomed by Popularity?"

    "Is Usenet Doomed by Popularity?"

    "Is The Internet Doomed by Popularity?"

    "Is Linux Doomed by Popularity?"

    "Is Apple Doomed by Popularity?"

    "Is Netcraft Doomed by Popularity?"

    "Is Sex with Geeks Doomed by Popularity?" :)
  • Solutions (Score:5, Informative)

    by markfletcher ( 612245 ) on Wednesday December 08, 2004 @09:07PM (#11037955) Homepage
    There are several ways to mitigate the bandwidth issues. First, all aggregators should support gzip compression and the HTTP Last-Modified and ETag headers. That'll take care of a lot of the problems. The other solution is to get people to use server-based aggregators, like Bloglines [bloglines.com], which only fetch a feed once per iteration, regardless of how many subscribers there are. As a bonus, there are several things that server-based aggregators can do that desktop-based aggregators can't do, like provide personalized recommendations. I like this solution, but of course I'm biased since I'm the founder of Bloglines. :)
    • Bloglines is quite good, and I appreciate that it's very chummy with Firefox, but I'm not 100% satisfied with it. I wish I could articulate what bugged me about it (especially to the founder, heh), but I find it's slower for me to check bloglines than it is to just swoop through my bookmarks every once in a while.

      By far the best RSS experience I've had has been with the Konfabulator RSS widget, which pops up when it finds a new entry and hides away when there's nothing new. It's elegant and simple. Blog
  • A simple fix (Score:3, Informative)

    by jd ( 1658 ) <imipak@ y a hoo.com> on Wednesday December 08, 2004 @09:08PM (#11037956) Homepage Journal
    What you have is a large number of subscribers accessing a common data source at or around the same time. The simplest fix would be to have a reliable multicast version of RSS, which is broadcast to all subscribers to that feed. Then, you only have to transmit the updates once. The network would take care of it from then on.


    New subscribers would receive the initial copy of the feed via traditional unicast TCP, because that would be the least CPU-intensive way of handling a few requests at a time.


    A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US), the costs would be prohibitive, and malicious plaintiffs rarely ever get asked to pay costs.


    The main problem with the multicast solution is that although multicasting is enabled across the backbone, most ISPs disable it - for reasons known only to them, because it costs nothing to switch it on. Persuading ISPs to behave intelligently is unlikely, to say the least.

      A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US), the costs would be prohibitive, and malicious plaintiffs rarely ever get asked to pay costs.

      Maybe the caching is just so damn good that you don't notice it. UIUC has a transparent web cache that I doubt 90% (of ~8,000) of the dorm students rea

  • Although the caching solution seems intriguing, the onus should really be on the aggregator authors to do at least local caching for RSS access between "refresh" intervals and even better, use HTTP conditional GETs. It's also important to use sane default "refresh" intervals and constraints.

    During our product's [mesadynamics.com] development, our debugging refresh interval was 5 minutes and hardcoded to Slashdot. As you can imagine, it didn't take us long to discover Slashdot's unique banning mechanism -- it woke us up to
  • Solved, move on (Score:4, Informative)

    by Jeremiah Blatz ( 173527 ) on Wednesday December 08, 2004 @09:13PM (#11037991) Homepage
    Shrook [fondantfancies.com] for the Mac has already solved this issue with "distributed checking". Popular sites are checked once every 5 minutes; if the site is updated, everyone gets the latest content; otherwise, nobody touches it.

    As another poster has pointed out, banning users who check too frequently is an excellent fallback. A tiny site won't know to install the software, but it won't be an issue for a tiny site.

  • by Spoing ( 152917 ) on Wednesday December 08, 2004 @09:15PM (#11038007) Homepage
    Or, it's in the works now on Dave Slusher's Evil Genius Chronicles [evilgeniuschronicles.org] podcast. [Podcasts = RSS subscription feeds for time-shifted radio blogging.]

    The Podcasters need it too. I'm subscribed to a couple dozen feeds and have well over 4GB of files in my cache right now.

    The biggest problem with BitTorrent and podcasts is that the RSS aggregators need to be BitTorrent-aware. Unfortunately, few are.

  • Bittorrent (Score:3, Insightful)

    by Jherek Carnelian ( 831679 ) on Wednesday December 08, 2004 @09:16PM (#11038022)
    Seems like BitTorrent, or a BitTorrent-like protocol, would be useful here. Turn the RSS feed into a tracker/seed; then all it has to keep track of is who has the latest version of the content, and it could redirect feeders to each other, always preferring the latest updated version. Eventually, you will have the same scaling problems that BitTorrent has (single tracker), but at least you stretch things out a few months or a year until a better solution comes around.
    • How big are RSS files normally? I'd be surprised if the bandwidth involved in tracking and coordinating a whole bunch of clients would be significantly less than the RSS itself.

      By the time you've told a client to "go an ask these other clients" you may as well have just sent it the RSS file.
  • Coralized [slashdot.org] feeds ought to help sites with lots of subscribers. One nice thing about RSS traffic, I suspect, is that it is less bursty than requests for web pages. You can plan for traffic as the number of subscribers to your feed increases vs. your web pages getting /.'d.

    Ex: Slashdot RSS via Coral [nyud.net]

  • by Trepidity ( 597 ) <delirium-slashdot@@@hackish...org> on Wednesday December 08, 2004 @09:21PM (#11038057)
    When I want updates from sites, I subscribe to an email feed, and stick it in its own mailbox. I agree that some standardized format and display would be nice, but you can send XML over email too, so what's needed is a reader that I can point to an IMAP mailbox full of XML mails.

    An alternate approach would be to do the same thing with a news server. Why keep refreshing a feed for updates instead of letting it notify you when it has updates?
  • As RSS [becomes] more known to the mainstream users and press, the bandwidth issue reported by many sites . . . related to feeds is becoming a reality. Stats from sites like Boing Boing are showing a real concern regarding feeds bandwidth usage. Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (event-driven syndication). RSScache seems to offer a realistic solution to the problem, but [will it] be enough . . . ?

    Slashdot user GaryM [slashdot.org] posted a related question elsewhere [advogato.org] about 20 months ago. At that time, in that forum, commenters dismissed his proposed solution, the use of NNTP, on the grounds that NNTP is deficient, but others continue to see NNTP as a possible solution [methodize.org] nevertheless.

  • Solution! (Score:3, Funny)

    by Quixote ( 154172 ) on Wednesday December 08, 2004 @09:37PM (#11038151) Homepage Journal
    I have an idea... let's start a company which pushes data to the consumers... from a central point. We'll call it "pointcast".

    Now if only they'd bring back the $$$ from the mid 90s too.... :)

    • Bless you, Quixote, for stirring these passions so long laid dormant in my loins. I thought Pointcast was lost to us forever.
    • Pointcast never actually pushed - it was periodic HTTP polling, just like RSS. The difference is that Pointcast responded with entire articles with graphics, whether or not they were going to ever be read, while RSS typically responds with only headline and summary, with a link to the actual content.
  • by NZheretic ( 23872 ) on Wednesday December 08, 2004 @09:50PM (#11038228) Homepage Journal
    One solution would be to use an existing infrastructure that was built for flood filling content - the Usenet news server network.

    Create a new top-level hierarchy (like alt, comp, talk, etc.) named "rss" and use an extra header to identify the originating RSS feed URL. That header could be used by the RSS/NNTP reader to select which article bodies to download and to verify each RSS entry to identify fake posts.

  • I see some people talking about BitTorrent-like networks for RSS feeds. Really, why not just zip it? If it were normal text, I'd expect to shrink the file size down to 10%, but we're talking about XML, which has a lot more redundancy.
  • by MS_leases_my_soul ( 562160 ) on Wednesday December 08, 2004 @09:54PM (#11038247)
    This still baffles me. BitTorrent works great for distributing media like ISOs. Folks, it can distribute "little" stuff, too.

    A content creator (say Slashdot) has webpages and it has an RSS feed. They create a torrent for each page. They sign the RSS file and each torrent (and its content) with a private key. They post their public key on their homepage.

    Now, you can cache the RSS file on other sites that support you yet the users can still be confident that it really came from you. Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first. When the page loads locally in your browser, it could still go out and get ads if you are an ad sponsored site.

    If you are a popular site and have a "fan base", you should have no problem implementing something along these lines. If you are a site that has these problems, you are probably popular and have a fan base. Given the right software and the buy-in from users, the problem solves itself.
    • by Jerf ( 17166 ) on Wednesday December 08, 2004 @10:36PM (#11038490) Journal
      BitTorrent works great for distributing media like ISOs. Folks, it can distribute "little" stuff, too.

      No, it "can't". Or at least, it can't serve it with any benefit. Tracker overhead swamps any gains you might make. BitTorrent is unsuitable for use with small files, unless the protocol has radically changed since I last looked at it. In the limiting case, like 1K per file, it can even be worse or much worse than just serving the file over HTTP.

      Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first.

      Oh, here's the problem, you don't know what you're talking about or how these technologies work. When an RSS file has been retrieved, there is nothing remotely like "get the webpage" that takes place in the rendering. The images are retrieved but those are typically too small to be usefully torrented too.

      Regrettably, solving the bandwidth problem involves more than invoking some buzzwords; when you're talking about a tech scaling to potentially millions of users, you really have to design carefully. Frankly, the best proof of my point is that if it were as easy as you say, it'd be done by now. But it's not, it's hard, and will probably require a custom solution... which is what the article talks about, coincidentally.
  • The main issue is that some clients check the site far too many times, or download the whole content without checking the change time. Ban those and you will be fine. It's the same kind of issue dyndns.org or some other site was having with Linksys clients.
  • Why does everybody seem to feel the need to have their last 20-25 posts in their feed? It's just going to mean wasted bandwidth, especially for websites that update infrequently. I'd say the last five posts would be sufficient for most weblogs and 10 for news sites like Slashdot and The Register.

    Feed readers are the other issue. Many set their default refresh to an hour. I use SharpReader which has an adequate 4 hour default. I adjust that on a per feed basis. Some update once per day, and that's all I nee

  • Many, many HTML-related bandwidth issues were solved by putting proxies between the viewers and the servers; I fail to see why you couldn't do the same with index.xml.
  • by pbryan ( 83482 ) <email@pbryan.net> on Wednesday December 08, 2004 @10:06PM (#11038320) Homepage
    I'd be interested in seeing how many of these hits are for complete feeds rather than If-Modified-Since the last time it was downloaded. I suspect that if the RSS readers were behaving like nice User-Agents, we wouldn't see such reports.

    Perhaps particularly offending User-Agents should be denied access to feeds. If I saw particular User-Agents consistently sending requests without If-Modified-Since, I'd ban them.
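
    Spotting those User-Agents in the logs is straightforward. A sketch, assuming an Apache combined-format access log and a feed path of /index.rss (both assumptions, not anything from the parent post):

    # Count feed requests per User-Agent, assuming Apache "combined" log format.
    # Agents that rack up plenty of 200s for the feed but never a 304 are
    # probably not sending If-Modified-Since / If-None-Match.
    import re
    from collections import Counter

    LOG_LINE = re.compile(r'\S+ \S+ \S+ \[.*?\] "(?P<req>[^"]*)" (?P<status>\d{3}) '
                          r'\S+ "[^"]*" "(?P<ua>[^"]*)"')

    def suspect_agents(log_path, feed_path="/index.rss"):
        full, conditional = Counter(), Counter()
        with open(log_path) as log:
            for line in log:
                m = LOG_LINE.match(line)
                if not m or feed_path not in m.group("req"):
                    continue
                bucket = conditional if m.group("status") == "304" else full
                bucket[m.group("ua")] += 1
        return Counter({ua: n for ua, n in full.items() if conditional[ua] == 0})
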
  • What's the problem here, is everybody loading the full feed each time?

    Wouldn't a client include a If-Modified-Since HTTP header in the GET request?

    We're talking 200 bytes for a not-modified query.

    Is it these 200-some-odd byte requests that people are complaining about?
  • Does Atom use significantly more or less bandwidth than RSS?
  • Why is there such concern over accessing RSS feeds as opposed to accessing the web site? Take for instance Slashdot: as of this writing, the main page is 65K and the RSS feed is 14K. Isn't this the case for most websites? So why the big fuss? If people are continuously refreshing the RSS feed, at least less bandwidth is consumed than if they were continuously refreshing the main page.

    ... Or is this one of those things where geeks have become so enamored with the technology that they go completely over
  • corporate caching (Score:3, Insightful)

    by chiph ( 523845 ) on Wednesday December 08, 2004 @10:25PM (#11038424)
    I wouldn't doubt that eventually someone will build an RSS caching device and sell it to the corporate market. Given how big a drain RSS is on the supplier, the corporate market has the money and determination not to let it become a problem for them.

    Chip H.
  • There's an even simpler solution: pregenerate your RSS feed. Whenever info that you use to generate your RSS changes (e.g., you add a blog entry), regenerate a static file. This is no big deal; if anyone asked for the info, you'd have to generate this anyway. Then serve the static file.

    This gets a lot of caching behavior automatically.
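
    A sketch of that publish-time regeneration (the output path and the post structure are hypothetical):

    # Rebuild a static feed file whenever a post is published, so the web
    # server can serve it like any other flat file. Paths are hypothetical.
    import email.utils
    import time

    FEED_FILE = "/var/www/html/index.rss"

    def regenerate_feed(posts):
        """Write the feed once at publish time instead of per request."""
        items = "".join("<item><title>%s</title><link>%s</link></item>"
                        % (p["title"], p["link"]) for p in posts)
        feed = ("<?xml version='1.0'?><rss version='2.0'><channel>"
                "<title>Example blog</title><lastBuildDate>%s</lastBuildDate>"
                "%s</channel></rss>"
                % (email.utils.formatdate(time.time()), items))
        with open(FEED_FILE, "w") as f:
            f.write(feed)

    # Call from whatever handles "new post published":
    # regenerate_feed(all_posts)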

  • One thing that would help is if people would stop stupidly going gaga over data wrappers that contain multiple times more metadata than data. XML (which RSS really is) and related technologies are blatantly dumb ideas - both in terms of reinventing a wheel that never needed to be invented in the first place, and in terms of being a massive waste of bandwidth.
  • Compression (Score:3, Insightful)

    by yem ( 170316 ) on Thursday December 09, 2004 @12:30AM (#11039243) Homepage
    I assume the complainers are using it?

    51894b boingboing.rss.xml
    17842b boingboing.rss.xml.gz
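
    To check the same sort of saving on any feed (the filename here is just a placeholder):

    # Compare raw vs. gzipped size of a feed file.
    import gzip

    with open("boingboing.rss.xml", "rb") as f:
        raw = f.read()
    packed = gzip.compress(raw)
    print("%db raw, %db gzipped (%.0f%% saved)"
          % (len(raw), len(packed), 100 * (1 - len(packed) / len(raw))))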

  • Comment removed (Score:3, Insightful)

    by account_deleted ( 4530225 ) on Thursday December 09, 2004 @12:47PM (#11043065)
    Comment removed based on user account deletion
