The Internet

Varnish Author Suggests SPDY Should Be Viewed As a Prototype

Posted by Soulskill
from the final-version-needs-lasers-don't-ask-why dept.
An anonymous reader writes "The author of Varnish, Poul-Henning Kamp, has written an interesting critique of SPDY and the other draft protocols trying to become HTTP 2.0. He suggests none of the candidates make the cut. Quoting: 'Overall, I find the design approach taken in SPDY deeply flawed. For instance identifying the standardized HTTP headers, by a 4-byte length and textual name, and then applying a deflate compressor to save bandwidth is totally at odds with the job of HTTP routers which need to quickly extract the Host: header in order to route the traffic, preferably without committing extensive resources to each request. ... It is still unclear for me if or how SPDY can be used on TCP port 80 or if it will need a WKS allocation of its own, which would open a ton of issues with firewalling, filtering and proxying during deployment. (This is one of the things which makes it hard to avoid the feeling that SPDY really wants to do away with all the "middle-men") With my security-analyst hat on, I see a lot of DoS potential in the SPDY protocol, many ways in which the client can make the server expend resources, and foresee a lot of complexity in implementing the server side to mitigate and deflect malicious traffic.'"
This discussion has been archived. No new comments can be posted.

  • by scorp1us (235526) on Friday July 13, 2012 @09:46AM (#40638307) Journal

    Parsing an HTTP session with multi-part MIME attachments using chunked encoding is murderous. Now true, many people don't have to worry about this, but the fact is the protocol leaks like a sieve. For instance, you can't send a header after you've entered the body of the HTTP session. You can't mix chunked-length encoded elements with fixed content-length elements in HTTP 1.1. Once you've sent your headers and encoding, you're screwed. The web has a solution - AJAX, but then you need JavaScript.

    I'd be all for something new. I'd suggest basing it on XML, with a header section and header-element to get the transfer started, then accepting any kind of structured data including additional header elements. With this, you can still use HTTP headers for backwards compatibility, but once recognized as "HTTP 2.0" the structured XML can be used to set additional headers, etc. With the right rules, you can send chunks of files or headers in any arbitrary order and have them reconstructed.

    • by Skapare (16644) on Friday July 13, 2012 @09:53AM (#40638383) Homepage

      If you substitute JSON (or something like it with equal or better simplicity) for XML, then I might go along with it.

      • by spike2131 (468840) on Friday July 13, 2012 @10:11AM (#40638549) Homepage

        I love JSON, but XML has the advantage of being something you can validate against a defined schema.

        • by Skapare (16644)

          And what do you do when something does not validate? Kick the guy who typed it in manually? Oh wait, what if it was generated by a program?

          The whole schema thing in XML is one of the things that makes it suck. Just write the data correctly in the first place and discard anything that doesn't make sense to the application.

          • by jimmifett (2434568) on Friday July 13, 2012 @11:00AM (#40639055)

            Ideally, you give the schema to the other side and they can validate the message before sending it to you, catching possible errors there. You validate against the same schema on your side as a safety net to weed out junk data and messages from users that don't validate. It also allows you to enforce types and limitations on values in a consistent manner.

            JSON is good for quick and dirty communications when you are both the sender and the consumer of messages and can be lazy and not care too much about junk data.

            Both have their uses, but you have to know when to use which.

            • Except that it is impossible to design a validation scheme that covers all useful cases without resorting to designing a programming language.

              And when you get to that point, why not just write the application code to validate in the first place? Why is it so hard to write a "schema validation" for JSON data? The fact that the designers of JSON didn't overengineer the feature into the spec doesn't mean it's hard to do....
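A minimal Python sketch of the "just validate in application code" approach argued for above. The rules enforced here (a non-empty string `name` and an integer `port` between 1 and 65535) and the function names `is_valid` and `parse_and_validate` are assumptions made up for illustration; they are not taken from any comment or specification.

```python
import json


def is_valid(obj):
    """Hand-written 'schema': a dict with a non-empty string 'name'
    and an integer 'port' in 1..65535 (hypothetical rules)."""
    if not isinstance(obj, dict):
        return False
    name = obj.get("name")
    port = obj.get("port")
    if not isinstance(name, str) or not name:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(port, int) or isinstance(port, bool):
        return False
    return 1 <= port <= 65535


def parse_and_validate(text):
    """Return the parsed object, or None if it is malformed or invalid."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    return obj if is_valid(obj) else None


good = parse_and_validate('{"name": "www", "port": 80}')
bad_type = parse_and_validate('{"name": "www", "port": "80"}')
not_json = parse_and_validate("not json at all")
```

The check is short, but it is also code that has to be kept in step with whatever documentation describes the format, which is the trade-off the schema advocates in this thread point to.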

          • When it doesn't validate, you reject it. Or, in the case of a replacement for an "extensible" protocol, you do something more subtle - such as, accept something which is well-formed XML but contains unrecognised tags, by skipping over the unrecognised tags. Much as is done in HTML itself.

            Once you've written a few programs which accept data from the public internet, you come to greatly appreciate the value of protocols whose syntax is easy to parse, and whose semantics are simple to understand. The simple
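The "accept well-formed XML but skip unrecognised tags" behaviour described in this comment can be sketched in Python roughly as follows. The tag names in `KNOWN_TAGS`, the `<message>` wrapper, and the function name `extract_known` are hypothetical. Note also that Python's standard `xml` modules are documented as not being hardened against maliciously constructed input, so this is an illustration rather than something to expose to the public internet as-is.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of tags this receiver understands.
KNOWN_TAGS = {"header", "body"}


def extract_known(xml_text):
    """Parse a message and keep only the child elements we recognise.

    Raises xml.etree.ElementTree.ParseError if the input is not
    well-formed; unknown but well-formed children are silently skipped.
    """
    root = ET.fromstring(xml_text)
    accepted = {}
    for child in root:
        if child.tag in KNOWN_TAGS:
            accepted[child.tag] = child.text or ""
    return accepted


message = (
    "<message>"
    "<header>Host: example.org</header>"
    "<blink>some future extension</blink>"
    "<body>hello</body>"
    "</message>"
)
result = extract_known(message)
```

Here the unknown `<blink>` element is dropped while the rest of the message is still accepted, much as browsers treat unknown HTML tags.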

          • The whole schema thing in XML is one of the things that makes it suck

            because...

            Just write the data correctly in the first place

            which you can't count on from Internet clients

            and discard anything that doesn't make sense to the application

            Which is what schema validation does for you (securely) without having to write any code.

        • JSON can also be validated against a schema, where the schema consists of a JavaScript file implementing isValid(parsed_object).
          • That seems great, but then to validate against the schema you must have a full JavaScript interpreter (or almost full, depending on how much you're willing to restrict what can be used in the isValid function). Not to mention that, this being JavaScript, a lot of schemas would end up being a mess, which would defeat half of the purpose of a schema -- being a human-readable documentation of the data format.

            Schema validation is a very clear example of a situation where it's not good to have a Turing-complete

            • half of the purpose of a schema -- being a human-readable documentation of the data format

              That purpose can be achieved with English.

              Schema validation is a very clear example of a situation where it's not good to have a Turing-complete language.

              If you specifically don't want something Turing complete when processing XML, then why do XML fans use XSLT despite its being Turing complete [unidex.com]?

              • half of the purpose of a schema -- being a human-readable documentation of the data format

                That purpose can be achieved with English.

                Sure, but then you have to write the schema AND document it. This can (and does) lead to documentation being out of sync with the code.

                If you specifically don't want something Turing complete when processing XML, then why do XML fans use XSLT despite its being Turing complete [unidex.com]?

                XSLT is a completely different story; it's used to transform XML, not to validate it (which is what XML Schema does). For that, having the flexibility of a Turing-complete language is a good thing (that said, XSLT is still a pain in the ass to use, regardless of being Turing-complete).

                • but then you have to write the schema AND document it. This can (and does) lead to documentation being out of sync with the code.

                  It's not much different from a C# implementation of a mobile application for Windows Phone 7 falling out of sync with the Objective-C implementation of the same application for iOS. Or what am I missing?

                  • Nothing, but wouldn't it be great if there was a way to use the same code in both places? XML Schema and the JSON schema proposal I linked before work like that -- a schema is used to validate documents and is at the same time readable as documentation for the document format.

              • by ultranova (717540)

                half of the purpose of a schema -- being a human-readable documentation of the data format

                That purpose can be achieved with English.

                I have never once seen documentation written in English that didn't leave something important unclear.

                • by tepples (727027)
                  Likewise, I've seen XML Schemas that don't completely specify all valid values for a given element. And you still need to consider that XML is much bigger on the wire than JSON. Besides, how does XML's distinction between an "attribute" and a "child text node" map onto the built-in data structures of a programming language better than JSON maps to Java Lists and Maps (or Python lists and dicts, or Perl arrays and associative arrays, or PHP arrays and arrays containing noninteger keys)?
                • I have never once seen documentation written in English that didn't leave something important unclear.

                  so... hindi or russian might be better, you think?

                  how about german? except that the text size goes 30% larger due to the words being a mil^Wkilometer long.

    • by tigre (178245)

      Did you really just say XML?

    • by dave420 (699308)
      XML? Cute.
      • by Skapare (16644) on Friday July 13, 2012 @10:31AM (#40638759) Homepage

        s/Cute/Ugly/

        XML is for marking up documents, not serializing data structures.

        Now suppose we make HTTP based on XML. During the HTTP header parse, we need the schema. Fetch it. With what? With HTTP. Now we need to parse more XML and need another schema we have to get with HTTP which then has to be parsed ...

        XML is not for protocols. JSON is at least more usable. Some simpler formats exist, too.

        • by i_ate_god (899684)

          And what's wrong with Key: Value; Value; Value anyway?

          • by Carewolf (581105)

            Nothing, what is wrong with the MIME syntax used in HTTP?

            The actual implementation and use may suck, but it could be cleaned up into something more consistent without throwing everything else away as well.

            Btw, you get almost all the speedup SPDY provides by just using HTTP 1.1 and pipelining. The only reason it is not done more is that it is hard to predict whether it will be supported properly, but you could make that a requirement for HTTP 1.2, for instance, solving the problem.

    • Re: (Score:3, Informative)

      by laffer1 (701823)

      XML is too big. If anything, we need to compress the response, not make it ten times larger. The header thing can be annoying at times, but it's important to know what you're going to send the client anyway. You must figure it out by the end of the document, why not at the beginning? Many files have a header, including shell scripts, image files, the BOM on XML documents or even the XML declaration. It's common in the industry.

      AJAX doesn't solve the real problem. If anything it necessitates making responses

      • by DamonHD (794830)

        "You must figure it out by the end of the document, why not at the beginning?"

        Because in many reasonable cases you don't know the final outcome when you've produced the first byte of the response, for example streamed on-the-fly-generated pages possibly with on-the-fly gzip encoding. The user gets to see useful output sooner, and the server can more easily cap peak resources, by streaming/pipelining and lazy eval. Like SAX rather than DOM.

        Rgds

        Damon
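The streaming case described above (a page generated on the fly, gzip-compressed on the fly, and sent before its final length is known) can be sketched in Python with the standard `zlib` module and HTTP/1.1 chunk framing. The function names `gzip_chunked` and `decode_chunked` are hypothetical, and the sketch only produces and re-reads the body framing; it does not open sockets or send headers.

```python
import gzip
import zlib


def gzip_chunked(parts):
    """Yield HTTP/1.1 chunks carrying a gzip stream built incrementally.

    Each piece of page output is compressed and flushed as soon as it is
    available, so the total length is never needed up front.
    """
    compressor = zlib.compressobj(wbits=31)  # 16 + 15 selects the gzip container
    for part in parts:
        data = compressor.compress(part) + compressor.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield b"%x\r\n%s\r\n" % (len(data), data)
    tail = compressor.flush()  # final deflate block plus gzip trailer
    if tail:
        yield b"%x\r\n%s\r\n" % (len(tail), tail)
    yield b"0\r\n\r\n"  # zero-length chunk terminates the body


def decode_chunked(stream):
    """Reassemble a chunked body (no trailers or chunk extensions handled)."""
    body = b""
    pos = 0
    while True:
        eol = stream.index(b"\r\n", pos)
        size = int(stream[pos:eol], 16)
        if size == 0:
            return body
        start = eol + 2
        body += stream[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


parts = [b"<html>", b"<body>hello</body>", b"</html>"]
wire = b"".join(gzip_chunked(parts))
page = gzip.decompress(decode_chunked(wire))
```

Flushing after every part trades some compression ratio for lower latency, which matches the "user sees useful output sooner" argument made in the comment.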

    • by Viol8 (599362) on Friday July 13, 2012 @10:33AM (#40638777)

      As a static data format it's just about passable, but as a low overhead network protocol??

      Wtf have you been smoking??

      • by Skapare (16644)

        XML is for marking up documents. Our problem with HTTP is that it is stuck in the legacy document model. Today we need streams, and optimization of sessions. XML would just be the markup of documents we might want to choose to fetch over those streams. Notice that audio/video/media containers are not based on XML, and never should be.

      • by Rob Riggs (6418)

        Wtf have you been smoking??

        My guess: java beans.

    • by luis_a_espinal (1810296) on Friday July 13, 2012 @11:01AM (#40639079) Homepage

      I'd suggest base it on XML with a header section and header-element to get the transfer started then accept any kind of structured data including additional header elements.

      Haven't we learned enough already from industrial pain to stay away from XML? JSON, BSON, YAML, compact RELAX NG, ASN.1, extended Backus-Naur Form. Any one of them, or something inspired by any (or all) of them, that is compact, unambiguous (there should be only one canonical form to encode a type), not necessarily readable, possibly binary, but efficiently easy to dump into an equally compact readable form. Compact and easy to parse/encode, with the lowest overhead possible. That's what one should look for.

      But XML, no, no, no, for Christ's sake, no. XML was cool when we didn't know any better and we wanted to express everything as a document... oh, and the more verbose and readable, the better!!(10+1). We really didn't think it through that much back then. Let's not commit the same folly again, please.

      • XML has many good uses.

        This is not one of them.
        • Re: (Score:2, Funny)

          by Anonymous Coward

          XML has many good uses.

          It's just that none of them involve computers.

        • XML has many good uses. This is not one of them.

          Any text encoding has many good uses. And XML's many good uses stem from the fact that... it is used, not from its intrinsic qualities.

          There is a reason why configuration files are moving away from XML. There is a reason why over-http data exchange protocols and RPC/messaging mechanism are moving away from XML (or at least from WS-*). It was just a stupid pipe dream to represent everything as a document. ZOMG, HTML is just SGML, so the next evolutionary step... for everything... must be.... (cue drum

          • XML has the advantage of simple parsing. For example, I made an NZB fetcher - libxml did all the hard work, and even without that I could have extracted all the information I needed with nothing but a few regexes. It's ideal for that - it needs to hold only text data, and the object it stores is itself made up of smaller objects made of smaller objects. So XML does have its uses, but I agree that it is overused. For that matter, so are databases in general - how many applications do you find using a databa
    • But really any format that can express structured data is endorsed by me. I do not have a problem with JSON, in fact it is my 2nd favorite. My first favorite is Python's style, which is very, very close to JSON. But JSON has the advantage that web people already know it.

      Please don't get bogged down with XML, I wrote XML into my post because despite what you all think, it's not that bad to parse, provided that you use a stream-reader style rather than SAX or DOM. The other reason why I wrote XML is because i

    • by Darinbob (1142669)

      XML is a ridiculous format for this. It is bulky. It is intended to be human readable, which means it is much longer than protocols intended for machine readability. I can't figure out why everyone seems to think XML is the magic bullet to use everywhere. You don't need schemas for this, and if you did, XML's method is really rotten anyway.

    • Some headers definitely have to come first, since they pretty much indicate what the rest even means. But ETags for example... it would be wonderful to be able to send that stuff last... ... but I don't think I'd be willing to cram fucking XML into packets just for that (not that there is any connection between the two, referring to the root of this thread). Fuck human readability, seriously... it has its place but it's also highly, and mindlessly, overrated. Define it well and compact, and then you can st

  • He says:

    For instance identifying the standardized HTTP headers, by a 4-byte length and textual name, and then applying a deflate compressor to save bandwidth is totally at odds with the job of HTTP routers which need to quickly extract the Host: header in order to route the traffic, preferably without committing extensive resources to each request. ...

    It seems to me that routing based on header is doing entirely the wrong thing. In any case, according to wikipedia [wikipedia.org]:

    TLS encryption is nearly ubiquitous in SPDY implementations

    Which rather makes routing on content infeasible (OK you can forward route behind the SSL endpoint, but this doesn't seem to be what he's talking about)
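A rough Python sketch of the cost the quoted passage is complaining about, assuming a simplified header block loosely modeled on the description in the quote (4-byte length prefixes, textual names, deflate applied to the whole block). This is not the actual SPDY wire format, and `encode_header_block` and `find_header` are hypothetical names. The only point illustrated is that an intermediary must inflate the block before it can read the host value.

```python
import struct
import zlib


def encode_header_block(headers):
    """Serialize name/value pairs with 4-byte big-endian length prefixes,
    then deflate the whole block (simplified; not real SPDY framing)."""
    raw = struct.pack("!I", len(headers))
    for name, value in headers.items():
        n = name.encode("ascii")
        v = value.encode("ascii")
        raw += struct.pack("!I", len(n)) + n
        raw += struct.pack("!I", len(v)) + v
    return zlib.compress(raw)


def find_header(block, wanted):
    """Return the value for `wanted` (bytes), or None if absent."""
    raw = zlib.decompress(block)  # decompression work happens before any lookup
    (count,) = struct.unpack_from("!I", raw, 0)
    pos = 4
    for _ in range(count):
        (name_len,) = struct.unpack_from("!I", raw, pos)
        pos += 4
        name = raw[pos:pos + name_len]
        pos += name_len
        (value_len,) = struct.unpack_from("!I", raw, pos)
        pos += 4
        value = raw[pos:pos + value_len]
        pos += value_len
        if name == wanted:
            return value
    return None


block = encode_header_block({"accept": "*/*", "host": "example.org"})
host = find_header(block, b"host")
```

With plain HTTP/1.1, by contrast, a proxy can scan the uncompressed request text for the `Host:` line directly.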

    • by Mad Merlin (837387) on Friday July 13, 2012 @09:58AM (#40638427) Homepage

      TFA is talking about reverse proxies (of which Varnish is one of many), which are very commonplace. In fact, you're seeing this page through (at least) one, as Slashdot uses Varnish.

      • by Chrisq (894406)

        TFA is talking about reverse proxies (of which Varnish is one of many), which are very commonplace. In fact, you're seeing this page through (at least) one, as Slashdot uses Varnish.

        Publicly cached data is outside SPDY's use-case. It is aimed at reducing latency [chromium.org], and its main target is rich "web application" pages. Now it may well be possible to design a protocol that supports caching as well as reduced latency, but this is not what SPDY was designed to do.

        • Delenda est. (Score:3, Insightful)

          by Anonymous Coward

          Then it cannot replace HTTP and should be withdrawn, or it's been wrongfully sorted in under "HTTP/2.0 Proposals [ietf.org]"

          The IETF HTTPbis Working Group has been chartered to consider new work around HTTP; specifically, a new wire-level protocol for the semantics of HTTP (i.e., what will become HTTP/2.0), and new HTTP authentication schemes.

          • by Chrisq (894406)

            Then it cannot replace HTTP and should be withdrawn, or it's been wrongfully sorted in under "HTTP/2.0 Proposals [ietf.org]"

            The IETF HTTPbis Working Group has been chartered to consider new work around HTTP; specifically, a new wire-level protocol for the semantics of HTTP (i.e., what will become HTTP/2.0), and new HTTP authentication schemes.

            Good point - unless there are particular reasons that a "niche protocol" for highly interactive sites is better than a general purpose one then a replacement that covers all uses should be covered. In fact I have come round to agreeing with TFA: "SPDY Should Be Viewed As a Prototype"

          • by tibman (623933)

            Isn't it a superset?

        • by Skapare (16644)

          It's more than just caching, these days. It's also about sending the requests to the appropriate server. For example, if you can send the requests of a logged in user to the same server or group of servers, it's easier to manage session state (each of 10000 servers holding 400 session states, instead of 10000 servers having to access a centralized store of 4000000 session states).

          One thing a new protocol could do to better manage that is, after session authentication, tell the client another IP address an
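The session-affinity idea in this comment (send a logged-in user's requests to the same server so that each server holds only its own share of session state) can be sketched in Python as below. The server names, the session identifier format, and the function name `pick_server` are hypothetical; the figures 10000 and 4000000 are the ones given in the comment.

```python
import hashlib


def pick_server(session_id, servers):
    """Map a session identifier to one server, deterministically.

    The same session_id always lands on the same server as long as
    the server list does not change.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-%05d" % i for i in range(10000)]  # hypothetical pool
first = pick_server("session-abc123", servers)
second = pick_server("session-abc123", servers)

# Arithmetic from the comment: 4000000 sessions spread over 10000 servers.
sessions_per_server = 4000000 // 10000
```

Plain modulo hashing like this reshuffles most sessions whenever the pool size changes; consistent hashing is the usual refinement when servers come and go.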

    • Routing based on header is the kind of thing you'd do in an accelerator proxy. You receive the request, look at the headers and perform actions based on those headers. Forwarding the request on to another host is an example of routing.

    • by kasperd (592156)

      It seems to me that routing based on header is doing entirely the wrong thing.

      But that is something you need to support as long as multiple domains are hosted on the same IP address. Lots of things get easier if you can have a separate IP address for each domain you want to host. But there has been a shortage of IP addresses.

      However there is a solution. You just have to move to IPv6, then you will no longer have a shortage on IP addresses. So what if some people find themselves in a situation where they

      • by jbolden (176878)

        That's a really good idea! Make the HTTP shift coordinate with the IPv4/IPv6 shift and then we can assume 1 domain per IP. I'm having a tough time seeing how that breaks down. Any mods out there should mod you up for best idea of the day.

      • by toejam13 (958243)

        It seems to me that routing based on header is doing entirely the wrong thing.

        But that is something you need to support as long as multiple domains are hosted on the same IP address.

        In the load-balancing world, this is known as "Layer 7 routing" and it is quite a handy feature. It also goes well beyond just HTTP HOST headers. The User-Agent header is probably the most useful as you can route clients based on browser type or version, operating system and language. I use this one a lot for forwarding clients to a web_css_pool and web_nocss_pool (looking at you, IE6).
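A small Python sketch of the "Layer 7 routing" decision described above: read the request headers, then pick a backend pool from the Host and User-Agent values. The pool names `web_css_pool` and `web_nocss_pool` come from the comment; the function names, the header parsing, and the `"MSIE 6."` substring test for IE6 are simplifying assumptions, not how any particular load balancer implements it.

```python
def parse_headers(raw_request):
    """Return request headers as a dict with lower-cased names."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the request line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def choose_pool(raw_request):
    """Return (host, pool_name) for one HTTP/1.1 request."""
    headers = parse_headers(raw_request)
    host = headers.get("host", "")
    user_agent = headers.get("user-agent", "")
    # Assumption: IE6 identifies itself with "MSIE 6." in its User-Agent.
    pool = "web_nocss_pool" if "MSIE 6." in user_agent else "web_css_pool"
    return host, pool


ie6_request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: www.example.org\r\n"
    b"User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n"
    b"\r\n"
)
decision = choose_pool(ie6_request)
```

This only works because the headers arrive as readable text at the balancer, which is the property the article argues compressed header blocks make more expensive to preserve.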

  • So, because you would have to design new security tools and think a different way in order to make it secure, does that make it flawed? Does this mean we are no longer free to innovate unless it fits into some mold? That is just stupid. If someone comes up with a new way of doing things, put on your REAL security hat and come up with a way to secure it, don't just spread FUD about how it is BAD!!
    • So you have the trillion pounds of latinum to upgrade all of the routers on the internet? Good, then give me 5000 pounds worth so I can finally get the damn router/file/printer server for my household, and provide the 100 million pounds for my ISP to get off their asses and upgrade to DOCSIS 3 and IPv6 tomorrow, or STFU and get off my lawn.

  • by Skapare (16644) on Friday July 13, 2012 @10:11AM (#40638547) Homepage

    Much of what the web has become is no longer fitting the "fetch a document" model that HTTP (and GOPHER before it) are designed to do. This is why we have hacks like cookie managed sessions. We are effectively treating the document as a fat UDP datagram. The replacement ... and I do mean replacement, for HTTP, should integrate the session management with it, among other things. The replacement needs to hold the TCP connection (or better, the SCTP session), in place as a matter of course, integrated into the design, instead of patched around as HTTP does now. With SCTP, each stream can manage its own start and end, with a simpler encryption startup based on encrypted session management on stream 0. Then you can have multiple streams for a variety of serviced functions from nailed up streams for continuous audio/video, to streams used on the fly for document fetch. No chunking is needed since it's all done in SCTP.

    • That would be great, ideally (aside from maybe problems with it being absurdly over-engineered).

      But it would be really hard to make it catch on. You'd need to manage support for 'traditional' HTTP and the new protocol in all clients, servers, *and* web applications. Because do you really think Microsoft would backport support into old versions of Internet Explorer that people are still using for some god-unknown reason?

  • If someone proposed HTTP today, it wouldn't pass muster by these experts either. And I doubt that any of these new protocols really would make much of a difference anyway. The infrastructure has been built around HTTP, everybody knows how to compress it and everybody knows how to deal with the kind of multiple connections that it requires. If anything additional is really needed, it could be expressed as hints to the server and the intermediate infrastructure without starting from scratch.

    • by Skapare (16644)

      SCTP sessions give you multiple streams to do anything you want in them. And once you have encryption established in stream 0, a simple key exchange is all that is needed to encrypt the other streams. You can do fetches in some streams while others are doing interactive audio/video streaming. And that's all done within one session as the network stack, and session routers, see it.

    • by Viol8 (599362)

      "If someone proposed HTTP today, it wouldn't pass muster by these experts either."

      And with good reason. Berners-Lee might have invented the web as we know it, but like all first attempts (yes, I know about HyperCard and all the rest, they weren't networked!) it could really do with some serious improvement. Unfortunately the best solution would be to bin it and start again, but it's way too late for that, so it's make do and mend, which almost always ends up in a total mess. Which is what we have today.

    • by jandrese (485) <kensama@vt.edu> on Friday July 13, 2012 @11:14AM (#40639221) Homepage Journal
      The flipside of this is that a lot of the proposals to replace HTTP suffer badly from the second system effect, where the protocol designer decides to add proper support for all of the edge cases and ends up with a protocol that is gigantic and difficult to implement.
    • by jbolden (176878)

      New protocols don't go through committees; they just happen. That's the great thing about using a generic TCP/IP or UDP/IP base. New protocols prove themselves by finding a market; protocol revisions prove themselves by finding a consensus.

  • This is one of the things which makes it hard to avoid the feeling that SPDY really wants to do away with all the "middle-men"

    Half the human race is middle-men, and they don't take kindly to being eliminated.

  • Wouldn't it be better to have the browser support zip/tarball path.

    Now

      would look thru the zip file.

    I suppose there could be some security issues here, but it seems like it would be easier than chunking protocols if not much faster.

    Further ...

    Now we've got cached apps as well.

    • by raynet (51803)

      Nah, it would be much better if we could use rsync:// instead of http://; it would nicely handle partial downloads, compression, slightly changed files, etc.

  • SPDY is encrypted by design. There is no option for middle-men, and frankly, that is the way I like it myself, as I would assume most people do. I don't like it when devices mess with my traffic.

    As for most of the other complaints - given that Google is running SPDY just fine on all of its servers, and they're basically one of the largest (if not the largest) hosts on the internet, I think they are all strawmen. If it is working for Google then it will work for others.

    My experience using SPDY, as a user, is nothing short of spectacular. The performance gains on Google properties with SPDY are incredible and very noticeable.

    • Forgive my ignorance, but does the encrypted nature of SPDY along with no option for middle-men mean that ad blockers would no longer be possible? That is, do browser plug-ins effectively function as middle-men?
    • Ahh... so google properties have converted to this ..

      I wondered why my browser turns to crap and hangs on google so often.

      That's assuming the sites work at all -- so far, google groups has gone completely dark for me... nothing comes up but an input line asking for groups... but nothing will come up... all javascript enabled, and nothing blocked, yet it doesn't work anymore...

      You might look at your assumptions about how well it works...

      Lemme guess your browser -- 'Chrome'?

  • The problem all of these HTTP 2.0 proposals are trying to work around is the fact that each resource fetched by the web browser is handled via a separate connection. By combining these elements into a single (compressed) stream you can save a TON of overhead. This is why sites that use nothing but data: URI images load so much faster, even than sites using the fastest CDNs. These 'solutions' are just workarounds to the crap that is HTTP 1.1.

    Of course, the problem with data: URIs is that they can't be ca

    • by Carewolf (581105)

      What overhead, where? You are confusing several issues. One of the reasons SPDY sucks is that it still uses TCP like HTTP does. Using HTTP over SCTP would be a great improvement.

      The problem with not using TCP though is that you no longer get the well-supported encryption from TLS for free anymore.

    • by DamonHD (794830)

      CDNs will still exist to be (a) high-bandwidth and (b) low-latency close-to-the-user commodity servers of large data volumes.

      A change of protocol won't eliminate the limitation of light speed and long-distance comms networks.

      Rgds

      Damon

  • For those who read the article and didn't understand what the debate was about, here is a good slide show from Google about the advantages of SPDY, which also explains the issues with "HTTP routers" raised in the article: http://www.slideshare.net/bjarlestam/spdy-11723049 [slideshare.net]
