The Internet Google Microsoft Technology

Google, Microsoft Cheat On Slow-Start — Should You? 123

Posted by Soulskill
from the reply-hazy-ask-again dept.
kdawson writes "Software developer and blogger Ben Strong did a little exploring to find out how Google achieves its admirably fast load times. What he discovered is that Google, and to a much greater extent Microsoft, are cheating on the 'slow-start' requirement of RFC-3390. His research indicates that discussion of this practice on the Net is at an early, and somewhat theoretical, stage. Strong concludes with this question: 'What should I do in my app (and what should you do in yours)? Join the arms race or sit on the sidelines and let Google have all the page-load glory?'"
This discussion has been archived. No new comments can be posted.

  • by js3 (319268) on Friday November 26, 2010 @03:03PM (#34351766)

    When the competition starts crying you know someone is doing something right. Is it just me or has there been a lot of crying lately?

  • Re:Misread the RFC (Score:5, Insightful)

    by Lunix Nutcase (1092239) on Friday November 26, 2010 @03:03PM (#34351768)

    No, this was just kdawson trying to fill his FUD quota for the day. He's a little behind.

  • Re:Misread the RFC (Score:5, Insightful)

    by da cog (531643) on Friday November 26, 2010 @03:10PM (#34351816)

    Yes, and for a post complaining about cheating I am mildly annoyed that he himself cheated his way around my "filter all posts made by editor kdawson" setting by submitting his story as a normal user and then getting another editor to post it.

  • Re:Misread the RFC (Score:4, Insightful)

    by Lunix Nutcase (1092239) on Friday November 26, 2010 @03:13PM (#34351840)

    He probably knows he's being filtered by more and more people.

  • Seems to me... (Score:3, Insightful)

    by 91degrees (207121) on Friday November 26, 2010 @03:13PM (#34351842) Journal
    This is reliable. It is compatible with the spec (otherwise it wouldn't be reliable), and it's faster.

    I don't think it matters whether Google "cheats" or not. I and they both want me to get the data as quickly as possible. Strict adherence to the guidelines doesn't matter to either of us and doesn't affect anyone else.
  • by Anonymous Coward on Friday November 26, 2010 @03:15PM (#34351862)

    To understand the relevance of this: The slow-start protocol/algorithm is meant to avoid a situation where many packets are put on the wire which will never be received due to congestion somewhere along the path. Such packets create unnecessary network load (they're transported all the way to the choke point and then they're discarded, so they have to be retransmitted.) The referenced RFC is from 2002, so one might argue that there isn't a problem if the burst of packets remains small. After all, there are other protocols which don't even use congestion control (particularly real-time applications like VoIP and other UDP based protocols) or cause bursts of initial traffic by concurrently starting many TCP connections (Bittorrent and other peer to peer networks). However, using an overly large initial window size is indeed a violation of a very central RFC, so it should not be done.
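A rough way to see what the initial window buys: the sketch below counts round trips under an idealized slow start. It assumes no loss, no receiver-window limit, and a congestion window that doubles every round trip; the function name and the 43-segment response size are illustrative only, not taken from any real stack.

```python
def round_trips_to_send(total_segments, initial_window):
    """Count round trips needed to send total_segments under idealized slow start.

    Assumptions (not a real TCP model): no loss, no receiver-window limit,
    and the congestion window doubles once per round trip.
    """
    if total_segments <= 0:
        return 0
    cwnd = initial_window
    sent = 0
    rounds = 0
    while sent < total_segments:
        sent += cwnd   # everything in the current window goes out this round trip
        rounds += 1
        cwnd *= 2      # each ACKed segment grows cwnd by one segment
    return rounds


# Compare a few initial windows for a hypothetical 43-segment response.
for iw in (1, 3, 8, 43):
    print(iw, round_trips_to_send(43, iw))
```

Under these assumptions a larger initial window only saves a handful of round trips, but on a short-lived HTTP connection those round trips are most of the load time.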

  • by BitZtream (692029) on Friday November 26, 2010 @03:16PM (#34351864)

    I intentionally removed kdawson and timothy from the front page on slashdot just so I wouldn't have to see their ignorant, retarded, not a fucking clue posts ...

    Did they realize that no one reads their tripe anymore, so now they have to have someone else approve it for them?

    kdawson and timothy are idiots, please give me a way to automatically not see anything that has to do with those two morons. Please.

    kdawson is cheating to get around the effort I put into not seeing his crap, MS and Google on the other hand are following the RFC just fine ... if anyone involved in the posting of this story had a clue about what it said or did any sort of actual research then I wouldn't have to rant about it ...

  • Re:Misread the RFC (Score:5, Insightful)

    by Spazmania (174582) on Friday November 26, 2010 @03:17PM (#34351872) Homepage

    IETF uses the capitalized MUST/MUST NOT terminology for a reason. It's used anywhere an implementer could reasonably do something else but for some reason isn't allowed to. Where it isn't present, it isn't required. If the authors omitted that terminology even after referencing RFC 2119 in a standards track modification to such a widely used protocol, they did so because the entire modification is optional.

  • by Animats (122034) on Friday November 26, 2010 @03:36PM (#34351978) Homepage

    That's been known in the TCP community for decades.

    I looked at this back in my RFC 896 [faqs.org] days, when TCP was in initial development and I was working on congestion. I introduced the "congestion window" concept and put it in a TCP implementation (3COM's UNET, which predated Berkeley BSD). The question was, what should be the initial size of the congestion window? If it's small, you get "slow start"; if it's large, the sender can blast a big chunk of data at the receiver at start, up to the amount of buffering the receiver is advertising.

    I decided back then to start with a big congestion window, because starting with a small one would slow down traffic even when bandwidth was available. One of the big performance issues back then was the time required to FTP a directory across a LAN, where TCP connections were being set up and torn down at a high rate. So startup time mattered. The decision to go with a smaller initial congestion window size came years later, from others. This reflected trends in router design. I wanted routers to have "fair queuing", so that sending lots of packets from one source didn't gain the sender any bandwidth over sending few packets. But routers gained speed faster than RAM costs dropped, and so faster routers couldn't have enough RAM for fair queuing. Today, your "last mile" CISCO router might have fair queuing [nil.com]. Some DOCSIS cable modem termination units have it. [cascaderange.org] But many routers are running Random Early Drop, which is a simple but mediocre approach. (The backbone routers barely queue at all; if they can't forward something fast, they drop it. Network design tries to keep the congestion near the edges, where it can be dealt with.)

    Remember, every dropped packet has to be retransmitted. (Too much of that leads to congestion collapse, a term I coined in 1984. That's what the "Nagle algorithm" is about.) In a world with packet-dropping routers, "slow start" makes sense. So that was put into TCP in the late 1980s (by which time I was out of networking.)

    However, the RFC-documented slow start algorithm is rather conservative. RFC 2001 says to start at one maximum segment size. Microsoft's implementations in Win95 and later start at two maximum segment sizes [microsoft.com]. In RFC 3390, from 2002, the limit was raised to 3 or 4 maximum segment sizes. (We used to worry about delaying keystroke echo too much because big FTP packets were tying up the 9600 baud lines too long. We're past that.)
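For reference, the RFC 3390 upper bound is min(4*MSS, max(2*MSS, 4380 bytes)). A minimal sketch of that arithmetic follows; the function names are made up for illustration.

```python
def initial_window_rfc3390(mss):
    """Upper bound on the initial window in bytes, per the RFC 3390 formula."""
    return min(4 * mss, max(2 * mss, 4380))


def initial_window_segments(mss):
    """The same bound expressed in whole segments of size mss."""
    return initial_window_rfc3390(mss) // mss


# A typical Ethernet MSS of 1460 bytes gives 4380 bytes, i.e. 3 segments.
# A small MSS of 536 bytes gives 4 segments; a large one (2200) gives 2.
for mss in (536, 1460, 2200):
    print(mss, initial_window_rfc3390(mss), initial_window_segments(mss))
```

So "3 or 4 segments" holds for the common MSS values, and the bound falls back to 2 segments once the MSS exceeds 2190 bytes.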

    But Google is sending at least 8 segments at start, and Microsoft was observed to be sending 43. Sending 43 packets blind is definitely overdoing it.

    I wonder whether they're doing this blindly, or if there's more smarts behind the scenes. If their TCP implementation kept a cache of recent final congestion window sizes by IP address, they could legitimately start off the next connection with the value from the last one. So, having discovered a path that's not dropping big bursts of packets, they could legitimately start fast. If they're just doing it the dumb way, starting fast every time, that's going to choke some part of the net under heavy load.
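The per-address cache idea could look something like the sketch below. Everything here is hypothetical (the class name, the default of 3 segments, the 10-minute expiry); nothing in the thread says how Google or Microsoft actually do it.

```python
import time


class CwndCache:
    """Remember the final congestion window seen for each peer address.

    Hypothetical sketch: the default window and expiry time are arbitrary
    assumptions, not values from any real TCP implementation.
    """

    def __init__(self, default_window=3, max_age_seconds=600.0, clock=time.monotonic):
        self.default_window = default_window
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._entries = {}  # peer_ip -> (final_cwnd, timestamp)

    def record(self, peer_ip, final_cwnd):
        """Store the window a connection to peer_ip ended with."""
        self._entries[peer_ip] = (final_cwnd, self.clock())

    def initial_window(self, peer_ip):
        """Return the cached window if it is fresh, else the conservative default."""
        entry = self._entries.get(peer_ip)
        if entry is None:
            return self.default_window
        final_cwnd, stamp = entry
        if self.clock() - stamp > self.max_age_seconds:
            del self._entries[peer_ip]  # stale: path conditions may have changed
            return self.default_window
        return final_cwnd
```

Unknown or stale peers fall back to the conservative default, so only paths that recently carried a large window get the fast start.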

  • by Oxford_Comma_Lover (1679530) on Friday November 26, 2010 @03:38PM (#34351990)

    The Third rule of network design, for a moral being, is to consider the moral, ethical, and legal consequences of any atypical changes you make to your behavior.

    Why the Third rule?

    Because the first rule is to figure out what on earth is going on--not just in theory, but in fact. Code for the OSI model is ugly, perhaps by necessity (it has to be very fast), but it's code that is very, very easy to get wrong. It involves a lot of interacting pieces working on different levels of abstraction with other players that you don't have code control over.

    The second rule is to realize when the first rule means that you shouldn't touch the stuff. Google and Microsoft have the engineering competence to mess with it--MSFT even should be messing with it, in terms of looking for ways to improve their behavior in a community-friendly way. Because they write the code that handles a huge portion of connections, and let's face it, TCP/IP just isn't designed for lots of things: AJAX or broadband, for example.

    The third rule is to consider the moral and ethical and legal consequences of changes.

    Only after at least these three steps should someone make changes that involve connections that go beyond the computers they control.

  • Re:Seems to me... (Score:4, Insightful)

    by WolfWithoutAClause (162946) on Friday November 26, 2010 @03:55PM (#34352132) Homepage

    It's going to be OK, provided it's only a small amount of traffic involved. But if everyone starts sending a lot of traffic like this... boom!

    In a sense Google are just saying that their search results are high priority traffic, and they've optimised it like that. Which is probably fair enough.

    But if you did that to anything that creates huge numbers of connections very rapidly and then sends a lot of data, perhaps using it for peer-peer networks, the network would start to suffer collapse.

  • by osu-neko (2604) on Friday November 26, 2010 @04:11PM (#34352270)

    First, implement it, and show that it works in practice.

    Later, standardize the proven best practices.

    Google, ur doin' it rite! :D

  • Re:Misread the RFC (Score:4, Insightful)

    by Spazmania (174582) on Friday November 26, 2010 @04:30PM (#34352452) Homepage

    Kay, so I've poked through the RFCs a bit...

    TCP first defined in RFC 793. No slow start; implementations generally send segments up to the window size negotiated in SYN exchange which is generally the smaller of the speakers' two buffers.

    Slow start first referenced in RFC 1122 (Internet host requirements) as: ''Recent work by Jacobson [ACM SIGCOMM-88] on Internet congestion and TCP retransmission stability has produced a transmission algorithm combining "slow start" with "congestion avoidance". A TCP MUST implement this algorithm.''

    At this point in the process there does not appear to be an RFC specifying TCP slow start, which makes this statement, in a document that is not itself about TCP per se, very dubious.

    A decade later, RFC 2001 says: "Modern implementations of TCP contain four intertwined algorithms that have never been fully documented as Internet standards: slow start, congestion avoidance, fast retransmit, and fast recovery." The word "must" is subsequently used in connection with congestion avoidance but is not used in connection with slow start.

    RFC2414 then revisits the question of TCP's initial window size selection referencing RFC 2001 but again declines to state that TCP "must" start with a small window.

    RFC 2581 finally sets an unambiguous slow start requirement: The slow start and congestion avoidance algorithms MUST be used by a TCP sender [...] IW, the initial value of cwnd, MUST be less than or equal to 2*SMSS bytes and MUST NOT be more than 2 segments.

    However, even as it does so, it goes on to comment that, "We note that a non-standard, experimental TCP extension allows that a TCP MAY use a larger initial window [...] We do NOT allow this change as part of the standard defined by this document. However, we include discussion [...] in the remainder of this document as a guideline for those experimenting with the change, rather than conforming to the present standards for TCP congestion control."

    In other words, even though out of the box TCPs MUST implement slow start, it's understood that other behaviors are in use and are expected to continue.

    Finally, RFC 3390 allows the out-of-the-box behavior of TCP to use a larger initial window than 2581.

    Conclusion: Google still isn't cheating.

  • by carton (105671) on Friday November 26, 2010 @04:44PM (#34352602)

    Yes, that's my understanding as well---the point of slow start is to go easy on the output queues of whichever routers experience congestion, so if congestion happens only on the last mile a hypothetical bad slow-start tradeoff does indeed only affect that one household (not necessarily only that one user), but if it happens deeper within the Internet it's everyone's problem contrary to what some other posters on this thread have been saying.

    WFQ is nice but WFQ currently seems to be too complicated to implement in an ASIC, so Cisco only does it by default on some <2Mbit/s interfaces. Another WFQ question is, on what inputs do you do the queue hash? For default Cisco it's on TCP flow, which helps for this discussion, but I will bet you (albeit a totally uninformed bet) that CMTS will do WFQ per household putting all the flows of one household into the same bucket, since their goal is to share the channel among customers, not to improve the user experience of individual households---they expect people inside the house to yell at each other to use the internet ``more gently'' which is pathetic. In this way, WFQ won't protect a household's skype sessions from being blasted by MS fast-start the way Cisco default WFQ would.
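The difference between the two hashing choices can be shown with a toy classifier. This is only a sketch of the queue-selection step, not of WFQ scheduling itself; the Packet fields, addresses, and ports are invented for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record; "household" stands in for whatever
# per-subscriber identifier a CMTS would use.
Packet = namedtuple("Packet", ["household", "src", "sport", "dst", "dport"])


def flow_key(pkt):
    """Per-flow classification: each TCP/UDP flow gets its own queue."""
    return (pkt.src, pkt.sport, pkt.dst, pkt.dport)


def household_key(pkt):
    """Per-household classification: all of a subscriber's flows share a queue."""
    return pkt.household


def classify(packets, key_fn):
    """Group packets into queues according to key_fn."""
    queues = defaultdict(list)
    for pkt in packets:
        queues[key_fn(pkt)].append(pkt)
    return dict(queues)


# One voice packet and a 43-packet burst headed to the same house.
voice = Packet("house-1", "198.51.100.7", 40001, "192.0.2.10", 40000)
burst = [Packet("house-1", "203.0.113.5", 80, "192.0.2.10", 51000)] * 43
packets = [voice] + burst

print(len(classify(packets, flow_key)))       # voice and burst queue separately
print(len(classify(packets, household_key)))  # voice waits behind the burst
```

With per-flow keys the voice packet sits alone in its own queue; with per-household keys it shares a single queue with the whole burst.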

    If anything, cable plants may actually make TCP-algorithm-related congestion worse because I heard a rumor they try to conserve space on their upstream channel by batching TCP ACK's, which introduces jitter, meaning the windowsize needs to be larger, and makes TCP's downstream more ``microbursty'' than it needs to be. If they are going to batch upstream on purpose, maybe they should timestamp upstream packets in the customer device and delay them in the CMTS to simulate a fixed-delay link---they could do this TS+delay per-flow rather than per-customer if they do not want to batch all kinds of packets (ex maybe let DNS ones through instantly).

    RED is not too complicated to implement in ASIC, but (a) I think many routers, including DSLAM's, actually seem to be running *FIFO* which is much worse than RED even, because it can cause synchronization when there are many TCP flows---all the flows start and stop at once. (b) RED is not that good because it has parameters that need to be tuned according to approximately how many TCP flows there are. I think BLUE is much better in this respect, and is also simple enough to implement in ASIC, but AFAIK nobody has.
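To make the tuning complaint concrete, here is a simplified sketch of the RED drop decision: an exponentially weighted average of the queue length, and a drop probability that ramps linearly between two thresholds. It leaves out the refinement in the original RED design that spaces drops out based on packets since the last drop, and the default weight is just an example value.

```python
def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the instantaneous queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def red_drop_probability(avg, min_th, max_th, max_p):
    """Simplified RED: no drops below min_th, certain drop at or above max_th,
    and a linear ramp up to max_p in between."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


# Four knobs (weight, min_th, max_th, max_p) all have to suit the traffic mix.
print(red_drop_probability(10.0, min_th=5.0, max_th=15.0, max_p=0.1))
```

All four parameters interact with the number of flows sharing the queue, which is the tuning burden the comment is pointing at.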

    I think much of the conservatism on TCP implementers' part can be blamed on router vendors failing to step up and implement decades-old research on practical ASIC-implementable queueing algorithms. I've the impression that even the latest edge stuff focuses on having deep, stupid (FIFO) queues (Arista?) or minimizing jitter (Nexus?). Cisco has actually taken RED *off* the menu for post-6500 platforms: 3550 had it on the uplink ports, but 3560 has ``weighted tail drop'' which AFAICT is just fancy FIFO. I'd love to be proved wrong by someone who knows more, but I think they are actually moving backwards rather than stepping up and implementing BLUE [thefengs.com].

    And I like very much your point that caching window sizes per /32 is the right way to solve this rather than haggling about the appropriate default, especially in the modern world of megasites and load balancers where a clever site could appear to share this cached knowledge quite widely. But IMSHO routing equipment vendors need to be stepping up to the TCP game, too.

  • Re:Misread the RFC (Score:3, Insightful)

    by Spazmania (174582) on Friday November 26, 2010 @05:51PM (#34353218) Homepage

    What, are you stupid?

    "Document A doesn't say what you claim."

    "Yeah, but there's a previous document which does."

    "What previous document is that?"

    "Hur, learn to use google dude."

  • by Iron Condor (964856) on Friday November 26, 2010 @06:02PM (#34353292)

    I wonder whether they're doing this blindly, or if there's more smarts behind the scenes. If their TCP implementation kept a cache of recent final congestion window sizes by IP address, they could legitimately start off the next connection with the value from the last one. So, having discovered a path that's not dropping big bursts of packets, they could legitimately start fast. If they're just doing it the dumb way, starting fast every time, that's going to choke some part of the net under heavy load.

    That strikes me as still-kinda-eighties-thinking. I guess the question is what your assumption for an unknown segment of network is: If you assume that all parts of the net are congested most of the time, then you'll want to do a fast start up only on those segments that you know can handle it (doesn't have to be an individual IP - If my ISP buffers alright and you can reach it alright then it doesn't matter how many folks are sitting downstream from them - it becomes their problem.) If, on the other hand, you have the expectation that most packets on most of the net are going to be just fine (for whatever reason; even if by sheer brute force buffering and clever back-end algorithms that figure it all out after the fact) then it makes sense to do fast start with unknown clients and omit it only on those found NOT to be able to handle it. Kinda a glass-half-full way of looking at it.

    These days I'd wager that the vast (VAST!) majority of packets are part of ongoing streams - streaming Netflix over the net, torrenting the collected porn of the 80s, that kind of thing. Which means I'm as sure as I can possibly be of something I haven't researched that the performance of the net is only in the most marginal way dependent on startup behaviour around individual connections any more. (Or better: when/where it is, it is probably due to the 100 TCP connections that need to be established to view a single web page; fix that and the question of startup behaviour will just go away. Incidentally, MS's CHM concept was a step very much in the right direction...)

  • Re:Misread the RFC (Score:1, Insightful)

    by Anonymous Coward on Friday November 26, 2010 @07:28PM (#34353916)

    IETF uses the capitalized MUST/MUST NOT terminology for a reason. It's used anywhere an implementer could reasonably do something else but for some reason isn't allowed to. Where it isn't present, it isn't required

    This is complete nonsense.

    The sign of a good RFC writer is not littering a document with MUST *** terminology. After a certain threshold it gets really old and implementors begin to ignore you.

    If there is a magic value defined in an RFC, or an algorithm used in a certain way, more often than not it will NOT say that you MUST call the algorithm in this order with these special parameters. If, however, you don't follow the specification you should not expect your implementation to work at all.

    Recommendations often have very significant side effects if they are not followed. No wording in an RFC should ever be construed as a substitute for using one's brain and understanding the underlying basis upon which the specification was arrived at.

    If an outfit like Google has a better way of characterizing the link by for example keeping track of metrics obtained from recent connection histories then good for them.

    If however they are just turning off congestion control because it makes "their" site faster with the justification "usually" it is not necessary then fuck them.

    It seems to me the single worst thing one could do in a congested environment is add more connections with no realtime requirements in a non-congestion avoidant manner.

    Until I see simulation results to the contrary (which is Google's burden to supply) then I will just assume any instance of ignorant circumvention of slow-start is Google being Evil.

  • by teridon (139550) on Saturday November 27, 2010 @01:13AM (#34356118) Homepage
    Isn't it time /. got a "-1 Reply Abuse" mod? The parent reply has nothing to do with the GP. It's on topic, and maybe it deserves the "Insightful" mod -- but it's replying to the top post just to appear at the top of the page. STOP THE MADNESS!
