Always liked the Winsock Lame List (Score:2)
Of course it is rather Windows-centric, but most of the issues apply across platforms (only a few talk about WSA functions). However, the Lame List [tangentsoft.net] contains a lot of wonderful nuggets.
I must disagree with the article, however: there are so, SO few times that disabling the Nagle algorithm is the correct answer that the standard reply when someone asks about it on the networking forums is that the asker doesn't understand Nagle and should re-enable it. Telnet is even a bastard case in that your networking performance
The trouble with the Nagle algorithm (Score:5, Interesting)
I really should fix the bad interaction between the "Nagle algorithm" and "delayed ACKs". Both ideas went into TCP around the same time, and the interaction is terrible. That fixed timer for ACKs is all wrong.
Here's the real problem, and its solution.
The concept behind delayed ACKs is to bet, when receiving some data from the net, that the local application will send a reply very soon. So there's no need to send an ACK immediately; the ACK can be piggybacked on the next data going the other way. If that doesn't happen, after a 500ms delay, an ACK is sent anyway.
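As a rough illustration of that bet, here is a toy function (not code from any real TCP stack) that says when the ACK for one received segment would leave. The function name, its parameters, and the use of integer milliseconds are all assumptions made for this sketch.

```python
def ack_departure(data_arrival_ms, reply_ready_ms=None, delay_ms=500):
    """Return (time_ms, piggybacked) for the ACK of one received segment.

    data_arrival_ms: when the segment arrived.
    reply_ready_ms:  when the local application wrote a reply, or None
                     if it never did.
    delay_ms:        the fixed delayed-ACK timer described in the post.
    """
    deadline = data_arrival_ms + delay_ms
    if reply_ready_ms is not None and reply_ready_ms <= deadline:
        # The bet paid off: the ACK rides on the outgoing data packet.
        return reply_ready_ms, True
    # The bet was lost: a bare ACK goes out when the timer expires.
    return deadline, False
```

For example, `ack_departure(0, 20)` models an application that replies 20 ms after the data arrives, so the ACK is piggybacked; `ack_departure(0)` models one that never replies, so the ACK waits out the full timer.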
The concept behind the Nagle algorithm is that if the sender is doing very tiny writes (like single bytes, from Telnet), there's no reason to have more than one packet outstanding on the connection. This prevents slow links from choking with huge numbers of outstanding tinygrams.
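The sender-side rule can be sketched as a small toy model. This is only an illustration under simplifying assumptions (no receive window, no retransmission, every ACK covers everything sent so far); the class name, its methods, and the default segment size are hypothetical and not taken from any real implementation.

```python
class NagleSender:
    """Toy model of the sender-side rule; not a real TCP stack."""

    def __init__(self, mss=1460):
        self.mss = mss        # maximum segment size (assumed value)
        self.buffer = b""     # data written but not yet sent
        self.unacked = 0      # bytes sent and not yet acknowledged
        self.sent = []        # packets put on the wire, for inspection

    def write(self, data):
        self.buffer += data
        self._try_send()

    def on_ack(self):
        # Simplification: one ACK acknowledges everything outstanding.
        self.unacked = 0
        self._try_send()

    def _emit(self, packet):
        self.sent.append(packet)
        self.unacked += len(packet)

    def _try_send(self):
        # Full-size segments are never held back.
        while len(self.buffer) >= self.mss:
            self._emit(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
        # A small segment goes out only if nothing is outstanding;
        # otherwise it waits and coalesces with later small writes.
        if self.buffer and self.unacked == 0:
            self._emit(self.buffer)
            self.buffer = b""
```

Three single-byte writes in a row produce one packet right away; the other two bytes wait for the ACK and then leave together as a single packet.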
Both are reasonable. But they interact badly in the case where an application does two or more small writes to a socket, then waits for a reply. (X-Windows is notorious for this.) When an application does that, the first write results in an immediate packet send. The second write is held up until the first is acknowledged. But because of the delayed ACK strategy, that acknowledgement is held up for 500ms. This adds 500ms of latency to the transaction, even on a LAN.
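The arithmetic behind that stall can be written down as a back-of-the-envelope model. It assumes a symmetric path, zero processing time on both ends, a request made of exactly two small writes, and a server that replies only once it has the whole request. The function and parameter names are invented for this sketch.

```python
def write_write_read_latency_ms(rtt_ms, delayed_ack_ms=500,
                                nagle=True, coalesced=False):
    """Time from the first write until the reply arrives."""
    if coalesced or not nagle:
        # The whole request is on the wire at t=0 (one packet, or two
        # back to back), so the reply returns after one round trip.
        return rtt_ms
    # Packet 1 leaves at t=0 and arrives after rtt/2. The server has
    # only half a request, so it sends nothing and its ACK is delayed.
    # The bare ACK reaches the client at rtt + delayed_ack.
    ack_arrives = rtt_ms + delayed_ack_ms
    # Only now does the sender release packet 2; the reply comes back
    # one more round trip later.
    return ack_arrives + rtt_ms
```

With a 1 ms LAN round trip this model gives 502 ms for the write-write-read case against 1 ms when the two writes are coalesced into one.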
The real problem is that 500ms unconditional delay. (Why 500ms? That was a reasonable response time for a time-sharing system of the 1980s.) As mentioned above, delaying an ACK is a bet that the local application will reply to the data just received. Some apps, like character echo in Telnet servers, do respond every time. Others, like X-Windows "clients" (really servers, but X is backwards about this), only reply some of the time.
TCP has no strategy to decide whether it's winning or losing those bets. That's the real problem.
The right answer is that TCP should keep track of whether delayed ACKs are "winning" or "losing". A "win" is when, before the 500ms timer runs out, the application replies. Any needed ACK is then coalesced with the next outgoing data packet. A "lose" is when the 500ms timer runs out and the delayed ACK has to be sent anyway. There should be a counter in TCP, incremented on "wins", and reset to 0 on "loses". Only when the counter exceeds some number (5 or so), should ACKs be delayed. That would eliminate the problem automatically, and the need to turn the "Nagle algorithm" on and off.
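A minimal sketch of that counter, purely as illustration: the class and method names are hypothetical, and it assumes the stack can observe whether the application replied within the timer window even while ACKs are being sent immediately (the post does not spell that part out).

```python
class DelayedAckPolicy:
    """Win/lose counter deciding whether ACKs should be delayed."""

    def __init__(self, threshold=5):
        self.threshold = threshold  # "5 or so" in the post
        self.wins = 0

    def should_delay(self):
        # Delay only after a long enough run of consecutive wins.
        return self.wins > self.threshold

    def on_reply_before_timeout(self):
        # "Win": the application replied in time, so the ACK could
        # have been piggybacked on outgoing data.
        self.wins += 1

    def on_timeout(self):
        # "Lose": the timer ran out with nothing to piggyback on.
        self.wins = 0
```

An echo-style peer that always replies quickly crosses the threshold and gets delayed ACKs; a single timeout drops the connection back to immediate ACKs until it earns another run of wins.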
So that's the proper fix, at the TCP internals level. But I haven't done TCP internals in years, and really don't want to get back into that. If anyone is working on TCP internals for Linux today, I can be reached at the e-mail address above. This really should be fixed, since it's been annoying people for 20 years and it's not a tough thing to fix.
The user-level solution is to avoid write-write-read sequences on sockets. write-read-write-read is fine. write-write-write is fine. But write-write-read is a killer. So, if you can, buffer up your little writes to TCP and send them all at once. Using the standard UNIX I/O package and flushing write before each read usually works.
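One way to get that buffering in Python, shown as a hedged sketch rather than a recommendation for any particular program: `socket.makefile("wb")` wraps the socket in a buffered writer, so small writes pile up in user space until `flush()` is called. The helper name `send_request` and the sample payload are made up for the example.

```python
import socket


def send_request(sock, parts):
    """Write all the small pieces into a buffer, then flush once."""
    # Closing the file object returned by makefile() does not close
    # the underlying socket.
    with sock.makefile("wb") as out:
        for part in parts:
            # Pieces smaller than the buffer stay in user space here.
            out.write(part)
        # Flush before the caller goes on to read the reply.
        out.flush()


if __name__ == "__main__":
    # A connected pair of local sockets stands in for client and server.
    left, right = socket.socketpair()
    send_request(left, [b"HEAD", b"ER ", b"BODY"])
    print(right.recv(64))
    left.close()
    right.close()
```

The caller would then read the reply from `sock` as usual, turning a write-write-read sequence into a single write followed by a read. Pieces larger than the writer's buffer may still be pushed out early.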
John Nagle
Re:The trouble with the Nagle algorithm (Score:2)
Ah, so you are the Nagle of the algorithm? How about an extension onto TCP as a concept: you can tell TCP that you are willing to accept d amount of delay, with the default being the 500 ms previously used and assigned. Thus protocols like X could state that they don't need to hang waiting for an ACK, while programs that should hang waiting for ACK will continue to do so.
This extension would only require recompiling the programs that attempt to not do the prior default action of that delay, such as recompi