Boosting Socket Performance on Linux
Posted by
CmdrTaco
on Thu Jan 19, 2006 03:55 PM
from the everyone-likes-more dept.
Cop writes "The Sockets API lets you develop client and server applications that can communicate across a local network or across the world via the Internet. Like any API, you can use the Sockets API in ways that promote high performance -- or inhibit it. This article explores four ways to use the Sockets API to squeeze the greatest performance out your application and to tune the GNU/Linux® environment to achieve the best results."
This discussion has been archived.
No new comments can be posted.
Be aware (Score:4, Funny)
Re:Be aware (Score:4, Insightful)
(http://www.solatis.com/)
Exactly... especially with things like these, it's usually best for the entire internet if you just stick with the defaults... they are defaults for a reason, it might not be the best for you, but it's most likely the best for the internet as a whole.
Reminds me of those people tweaking Firefox settings to hammer all kinds of web servers... sure, your browsing might be a slight bit faster, at the expense of the browsing of lots of other people...
Re:Be aware (Score:4, Informative)
slashdotted? (Score:5, Funny)
GNU/Linux®...A less efficient way to say Linux (Score:2, Funny)
(Last Journal: Friday September 14, @10:12AM)
Hello 1995 (Score:5, Insightful)
(http://www.intelligentblogger.com/ | Last Journal: Monday August 27, @11:47AM)
Re:Hello 1995 (Score:4, Interesting)
In the same line - where is the discussion of different FD table polling mechanisms? select() versus poll(), and where's the writeup about Linux's epoll()? I would have been interested in an epoll() article, especially how it compares to FreeBSD's kqueue().
Re:Hello 1995 (Score:5, Informative)
(Last Journal: Thursday October 03 2002, @10:53AM)
For the overview, you want Dan Kegel's c10k page:
http://www.kegel.com/c10k.html [kegel.com]
Hello 2003. (Score:5, Interesting)
(http://slashdot.org/ | Last Journal: Saturday November 03, @04:58AM)
Documentation like this is great and extremely valuable. It would be much more valuable, however, if it remained current. For example, can the ABISS [sourceforge.net] project (which improves block I/O) be used at all? What do the numbers look like, when using profiling tools like Web100 [web100.org] (which profiles TCP communications)?
Has anyone run Linux or one of the *BSD kernels through DAKOTA [sandia.gov], KOJAK [fz-juelich.de] or PAPI [utk.edu] to determine where, precisely, bottlenecks are within the kernels? It's easy to theorise, but isn't it cleaner to measure?
Now, I'm not saying these things aren't being done. They probably are, somewhere, by someone, but if the results aren't getting published we don't really know what impact what changes are going to have. The current method of evolving Operating System code in general is often a mix of personal theory and subjective experience based on non-random samples of activity. That can't really be a good way to do things, can it?
If I'm wrong, feel free to say. If I'm right, then maybe it would be a good thing if someone (possibly me) put together some kind of testing kit for measuring Linux kernel performance and actually measured the stats for Linux kernels on some kind of regular basis.
Re:Hello 1995 (Score:4, Informative)
(http://www.intelligentblogger.com/ | Last Journal: Monday August 27, @11:47AM)
From the MAN page [linuxmanpages.com]:
The article could have better explained that in context. For the most part it's automatic though, so don't worry about it.
Summary ripped directly from article (again) (Score:2, Informative)
Here is the summary:
The Sockets API lets you develop client and server applications that can communicate across a local network or across the world via the Internet. Like any API, you can use the Sockets API in ways that promote high performance -- or inhibit it. This article explores four ways to use the Sockets API to squeeze the greatest performance out your application and to tune the GNU/Linux® environment to achieve the best results.
Here is the first paragraph of the article:
The Sockets API lets you develop client and server applications that can communicate across a local network or across the world via the Internet. Like any API, you can use the Sockets API in ways that promote high performance -- or inhibit it. This article explores four ways to use the Sockets API to squeeze the greatest performance out your application and to tune the GNU/Linux® environment to achieve the best results.
Unless Cop (the submitter) is actually M. Tim Jones (the article author), Cop didn't write a darn thing.
Didn't we just have this discussion on
No mention of alternatives to select? (Score:5, Informative)
Re:Code Portability (Score:5, Informative)
Re:No mention of alternatives to select? (Score:5, Informative)
(http://www.spamgourmet.com/)
http://www.xmailserver.org/linux-patches/nio-impr
The website is hideous, but there used to be benchmarks against different polling/selecting methods. If I remember correctly, it's kinda trial and error, YMMV, kind of stuff. It's worth a look.
Nothing new (Score:1, Funny)
GNU/Linux®? (Score:2)
IBM is getting some good Linux content... (Score:5, Interesting)
(http://tomcopeland.blogs.com/)
Folks who are thinking about writing something technical - give dW a shot. The editors are savvy folks and there's lots of good stuff up there already.
Oh, and book plug [pmdapplied.com]!
I've always wanted to know if it is possible (Score:2)
Re:I've always wanted to know if it is possible (Score:5, Funny)
(Last Journal: Friday August 24, @08:58PM)
It ignores you except at feeding time, and pees in your shoes when it's mad at you?
Re:I've always wanted to know if it is possible (Score:5, Informative)
Is that what you're looking for?
Nagle's algorithm (Score:5, Interesting)
(http://www.lcscanada.com/jaf)
To get around the above problems, I came up with the following scheme: Leave Nagle's algorithm enabled, but create a FlushSocket() function that merely disables Nagle on the socket, then calls send() on the socket with a 0-byte buffer, then enables Nagle again. This apparently forces the TCP stack to immediately send any data that it may have accumulated in its Nagle-buffer. Therefore the only thing the calling code has to remember to do is to call FlushSocket() whenever it has called send() one or more times and doesn't think it will be sending any more data any time soon.
The above technique seems to work pretty well under Linux, Windows, and OS/X (and is more portable than Linux-specific flags like TCP_CORK, etc), but I haven't seen it documented anywhere. Is that simply an oversight, or is there some nasty downside to this technique that I'm overlooking?
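The FlushSocket() scheme described in this comment can be sketched in Python roughly as follows. The names `flush_socket` and `demo` are illustrative assumptions, not part of any library, and whether the zero-byte send is what actually pushes out queued data depends on the TCP stack, so treat this as a sketch rather than a guaranteed behaviour.

```python
import socket


def flush_socket(sock: socket.socket) -> None:
    """Briefly disable Nagle, issue a zero-byte send, then re-enable Nagle."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # Nagle off
    try:
        sock.send(b"")  # zero-byte send, as the comment describes
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)  # Nagle on again


def demo() -> bytes:
    """Loopback round trip: several small writes, one flush, then read them back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.settimeout(5)  # do not hang forever if something goes wrong
    try:
        for chunk in (b"he", b"ll", b"o"):
            client.send(chunk)  # small writes, Nagle left enabled
        flush_socket(client)    # "no more data for a while"
        received = b""
        while len(received) < 5:
            data = server.recv(16)
            if not data:
                break
            received += data
        return received
    finally:
        client.close()
        server.close()
        listener.close()
```

On a loopback connection the data arrives either way, so `demo()` only shows that the call sequence is harmless; any latency benefit would have to be measured on a real link.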
flush( sd ) would be nice (Score:1)
Math error in paper? (Score:3, Informative)
(http://existens.org/)
throughput = window_size / RTT
110KB / 0.050 = 2.2MBps
If instead you use the window size calculated above, you get a whopping 31.25MBps, as shown here:
625KB / 0.050 = 31.25MBps
That's funny, I get 12.5MBps
???
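The commenter's figure is easy to check against the quoted formula. A minimal sketch, assuming decimal units (1 MB = 1000 KB) and a hypothetical helper name `throughput_mb_per_s`:

```python
def throughput_mb_per_s(window_kb: float, rtt_ms: float) -> float:
    """throughput = window_size / RTT; KB per millisecond equals MB per second."""
    return window_kb / rtt_ms


print(throughput_mb_per_s(110, 50))  # → 2.2
print(throughput_mb_per_s(625, 50))  # → 12.5
```

With a 625KB window and a 50 ms round trip the formula gives 12.5MBps, which agrees with the commenter and not with the 31.25MBps quoted from the article.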
Socket tuning (Score:2)
What I'm wondering is, might it be possible to make this sort of calculation in the kernel, detect congestion feedback and back off automatically? I'm not talking about the regular exponential backoff algorithm, but about some sort of best-rate prediction based on detecting the characteristic shape of feedback waves and backing off until they disappear.
GNU/Linux® (Score:1)
(http://batteriesnimh.com/)
Always liked the Winsock Lame List (Score:2)
However, the Lame List [tangentsoft.net] contains a lot of wonderful nuggets.
I must disagree with the article, however. There are so SO few times that disabling the Nagle algorithm is the correct answer that the standard answer, when someone asks about it on the networking forums, is that the asker doesn't understand Nagle and should re-enable it. Telnet is even a bastard case in that your networking performance may actually go UP sending smaller bursts of network characters, rather than one at a time, each in its own packet. But you have to measure your own performance.
Frankly, none of these suggestions will get you ultimate performance from a 10 Gig networking stack, and that is where networking finally becomes fun.
The trouble with the Nagle algorithm (Score:5, Interesting)
(http://www.animats.com)
Here's the real problem, and its solution.
The concept behind delayed ACKs is to bet, when receiving some data from the net, that the local application will send a reply very soon. So there's no need to send an ACK immediately; the ACK can be piggybacked on the next data going the other way. If that doesn't happen, after a 500ms delay, an ACK is sent anyway.
The concept behind the Nagle algorithm is that if the sender is doing very tiny writes (like single bytes, from Telnet), there's no reason to have more than one packet outstanding on the connection. This prevents slow links from choking with huge numbers of outstanding tinygrams.
Both are reasonable. But they interact badly in the case where an application does two or more small writes to a socket, then waits for a reply. (X-Windows is notorious for this.) When an application does that, the first write results in an immediate packet send. The second write is held up until the first is acknowledged. But because of the delayed ACK strategy, that acknowledgement is held up for 500ms. This adds 500ms of latency to the transaction, even on a LAN.
The real problem is that 500ms unconditional delay. (Why 500ms? That was a reasonable response time for a time-sharing system of the 1980s.) As mentioned above, delaying an ACK is a bet that the local application will reply to the data just received. Some apps, like character echo in Telnet servers, do respond every time. Others, like X-Windows "clients" (really servers, but X is backwards about this), only reply some of the time.
TCP has no strategy to decide whether it's winning or losing those bets. That's the real problem.
The right answer is that TCP should keep track of whether delayed ACKs are "winning" or "losing". A "win" is when, before the 500ms timer runs out, the application replies. Any needed ACK is then coalesced with the next outgoing data packet. A "lose" is when the 500ms timer runs out and the delayed ACK has to be sent anyway. There should be a counter in TCP, incremented on "wins", and reset to 0 on "loses". Only when the counter exceeds some number (5 or so), should ACKs be delayed. That would eliminate the problem automatically, and the need to turn the "Nagle algorithm" on and off.
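The win/lose counter proposed above can be sketched as a small state holder. This is only an illustration of the bookkeeping, not kernel code: the class and method names are hypothetical, the threshold of 5 is the "5 or so" from the comment, and it assumes the stack can still tell whether the application replied within the timer window even while ACKs are being sent immediately.

```python
class DelayedAckPolicy:
    """Tracks whether betting on a piggybacked ACK has been paying off."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold  # "5 or so" in the comment
        self.wins = 0               # consecutive wins since the last loss

    def record_win(self) -> None:
        """The application replied before the delayed-ACK timer ran out."""
        self.wins += 1

    def record_loss(self) -> None:
        """The timer ran out and a bare ACK had to be sent anyway."""
        self.wins = 0

    def should_delay_ack(self) -> bool:
        """Delay ACKs only once the counter exceeds the threshold."""
        return self.wins > self.threshold
```

A connection that replies every time (character echo, say) would cross the threshold after a few exchanges and start delaying ACKs, while one that replies only occasionally would keep resetting to 0 and acknowledge immediately.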
So that's the proper fix, at the TCP internals level. But I haven't done TCP internals in years, and really don't want to get back into that. If anyone is working on TCP internals for Linux today, I can be reached at the e-mail address above. This really should be fixed, since it's been annoying people for 20 years and it's not a tough thing to fix.
The user-level solution is to avoid write-write-read sequences on sockets. write-read-write-read is fine. write-write-write is fine. But write-write-read is a killer. So, if you can, buffer up your little writes to TCP and send them all at once. Using the standard UNIX I/O package and flushing write before each read usually works.
John Nagle
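The user-level advice in the comment above (buffer the little writes, flush before each read) can be sketched as follows. `CoalescingWriter` is a hypothetical helper written for illustration; the only socket call it relies on is the standard `sendall()`.

```python
import socket


class CoalescingWriter:
    """Collects small writes and hands them to the socket in one sendall() call."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock
        self._pending = bytearray()

    def write(self, data: bytes) -> None:
        """Queue data locally; nothing reaches the socket yet."""
        self._pending += data

    def flush(self) -> None:
        """Send everything queued so far as a single write, then reset the buffer."""
        if self._pending:
            self._sock.sendall(bytes(self._pending))
            self._pending.clear()
```

Calling `flush()` just before each read turns a write-write-read sequence into write-read, which is the pattern the comment describes as fine.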
...what about UDP? (Score:1, Offtopic)
Pining for Doors (Score:2)
(http://slashdot.org/ | Last Journal: Friday May 07 2004, @03:22PM)
Linux is quite tragic that way. Hopefully there will be a Debian user-land on the OpenSolaris kernel soon, and then I can rock-n-roll again.
Re:somewhat old... (Score:2)
I agree though, nothing earth shaking. Nagle's algorithm is discussed in depth in most TCP/IP books, and so is how to turn it off. Wake me up when they post something new.
Re:Never trust an article with a (R) symbol... (Score:2)