Java Programming

Java Performance under Linux

krshultz writes "IBM has posted a great technical article on Java performance on its DeveloperWorks site. I learned a lot about Java and Linux in general." This is a nice big well-indexed article. Go.
  • by Anonymous Coward
    There are no performance benefits versus natively compiled languages such as C++. Java is dog slow.

    But, the generated byte code is portable so you can write a program and deploy it with a tremendous amount of ease as an applet in a web browser and have it run on any computer whether it's a Linux box, Windoze, Solaris, or Mac.

    To me, it's the perfect way to write SQL database front ends (the client application) since these are not performance-critical apps, but usually just simple forms. Java should be able to kill Visual Basic as a client/server database development tool.

  • by Anonymous Coward
    I write client Java UI apps for A Very Large Software Company that uses Java extensively. I was also recently given a side project to compare the speeds and memory usage of various Windows JavaVMs. The tests involved very large client-side Java apps.

    I compared the various Sun versions and the IBM 1.1.8 JDK for Windows, and the results roughly were:

    Sun 1.2 JDK : slowest (at least for client apps)
    Sun 1.1.8 JDK: OK
    IBM 1.1.8 JDK: fast
    Sun 1.3 beta JDK: fastest.

    On a decently modern machine, there was no distinguishable difference between native Windows apps and our better-looking client Java UIs. YMMV.

    Performant client Java is pretty close now...

  • I read an interesting history of the Ada language recently (forget who wrote it) and it basically stated that Ada was a US Defense dept creation intended to thwart the Russian space/military industry. Apparently the US realized that Russia was our equal, if not better, in human intelligence. The Russians were able to exploit their poor equipment by squeezing out nifty assembler code.

    The Defense dept realized that the US needed to engage Russia in a hardware race that Russia's beleaguered economy could not match -- a precursor to the Star Wars tactics Ronald Reagan would employ 20 years later. The only way the US could draw Russia into that hardware race was to create the impression that the US was using a wonderful new programming language that gave it a software advantage, thus forcing Russia to adopt the new language. The language was Ada.

    Ada was specifically designed to be a loser. The Russians stopped programming in assembler and sacrificed their intelligence lead.

    I haven't done the story justice. It was a fascinating read, so fascinating that it bordered on unbelievable. I found the link in a slashdot comment, so if you do a find on Ada stories you might find it.

    I just thought it was amusing that you recommended Ada when it is essentially a bum language, designed to be that way from its inception.

  • by Telcontar ( 819 ) on Wednesday January 19, 2000 @05:48AM (#1358261) Homepage
    How do the *BSD schedulers cope with that problem? What alternatives to an evaluation of the "Goodness functions" have been thought of?

    Is it maybe possible to make only a rough (heuristic) estimate of that function, perhaps based on older (exact) values which are only updated from time to time? The same goes for the ranking of the results of these functions (apparently much time is lost here). After all, with so many threads
    a) one does not have to select the best process to run - choosing a good one is OK
    b) having a bigger data structure in the Kernel should not be a problem - the testing machines had 1 GB of RAM...
  • by jd ( 1658 ) <imipak&yahoo,com> on Wednesday January 19, 2000 @05:30AM (#1358262) Homepage Journal
    This is not the first time someone's commented on the Linux scheduler. There have been unofficial patches for it for some time, and there have been more than a few complaints as to the way it operates.

    There seem to be three directions people want to go with the scheduler - coarse-grain, fine-grain and real-time. Instead of arguing which is "best", why don't the developers do what they've always done in the past - put the stuff in, and use menu options to let people choose! If one (or two) of the options turn out to be really redundant, back them out! Nothing's lost but a few cycles of human time. And it's better spent on code than on flame-throwers.

  • "The result, however, is that each Java thread in the IBM Java VM for Linux is implemented by a corresponding user process whose process id and status can be displayed using the Linux ps command. "

    This statement from the article seems to imply that threads are implemented as separate processes. But the following statement implies that they are all in the same address space:

    "Instead, a special version of the fork() system call (also known as clone()) has been implemented that allows you to specify that the child process should execute in the same address space as its parent. "

    That's interesting. Threads typically need access to shared data, and having a separate process per thread would cause a performance hit accessing this data. However, they're all in the same parent process space, so does that mean they can share pointers (errrr, references)? What happens if I kill one of these threads (processes) from the command line (presumably the same as terminating a thread under Windows, but easier to carry out as you don't have to write a program to do it)? Is it efficient to have the main scheduler scheduling threads? How do other OSes do it? I would have thought it bad, as using a multi-threaded app would affect the performance of the whole system (but as an application writer, I guess I wouldn't want thread scheduling using up some of my process's time slice!)
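
    A minimal sketch of the clone() mechanism the article describes, assuming Linux and glibc's clone() wrapper (names and sizes here are illustrative): the child gets its own pid, visible in ps, yet writes straight into the parent's memory.

```c
/* Sketch: clone() with CLONE_VM gives the "thread" its own pid while
   sharing the parent's address space, so pointers remain valid in both. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static int counter = 0;                /* shared between parent and child */

static int child_fn(void *arg)
{
    counter = 42;                      /* lands in the parent's memory too */
    return 0;
}

int main(void)
{
    size_t sz = 64 * 1024;
    char *stack = malloc(sz);          /* the child needs its own stack */
    /* CLONE_VM shares the address space; SIGCHLD lets the parent wait() */
    pid_t pid = clone(child_fn, stack + sz, CLONE_VM | SIGCHLD, NULL);
    waitpid(pid, NULL, 0);
    printf("child %d set counter to %d\n", (int)pid, counter);  /* 42 */
    return 0;
}
```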
  • What I would need is a CORBA ORB that will build Objective-C stubs and skeletons (ie: an Objective-C binding and implementation for CORBA).

    With CORBA bindings, Objective-C is useful; without them, it's useless ... to me. I would love to see Objective-C support added to Orbit; hell, I'd love to see C++ support added to Orbit. I'd like to reduce the number of ORBs that I have to deal with to do multi-language development.

  • You just need 1 web server process for each NIC; it's the equivalent of having a multithreaded TCP/IP stack for multiple NICs.

    That's reasonable, since the processor can fundamentally move much more data than the combined throughput of the NICs (if that's not true, you need a faster processor/better memory bus, of course).

  • I mean a 2-10x speedup in anything.

    I can't imagine the scheduler is taking up more than the 20% it's taking up in this benchmark when you run Apache with a zillion processes.

    So if the scheduler were infinitely fast, Apache would get 20% faster. BFD. I'm more interested in changing the design of the webserver itself so it's 2-10x faster.
  • by aheitner ( 3273 ) on Wednesday January 19, 2000 @01:37PM (#1358267)
    You can't rely on user-space thread switches, it's just too messy. Remember, in theory Win3.1 did user-space timesharing, and we all know how much of a joke that was. If you're going to be doing more than one thing at once (conceptually, of course) in a single userland context, you should structure your code appropriately for the grain you want and build an event structure or whatever is appropriate. Calls to yield are just flaky -- you should be able to structure things much more intelligently than that.

    For example, a webserver has the fundamental building block of a packet on the transport, which has an MTU. So a webserver ought to be able to build a grainsize based on sending at least one packet (or perhaps several if the packet size is very small, such as ATM; this would be decided by looking at the hardware at configtime or runtime on the server, not a big deal), then considering which client to serve next. Of course it helps massively if you can collect intelligence on what type of connection the client has, i.e. don't try to send more than a few kilobits to modem users, etc etc.

    It's bad in the first place if there has to be a VM choosing which code to generate (slow slow slow!). I'm only talking about the "real-code" (i.e. traditional, compiled webservers and other multiconnection servers) case here. I don't believe one has the right to complain about performance at all if one chooses to use Java, so even though "green-threads" style threads might be appropriate and effective for Java, they're not very useful for a real application.
  • by aheitner ( 3273 ) on Wednesday January 19, 2000 @06:09AM (#1358268)
    I've got a fundamental disconnect here ...

    Okay, the Linux scheduler is slower than it could be. It is taking up "up to 20% of CPU cycles" in the very process-intensive (given that native threads are no lighter weight than processes) benchmark, 400-2000 processes.

    But there's a more fundamental problem: a 20% speedup isn't significant. I'm not saying we should abandon all speedups that don't affect asymptotic complexity; I'm just saying that I'm looking for speedups of at least 2x-10x before I'm impressed with anything. 20% is small stuff.

    There's a bigger issue here: this many processes will never be fast. The cost of a context switch is high given current processor designs, and is not likely to get lower. Even assuming that on a thread switch, since you're dealing with the same data as the previous thread was using, the TLB and code/data caches remain useful (on a process switch in general they don't, and refilling the caches is very expensive), you still have to store a whole bunch of stuff to memory for the old thread and bring a whole bunch of stuff out of memory for the new thread. And you've got to leave userland for a bit to do that. Slow slow slow slow.

    It seems to me that in general we need to reconsider the approach of relying on the operating system to schedule and share resources (in the case of chatservers and ftpservers and especially webservers, where we see the real performance hits for massive thread/process expenses). Right now all this stuff is based on the Berkeley sockets API, a high level network API (i.e. one that doesn't at all consider what the transport will be). This has been a tremendously successful API; it's used on all platforms (well I can't speak for sure for Mac :) and it can be reasonably argued that Berkeley sockets paved the way for the Internet.

    But the fact remains that your ethernet card is fundamentally a serial device. I have to wonder if it wouldn't be possible to write a webserver which does know about the transport for a change, and which could in only one process sit there putting packets onto the wire at a level much closer to the hardware, and therefore save a lot of expense in making the operating system arbitrate all these zillions of threads that want to share the connection.

    It would be an interesting project to say the least.
  • 1) If you are running one heck of a lot of processes/threads you would expect the time spent in the scheduler to be big.

    Yeah, but this is the optimisation phase. As Linux takes on heavily threaded tasks it needs to start squeezing out all the performance it can; especially when the code is given to us by a corp.

    2) {I am not a hacker but} If they are at the level of seeing improvements in the scheduler by tweaking things like structure layout to improve cacheline locality, then can we be sure that the "low performance impact" IBM kernel trace patch is not having an effect?

    This wasn't a Linux performance comparison against other OSes, where the trace would be a factor. Instead this was a Linux vs. Linux comparison. The trace is just additional system load that would be equivalent on both systems.

    If you move to a many-many scheduling model you *will* reduce the time spent in the kernel scheduler. However, you *will* spend time in your user-land scheduler. Which is the win?

    IBM said they didn't know for sure. That's the point. They did the performance groundwork needed to form useful hypotheses. Until the improvements are made and tested we won't know for sure. However, now there is a test suite that can be used to test any new modifications. And one provided by a corp.

    This is sooo neat.

  • Ignore the Java! Java wasn't the point! It was an application to test with.


    To make you feel better, pretend they used freeware Threadmaster5000 software, a giant program that uses thousands of threads to do something groovy. The Threadmaster team decides to evaluate the bottlenecks their opensource program runs into on Linux.

    Oooh, looky! The scheduler has problems! But wait! They wrote a patch to the scheduler and performance went up 7%! Oh, aren't the Threadmaster people so nice to the open source community for working on Linux instead of just optimizing their code!

    Now re-read that replacing "Threadmaster5000" with IBM Java+Volano, Threadcount with Volanomarks, and Threadmaster team with IBM.

    Since we can take Java out of the picture and replace it with something else, it wasn't the point. The point is IBM identified a flaw in the scheduler and proposed a solution complete with code.
  • by jbert ( 5149 ) on Wednesday January 19, 2000 @06:03AM (#1358271)
    Whilst it seems that these people are nice and thorough, a couple of points:

    1) If you are running one heck of a lot of processes/threads (same thing on Linux) you would expect the time spent in the scheduler to be big.
    That is unavoidable overhead of *all* thread models. (You can try and reduce it - that's good...but run enough threads and it will dominate).

    2) {I am not a hacker but} If they are at the level of seeing improvements in the scheduler by tweaking things like structure layout to improve cacheline locality, then can we be sure that the "low performance impact" IBM kernel trace patch is not having an effect? What was the throughput (i.e. the main benchmark measurement) like on a stock kernel?

    3) If you move to a many-many scheduling model you *will* reduce the time spent in the kernel scheduler. However, you *will* spend time in your user-land scheduler. Which is the win?


    I don't mean to suggest that these people don't have some good points (I hope that they develop patches and I hope that the best patch wins), but it is important not to jump to conclusions.

    PS - I only skimmed the article, so I may have got the wrong end of the stick. I'm sure someone will put me right if so :-)
  • Interesting story, but I really doubt it's true...
    true or not, Ada's merits can stand on their own.

    Been a while since I looked at Ada... anyone got a link to something current?
  • very minor question here, all respect intended:

    >Or maybe one of the network card manufacturers. If they designed a special NIC with the IP stack implemented in hardware
    >or firmware, then the OS kernel wouldn't need any modification other than a simplified device driver module. I think that
    >would eliminate the bottleneck you're referring to.

    Isn't this what a winmodem is? Didn't they offer a negligible performance increase?

    I bet I'm thinking something totally stupid, so please tell me what I'm missing..

  • There are some problems naturally suited to a threading model, and some naturally suited to an event-driven model, and a language or OS that forces you to use one or the other for all problems is broken.

    Ousterhout is right, however, that threading is hard in the case that there is a lot of shared state between threads -- lots of locking issues to worry about.

    On the other hand, anyone who's tried to work on a network server based around an event-driven select() model knows what a pain in the ass that can be, too.
  • In essence:

    Here's what's wrong with Linux while running Java.
    Here's a patch to fix it.
    Thank you and good night.

    Nice to see big companies being so supportive. Who would have thought this would happen a year ago?

    "Sir, I'd stake my reputation on it."
    "Kryten, you haven't got a reputation."
  • I've found George Lebl's Making application programming easy with GNOME libraries [ibm.com] articles on IBM's site to be a really good introduction to Gnome programming. Maybe old hat for some, but as a beginning Gnome hacker, it was very helpful for me. Good info on Glib, and the 3rd article has a great example of using libXML to handle XML data files....

    I think it's awesome that IBM hosts this information. Kudos to whoever made that decision at Big Blue!
    ----
  • -- A 1996 USENIX presentation [scriptics.com] by John Ousterhout [scriptics.com].

    The key thing to remember out of the IBM article:

    Threads are an essential part of programming in the Java language. Java threads are useful for many purposes, but because the Java language lacks an interface for non-blocking I/O (see the Resources section later in this article), threads are especially necessary in constructing communications intensive applications in Java. Typically one or more Java threads are constructed for each communications stream created by the Java program.

    So what Java requires people to do in thousands of threads, I can do in one thread per processor with event-driven non-blocking I/O. Hmmm... Sounds like Java is broken to me.

    davez
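
    For the curious, a sketch of that event-driven model: a single thread multiplexing every connection through one select() loop (hypothetical echo server; error handling omitted for brevity).

```c
/* Sketch: one thread, one select() loop, many clients. */
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(lsock, (struct sockaddr *)&addr, sizeof addr);
    listen(lsock, 128);

    fd_set all;
    FD_ZERO(&all);
    FD_SET(lsock, &all);
    int maxfd = lsock;

    for (;;) {
        fd_set ready = all;            /* select() overwrites its argument */
        select(maxfd + 1, &ready, NULL, NULL, NULL);
        for (int fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &ready))
                continue;
            if (fd == lsock) {         /* new connection: just track the fd */
                int c = accept(lsock, NULL, NULL);
                FD_SET(c, &all);
                if (c > maxfd)
                    maxfd = c;
            } else {                   /* data from a client: echo it back */
                char buf[4096];
                ssize_t n = read(fd, buf, sizeof buf);
                if (n <= 0) {
                    close(fd);
                    FD_CLR(fd, &all);
                } else {
                    write(fd, buf, n);
                }
            }
        }
    }
}
```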

  • Yes, I would implement a client-server system with only one thread per processor. Keep in mind that each thread would be doing event driven I/O for potentially hundreds or thousands of connections.

    The fact that the IBM guys are suggesting changes to Linux to make up for deficiencies in the Java language seems extremely absurd to me.

    davez
  • Ousterhout didn't "turn around" his opinion on threads. He still believes that threads have their place in programming. He was simply trying to point out that threads aren't the panacea for all your programming ills, despite what industry hype would seem to suggest.
  • by tuffy ( 10202 ) on Wednesday January 19, 2000 @05:27AM (#1358280) Homepage Journal
    I hope this patch, or something equivalent hacked out between IBM and Linus/Alan/etc., will make it into the 2.3 tree prior to 2.4. IBM looks to be continuing their great Java and Linux software development, much to everyone's benefit.

    And don't forget that little feedback thing at the bottom. Let IBM know these are the sorts of things we like to hear!

  • First off, remember that IBM's JDK includes one of the fastest (if not the fastest) JIT compilers available for Java -- on any platform. It's really impressive.

    Secondly, I wanted to point out that TowerJ is not the only "static" Java compiler for Linux. Another very good system is GCJ [cygnus.com], a Java front-end for GCC, which is entirely open source, based on EGCS, and has pretty impressive performance!

    Matt Welsh
  • I sniff a troll, but here goes anyway:

    1. Due to design misfeatures, all objects must carry tags (to be checked for run-time type errors in many cases) and frequently be checked against 'null'. This situation can't be helped without sacrificing the safety of the language.

    So, you don't like pointers. Go and write in Visual Basic, then.

    2. Java is not a "free" language.

    In what sense? It's well specified (except for the threading, which will vary from implementation to implementation anyway), and the specs are free for everyone to download (unlike "pay us big bucks" ISO-specs for C++). Remember, AT&T made C++ - does that mean C++ isn't "free" either?

    3. Java is based on historic (obsolescent) programming principles. (i.e. C++)

    Java has learned from C++'s mistakes. Do you insist people write languages from scratch?

    4. Java implementations still frequently are unsafe (allow malicious code to run).

    Ah, the oldest FUD about Java. Well, I notice you don't document this.

    I think you are a fan of some obscure academic language nobody uses, and think that you can drum up support for it by attacking other languages. Sorry, things don't work that way.

  • [Fud deleted]

    Java is not interpreted, it's (usually) compiled (to bytecode - just like Perl and Python). The target platform just happens to be a virtual machine sitting on top of another system instead of being hardware. But why should one spend extra time writing in "C in wolf's clothing" when a more suitable language is there, just because you want a millisecond faster response time for a mouse click? Also: the JIT mechanism makes it possible to have efficient runtime compilation, where the "second-step" compiler can make adjustments based on actual runtime behaviour, which is impossible in C++ because it's compiled in one place and run somewhere else.

    And C++ is such a mess of syntax that "efficient" and "C++" should not occur in the same sentence.

  • a 20% speedup isn't significant. I'm not saying we should abandon all speedups that don't affect asymptotic complexity; I'm just saying that I'm looking for speedups of at least 2x-10x before I'm impressed with anything. 20% is small stuff.

    I think you're being unrealistic about this. This is 20% across the board for modern server apps after all, and 20% is a pretty good improvement for a single kernel tweak. You have to consider the cumulative effect of a number of such tweaks. I have substantial difficulty even imagining a single tweak that could double performance let alone multiply it by ten times!

    The cost of a context switch is high given current processor designs, and is not likely to get lower. Even assuming that on a thread switch, since you're dealing with the same data as the previous thread was using, the TLB and code/data caches remain useful (on a process switch in general they don't, and refilling the caches is very expensive), you still have to store a whole bunch of stuff to memory for the old thread and bring a whole bunch of stuff out of memory for the new thread. And you've got to leave userland for a bit to do that. Slow slow slow slow.

    As I understand it, threads on Linux are implemented via lightweight processes, using the clone() syscall which basically does a fork without copying the majority of the execution environment. i.e. it runs within the original environment and only copies those bits which have to be unique to each thread. The overhead isn't nearly what it is to do a traditional process fork.

    I'm pretty sure threads are here to stay. They do make it a lot easier to design real time applications and scalable server applications.

    It seems to me that in general we need to reconsider the approach of relying on the operating system to schedule and share resources (in the case of chatservers and ftpservers and especially webservers, where we see the real performance hits for massive thread/process expenses). Right now all this stuff is based on the Berkeley sockets API, a high level network API (i.e. one that doesn't at all consider what the transport will be).

    Sockets do help to keep interfaces simple and standardised. Doesn't the modern Unix "streams" design (which IIRC standard Linux doesn't support yet) rely on something similar? Admittedly it's not very fast compared to other IPC methods available on a single machine.

    But the fact remains that your ethernet card is fundamentally a serial device. I have to wonder if it wouldn't be possible to write a webserver which does know about the transport for a change, and which could in only one process sit there putting packets onto the wire at a level much closer to the hardware, and therefore save a lot of expense in making the operating system arbitrate all these zillions of threads that want to share the connection.

    There are enough wacky people out there looking for something unique to do, that somebody will no doubt have a go. But I'm certain it will enjoy only limited popularity. It's just not the Unix Way, and the Unix Way is a pretty important reason for the success of Unix and Unix-alikes.

    Apart from anything else, it ties the application to a particular hardware configuration, effectively making the server it runs on into a proprietary piece of kit. For that reason, it seems to me that the most likely people to try it would be a hardware firm, maybe a server vendor like IBM.

    Or maybe one of the network card manufacturers. If they designed a special NIC with the IP stack implemented in hardware or firmware, then the OS kernel wouldn't need any modification other than a simplified device driver module. I think that would eliminate the bottleneck you're referring to.

    Consciousness is not what it thinks it is
    Thought exists only as an abstraction
  • I'm not a hardcore kernel hacker but I do run IBM's JDK118 on Linux and I can see the benefit of increasing the file descriptors and tasks per user in the kernel beyond their defaults.

    Can anyone think of any reason why I shouldn't change these values? Would it affect performance elsewhere?

    --hunter
  • because the Java language lacks an interface for non-blocking I/O threads are especially necessary in constructing communications intensive applications in Java

    Wouldn't it be easier to add an interface for non-blocking I/O to Java??? Err - isn't it supposed to be an inherently multitasking language? Sheesh. This is an excellent example of what's wrong with Sun's dog-in-the-manger attitude to the Java spec. "We already defined that, it can't be changed, fix it some other way! No, we're too busy defining new multimedia APIs to take a look at it"
  • I really can't think of enough good things to say about IBM's open-source efforts. They're spending real money, and throwing good people at it. IBM has always known what quality control is about, and they're bringing that to the party. They're putting in whole products like Jikes and (more importantly from my point of view) the Jikes parser generator. They're doing it without taking a holier-than-thou attitude, and without "cute" licenses like the SCSL. It's really hard to believe the old T.Rex of yore has turned cuddly.
  • Even assuming that on a thread switch, since you're dealing with the same data as the previous thread was using, the TLB and code/data caches remain useful (on a process switch in general they don't, and refilling the caches is very expensive), you still have to store a whole bunch of stuff to memory for the old thread and bring a whole bunch of stuff out of memory for the new thread. And you've got to leave userland for a bit to do that. Slow slow slow slow.

    Why? I don't see that. To switch threads in user space you push all the registers onto the stack, save the stack pointer, load the stack pointer of the next thread and pop its registers from the stack. It's just a few cycles, maybe 20 or so. If your thread is using floating point you've got more work to do - you have to save the FP context if another task was using it and load the FP context of the new thread. This isn't expensive because it doesn't happen often. (You do need to do some fancy dancing to detect automatically which threads are using FP and which aren't) You don't have to enter the kernel anywhere in this process, and that's a huge win. To make the user space task switches happen you sprinkle calls to a yield() function throughout the code, and at I/O points. This just doesn't eat much CPU time, compared to the enormous, odious cost of crossing into the kernel, killing the cache at a cost of several hundred cycles. And again when you cross back out. For Java, this approach is ideal because the VM has complete control of the code that gets executed - at load time it can insert the required yields as it sees fit.
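
    A sketch of that user-space switch, using glibc's ucontext calls as a stand-in for the hand-rolled register save the post describes; the yield is an explicit swapcontext() and the kernel is never entered for the switch itself.

```c
/* Sketch: two cooperative user-space "threads" alternating via swapcontext(). */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t ctx_main, ctx_a, ctx_b;
static char stack_a[64 * 1024], stack_b[64 * 1024];

static void thread_a(void)
{
    for (int i = 0; i < 3; i++) {
        printf("A runs\n");
        swapcontext(&ctx_a, &ctx_b);   /* save A's registers, restore B's */
    }
}

static void thread_b(void)
{
    for (int i = 0; i < 3; i++) {
        printf("B runs\n");
        swapcontext(&ctx_b, &ctx_a);   /* the explicit yield() of the post */
    }
}

int main(void)
{
    getcontext(&ctx_a);
    ctx_a.uc_stack.ss_sp = stack_a;
    ctx_a.uc_stack.ss_size = sizeof stack_a;
    ctx_a.uc_link = &ctx_main;         /* return here when thread_a finishes */
    makecontext(&ctx_a, thread_a, 0);

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp = stack_b;
    ctx_b.uc_stack.ss_size = sizeof stack_b;
    ctx_b.uc_link = &ctx_main;
    makecontext(&ctx_b, thread_b, 0);

    swapcontext(&ctx_main, &ctx_a);    /* prints A/B alternating, six lines */
    return 0;
}
```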
  • recall that the only way to efficiently use non-blocking I/O is to select the file descriptors to wait on. This is tricky enough that making threading easier was probably the right thing to do.

    Providing an efficient, easy means of handling non-blocking I/O is no more difficult than handling any other kind of asynchronous event, such as a mouse click. The user program can register a handler, and each time an I/O is done the VM calls the handler. I don't see that as being hard, either in implementation or usage.
  • "To do it right this should really be a separate non-profit, but it could start out as an internal project at some large company."

    The greatest thing would be if IBM, Intel, Redhat, Suse, Caldera, VA Linux etc. could get together, create and sponsor such a thing - it would be very nice, and everyone would benefit from it.
  • Actually that is the complete _opposite_ of the Winmodem which moves the hardware functionality into a large driver.

    There are actually NICs with TCP/IP implemented in hardware, but the only drivers are for WinNT. I don't know that you would even want such a beast for Linux.
  • Now that's funny!!
  • Get on it.
    That's what the source is there for.
  • The issue is that there is actually very little difference in the Linux kernel between threads and processes. They're all lumped into the same scheduler, and that's what creates the scalability problem. Other OSes, like Solaris, schedule threads within their respective processes.
    --JRZ
  • by JohnZed ( 20191 ) on Wednesday January 19, 2000 @03:19PM (#1358295)
    Curiously enough, two years later Ousterhout turned around and touted TCL's threading features as a major advantage that it enjoys over Perl.
    I've programmed a fair amount with both threads in Java and non-blocking I/O in C, and the one-thread-per-connection model is VASTLY easier to program, maintain, and use. Non-blocking I/O leads to code that's extremely non-linear, and much more confusing, than multithreaded implementations. It's like having to work with code that uses a million goto's; you never know where you'll be executing next. Threading, on the other hand, achieves the same benefits, but it lets the programmer work at a higher level of abstraction.
    Are C++ and Java broken because they use, for example, object-oriented representations of streams rather than a series of calls to "write" on a file descriptor? Well, this difference does cause a performance impact. But if you can get your product to market twice as quickly by using technologies that extract a 15% performance hit, isn't that worth the difference? As operating systems improve more and more to cooperate with sophisticated threading models, the performance hit for using them will continue to decline.
    Rather than sticking our heads in the sand and saying, "Well, there's another, more confusing, less modern way to do it that doesn't require us to change the way we've done things for years," let's actually try to find ways to make programming easier AND produce a high-performance result.
    --JRZ
  • by JohnZed ( 20191 ) on Wednesday January 19, 2000 @06:47AM (#1358296)

    Interestingly enough, a heated thread on a related topic cropped up in the kernel-dev mailing list the other week. Check out Kernel Traffic [linuxcare.com] for the details, but basically it had to do with some SGI engineers who wanted to make a change in a threading mechanism to facilitate 3D graphics performance on Linux. Linus explained that he felt their method was, basically, an unmaintainable, inelegant hack that has crept its way into Irix for marketing purposes but will never be in the Linux kernel.

    The relevant thing in relation to the IBM article is Linus' discussion of the philosophy of fork() and how strongly committed he is to this model. He's stated quite often, in fact, that this thread scheduling mechanism (which schedules threads as separate processes) is a very intentional part of the kernel design.

    Personally, I think this opinion will pretty much have to change over time when people are able to demonstrate very elegant patches for the many-to-many threading model discussed in the IBM article. In fact, if I remember correctly, this is the sort of threading model that TowerJ uses in their native Java compilation system to achieve such great scalability on Linux. You can find plenty of examples of in-process scheduling code if you're interested in checking it out: GNU portable threads [gnu.org] is the first one that comes to mind, but almost every Java implementation offers this model as an option (green threads). The method IBM is talking about combines this in-process tactic with the current, inter-process scheduler.

    It just makes sense that if you have 10,000 processes in a queue and you have to recompute goodness for each every time you enter the scheduler, this will be a less scalable approach than if you'd created 100 processes with 100 threads each, so that thread_goodness only needs to be computed when that particular process is entered. Think about the management of a large corporation: does the top management allocate resources, set timetables, and otherwise schedule every single employee? No, they schedule a number of departments and projects, then the next level of managers schedules each of the employees within those.

    So far, I think this has been much less of an issue not just because Linux hasn't been focused on the enterprise space (where scalability to tens of thousands of threads is crucial), but more because the key server-side applications on Linux (Apache, etc.) have been multi-process rather than multithreaded. Now, with the increase in multithreaded apps from Java (say what you will about the language, it makes threading MUCH easier than C) and, for example, the new Apache process models, we'll start to see serious real-world performance benefits for those OSes that have the best thread scalability. Linus, being the bright guy he is, will surely pick up on this and make whatever changes are necessary. At least, that's the way I see it working out. --JRZ
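
    A sketch of the arithmetic behind that two-level pick, with hypothetical structures: 100 processes of 100 threads each means roughly 200 goodness comparisons per scheduling decision instead of 10,000.

```c
/* Sketch: two-level ("many-to-many") selection over hypothetical structures. */
#define NPROC   100
#define NTHREAD 100

struct thread { int goodness; };
struct proc   { int goodness; struct thread threads[NTHREAD]; };

struct thread *pick_next(struct proc procs[NPROC])
{
    /* Level 1: pick the best process -- NPROC comparisons. */
    struct proc *bp = &procs[0];
    for (int i = 1; i < NPROC; i++)
        if (procs[i].goodness > bp->goodness)
            bp = &procs[i];

    /* Level 2: pick the best thread inside it -- NTHREAD comparisons. */
    struct thread *bt = &bp->threads[0];
    for (int j = 1; j < NTHREAD; j++)
        if (bp->threads[j].goodness > bt->goodness)
            bt = &bp->threads[j];

    return bt;    /* ~200 comparisons, versus 10,000 in one flat queue */
}
```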

  • I agree with almost everything you say... but have one minor nit to pick.

    An individual ethernet card is fundamentally a serial device, as you say. But don't many large servers have several network cards? I know the big Mindcraft server benchmarks used quad-CPU machines with quad ethernet cards.

    If you made a webserver that knew about transport, it would have to know about dealing with multiple ethernet cards, right? And what if it's not ethernet, but FDDI or something exotic? Perhaps this would be more trouble than it's worth? Especially if the gains were small - as you point out, 20% improvements aren't that big a deal.

    On the other hand, isn't there a web server-in-a-kernel module designed for pure speed? Maybe it knows about transport, or could be extended that way...

    Someone who knows more than me can step in now...
  • As always, it depends on what you're doing. I'm still a Java evangelist to the nth degree, but I wouldn't think about writing a standalone GUI app in it yet.

    Nevertheless, in certain contexts, Java is the fastest thing around. In well thought out application frameworks (e.g. servlets) Java gives you better performance than you can get with C or C++ unless you're willing to invest years of work to duplicate those frameworks yourself.

    On new architectures, and as SMP becomes ubiquitous, Java's pervasive thread awareness and runtime optimization have enormous potential that is only beginning to be tapped.

    As for those areas where it's just not there yet (client GUI apps, for example) you might look into Eiffel. It's a strongly OO language with a clean and simple syntax, garbage collection and compilers that target both native code directly and portable C code that can be run through a mature optimizing C compiler.

  • AFAIK, Oracle8i's approach is to abandon the idea of running a database application on top of a "general purpose operating system" since, typically, a database server only runs one application -- the database itself. So, Oracle8i can be thought of as a "specific purpose OS" optimized for database queries with a database app on top of it. Much faster.

    Can someone who knows something about the GNU Hurd kernel comment on how it manages network connections? I remember reading something about how just about everything can be managed by a user-level process in Hurd -- does this include the NIC?


    -NooM
  • by noom ( 22944 ) on Wednesday January 19, 2000 @05:49AM (#1358300)
    There are compilers available for Linux (TowerJ being one) but their primary benefit is for server-side code; it'd be much more difficult (but not entirely impossible -- proof-carrying code would work) to ensure safety if you distribute binaries to clients. Indeed, the whole point of using platform-independent byte-codes is so that the JVM can ensure safety. Platform-specific machine code running on a server will probably coexist with platform-independent Java byte-codes for client applications.

    -NooM
  • What if Larry Wall had called his language "BeeGeesAirSupply"? Would you want to use it?
    ---
    This comment powered by Mozilla!
  • by FascDot Killed My Pr ( 24021 ) on Wednesday January 19, 2000 @05:36AM (#1358303)
    This article gave me a hard-on.

    It's not so much about Java. It's mostly about threading under Linux. The meat of the article is about how to improve the scheduler.

    But the BEST part was the scientific attitude AND clear explanation (and proof) of the issues. This is EXACTLY what Linux needs. Maybe IBM would like to fund an idea I've had for a while:

    Set up a lab that does nothing but Linux benchmarking. This lab would research things like the scheduler issue from this article, memory access patterns, filesystem layout, etc. All of this research would be available to the public for kernel development, third-party developers, benchmarketing (and rebuttals thereof), etc. The lab could also provide patches to "fix" issues, but that would be of secondary concern. The main purpose would be to supplement the (usually excellent) intuition of the kernel programmers with some hard science.

    To do it right this should really be a separate non-profit, but it could start out as an internal project at some large company.
    ---
    This comment powered by Mozilla!

  • It would be interesting to see how a Crusoe chip would perform with a Java bytecode instruction set.

    Presumably it could morph Java bytecode as fast as it morphs the x86 instruction set, thus giving Java applications the performance of compiled code?

    Can someone more knowledgeable than me comment on this???
  • Be patient: gcj, which is part of gcc nowadays, does that - if it gets a proper libgcj, that is (libgcj = classes and stuff). I wouldn't expect anything from Sun though...


    "Now if just someone could get on the stick and create a Java-like language that compiles directly to run on bare metal. Meanwhile, I'm painfully relearning C and C++ to get the kind of performance I need out of my applications. "
  • It has evolved and works great!

    In the near future (about half the time it will take Sun to release a buggy and outdated JDK for Linux ;) we'll be able to run Java in a totally free runtime, either compiled as bytecode or as native code.
    Have a look at these and see for yourselves:

    http://www.gnu.org/software/classpath/classpath.html

    http://www.japhar.org/

    http://www.transvirtual.com/kaffe.html

    http://sourceware.cygnus.com/java/
  • The performance decrease introduced by a VM can be important for some programs (many GUI style apps will *not* suffer from it). Then you can either use a Source-to-native or a byte-code-to-native compiler. There are commercial and free products available (this list [geocities.com] includes some).

    Another workaround is the JNI, which lets you include non-Java code (e.g. C). It is used for the ZIP I/O, as an example.

    There are lots of numbers out there comparing C++ to Java performance, where C++ has a mere 10% advantage. I'm not sure if this is always true, but with Sun's 1.3 beta and HotSpot or IBM's VM, you do get pretty decent performance.
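
    A sketch of the JNI route mentioned above, with a hypothetical Java class Hot declaring "public native int sum(int[] a);" and loading the library via System.loadLibrary("hot"); the hot loop then runs as plain C.

```c
/* Sketch: C implementation of a native method for the hypothetical class Hot. */
#include <jni.h>

JNIEXPORT jint JNICALL
Java_Hot_sum(JNIEnv *env, jobject self, jintArray arr)
{
    jsize n = (*env)->GetArrayLength(env, arr);
    jint *a = (*env)->GetIntArrayElements(env, arr, NULL);

    jint total = 0;
    for (jsize i = 0; i < n; i++)      /* the performance-critical loop */
        total += a[i];

    /* JNI_ABORT: we did not modify the array, so skip the copy-back. */
    (*env)->ReleaseIntArrayElements(env, arr, a, JNI_ABORT);
    return total;
}
```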
  • >Because the Java API does not include non-blocking read and write APIs, VolanoMark is implemented by using a pair of Java threads on each end (client and server) of each connection

    Obviously broken.

    Overload your system with an absurd and absolutely useless number of threads and then complain about the time the kernel spends in the scheduler. Idiotic...

    Sun really should add this trivial non-blocking functionality. And bring Java out of the toy stage.
  • >If X = 100, that makes sense, but if X = 10,000 it doesn't. Or would you dynamically spawn more processes to handle greater and greater loads?

    Not dynamically. That's not needed.

    Just use a fixed number of processes - let's say 20 - that each handle five percent of the maximum load/number of connections.

    No dynamic scaling necessary. The system doesn't need to be efficient in the low-load case. Only the efficiency for the maximum-load case is of interest.
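
    A sketch of that fixed pool: fork the workers up front and let them all block in accept() on one shared listening socket, so the kernel hands each connection to an idle worker (hypothetical sizes, error handling omitted).

```c
/* Sketch: a fixed pool of pre-forked workers sharing one listening socket. */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 20

int main(void)
{
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(lsock, (struct sockaddr *)&addr, sizeof addr);
    listen(lsock, 128);

    for (int i = 0; i < NWORKERS; i++) {
        if (fork() == 0) {                     /* one worker, fixed for life */
            for (;;) {
                int conn = accept(lsock, NULL, NULL);
                if (conn < 0)
                    continue;
                const char msg[] = "hello from a pre-forked worker\n";
                write(conn, msg, sizeof msg - 1);
                close(conn);
            }
        }
    }
    for (;;)
        wait(NULL);                            /* parent only reaps children */
}
```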
  • What I find most amazing about this article has nothing to do with the Linux scheduler nor Java.

    The amazing part is that in the process of porting a product to Linux, IBM has taken the time to formally look into how to make it faster. This isn't earth-shattering, but what they did after that is: they presented an open solution and, on top of that, a patch!

    It would have been much easier for them to have simply complained that the scheduler was too slow and possibly not port Java (or quit dev.) until the "Linux community" fixed it. IBM's approach shows that they (or at least those that wrote this paper) consider themselves part of the "Linux community" and are willing to work within it.

  • A language like the one you are looking for actually exists, and it is Objective-C. The object model in Java is actually a (for security reasons, I believe) dumbed-down version of ObjC, which in turn is more or less Smalltalk in disguise (i.e. a really nice OO environment).

    It's been a while since I touched Smalltalk, but as I recall, you can send any message to any object, and it's only trapped as an error if there's a mismatch at runtime; Java is much more strongly typed than Smalltalk. I thought Objective C was similar to Smalltalk in that regard, meaning ObjC isn't all that much like Java.

  • This really is one of the goals of the Linux Technology Center at IBM. It was founded about mid 1999, and intends to do the same thing for Linux that the Java Technology Center does for Java.

    I think it's great that we are seeing this kind of stuff so soon.

  • It's already included. Check the linux-kernel mailing list and Alan's reply. BTW, many-to-many is not going in - userland threading is bad bad bad.
  • You can get up to 20 fps at 900 x 642 pixels, 24bpp, from Java on a PII-333 or higher. Not great - but it should be good enough.
  • www.gjt.org for some really nice classes too.
  • Sure VMs will always be slow. But that is no reason to abandon platform independent binaries. Take a look at the research on Slim Binaries for Oberon: http://caesar.ics.uci.edu/oberon/research.html

    "This paper presents an alternative approach based on "slim binaries", files that contain a target-machine-independent program representation from which native code is generated on-the-fly at load-time. The slim binaries used in our implementation are based on adaptive compression of syntax trees, and not on a virtual-machine representation such as p-code or Java byte-codes. They are highly compact and can be read from a storage medium very efficiently, significantly reducing the I/O cost of loading. The time thus saved is then spent on code generation, making the implemented system fast enough to compete with traditional loaders."

    Many languages (e.g. Perl, Smalltalk, elisp, VB, Java) rely on something VM-like at an underlying level. I think a lot of brain cycles are being wasted on reinventing VMs, freeze/thaw mechanisms, portable binaries, etc. Part of the problem is that the VMs are too closely tied to the languages. It would be a Good Thing if there were a project looking at separating out the pieces for on-the-fly-compiled and interpreted languages, much as EGCS does for compiled languages.

    For such a system to catch on it would have to be a clear winner over existing VM systems. The Oberon Slim Binaries are just such a winning technology. With perl already being rewritten from scratch as Topaz, and pretenders banging at the door for elisp (CLOS and guile versions of emacs are in the works), is this such an impossible dream?

    (yeah yeah I know it is. don't mention 'eval'...)

    -Baz, living on another planet, as usual.

  • Sun is killing Java by holding it so tight. How can we let them know this - or do they know it already and just not care?
  • See this [idiom.com] for a couple of small but real benchmarks on the IBM java; source included. Java speeds have improved considerably and are continuing to improve. Someday soon they'll be at the point where it's not an issue (e.g. 1.3x slower than C). Note that on one of the benchmarks the IBM jdk is *faster* than g++ -O. Huh? The processor was a celeron; I assume that the IBM jdk used PII instructions whereas g++ -O needs an additional flag. But still...
  • ... or maybe not that interesting.

    The interesting bit seems to be reorganizing some of the task structure so it's cache-friendly in one stress case. OK, that's healthy. Applause; it'll create some breathing room in systems overloaded in that way.

    But that doesn't really look at the hard issues. There was handwaving about multi-level thread models; good thing there was a ref to Solaris threads there, which has had that for over six years now. That "many to many" threading stuff is also called "two level scheduling", which can be nice (cheap to switch between user mode threads, like Green threads) but isn't always good (excess kernel interactions on thread wakeups).

    If this is the start of ongoing work at IBM, I'll be pleased. Not many Linux folk have access to the sort of measurement tech IBM applied here. But don't overrate this specific contribution; take the cache-line patch (assuming it doesn't slow anything else up) and move to the next problem.

    Volano, for all the press it gets, isn't a very good benchmark.

  • because the Java language lacks an interface for non-blocking I/O threads are especially necessary in constructing communications intensive applications in Java

    Wouldn't it be easier to add an interface for non-blocking I/O to Java??? Err - isn't it supposed to be an inherently multitasking language? Sheesh. This is an excellent example of what's wrong with Sun's dog-in-the-manger attitude to the Java spec. "We already defined that, it can't be changed, fix it some other way! No, we're too busy defining new multimedia APIs to take a look at it"

    I can't say for certain, but I suspect non-blocking I/O isn't available because operating systems that aren't unices don't necessarily support it. Plus, recall that the only way to efficiently use non-blocking I/O is to select the file descriptors to wait on. This is tricky enough that making threading easier was probably the right thing to do.

    That said, the fact that Java makes threading syntax, at least, easy is a bit scary. Java makes it easy to spin up multiple threads, but the underlying problems (deadlock, starvation, etc.) with multithreading still exist. Simply throwing synchronized here and there isn't magic, and will substantially slow a program when used unwisely.

    Using threading safely means isolating the portions of code that have to be multithreaded to the bare minimum - i.e., at points where message passing occurs. And yet the "it's easy, just spin up a thread!" mentality is found in many books/tutorials, etc. Scary.
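
    The same advice translated into C and pthreads (a sketch with a hypothetical queue, since the post itself is about Java's synchronized): hold the lock only around the shared-state touch, the message-passing point, not the whole unit of work.

```c
/* Sketch: keep the critical section to the bare minimum. */
#include <pthread.h>

static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static int queue[128];
static int qlen = 0;

void produce(int item)
{
    /* Long, thread-private computation of `item` happens OUTSIDE the lock. */

    pthread_mutex_lock(&qlock);        /* critical section: just the enqueue */
    if (qlen < 128)
        queue[qlen++] = item;
    pthread_mutex_unlock(&qlock);
}
```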

  • Granted. And it's certainly a paradigm common elsewhere in Java. I guess I had my head wrapped around UNIX-style asynch I/O while writing.

    There are still a great many thorny issues, though - what thread does your "listener" receive data "events" on, for example? The AWT event thread is the only one that would make any sense, as it's the only one in which it is safe to update UI (for Swing, at least). With a mouse click, all the data relevant to the click can be delivered at once. But with I/O, if data isn't delivered in a chunk size big enough to make UI changes with, you'd have to batch it somehow. Why not batch it in a separate thread? But now we're back where we started.

    Life would be much more interesting for people implementing their own filter streams - underlying streams would have to pass data up into the filter, to determine if data was available to pass out of the filter. If you're chaining filters, when does the data cross the thread boundary? (As it will need to if being delivered in the event thread.) The current model makes writing your own I/O manipulation simple, and I think there's a lot of value in that.

    I/O is inherently complex enough that trying to hide the complexity behind an event model is probably counterproductive.

    That's not to say I haven't wished I could addSocketListener() from time to time. I just don't think it'd be less painful, ultimately.

  • A quick addition to my reply - just wanted to acknowledge that a good bit of what I just wrote is quite client-centric (I'm working on a client right now...or should be...:). I still think that the simplicity of blocking I/O outweighs marginal performance gains, especially if the JVM has to jump through hoops to simulate asynchronous I/O.
  • I took a Java programming class last year, and the teacher told us that IBM had the fastest JVM. (Both on Windows and Linux platforms.)

    Since my professional future is going to be linked to Java for the next few years (my firm just started up an authorized Java center), I like that.

    For now, the only reason I need the Sun JVM is that the IBM implementation is still at Java 1.1.

  • Well, I can't agree with you there norm.

    The thing is that a 2x or 10x speedup would be great, on the Java side. But Java is kinda known to have all sorts of inefficiencies.

    However, 20% on a [presumably] well-tuned piece of code like the scheduler is fantastic. This benefits not only Java, but all thread-intensive processes.

    Harking back to the paper: I believe that a one-to-one model is the way to go, but with a policy that favours threads that already have their memory space "loaded". Then you don't take the process-switch penalty to the caches and TLBs (being careful not to starve any process, of course).

    I've only had time to skim the paper, so maybe this is what they've done.
  • Talking out of ass mode activated. No lighters please.

    One issue which hasn't been raised (here, that is) is how much processor time an application gets. In a many-to-one implementation, it didn't matter how many threads an application spawned, since the application is the unit of scheduling. How it divides its cycles across threads is its own concern.

    In the one-to-one model, the opposite is true; if an application has hundreds of threads, and each one is scheduled independently, the application could easily eat up 75% of all the cycles of the machine. (leading the user to conclude that one-to-one scheduling is slow)

    I'm sure the powers that be have thought about this and have some sort of scheduling policy for it, but of course that makes the scheduling algorithm that much more expensive. Just thought I'd bring it to your attention.

    So I think the 2 phase scheduling of Irix is pretty neat (as heard third hand in an earlier post above).


  • by javatips ( 66293 ) on Wednesday January 19, 2000 @06:10AM (#1358327) Homepage
    I have been developing with Java since the end of 1995 (Java 1.0 Beta 2).

    Over the years, I have seen a drastic increase in the performance of the JVM.

    Now I have to disagree with you. In multithreaded applications I have seen Java perform better than C++. The applications were built by the same person and used the same architecture. (The guy was a beginner in Java and experienced in C++.)

    Currently I develop server-side component-based (EJB) applications (using an application server written in Java - WebLogic) and batch processes written in Java. I can say that they perform really well.

    From my experience, by coding carefully you can achieve wonderful performance. We had a batch application that must process 6 to 12 million records per day (and do a lot of processing on each record). The first version of the batch was doing 7 records per second (which did not meet our requirements); by optimizing the code and changing algorithms we got to over 300 records per second.

    Maybe we could gain a 10% to 20% more speed if we rewrote the whole thing in C++. But it would take at least twice the time to develop and will not be as stable as the Java version.

    I concede that Java is a little bit slower than C++ (not in all cases) but the gain in programmer productivity and stability is really worth it.
  • Cute story. Sadly, from a humor-related point of view, it's an urban legend.

    I'll tell you something that is true, and that doesn't reflect particularly well on Ada's genesis. Ada's design was commissioned, though not actually executed, by the US Department of Defense. Just as with most military equipment, the Ada language would be put in the hands of poorly trained, minimally talented military programmers, and had to function acceptably under those conditions. I don't know if that goal was ever actually achieved, but that's the source of the language attributes which cause most younger programmers to chafe under its "fascist restrictiveness". Believe me, after you've written enough code, and made enough stupid errors that any rather bright chimp would probably have avoided, you begin to realize that all those "restrictions" are actually helping you.

    Ada's real downfall was the mandate--something in human nature rebels at being forced, so the various DoD departments expended their creativity getting around the Ada mandate instead of using all that brainpower writing cool code. Now that the mandate is dropped, Ada is experiencing a resurgence in popularity, not the least in the free software world.

    It's not a miracle language, regardless of what some real or imagined DoD hypemeister might have said about it, but it's damn nice, a pleasure to use when used properly, and a solid tool for the development of large, robust programs. But don't take my word for it ... see for yourself. [adahome.com]

  • by Scurrilous Knave ( 66691 ) on Wednesday January 19, 2000 @06:55AM (#1358329) Homepage

    If you're looking for a portable language that compiles to native machine code and which implements much of Java's semantics, check out Ada 95. You can find information here [adahome.com], or download a complete GPL'ed compiler here [nyu.edu].

    I'm totally serious, folks. Do not regale me with tales of how much Ada sucks--most originate from introductory CS classes where Ada83 was shoved down unwilling throats by indifferent or hostile educators. Please, go read and experience for yourself before replying. And for those who dispute my claim about Java semantics, please pay special attention to the links on this page [adahome.com] before you comment.

  • It seems like the best solution of all would be if Be would donate their threading source to Linux. Most of the reason Be is so fast is that it has such a good thread scheduler (they probably have several patents on their techniques, in fact).

    Of course they need to make a living too. The trick would be to figure out how they could donate the code yet still have a way to come out ahead (they've invested many millions in development). I honestly can't think of a good solution. Ideas?
  • Why? I don't see that. To switch threads in user space you push all the registers onto the stack, save the stack pointer, load the stack pointer of the next thread and pop its registers from the stack. It's just a few cycles, maybe 20 or so. If your thread is using floating point you've got more work to do - you have to save the FP context if another task was using it and load the FP context of the new thread. This isn't expensive because it doesn't happen often. (You do need to do some fancy dancing to detect automatically which threads are using FP and which aren't) You don't have to enter the kernel anywhere in this process, and that's a huge win. To make the user space task switches happen you sprinkle calls to a yield() function throughout the code, and at I/O points. This just doesn't eat much CPU time, compared to the enormous, odious cost of crossing into the kernel, killing the cache at a cost of several hundred cycles. And again when you cross back out. For Java, this approach is ideal because the VM has complete control of the code that gets executed - at load time it can insert the required yields as it sees fit.

    Unless I misunderstood your post, you've saved some kernel time and produced a solution that is completely unscalable. The reason for kernel threads is to allow scaling across multiple processors. In a user-space implementation, your single process with many threads gets scheduled as one unit: instead of each thread getting n ticks, each thread may get only n/threads ticks, and that assumes each thread can get its work done within an n/threads slice. Furthermore, it all executes on a single CPU -- the process is only ever scheduled onto one processor at a time, so its threads never run concurrently.

    The kernel thread implementation, from what I've read, is pretty good; its design was thought out with much debate. This isn't the first time I've heard that the scheduler needs attention, but I don't see the scheduler fixing very much while there are still device drivers and parts of the kernel that need fine-grained locking. Targets like this are obvious, and they're the reason Microsoft loves to benchmark Linux with multiple NICs. Plain and simple: fix what's broken, then see what else needs optimizing (e.g. the scheduler).

    I would also like to take a moment to point out that Linux's context-switch time is, in most cases, 20-30% lower than most real-time OSes', and in some cases a half to a third. Having said that, I don't really have a problem with the threading. I don't remember the exact times (I've slept since then), but Linux's context switches were typically two to three times faster than NT's (this was back in the 3.51 days, if I recall). NT actually does scale well for what it is. To repeat: let's fix what's broken and increase the level of concurrency in the kernel (this is why NT has some advantages). Then worry about scheduling.

  • Isn't Oracle's Oracle8i approach similar to this idea? Abandon the OS and let the app do all the work directly with the hardware...
  • It seems that the IRIX lower-level scheduler would be a duplication of effort, and too little too late (you have to wait for the efficient re-ordering). AmigaOS inserts threads into an ordered linked list, absorbing the O(n) cost immediately at insert time. Both methods require O(n) time for each of n threads, making O(n^2) total. IRIX reduces the impact by using an O(n) algorithm most of the time; AmigaOS reduces the overhead by amortising the cost. Both algorithms fail "when there are lots of threads that have roughly equal goodness".

    I suggest using the sorted-list shortcut to stuff lower-priority tasks behind the head of the list. This is linear in all situations, and only suboptimal when there is a large number of unequal tasks, in which case the processor has plenty to do anyhow.
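    As a sketch (invented names, nothing like actual AmigaOS internals), the sorted run queue looks like this:

    /* Pay the O(n) cost once, at insert time, so the scheduler's pick
     * is O(1): just pop the head.  Illustrative only. */
    struct task {
        int          priority;   /* higher runs first */
        struct task *next;
    };

    /* O(n) ordered insert.  The ">=" walks past equal-priority peers,
     * so a re-queued task lands *behind* its equals -- the "stuff it
     * behind the head" shortcut, which also yields round-robin among
     * equal-priority tasks for free. */
    void enqueue(struct task **head, struct task *t)
    {
        struct task **p = head;
        while (*p && (*p)->priority >= t->priority)
            p = &(*p)->next;
        t->next = *p;
        *p = t;
    }

    /* O(1) pick: the best task is always at the head. */
    struct task *pick_next(struct task **head)
    {
        struct task *t = *head;
        if (t)
            *head = t->next;
        return t;
    }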

  • by Baldrson ( 78598 ) on Wednesday January 19, 2000 @07:13AM (#1358334) Homepage Journal
    Systems in which light-weight threads are first-order constructs, such as Mozart [mozart-oz.org], illustrate why relational programming will eventually subsume functional or procedural programming:

    Functions are special cases of relations.

    It's important to build relational semantics (light-weight threads with logic variables or their equivalent) in at the kernel level. Otherwise you end up kludging around, either recreating it at the higher levels or mis-analyzing your relational task to fit your functional tool.

    Open source studies like this one are increasing awareness of the need for light weight threads.

    That's good.

    The next step will be for people to recognize that what they are doing with all those threads is essentially relational in nature so they can really address the impedance mismatch between relational database and object oriented programming.

  • I learned some Java in college, and I was somewhat impressed with the language in general. It appears to be very versatile (much like C++), and the whole idea of JIT compiling brings a new dimension to portability. I found a version of Napster on www.download.com [download.com] that was written in Java, but I haven't compiled it yet. What performance benefits are there in Java versus C++ or Perl?
  • If you're talking about out-and-out raw speed, then there are no performance advantages to using Java. None. Java is slow.
    That said, it runs pretty much _anywhere_. If you doubt the enormous benefit of this, check the source of some fancy web pages. You'll see "if browser == netscape then ... elseif browser == IE ..." type statements in the page code. This is a huge pain in the arse when you're banging out the web pages -- you must effectively write and test _each_ page against a variety of browsers.
    So, now, back to Java: it runs pretty much uniformly on anything. Not perfectly, but not far off, either. Multi-platform code is a doddle. So if you write all your web server processes in Java, you can easily persuade your PHB to ditch NT in favour of a Linux box with little re-testing, for example. You've just future-proofed your webservers against shifts in hardware design, and all the associated costs. Nice.

    Strong data typing is for those with weak minds.

  • I was fairly impressed with the diverse set of other languages that are built on the Java Virtual Machine.

    http://grunge.cs.tu-berlin.de/~tolk/vmlanguages.html [tu-berlin.de] lists Basic, Logo, Eiffel, Ada95, Forth, Tcl, and many "new" languages, all implemented using JVM bytecodes.

    A good book to get you started thinking about such things is Programming for the Java Virtual Machine by Josh Engel. It builds a bytecode assembler language he calls Oolong, and then implements both Prolog and Scheme from there. He also throws in a regexp compiler as an exercise.

    Wins in reusing the JVM for your embedded language of choice:
    • Most machines already have a JVM installed and configured, thanks to Netscape vs IE wars
    • Compliant JVMs can be upgraded and all the users of that JVM gain the performance benefits immediately

  • I used to be a Java evangelist, to the 10th degree, but now I have had a change of heart. I still love the language -- IMHO the best thing I've seen in years. It's the part about it being an interpreted language that really bugs me. Though I've seen significant improvements in speed over the years, there's still nothing to be overly excited about, and the so-called revolutionary HotSpot engine promised by Sun turned out to be a major disappointment. Fortunately I've come to see the error in Sun's thinking and to realize that native code is indeed the way to go. Now if someone could just get on the stick and create a Java-like language that compiles directly to run on bare metal. Meanwhile, I'm painfully relearning C and C++ to get the kind of performance I need out of my applications.

    Regarding Java on Linux, I've observed that it always runs slower on Linux than on Windows, which is a real shame considering Linux is the better all-around server. IBM's VM on Linux runs quite well by comparison, but it is still topped by IBM's VM on Windows.



  • In medieval times, alchemists kept their recipes hidden in their books. Promises of golden riches, available Real Soon Now, could never be kept.

    Then came science, where knowledge was published and shared. And discussed. And improved. And the whole of society prospered.

    What a huge difference it made when the scientific model was applied to physics, chemistry, and the like in the 18th, 19th, and 20th centuries. Imagine what is to come, now that we also apply it to software!

    Source code is knowledge. Knowledge is power. So, open source is power for everyone, right?

  • Comment removed based on user account deletion
  • This isn't supposed to be flamebait. In fact, your response contains basically no worthwhile arguments -- it merely attacks the poster. Let's have a fair argument.

    Java may be popular, mostly due to Sun's massive marketing of it, but that doesn't mean that we can't "attack" it. Yes, there are better alternatives, conjured by academia and therefore not well marketed. These ideas may not reach "IT Professionals", but the linux community generally is not the kind to fall prey to marketing hype.

    > So, you don't like pointers. Go and write in
    > Visual Basic, then.

    Visual Basic? That wasn't the "point" at all. There are languages with "pointers" which do NOT require runtime tag checks, and can even garbage collect without tags! Tags increase the size of *all objects* by at least a word. These languages are often much more expressive than Java (higher order functions come to mind).
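    For the record, the cost I mean is concrete: a dynamically tagged value has to carry a tag word alongside its payload, along these lines (a C sketch of the representation, purely illustrative):

    /* A tagged value: the tag is the extra word paid on *every* object. */
    enum tag { TAG_INT, TAG_PTR };

    struct value {
        enum tag tag;        /* checked at run time before each use */
        union {
            long  i;         /* immediate integer */
            void *p;         /* heap pointer */
        } u;
    };

    A type system that tracks this statically can drop both the tag word and the run-time check.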

    > Ah, the oldest FUD about Java. Well, I notice
    > you don't document this.

    Well, I've seen enough Security Updates in the last few years (see windowsupdate.microsoft.com, for instance) to justify it, I think. The main problem is that security is left up to the client implementation, and the security code is *complex*. With the PCC techniques I suggested, the client proof-verification code is very short (about 60 lines of C code) and much more trustworthy. The burden of safety then lies in the hands of the software provider.

    > Java has learned from C++'s mistakes. Do you
    > insist people write languages from scratch?

    No -- I am suggesting that we could help pioneer a paradigm shift. Java does indeed improve upon C++ in language design (whether this is worth the performance penalty is a matter of opinion), but it is still based on 30-year-old programming practices. Software systems are getting too large to be debugged by willpower; compiler assistance from advanced languages can go a long way toward producing bug-free (even mobile) code.

    - Tom 7
  • I'm surprised to see no mention of scheduler activations, or the elegant asynchronous model developed by Inohara-san and Masuda-san. You can get the paper here:

    http://www.is.s.u-tokyo.ac.jp/tech-reports/TR94-02-letter.ps.gz [u-tokyo.ac.jp]

    There was a recent discussion on NetBSD's mailing list about implementing this high-performance thread architecture for NetBSD. You can read about it below, under the "upcalls" thread.

    http://mail-index.netbsd.org/tech-kern/1999/12/ [netbsd.org]

    Tru64, IRIX, and Solaris 2.6 all use scheduler activations. Linux does not, and IBM is not suggesting it. The Masuda Lab implementation is slightly better than scheduler activations and far more elegant; I do not think commercial OSes have adopted the Masuda & Inohara architecture.

    Naive many-to-many without scheduler activations is simply not an efficient enough threading model. Note that all the research I have cited is over five years old, and yet Linus still wants kernel threads like NT uses (does he?). In my opinion, it is very important to keep current with a diverse array of research; I've found that when I pay attention only to white papers and press releases from Intel, Sun, IBM, Ars Technica, whatever, I miss a lot of highly relevant work that later turns out to be every bit as essential as the wise, soft-spoken few knew it would be from the beginning.

    If you liked the IBM paper, I highly recommend reading the ones I've referred you to. You might also look at academic papers on scheduler activations, or on threads in general, at your neighborhood university library -- they're surprisingly accessible, even to people like me who aren't research studs.

  • by john@iastate.edu ( 113202 ) on Wednesday January 19, 2000 @07:03AM (#1358345) Homepage
    I'm reaching way back in my memory here, but I recall a white paper (perhaps from Usenix) from SGI where they investigated how to keep their scheduler from using so many cycles -- not so much to improve throughput as to improve responsiveness.

    Their conclusion was that what you wanted was a two-level scheduler: a real quick-and-dirty part that ran at interrupt level and just grabbed the next runnable process from a circular list of the highest-priority processes -- in and out in just a few cycles, though perhaps not grabbing *the* highest-priority process this time -- and then, "every so often" (in computer terms, e.g. some fraction of a second), a lower-level scheduler that did a more thorough re-ordering of the processes.

    Of course, one immediately sees that this lower-level scheduler could even be a regular process (making syscalls), which means you can plug in whatever scheduling algorithm you like.
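    Roughly, the shape of it was something like this (a toy sketch -- names and sizes invented, not SGI's code):

    /* Two-level scheduling: a fast path that just rotates through a
     * small "hot" set (cheap enough for interrupt level), plus a
     * periodic slow path that refills the hot set properly. */
    #define HOT_SLOTS 8

    struct task {
        int goodness;    /* whatever the full policy computes */
        int runnable;
    };

    static struct task *hot[HOT_SLOTS];   /* current high-priority set */
    static int hot_cursor;

    /* Fast path: a few cycles, no full scan.  May return a task that
     * is merely *near* the top rather than *the* top -- that's the trade. */
    struct task *pick_fast(void)
    {
        for (int i = 0; i < HOT_SLOTS; i++) {
            hot_cursor = (hot_cursor + 1) % HOT_SLOTS;
            if (hot[hot_cursor] && hot[hot_cursor]->runnable)
                return hot[hot_cursor];
        }
        return 0;   /* hot set exhausted; time for a rebalance */
    }

    /* Slow path, run "every so often": one full O(n) pass that keeps
     * the HOT_SLOTS best runnable tasks (tiny insertion sort). */
    void rebalance(struct task *all, int n)
    {
        for (int s = 0; s < HOT_SLOTS; s++)
            hot[s] = 0;
        for (int i = 0; i < n; i++) {
            if (!all[i].runnable)
                continue;
            for (int s = 0; s < HOT_SLOTS; s++) {
                if (!hot[s] || all[i].goodness > hot[s]->goodness) {
                    for (int k = HOT_SLOTS - 1; k > s; k--)
                        hot[k] = hot[k - 1];
                    hot[s] = &all[i];
                    break;
                }
            }
        }
    }

    Note that pick_fast() never cares how rebalance() orders things, which is why the slow half really could be a pluggable, regular process.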

  • ...each Java thread in the IBM Java VM for Linux is implemented by a corresponding user process whose process id and status can be displayed using the Linux ps command...

    The striking observation here is the amount of kernel time spent in the scheduler .... between 30 and 50 percent for kernel 2.2.12-20 and between 37 and 55 percent for kernel 2.3.28...

    ...it became apparent that a significant amount of time could be spent calculating the scheduler's goodness measure...

    We wondered what would happen if the fields required by the goodness function were placed together in the task structure....

    The Linux scheduler needs to be modified to more efficiently support large numbers of processes.

    The Linux kernel needs to be able to support a many-to-many threading model.

    We look forward to working with the members of Linux community to design, develop, and measure prototypes of Linux code to support the changes described above.
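    To make the quoted numbers concrete, the scan the article is measuring is essentially the following loop -- a simplified paraphrase of the 2.2-era schedule()/goodness(), not verbatim kernel source, with illustrative field names:

    struct task {
        int counter;     /* remaining timeslice */
        int priority;
        int same_mm;     /* cheap-switch bonus: shares VM with prev */
        struct task *next_run;
    };

    static int goodness(const struct task *p)
    {
        if (p->counter == 0)
            return 0;    /* timeslice used up */
        /* Several fields read per task: if they straddle cache lines,
         * a long run queue makes this scan hurt -- hence the idea of
         * grouping these fields together in the task structure. */
        return p->counter + p->priority + p->same_mm;
    }

    static struct task *pick_next(struct task *runqueue)
    {
        struct task *next = 0;
        int best = -1000;
        for (struct task *p = runqueue; p; p = p->next_run) {
            int w = goodness(p);   /* one of O(n) calls per switch */
            if (w > best) {
                best = w;
                next = p;
            }
        }
        return next;
    }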


  • Sun makes great servers, and Solaris isn't all that bad, but Java isn't that great. Quite frankly, I don't care if they lock Java up in the basement and never release it. Let's not forget that we as developers should be working with what we have or making new things, not freaking out because we cannot get Sun to release Java. We can develop a much more powerful language for performing the same tasks. My friend and I are working on KornBizkit; it should be able to completely replace things like Java and ActiveX. In the meantime, let's just hope that Mozilla builds in enough Java support to keep us going with those annoying little Java pages.


    Munky_v2
  • KornBizkit is what we (the developers) call it, because we are either listening to Korn or Limp Bizkit while we are writing it. It doesn't actually have an official name yet. I think we are going to have a vote among our beta testers on what it will be called once (if) it is released.


    Munky_v2
  • I don't recall slamming Java. All I said was that I don't care for it. You, on the other hand, did slam me. That's not cool. What makes you think that just because you have an opinion, it's right?

    BTW, it's Munky_v2


    Munky_v2
  • That's about the best word I can think of to describe that article. I knew there were still issues with the way Linux handles threads, but didn't really know much beyond that. I was very surprised, though, to see that the 2.3.x kernel was slower than the 2.2.x -- that's actually a little backwards from what I'm used to. Of course, in some of the post-2.3.28 kernels there are probably scheduler changes that I'm not aware of.

    I say "hats off" to IBM for providing the community with all their work and research. I'm sure that Linus will find some way to use and improve upon their patches to make it all even faster. As I'm beginning to have to work with Java more and more, I'm all for speed increases!
  • ObjC has the advantage of being just a very small and simple extension to ANSI C (and hence easy to learn), and the extension does not alter C itself the way C++ does.

    Except for us heretics who don't grok C. We are not all kernel hackers, ya know. I know the language well enough to know I dislike C's feel. Oddly enough, I do like Java and Lisp and Perl. -coyo the heretic

  • Certainly light-weight threads as first-order constructs are a very valuable thing; that's the sort of thing Smalltalk has had for a long time. For that matter, a big lesson of the last few decades seems to be that almost anything is more useful as a first-order concept. Reflection rules. Hey, Java even has reflection. Sort of.
  • The source is just an implementation, in a certain language, by a certain person. The algorithm and the thoughts and ideas BEHIND the program -- THAT'S the real knowledge that's valuable. It's therefore more important to document why what goes where in the source code than just to release the source code.
  • A product that does exactly what you want can be found here [towerj.com], and according to the VolanoMark benchmark, it is the fastest way to run Java.
  • I used to be a Java evangelist, to the 10th degree, but now I have had a change of heart. I still love the language -- IMHO the best thing I've seen in years. It's the part about it being an interpreted language that really bugs me. Though I've seen significant improvements in speed over the years, there's still nothing to be overly excited about, and the so-called revolutionary HotSpot engine promised by Sun turned out to be a major disappointment. Fortunately I've come to see the error in Sun's thinking and to realize that native code is indeed the way to go. Now if someone could just get on the stick and create a Java-like language that compiles directly to run on bare metal. Meanwhile, I'm painfully relearning C and C++ to get the kind of performance I need out of my applications.


    I would never consider Java to be a replacement for C/C++ -- they are different tools suited to different purposes. One programming language will never meet every programmer's needs for every conceivable task. The DoD learned this lesson the hard way in the '80s when it mandated that Ada be used for all new software projects.

    All programming involves making tradeoffs: you cannot optimize EVERY variable, as many of them are inversely proportional to one another. If the most important design criterion for your application is raw performance, then definitely code it in C (with inline assembly as required). If fault tolerance is your overriding concern, use Ada. And so forth...

    The vast majority of programmers in the business world are not writing performance-critical code. For most business applications (which is what the vast majority of programmers do from 9 to 5), the overriding concerns are not performance and reliability, but ease of coding and maintainability. This is where 4GLs like VB, PowerBuilder, and Delphi shine - they allow programmers of average skill to produce (at least somewhat) functional software quickly.

    If you compare the architecture of most 4GLs to Java's, they are remarkably similar: you have a compiler that produces p-code/bytecode, which is then executed in a dedicated runtime environment. What makes Java different is that the language itself is vastly superior, and it's much more of an open standard than proprietary 4GLs like VB and PB. OK, so it's not GPL'ed nor totally open -- big deal; it's more than M$ will ever do with VB. In my book, Good Enough Right Now beats Perfect Sometime Real Soon. For many (most?) business apps, Java is good enough.

    If I'm a business manager and I want an application to do X, typically I want it done yesterday. A programmer of average ability can produce a working application in less time using Java than s/he could using C++, and that code will be more maintainable. If the performance is unacceptable, it's often more economical to just buy a bigger box than to hire a coding wizard who can wring every last ounce of performance out of the current hardware.

    (This is not to say that I advocate bloatware or mediocrity; I just realize that it's often a financial reality. Face it, not every coder is a guru; most places have to make do with the people they have.)
  • Now if all KornBizkit programmers wore those black eyeball things like the guitarist from LB, it could start to get interesting ;-)

    BeeGeesAirSupply?!? Stop the insanity!!

    ....I'm all out of love...lala...

    http://www.citi.umich.edu/projects/linux-scalability/index.html

    "The primary goal of this research is to improve the scalability and robustness of the Linux operating system in order to support greater network server workloads more reliably."
  • Some other schedulers don't do any kind of O(N) "goodness" calculation over all the runnable threads. Simple real-time schedulers (which admittedly have other deficiencies) just choose the thread on the front of the highest non-empty run queue. Other schedulers use stochastic (i.e. partly random) methods to pick the next thread without having to look at all the runnable threads.
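    A sketch of the run-queue-per-priority pick (illustrative names only, not any particular kernel's code):

    #include <strings.h>        /* ffs() */

    #define NPRIO 32            /* priority 0 is highest */

    struct thread { struct thread *next; };

    static struct thread *runq[NPRIO];
    static unsigned int nonempty;   /* bit i set <=> runq[i] has threads */

    /* Constant time in the number of runnable threads: find-first-set
     * on the bitmap, then pop the head of that queue. */
    struct thread *pick_next(void)
    {
        if (!nonempty)
            return 0;                     /* nothing runnable: idle */
        int prio = ffs(nonempty) - 1;     /* lowest set bit = best level */
        struct thread *t = runq[prio];
        runq[prio] = t->next;
        if (!runq[prio])
            nonempty &= ~(1u << prio);
        return t;
    }

    void make_runnable(struct thread *t, int prio)
    {
        t->next = runq[prio];   /* LIFO for brevity; a real queue appends */
        runq[prio] = t;
        nonempty |= 1u << prio;
    }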
  • I know! I know! Let's make ourselves sound really intelligent by slamming Java.
    Don't get me wrong: I think Sun's current stance is not in the best interest of the technology, but it's not really that big a deal, considering that IBM has a much better implementation of Java anyway. They have a better IDE, and their JDK 1.1.8 rocks.

    If you were really planning on coming up with some cool new web technology better than Java or ActiveX (as if those are solely web technologies, or even comparable technologies, for that matter), then you could learn a lot by studying them instead of shooting your mouth off.

    I'm actually hoping you do write some code, for this reason: a dog can dig and find a morsel of corn in a pile of shit, so there's a good chance that if you coded for ten years, another monkey could find something in it not totally worthless.
    Just change the name to KornInShitBizkit.
  • I'm not an OS kernel expert ... but ...

    From the IBM article, it appears that the Linux scheduler must calculate a goodness number for each runnable thread and find the "most good" one. This is clearly O(n), where n is the number of threads. The IRIX approach is one way to make this faster, but it's not clear how well it would work when there are lots of threads with roughly equal goodness whenever they are runnable... as would happen with the benchmark in question. I expect IRIX would need to run the lower-level scheduler a lot more often than with a "typical" mix of threads/processes.
