
Next Generation C++ In The Works

lamefrog writes: "Bjarne Stroustrup and other members of the C++ community recently met to discuss new work on the language for the first time post-ISO standard (C++0x) in an effort to keep the language moving, avoid fossilization and avoid being overtaken by proprietary extensions. Suggested new features center around the standard library and include support for multi-threading and simple platform-independent systems abstractions (handles, TCP/IP, memory maps etc...)"

"Most intriguing is a suggestion to include extended type information that will eliminate the need for IDLs and make it possible to generate standard bindings to external systems (COM, CORBA, SQL). Clearly Bjarne wants to position this as a platform-neutral, vendor-neutral, standardized alternative to the proprietary, vendor-supported languages that have emerged over the recent years. Audio MP3 and slides available on Dr. Dobb's TechNetCast." Thoughtfully, it's available to download as well (not just streaming), and accompanied by a transcript. Good listening.

  • It's not a bug, it's a feature. Although the handling of less-than signs is less than ideal (it should just convert them), the adding of spaces to URLs is intentional. Before Slashdot did that, there were some jerkoffs who liked to post comments with 10000-character-long URLs that would cause most browsers to either choke or make you constantly scroll left and right to read the comments. The breaks in long lines (URLs are very long lines) prevent that from happening, although it does add some annoyance when copying and pasting URLs. (Hint: use the anchor tag!)

    Down that path lies madness. On the other hand, the road to hell is paved with melting snowballs.
  • Umm, my major problem with C++ is execution time. I try and write as little as possible in C++, sticking mainly to C and Perl for anything which needs to execute quickly

    You're claiming Perl executes faster than C++? Bwahahaha.
  • By creating yet another language, they are just adding to the problem (of incompatible C/C++ derivatives). It would be better to pick one of the existing next-generation C languages and declare that the standard, or at least make the standard source-compatible with said language to ease transition.

  • Here are the problems I had when trying to compile code (that worked on GCC 2.95 and Borland 5.0) with Visual C++.

    - The compiler crashes if you look at it the wrong way. Internal errors everywhere.

    - The for scope was wrong. Luckily, someone suggested a macro that would compensate for that.

    - Lots of silly arbitrary limits; for example, debug symbols can be at most 255 chars. That is OK for C, but not for a mangled C++ template. There is a pragma that removes the warning, except that it doesn't work for static objects.

    On the other hand, their development environment is nice, and their library beats GNU and Borland.

    Will 7.0 make me happy?

  • > The C++ definition used to say the scope of
    > variable was the scope that surrounded the for
    > loop.

    That was many, many years ago.

  • > The reason why it's not the default setting is obvious.

    GCC implements the "new" semantics, but still accepts code that would be valid under the old semantics, with a warning (by default).
  • // Work around broken for-scoping
    #define for if(0);else for
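
To see why the macro above works: rewriting every loop as `if (0); else for (...)` puts the whole `for` statement inside the `else` branch, so a name declared in the loop header cannot outlive that branch even on a compiler with the old scoping rule. A minimal sketch follows; the function name `sum_twice` is made up for illustration. On a conforming compiler the macro is unnecessary but harmless. Redefining a keyword is formally off-limits once standard headers are involved, so here it is defined after the includes and undefined again at the end.

```cpp
#include <cassert>

// Work around broken for-scoping (same idea as the macro above).
// Defined after the includes so library headers are unaffected.
#define for if (0); else for

// Hypothetical helper: declares the same loop variable twice in one scope.
int sum_twice(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += i;   // expands to: if (0); else for (...)
    for (int i = 0; i < n; ++i) total += i;   // 'i' is redeclared without conflict
    return total;
}

#undef for   // keep the workaround local to this sketch
```
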
  • I doubt you will have to fight for features already in C99. While Bjarne probably won't get his wish for a common C/C++ standard fulfilled, I assume the new C++ standard will adopt most of the C99 features, even though the C standard committee didn't make the job easy. For example, C99 has a complex keyword, which causes trouble for the C++ complex template.
  • Uses "class" to create new types.

    I like that this works:

    typedef double time;
    typedef double distance;
    typedef double speed;

    time hours = 2.0;
    distance miles = 30.0;
    speed mph = miles / hours; /* not an error! */

  • > 1. Losing the pointless duplication of
    > declarations in .h files.

    Not necessarily a good thing. Redundancy catches errors.

    > 2. Virtual methods can be determined by the
    > linker, so the programmer no longer needs to
    > specify virtual-ness at all.

    Definitely not a good thing; virtualness is important when reasoning about a class. After a call to a virtual function, much more state is uncertain than after a call to a non-virtual one.

    > 4. Inlining and template instantiation can be
    > postponed until link time given a sufficiently
    > sophisticated intermediate format. In fact the
    > compiler can inline any method or function.

    This has been the case for a long time, with gcc -repo or Sun CC.

    > 5. No more name mangling.

    The type information has to be represented somehow, name mangling is not really different from other means.
  • #1, basically, this is the "use class" again. class is the abstraction mechanism for creating new types, not typedef.

    #2, we are trying to get away from the preprocessor, not towards it.
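
To illustrate the "use class" point: a minimal sketch in which each quantity is its own class, so mixing them up is a compile-time error and only the meaningful operation is defined. The names `Hours`, `Miles`, and `MilesPerHour` are invented for this example.

```cpp
#include <cassert>

// Each quantity is a distinct type; none converts implicitly to another.
class Hours {
public:
    explicit Hours(double v) : value_(v) {}
    double value() const { return value_; }
private:
    double value_;
};

class Miles {
public:
    explicit Miles(double v) : value_(v) {}
    double value() const { return value_; }
private:
    double value_;
};

class MilesPerHour {
public:
    explicit MilesPerHour(double v) : value_(v) {}
    double value() const { return value_; }
private:
    double value_;
};

// The only way to obtain a speed is distance divided by time.
MilesPerHour operator/(const Miles& d, const Hours& t) {
    return MilesPerHour(d.value() / t.value());
}

// Hypothetical usage helper.
double example_speed() {
    Hours hours(2.0);
    Miles miles(30.0);
    MilesPerHour mph = miles / hours;      // fine
    // Hours wrong = miles / hours;        // would not compile: no conversion
    // MilesPerHour bad = hours / miles;   // would not compile: no such operator
    return mph.value();
}
```

With plain typedefs all three names are just `double`, so the commented-out lines would compile silently.
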
  • The only errors this catches are discrepancies in the header and source, which of course would be avoided by this feature.
    A discrepancy means one of the places is (likely to be) wrong compared to what was intended. If you eliminate one place, there is no a priori reason to assume it will be the wrong one that is eliminated. Redundancy in specifications thus helps catch errors, because you have to be wrong twice for the error to go uncaught. Same principle as parity checks.
    Virtualness is an implementation issue which is exposed in the base class because of linker limitations
    Virtualness has nothing to do with implementation, and everything to do with specification. When calling foo->bar (), the caller knows exactly what the function will do if bar is non-virtual, and must assume anything can have changed if bar is virtual. This is important for writing robust code. Functions should be made virtual only after careful thought.

    This issue has been confused by some poor OOP languages which don't have the ability to specify non-virtual functions. These languages should not be used for robust software.

    However with sufficiently advanced linking technology, no mangling need be done at all. It's a kludge to get around old linkers.
    It simply doesn't matter. The same information must be represented, the form (a mangled string, or a struct) is simply an implementation detail.
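
The difference this reply describes can be shown in a few lines. In this sketch (class and function names are made up), a call through a `Base&` to the non-virtual function always runs `Base`'s code, whereas the virtual call runs whatever the derived class supplied:

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() {}
    std::string fixed() const { return "Base::fixed"; }          // non-virtual
    virtual std::string open() const { return "Base::open"; }    // virtual
};

struct Derived : Base {
    std::string fixed() const { return "Derived::fixed"; }       // hides, does not override
    std::string open() const { return "Derived::open"; }         // overrides
};

// The caller only ever sees a Base&.
std::string call_fixed(const Base& b) { return b.fixed(); }   // always Base's code
std::string call_open(const Base& b)  { return b.open(); }    // decided by the object
```
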
  • Why does the language have to change every few years? It just makes old code harder to compile down the road because there are n versions of the language. Did we learn nothing from BASIC being pulled in 50 different directions? Quit screwing with the language and work on the standard libraries.

    How about creating <stdgui.h> ?

    A classic case of "If it ain't broke, don't fix it." And C was never broke. C++ is and remains a monstrosity of unneeded evil.

  • The problem with these is that if you make them too simple, nobody will want to use them. If you make them too complex, they won't be widely implemented. And there isn't really a satisfactory middle ground either, it's more of a fine line. One missing little key feature can make an interface unsatisfactory, if not outright useless.

    Standard abstractions of this kind already exist for C programming, under the umbrella of POSIX and The Single UNIX Specification. These abstractions are not as widely implemented as, say, the standard C library! We are still not at the point where you can write POSIX code and expect it to work everywhere.

    Any interface that is going to be acceptable to a wide range of C++ vendors is going to have to be dumbed down and braindamaged beyond repair.
    For example, a standard C++ threading library probably won't be able to have useful POSIX behaviors in it because Microsoft would leave them unimplemented in future generations of Visual C++. So you will end up with some weak interface that caters to the lowest common denominator, and which programmers will soon learn to avoid.

    On the other end of the scale, you could end up with a situation in which the powerful, useful systems extensions are an optional part of the C++ standard, and one that is only implemented properly by people who have a clue, on top of high-quality operating systems. And so only developers targeting these systems will be able to use the interface. Still, it's better than an interface that all programmers avoid.

    I suspect that for some time to come, the real tool for portability will be something that is already there: good old preprocessing directives that allow you to roll several similar programs into one. :)

    The ultimate solution to the portability problem is to actually have one operating system running everywhere. Portability is achieved with greater ease at the lowest levels, and doing it there provides the greatest leverage for everything else. Example: it's easier to have Linux running on some portable device and recompile existing programs for it, than to port applications to some exotic embedded operating system on the same device! This is particularly true because such devices are increasingly built around standard, advanced architectures that fit the model expected by an advanced operating system. The idea of using the same advanced OS for small and large computing is pretty much here now.

    Windows CE teaches us that it's not even enough to merely have a reimplementation of the same system call interface. Anyone who has had to port Win32 software to Windows CE will understand! If you port the actual kernel, that's a big difference, because you port every nuance of the behavior behind that interface. It's not possible to specify every such nuance in a document and have everyone implement it exactly, and it's hard to be certain to what extent an application depends on these nuances!

    Remember, when Thompson and Ritchie presented C and UNIX to the world circa 1974, it was the portability of the operating system that impressed the world. The portability of C programs rested on the retargetability of the C compiler and porting of the OS, not on writing in a standard language using standard interfaces! Porting C programs to different operating systems came later (and is not really all here yet, nearly thirty years later).

    This is what language standardization is really about: a bunch of conflicting big interests bent on preserving their piece of the pie. Nobody wants to come out and admit that there needs to be one *implementation* of one interface running everywhere, because that would mean giving up their proprietary operating systems and interfaces, whose incompatibilities they secretly cherish. Since nonportability of software is caused by secrecy driven by ego and greed, universal portability will only be achieved when we recognize the root causes and do something about them. Right now, with our programming language standardization efforts, we are accepting these causes as immutable givens, and working *around* them to create solutions that are incomplete and unsatisfactory when translated to action in the software development trenches.
  • You can use an auto_ptr only if you have allocated something using new. If you malloc'd it, or used some (C) library that requires you to call a function to free the resources, auto_ptr cannot help.
  • or even just non-trivially higher level than C++

    Well garbage collection alone is a pretty big step towards higher level. Now if it could just do generic algorithms as well as the STL....

  • Please put in the sockets and signals that Qt implements? Those are damn nice.

    I like the libsigc-- ones better, far more runtime checking, adaptors, and I think they are even faster. OSS license, and avail. on sourceforge. They are not GTK+ specific, even though it was originally designed for use there.

  • auto_ptr is less than 100 lines of code. You can take the one from and template it up a bit more to use any allocate/free functions you like. You can even get some counted versions from boost [boost.org].

    finally may still be nice for some things, but following the "all resource allocation is object creation/all deallocation is destruction" design will eliminate 99% of those.
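
A rough sketch of the idea in this reply: a small scoped-pointer template parameterized on the release function, so memory from `malloc` (or a handle from a C library) is released in a destructor. `ScopedResource`, `counting_free`, and `scoped_malloc_demo` are hypothetical names, and this is not the actual `auto_ptr` source. In later C++ standards, `std::unique_ptr` with a custom deleter fills the same role.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Owns a T* and releases it with a caller-supplied function on destruction.
template <typename T>
class ScopedResource {
public:
    typedef void (*Release)(void*);
    ScopedResource(T* p, Release r) : ptr_(p), release_(r) {}
    ~ScopedResource() { if (ptr_) release_(ptr_); }
    T* get() const { return ptr_; }
private:
    ScopedResource(const ScopedResource&);             // non-copyable (C++98 style)
    ScopedResource& operator=(const ScopedResource&);
    T* ptr_;
    Release release_;
};

int g_release_calls = 0;   // counts releases so the effect is observable

// Hypothetical release function: counts the call, then frees.
void counting_free(void* p) { ++g_release_calls; std::free(p); }

std::size_t scoped_malloc_demo() {
    ScopedResource<char> buf(static_cast<char*>(std::malloc(16)), counting_free);
    if (!buf.get()) return 0;
    std::strcpy(buf.get(), "hello");
    return std::strlen(buf.get());
}   // buf's destructor calls counting_free here, on every return path
```
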

  • That's simply impossible. A C++ virtual function is an array index (into vtbl) followed by a call through a pointer

    Nope. It is an array index followed by a jump to that location, which contains a jump to the new location. Or at least that is a common way to do it (it does double the size of the vtbl). That tends to make branch predictors happier...

    Now other than that one little edge, I did kind of forget the extra indirection, which does make the C++ call a bit slower, but compared to the cost of the pipeline bubble from doing a jump through a pointer, or of having the BTB miss (10 to 20 cycles), the extra two for the indirection (assuming a cache hit) is pretty minor.

    Oops. My bad.
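
For readers following the dispatch argument, here is a hand-rolled sketch of what "an array index into the vtbl followed by a call through a pointer" means, written as explicit C-style tables. It shows only the simple single-indirection scheme. The jump-to-a-jump layout described in the reply is a compiler and linker implementation detail and cannot be expressed in portable source. All names are hypothetical.

```cpp
#include <cassert>

struct Shape;                                   // object carrying a table pointer
typedef int (*Method)(const Shape*);            // every slot has this signature here

struct Shape {
    const Method* vtbl;                         // points at the class's constant table
    int side;
};

int square_area(const Shape* s)      { return s->side * s->side; }
int square_perimeter(const Shape* s) { return 4 * s->side; }

enum Slot { AREA = 0, PERIMETER = 1 };

// One immutable table per "class", fixed before the program runs.
const Method square_vtbl[] = { square_area, square_perimeter };

// Roughly what a virtual call does: load the table, index it, call through the pointer.
int call_virtual(const Shape* s, Slot slot) {
    return s->vtbl[slot](s);
}
```
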

  • Properly implemented, GC has no impact on the classes interface, or even the lifetime of the objects.

    Only reference counting can destroy objects as quickly as explicit memory management. Reference counting is also the slowest GC, and frequently not even accepted as GC because it breaks in the presence of even trivially circular objects. So what most people would think of as good GC will extend the lifetime of objects somewhat (except when it fixes a memory leak and radically reduces the lifetime of the object).

    If your destructor is important, use of auto_ptr or other "smart" pointers will be needed even in a GCed language. (don't get me wrong, I like GC, I just don't like overselling it)
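
The cycle problem mentioned above is easy to reproduce with a reference-counted smart pointer. This sketch uses `std::shared_ptr` (standardized well after this discussion) as a stand-in for any reference-counting scheme; `Node`, the counter, and the function names are illustrative.

```cpp
#include <cassert>
#include <memory>

int g_live_nodes = 0;   // number of Node objects currently alive

struct Node {
    std::shared_ptr<Node> strong_peer;   // counted reference
    std::weak_ptr<Node>   weak_peer;     // non-counted reference
    Node()  { ++g_live_nodes; }
    ~Node() { --g_live_nodes; }
};

// Returns how many nodes are still alive after every external reference to a
// two-node strong cycle has been dropped.
int live_after_strong_cycle() {
    std::weak_ptr<Node> observer;
    {
        std::shared_ptr<Node> a = std::make_shared<Node>();
        std::shared_ptr<Node> b = std::make_shared<Node>();
        a->strong_peer = b;
        b->strong_peer = a;
        observer = a;
    }   // a and b go out of scope, but each node still holds the other
    int leaked = g_live_nodes;
    // Break the cycle by hand so the sketch cleans up after itself.
    if (std::shared_ptr<Node> p = observer.lock()) p->strong_peer.reset();
    return leaked;
}

// Same shape, but the back-reference is weak, so nothing is kept alive.
int live_after_weak_cycle() {
    {
        std::shared_ptr<Node> a = std::make_shared<Node>();
        std::shared_ptr<Node> b = std::make_shared<Node>();
        a->strong_peer = b;
        b->weak_peer = a;
    }
    return g_live_nodes;
}
```
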

  • That double indirection is a loader artifact, not really part of the language.

    It is independent of ld.so. Well, actually it might be done for similar reasons. It used to always be an indirect jump (like in C, but with the vtbl indirection), but has been changed to a jump through a jump because that runs faster on many platforms.

    See the gcc C++ archives for details, or maybe you can find a comp.arch archive.

  • I don't understand this. Is this because the processor assumes that any location you jump to is non-volatile? It seems that an indirect jump could be "predicted" just as well by assuming the contents of the pointed-to memory location are the same as last time, and this would require no more circuitry than the jump-to-jump predictor.

    I'm not a CPU designer, but I hang out in comp.arch a lot. So take this with a grain of salt.

    Some CPUs sniff writes to areas covered by the i-cache, and will do a lot of work when they are detected. Assuming BTB targets must be in the i-cache (true on some CPUs) that provides a good way to catch changing tables of jumps, but not tables of jump addresses.

    No CPU I know of sniffs BTB source addresses. I think that is in part because BTBs became popular in micros well after self-modifying code became "evil".

    Many CPUs (pretty much everything other than the x86 -- and other really old things like the 390) require an (i-)cache invalidation between modifying code and executing that code. The cache invalidation will also invalidate the BTB. So the CPU can feel free to use the BTB to optimize a JMP-JMP sequence, but not to optimize an indirect jump.

    While I'm on the topic of CPUs, I think the POWER (incl PowerPC) has dedicated branch registers, and doing an indirect branch through them is quite fast, at least if the CPU has time to prefetch the targets, or they are in the i-cache. Except on the PowerAS where the branch registers aren't special, but the pipeline is so short (an amazing six cycles at 500+ MHz) that pipeline stalls are cheap enough that they don't do target prediction, and only static branch prediction.

    Could a CPU make indirect branches as fast as JMP JMP branches? Sure, but I think it would slow down all data stores, or all uses of the BTB, or both. It doesn't seem worth it with current language usage. Could C implement function pointers as JMP JMP? Sure, but that would make function pointers wider than normal pointers (or waste space in normal pointers).

    Does this make C++'s virtual function calls faster than C's? That is going to depend a lot on the CPU, and the usage patterns. I doubt they will be faster on a PowerPC, but they could be faster on an AMD K7, or the Intel P3 and P4, and they were for sure in one usage on the SPARC for me. Which is why I went down this rathole in the first place, to figure out how that could be.

  • I truly don't understand this. C always can tell the pointertype, because it is static. Or are you thinking about equivalent code in C?

    I was talking about the equiv code. Yes, C always knows the pointer type, or more accurately C assumes it knows the pointer type, and if it is wrong, the programmer (or user) will pay.

    The discussion is idiotic. Algorithms and design are what's crucial, not syntactic sugar. High-level languages just improve your efficiency by many orders of magnitude (whatever that is).

    It is your right to think that. Personally I find it easier to bring people into the fold by convincing them to use the more powerful language, and its cheaper features.

    Of course the first time you look at code that finds the 95th percentile by doing a sort, and replace it with nth_element, converting an O(N log N) algo into an O(N) algo and saving days of runtime in a billing application, yes, they will buy the algo argument. One saving like that can make up for a lot of runtime inefficiency.

    However if you never get them to use C++, you will never get them to "just call nth_element". So I like to start by saying "Yo, if you micromanage C's runtime speed, you can do that in C++ too, and while you're in there try templates, they'll make that micromanagement way simpler, go check out the STL too....". That tends to convert more people.
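
A small sketch of the substitution described in this comment: finding the 95th percentile with `std::nth_element` (linear time on average) instead of a full `std::sort`. The index convention used here (`0.95 * (n - 1)`, truncated) is just one simple choice for illustration, and the function names are made up. Both functions assume a non-empty input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the 95th-percentile element under a simple truncating convention.
// Precondition: n > 0.
std::size_t p95_index(std::size_t n) {
    return static_cast<std::size_t>(0.95 * (n - 1));
}

// O(N log N): sort everything, then pick one element.
int p95_by_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v[p95_index(v.size())];
}

// O(N) on average: only put the one element we care about in its sorted position.
int p95_by_nth_element(std::vector<int> v) {
    std::vector<int>::iterator nth = v.begin() + p95_index(v.size());
    std::nth_element(v.begin(), nth, v.end());
    return *nth;
}
```
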

  • Pick ANY task.

    Mmap a large binary file, treat it as an array of int, sort it. Treat it as an array of char, sort it. Use the language provided library to do both sorts.

    Write a C and C++ program to accomplish the task.

    Done. The C++ STL version is a few lines shorter than the C qsort version.

    Pick any C and C++ compiler. You make the choice of which.

    The gcc provided with BSD/OS 4.2 x86, or the SPARC version. For both languages.

    The C code will finish first. Always.

    Odd, the C++ program seems to have been eight times faster. Oh, look now that the file is in the cache it is 14 times faster.

    You simply cannot argue with facts.

    What's the alternative, arguing without facts, as you have just done?
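
For anyone who wants to try the comparison, here is a sketch of the two library sorts side by side. It works on an in-memory array rather than an mmap'd file to stay portable, and it makes no claim about timings, which depend on the compiler and CPU. A commonly given explanation for such gaps is that `std::sort` can inline the comparison while `qsort` calls it through a function pointer. The helper names are invented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Comparison callback in the form qsort requires.
extern "C" int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);           // avoids the overflow risk of x - y
}

// C library version: every comparison goes through a function pointer.
void sort_with_qsort(std::vector<int>& v) {
    if (!v.empty()) std::qsort(&v[0], v.size(), sizeof(int), compare_ints);
}

// C++ library version: the comparison on int can be inlined into the sort.
void sort_with_std_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
}
```
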

  • I think the point is that the tables of jumps do not change, and thus the icache is not invalidated. The table being talked about is the vtab for a given class and it is a constant.

    Exactly. The hidden costs the other poster was talking about don't exist.

    This jmp-to-jmp stuff is a way to fool the CPU cache and predictor circuitry into assuming the location is constant, because it figures that jmp instructions are constant. It does seem kind of annoying that it is worth doubling the table size (and thus halving how much fits in the cache) in order to get around a mistaken programming assumption by the CPU designers.

    Well, it doesn't so much fool the CPU into thinking the location is constant as to actually inform it that the location is constant.

    I think it would be better for a CPU jump predictor to assume *everything*, whether in instruction, read only, or read/write data, is constant. Modern C++ code typically accesses a given location many orders of magnitude more times than it modifies it!

    C++ isn't the only thing that runs on a CPU. It may well be a good idea to have an indirect jump that does allocate a BTB entry, or to have BTB entries allocated if the indirect address is on a read-only page (this may be hard to tell). Of course one could only do this on a CPU that deals with the BTB entry being incorrect (most modern CPUs do, OOO machines can do it fairly trivially), otherwise you can get some odd problems.

  • You're so full of crap. Yes, those hidden costs do exist and must be paid *even though* the contents of the table are in fact effectively constant. That's the problem. You still have to invalidate the i-cache, you still have to forego VM-level optimizations, etc. because the values *could* change.

    Eh? The vtbl is stored in immutable code space (in most implementations), after all the vtbl is immutable. The only i-cache invalidate is done when the OS maps the page in the first time. I'm unaware of what VM optimizations are being forgone.

    You can bitch that the vtbl being immutable is a bad requirement, and that it prevents C++ from being as flexible as Ruby. But that is a different topic.

    So I say there are no hidden costs in how C++ implements this trick. You can argue that there are hidden costs in how the CPU thinks about i-space, but that is a very different argument.

    It does no such thing. Any CPU that makes such an assumption about the immutability of i-space could be considered broken.

    With the sole exception of the x86 all modern CPUs are broken? They all assume that i-space is very seldom altered, and that that altering can be quite costly. In fact even the modern x86 assumes that. The other CPUs require an i-cache invalidate, the x86 makes stores slower by snooping i-cache addresses, or they make the i-cache smaller, or both.

    That seems a little silly as i-space modification is rare, it wasn't even all that common when it was easy. The only place I can think of where it is all that useful is a low level graphics system, but that kind of code can frequently be pre-expanded, or even more commonly hardware assist is used now anyway.

    I have written self modifying code (at least) three times in the last 20 years. I don't mind it being slightly harder now. Have you ever written any?

    That breakage can be worked around by having the VM system play nasty tricks with making i-space pages read-only etc., but the cost of having to cover for the CPU's deficiencies like that is much greater than the benefit.

    Or frequently at the "cost" of finding stray pointer usage much sooner.

    Try looking at the problem from a *system* standpoint for a change, instead of a myopic "how can a CPU designer avoid work" standpoint.

    From a system standpoint making self modifying code faster makes everything else slower. Is there really enough self modifying code to make that a good deal?

  • No, it's the same topic because it impacts the same solution.

    I thought the topic was whether the C++ double jump through a vtbl was better than a single indirect jump through the vtbl. C++ does not allow overloading of functions of a single object, it requires a new class for that. So C++ will not benefit from a writable vtbl.

    If the topic is something else, please let me know so I can either argue my point, or agree with yours.

    Please stop trying to redefine the topic to suit yourself.

    Well, it was a reply to a C++ article, so I assumed we were discussing C++, and CPUs slipped in only in relation to that. I admit I may be talking about a different topic than you, though not to frustrate you, but because I was unaware you were discussing something else.

    "Seldom" is not equal to "never", and we were talking about the assumption that i-space would *never* change because that's the only assumption that would make the proposed solution seem reasonable.

    No CPU I know of has that assumption. All of them allow an i-space change, because all of them need to allow programs to be loaded in. Given the rarity of the changes many require something fairly expensive to be done after the change (like an i-cache flush, or on things like the MIPS a controlled flush of some of the lines).

    We're not talking about self-modifying code, as much as you seem to be hoping that the taint associated with that phrase will stick to anyone who disagrees with you. We're talking about mutable data in i-space, and about the nasty hack of using double jumps with the intermediate target in i-space to "trick" CPUs and make method dispatch a cycle or two faster without considering the effect of such a hack on the rest of the system.

    Ok, does that mean you think the vtbl is mutable? The vtbl is immutable. It is never changed. It is a constant. I don't even know any non-portable ways to change the vtbl.

    If C++ itself was changed in a way that needed mutable vtbls I expect the double jump through a vtbl would be changed back to an indirect jump and the vtbl moved to a writable area. But I'm not sure. It might be cheaper on some CPUs to do selective i-cache flushes (like on the CPUs that allow single cache lines to be invalidated).

    But you're almost right. Making this particular hack work faster makes the rest of the system slower. That's exactly the point. Congratulations on finally getting it.

    Are we talking about the way C++ uses the CPU? If so it doesn't make anything slower as it uses immutable i-space to hold immutable data (technically it is code I guess, but immutable nonetheless).

    If we are talking about the way CPUs make modifying i-space expensive, then I expect you are wrong, but it does depend on exactly what code you need to run. It is a big argument, and unless it is one you are interested in I'll leave it dormant.

    If we are talking about something else please do let me know.

  • I'm sure we could have a very interesting discussion about the relative merits of double jumps vs. indirect jumps if you'd cooperate, because you seem to know more than most /.ers about how CPUs work. However, as long as you're going to deny that these systemwide costs exist at all - things like false sharing, extra interprocessor communication in an SMP system to do TLB shootdowns, pollution of the BTB when the regular L1 cache is damn near as good - then that's not going to happen.

    Those things do exist (for the most part). C++ doesn't cause them though. C++ using the double jump doesn't make these problems any worse (except, arguably "pollution of the BTB..."). If you are not interested in C++'s use of this feature and want to discuss modifiable i-space in general, I'm fine with that.

    But could you let me know what the hell the topic is?

    Now of the costs that you listed, what do you define "false sharing" as, and what does "pollution of the BTB when the regular L1 cache is damn near as good" mean?

    then that's not going to happen

    Well it won't happen unless we are both on the same topic. Care to let me know what the topic is?

    How disappointing.

    Pretty much, but you can change that.

  • Ahhh, but it does. On an architecture designed around the "i-space modification is rare" assumption, writing to a vtbl *even* at object-creation or class-loading time incurs a substantial overhead in exception handling

    The vtbl is not ever written to. On a unix system it is part of the ELF or a.out code area. The linker figures out how it looks, and it is not changed at runtime.

    It might be different for a dynamically linked library, but the normal C linked libs frequently (but not always) work that way. BSD/OS does the double JMP trick (at least on the x86), using a ld.so that it got from FreeBSD, which uses a ld.so borrowed from, or inspired by the Linux version. So there is a good chance all three systems do it the same way.

    Of course dynamically linked C++ code is quite rare. Or at least using C++ libs dynamically; it is common for C++ code to use dynamic C libs. This has a lot to do with deficiencies in the ABI, and in template generation code.

    So for a static linked program the vtbl is never ever written. I'm not sure about dynamically linked ones, maybe I'll go check.

    This is different from the modifications that must occur at image-load time (including DSO-load time) because those have distinct boundaries and the OS can treat pages differently during that period than afterward. Maybe if parts of the C++ runtime were integrated into the OS loader this could be handled more efficiently, but that's a heinous idea for other reasons.

    For statically linked code it is the same as being done at image load time, because the vtbl is filled out at image load time (before that actually).

    For dynamically linked code it is also not all that different (I assume) because ld.so does similar things for the C code, and it also has minimal OS support (the mprotect(2) call, make the stub tables non-executable+writable, change them, make them executable+read-only).
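
The write-then-seal sequence can be sketched with plain POSIX calls. This only illustrates the `mprotect(2)` pattern on an anonymous data page. It is not what any particular `ld.so` does, it never makes the page executable, and it assumes a POSIX system that provides `MAP_ANONYMOUS` (some systems spell it `MAP_ANON`). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Fills a fresh page while it is writable, then seals it read-only.
// Returns true if every step succeeded and the data is still readable.
bool fill_then_seal_page() {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return false;
    std::size_t len = static_cast<std::size_t>(page);

    void* p = mmap(0, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;

    std::memcpy(p, "table", 6);                       // "patch" while writable
    bool ok = (mprotect(p, len, PROT_READ) == 0);     // seal: later writes would fault
    ok = ok && (std::memcmp(p, "table", 6) == 0);     // reads still work
    munmap(p, len);
    return ok;
}
```
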

    Similarly, the whole point of the double-jump seems to be to abuse the BTB for performance. I call it abuse because every method pointer that's stuffed into the BTB is one less BTB entry that can be used for *real* branches.

    That's an opinion based on whether you view it as a kludge or not. I view it as a real use of the BTB because I view the BTB's job as keeping the pipeline filled in the presence of branches.

    Some CPUs allocate BTB entries for normal JMP and JSR instructions. The AMD 29k did that, the BTB held an address and the actual instruction. Modern CPUs don't tend to do that because the pipeline is too long to be happy with just one instruction. Some modern CPUs still keep entries for all branch/JMP/JSRs, but the BTB has an "internal pointer" to the i-cache line. This is rare because it is only useful on CPUs where the i-cache lookup takes more than one cycle (otherwise the internal pointer is no win).

    CPUs like the HAL SPARC64 actually unfold i-space around control flow instructions, so the JMP-JMP would be replaced with the straight line code, no BTB would be allocated. I'm not sure about the P-IV's trace cache as I don't know if it is a true trace cache, or just shares the name.

    In any event I think using the BTB entry to make the JMP JMP faster is a great thing, it avoids a pipeline stall. If that means some other branches can't fit, well at least it got used in some places.

    However here I'm only arguing my opinion vs. yours. The right thing to try would be to convert from JMP JMP to an indirect JMP. If the indirect JMP is faster, then you are right for that workload. If the JMP JMP is faster, then I'm right for that workload. It's probably not all that hard to get gcc to produce either kind of code, so the big issue would be finding the right workload.

    Any ideas?

    very fast special-purpose cache; there's another cache - the L1 - right nearby that could also contain that same information

    The BTB is actually included in the L1 cache on some CPUs. At least one SPARC, and I think the AMD K7. The downside is they limit the number of jumps that can have a BTB entry per cache line. I guess the other downside is they make the L1 cache take more transistors per line, and possibly reduce its size.

    So you save yourself a cycle on the method dispatch (if repeated) by using the BTB instead of the L1, in return for which you create a nice fat pipeline bubble for someone else when they hit a branch that would have fit in the BTB if not for your shenanigans. That's not a win, it's just shifting the load.

    Oh, it's more than a cycle in many CPUs. In fact on all the modern CPUs except the PowerAS it is a fair bit more than a cycle; if you read the PowerAS papers they make a big deal about it. I think the PowerAS is also known as the IBM North Star, it is the CPU in their more recent AS/400ish systems.

    I do agree that if the BTB entry for the JMP JMP pushes out an entry that would see at least as much use it is just shifting the load. Or worse if it pushes out an entry that is used more. A good BTB replacement algorithm can reduce the chances of that happening. A very large BTB (like the ones tied directly to the L1 cache) will also reduce the chances.

    On the other hand the BTB entry for the JMP JMP may push out a less often used entry, or no other entry at all. In those cases it is a win.

    The question is, does it win more than it loses? The true answer will depend on the CPU and the benchmark. I expect it to be a win though.

    I sincerely hope you're asking how false sharing applies to this particular situation, not what false sharing is, because if you meant the latter then you should be reading H&P instead of posting here.

    Yes, I'm asking how it applies here. I read H and P in '92. I know there is a new edition, but getting it is pretty far down on my reading list. It did actually make it onto my bookshelf and there have been 4 moves between then and now though.

    False sharing is an issue because a single cache line on a modern processor is likely to span multiple vtbl entries. Naive vtbl-patching code that does manual icache invalidation would therefore be likely to go through all that overhead multiple times.

    For a statically linked C++ program there is no vtbl patching. None. For a dynamically linked one I don't expect the costs to be different from the C version which has a similar table.

    Ick. The only alternative would be to have the vtbl-patching code be *deeply* aware of the local machine's cache line size (i.e. not just hidden in some memory-munging library routines). Also ick. That kind of machine-specificity needs a reason, and there just doesn't seem to be much of one so far.

    Yeah but the assumption would probably be 16 bytes because that is a really really common number, and even if it happens to be only 50% or 25% of the cache line that is better than doing only a single address.

    Of course that would be if there were any vtbl patching code, because there isn't any.

    The same as it has always been, Sparky: whether double jumps as an alternative to indirect jumps are a reasonable or sucky idea.

    Independent of context?

    If you know that the target won't be altered you can get a very different answer than if you assume the target will be mutable. Or even a different answer on mutable but almost never changed vs mutable and changed frequently.

    The answer can also depend a lot on the CPU and other things, like is the main memory system high latency (like maybe remote on a NUMA) or low latency (like on a CPU with an integrated SDRAM or RDRAM controller).

    I expect it is the correct thing with an immutable target on most but not all CPUs.

    If you're having trouble making the connections between the issues we're discussing and that basic point, let me know and I'll dumb it down a little more for you.

    If you don't want to debate, don't debate. There is no need to stoop to insulting your debating partner.

  • If you think about it for a while, you'll realize that there are situations that the linker can't handle, and therefore there must be run-time patch-ups for at least those cases. You really should be more careful about using words like "never".

    So when do they change the vtbl? I could believe the debugger might do it, but I don't know if it does. The debugger also writes other normally non-writable areas, so I don't think that makes a big difference.

    I don't know any other times a statically linked C++ program would change the vtbl. I've been asserting that dynamically linked C++ programs aren't an interesting debate area because (a) I really haven't seen any, (b) the C code ends up doing the same fixups, (c) I don't really know when and how the fixups are done (they could be as each vtbl entry is used, or en masse), (d) many platforms that can dynamically link C can't do the same for C++, and (e) I figured it was irrelevant because in an apples-to-apples comparison the C code is just as bad.

    And I suppose those are the only OSes that matter, eh?

    No, but those are the only OSes in widespread use that I know how the dynamic linker works on. I also know how Multics did it, how Sprite did it, how SunOS 4.x did it, and a few research systems. I also know how the static linked shared objects on SCO and BSD/OS work.

    I didn't want to say "this is how dynamic linking always works", so I went with "it works this way on the OSes I know about".

    And do you suppose that mprotect is free? Or might this be one of those hidden costs whose existence you've been denying?

    No, that is a lot of overhead. However I was explaining exactly what ld.so does for C code, not C++ vtbls. For example on many systems a libc.so call to malloc will do a JMP JMP (or indirect jump), even though malloc is in libc.so; the call goes through the dynamic link table just in case a "more important" .so defines malloc.

    I don't know if these (C code!) JMP JMP sequences are better than an indirect jump. By extension I don't know if the same C++ shared object JMP JMPs are a good idea.

    You got that backwards. I view it as a kludge or not based on the effect it has, instead of assuming it's not a kludge and then trying to deny effects to back up my opinion. IMO if it slows down the system as a whole *or* if it makes code elsewhere significantly more complex to support it, it's a kludge.

    Well, I do believe the static linked C++ JMP JMPs don't slow the system down as a whole, and I don't think they make anything more complex with the possible exception of the linker.

    Yeah, and nobody ever got in any trouble by forgetting the difference between "really really common" and "universal for all time" right?

    Sure they did, I even hesitated to bring it up the first time. I just think there is a lot of code that assumes 16 byte lines, and because the lines of CPUs are currently multiples of 8 bytes (almost always multiples of 16) they tend not to get in too much trouble.

    I can think of cases where it would cause trouble. I can think of more cases where it would at the very least fail to be faster than code that ignores the cache line size. I assume it also pisses off CPU designers because there might be some win in designing 30 byte cache lines (or some other odd size), except all the code that assumes 16 byte lines screws it up (but code "just written" would run better).

    The right thing would be to run with a #define, or a const int, I was however making a cynical comment that 16 would get chosen, not a pronouncement that 16 ought to be the One True Answer there.
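
    A minimal sketch of the "#define or const int" approach, where the line size lives in one named place instead of a literal 16 scattered through the code. The names CACHE_LINE_BYTES, kCacheLineBytes and lines_spanned are made up for illustration and do not come from any real linker or runtime.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache line size. A hypothetical build could override it per target
// (for example -DCACHE_LINE_BYTES=64) instead of hard-coding 16 everywhere.
#ifndef CACHE_LINE_BYTES
#define CACHE_LINE_BYTES 16
#endif

const std::size_t kCacheLineBytes = CACHE_LINE_BYTES;

// Number of cache lines touched by the byte range [offset, offset + length).
std::size_t lines_spanned(std::size_t offset, std::size_t length) {
    if (length == 0) return 0;
    std::size_t first = offset / kCacheLineBytes;
    std::size_t last = (offset + length - 1) / kCacheLineBytes;
    return last - first + 1;
}
```

    Code that asks lines_spanned() how much it touches keeps working if the constant changes, which is the whole point of not baking 16 in.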

    When my debating partner is obstinately straying from the rules of debate, I actually do feel they deserve a little slap on the wrist. The crux of this whole debate is your statement (in cid#635):
    The hidden costs the other poster was talking about [me, in cid#595] don't exist

    What annoys me is not that the statement was made, but that it wasn't retracted the first time it was refuted. Instead, I've had to put up with your topic changes, buzzword storms, squishy definitions, and all manner of other evasions. Frankly, I don't appreciate the extra work. I wouldn't treat you like an errant debate pupil if you'd stop acting like one.

    Let's see, what were those costs I was denying existed?

    Manual cache invalidation isn't cheap. It requires a lot more interlocking within the MMU than a typical instruction, so you pay a penalty every time you create an object.
    Invalidating i-cache may blow away unrelated (but needed) instructions because of false sharing.
    The object-creation code is now messier and more system-dependent.
    Mixing instruction and data spaces precludes a whole class of VM-system optimization.

    Those I still say don't exist, because I still say the vtbl is not changed at runtime.

    Now I do admit that I hadn't thought about dynamically linked code the first time I made that statement. In fact I didn't think about it until about two posts ago. I still don't think dynamically linked code is relevant for reasons I stated at the top of the post.

    If we do include the dynamically linked code then of the costs you listed originally, the same ones I dismissed out of hand, the "Invalidating i-cache may blow away unrelated (but needed) instructions" is the only relevant cost. It is not paid on each object creation. Depending on how exactly ld.so works it may be paid only when the .so is mapped the first time, or it may be paid when the first call on that vtbl is made, or when the first call through a specific entry of the vtbl is made. Even so the vtbl will be shared for all objects of the same class.

    To be pissy, the costs you asserted, and I denied don't exist, even for a dynamically linked object.

    Being less pissy, one of the four costs you asserted exists in a radically reduced form. If you include dynamically linked code (and I hadn't argued at the time that one shouldn't, in large part because I hadn't thought about dynamically linked code at all) then you have enough of a point that I should have said "3 of the four never ever happen, the last doesn't happen in practice, and even if it did it isn't per object create, but about as frequent as per class in a .so, or per virtual function per class in a .so, or maybe per .so".

    But I didn't realize at the time that there was a small part of the original statement that was true. You were (or seemed) fixated on the vtbl being modified all the damn time, and I was fixated on denying it.

    Maybe we would have gotten here sooner if you were civil, but I assure you we would not have gotten here later.

  • We would have gotten here even sooner if you hadn't been so uncivil as to sleaze all around the subject (and several others) instead of simply accepting that maybe the point about hidden costs was a valid one.

    Maybe it wasn't sleaze, maybe I didn't realize it was true.

    And I still don't think it is the least bit common.

  • Umm, my major problem with C++ is execution time. I try and write as little as possible in C++, sticking mainly to C and Perl for anything which needs to execute quickly.

    Some things in C++ are quite slow, but no slower than simulating them in C. Faster in many cases. The C++ STL sort function seems to be about an order of magnitude faster than C's qsort (operating on chars, shorts, and ints).
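
    For reference, a small sketch of the two calls being compared. It measures nothing; it only shows why the two can differ: qsort reaches its comparison through a function pointer, while std::sort is a template and can inline the comparison. The helper names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort calls this through a function pointer for every comparison.
int compare_ints(const void* a, const void* b) {
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

// Sorts a copy of v with the C library routine.
std::vector<int> sort_with_qsort(std::vector<int> v) {
    if (!v.empty())
        std::qsort(&v[0], v.size(), sizeof(int), compare_ints);
    return v;
}

// Sorts a copy of v with std::sort; operator< on int can be inlined.
std::vector<int> sort_with_std_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

    Both produce the same ordering; how big the speed gap is will depend on the compiler, the library and the data.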

    In fact the STL in general is quite fast, normally faster than the C equivalent (when one exists), definitely faster than what one would whip up in an hour.

    C++'s virtual functions are slow. Quite slow. But faster than C calling through a pointer. Sometimes insanely faster because the C++ compiler can actually tell what type the pointer will be at run time. C can almost never tell.
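
    A rough illustration of the difference, with made-up types (Shape, Square, CShape). Whether a given compiler really skips the vtable in the first case is an assumption about its optimizer, not something the language guarantees.

```cpp
#include <cassert>

// C++ style: dispatch through a virtual function.
struct Shape {
    virtual ~Shape() {}
    virtual int area() const = 0;
};

struct Square : Shape {
    explicit Square(int side) : side_(side) {}
    int area() const { return side_ * side_; }
    int side_;
};

// C style: the "object" carries an explicit function pointer.
struct CShape {
    int (*area)(const CShape*);
    int side;
};

int c_square_area(const CShape* s) { return s->side * s->side; }

int area_known_type() {
    Square sq(4);
    // The static type is Square, so a compiler may call Square::area
    // directly (and inline it) instead of going through the vtable.
    return sq.area();
}

int area_via_pointer(const CShape* s) {
    // Here the compiler usually cannot tell what s->area points to.
    return s->area(s);
}
```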

    If the only thing you care about is execution speed, use C++. Use the STL. Use C's I/O. Avoid virtual functions, except when you would have used a function pointer before.

    I'm going to ignore the bit where you think Perl makes faster code than C++ (I do admit it could in some cases, but not normally).

    This is not to say that I think C++ is a wonderful language. I rather dislike it. I love the STL. Everything else in C++ seems to have been done better elsewhere. Still the language has value, if only because of the wide availability.

  • That isn't surprising at all. The C runtime is very straightforward (except for setjmp/longjmp). It is pretty clear how things can/should be implemented. C++ does a lot more for you, and it is unclear how they might do it (either because it's hard to guess how anyone might do it, or it's just hard to guess how this one compiler does it).

    (the STL's "runtime complexity" requirements are a good start, but they are just big O, the constants can still kill you)

    I don't think you will find a higher level language than C with a simpler to guess performance model (unless the model is "everything written in Tcl is slow"). I mean for all Eiffel's wonderful features, or Modula-3's, I don't think looking at two functions and guessing which is faster is among them.

    I can't think of any high level language that has a simpler runtime than C. That is both high compliment, and damnation.

  • I don't understand this. Is this because the processor assumes that any location you jump to is nonvolatile? It seems that an indirect jump could be "predicted" just as well by assuming the contents of the pointed-to memory location are the same as last time and this would require no more circuitry than the jump-to-jump predictor.
  • Hear, hear.

    The fact that I can't take some code and change a pointer to a reference (or back) without a huge amount of search & replace of . and -> is very annoying and often I end up leaving inefficient code as it was because of this.
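
    To make the complaint concrete, here is the same function written both ways, using a made-up Point struct. Nothing changes except the parameter type and every member access.

```cpp
#include <cassert>

struct Point {
    int x;
    int y;
};

// Pointer version: every member access uses '->'.
int sum_via_pointer(const Point* p) {
    return p->x + p->y;
}

// Reference version: same logic, but every '->' had to become '.'.
int sum_via_reference(const Point& p) {
    return p.x + p.y;
}
```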

    There is no reason for this distinction. There isn't even a reason in C ever since the very first version that remembered the type of variables.

    Probably more drastic, but I would like to see '.' usable everywhere '::' is. This means class and variables are in the same namespace, which is incompatible, but it would make the code much nicer to read.

  • Oh dear...

    Shared libraries for functions like this are NOT efficient, despite all the hopes and dreams of morons. How big is the identifier that matches up the program with the shared library? I would not be surprised if it is 2 or 3 dozen times larger than the code (take a look at some mangled template names if you don't believe me).

    And you have just made this poor sap's program into another entry in DLL (or .so) hell. Now they have to "install" it in order for it to work. Wow, what a great advance in Comp Sci. Someday it will be totally impossible to run anything!

  • Why not do that?

    Besides the fact that you returned a reference to a temporary, a quick comparison of the size and readability of his example and your "solution" should make it pretty obvious why!

  • I think the point is that the tables of jumps do not change, and thus the icache is not invalidated. The table being talked about is the vtab for a given class and it is a constant.

    It looks like the problem is that the CPU designers figured the reason that somebody would jump to *(ptr+offset) is because the entry at *(ptr+offset) changes. But in C++ (and all other OO languages, I would think) that entry is a constant, and instead it is the ptr that changes to point at different (but still constant) tables.
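
    A hand-rolled sketch of that layout, assuming the usual one-table-per-class scheme. Real compilers generate this for you and the details vary; the names (VTable, Object, kDogTable, kCatTable) are invented for the example.

```cpp
#include <cassert>

struct Object;

// One table per "class"; its entries are fixed when the program is built.
struct VTable {
    int (*speak)(const Object*);
};

struct Object {
    const VTable* vptr;  // this pointer varies per object, the table does not
};

int dog_speak(const Object*) { return 1; }
int cat_speak(const Object*) { return 2; }

const VTable kDogTable = { dog_speak };
const VTable kCatTable = { cat_speak };

// Roughly what a virtual call does: load vptr, load the slot, jump.
int call_speak(const Object* o) {
    return o->vptr->speak(o);
}
```

    Two objects of the same class share one table, so *(ptr+offset) never changes for a given ptr; only which ptr you hold changes.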

    This jmp-to-jmp stuff is a way to fool the CPU cache and predictor circuitry into assuming the location is constant, because it figures that jmp instructions are constant. It does seem kind of annoying that it is worth doubling the table size (and thus halving how much fits in the cache) in order to get around a mistaken programming assumption by the CPU designers.

    I think it would be better for a CPU jump predictor to assume *everything*, whether in instruction, read only, or read/write data, is constant. Modern C++ code typically accesses a given location many orders of magnitude more times than it modifies it!

  • Yes, reference and pointer are very much alike. That is why I want '->' and '.' to be the same, since having to change these back and forth is the only difference in most cases.
  • It's unpopular for good reasons. It would impose too great a burden on unusual platforms such as PDPs (IIRC) where the byte size is 36 bits. Having to mess around with the value before and after every integer operation would be a nightmare. Also, there is the endian problem, as others have mentioned. What I would like to see is a set of std:: functions for outputting binary data in specific formats such as little-endian 32 bit, big-endian 64 bit, etc.
    eg:

    std::binary_out<int,std::little_endian,32>( 12345 );
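
    Nothing like std::binary_out exists in the standard library, but a rough sketch of the idea is short. The helper below is hypothetical and assumes the output is a stream of 8-bit octets collected in a std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the standard library: append the low
// `Bits` bits of `value` to `out`, least significant byte first.
template <unsigned Bits>
void append_little_endian(std::string& out, unsigned long long value) {
    static_assert(Bits % 8 == 0 && Bits > 0 && Bits <= 64,
                  "Bits must be a multiple of 8 between 8 and 64");
    for (unsigned shift = 0; shift < Bits; shift += 8) {
        out.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}
```

    Because the bytes are produced by shifting, the result is the same whatever the endianness of the machine running the code.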

  • by rho ( 6063 ) on Monday April 23, 2001 @04:19PM (#269693) Journal

    You moron -- that's no way to develop applications. What are you, a first year Fortran student?

    The proper way to develop software is to have long, pointless meetings where techies can show off their intelligence, marketeers can display their ignorance and managers can "lead".

    Then, everybody pisses around reading Slashdot while a few firebrands argue over what kind of versioning system to use: "CVS!" "RCVS!"

    When half of the estimated time has passed, the programmers start screaming about the Mythical Man-Month and how marketing is full of shit, while reconfiguring the Cisco switch to cut latency for Quake deathmatches.

    Marketing then begins to sell the product to customers, promising to have asked-for features implemented "at beta".

    Two weeks before due date, the programmers work 22 hour days cobbling an application from stolen code from previous jobs, algorithms designed in a drunken stupor, and (apparently) one programmer bashing his face into the keyboard.

    This fresh, steaming turd gets pushed out the door to tumultuous disdain. Programmers blame marketing, marketing blames management, and the customers get told that it's Microsoft's fault.

    So keep your wacky ideas to yourself, okay rookie?


    "Beware by whom you are called sane."

  • except when you would have used a function pointer before.
    Actually, this highlights (from the perspective of an embedded programmer) one of the biggest problems I have with C++. Personally, I have no problems getting good performance using C++ code... but this requires (IMO) far too much knowledge of the C++ compiler implementation. C requires some implementation knowledge to use effectively (e.g. strings are null-terminated arrays of char, etc.)... but in C++ there is just too much implementation-specific crap irrelevant to the programmer-as-designer's life.
  • It depends on how you define "best."

    If you mean fastest-executing, then you're correct.

    However, I mean "best" in terms of programmer productivity, code maintainability, and robustness. Low-level languages are not the best choice for achieving these goals.


    --
  • by FFFish ( 7567 ) on Monday April 23, 2001 @03:34PM (#269697) Homepage
    Heck, by creating yet another C derivative language, they are just adding to the problem of inefficient, difficult-to-debug, difficult-to-maintain languages.

    IMO, the programmer community would, in many cases, be far, *far* better off writing their applications using a very high level language.

    This will allow them to spend *less* time creating the main code body, and *more* time debugging. Their applications will be less faulty.

    Then, using profiling, they can identify exactly those areas that need to be written using a low-level language for speed.

    Imagine: very high productivity, very high maintainability, very large reduction in bugs, and 96% or more of the performance!

    It's the intelligent way to work.

    --
  • Here's a question for you:

    How is the parser supposed to understand if myControl is a ** or if cntrlRect is a **?

    struct Control **p;

    struct Control {
    void **cntrlRect;
    };


    This technique is what you'd use for loading a module dynamically and resolving its functions dynamically. The syntax is C because C++ is C with features built in to hide the underlying "ugly" C implementation.

    The parser can't dereference the ->> at runtime precisely because of the above structure layout; it's ambiguous, you need to do it longhand like C programmers have been doing for eons.
  • 3) Eliminate pointer arithmetic.
    And ensure that the language is never used for any system-level project ever again, including the C++ standard library.
    Garbage collection ain't too popular with systems types either - if you don't want something any more, free it so someone else can use its resources. Either that or throw away the context (apache fork-n-die model) so you don't waste your time scrap-hunting.
    Maybe I'm just an old fart about disciplined programming. But then again, so is Linus, so I'm in good company. :-)
  • there's not much point in a language without its standard library
    Tell that to all the people out there who implement the C library mostly in C.

    Or kernels in C.

    Heck, what do stdin/stdout mean to an MFC application?

    You need some sort of support library. For a language to be useful, I would expect to be able to chuck the standard one out the window and replace it with one that suits the context I want to use the language in.

  • Please put in the sockets and signals that Qt implement? Those are damn nice.

    And something else... figure out some way to make the error messages from templates more readable. Templates are extremely nice, but when you screw one up, finding your mistake involves parsing a line with 8000 characters. This one's probably more for compiler implementers, but if there's a clean way to help in the language, do it.
  • "Could you please list the advantages C++ has over Java?"

    How about the memory eating JVM for starters?

    The point is, the Java people will consider many aspects of the language as 'benefits!'. While the C++ people will see these as misplaced, ill implemented 'drawbacks!'.

    It's all about viewpoint...
  • IMO, the programmer community would, in many cases, be far, *far* better off writing their applications using a very high level language.

    Like this one??? [borland.com]

    Funny, however, how many machos wouldn't be caught dead programming with it???? Like if working with C was a sign of intelligence (must be those PHBs who insist on it).


    --

  • Yes, but then do you have to redefine functions and operators for them? 'cause that would be a pain in the ass if I couldn't add or subtract my FOO's.

    Overload, dear, overload.


    --

  • Nah. The standard rules of C++ could be amended to deal with these new types. After all the + and - operators already work on:
    char, short, int, long - all signed or unsigned and T* where T is a type.
    Other strongly typed languages such as Pascal already behave in this way, so it is possible to make it work.

    Isn't it funny that all the while C* programmers have been tripping over themselves and shooting themselves in the foot with C*'s easily obfuscated syntax, Pascal programmers have been able to enjoy extensions to the language that effectively cancelled all objections given against Pascal [lysator.liu.se]?

    So, you have a group of people still untangling their spaghetti-pointers and memory leaks, whilst the other have been concentrating on their *REAL* job, programming?


    --

  • Wasn't C a descendant of BCPL?

    I used to have a BCPL programming manual, it looked like a primitive ancestor of C.

  • And they need to ditch "implementation defined" behavior. Pick a behavior, and MANDATE it. Evaluate function arguments left to right, one at a time, and THEN call the function. Make a[i++]=i++ have a defined meaning. Make ints of less than 32 bits illegal. Heck, force standard bit sizes of the legacy types, and use int32, int64 for future code.

    Go use Java.

    The philosophy of C, and to some extent C++, is that the language does not hide the underlying machine architecture from the programmer. That is why there are all of those "implementation defined" bits in the language. They reflect reality, where not every CPU is a descendant of the VAX or 80386. The language has enough implementation dependent slop to allow its efficient implementation on a wide variety of architectures. The language also gives compiler writers some freedom in how to evaluate and optimize expressions, layout data structures and pass parameters to functions.
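
    A small example of the kind of freedom being argued over. For function arguments the standard's word is "unspecified" rather than "implementation defined": the compiler may evaluate them in either order and need not document which. The helper names are made up.

```cpp
#include <cassert>
#include <vector>

std::vector<int> g_calls;  // records the order in which the helpers ran

int first()  { g_calls.push_back(1); return 10; }
int second() { g_calls.push_back(2); return 20; }

int add(int a, int b) { return a + b; }

int sum_with_side_effects() {
    g_calls.clear();
    // The result is always 30, but whether first() or second() runs first
    // is unspecified; g_calls may end up as {1, 2} or {2, 1}.
    return add(first(), second());
}
```

    Mandating left to right would pin g_calls to {1, 2} on every compiler, at the cost of taking that choice away from the optimizer.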

    Computers with 32-bit pointers/integers and 8-bit characters will not be around forever. They will eventually be replaced by newer architectures, perhaps with 64-bit integers, 128-bit pointers and 32-bit characters.

  • 1) Simplify syntax.
    2) Eliminate type casting. Require user specified conversion functions (standard types included with standard library, of course).
    3) Eliminate pointer arithmetic.
    4) Range types.
    5) Standard garbage collection inclusion.
    6) Persistent storage. Preferably by a directly included B+Tree database.
    7) B+Tree database as a part of the standard library.
    8) Foreign Function Interfaces (not just to C, but to, O, Fortran, Python, Ruby, OCaML, Ada, Eiffel ... all the standard suspects).


    Caution: Now approaching the (technological) singularity.
  • by geophile ( 16995 ) <jao@nOSpaM.geophile.com> on Monday April 23, 2001 @04:04PM (#269723) Homepage
    Get in line, sonny. They're still coming out with new versions of Ada, COBOL and FORTRAN.
  • if you're writing C++ that's slower than Perl you're doing something wrong. I challenge you to support any statement to the contrary.

    Sure, Perl has built-in facilities that can do some common stuff very quickly, but if you used the same algorithms in C++ it would almost certainly execute faster.

  • The reason it's unpopular is because it would require a performance hit for no significant gain.

    The header climits tells you what the range available is in a platform independent manner. If you think you will not be able to guarantee that your program is compiled on a 32 bit platform, then check INT_MAX and make sure it's big enough for what you want to do.
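
    A minimal sketch of that check. The 2147483647 threshold is only an assumed requirement for the example, and int_is_big_enough is a made-up helper.

```cpp
#include <cassert>
#include <climits>

// Refuse to build on platforms where int cannot hold the values we need.
// The 2147483647 threshold is an assumed requirement for this example.
#if INT_MAX < 2147483647
#error "this program needs an int of at least 32 bits"
#endif

// The same check at run time, for code that would rather report the problem.
bool int_is_big_enough(long long needed_max) {
    return static_cast<long long>(INT_MAX) >= needed_max;
}
```

    The preprocessor test costs nothing at run time, which is the point being made about avoiding a performance hit.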

    I can't comment on the C99 standard and what they think the gains would be from such a scheme. Though I wonder if those are really fixed bit width types? Can you talk more about that?

    --
    Poliglut [poliglut.com]

  • Say screw it and write in Object Pascal [freepascal.org]?
  • Ok, who are you and how long have you been working for my company?

    That was about the funniest damn thing I've ever read...hey, you're not like Scott Adams by any chance? :-)
  • It would be especially nice if these types were *not* considered, for the sake of signatures, type-identical to counterpart size-variant types, and if enums were also given a generic root type instead of being int in signature.

    I always wished that typedefs created new types, instead of behaving like wimpy macros.


    typedef int FOO;
    typedef int BAR;
    FOO f = 1;
    BAR b = 2;
    int i = 3;
    f = b; // ILLEGAL TYPE MISMATCH!
    b = i; // ILLEGAL TYPE MISMATCH!
    i = f; // ILLEGAL TYPE MISMATCH!
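
    Something close to this can be approximated today with a thin wrapper template, at the price of writing (or overloading) the operators you want. This is only a sketch; StrongInt, FooTag and BarTag are invented names.

```cpp
#include <cassert>

// Hypothetical wrapper: each Tag produces a distinct, non-convertible type.
template <typename Tag>
struct StrongInt {
    explicit StrongInt(int v) : value(v) {}
    int value;
};

// Addition is only defined between two values carrying the same Tag.
template <typename Tag>
StrongInt<Tag> operator+(StrongInt<Tag> a, StrongInt<Tag> b) {
    return StrongInt<Tag>(a.value + b.value);
}

struct FooTag {};
struct BarTag {};
typedef StrongInt<FooTag> FOO;
typedef StrongInt<BarTag> BAR;

FOO add_foos(FOO a, FOO b) {
    // BAR wrong = a;   // would not compile: FOO is not convertible to BAR
    // int raw = a;     // would not compile: no implicit conversion to int
    return a + b;       // fine: both operands are FOO
}
```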


  • Of course, you could use something like:


    typedef struct { int _opaque; } time_t;


    and then pass time_t structs by value, assuming that copying a struct with just one int is no more expensive than just copying an int...?

    or like Windows, declare pointers to nonexistent structs, like: typedef struct __HWND* HWND.
  • > I always wished that typedefs created new types, instead of behaving like wimpy macros.

    Maybe you should consider Ada, which does this and most of the other things people are wishing for everywhere in the responses to this article.

    Don't let ESR's lame and factually incorrect entry in his jargon file put you off.

    --
  • a) It's tied to a VM.

    Not really. There are plenty of companies who have made systems that compile Java code into an executable... however that's been a niche market because most people are quite happy running Java code in VM's. Furthermore, the great thing is that Java is not tied to a specific VM so much as a family of VM's. Don't like your current VM? Swap in one that works better for you (unless you are running iPlanet, GRRRRR)

    b) Its frameworks for GUI development try to be a least common denominator at the expense of running well on any given platform.

    Totally off base. Rather, Swing is a "greatest common denominator" tangential system. If you use it carefully it works quite well just about anywhere - I wrote a fairly large MDI app all in Swing that was targeted to P166's with 32MB of memory!! It wasn't incredibly fast there, but then again on a P450 it was as fast as any of my other apps.

    Swing was meant to be a GUI framework done right, and for the most part I think they've succeeded. It won't really be practical to write consumer apps that use Swing though until OS'es ship with a system to share Java execution and library space.

    c) It's not a standard until Sun submits it to a real standards setting body. Until then it's just as proprietary (in my mind) as Visual Basic, etc.

    On the other hand, only MS make a VB compiler and runtime. Look at the plethora of Java VM's and compilers... If it looks like a rose and smells like a rose and the DNA profiles come back saying "rose", then even if the tag says it's a dandelion it's probably a rose.

  • Let's say they add some features to the Java to provide support for genericism.

    What are you going to use it for that you can't do now in Java? I'm curious because I really haven't seen any examples given that I couldn't code in Java, and I feel like I use a number of fairly generically oriented techniques when programming in Java. Perhaps I would be happier still if some extension made it into the language, but I don't feel very hampered at the moment.

  • The STL is more powerful than the Java Collections package in the same sense as an F1 racer is more powerful than an Audi TT. For just about any real use you have, the TT is more practical and fun to use.

    I think the Collections package strikes the perfect balance between a set of libraries that has a lot of power and a set of libraries that just about anyone can find useful at some level. The Collections package takes care of about 99% of your collection needs on a day-to-day (or even yearly) basis, with a great and extremely usable interface packed with features. Synchronize or make collections read-only on the fly! Sort by internal or custom external comparators! Easy to use Sets and TreeSets with an AVL tree you don't have to code by hand!

    Plus, all that and if you want you can download the whole shebang in a tiny (I think about 48k, but it might be larger) jar file if you need to use it in a 1.x VM.

    On top of that, I'm really not sure there's anything you can do in STL that can't be done with a little (or possibly no) effort in Java. Post examples, I'd be happy to see 'em.

  • I can see general typesafe containers as an argument, in that any time you use a container you know the contents will be what you expect them to be.

    But then again you could easily approximate that in Java by providing List, Map, and Collection wrappers that would make sure anything added to a particular collection would be of an appropriate type. You could wrap the overlays around any collection. Just as with templates you need to specify a type that will be held, you could do the same with collections if you really cared.

    Then again, it works out that pretty much anywhere you have a collection you're going to know what's going in and what's going out. I don't think I've run into a problem yet where it mattered that the container was not type safe.

    A rougher approximation for the second item (storing simple types) is of course using the wrapper objects like Integer, and I'll be the first to admit that's a bit of a pain. However, I don't see where that's an argument against Java supporting genericism so much as an argument about how types are handled in the language (and as a great fan of Scheme I'd love to see simple types be objects as well). I don't think it's a really strong argument as in most real cases you'd be dealing with objects, and the wrapper classes are good enough.

    If I had to work with lots of collections of ints or longs or whatever, I'd probably just slightly extend a few interfaces like List and Iterator to make use of the standard collections yet have simple access to simple types. Other operations like sorting and so forth would work on the contents transparently just like any other container, and I could still pass the containers around without knowing what they held.
  • I don't want to know what the hardware is, I shouldn't have to.

    If I'm writing an OS, a HAL, a driver or a compiler then the hardware is an issue. If I'm writing something really speed critical then I live with hardware as an issue. For anything else, it shouldn't matter one bit. That's the whole point of high level languages. I mean, honestly, when I'm writing a WP what does it matter what the endianness of the processor I happen to be compiling to today is? Or how many registers it's got? Someone Else's Problem. Make it mine and I cause others for no good reason.
  • by HenryFlower ( 27286 ) on Monday April 23, 2001 @03:51PM (#269746)
    that an implementation of STL (the MS implementation, one presumes) at the time the book was written was immature. The above post makes it seem as though Kernighan and Pike were frothing in their denunciation of C++.
  • "I switched from GNU C++ to Visual C++ in 1996 (change of jobs) and found that VC++ 5.0 was lot closer to the standard"

    That's odd; as I remember it, the C++ standard wasn't finalised until around 1998. Can you really blame major compiler vendors for not having up-to-the-minute support on a moving target?

    I do remember VC 5 had a number of problems with STL in particular, and we still have problems with the compiler in VC 6, even with the latest SP (random errors regarding DEBUG_NEW in MFC apps that disappear when you try to compile again, as well as internal compiler errors) .. but overall our experiences have been pretty good with it.

    -----

  • Yes I do actually.

    When configured properly, they cut compile times quite dramatically (funnily enough, when misconfigured (e.g. if the "use precompiled headers through" edit box is empty instead of containing a filename), they slow down compile times dramatically). I experimented with this stuff very recently actually, managed to get a "rebuild all" down from 6 minutes to 3 minutes 45 seconds, for a project of just over 100000 lines of code, in 7 or 8 projects.

    -----

  • I'm aware of that garbage collector (I use it for a number of projects), but it has a number of limitations because it doesn't have any support from the compiler. That collector, for example, knows nothing about your data structures, so it has to assume everything could be a pointer, forcing it to scan the entire heap. This is not practical for real-time applications where GC needs to be done incrementally. Yes, the Xerox collector can run experimentally in incremental mode using page protection, but this makes it hard to debug on many platforms, and though I have not tested the speed, I suspect it has a fairly dramatic performance hit if your application accesses a lot of data. Also, hardware page protection is not available on all platforms (DOS, game consoles, embedded systems, etc).

    Right now if you want fast incremental GC you have to make your own smart pointers and do reference counting for all global roots. It would be nice to see compilers with reference and resource management built in so you don't have to make everything look like template hell.

    Also, it's a misconception that GC adds more overhead in terms of speed. Applications written to properly use GC spend less time copying/freeing objects than their non-GC counterparts, so on average a GCed program with a good collector will have better performance. Not to mention I recall a study showing that C++ programmers spend some 50% of their time dealing with allocation/deallocation, and a large percentage of bugs are related to memory leaks and premature frees. On average GC makes your program go faster and it almost always speeds up your development (which is more important these days).

    I've written a few garbage collectors myself using smart pointers to track global roots, and then I use thread suspension during collection to deal with concurrency issues. It would be nice to see something like this standardized and supported by the compiler and support libraries (i.e. enumerable counting pointers and thread suspension). It could be toggled on/off by class so that there is 0 overhead if you choose not to use it.
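    For anyone who hasn't rolled one of these, here is a minimal sketch of the counting-pointer half of that idea. The names `RefCounted`, `CountedPtr` and `Node` are made up for illustration, and this is only the reference-counting handle: there is no root enumeration, no cycle collection and no thread suspension, so it is not the collector described above.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical base class: every managed object carries its own count.
class RefCounted {
public:
    RefCounted() : refs_(0) {}
    virtual ~RefCounted() {}
    void retain() { ++refs_; }
    // Returns true when the last reference has just been dropped.
    bool release() { return --refs_ == 0; }
    std::size_t count() const { return refs_; }
private:
    std::size_t refs_;
    RefCounted(const RefCounted&);             // copying a count makes no sense
    RefCounted& operator=(const RefCounted&);
};

// Hypothetical handle: retains on copy, releases on destruction.
template <typename T>
class CountedPtr {
public:
    explicit CountedPtr(T* p = 0) : p_(p) { if (p_) p_->retain(); }
    CountedPtr(const CountedPtr& other) : p_(other.p_) { if (p_) p_->retain(); }
    ~CountedPtr() { if (p_ && p_->release()) delete p_; }
    // Copy-and-swap: the by-value parameter releases the old pointee.
    CountedPtr& operator=(CountedPtr other) { std::swap(p_, other.p_); return *this; }
    T* operator->() const { return p_; }
    T& operator*() const { return *p_; }
    T* get() const { return p_; }
private:
    T* p_;
};

// Example managed type.
struct Node : RefCounted {
    int value;
    explicit Node(int v) : value(v) {}
};
```

    Every class has to inherit from the base and every pointer has to be spelled as the template, which is the "template hell" being complained about.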
  • Umm, my major problem with C++ is execution time. I try and write as little as possible in C++, sticking mainly to C and Perl for anything which needs to execute quickly.

    C++'s downfall is that it is a bloated language with too many frivolous constructs. For anyone who doesn't believe me, take a look at section 3.8 of The Practice of Programming by Kernighan and Pike. Repeating them is a waste of space and my time (and you might as well pick up the book and read the whole thing while you're at it =]).

    Therefore, I think that the best thing (well, at least an important thing) to do with the next incarnation of C++ is to move a lot of the bloated architecture into external libraries, so that what is necessary can easily be loaded, and so that execution time would speed up incredibly. Of course, I'm not saying this is easy to do or even really possible, I'm just saying that it should probably be looked at pretty carefully.

    ---

  • Many CPUs (pretty much everything other than the x86 -- and other really old things like the 390) require a (i-)cache invalidation between modifying code and executing that code. The cache invalidation will also invalidate the BTB. So the CPU can feel free to use the BTB to optimize a JMP-JMP sequence, but not to optimize an indirect jump.

    That seems like a classic example of not counting all the costs. The double jump might in and of itself be less expensive than an indirect jump, but there are hidden costs involved:

    • Manual cache invalidation isn't cheap. It requires a lot more interlocking within the MMU than a typical instruction, so you pay a penalty every time you create an object.
    • Invalidating i-cache may blow away unrelated (but needed) instructions because of false sharing.
    • The object-creation code is now messier and more system-dependent.
    • Mixing instruction and data spaces precludes a whole class of VM-system optimizations.

    When you do count up all the costs, using double jumps is a tremendously stupid idea. The double jump itself might be faster than an indirect jump, but that's outweighed by the overall negative effect on the system as a whole.
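    To pin down what "indirect jump" means here, this is a hand-rolled model of table-based dispatch. It is a conceptual sketch only: `Shape`, `VTable` and the functions are invented names, and real compilers generate their own tables with their own layout. The point is that the table holds code addresses as plain data, and the call loads an address from it.

```cpp
struct Shape;

// One slot per virtual function; each slot holds a code address.
struct VTable {
    int (*area)(const Shape*);
};

struct Shape {
    const VTable* vtbl;  // pointer to the per-class table, stored in each object
    int w, h;
};

int rect_area(const Shape* s) { return s->w * s->h; }
int tri_area(const Shape* s)  { return s->w * s->h / 2; }

// The tables are ordinary constant data, separate from the instruction stream.
const VTable rect_vtbl = { rect_area };
const VTable tri_vtbl  = { tri_area };

// Indirect jump: load the table pointer, load the slot, call through it.
int call_area(const Shape* s) { return s->vtbl->area(s); }
```

    The double-jump scheme would instead have the caller jump to a fixed stub that itself contains a jump instruction, which is what puts mutable targets into i-space.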

  • It's very hard to write good code in low-level languages.

    Hard? Yes. Impossible? No. Many people have to write in C because that's the only language supported in their environment (e.g. a kernel or RTOS). These are often systems where the requirements for reliability, maintainability, etc. are very high, and the quality tends - of necessity - to be correspondingly high. My C is more object-oriented than 90% of the code I've seen in languages designed for OOP, for example, and I'm sure I'm not the only person for whom that's true.

    Note that the previous author didn't say that the best code is *always* or even *usually* written in very low-level languages. He just said that it *often* is, and that's true.

  • If you are coding an I/O intensive application, chances are that the scripting language will run about as fast as your tuned C or assembly. Your hard drive or your network card will usually hobble the most carefully tuned C or assembly

    What if you're coding the firmware for that hard drive, or the driver for that network card? Where's your scripting language then? That's why a lot of people use C, and that's the code to which I think the previous author was referring.

  • Exactly. The hidden costs the other poster was talking about don't exist.

    You're so full of crap. Yes, those hidden costs do exist and must be paid *even though* the contents of the table are in fact effectively constant. That's the problem. You still have to invalidate the i-cache, you still have to forego VM-level optimizations, etc. because the values *could* change.

    Well, it doesn't so much fool the CPU into thinking the location is constant as to actually inform it that the location is constant.

    It does no such thing. Any CPU that makes such an assumption about the immutability of i-space could be considered broken. That breakage can be worked around by having the VM system play nasty tricks with making i-space pages read-only etc., but the cost of having to cover for the CPU's deficiencies like that is much greater than the benefit. Try looking at the problem from a *system* standpoint for a change, instead of a myopic "how can a CPU designer avoid work" standpoint.

  • You can bitch that the vtbl being immutable is a bad requirement, and prevents C++ from being as flexible as Ruby. But that is a different topic.

    No, it's the same topic because it impacts the same solution.

    You can argue that there are hidden costs in how the CPU thinks about i-space, but that is a very different argument.

    No, it's the same argument because it impacts the same solution. Please stop trying to redefine the topic to suit yourself.

    With the sole exception of the x86 all modern CPUs are broken? They all assume that i-space is very seldom altered

    "Seldom" is not equal to "never", and we were talking about the assumption that i-space would *never* change because that's the only assumption that would make the proposed solution seem reasonable.

    From a system standpoint making self modifying code faster makes everything else slower.

    We're not talking about self-modifying code, as much as you seem to be hoping that the taint associated with that phrase will stick to anyone who disagrees with you. We're talking about mutable data in i-space, and about the nasty hack of using double jumps with the intermediate target in i-space to "trick" CPUs and make method dispatch a cycle or two faster without considering the effect of such a hack on the rest of the system.

    But you're almost right. Making this particular hack work faster makes the rest of the system slower. That's exactly the point. Congratulations on finally getting it.

  • I'm sure we could have a very interesting discussion about the relative merits of double jumps vs. indirect jumps if you'd cooperate, because you seem to know more than most /.ers about how CPUs work. However, as long as you're going to deny that these systemwide costs exist at all - things like false sharing, extra interprocessor communication in an SMP system to do TLB shootdowns, pollution of the BTB when the regular L1 cache is damn near as good - then that's not going to happen. How disappointing.

    You are the weakest link. Goodbye.

  • C++ using the double jump doesn't make these problems any worse

    Ahhh, but it does. On an architecture designed around the "i-space modification is rare" assumption, writing to a vtbl *even* at object-creation or class-loading time incurs a substantial overhead in exception handling, VM activity, the aforementioned cross-processor interrupts, etc. This is different from the modifications that must occur at image-load time (including DSO-load time) because those have distinct boundaries and the OS can treat pages differently during that period than afterward. Maybe if parts of the C++ runtime were integrated into the OS loader this could be handled more efficiently, but that's a heinous idea for other reasons.

    Similarly, the whole point of the double-jump seems to be to abuse the BTB for performance. I call it abuse because every method pointer that's stuffed into the BTB is one less BTB entry that can be used for *real* branches. Also, the BTB is just a small, very fast special-purpose cache; there's another cache - the L1 - right nearby that could also contain that same information. So you save yourself a cycle on the method dispatch (if repeated) by using the BTB instead of the L1, in return for which you create a nice fat pipeline bubble for someone else when they hit a branch that would have fit in the BTB if not for your shenanigans. That's not a win, it's just shifting the load.

    what do you define "false sharing" as

    I sincerely hope you're asking how false sharing applies to this particular situation, not what false sharing is, because if you meant the latter then you should be reading H&P instead of posting here. False sharing is an issue because a single cache line on a modern processor is likely to span multiple vtbl entries. Naive vtbl-patching code that does manual icache invalidation would therefore be likely to go through all that overhead multiple times. Ick. The only alternative would be to have the vtbl-patching code be *deeply* aware of the local machine's cache line size (i.e. not just hidden in some memory-munging library routines). Also ick. That kind of machine-specificity needs a reason, and there just doesn't seem to be much of one so far.

    Care to let me know what the topic is?

    The same as it has always been, Sparky: whether double jumps as an alternative to indirect jumps are a reasonable or sucky idea. If you're having trouble making the connections between the issues we're discussing and that basic point, let me know and I'll dumb it down a little more for you.

  • The vtbl is not ever written to. On a unix system it is part of the ELF or a.out code area. The linker figures out how it looks, and it is not changed at runtime.

    If you think about it for a while, you'll realize that there are situations that the linker can't handle, and therefore there must be run-time patch-ups for at least those cases. You really should be more careful about using words like "never".

    using a ld.so that it got from FreeBSD, which uses a ld.so borrowed from, or inspired by the Linux version

    And I suppose those are the only OSes that matter, eh?

    ld.so does similar things for the C code, and it also has minimal OS support (the mprotect(2) call, make the stub tables non-executable+writable, change them, make them executable+read-only)

    And do you suppose that mprotect is free? Or might this be one of those hidden costs whose existence you've been denying?
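    For reference, the unprotect/patch/reprotect dance looks roughly like this on a POSIX system. This is a sketch under assumptions: `make_sealed_table` is an invented helper, it uses a plain data page sealed read-only rather than an executable stub table, and it is not taken from any real ld.so. Each protection change is a system call, which is the cost being argued about.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Copies `len` bytes into a freshly mapped page, then seals the page read-only.
// Returns the page address, or 0 on failure. POSIX-only sketch.
void* make_sealed_table(const void* src, std::size_t len) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > static_cast<std::size_t>(page)) return 0;
    const std::size_t size = static_cast<std::size_t>(page);

    // Step 1: obtain a writable page.
    void* mem = mmap(0, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 0;

    // Step 2: the "change them" step.
    std::memcpy(mem, src, len);

    // Step 3: drop write permission; a later write to the page would fault.
    if (mprotect(mem, size, PROT_READ) != 0) {
        munmap(mem, size);
        return 0;
    }
    return mem;
}
```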

    That's an opinion based on whether you view it as a kludge or not.

    You got that backwards. I view it as a kludge or not based on the effect it has, instead of assuming it's not a kludge and then trying to deny effects to back up my opinion. IMO if it slows down the system as a whole *or* if it makes code elsewhere significantly more complex to support it, it's a kludge.

    The right thing to try would be to convert from JMP JMP to an indirect JMP. If the indirect JMP is faster then you are right for that workload. If the JMP JMP is faster then I'm right for that workload

    Not quite. The whole point here is that it's not sufficient to compile a C++ program and run it and compare the timings. It's also important to factor in the overall performance, maintainability, and other costs of making that program run faster and supporting the hacks that it uses. Remember what I said about hidden costs, or shifting load?

    Yeah but the assumption would probably be 16 bytes because that is a really really common number

    Yeah, and nobody ever got in any trouble by forgetting the difference between "really really common" and "universal for all time" right?

    There is no need to stoop to insulting your debating partner.

    When my debating partner is obstinately straying from the rules of debate, I actually do feel they deserve a little slap on the wrist. The crux of this whole debate is your statement (in cid#635):

    The hidden costs the other poster was talking about [me, in cid#595] don't exist.

    What annoys me is not that the statement was made, but that it wasn't retracted the first time it was refuted. Instead, I've had to put up with your topic changes, buzzword storms, squishy definitions, and all manner of other evasions. Frankly, I don't appreciate the extra work. I wouldn't treat you like an errant debate pupil if you'd stop acting like one.

  • Maybe we would have gotten here sooner if you were civil

    We would have gotten here even sooner if you hadn't been so uncivil as to sleaze all around the subject (and several others) instead of simply accepting that maybe the point about hidden costs was a valid one.

  • "I think it is interesting that after less than a decade, C++ is being labeled as fossilizing. While its parent C is still very vital."

    1) C++ was first released twenty years ago.
    2) C++ is not fossilizing (where is your data that people aren't using C++ anymore?) The STL, a relatively new construct (less than ten years old), is being used more and more. How is that fossilization?
    3) If C is so vital, then why are people flocking to Java and scripting languages?

    C has its uses. C++ has its uses. Language X has its uses.

    On the other hand, I agree with you that if Java adopts generic idioms and assertions, my reasons for using C++ will decrease dramatically. However, since we are talking about hypothetical futures, if C++ gains an easy to use garbage collector and unified thread and socket library, my reasons for using Java will decrease dramatically.

    Languages are immaterial. Concepts are important. Use whatever language best describes the concept you are trying to present.
  • yes, if I close my eyes the world will go away. Having a fuckwit writing news means there is one less competent person writing news which means I, and others who can recognise a dickhead, miss out.
  • g++ comes close, but still enough annoyances to be, well, annoying

    Got a few spare CPU cycles? Help us test the 3.0 prereleases. You can download freshly-built and working RPMs from http://www.codesourcery.com/gcc-snapshots/ [codesourcery.com] and run your favorite C++ through it.

    The C++ library has been completely rewritten. It wasn't stable enough for 2.95 (or RH's 2.96), and since then it has depended upon recent changes in the compiler itself, so it didn't get included in 2.95.3. Someone else on this page was complaining that "even g++ doesn't have fully-templated iostreams like the standard says," but the new library always has. (It just hasn't been turned on by default in any existing gcc release, is all.)

    There are other changes as well. Some really cool ones are already in the tree but won't be included in 3.0; we decided to wait until 3.1 for major user-visible features. The big (IMHO) change for 3.0 is the vendor-neutral ABI for C++ [codesourcery.com] that will let you link code compiled from different compilers.

    When will it be released? Sooner, if you help.

  • IMO, the programmer community would, in many cases, be far, *far* better off writing their applications using a very high level language.

    This will allow them to spend *less* time creating the main code body, and *more* time debugging. Their applications will be less faulty.

    Actually, the best code is often written in very low-level languages, like C and assembly. The key is planning your code extensively, programming in a disciplined manner, and using assert() liberally, so your code is essentially "bug-free" (i.e. no "bugs", just design faults).
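    A small sketch of what "using assert() liberally" can look like in practice. The function `index_of_max` is a made-up example; the asserts state the design contract at entry and exit rather than handling runtime errors.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the first largest element.
// Precondition (a design contract, not a runtime error): data != 0 and n > 0.
std::size_t index_of_max(const int* data, std::size_t n) {
    assert(data != 0);
    assert(n > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < n; ++i) {
        if (data[i] > data[best]) best = i;
    }
    assert(best < n);  // postcondition: the result is a valid index
    return best;
}
```

    Building with NDEBUG defined compiles the checks out, so the cost is paid only in debug builds.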
    ------

  • Irregardless, he has a point. :-)
    ------
  • I define "best" as having all good qualities, including what you say.
    ------
  • I don't want to know what the hardware is; I shouldn't have to.

    That is the entire problem with today's programmers. They want to code blind, without knowing what they are coding for.

    Also, a lot of the things you mentioned you don't want to worry about can be transparent in C.
    ------

  • Because when you get into machine code, you only have a few registers to leave data in for the next function to see. It also makes handling your stack a lot easier (I think; can someone who does compiler design verify this?).
    ------
  • If you are coding an I/O intensive application, chances are that the scripting language will run about as fast as your tuned C or assembly. Your hard drive or your network card will usually hobble the most carefully tuned C or assembly AND it will take much longer to write than the equivalent code in a scripting language.

    That would make sense, except that today we have multitasking operating systems, which means you're wasting CPU time that could be used for other processes. Also, more and more, we are seeing code that used to be the frontend to something being moved into a backend. That means your new script may one day be the bane of somebody's database system, and they'll have to waste time rewriting your code, which is a shame (and a disgrace to the concept of free/open-source software.)
    ------

  • I recommend assert so that people use it more than one would want to for production code (due to runtime slowdowns). Besides, if people follow the other things I said (discipline, planning, discipline, etc (did I mention discipline?)), that will be a very minor issue.
    ------
  • A friend of mine once wondered why it seemed that functional programmers were more productive than others. He came to the conclusion that the cart was before the horse. Functional programming is unfortunately only popular in the hallowed halls of academia, and (yes, generalising here) so the people who were functional programmers tended to be more intelligent than your average code monkey.

    Case in point is that he wrote much of his garbage collector (by necessity) in C, and managed to produce some clear and concise code in a language that seems designed to thwart that aim.

    However, your post is correct in that it requires discipline to pull this off. The aforementioned code monkey has enough of that to fit in a box of matches without taking the matches out first. High level languages are no panacea: I've seen Scheme code that is scary, and have, in a flash of insanity, written Haskell that more resembled Fortran than a proper language. However, they do make writing horrible code a bit more awkward while writing good code a bit less so.

  • Nothing annoys me more than the "for (int i=0; i<blah; i++)" scope bug (int i should be within the scope of the for loop, not the block of code containing the for loop).

    Equally annoying is attempting to do something with templates that cause the compiler to freak out and crash. Once that happens you have to Clean everything and rebuild from scratch, after removing what caused the compiler to freak out (it corrupts files when it crashes like that).

    Current VC support for templates is patchy at best. *sigh*
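    A minimal illustration of the scoping rule in question; `sum_twice` is just an invented example. Under the ISO rule each `i` is local to its own loop, so this compiles cleanly. A compiler that leaks `i` into the enclosing block rejects the second declaration as a redefinition.

```cpp
// Each `i` below is local to its own loop under the ISO scoping rule,
// so declaring it twice in one function is legal.
int sum_twice(int blah) {
    int total = 0;
    for (int i = 0; i < blah; i++) {
        total += i;
    }
    for (int i = 0; i < blah; i++) {  // a leaking compiler sees a second `i` here
        total += i;
    }
    return total;
}
```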
  • The C++ standards deliberately leave something open which I think should be defined in a particular way.

    Suppose:

    • you are constructing an instance of a derived class
    • one of its base classes has "published" a pointer to the partially-constructed instance
    • the class has a virtual member function
    • the member function is overridden by this class
    • the class also has a member variable of a class type with construction
    • the constructor of the member variable (or something it calls) finds the published pointer and calls the virtual member function.

    What happens?

    My claim is that such a call SHOULD be legal and SHOULD call the BASE CLASS version of the member function. Similarly, during the execution of the DEstructors of the member variables you should also get the BASE CLASS version of the member function. You should get the derived class version exactly from the beginning of the execution of the first line of the body of the constructor through the end of the execution of the last line of the body of the destructor.

    The reasoning is too involved to go into here. Suffice it to say that:

    • It's a consistent generalization of the philosophy of the C++ construction-destruction semantics (and of the way that the C++ semantics differ from those of Objective C and Smalltalk).
    • It's a compiler implementation that is allowed by all the levels of C++ standardization.
    • There's a LOT of neat stuff you can do with this guarantee that you can't do without it.
    • There are a lot more opportunities for programming error if your compiler doesn't work this way. (Not to mention the issue of code that works fine with a compiler that does it one way but breaks when run through a compiler that does it a different way.)

    The original C++ work didn't specify the behavior in question. The first ANSI standard explicitly left it open. The revised ANSI standard not only explicitly left it open but said "don't do that". B-(

    At the time I first proposed it (about 10 years ago) we looked into a sample of the compilers on the market. There are four binary combinations of member constructor/destructor and base/derived version of member function, of which I claim one is "right" and the other three "wrong":

    Cfront and the Cfront-derived C++ compilers tested (Sun, SGI) got it "wrong" one way.

    The three IBM PC compilers tested got it "wrong" a second way.

    Gnu G++ got it "wrong" the third way.

    so standardizing on this semantics wouldn't favor any particular vendor's existing product.

    IMHO this somewhat obscure issue is one of the major impediments to C++ achieving its potential as an object-oriented language, and it is unfortunate that it wasn't "fixed" in one of the previous standards.

    Perhaps there's one more chance here.
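    For readers who haven't hit this, here is a sketch of the part of the construction semantics that is not in dispute: a virtual call made while a base-class constructor is running resolves to the base version. `Base` and `Derived` are placeholder names, and the sketch deliberately does not exercise the member-constructor case argued about above.

```cpp
#include <string>

struct Base {
    std::string seen_in_ctor;
    // Virtual call from the base constructor: the Derived part does not exist yet,
    // so this resolves to Base::name().
    Base() { seen_in_ctor = name(); }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const { return "Derived"; }
};
```

    Once construction has finished, the same call on a `Derived` object reaches `Derived::name()` as usual.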

  • by UnknownSoldier ( 67820 ) on Tuesday April 24, 2001 @10:51AM (#269834)
    I've been thinking about how to make C++ better in my spare time for the last year or so.

    (Unfortunately my notes are at home, so this isn't the full feature set)

    Here are some comments I'd love feedback on.

    o) New operators:

    ^ would be the standard math power operator. The compiler would optimize ^2 much as it does now with *2.

    @ would be the pointer dereference op. (Allows you to search for where pointers are being used)

    ?= (replacement for ==, since it is WAY too easy to get = and == mixed up)

    $ is also another operator for users.

    o) STANDARDIZED and PORTABLE types
    NO MORE "long long" crap.

    int8, int16, int32, int64, int128 (signed ints)
    real32, real64, real80 (floating point)
    fix (fixed-point)
    char8 (8-bit ascii)
    char16 (unicode-16)
    char32 (unicode-32)

    int would be the "native" integer type for the cpu.
    float would be the "native" floating type for the cpu.
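    As a rough sketch of how the integer part of this wish list can be approximated, assuming a C++11 or later compiler with `<cstdint>`: the short names below are the poster's proposal mapped onto the library typedefs, not standard names, and only the 8 to 64-bit integers are covered. `widen_product` is an invented example.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical aliases using the names proposed above.
typedef std::int8_t  int8;
typedef std::int16_t int16;
typedef std::int32_t int32;
typedef std::int64_t int64;

// Compile-time checks that the widths are what the names promise.
static_assert(sizeof(int8)  * CHAR_BIT == 8,  "int8 must be 8 bits");
static_assert(sizeof(int16) * CHAR_BIT == 16, "int16 must be 16 bits");
static_assert(sizeof(int32) * CHAR_BIT == 32, "int32 must be 32 bits");
static_assert(sizeof(int64) * CHAR_BIT == 64, "int64 must be 64 bits");

// A product of two 32-bit values can overflow 32 bits, so widen first.
int64 widen_product(int32 a, int32 b) {
    return static_cast<int64>(a) * static_cast<int64>(b);
}
```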

    o) New keywords

    "func" would preface all functions/methods. (helps the compiler out, and lets editors be able to expand/collapse functions easier)

    "macro" would force the function to be inlined.

    "include" is part of the language. No longer needs that ugly pre-processor hack.

    o) Cleaner Syntax - CONSISTENT reading of right to left.

    Pointers would bind LEFT (instead of right in C/C++)

    i.e. Pointer to a function

    The old C++ way: new (int (*[10])()) // array 10 pointers to function returning int

    Easier C^2 way: new func int() * [10] // array of pointers to func.

    e.g.
    func int () * pFunc; // pointer to func, no more stupid parenthesis matching
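    For comparison, the usual way to tame this in existing C++ is a typedef. A minimal sketch, with `IntFunc`, `one`, `two` and `make_table` as invented names:

```cpp
// Pointer to function taking no arguments and returning int.
typedef int (*IntFunc)();

int one() { return 1; }
int two() { return 2; }

// Returns a heap array of 10 pointers to function returning int.
// Allocates the same kind of array as the raw form: new (int (*[10])())
IntFunc* make_table() {
    IntFunc* table = new IntFunc[10]();  // value-initialized: every slot is null
    table[0] = one;
    table[1] = two;
    return table;
}
```

    The caller owns the array and releases it with `delete[]`.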

    o) C style implicit () casts gone. Only C++ style casts. (Allows for searching of casts)

    e.g.
    C++ way:
    char *pC;
    int *pI = (int*)(pC);
    *pI = 3;

    C^2 way:
    int *pI = static_cast<int*>(pC);
    @pI = 3;

    o) standard way to turn OFF implicit up-casting

    o) Binary constants. (We have decimal, octal, and hex. Where's the binary notation??)

    Preface numbers with "Zero Z"
    e.g.
    const int mask = 0z0110100010; // 0x1A2
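    For what it's worth, C++14 did eventually add binary literals, spelled with a `0b` prefix rather than the `0z` proposed here. A small sketch, assuming a C++14 compiler, that checks the constant above against its hex spelling at compile time:

```cpp
// C++14 binary literal; the 0z prefix above is the poster's own proposal.
const int mask = 0b0110100010;
static_assert(mask == 0x1A2, "binary and hex spellings denote the same value");

// Digit separators (also C++14) make the bit groups easier to read.
const int mask_grouped = 0b01'1010'0010;
static_assert(mask_grouped == mask, "separators do not change the value");
```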

    o) "typedef" and "alias" would be extended.
    typedef would make a NEW type. (Compiler wouldn't throw away the new names)
    alias would behave the way typedef does currently

    Maybe it's time to download gcc 3 ;-)

    *shrugs*
  • by selectspec ( 74651 ) on Monday April 23, 2001 @04:13PM (#269844)
    Yes, msdev 7.0 is probably one of the most compliant C++ compilers out there. It has one of the few linkers that handles template-objectfile redundancy properly without tedious pragmas and bizarre typedefs in cpp files. Also, the msdev implementation of STL and IOStreams is to the letter of the standard with full template iostreams (unlike g++ non-template IOStreams). However, msdev's implementation includes nothing that is not in the standard (like hashtables), except auto_ptr. All in all, Microsoft has embraced C++ big time (MFC and especially ATL). Frankly their devotion to that sad ancient religion hasn't helped them conjure up the stolen data plans nor given them clairvoyance enough to learn the location of the rebels' hidden fort...kha..cc.cka..aaak.
  • As a programmer who often works on massively cross platform C++ server and client applications, a lot of these proposals (distributed processing, standard thread libraries) are nice, but there's one major gripe with the language under all platforms: the lack of standard sized types. What I mean is, integral types in parallel to the short int, int, long int, long long int (C99 standardized, not C++) etc, with names like int8, int16, int32, int64, int128... allowing portability without meticulous work in wrapping and handling functions, outside libs, autoconf scripts, etc. It would be especially nice if these types were *not* considered, for the sake of signatures, type-identical to counterpart size-variant types, and if enums were also given a generic root type instead of being int in signature (eg, operator(ClassName&, enum) ) and a variant size integral type defined to the size of a pointer were included. Just some thoughts from a person who has to extensively use the language.
  • by LionKimbro ( 200000 ) on Monday April 23, 2001 @09:51PM (#269972) Homepage

    Suppose:

    • you are constructing an instance of a derived class that is part of the system dependent side of a bridge
    • the other side of the bridge is called by a mediator instance
    • the mediator class has virtual functions
    • the instantiation of the mediator class makes a system call to create a thread that runs on the other side of the bridge
    • the instantiation in the first instance is allocated but is not initialized
    • the bus component on one side of the bridge is calling a lambda function that is created on the other side
    • the lambda function was created by manipulating a string in memory and casting it to a function call
    • but the lambda depends on instantiation data
    • the mediator passed some data to the thread
    • there are no semaphores

    It's 4:00am.

    What do you do? What do you do !!

  • by ryants ( 310088 ) on Monday April 23, 2001 @03:07PM (#270086)
    Why are they talking about new stuff when old, standardised stuff doesn't even work yet? (I'm looking at you, MS Visual Crap++)

    That's one of the most annoying and frustrating things about C++... it isn't implemented properly and efficiently anywhere yet (g++ comes close, but still enough annoyances to be, well, annoying).

    Ryan T. Sammartino
