
Python-to-C++ Compiler

Posted by timothy on Thu Jun 15, 2006 12:12 PM
from the calibrate-your-scales dept.
Mark Dufour writes "Shed Skin is an experimental Python-to-C++ compiler. It accepts pure, but implicitly statically typed, Python programs, and generates optimized C++ code. This means that, in combination with a C++ compiler, it allows for translation of pure Python programs into highly efficient machine language. For a set of 16 non-trivial test programs, measurements show a typical speedup of 2-40 over Psyco, about 12 on average, and 2-220 over CPython, about 45 on average. Shed Skin also outputs annotated source code."
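To illustrate what "pure, but implicitly statically typed" might mean in practice, here is a minimal sketch. The function names are hypothetical and the example is not taken from Shed Skin's own test programs. It only contrasts code where every name keeps one type with code where a name is rebound to an unrelated type.

```python
# Hypothetical illustration: every variable keeps a single type for its
# whole lifetime, so a type-inferring compiler could in principle give
# each one a fixed C++ type without any declarations.
def dot(xs, ys):
    total = 0.0                  # always a float
    for i in range(len(xs)):     # i is always an int
        total += xs[i] * ys[i]
    return total


# Legal Python, but not implicitly statically typed: the same name is
# bound to an int on one path and to a str on another.
def not_static(flag):
    value = 1
    if flag:
        value = "one"
    return value


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The first function is the kind of code the summary describes as acceptable input; the second is presumably outside that subset, though the summary itself does not spell out the exact rules.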
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by Surt (22457) on Thursday June 15 2006, @12:21PM (#15541340) Homepage Journal
    Until he addresses mixed types in n-tuples, this won't be useful for very many people.
    • But he's on the right track. Python allows dynamic typing, but nearly all of one's programs do not take advantage of it. Recognizing that is key to making it go fast, I think. It would be nice to have a filter you could run over python that would find all the type-ambiguous points and let you insert some sort of compiler hinting.

      I could envision it working like this. Instead of statically declaring all your variable types in every function, you instead simply declare that whatever types are being used, the
        • I suspect that varies with the programmer. I'm pretty certain that much of my Python code contains things that a type deduction system (SML, Haskell) wouldn't be able to cope with. Certainly I use duck typing a lot.

          How exactly does duck typing differ from the structural subtyping of e.g. OCaml, which allows you to write a function that can be passed any object, of any class or none, if it provides all the methods that function uses? The type inference system handles it just fine.

          Of course, "duck typing"
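For readers unfamiliar with the term, the Python side of the duck typing discussed above can be sketched as follows. The class and function names are hypothetical. The point is only that `announce` accepts any object providing the one method it calls, which is the same shape of requirement that structural subtyping expresses, except that Python checks it when the call runs rather than at compile time.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    # Unrelated to Duck: no shared base class, no declared interface.
    def speak(self):
        return "Beep"


def announce(thing):
    # Works for any object that has a speak() method returning a str.
    return thing.speak().upper()


def try_announce(thing):
    # An object without speak() only fails when the call is attempted.
    try:
        return announce(thing)
    except AttributeError:
        return None


print(announce(Duck()), announce(Robot()), try_announce(42))
```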

  • Ewwwww (Score:3, Funny)

    by $RANDOMLUSER (804576) on Thursday June 15 2006, @12:25PM (#15541375)
    As a UNIX admin, I was saddled with one of these kinds of things years ago, a DEC-BASIC to C compiler for UNIX. The output code quality was incredibly bad: machine generated variable and function names, bizarro nested struct/union/struct data structures, 400-line functions peppered with calls to 1-line functions. Completely unreadable. Thank $DEITY that project died quickly.
    • Re:Ewwwww (Score:5, Insightful)

      by Anonymovs Coward (724746) on Thursday June 15 2006, @12:30PM (#15541434)
      Completely unreadable.

      I think you're not supposed to read it. You're only supposed to feed it to your C++ compiler. f2c produced unreadable output too, but nobody read the output; at one time it was the only free fortran option on linux.

    • Re:Ewwwww (Score:2, Funny)

      by Virak (897071)
      Which is why I suggest you use brainfuck for all your coding needs. The generated code will make just as much sense as the original, if not more.
    • Re:Ewwwww (Score:3, Insightful)

      If you actually tried ShedSkin you'd find the C++ it produces is very similar to what a human might produce, and is actually quite easily readable. But then - why would you want to anyway? It's an intermediate form useful to pass to an optimising C++ compiler, not as something to read.
    • Re:Ewwwww (Score:4, Insightful)

      by Tim Browse (9263) on Thursday June 15 2006, @04:29PM (#15543944)

      Yeah, whenever I look at the output of my optimising compiler, it's really hard to understand too. It's all in assembler, for a start.

      Plus, the quality of C code generated by CFront was rubbish - unreadable.

      Same with the Modula-3 compiler I tried. You couldn't work out what was going on in the resulting C code without a load of work.

      Can you see where I'm going with this?

  • by stonecypher (118140) <stonecypher&gmail,com> on Thursday June 15 2006, @12:26PM (#15541390) Homepage Journal
    See, it's all well and good to compile python to speed it up. The problem is, people are now saying that they can write efficient code in python just because it magically translates to C++, and because this translator is faster than other python compilers.

    This won't be meaningful until a converted python script is compared to efficient code written natively in C++ in the first place.
    • by Anonymovs Coward (724746) on Thursday June 15 2006, @12:42PM (#15541568)
      I don't see your point. Some of us use python. It takes me a fraction of the time to do something in python that it would take in any other language. I'm not interested in writing native C++ code because it's hypothetically faster (it's not faster if I count coding time). But I am interested in a good python-to-C++ translator. Why wouldn't any python user be?
      • by advocate_one (662832) on Thursday June 15 2006, @01:04PM (#15541857)
        But I am interested in a good python-to-C++ translator. Why wouldn't any python user be?

        no, I'd be far more interested in a good compiler to compile that python straight to machine code...

        • Why? If you can convert Python to reasonably optimized C++, then you can leverage the C++ compiler to do all the machine-level optimizations, rather than reinventing yet another wheel.
          • by mrchaotica (681592) * on Thursday June 15 2006, @03:09PM (#15543090)

            ...and that's why it shouldn't be a Python to C++ translator; it should be a GCC frontend instead (i.e., translating to GCC's internal representation).

          • Not quite true. Analogy:

            Would you also like to translate a text from Arabic to English by passing through 3 or 4 languages in between?

            In this analogy the problem would probably be accuracy, in the case you presented it would be performance being lost due to layers of conversion. Some high level optimizations are inevitably lost (unless the C++ compiler has some sort of strong AI).
          • This was my point exactly. The article says "this thing does a better job of converting Python to C++ in terms of efficiency than did the older one." People are hearing "This thing generates efficient C++." Nobody's tested that yet, though.

            You are making a gigantic assumption that because this converter's better than the last one, that it's usable in efficiency arenas. By comparison, you might be looking at the difference between a shoe and a shoe with a spring (that's what air pumps do, don't laugh) wh
            • it gives you an extra area for weird bugs to creep in... get the Python right and go straight to machine code with a trusted compiler.

              Is that the same way the method of using layers of multiple simple tools that all do one thing really well is more buggy than just using one larger general purpose monolithic app?

              A cross platform Python to machine code compiler would presumably need to reinvent a whole lot of difficult platform specific stuff that has already been solved by C++ compilers.

        • I'd prefer a python-to-Common-Lisp compiler, but only because I hate running out of stack space for recursive algorithms.
      • Assume that it takes:
        - 4 hours to write a given program in python, 32 hours to write same program in C++
        - 10 seconds to run the python program, but just 2 seconds to run the faster C++ program
        - the program is run 20 times a day
        - assume the developer time costs as much as the time of the person that runs it

        Ok, so it'll take 630 days of running this program for the faster C++ program to make up for the extra time to develop it. So, if you can wa
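The 630-day figure follows directly from the assumptions listed in the comment. A quick sketch of the arithmetic, with variable names that are mine rather than the commenter's:

```python
SECONDS_PER_HOUR = 3600

python_dev_hours = 4
cpp_dev_hours = 32
python_run_seconds = 10
cpp_run_seconds = 2
runs_per_day = 20

# Extra developer time spent on the C++ version, in seconds.
extra_dev_seconds = (cpp_dev_hours - python_dev_hours) * SECONDS_PER_HOUR

# Time the faster program saves its user each day.
saved_per_day = (python_run_seconds - cpp_run_seconds) * runs_per_day

# Days of use before the saved run time equals the extra developer time,
# valuing developer time and user time equally as the comment assumes.
break_even_days = extra_dev_seconds / saved_per_day

print(extra_dev_seconds, saved_per_day, break_even_days)
```

The result is sensitive to the inputs: doubling the number of runs per day, for example, halves the break-even time.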
      • Last time I checked, it was the only Python compiler... (CPython is an interpreter, PyPy is also an interpreter

        Neither CPython nor PyPy is a strict interpreter, both of them compile source to byte-code and then act as a virtual machine to run that byte-code. PyPy also does some work on compiling to native code on the fly, depending on which version you're using (Armin Rigo's is the most sophisticated on the JIT/native code front, but it's far from stable).
      • Oh, hell.

        That'll teach me to hit submit without checking the preview. I lost a big and important chunk of the reply after operator< because I forgot to write out the entity for <. Here's a repaste; yay form buffers, boo no edit button for the first five minutes of a post.

        -----------------

        That's the wrong comparison to make, because it assumes that the C++ programmer has unlimited time to make his C++ code efficient and correct.

        Well, yes and no. I actually got into this else-thread; there are a hell
  • Native code (Score:3, Insightful)

    by Roy van Rijn (919696) on Thursday June 15 2006, @12:29PM (#15541419) Homepage
    This is a good step to make Python run a bit faster, but I don't think it'll really make a huge difference.

    The best way to get some speed and still keep the nice Python functions and layout is just to export the most heavily used functions to native code (C/C++).
    I don't know if it's possible to take the C++ output and optimize it separately; that way you would have a good start on making native code, though.

    In short: Better, fast and easy, but not the best (if you can write native code)

  • Very interesting... (Score:4, Informative)

    by FuzzyDaddy (584528) on Thursday June 15 2006, @12:33PM (#15541460) Journal
    This is a very interesting development, both from the practical promise and just 'cause it's cool. However, as a python programmer myself, it's not yet in a usable form. Much of the efficiency of programming in python is the standard libraries (in particular Tkinter for user interfaces), and the non-standard libraries (for example, the serial port library). This project does not yet support these.

    Among python programmers, I'm curious - how many use psyco (another python performance enhancement tool) for their projects? I fiddled with it a while ago (it didn't work because of a C module that it didn't like), but never had a compelling reason to go back to it. Performance optimization has never been important enough for my applications to merit the effort.

    • > However, as a python programmer myself, it's not yet in a usable form

      Yup. Along the same lines, Ruby has a related project by Ryan Davis, Ruby2C [rubyforge.org]. It's useful for small localized speedups, but you wouldn't want to try to write your entire app in it.
    • by zhiwenchong (155773) on Thursday June 15 2006, @01:05PM (#15541863) Homepage
      It's all a matter of magnitude.

      I use Psyco in my work. My app is a code generator that processes multiple models and transforms them into optimization code. Psyco reduced the time it took for process 1 model from 20 seconds to 2 seconds. It doesn't sound like much, but when you have to do it for lots of models, the speedup suddenly becomes quite substantial.
  • ...kind of reminds me of the Google Web Toolkit [google.com] which is more or less a Java to Javascript/HTML compiler. It's not an optimization thing like ShedSkin, instead it lets folks use the Java skills they already have to write better web apps. I wonder what they use to parse the Java code? I don't see any mention of JavaCC [java.net] on their site, or ANTLR either for that matter...
  • I'm confused... (Score:5, Interesting)

    by advocate_one (662832) on Thursday June 15 2006, @01:02PM (#15541823)
    surely the best way to speed it up is to compile it straight to object code... c++ has to be compiled and just adds an intermediate step which will make things harder to debug...
    • Re:I'm confused... (Score:3, Interesting)

      by Dasher42 (514179)
      I think that the best example of what you're saying would be the Java compiler in the gcc suite. That separate front-end, back-end approach of gcc is terribly helpful.

      And yet, if you're going to compile Python, I'd want the translation into source code. If it's worth rewriting in C++, it's worth tuning, especially if you can improve the usage of type-safe code.
  • by suitepotato (863945) on Thursday June 15 2006, @01:13PM (#15541953)
    Why? Read the linked page? Says it all. Violates most any Python code of any complexity out there. So if it doesn't convert Python code from the real world, what is it for? Making Python coders learn enough about C++ to remember the limitations and write/rewrite Python code to use it?

    What the Python C/C++ interested people REALLY need is a book written by a group of Python AND C/C++ masters which teaches the two simultaneously showing complementary methods of doing any given thing working from beginner to advanced and I DON'T mean "How to turn your n00b Python code into C/C++ hotness" sort of viewpoint. I mean both taught simultaneously in synch showing how they can interchange and complement.

    Software tricks for converting? Ultimately worse than not having them because it leads to horrible obfuscation because we don't know exactly what is going on when 13,412 lines of Python is turned into C++ because WE DIDN'T WRITE IT AND WE NEVER LEARNED C/C++. "Say Mike, that's great but you're the company code cowboy and you don't do C++ natively and I sure as hell don't read it being management so exactly what happens if this needs to be fixed? We've gone from importing open source code you couldn't read to writing our own open source code you can't read."
    • by try_anything (880404) on Thursday June 15 2006, @05:45PM (#15544620)
      Software tricks for converting? Ultimately worse than not having them because it leads to horrible obfuscation because we don't know exactly what is going on when 13,412 lines of Python is turned into C++ because WE DIDN'T WRITE IT AND WE NEVER LEARNED C/C++. "Say Mike, that's great but you're the company code cowboy and you don't do C++ natively and I sure as hell don't read it being management so exactly what happens if this needs to be fixed?"

      That isn't how a compiler is used. When you compile a C++ program, you don't throw away your C++ source and check the executable into source control. "Oh, no! We used gcc and now we have a bunch of gobbledygook we don't understand!"

      The C++ is an intermediate stage in the make process, akin to the output of various phases of gcc.

  • by radtea (464814) on Thursday June 15 2006, @04:00PM (#15543569)

    Python is a terrific prototyping language (and lots of other things besides.) As a C++ coder I've been using it for prototyping stuff that will eventually be integrated into a larger application and therefore MUST be translated to C++. So what I'd like to see is a tool (written in Perl, just for the fun of having a linguistic threesome) that just does a light gloss on Python syntax to get me most of the way to human-readable C++. That would be far more useful (to me) than this thing, which sounds more like f2c, whose output could cause brain damage in humans and cancer in rats, or possibly the other way around.
    • Why not pure assembler ?
        • That's pretty much what he's doing, ShedSkin is a Python to C++ compiler, then you need to compile the C++ code ShedSkin yields to machine code, you can do that with gcc.

          The goal (for the author) at the moment is to get a fairly complete Python to C++ compiler (ShedSkin is already very good if you're mostly doing simple operations such as crunching numbers, but if your program is really complex or uses libraries then you're out of luck)

    • by SigmoidCurve (188795) on Thursday June 15 2006, @12:49PM (#15541656) Homepage Journal
      bzerodi's point, made with Zen-like simplicity, is that language choice should be made to minimize programmer time, not machine time. I am at least a factor of ten more productive with Python than with C or C++. I am also far more confident in the correctness of what I write per line of Python than with what I write per line of C/C++.

      Yes, I have wasted some time staring at the shell waiting and waiting for it to return from some complicated Python routine. I know that compiled C would be faster, and hand-rolled assembler would be faster still. But I say to myself: hey, I wrote this code in a single afternoon, how many weeks of hair-pulling would it take to re-engineer this - and make it bug-free - in C? When I put it that way, I don't mind waiting the extra minutes for Python to do my dirty work.

      As a previous poster mentioned, the ability to handle tuples of mixed-types is critical. I look forward to seeing great things from Shed Skin in the future.

      • by b17bmbr (608864) on Thursday June 15 2006, @01:12PM (#15541942)
        After four hours of tweaking, our expert C++ programmer was finally able to write something that beat our ten lines of Python code that took under five minutes to write. And it didn't beat it by much, whereas the first pass at a C++ version was an order of magnitude slower.

        Which is why languages like python were written in the first place. They pretty much just make the underlying C calls anyways, but do so in a way that handles buffer overflows, pointers, etc., that pretty much make C/C++ so troublesome, hazardous, and hard to learn. I like java (a lot really), but nothing beats a good scripting language, like perl or python, to handle tasks like text manipulation. Python is especially good at using libraries, such as the imaging library, which are written in C anyways. How much faster can you get calling a C library from C than from python? I honestly don't know, but I can't imagine it's that much more. But when you add in speed of development, safety, and even portability, it's powerful.

        Python's OOP is also a feature that makes it far more attractive than perl for me. Perl does OOP, but it's not as clean as python's, and I don't think it supports all the OOP features either. Doing GUI's is not the strength of any scripting language, but it depends on what you need to do. You can write a native frontend and embed python into a C or even a java application.
      • Sorry, but without more details it would seem to me that your "expert" C++ guy wasn't an expert. Can you describe the problem a little better? If what you say is true, I as a long term C++ programmer would consider switching, but I've looked at python, and I simply don't believe you.

        I'll grant that C++ is a nightmare for beginners with more pitfalls than an Indiana Jones movie, but once you know them, writing poorly performing code is unlikely.
      • Stupid comparison (Score:4, Insightful)

        by ardor (673957) on Thursday June 15 2006, @02:33PM (#15542750)
        As another poster already said, file I/O is a bottleneck regardless of ANY language. So, try something different. Real-time h264 decoding for example.
        • C++ makes it difficult to use complex data structures...

          It does? I've always managed, somehow.

          As have I, but I'd certainly rather manage in languages that support first order data structures, "for each" loops for iterations, proper disjunctive types, pattern matching, and so on. C++ is better than it used to be, but all the data structures and algorithms in the standard library barely hold a candle to the expressive power of many functional programming and "scripting" languages.

    • Re:Sounds good... (Score:4, Insightful)

      by B'Trey (111263) on Thursday June 15 2006, @12:33PM (#15541464)
      I will have to explore it more, but it will be intriguing to see how they handle things like pointers and structs that are not in python.

      Uh, why would they have to? This goes from Python to C++, not vice versa. If there are no pointers or structs in the Python code, why would they have to handle them? Certainly, it's quite possible that some Python variable types will be converted to pointers or structs in the output code, but that's orthogonal to the issue of Python not having them natively.

      If you were trying to go from C++ to Python, then you'd have to convert C++ pointers and structs to some sort of Python data type, and your comment would make sense. As it is, I'm not sure what you were trying to say.
    • Re:Sounds good... (Score:3, Insightful)

      by masklinn (823351)

      it will be intriguing to see how they handle things like pointers and structs that are not in python.

      Why would one ever need to do that? The goal is not to write C++ in Python, it's to compile Python to machine code via an intermediate Python -> C++ compilation.

    • No, not really. A large number of people, including myself, just use python as a nicer C. Futzing with pointers and other such things can be ignored while making a prototype and, after finishing the prototype, the bits that need to be faster can then be rewritten.

      I recently wrote a largish simulation in python for a Biology course. The goal was to watch how a species spread over a planet given other competing species, natural disasters and the like. It took four in deep hack mode to write the whole thing,
    • Re:Static Typing? (Score:3, Interesting)

      by Surt (22457)
      Well, it's not quite as bad as it sounds. He's seemingly only really forbidding incompatible mixed types in the same variable, a usage that isn't exactly extremely common.

      A more significant roadblock, IMO, is that he can't handle mixed types in 3+-tuples, which is very common.
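To make the tuple point concrete, here is a small sketch of the two cases. The limitation itself is as reported by the commenter and has not been checked against Shed Skin's documentation. The variable names and the helper are hypothetical.

```python
# Homogeneous 3-tuple: every element is a float, so one element type
# describes the whole tuple.
point = (1.0, 2.5, -3.0)

# Heterogeneous 3-tuple: str, int and float side by side. This is the
# very common pattern the comment says is not handled.
record = ("alice", 30, 1.75)


def element_types(t):
    # Report the type name of each element, in order.
    return [type(x).__name__ for x in t]


print(element_types(point))
print(element_types(record))
```

Records like the second tuple show up wherever Python code returns several unrelated values from one function, which is why the commenter calls the restriction a roadblock.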
    • Re:Static Typing? (Score:4, Interesting)

      by MBCook (132727) <foobarsoft@foobarsoft.com> on Thursday June 15 2006, @01:40PM (#15542240) Homepage

      I love Python, but I hate the dynamic typing. It can be handy at times, but 99% of the time you make a variable to hold one kind of thing. Having the static typing would both improve performance (because the interpreter knew what you were up to) but would also eliminate bugs (because it would complain when I tried to set a double to "And now press...").

      I'd love to see Python get optional static typing.
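As a sketch of what optional typing can look like, the annotation syntax accepted by current Python 3 versions (which postdate this thread) lets a name be declared as a float. Note the assumption: the interpreter itself does not enforce these annotations, so the complaint the commenter wants would have to come from a separate static checker. The function and variable names are hypothetical.

```python
def scale(value: float, factor: float) -> float:
    # The annotations document intent; they are not checked at run time.
    return value * factor


total: float = 0.0
total = scale(2.0, 1.5)

# A static type checker would be expected to reject the assignment below,
# while the interpreter alone would run it without complaint:
# total = "And now press..."

print(total)
```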

    • Except that in .NET it all becomes MSIL, not Machine Code.

      Jeremy
    • Re:Very nice, but... (Score:3, Informative)

      by cnettel (836611)
      Well, C# has unsafe arrays, while VB.NET only exposes them quite indirectly through the marshalling API. Some other language implementations also use some dose of reflection/late binding to implement certain features. You can sometimes avoid using such features, but this will sometimes result in code that is "non-idiomatic" in that language. I like the .NET framework, but it's no panacea for a language-agnostic future.
                • by 2short (466733)

                  Indeed, VB.net and C# have very similar features and capabilities, and if there are big performance differences between them, it's because the authors of one of the compilers screwed up.

                  But the other posters were arguing that their performance and capabilities should be identical because they both compile to MSIL, and in fact that any language that does so would have equal performance and capabilities. Which is just silly; hence my silly IRock.net example. For a less silly example, Managed C++ certainly
    • by rpwoodbu (82958) on Thursday June 15 2006, @02:25PM (#15542682)
      It is worth mentioning that one of the original implementations of C++ (if not the very first) was "cfront", a C++-to-C converter. I see this as a much easier way to get a new language implemented quickly, as you can take advantage of the common functionalities already implemented in the target language of the converter. Although Python is not a new language, using it as a compiled language is new, and thus I believe it is comparable to being a new language for this argument. C++ and Python have a lot in common, which makes C++ a very suitable target language for a Python-to-[compiled_language] converter.

      If this converter proves to be successful, I believe that a GCC frontend will be written eventually. There are probably potential optimizations that would be difficult or impossible to implement any other way.

      Some may think that the dynamic nature of Python may preclude its inclusion in GCC. Technically, all that would need to be done is to have a runtime to handle dynamic things, similar to how Objective-C (for which there is GCC support) has a runtime to handle message passing and late binding. However, a large portion of the potential efficiency of a compiled version of the language would be lost to these dynamic capabilities; luckily, a compiler can detect when things are implicitly static (in fact, this converter is limited to implicitly static constructs), and optimise them to be truly static at compile-time.
    • "boo", a .NET language, allows dynamic typing by specifying 'duck' type. It achieves near-c# speed because all other data are statically typed.

      It's a great language -- combining the benefits of Python, Ruby, and C# -- and it's wonderful for proto-typing in the .NET world.