Python

Can Codon 'Turbocharge Python's Notoriously Slow Compiler'? (ieee.org) 82

IEEE Spectrum reports on Codon, a Python compiler specifically developed to, as they put it, "turbocharge Python's notoriously slow compiler."

"We do type checking during the compilation process, which lets us avoid all of that expensive type manipulation at runtime," says Ariya Shajii, an MIT CSAIL graduate student and lead author on a recent paper about Codon. Without any unnecessary data or type checking during runtime, Codon results in zero overhead, according to Shajii. And when it comes to performance, "Codon is typically on par with C++. Versus Python, what we usually see is 10 to 100x improvement," he says. But Codon's approach comes with its trade-offs. "We do this static type checking, and we disallow some of the dynamic features of Python, like changing types at runtime dynamically," says Shajii. "There are also some Python libraries we haven't implemented yet...."

Codon was initially designed for use in genomics and bioinformatics. "Data sets are getting really big in these fields, and high-level languages like Python and R are too slow to handle terabytes per set of sequencing data," says Shajii. "That was the gap we wanted to fill — to give domain experts who are not necessarily computer scientists or programmers by training a way to tackle large data without having to write C or C++ code." Aside from genomics, Codon could also be applied to similar applications that process massive data sets, as well as areas such as GPU programming and parallel programming, which the Python-based compiler supports. In fact, Codon is now being used commercially in the bioinformatics, deep learning, and quantitative finance sectors through the startup Exaloop, which Shajii founded to shift Codon from an academic project to an industry application.

To enable Codon to work with these different domains, the team developed a plug-in system. "It's like an extensible compiler," Shajii says. "You can write a plug-in for genomics or another domain, and those plug-ins can have new libraries and new compiler optimizations...." In terms of what's next for Codon, Shajii and his team are currently working on native implementations of widely used Python libraries, as well as library-specific optimizations to get much better performance out of these libraries. They also plan to create a widely requested feature: a WebAssembly back end for Codon to enable running code on a Web browser.

  • Dupe (Score:4, Informative)

    by e065c8515d206cb0e190 ( 1785896 ) on Saturday April 01, 2023 @06:44PM (#63417660)
    • Re:Dupe (Score:5, Funny)

      by EditorDavid ( 4512125 ) Works for Slashdot on Saturday April 01, 2023 @06:53PM (#63417670)
      I posted both stories. I thought IEEE Spectrum did a nice follow-up, with quotes from the founder of the new startup Exaloop and talk about their plans for a WebAssembly back end. I'm not sure that matches the exact definition of "dupe," so I'd..... Wait a minute. Is this an April Fool's prank?
      • It's after midday, so no.
      • Actually it's nice to see you weigh in here, and while my first thought was "dupe", at second glance it does look more like a partially-redundant followup than an actual dupe.

        You probably get a pass this time around.

        It would be nice if, after posting a dupe and getting called out for it, you (or msmash) would step in and say "oops, sorry" or just pull the dupe posting. People will still be mean and say nasty things about you, but the ratio of people posting like that as a legitimate gripe

      • So technically it's not a dupe. BUT it is another outlet rehashing the same claims taken straight from the Codon release. If tomorrow the New York Times hears about Codon and decides to run a piece saying it's a nice way to compile a Python subset for C-like performance in limited cases, and someone submits it, it'd still be a dupe in my book. Whatever, no big deal.
      • Perhaps an editor could explicitly frame it as a follow-up by linking to the previous story about this topic.

    • Re:Dupe (Score:5, Funny)

      by istartedi ( 132515 ) on Saturday April 01, 2023 @06:57PM (#63417682) Journal

      Dupe, dupe, dupe, dupe of URL
      Dupe of URL...
      ...
      As I-I walk through this world
      Nothing can stop the dupe of URL

  • by caseih ( 160668 ) on Saturday April 01, 2023 @07:04PM (#63417692)

    Codon compiles an annotated subset of Python code. Perhaps not too unlike the Cython compiler, but that is aimed at producing Python modules, not executables. In both cases, they accept a subset of Python. Some of Python's most powerful features and constructs are also the cause of the interpreter's slowness, and Codon will not help there. But for other things where the supported python subset is no hindrance, Codon is very welcome, especially if it can work with common tools like numpy, matplotlib, and PySide.
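
    For illustration, here is the kind of type-stable, annotated Python that an ahead-of-time compiler is comfortable with (ordinary type hints, not any Codon-specific syntax):

      # Type-stable code: every name keeps a single type for its whole life,
      # so types can be resolved at compile time instead of per operation.
      def dot(xs: list[float], ys: list[float]) -> float:
          total = 0.0
          for x, y in zip(xs, ys):
              total += x * y
          return total

      print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0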

    Regardless, the answer to the headline question is, as always, usually, "no."

    • by jsonn ( 792303 ) on Saturday April 01, 2023 @07:09PM (#63417702)
      Also, it's not the Python compiler that is slow, but the bytecode interpreter.
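
      For anyone unfamiliar with that split: CPython compiles source to bytecode almost instantly, and the time goes into the loop that interprets that bytecode afterwards. A quick way to see the compiled form (illustrative):

        # dis shows the bytecode the compiler produces; executing these
        # instructions one by one is the interpreter's job, and the slow part.
        import dis

        def add(a, b):
            return a + b

        dis.dis(add)  # e.g. LOAD_FAST / BINARY_ADD (BINARY_OP on newer CPythons)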
      • Re: (Score:2, Insightful)

        by gweihir ( 88907 )

        Actually, neither the compiler nor the bytecode interpreter are "slow". They are pretty fast for the feature-set they implement. I guess the people trying to do performance optimization here have no clue what they are doing. All too common.

        • by Jeremi ( 14640 )

          I guess the people trying to do performance optimization here have no clue what they are doing. All too common.

          The "I guess" indicates that it's you who has no clue what they are doing. All too common.

          • by gweihir ( 88907 )

            Your inept attempt at an ad hominem falls short.

            • by Entrope ( 68843 ) on Sunday April 02, 2023 @08:11AM (#63418666) Homepage

              It is an entirely accurate criticism, and it's not an ad hominem argument. Writing any logic or data processing code natively in Python will make it incredibly slow, in large part because every assignment creates a new dynamically typed object on the heap. It doesn't matter if your variable has function scope and will only be assigned an integer or small floating-point value, the interpreter assumes pessimal behavior.

              As a real-world example, I wrote a CRC function in Python (the Python standard library doesn't implement the combination of CRC parameters this used) as part of a binary file processor. It was so slow that I rewrote it in Go while waiting for the Python version to process data -- the Go version was something like 200 or 500 times as fast (1 GB/sec vs 20-50 MB/sec) before I spent any effort optimizing either.
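
              For readers who want to see why such code crawls, here is a generic textbook bitwise CRC loop in pure Python (not the poster's actual code; the parameters are the common CRC-16/CCITT ones). Every iteration pays for object creation, dynamic dispatch, and bounds checks:

                # Generic MSB-first CRC-16 sketch; each shift/xor builds new int
                # objects and goes through the interpreter's dispatch loop.
                def crc16(data: bytes, poly: int = 0x1021, crc: int = 0xFFFF) -> int:
                    for byte in data:
                        crc ^= byte << 8
                        for _ in range(8):
                            crc = ((crc << 1) ^ poly) if crc & 0x8000 else (crc << 1)
                            crc &= 0xFFFF
                    return crc

                print(hex(crc16(b"123456789")))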

              On a different project, we had a simulator that had all the heavy processing in C++ and just some "business logic" (database and IPC) in Python. It still spent 10% of CPU time in malloc(), free() and similar functions because Python put everything in short-lived heap objects.

              The fact that you "guess" the interpreter is faster than that shows that you make uninformed guesses, and are not afraid to post them to Slashdot.

              • by gweihir ( 88907 )

                Your inept attempt to explain away your inept attempt at an ad hominem falls equally short.

                That said, if you are writing CRC or any other binary-heavy or decision-heavy code in Python, you are using the wrong tool for the job. That is your fault, not the tool's; it in no way reflects negatively on the tool, just on you. The right way to do something like this is to write a C module for Python that then implements the CRC function. It is not hard to do at all if you are competent (which you

                • by Entrope ( 68843 )

                  First, I wasn't the guy who originally pointed out you were full of shit.

                  Second, your excuse for Python's shit performance by saying "use another tool" -- just after you said "neither the compiler nor the bytecode interpreter are 'slow'" -- is an extremely sad attempt to move the goalposts.

              • by Megane ( 129182 )

                It still spent 10% of CPU time in malloc(), free() and similar functions because Python put everything in short-lived heap objects.

                So you're saying it was still better than Smalltalk?

              • by orlanz ( 882574 )

                On a different project, we had a simulator that had all the heavy processing in C++ and just some "business logic" (database and IPC) in Python. It still spent 10% of CPU time in malloc(), free() and similar functions because Python put everything in short-lived heap objects.

                Add C there and you've got the standard Python development model since inception. That's more or less what the Python community actually recommends. Did you do this for your byte processing too? I am not sure why you picked Go of all languages to highlight Python's weakness. The darn thing can't decide if it should or should not do duck typing, and its error stack is... well, basically non-existent. But it has its place and I think it is good at it, so I respect it.

                As the GP said, Python's core is actua

                  • There should not be one language to rule them all.
                    Depends on what you are doing.
                    Depends on team size.
                    Depends on infrastructure.

                    And so on, and so on.

                    If you work for a research institute, most likely the language to choose is R, Python or MATLAB. That does not mean that the "backend" stuff is not written in Fortran. And it also does not mean that you have no cloud ... or whatever.

                    Choosing the right language is an elitist point of view, and for 90% of all programmers no option at all. The language is set in stone already.

                  • For my work C#/.NET Core (NOT the older closed-source C#/.NET Framework) makes a very nice go-to language for most of what I do that isn't SQL.

                    If I had to do something different it very well might be the combination of Python and a very small bit of C for now, hopefully to be replaced with Python + Rust when and if I ever learned Rust well enough. I'm painfully aware of the limits of Python's scalability for large-ish projects but I believe they can be mitigated somewhat, in my line of work at least, by b

                • by Entrope ( 68843 )

                  No, my argument is only that Python is a marginally useful scripting language, and that anyone who calls it "fast" -- except for writing high-level flow control -- doesn't know what they are talking about.

              • in large part because every assignment creates a new dynamically typed object on the heap.
                That does not sound very likely.

                Perhaps you meant something different?

            • It was not an ad hominem.
              If anything, it was an (attempted?) insult.

              Did your account get hacked? AFAIR "gweihir" roughly knows logical fallacies ...

        • neither the compiler nor the bytecode interpreter are "slow". They are pretty fast for the feature-set they implement.

          Perl is doing the same job as Python, but it's about four times as fast. So yes, Python absolutely, positively, provably is slow AF, even by the standards of scripting languages.

          • Perl is doing the same job as Python
            Except that both start with a P, both have support for OO programming and both are considered scripting languages?
            Otherwise simply: nope

            Nothing in common at all.

            The Python VM was written around 1990 ... Perl slightly later. Both focused on completely different things.

            Looking at this: https://programming-language-b... [vercel.app] your claim makes no sense. Perl did not win a single benchmark.

            Bottom line I favour speed of development and readability over the absence of both.

            You can al

        • Slow is a relative term, and relative to other languages with similar features, CPython is slow. PyPy is much faster, on par with V8, and I am surprised it has not gained more traction.
          • by cpurdy ( 4838085 )
            Probably because it can't run apps written in Python? Look, if you could just swap runtimes and have stuff run faster, people would do that. But compatibility matters, and PyPy can't do it.
        • I guess the people trying to do performance optimization here have no clue what they are doing.
          Actually they do.
          They optimize a subset of *usages* of Python. With. Great. Success.

          All too common.
          Yes, all too common: you didn't even read the summary :P

    • Re: (Score:2, Troll)

      by david.emery ( 127135 )

      Life is easy when you get to pick which requirements you implement, and ignore anything that's 'hard'...

      See also "Pareto Principle"...

      • by gweihir ( 88907 )

        Put another way, life is easy if you ignore all the hard parts and do not actually solve problems right. Completely unacceptable in tool making, of course.

  • by TJHook3r ( 4699685 ) on Saturday April 01, 2023 @07:35PM (#63417746)
    So, how is this different to a strongly-typed language and why not just use that language in the first place?
    • by caseih ( 160668 )

      You might use it if you had a code base in Python. And those who have code bases in Python have their reasons for using it instead of Java. Personally I have a lot of Python code, and I would never think of using Java for any of it, and would never want to. However I also wouldn't have much use for Codon either.

      • by f00zbll ( 526151 )

        I have some utilities for TensorFlow written in Python, but it is soooo painfully slow that I reimplemented it in Java. I thought it might be 10-100x faster. After I rewrote the code and benchmarked it, it turned out to be 500x faster than Python.

        Some things are good in Python as long as you don't need performance. If memory and performance matter, write it in some other language. There are nice things about Python; performance isn't one of them.

      • by dfghjk ( 711126 )

        "And those who have code bases in Python have their reasons for using it instead of Java."

        Performance is not one of those reasons.

        "Personally I have a lot of Python code, and I would never think of using Java for any of it, and would never want to. "

        And those are the two choices.

        "However I also wouldn't have much use for Codon either."

        Wonder what they do with Python's wonderful threading? Type checking isn't the only reason performance sucks.

    • by gweihir ( 88907 )

      Indeed. Sounds like coder incompetence to me. All too common these days. A real craftsman uses the appropriate tool for the job, rather than trying to make slap-dash modifications to the only tools he knows.

    • A strongly typed language... like Python?

  • by gweihir ( 88907 ) on Saturday April 01, 2023 @09:09PM (#63417834)

    Python is specifically designed for dynamic typing (runtime checks). Reducing that to static typing (compile-time checks) does not make a fast Python compiler. It either makes a compiler for _another_ language, or one that is faster only under very specific conditions.

    Python is a scripting language and as such not intended for high-performance applications. What you do if you need high performance but still want to use Python is use Python as glue and for non-performance critical parts and do the high-performance stuff in Python modules implemented in C. It is really not that hard to do.
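
    As a sketch of that glue pattern (using ctypes rather than a hand-written extension module; the library and function names here are hypothetical):

      # Python stays the glue; the hot loop lives in a compiled shared library.
      # "libfastcrc.so" and fast_crc16() are made-up names for illustration.
      import ctypes

      lib = ctypes.CDLL("./libfastcrc.so")
      lib.fast_crc16.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
      lib.fast_crc16.restype = ctypes.c_uint16

      def crc16(data: bytes) -> int:
          return lib.fast_crc16(data, len(data))   # heavy lifting done in C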

    • by Anonymous Coward

      And yet, if you talk to Python evangelists, they insist there's nothing wrong with Python.

      I've said this about a dozen times thus far. Python, performance-wise, compiled with Nuitka, using numpy and numba, with the aid of GPU to do all the rendering, is 1/4 as fast as VB6 doing its rendering entirely on the CPU for *exactly* the same application. I wrote both, and have wrung as much performance as possible out of each. Moreover the VB6 compiler is terrible; doesn't even do fundamental optimizations like

      • by functor0 ( 89014 )

        Python, performance-wise, compiled with Nuitka, using numpy and numba, with the aid of GPU to do all the rendering, is 1/4 as fast as VB6 doing its rendering entirely on the CPU for *exactly* the same application.

        Call me skeptical here; the only way what you said could be true is if your innermost loop, where the bottleneck is, is doing dynamic Python stuff. Anyone using numpy and the GPU properly must be faster than VB6. Crap. This is an April Fools' post, isn't it!?

        • by gweihir ( 88907 )

          Most people do not understand how to optimize code or how to read or run benchmarks. I guess you are replying to another one of those.

          • by ceoyoyo ( 59147 )

            Most people also don't understand how to use interpreted languages. They write the dirtiest for loop they can and then say "look, it runs so much faster in {COMPILED_LANGUAGE_OF_CHOICE}!"

      • I'm trying to figure out what you mean by "1/4 as fast". Is the run time 25% or 400% ?
    • by Dwedit ( 232252 )

      "Very Specific Conditions"

      Within a loop, the type of a variable is unlikely to change; that would be the hot path of your code.

      Greatly accelerating the execution speed of a scripting language has already been done before; see JavaScript.

      • It may be unlikely to change, but any compile-time optimization needs mathematical proof that it can never change. If the analysis cannot determine that it can never change, then it can never fully optimize it.
        • by gweihir ( 88907 )

          Ah, no. If that was the gold standard, you would need to throw away most languages and the ones remaining would largely be unusable for other reasons.

          • Statically typed languages can literally do this. In limited cases in C++, de-virtualization can also be determined at compile-time too.

            And, for better or worse, compilers are allowed by the standard to make optimizations based on the valid assumption that types of variables don't change. In GCC and Clang, you'd have to turn off strict aliasing to suppress those optimizations.
            • by gweihir ( 88907 )

              What compilers do does not qualify as "mathematical proof". Those require a lot more. Compilers start with a rather strict (and typically never mathematically proven) set of constraints they assume to be true and then derive some properties, often again using techniques whose validity has not been mathematically proven either.

              I get what you are trying to say, but "mathematically proven" is an _extremely_ high requirement. Using that to make your argument sound more valid just makes it sound bombastic and ov

              • Mathematical proof merely requires that something follows from a given set of axioms. If the type of the variable literally cannot be changed because of the language rules, then it is part of that axiomatic system. The proof that the type cannot change is then trivially derived from that axiom.
                • by gweihir ( 88907 )

                  There is no "set of axioms" that a real-world compiler could use efficiently for this task. I take it you have no experience with mathematically proving software properties. I have and it is basically always a conditional proof (i.e. strictly speaking just a plausibility argument and not a proof at all) relative to a rather large set of assumptions (which are _not_ axioms). Even then, just running some algorithm is not a mathematical proof unless the correctness of said algorithm has been proven mathematic

                  • for (int i = 0; i < 90; i += do_something());

                    In C, for example, it is easy to prove that the type of i does not change. There are no assumptions. In this loop, it is axiomatic that i is an int for all of eternity. The compiler forbids any redefinition of its type, and stops your program from compiling if you try. That is effectively a mathematical proof. It just doesn't have the trappings of mathematical notation.

                    I think you forgot the context of what I was responding to and seem to be assuming t
                    • by gweihir ( 88907 )

                      As I thought, you have no clue what you are talking about regarding mathematics. Please stop using the term "mathematical proof". What you use is called "hand waving", not proving something.

                      Now, if you want to reduce your ignorance, you can look up "Hoare logic" or "wp calculus". Note that these do not apply to real languages unless the compiler has been completely formally verified and the language itself has been completely formally specified.

                      Just to play devil's advocate:
                      Assuming we are talking about C, the C compiler can implement the storage of variables in different ways as long as the required language conditions are met, such as the minimum size of an int being based on hardware details.

                      The C compiler could store variables with a type byte (for example) and it could reference that value for various purposes during execution.

                      In addition, because of C's ability to access memory with few constraints, the do_something() function could alter the type byte
    • What you do if you need high performance but still want to use Python is use Python as glue and for non-performance critical parts and do the high-performance stuff in Python modules implemented in C. It is really not that hard to do.

      It's not that hard in theory, but people keep writing the heavy lifting code in Python so they don't have to learn another language, and then it just keeps not getting replaced because it's "working"

      • by gweihir ( 88907 )

        Well, it is not hard in practice (if you are competent) either. I have done it several times.

        That said, sure, if it works, do the heavy lifting in Python. But do not complain about it being "slow". That is on you for using the wrong tool for the job.
        After all, you can use a hammer to cut down a tree. But that this takes forever is really not the fault of said hammer. I prefer to use a saw for this.

        • The people doing this work (the data processing) aren't professional programmers, they're biologists. They're trying to do genetics, not spend their time writing optimal C++ code because it's faster. They ARE using the right tool for the job: it's the programming language that they know and can use and has worked well for everything else they've done up to this point.

          To reduce Python to a hammer is disingenuous; it's a multitool, like most progr

            The people doing this work (the data processing) aren't professional programmers, they're biologists. They're trying to do genetics, not spend their time writing optimal C++ code because it's faster. They ARE using the right tool for the job [...] Programmer productivity matters too, that's why there are many different languages.

            Their inability to do the work in a more performant language doesn't make them magically use the right tool for the job; rather, it prevents it. People are noticing Python's poor performance because it's a problem, and it's a problem because they're using the wrong language. Maybe they can learn about data types and use this tool and then it will be the right language. Or maybe there's a better, third solution, and I just don't know what it is.

            I guarantee these biologists are better programmers than I am a geneticist.

            Sure, but that doesn't speak to the question of whether it makes s

            I'm not going to comment on Python specifically and how slow of a language it is--I think we agree that it is--but for people who aren't programmers, picking up the first language was hard enough, let alone deciding to write in a second, considerably more performant language, one that lacks the plugins and community support that Python already has. Someone out there has probably already coded something similar to what a newcomer wants to do, and it is non-trivial to throw that sort of support away. Lang

            • Python is slow even by the standards of modern scripting languages! It's objectively slow.
              Sorry. You are simply and utterly wrong.

              No one is writing a for loop in Python to do a matrix multiplication, unless he is a student of programming languages learning how to write nested loops.

              They write the:
              - load this file, load that file
              - split it into this kind of array
              - pipe it into this Fortran or C++ library

              in Python (roughly the pattern sketched below).

              Everything else is done in C/Fortran/C++.

              You have no clue about the topic.

              Python is used like s
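
              A minimal sketch of that workflow (file names are made up); Python only loads and reshapes the data, while the number crunching runs inside NumPy's compiled C/Fortran kernels:

                # Glue code: load, split, and hand the arrays to compiled kernels.
                import numpy as np

                data = np.loadtxt("measurements.csv", delimiter=",")  # load this file
                left, right = data[:, :3], data[:, 3:6]               # split into arrays
                result = left @ right.T                               # BLAS does the work
                np.save("result.npy", result)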

    • by tepples ( 727027 )

      What you do if you need high performance but still want to use Python is use Python as glue and for non-performance critical parts and do the high-performance stuff in Python modules implemented in C. It is really not that hard to do.

      Building modules written in C for Python for Windows requires Microsoft Visual C++ from the Windows SDK, whose installer is the Visual Studio installer. And I've found in my own personal and professional work that getting collaborators (either coworkers or contributors to a hobby project) to install things through the Visual Studio installer isn't very practical. This is why I'm told a lot of Python devs try to nail projects together with the hammer they have, which is NumPy.

      • by gweihir ( 88907 )

        Who does any real computing work on _Windows_? On Linux this is rather simple to do.

        • by tepples ( 727027 )

          I develop video games and tools to build video games, and I've worked on projects that began several years before support for graphical applications in WSL 2 was available. Many of the tools for creating parts of a video game other than the program itself, such as the graphics, are not quite Linux-first. In addition, I work remotely and am therefore not able to troubleshoot dual-boot installations of popular GNU/Linux distributions performed by coworkers two provinces/states away. I've found it a lot easier

          • by gweihir ( 88907 )

            Well, IMO either people use the right tools for the job or stop complaining. It is a poor craftsman that blames his tools and all that.

            That said, there are tons of poor craftsmen in the software space. Any real coder should have no trouble doing a Linux dual-boot config themselves.

            • by tepples ( 727027 )

              Any real coder should have no trouble doing a Linux dual-boot config themselves.

              The thing is, my coworkers aren't coders. They're pixel artists, level designers, and the like. They create images to be displayed in the game using proprietary tools that aren't ported to Linux, such as Pyxel Edit and Adobe Photoshop. They still need to be able to build the game that we're developing in order to test their work. Thus it's my job to make the toolchain run on both my workstation, which runs Linux, and theirs, which run Windows. I'm not aware of a tool that runs on my Linux workstation and ca

            • Or run Linux in a VM ... or on modern Windows, use the Linux "subsystem"; I forget what it is called.

              • by tepples ( 727027 )

                You're thinking of WSL, which was mentioned as not initially supporting graphical applications. A collision map editor is one example of a graphical application relevant to the described scenario.

      • I'm pretty sure you can use GCC on Windows ... worst case under Cygwin or in the Git Bash shell.

        • by tepples ( 727027 )

          MinGW-w64 (GCC on Windows) cannot compile extensions compatible with the Python interpreter distributed by Python.org. Only Visual C++ can. This is because the Python interpreter distributed by Python.org was built with Visual C++ and therefore expects extensions to use the Visual C++ application binary interface (ABI), not the MinGW-w64 ABI.

  • Static type checking catches a few mistakes here and there but it has very little application to fixing real problems in code. Especially when you have dynamic code which creates objects at run time based on the environment it finds itself in.

    Advocating for static typing (as opposed to strong typing) is generally indicative of wishful thinking.
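
    For readers who conflate the two terms, a quick illustration of the difference (Python is strongly but dynamically typed):

      # Strong typing: no silent coercion between unrelated types.
      try:
          "1" + 1
      except TypeError as err:
          print("rejected at runtime:", err)

      # Dynamic typing: the same name may later hold a value of another type.
      x = 1
      x = "one"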

    • by Tupper ( 1211 )

      This is definitely not my experience. Programs in a typed style in a strongly typed language tend to click into place at a certain point after which they have a lower rate of defects.

      This does not happen in Java. Mayhaps it or its libraries are insufficiently typed.

      • by pz ( 113803 )

        This is definitely not my experience. Programs in a typed style in a strongly typed language tend to click into place at a certain point after which they have a lower rate of defects.

        In my undergraduate course in software engineering, we were required to use an experimental strongly typed language that was particularly persnickety.

        We would say that it was so difficult to satisfy the compiler that if your code compiled, chances are it would work as intended.

      • This does not happen in Java. Mayhaps it or its libraries are insufficiently typed.
        Java is:
        a) strongly typed
        b) statically typed

        To get problems with types, especially when using libraries, you need to do pretty obscure things. And those will cause runtime exceptions that tell you exactly what you did wrong.

        Stupid Java haters ...

  • The article's headline at ieee.org is an affirmation, but here it is transformed into a headline with a question mark. By Betteridge's law, then, the answer is clearly "no." Thus fulfilling the editor's intention of saying this is bullshit and, as a consequence, producing lots of comments on Slashdot, by Python zealots of course, because nobody gives a shit, since everybody knows that for performance Perl beats the crap out of Python/Codon/C/C++, all of them together.

    Another thing... notice that it's about Pytho
    • Perl had the advantage of being early.

      There are several very fast Python runtimes but they exist inside high-frequency trading corporations where they remain a competitive advantage.

      This guy who developed this new compiler can probably get a job there after graduation, for $450K/yr to start.

      Or he can stay in academia for $55K and upstream some patches.

      Perl's B-code system was done before any of this situation started. That's fortunate.
