'Codon' Compiles Python to Native Machine Code That's Even Faster Than C (mit.edu) 124

Posted by EditorDavid on Sunday March 19, 2023 @12:34AM from the get-with-the-programming dept.

Codon is a new "high-performance Python compiler that compiles Python code to native machine code without any runtime overhead," according to its README file on GitHub. Typical speedups over Python are on the order of 10-100x or more, on a single thread. Codon's performance is typically on par with (and sometimes better than) that of C/C++. Unlike Python, Codon supports native multithreading, which can lead to speedups many times higher still.
Its development team includes researchers from MIT's Computer Science and Artificial Intelligence lab, according to this announcement from MIT shared by long-time Slashdot reader Futurepower(R): The compiler lets developers create new domain-specific languages (DSLs) within Python — which is typically orders of magnitude slower than languages like C or C++ — while still getting the performance benefits of those other languages. "We realized that people don't necessarily want to learn a new language, or a new tool, especially those who are nontechnical. So we thought, let's take Python syntax, semantics, and libraries and incorporate them into a new system built from the ground up," says Ariya Shajii SM '18, PhD '21, lead author on a new paper about the team's new system, Codon. "The user simply writes Python like they're used to, without having to worry about data types or performance, which we handle automatically — and the result is that their code runs 10 to 100 times faster than regular Python. Codon is already being used commercially in fields like quantitative finance, bioinformatics, and deep learning."

The team put Codon through some rigorous testing, and it punched above its weight. Specifically, they took roughly 10 commonly used genomics applications written in Python and compiled them using Codon, and achieved five to 10 times speedups over the original hand-optimized implementations.... The Codon platform also has a parallel backend that lets users write Python code that can be explicitly compiled for GPUs or multiple cores, tasks which have traditionally required low-level programming expertise.... Part of the innovation with Codon is that the tool does type checking before running the program. That lets the compiler convert the code to native machine code, which avoids all of the overhead that Python has in dealing with data types at runtime.

"Python is the language of choice for domain experts that are not programming experts. If they write a program that gets popular, and many people start using it and run larger and larger datasets, then the lack of performance of Python becomes a critical barrier to success," says Saman Amarasinghe, MIT professor of electrical engineering and computer science and CSAIL principal investigator. "Instead of needing to rewrite the program using a C-implemented library like NumPy or totally rewrite in a language like C, Codon can use the same Python implementation and give the same performance you'll get by rewriting in C. Thus, I believe Codon is the easiest path forward for successful Python applications that have hit a limit due to lack of performance."

The other piece of the puzzle is the optimizations in the compiler. Working with the genomics plugin, for example, will perform its own set of optimizations that are specific to that computing domain, which involves working with genomic sequences and other biological data, for example. The result is an executable file that runs at the speed of C or C++, or even faster once domain-specific optimizations are applied.

Quant Finance (Score:5, Interesting)

by igreaterthanu ( 1942456 ) writes: on Sunday March 19, 2023 @01:05AM (#63381737)

I work in the HFT world, and at my previous employer we had a system basically just like that that took Python code using numpy, etc. and compiled it to heavily optimized native binaries that could be called from C++. We had it years ago. Of course stuff like this is a competitive advantage so obviously not published.

- Re: (Score:2)
  
  by locater16 ( 2326718 ) writes:
  
  That's it, that's all the HFT stuff is?
  Here I thought I'd be impressed. Should've hired gamedevs, who measure optimization times in microseconds. The original Last of Us was partially hand coded in assembly just to get it to run. Somehow I pictured the HFT guys as being equivalent because, surely, right?
  - Re: (Score:2)
    
    by bill_mcgonigle ( 4333 ) * writes:
    
    > Should've hired gamedevs, who measure optimization times in microseconds.
    Many of the Linux drivers were fixed or tuned by HFT firms to get faster order execution. At 10Gbps we're into nanoseconds if my tired-math is right.
    We all benefit from that work.
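The "tired-math" above can be checked with a few lines of arithmetic. This is only a rough sketch: it counts the time to clock bits onto a 10 Gbps link and ignores preamble, inter-frame gap, and everything else in the path. The function name and the two packet sizes are illustrative choices, not taken from the comment.

```python
# Serialization time on a 10 Gbps link (wire time only, no other latency).
LINK_RATE_BPS = 10_000_000_000   # 10 Gbps
NS_PER_SECOND = 1_000_000_000

def serialization_time_ns(num_bytes, rate_bps=LINK_RATE_BPS):
    """Time to clock num_bytes onto the wire, in nanoseconds."""
    return num_bytes * 8 * NS_PER_SECOND / rate_bps

bit_time_ns = NS_PER_SECOND / LINK_RATE_BPS      # time for a single bit
small_packet_ns = serialization_time_ns(64)      # a 64-byte packet
large_packet_ns = serialization_time_ns(1500)    # a 1500-byte packet

print(bit_time_ns, small_packet_ns, large_packet_ns)
```

One bit takes about 0.1 ns, a 64-byte packet about 51 ns, and a 1500-byte packet about 1.2 microseconds, so "into nanoseconds" holds for small packets.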
  - Re: (Score:2)
    
    by tippen ( 704534 ) writes:
    
    That's it, that's all the HFT stuff is?
    Lol, no. Even years ago, serious HFT performance work moved down into FPGAs to do hardware acceleration.
    - Re: (Score:3)
      
      by igreaterthanu ( 1942456 ) writes:
      
      FPGAs actually execute the trading, but they don't do the pricing.
  - Re:Quant Finance (Score:4, Insightful)
    
    by igreaterthanu ( 1942456 ) writes: on Monday March 20, 2023 @01:03AM (#63384043)
    
    Typically there's a heavily optimized model, written in Python but compiled into native code, that generates prices, but even with all the optimization it's too slow to be used for trading. For example, it might come up with a price every 50-2000ms depending on the model. The model will also create "greeks", which are just derivatives, in the mathematical sense, of the price with respect to another factor. Then it emits this price + the greeks to an FPGA as often as it can generate them, and the FPGA uses math no more complex than linear regression to compute a price to buy low and sell high at, until the model is updated. There's a lot of complexity in the FPGA, but all the magic happens in the pricing engine - they have some crazy smart people coming up with new ways to predict what will happen next.
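The split described above (a slow model emitting a price plus greeks, and a fast path applying simple linear math until the next update) can be sketched roughly as a first-order approximation. Everything here is an assumption for illustration: the names `ModelSnapshot` and `fast_price` are hypothetical, only a single greek (delta) is used, and a real system would run this step in hardware rather than Python.

```python
from dataclasses import dataclass

@dataclass
class ModelSnapshot:
    """Hypothetical output of the slow pricing model at one point in time."""
    ref_underlying: float  # underlying price when the model ran
    price: float           # model price at ref_underlying
    delta: float           # d(price) / d(underlying) at ref_underlying

def fast_price(snap: ModelSnapshot, underlying_now: float) -> float:
    """Linear (first-order) update of the model price between model runs."""
    return snap.price + snap.delta * (underlying_now - snap.ref_underlying)

# The slow model ran when the underlying was at 100.0.
snap = ModelSnapshot(ref_underlying=100.0, price=5.0, delta=0.5)

# The underlying ticks to 101.0 before the model has produced a new snapshot.
print(fast_price(snap, 101.0))
```

The fast path does one multiply and one add per tick, which is the kind of work that maps well onto an FPGA. The approximation gets worse the further the market moves from the snapshot, which is why the model keeps re-emitting prices and greeks as often as it can.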
    
- - Re: (Score:2)
    
    by shibbie ( 619359 ) writes:
    
    Why would they do that since i) numpy is more well known and therefore usable by more devs without training, ii) they have a Codon-like native compiler and thus don't need regular "tuned" python code optimisations since the compiled difference is likely now negligible?
    - Re: Quant Finance (Score:2)
      
      by cowwoc2001 ( 976892 ) writes:
      
      For the same reason that you wouldn't write an entire application in bash. Performance is not the only problem.
no (Score:5, Funny)

by Anonymouse Cowtard ( 6211666 ) writes: on Sunday March 19, 2023 @01:19AM (#63381745) Homepage

I'm waiting for the peer review. Nothing is faster than c.

- Re: (Score:2)
  
  by CaptQuark ( 2706165 ) writes:
  
  I wonder if someone wrote a compiled BASIC if it would run faster than the interpreted version? /s
  - Re: no (Score:3)
    
    by 1s44c ( 552956 ) writes:
    
    VMS had a compiled basic. It worked great.
    - Re: (Score:2)
      
      by Crashmarik ( 635988 ) writes:
      
      There are lots of compiled basics. I even remember one designed for numerical analysis.
      There were even things like the Hauppauge hardware/libraries for the old ISA machines.
    - Re: (Score:1)
      
      by christoban ( 3028573 ) writes:
      
      I think you're missing his point, which is that of course a compiled version is going to run faster.
  - Re: (Score:2)
    
    by No Longer an AC ( 4611353 ) writes:
    
    Yes. I vaguely remember one for the Apple ][ although I can't remember the name.
    I also remember playing Akalabeth and digging into the code. It was a combination of Integer BASIC and assembly. It was the first complex code I ever saw and I wonder what I would think of it now if I found a copy. Was Lord British a genius or a mad adventure game software developer?
    Whatever he is it was one factor in my desire to pursue a career in IT.
    - Re: (Score:2)
      
      by UnknownSoldier ( 67820 ) writes:
      
      There were many BASICs for the Apple 2, some compiled, some not. In alphabetical order:
      * AppleSoft
      * Beagle Compiler
      * Blankenship BASIC
      * CBASIC (CP/M)
      * Hayden Basic Compiler
      * MD-BASIC (Morgan Davis)
      * Micol Advanced BASIC
      * Microsoft BASIC
      * TASC [microsoft.com] (Microsoft's The Applesoft Compiler)
      * ZBASIC Compiler
      > Was Lord British a genius or a mad adventure game software developer?
      Both. His Ultima series had a huge influence on Western RPG game design. Sadly lately he has turned into a grifter [reddit.com] with Shroud of the Avatar
  - Re: (Score:1)
    
    by shadowwynd ( 6310460 ) writes:
    
    I used TurboBasic back in the day - it was a BASIC compiler and got pretty good performance. The cool thing about it was that you could also do inline assembler for when you really needed to tweak some speed. It was with TurboBasic I learned that data types really matter - going from the default (undefined) numeric type, which was a single float, to defined integers for some graphics code enhanced performance by about 2000% or so.
    - Re: (Score:2)
      
      by LifesABeach ( 234436 ) writes:
      
      request.
      that codon be a drop-in replacement.
      i like being able to debug inline.
      to convert my solution to a python callable binary would be like icing on the cake
  - Re: (Score:2)
    
    by hawk ( 1151 ) writes:
    
    it was done on various systems in the 80s, and possibly before.
    Someone had an AppleSoft Compiler, and Microsoft itself had an almost-compatible compiler for MBASIC.
    CP/M had CBAS2 which was not, iirc, compatible with anything else but the most generic BASIC.
    TOPS-20 (and I presume -10) compiled before executing, even in the interactive mode.
    On older BASIC implementations, just not having to scan through the lines on every GOTO or GOSUB was an easy performance gain. I think that MBASIC 5 started sticking refer
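The GOTO point above can be illustrated with a toy sketch: an interpreter that scans the program for the target line on every jump, versus one that resolves line numbers once up front. This is a generic illustration only, not a reconstruction of what any particular BASIC did; the program contents and function names are made up.

```python
# A toy "program": (line number, statement text) pairs in program order.
program = [(10, "PRINT"), (20, "LET"), (30, "GOTO 10")]

def find_line_scan(program, target):
    """Interpreter-style lookup: walk the lines until the target is found."""
    for index, (number, _statement) in enumerate(program):
        if number == target:
            return index
    raise KeyError(target)

def build_line_table(program):
    """Compiler-style lookup: resolve every line number to an index once."""
    return {number: index for index, (number, _statement) in enumerate(program)}

line_table = build_line_table(program)

# Both approaches agree on where line 30 lives; only the cost per jump differs.
print(find_line_scan(program, 30), line_table[30])
```

The scan costs time proportional to the program length on every jump, while the table lookup is paid for once and is then roughly constant per jump.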
- Re: (Score:3)
  
  by Dwedit ( 232252 ) writes:
  
  Straight assembly doesn't have the overhead of conforming to someone's ABI. You have the freedom to have functions share registers directly.
  But once you call C code, you're back to following the ABI.
  - Re: (Score:3)
    
    by drnb ( 2434720 ) writes:
    
    Straight assembly doesn't have the overhead of conforming to someone's ABI. You have the freedom to have functions share registers directly.
    But once you call C code, you're back to following the ABI.
    Whether it is C calling assembly, or assembly calling C, only the assembly at the interface needs to comply with the ABI; your own assembly is still free to interact however you want.
- Re: (Score:2)
  
  by snikulin ( 889460 ) writes:
  
  FORTRAN
- Re: no (Score:3)
  
  by ArmoredDragon ( 3450605 ) writes:
  
  Traditionally fortran is considered to be faster than C. As in all things though, it really depends on the application, the programmer, and whichever language is able to best express the particular problem being solved.
  That said, it wouldn't be that surprising if it was faster than C for some very specific things. The name codon in particular hints to me that the compiler maintainers probably have applications specific to biology in mind, so it might be best at optimizing around problems specific to biology,
  - Re: (Score:2)
    
    by Dutch Gun ( 899105 ) writes:
    
    I read through the summary, and from what I could glean, their claim is that they have "domain-specific optimizations", which strikes me as talking about specific libraries being very specifically optimized, maybe even talking about utilizing GPU or other highly parallel code paths, etc. So in that *very* specific case, you might be able to claim "faster than C", but that's not a very apples-to-apples comparison, really.
    It really makes zero sense for them to claim "faster than C" in the general sense, beca
    - Re: no (Score:4, Informative)
      
      by real_nickname ( 6922224 ) writes: on Sunday March 19, 2023 @04:12AM (#63381883)
      
      that's claiming to beat out modern C/C++ compilers and back-ends, which have decades and decades of collective optimization work going into them.
      They use LLVM, so their native code generation has the same level of optimizations as clang, for example. In the FAQ:
      Codon can sometimes generate better code than C/C++ compilers for a variety of reasons, such as better container implementations, the fact that Codon does not use object files and inlines all library code, or Codon-specific compiler optimizations that are not performed with C or C++.
      C++ has no shortage of container implementations (I guess they are comparing against the STL here), linkers can also do global optimizations across object files, and the Codon-specific optimizations (i.e. GPU, OpenMP) are beside the point too (many C/C++ tools allow the same kind of semi-automatic vectorization).
      
      - Re: (Score:3)
        
        by Dutch Gun ( 899105 ) writes:
        
        They use LLVM, so their native code generation has the same level of optimizations as clang, for example.
        Thanks, I read the summary but missed that. Sort of an important point, as LLVM leverages a lot of existing micro-optimization work. Also, STL containers are famously not optimal. They're decent in the general case, but it's not hard to out-perform them.
        Reading more, the headlines are a lot more click-bait-ish (why am I not surprised). They tend to claim more modest speedups in general, and only claim "faster than C" in some very specific cases. The summary and headline, of course, make it sound like t
- Re:no (Score:5, Informative)
  
  by v1 ( 525388 ) writes: on Sunday March 19, 2023 @02:12AM (#63381785) Homepage Journal
  
  I'm waiting for the peer review. Nothing is faster than c
  Assembly can be faster than C. HOW much faster is entirely dependent on the compiler, and the structure of the C. If you don't know what you're doing, it's possible to write C that the compiler can't optimize well and will run substantially slower than assembly, but usually the C compiler does at least a pretty good job.
  I've written a lot of assembly in my time, and back then there were no optimizing compilers. 100% of the optimization was done by the programmer. And as a result, assembly can be screaming fast (and mind-bogglingly efficient on program size and RAM required) even on slow hardware. When you're "programming with sticks and rocks" as I used to say, you can squeeze out every last unnecessary CPU cycle. (while also making use of pretty much every bit of memory) There's a reason old programs were measured in kilobytes and ram was measured in megabytes.
  So this all depends on the quality of the compiler. MOST other languages nowadays compile to C, and let the (very old, VERY well optimized) C compiler generate the assembly for final compilation to machine code. IN THEORY this means they themselves don't have to do much optimization, they leave it to the C compiler to clean up the mess they make. So, eliminating that go-between has the potential to be faster, although I'm still a bit skeptical. Those C compilers have been around for so long, and have been tweaked so heavily, it's a tall order to make something that compiles directly to assembly that can match their optimization. The only way you're going to pull that off is if you know the source language and can optimize from one step back, leading to what could be more efficient and faster assembly. But it's a lot of work to get there, and you're up against decades of work that's been put into that C compiler. Not saying that you can't win, but the C compiler definitely has the home field advantage over you.
  I also don't buy that "we're optimizing for the specific hardware" benefit. You can already do exactly that with the correct compiler switches, and any good IDE will offer you the option to optimize those switches as it hands it over to the C compiler to generate your object code. Your IDE should have pages of switches and options you can tweak if you know your target platform and are willing to generate object that will only run on the exact platform you intend to use.
  tl;dr: I don't agree with you, but I also don't agree with them :P
  
  - Re: (Score:2)
    
    by vyvepe ( 809573 ) writes:
    
    MOST other languages nowadays compile to C, and let the (very old, VERY well optimized) C compiler generate the assembly for final compilation to machine code.
    I doubt it is still so.
    Nowadays it is compiled to some kind of intermediate language which is specific for a compiler suite.
    E.g. LLVM frontends compile to LLVM IR (intermediate representation) which is a kind of strongly typed assembly where e.g. calling conventions are still abstract. GCC suite front ends compile to GENERIC, GIMPLE or RTL; all still higher level than assembly which is specific for some target architecture.
    The intermediate languages are then compiled to the specific target by backend par
  - Re: (Score:1)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
  - IBM and Compilers 101 (Score:2)
    
    by Canberra1 ( 3475749 ) writes:
    
    IBM had this sorted decades ago. Write in target language - it could be C, COBOL or FORTRAN. See https://www.ibm.com/docs/en/op... [ibm.com]. Better yet, in the link phases there are memory pool options (IBM MVS/ZOS has hardware keyed memory) to pinpoint haha any memory leaks. Even better - MIT and IBM go back a LONG way before 1968. See https://en.wikipedia.org/wiki/... [wikipedia.org] so nothing new here.
- Re: (Score:1)
  
  by sageres ( 561626 ) writes:
  
  Pure assembler?
- Re:no (Score:5, Interesting)
  
  by Dutch Gun ( 899105 ) writes: on Sunday March 19, 2023 @02:58AM (#63381823)
  
  I'm waiting for the peer review. Nothing is faster than c.
  These claims often mean "faster than C if written naively, without even a basic eye for optimization, etc"
  I'm pretty dubious about the whole "hey, you don't have to worry about performance - it just magically turns into maximally optimal code", which seems wildly over-optimistic. Even when writing in C or C++, there's a vast difference between naively written code and hand-optimized code. Unless they've somehow managed to invent the world's most amazing optimizing compiler, beating the absolute pants off of, say, gcc, MSVC, and LLVM and ALL the myriad optimization work done on those over many, many years of work, this claim is hard to take seriously.
  It sounds like cool tech, but honestly, a little too breathlessly optimistic to be easily believed. Those of us who have written a lot of highly optimized native code tend to understand how many different factors must come together to make that happen. Maybe I'm being a little too cynical for my own good, but extraordinary claims need to be backed up with a LOT of proof, and "faster than C" is a hell of a bold claim.
  
  - Re: (Score:3)
    
    by The Evil Atheist ( 2484676 ) writes:
    
    The actual claim is "sometimes better than C", which is possible, because not all C programs were written for extreme efficiency.
    
    But every compiler for a dynamic language has claimed this, and the hype dies down when they can't actually beat real world workloads on a consistent basis.
    - Re: (Score:2)
      
      by serviscope_minor ( 664417 ) writes:
      
      But every compiler for a dynamic language has claimed this
      Oh boy yes they have, I think to the point where it's really poisoned things.
      I remember every year for about 15 years reading how THIS YEAR, Java was faster than C/C++. I like how they were always tacitly admitting that the previous year it wasn't in fact faster. And then it turned into, well, if you write the certain kind of algorithm that the JVM can optimize well (e.g. ignoring complex containers), and write it in a rather mangled, non idiomatic s
      - Re: (Score:2)
        
        by The Evil Atheist ( 2484676 ) writes:
        
        Even last year, I debated somebody here who claimed some fancy Java VM has put Java on par with C++.
        
        Surely, if Java were so fast, and safe dynamic languages were so productive, someone would have developed an all-Java, no native library, web browser with acceptable performance by now.
- Re: (Score:2)
  
  by real_nickname ( 6922224 ) writes:
  
  Statically typed languages are fast, dynamically typed languages are slow. CPython is ultra slow. Nothing new.
  - Re: (Score:1)
    
    by strombrg ( 62192 ) writes:
    
    Statically typed languages are fast, dynamically typed languages are slow. CPython is ultra slow. Nothing new.
    Actually, for sufficiently large inputs, AOT implementations and JIT'd implementations are the same, performance-wise. Keep in mind that a JIT has access to runtime info an AOT optimizer (usually) doesn't.
    
    But there's nothing that prevents an AOT implementation from inserting a JIT, and there's nothing that stops a JIT from doing whole-program analysis.
    
    Again: for sufficiently large inputs.
- Re: (Score:3)
  
  by serviscope_minor ( 664417 ) writes:
  
  I'm waiting for the peer review. Nothing is faster than c.
  Java's faster than C this year, every year since 1995. The only thing that ever reliably beats C is FORTRAN.
  - Re: (Score:2)
    
    by Courageous ( 228506 ) writes:
    
    FORTRAN mainly beats C because the main, non-awkward way to use multi-dimensional matrices in C results in the matrix being allocated to non-contiguous memory. This is performance relevant, because this will cause the execution pipeline to stall.
    Fortran fixes this. If you use the right keyword in C#, it also fixes it. The NumPy library, which uses an external C library, also fixes it, but that library is built using the "awkward" method I mentioned.
    - - Re: (Score:1)
        
        by strombrg ( 62192 ) writes:
        
        Utter BS. The C Standard requires arrays to be allocated contiguously. int A[x][y][z] is laid out just like Fortran would lay it out (except, of course, in row-major instead of column-major order). If you're talking about arrays of pointers, those aren't multi-dimensional arrays (even if the access syntax appears the same).
        No, it's not "utter BS".
        
        in C if you want to pass an array to a function, you either need an array of pointers to arrays, or you need to act sort of Pascal-ish and treat the dimensions as part of the type. Most C programmers would opt for the former, not the latter.
        
        Granted, it's been decades since C was my favorite language. Maybe the situation has improved?
        
        Re: (Score:2)
        
        by laughing_badger ( 628416 ) writes:
        
        Nope it's always been func( int* p_array, int n_xdim, int n_ydim, int n_zdim).
        Then loop over the known dimension sizes and do arithmetic to find out the offset of an element from p_array. You _can_ do [z][y][x] with arrays of pointers, but nobody actually does.
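The "do arithmetic to find the offset" step can be sketched as follows. Python is used here only to show the index math, with a flat list standing in for the contiguous block that `p_array` points at; the helper name `flat_offset` and the dimension values are made up for the example, and the layout shown is row-major, as in C.

```python
def flat_offset(x, y, z, n_ydim, n_zdim):
    """Row-major offset of element [x][y][z] in a contiguous block."""
    return (x * n_ydim + y) * n_zdim + z

n_xdim, n_ydim, n_zdim = 3, 4, 5

# One contiguous block; each element stores its own offset so lookups are easy to check.
flat = list(range(n_xdim * n_ydim * n_zdim))

# The same data as nested lists, i.e. the "array of pointers" shape.
nested = [
    [[flat[flat_offset(x, y, z, n_ydim, n_zdim)] for z in range(n_zdim)]
     for y in range(n_ydim)]
    for x in range(n_xdim)
]

# Element [2][3][4] is found at offset 2*4*5 + 3*5 + 4 in the flat block.
print(flat[flat_offset(2, 3, 4, n_ydim, n_zdim)], nested[2][3][4])
```

Both lookups return the same element. The flat version needs the dimensions passed alongside the data, which is the bookkeeping the thread is arguing about, while the nested version pays for an extra indirection per dimension.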
        
        Re: (Score:2)
        
        by Courageous ( 228506 ) writes:
        
        I looked into this. I'm in the same boat as you. They updated the C standard in 99 (C99) to solve the double referencing problem of dynamically allocated arrays. So my comment described an advantage that FORTRAN hasn't had for about 25 years. LOL. Anyway, it's good that I was wrong; it means C programs are faster now. ;-P
        
        Re: (Score:2)
        
        by Courageous ( 228506 ) writes:
        
        Dynamic allocation required discontinuity before C99. C99 fixed this.
        
        Re: (Score:2)
        
        by Courageous ( 228506 ) writes:
        
        Can you show the declaration and dynamic memory allocation mechanism you are referring to, so that we can be sure we are talking about the same thing? Two dimensional example is fine.
        
        Re: (Score:2)
        
        by Courageous ( 228506 ) writes:
        
        So if you scroll back up to the "non awkward" reference in my OP, what "awkward" referred to was this:
        int v3 = p3[2 * y * z + 3 * z + 4];
        This is awkward. You can dynamically allocate multiple dimensional matrices in C# and dereference them unawkwardly like this:
        int v3 = p3[x][y][z];
        Python has a similar construct (through the numpy library), that is syntactic sugar for the same literal types of C constructs you are referring to. It just hides them and makes them "non awkward".
        I've no
        
        Re: (Score:2)
        
        by Courageous ( 228506 ) writes:
        
        > The "awkwardness" of the syntax is arguable
        It really isn't. You would need to store an array allocated like that in a struct, along with something tracking its cardinality, just to pass it around safely. Come on, buddy. You know it and I know it. Entire math libraries exist to ease the programmer with the abstraction management of it. Anyway, the issue is moot. You insulted me by telling me to "pick up a book" without asking what I meant by awkward, when the reality that extra book-keeping and arithmetic expr
- Re: (Score:2)
  
  by Rosco P. Coltrane ( 209368 ) writes:
  
  I'm waiting for the peer review. Nothing is faster than c.
  What do you need a peer review for if you already know that nothing is faster than C?
- Re:no (Score:5, Funny)
  
  by The Evil Atheist ( 2484676 ) writes: on Sunday March 19, 2023 @06:26AM (#63382019)
  
  c is, after all, the speed of light in a vacuum, so yes, nothing is faster than c.
  
  - Re: (Score:2)
    
    by Joey Vegetables ( 686525 ) writes:
    
    They recently found out that the speed of light in East Palestine, Ohio is greater than its speed in a vacuum. Apparently, even photons don't want to stick around and suffocate any longer than they must.
  - Re: (Score:2)
    
    by kiore ( 734594 ) writes:
    
    c is, after all, the speed of light in a vacuum, so yes, nothing is faster than c.
    Sounds like time to introduce the tachyon language to the mix ... the worse the source code, the faster it runs.
- Re: (Score:2)
  
  by Phillip2 ( 203612 ) writes:
  
  Or you could read their README. It says "on a par with C and sometimes faster".
  The headline is a poor representation of the claim.
- Re: (Score:2)
  
  by Zobeid ( 314469 ) writes:
  
  Back in the Glory Days, Forth was reputed to be faster than C. I don't know how that would play out with today's hardware and compilers, though. I guess it's moot since Forth is now a footnote.
  - Re: (Score:2)
    
    by lucasnate1 ( 4682951 ) writes:
    
    Forth code was supposedly smaller than C code back in the days where code size actually mattered. Guess that could have improved cache access or whatever. However in terms of raw computation Forth, to my knowledge, was considered slower than C, albeit not by much (about 10% I think?)
    - Re: (Score:2)
      
      by Zobeid ( 314469 ) writes:
      
      Well, what I can vaguely remember is that those 8-bit processors had few registers to work with, but they were pretty good at stack operations, which is what Forth is built around. And since you, the programmer, were manipulating the stack deliberately and explicitly, you tended to think a lot about the best way to do it.
      By default, C also allocates variables on the stack, though all the stack manipulation is concealed from the coder. Then as processors with more registers arrived (68000, etc.) the "regis
      - Re: (Score:2)
        
        by mrfaithful ( 1212510 ) writes:
        
        The register keyword still exists, but I think it's rarely used now, since compilers have become smart enough to make those decisions for us.
        I feel like it's less "smart enough" and more x86 poisoning. When 32bit Intel ISA ruled the streets a register keyword wasn't going to achieve much. You didn't have many to work with and they had specific purposes and any attempt to hold a value in a register for more than a few instructions was likely counter productive. Better to let the compiler handle it, especially since the CPU would also be doing register renaming to pipeline instructions and IMHO it's definitely better to let the compiler optimise i
- Re: (Score:2)
  
  by HiThere ( 15173 ) writes:
  
  The system says it can do automatic multi-threading in some contexts, so in those contexts it may well be "faster than c" unless you put a HUGE amount of work into tuning that C.
  (Note "faster than C" was the headline claim, not the claim I read in the linked to docs.)
- Okay, what's the catch? Then there's Julia. (Score:2)
  
  by goombah99 ( 560566 ) writes:
  
  I'm wondering what the catch is? Does it forbid certain python instructions or idioms. Does it flail if an unexpected data type is presented to a compiled subroutine? Does it have cases where it produced a different answer than python?
  Having worked with Julia I can believe you can compile an untyped language to faster than C. Julia sometimes is faster than Fortran. Julia however was rigged from the start to do just-in-time compiling when new data type signatures are presented to a function's arguments.
  - Re: (Score:2)
    
    by jma05 ( 897351 ) writes:
    
    This is more like a newer Cython + Numba.
    It is restricted Python, like Cython. It emits LLVM code instead of C/C++.
    > Does it forbid certain python instructions or idioms. Does it flail if an unexpected data type is presented to a compiled subroutine?
    > Does it have cases where it produced a different answer than python?
    Yes.
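One way a compiled dialect could give "a different answer than Python" is integer width. The sketch below assumes, and this is an assumption rather than something stated in the thread, that the compiled code uses a fixed 64-bit signed int that wraps on overflow, where CPython's int is arbitrary precision. The helper `wrap_int64` is hypothetical and only simulates that wrap-around inside ordinary Python.

```python
def wrap_int64(value):
    """Reduce an arbitrary-precision int to a two's-complement 64-bit value."""
    value &= (1 << 64) - 1                      # keep the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value

python_answer = 2 ** 63                  # CPython: exact, arbitrary precision
fixed_width_answer = wrap_int64(2 ** 63) # assumed 64-bit signed behaviour

print(python_answer, fixed_width_answer)
```

Under that assumption the same expression yields a large positive number in CPython and a negative number in the fixed-width version, so code that relies on big integers would need checking before being compiled this way.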
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  They probably compared incompetently written C to the output of their compiler. Speed of C code very much depends on who designs and writes it. A competent C coder can almost always beat a compiler from another language.
  - Re: (Score:1)
    
    by strombrg ( 62192 ) writes:
    
    People used to say the same about hand-coded assembler.
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      Still true, but C compilers have gotten to a level where you only very rarely benefit from doing or embedding assembler.
- Re: (Score:2)
  
  by Dausha ( 546002 ) writes:
  
  Since c is the speed of light, you are correct, sir.
- Re: (Score:2)
  
  by sg_oneill ( 159032 ) writes:
  
  I'm waiting for the peer review. Nothing is faster than c.
  That has always been a myth. Fortran consistently outperforms C, for instance. (In both theory and practice. There's a reason that crusty old language is still found extensively in high end simulation, HFT and so on.)
  And here's a kicker: under some circumstances Java *can* outperform it due to run-time optimization (although in practice that's usually not true)
  And then there's the whole wild world of GPU languages but that's a whole different conversatio
- Re: (Score:2)
  
  by hawk ( 1151 ) writes:
  
  >Nothing is faster than c.
  starships and gossip.
Non-Commercial Use Only? (Score:5, Informative)

by mazinger ( 789576 ) writes: on Sunday March 19, 2023 @02:05AM (#63381779) Journal

Looks like it's for non-commercial use only. The license is some modified Apache(?) license.

- Re: (Score:3)
  
  by Phillip2 ( 203612 ) writes:
  
  That's an interesting one. Their licence is for non-production usage.
  Their FAQ claims that after three years for any given release, this reverts to apache. Still, with that licence, I think, it seems effectively unusable.
  - Re: (Score:2)
    
    by butlerm ( 3112 ) writes:
    
    No doubt they are hoping to commercialize it and don't expect (almost) anyone to use it for anything other than an academic exercise until then. It is such a great idea though, someone should consider making an open source (i.e. OSI compliant) implementation or something along similar lines.
Marketing (Score:5, Insightful)

by istartedi ( 132515 ) writes: on Sunday March 19, 2023 @02:15AM (#63381789) Journal

These guys are out there on some other tech sites too. I think there's some proprietary stuff going on, so it's not just regular Python. Aside from that, if it's a DSL that's embedded in Python it's no more Python than inlined assembly in C or C++, is C or C++. If that inlined code takes advantage of something non-standard like a GPU then, duh! Of course it's going to be faster than whatever it's embedded in. If you blast all the way down the stack from a scripting language to a GPU, then Duh! Of course it's going to do whatever the GPU can do which is faster than standard C or C++.
Anyway, good marketing but if it's a proprietary development tool they've got an uphill battle and maybe this whole thing is a marketing push to see if they can actually generate enough sales to keep them going. They probably can't.

- Re: (Score:2)
  
  by HiThere ( 15173 ) writes:
  
  It's definitely not all of Python. E.g. the strings are pure ASCII, and there are several other limitations. Still, they aren't bad or unreasonable. It could probably (my wild guess) handle 95% of Python code. And it appears (at a first glance) that it could import Python code to handle functions that it couldn't compile....but that needs to be managed by the programmer, which is what they're trying to avoid.
  IIUC saying it's "a DSL that's embedded in Python" is a bad description. But it's "sort of" lik
- Re: (Score:2)
  
  by butlerm ( 3112 ) writes:
  
  It is not a DSL so much as a restricted dialect of Python. I suppose they could use the technology to make true DSLs though, and it no doubt reduces confusion to refer to this one as a different programming language rather than as a general purpose drop in replacement for Python (which it most definitely is not).
  The main reason I wouldn't consider it a domain specific language is because it is suitable for a number of different domains.
Only ASCII Strings instead of Unicode (Score:4, Interesting)

by jopet ( 538074 ) writes: on Sunday March 19, 2023 @02:52AM (#63381817) Journal

This is a major flaw. How can you take the step back to ASCII in 2023?

- Re: (Score:1)
  
  by daveron ( 2034640 ) writes:
  
  After all the work Python did to fully support Unicode, now we have a new Python that takes us back to the ASCII incompatibility days again.
- - Re:Only ASCII Strings instead of Unicode (Score:4, Interesting)
    
    by butlerm ( 3112 ) writes: on Sunday March 19, 2023 @01:41PM (#63382799)
    
    UCS-2 support has had the unfortunate effect that Microsoft has tended to be very late in supporting UTF-8. It barely supports it in Windows for example, and only on newer versions. SQL Server apparently introduced UTF-8 support in 2019. Java and Javascript have similar issues, although not as serious as the ones Windows has, because both have supported UTF-8 encoding for input and output for a long time.
    As far as Python goes, it is pretty easy to see why those focused on high performance string processing applications like genomics would want to support 8-bit characters natively, using a compilation option or something. It no doubt makes a significant performance impact. In C you can have whatever character set you want if you stick to native encoding, and for string processing that is fast. Mandatory Unicode with code point semantics is bad news on the performance and efficiency front if what you really want to do is process 8 bit characters. UTF-8 has only taken over much of the world because a great deal of low level software is entirely unaware (i.e. transparent to the fact) that it is in use at all.
    That goes for UTF-16 in a variety of contexts designed for UCS-2 as well. Codepoint semantics are relatively slow or inefficient in every programming language that supports them, and most don't and can't without breaking backward compatibility. So Python is ahead of its time in a way that gives it a performance disadvantage for some applications even when compiled and optimized to the hilt, unless someone takes non-standard narrow use expedients like this, or rewrites quite a bit of code to use byte support instead, which of course has a different set of compatibility issues.
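
To make the byte-versus-code-point distinction in the comment above concrete, here is a minimal sketch in standard Python. The function names `code_points_and_bytes` and `gc_fraction` are hypothetical, chosen only for illustration, and nothing here reflects how Codon stores strings internally.

```python
def code_points_and_bytes(text):
    """Return (number of code points, number of UTF-8 bytes) for a str."""
    return len(text), len(text.encode("utf-8"))


def gc_fraction(sequence):
    """Fraction of G and C bases in an ASCII-encoded DNA sequence (bytes)."""
    if not sequence:
        return 0.0
    # bytes.count works on raw 8-bit values; no Unicode decoding is involved.
    gc = sequence.count(b"G") + sequence.count(b"C")
    return gc / len(sequence)
```

For pure ASCII input the two counts from `code_points_and_bytes` agree, which is why 8-bit processing is attractive for data such as nucleotide sequences; any non-ASCII character makes them diverge.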
    
Typically on par, sometimes better than (Score:3)

by The Evil Atheist ( 2484676 ) writes: on Sunday March 19, 2023 @03:38AM (#63381831)

Codon's performance is typically on par with (and sometimes better than) that of C/C++.
We've heard this many times before. They often benchmark against some horrible bit of C/C++ code that goes out of its way to do the inefficient thing.

Yet time and again, on real workloads, these wins are never borne out.

- Re: Typically on par, sometimes better than (Score:2)
  
  by ArmoredDragon ( 3450605 ) writes:
  
  We've heard this many times before. They often benchmark against some horrible bit of C/C++ code that goes out of its way to do the inefficient thing.
  When your code accidentally does that, what difference does it make? Same outcome either way.
  - Re: (Score:3)
    
    by Entrope ( 68843 ) writes:
    
    They're comparing that intentionally bad code to a hand-tuned, domain-specific optimizer. That's a dishonest comparison because it's apples and oranges. They could just as easily create a library to do the fast bits and call that from the naive application code.
    - Re: (Score:2)
      
      by The Evil Atheist ( 2484676 ) writes:
      
      I'm afraid your explanation will be lost on him. He doesn't understand how programming languages work beyond "Rust good, C++ bad, because some blog says so".
      - Re: Typically on par, sometimes better than (Score:2)
        
        by ArmoredDragon ( 3450605 ) writes:
        
        Nah, I just dog C++ for no reason other than it makes you nerd rage.
        
        Re: (Score:2)
        
        by serviscope_minor ( 664417 ) writes:
        
        You'd like to think so but you just admitted to being this guy: http://lol.i.trollyou.com/ [trollyou.com]
        lol u trol us ideed.
        
        Re: Typically on par, sometimes better than (Score:2)
        
        by ArmoredDragon ( 3450605 ) writes:
        
        Hmm... Nah, the shoe just doesn't fit. If anything I think that's a better description of either your friend there or maybe angel o' sphere who, on some days, literally posts immediately after every post I make for a period of time because he's pissed off.
Nope.. (Score:4, Funny)

by SuperDre ( 982372 ) writes: on Sunday March 19, 2023 @04:35AM (#63381917) Homepage

It might compile it to fast native assemblies, but the biggest problem, python is still a fugly language. It's a shame it's getting so popular.

- Re: Nope.. (Score:2)
  
  by 1s44c ( 552956 ) writes:
  
  Python, fugly? Have you seen the extreme amount of stuff you need to type in java? Or go? Python is just so much easier for most things.
  - Re: Nope.. (Score:2)
    
    by e065c8515d206cb0e190 ( 1785896 ) writes:
    
    Julia does a great job at having concise and (in my opinion) elegant syntaxes.
- Re: (Score:2)
  
  by jma05 ( 897351 ) writes:
  
  Compared to what? R? Matlab? Perl? Ruby?
  It was the best looking syntax, when it was designed.
  Usually, it's just that some hate the indentation syntax like others hate the parentheses of Lisp. It's subjective.
Just can't optimize it (Score:2)

by Gabest ( 852807 ) writes:

How do you identify bottlenecks, can you use inline assembly, can you use vtune?
Julia with a partial Python syntax. What's good ? (Score:1)

by GM ( 7955 ) writes:

The techno path is ok as already validated by Julia, Crystal (ruby compiler), or emacs-ELISP, for example.
A subset of Python? What percentage of pip packages compile with it?
My two complaints with this announcement:
1. How does it compare to Julia? It is the main contender for "performance with simple syntax".
2. They use OpenMP annotations for parallel computation. What level of expertise is needed to use it correctly, and how does the programmer understand the memory management of their runtime, a central part of HPC (eg.
- Re: Julia with a partial Python syntax. What's goo (Score:2)
  
  by e065c8515d206cb0e190 ( 1785896 ) writes:
  
  I logged in for that question. How does it compare to Julia?
  - Re: (Score:2)
    
    by jma05 ( 897351 ) writes:
    
    They are quite different, although both get their speed from LLVM.
    Julia is an excellent JIT language with a fairly large ecosystem, but not anywhere near as much as Python. It interops well with Python, but isn't syntax compatible with it.
    Codon is an AOT language with no real ecosystem as yet. Codon is a Python subset, minus the dynamic parts, and also interops well with Python. Its standard library is a subset of Python's.
    If you want a better language for data science overall, use Julia. If you want a fast
    - Re: (Score:2)
      
      by e065c8515d206cb0e190 ( 1785896 ) writes:
      
      Oh yeah I meant Codon vs Julia. More out of curiosity than anything.
      I fall exactly in the category you describe though, data-science type usage and I'm more of an SME than an expert programmer (although I do hold a CS degree). I learned Julia about 6 months ago and have been slowly moving my Python code to it. I found Julia to be more concise, more elegant, faster (aside from startup time), although I can occasionally get stuck (lack of 3rd party lib or community help). I now write Julia code faster than P
      - Re: (Score:2)
        
        by jma05 ( 897351 ) writes:
        
        My experiences are about the same. 1.9 will fix the startup times somewhat.
        I have used Python for over 2 decades. I would agree that if you need performance beyond what a smattering of Cython or Numba can give, it's better to look for a faster alternative, and Julia is a fine one. Cython and Numba do indeed complicate things beyond a certain point.
        Julia is just superior for data science workflows since it was specifically designed for it. I use the entire Jupyter ecosystem. I think of them as complementing communities a
Support C-API? (Score:1)

by ndbecker ( 1943024 ) writes:

Does it support the Python C API, so that it works with current extensions? If not, how can Codon code call Python extension code?
- Re: (Score:3)
  
  by butlerm ( 3112 ) writes:
  
  According to the documentation it is ASCII only internally for now, but supports Python extensions like NumPy with certain limitations. It has its own C FFI as well.
  I strongly suspect that if you want maximum performance you should use the native C FFI rather than the support for Python extensions (although that apparently all works as long as you stick to ASCII strings), because there is a conversion across the Codon to Python boundary and Python extensions are probably supported from the Python side. Of c
Slow algorithms are still slow, regardless. (Score:2)

by abelenky17 ( 548645 ) writes:

Some of the reason that many Python programs are slow is because the authors choose slow, but easy to understand, algorithms.
An O(n^2) algo will still be slower than an O(n log n), no matter how it's compiled or translated.
I'm sure Codon can re-write the instructions to be faster, but I'm skeptical it can transform the underlying algorithm from a slow technique to a faster technique!
If non-experts in programming (regardless of their Domain-Specific SME) keep choosing slow algorithms in Python, the resulting
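
As a small illustration of the point about algorithm choice, here is a sketch in plain Python of the same task (detecting a duplicate) done with a quadratic scan and with a sort-based O(n log n) pass. The function names and the `time_call` helper are hypothetical, and no timings are claimed here.

```python
import time


def has_duplicate_quadratic(items):
    """Compare every pair: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_sorted(items):
    """Sort a copy, then compare neighbours: O(n log n) overall."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def time_call(func, items):
    """Wall-clock seconds for a single call of func(items)."""
    start = time.perf_counter()
    func(items)
    return time.perf_counter() - start
```

Running `time_call` on both functions with a growing list of distinct values shows the gap widening with input size; a faster compiler shrinks the constant factor of each, not the growth rate.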
- Re: (Score:3)
  
  by jma05 ( 897351 ) writes:
  
  In this space (data science, AI research, data pre-processing etc), those are non-problems.
  Domain experts don't care about optimizing code to perfection. It's the Pareto principle.
  They can hire a professional programmer in the rare event they need to optimize something. Usually though, slow is quite tolerable.
  Python can be 40-100 times slower when working outside native extensions, which isn't usually an issue. This can accelerate some of those parts.
- Re: (Score:1)
  
  by strombrg ( 62192 ) writes:
  
  An O(n^2) algo will still be slower than an O(n log n), no matter how it's compiled or translated.
  That's mostly what I was taught in school - but there was a brief aside, one day, saying that sometimes a worse algorithm could be faster if it, for example, stayed all in memory instead of hitting disk.
  EG, Python has a list type, which is kind of like an array, but the types can be heterogeneous, and they resize automatically. They're much faster than a linked list in Python, even though many algorithms that repeatedly append to a list are amortized O(n). Underneath it all, they're O(n^2), because to re
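
The resizing behaviour mentioned above can be observed directly. This is a minimal sketch that assumes CPython, where `sys.getsizeof` on a list reflects its currently allocated capacity, so a change in that value marks a reallocation. The function name `count_resizes` is hypothetical.

```python
import sys


def count_resizes(n):
    """Append n integers to a list and count how often its allocation changes."""
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The list outgrew its spare capacity and was reallocated.
            resizes += 1
            last_size = size
    return resizes
```

Because CPython over-allocates on growth, the number of reallocations stays far below the number of appends, which is what makes the per-append cost amortized constant.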
Historic Problem Solved (again) (Score:3, Funny)

by willkane ( 6824186 ) writes: on Sunday March 19, 2023 @12:26PM (#63382561)

The language-war is over: Python+Codon is the solution to all problems.

Sorry for the Rust devs. You were near, though.

Price, and alternatives (Score:1)

by strombrg ( 62192 ) writes:

ISTR hearing that Codon cost money for commercial use. Also, ISTR that it doesn't support much of the python standard library.

Also, it's not sounding that different from Shedskin and Cython, which do have at least some standard library support.

Also, when a program running on CPython is running too slowly (which isn't that common), you just profile and run the hot spot on something like Shedskin, Cython or C - it's very rare to need to rewrite the whole program in another language.
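
The "profile first, then move only the hot spot" workflow can be sketched with the standard library's `cProfile` and `pstats`. The workload (`count_pairs`, `run_workload`) and the helper `profile_call` are hypothetical names made up for this example.

```python
import cProfile
import io
import pstats


def count_pairs(values, target):
    """Deliberately quadratic: count pairs (i < j) whose sum equals target."""
    total = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] + values[j] == target:
                total += 1
    return total


def run_workload():
    """A stand-in for a whole program with one dominant hot spot."""
    return count_pairs(list(range(300)), 299)


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()
```

Printing the report from `profile_call(run_workload)` lists `count_pairs` near the top, which is the function one would then consider porting to Cython, Shedskin or C.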
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
What's the catch? (Score:2)

by misnohmer ( 1636461 ) writes:

Typically compilers like this have a catch, most commonly that only a subset of the high level language can be used if you're going to compile it. Will this compile any project written in Python, as-is, into an executable which will run faster?
Testing it (Score:4, Interesting)

by Genrou ( 600910 ) writes: on Monday March 20, 2023 @10:28AM (#63384709)

I tried it to see what kind of results I could get with it. It happens that I have some different implementations of the Fast Fourier Transform that I use to benchmark this kind of thing. What I found out is:
It doesn't implement every Python module. I couldn't get the array module to work. But this might be in their future plans.
It can get a little picky with variable types. For example, multiplying an integer with a complex won't work, or trying to print an integer using a floating point format. Maybe they're working on it too.
It can take some time to compile and run.
While the scripts indeed run faster, it was nothing close to 10x the speed, much less 100x. In general, I got a 2.5x speed up. It was outperformed by Pypy in every test I made.
Just for the record, I have the same algorithms implemented in C, and Pypy performs comparably to C. Disclaimer: they are not optimized, instead, I made an effort to make the same operations as much as possible, with the intent of comparing speeds. Also, not a scientific assessment, so take it with a grain of salt.
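
For readers who want to repeat this kind of test, here is a minimal sketch of an FFT micro-benchmark in plain Python. It is not the poster's code. It uses explicit `float(...)` and `complex(...)` conversions as one possible way to avoid the mixed integer/complex arithmetic and integer-with-float-format issues reported above; whether Codon accepts it unchanged is an assumption that has not been verified here. All function names are hypothetical.

```python
import cmath
import time


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        # Explicit conversion, so integer samples become complex up front.
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [complex(0.0, 0.0)] * n
    for k in range(n // 2):
        # Build the angle from floats only, then make one complex number,
        # so no integer is ever multiplied by a complex value.
        angle = -2.0 * cmath.pi * float(k) / float(n)
        twiddle = cmath.exp(complex(0.0, angle)) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def time_fft(size, repeats=5):
    """Best wall-clock time in seconds over `repeats` runs of fft."""
    samples = [float(i % 7) for i in range(size)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fft(samples)
        best = min(best, time.perf_counter() - start)
    return best


def format_count(count):
    """Format an integer with a float format via an explicit conversion."""
    return "%.1f" % float(count)
```

Calling `time_fft(1024)` under CPython, PyPy and a compiled build gives a rough like-for-like comparison, with the same caveat as above: it is not a scientific assessment.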

- Re: (Score:2)
  
  by jma05 ( 897351 ) writes:
  
  Such reasoning is quaint. Modern C compilers are far from that old model.
  Codon emits LLVM bitcode. The C compiler which is currently getting most attention (clang) also emits LLVM bitcode. So do many other modern languages.
  They are all writing against the abstract machine provided by LLVM. LLVM performs native code generation. So all these modern languages roughly have the same performance.
  Sometimes they have minor penalties due to additional runtimes, and other times they automatically optimize to specific
