Proposed Change Could Speed Python Dramatically (infoworld.com)
"One of Python's long-standing weaknesses, its inability to scale well in multithreaded environments, is the target of a new proposal among the core developers of the popular programming language," reports InfoWorld:
Developer Sam Gross has proposed a major change to the Global Interpreter Lock, or GIL — a key component in CPython, the reference implementation of Python. If accepted, Gross's proposal would rewrite the way Python serializes access to objects in its runtime from multiple threads, and would boost multithreaded performance significantly... The new proposal makes changes to the way reference counting works for Python objects, so that references from the thread that owns an object are handled differently from those coming from other threads.
The overall effect of this change, and a number of others with it, actually boosts single-threaded performance slightly — by around 10%, according to some benchmarks performed on a forked version of the interpreter versus the mainline CPython 3.9 interpreter. Multithreaded performance, on some benchmarks, scales almost linearly with each new thread in the best case — e.g., when using 20 threads, an 18.1x speedup on one benchmark and a 19.8x speedup on another.
So (Score:4, Funny)
they are adding multiline comments finally?
Re:So (Score:4, Insightful)
"""
No.
"""
Re: (Score:2)
What kind of docstring is that? That's not helpful at all!
Re: (Score:2)
Cute, but technically that's a multiline string, not a multiline comment. IOW, you could do:
someVar = """
No.
"""
and someVar will have the value of "\nNo.\n".
It is a weird hack that the Python community has decided that a multiline string at the beginning of a function or class is used for documentation.
I guess for the most part the multiline string not assigned to anything works about the same as a multiline comment. I've certainly used fake values in JSON configuration files as comments, since the JSON s
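A minimal sketch of the distinction above (the names `some_var`, `answer` and `not_documented` are hypothetical): a bare triple-quoted string is an ordinary string expression, and Python only stores one as `__doc__` when it is the first statement in a function, class or module body. `inspect.getdoc` is used because it strips the indentation the literal carries.

```python
import inspect

# A bare triple-quoted string is just a string literal.
some_var = """
No.
"""


def answer():
    """
    No.
    """


def not_documented():
    x = 1
    """Not first in the body, so this string is evaluated and discarded."""
    return x


print(repr(some_var))          # '\nNo.\n'
print(inspect.getdoc(answer))  # No.
print(not_documented.__doc__)  # None
```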
Re:So (Score:5, Funny)
they are adding multiline comments finally?
# I
# Don't
# Have
# A
# Problem
# With
# It.
Re: (Score:2)
""" /*
Here is
an example
of a multi-line
Python comment
*/
"""
Re: (Score:2)
Re: (Score:2)
https://en.wikipedia.org/wiki/... [wikipedia.org]
illegitimate Java (Score:2)
Visual J++ [wikipedia.org]
ART [wikipedia.org]
Dalvik [wikipedia.org]
honorable mentions, because they came out the same year as Java or were created independently:
Limbo & Dis virtual machine [wikipedia.org]
Squeak (Smalltalk dialect) [wikipedia.org]
Re: (Score:2)
Re: (Score:3)
Java made the mistake of settling on 16-bit Unicode chars. There are a few other problems, but that's the one that makes me not look seriously at it. And they could fix it fairly easily, but they haven't for decades. All they need to do is make a ustring class on par with String, and a uchar type that's 24 bits long (32 would be fine).
OTOH, I really hate how Java forces global variables to be members of a class. And I dislike the insistence that simple variables, like ints, be converted to classes in
Re: (Score:2)
Who's talking about Java here besides you? What does Java have to do with re-architecting the Python GIL, which has been a long-standing issue?
Re: (Score:2)
Java not on the list. [youtu.be]
Re: (Score:2)
I always thought it was funny when the marketers kept claiming Java was first at write once, run anywhere (especially when in practice it was write once, run only on the same sub-minor version of the JVM). The UCSD p-System came out in the late 1970s.
Many programs assume thread safety (Score:5, Interesting)
This change could provide real benefits, but it's not an automatic silver bullet, as many multithreaded Python programs rely on the inherent thread safety of data structures.
For example, consider a dictionary object that is read from in the main thread, while one or more worker threads are computing results and writing them to that same dictionary. In a typical implementation in Python, you wouldn't explicitly lock access to the dictionary because it's completely unnecessary (due to the GIL): it's impossible for any of the threads to corrupt the Python interpreter because nobody is /really/ accessing the dictionary concurrently. With the proposed change, however, such programs would likely blow up.
That said, I'd love to see this added to Python, either as a runtime option or as a separate python executable that you can drop in, so that by default you'd still get the automatic thread safety but would have the option of much better performance if you explicitly agree to take on the burden of thread safety yourself.
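A minimal sketch of the pattern described above, assuming a hypothetical `compute` function standing in for real work: several worker threads store results under distinct keys in one shared `dict`, with no explicit lock, which relies on each single-key store being atomic in current CPython. The main thread joins the workers before reading the final results, so the output is deterministic.

```python
import threading


def compute(n: int) -> int:
    # Hypothetical stand-in for a real computation.
    return n * n


def worker(start: int, stop: int, results: dict[int, int]) -> None:
    for n in range(start, stop):
        # One store per key, no explicit lock: this leans on CPython keeping
        # the dict itself consistent (the GIL today).
        results[n] = compute(n)


def run(num_threads: int = 4, per_thread: int = 250) -> dict[int, int]:
    results: dict[int, int] = {}
    threads = [
        threading.Thread(target=worker, args=(i * per_thread, (i + 1) * per_thread, results))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    # The main thread could read `results` here while workers are still writing.
    for t in threads:
        t.join()
    return results


results = run()
print(len(results), results[999])  # 1000 998001
```

Single stores like `results[n] = ...` are the easy case; a read-modify-write such as `results[k] = results.get(k, 0) + 1` spans several operations and needs a lock even with the GIL.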
Re: (Score:2)
Re: (Score:2)
Funny, that's not what I've read everywhere about Python 2 vs Python 3. In fact, before that I had never read of a language that broke compatibility with its earlier versions.
Re:Many programs assume thread safety (Score:4, Informative)
Visual Basic broke *utterly* when going from VB6 to VB dotNET. It wasn't fully backwards compatible before that either.
Perl routinely broke backwards compatibility, up to and through 5.10.1.
Java has broken lots. INT anyone?
It's been the norm since the 90's. Before that, sure, not so much. COBOL written in the 70's will generally run fine today. But in the big languages, entrenched in enterprises, it's the norm, and has led to immense amounts of legacy.
If you haven't heard of it, you haven't been in the trenches much.
Re: (Score:1)
Perl routinely broke backwards compatibility, up to and through 5.10.1.
Huh? What features of Perl were changed so that a piece of code written for 5.6 wouldn't work on 5.8 or 5.10 without edits?
Re: (Score:3)
Re: (Score:2)
I have not had good experiences with using multi-threading to improve computation speed. I did some work on a custom pool memory allocator in C++, and it went like the clappers. Then I modified it to be thread safe, with locks and mutexes and so on, and the performance collapsed. In the end, it was not worth the faff of working out what was going on with multiple threads.
Re: (Score:2)
Yeah, in current Python you can't really get more computing capacity via threads unless the thread ends up releasing the GIL (e.g. worker threads compressing images by calling out to a C library). Threads are also handy for other GIL-releasing scenarios like disk or network I/O, waiting on user input, monitoring for hardware events, etc.
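A minimal sketch of the GIL-releasing case, using `time.sleep` as a stand-in for blocking I/O (an assumption; real socket or disk waits behave similarly because CPython releases the GIL while blocked). The function names are hypothetical. Five 0.2-second waits take about a second in a loop, but roughly 0.2 seconds on a thread pool; exact timings vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay: float) -> float:
    # Stand-in for a blocking read; time.sleep releases the GIL while waiting.
    time.sleep(delay)
    return delay


def timed(num_tasks: int = 5, delay: float = 0.2) -> tuple[float, float]:
    # Sequential baseline: each wait happens after the previous one.
    start = time.perf_counter()
    for _ in range(num_tasks):
        fake_io(delay)
    sequential = time.perf_counter() - start

    # Threaded: the waits overlap because no thread holds the GIL while sleeping.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        list(pool.map(fake_io, [delay] * num_tasks))
    threaded = time.perf_counter() - start
    return sequential, threaded


seq, thr = timed()
print(f"sequential {seq:.2f}s, threaded {thr:.2f}s")
```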
Re: (Score:2)
Generally, multi-threading is done to isolate the parts doing the heavy lifting to allow the program to still handle input (via UI or socket or wherever) and output. Doing it to improve computation speed beyond that requires that the load can be predicted and handled well, which is rare, or a heavy amount of hand crafting and tuning to get it working right.
This will be very useful for using Python as a "filter", and for running continuous calculations on streaming or varying input data. I doubt it will do m
Re: (Score:2)
Generally, multi-threading is done to isolate the parts doing the heavy lifting to allow the program to still handle input...
This is where I did score a hit with multi-threading. My real-time sound synthesis code was in one high priority thread, and the MIDI musical score interpreter was in another. I had another thread doing GUI stuff, which was lowest priority, compared to rendering the music.
I have to say that the biggest benefit I have seen from multi-core hardware is that a bad process consuming 100% CPU cannot lock you out of your machine. However, all the web browsers I have experimented with launch multiple processes, so
Re: (Score:2)
Re:Many programs assume thread safety (Score:5, Informative)
This change could provide real benefits, but it's not an automatic silver bullet, as many multithreaded Python programs rely on the inherent thread safety of data structures.
Have a look at the section titled "Collection thread-safety", starting on page 6 of the proposal. The authors of the proposal are aware of the issue and have implemented mechanisms to allow for thread-safe multithreaded access to lists and dictionaries, at least.
Re: (Score:2)
I hadn't seen that, thank you!
Re: (Score:2)
The big obvious win would be a state where a thread in a function need not take the GIL when accessing locally scoped variables.
Possibly a win, possibly a train wreck: allow a function to declare read-only access to variables in larger scopes. No lock would be needed as long as all users have declared read-only access.
Re: Many programs assume thread safety (Score:1)
When I was in college (Score:5, Funny)
working on something...possibly a robot, I don't remember at this point...this python fanboi I was working with swore that python was the way to go 'cuz it had this great library for just what we needed to do.
My position was that it should have been done in C on account of it was a tiny embedded CPU and we couldn't spare the overhead.
He was all like...naw man, it'll be fine with the jit compilation and it'll be great!
And then he's looking through the manual for this library and it turns out it's not only not threadsafe, it'll actively fail if called from a thread.
I don't say a word.
He looks at me and tells me to shut up.
Re: When I was in college (Score:2)
Dude...writing a multithreaded program in C is not hard. Introducing race conditions into multithreaded programs is easy in any language. And writing subtle hard-to-find bugs is also possible in any language, even English prose.
Re: (Score:2)
Dude...writing a multithreaded program in C is not hard.
Thesis.
Introducing race conditions into multithreaded programs is easy in any language. And writing subtle hard-to-find bugs is also possible in any language, even English prose.
Thesis disproved. QED.
I'm not too encouraged by your logic skills. Maybe you should stay away from C.
Re: When I was in college (Score:3)
Yeah and it's easier to crash a helicopter than to fly one. This does not imply that no one should fly helicopters; it means people who know how to fly helicopters should be the ones flying helicopters.
Re: (Score:1)
In other words: "Derp! Derp! Just don't hire any of those idiots who write bugs in C!! For example, I'm a C god!"
Nice story, bro.
Re: When I was in college (Score:2)
Kind of describes every resume I've ever seen for an embedded coding job.
In my experience hiring people for that kind of work, after screening out the obvious frauds, roughly 40% of the remaining candidates make that claim believably.
And my personal record is about a 66% success on sniffing out the ones who end up panning out. That is to say, of the 3 people I hired for jobs like that, only 1 couldn't cut it, while 2 could.
Perhaps you have a less discriminating hiring manager making your decisions for you c
Re: (Score:2)
God Dammit.
Re: (Score:3)
Re: (Score:2)
And yet, MicroPython is increasingly being used in robotics with a wide variety of microcontrollers. It's not necessarily a replacement for C, but if I want to prototype something quickly, MicroPython works very well indeed. Having a REPL on the serial port is pretty darn useful too.
Re: When I was in college (Score:2)
Back in the day, you could get a thing called a Javelin Stamp, which was a BASIC Stamp-style microcontroller with a Java bytecode interpreter. It had its uses, but it had its limitations too.
It came from a time when Java was still new enough and hyped enough for people to think it was Fred Brooks's silver bullet.
Re: (Score:2)
I like to think of MicroPython as an API layer.
Whenever you have (new) hardware that MicroPython doesn't yet support on your MCU, you could go ahead and write a Python object backend in your language of choice (C of course), and have the API exposed in Python.
Yes, MicroPython needs some extra resources compared to plain C, but that's not an insupportable amount. Besides, you mostly are going to waste resources anyway by whatever "OS" replacement you're going to use on your PLC (STM32 libraries? Some R
It's about time! (Score:3, Insightful)
Re: (Score:2)
Literally. Now if they would also use curly braces instead of indentation for logical blocks, it would finally be a language worth using. Messed-up indentation is constantly screwing up logic when code is merged or refactored, and editors/IDEs that auto-"fix" indentation silently mess up program logic.
Indeed!
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
Probably a perl programmer. They think line feeds are unnecessarily redundant.
Re: (Score:3, Insightful)
Indenting as a matter of style, and indenting with perfection as it dictates the actual program logic, are two entirely different things.
Also...
if (failure) { puts("Failure!"); return;}
Is perfectly fine. Again, it's a matter of style, and style should be flexible and superficial to function.
Re: (Score:2)
Re: (Score:3)
No, that is absolutely NOT "perfectly fine". That will make it incredibly easy to miss that that actually performs a "return" call when scanning the code on a late Friday afternoon trying to figure out what is making the program exit early.
Code is written once, and read hundreds of times. It NEEDS to be consistent, clear and without clever formatting exceptions.
You have just illustrated exactly why Python's approach is so vastly superior to curly bracing.
Re: (Score:2)
No, that is absolutely NOT "perfectly fine".
This! And then some more of this.
It's even nicer if you do it like:
if (failure) { puts("Failure!"); return; }
free(ptr);
And then wonder where your memory goes. Or somebody smart goes through your code and amends:
if (failure) { puts("Failure!"); return; } else if (!failure && refcnt==0) {
puts("Outa here");
}
...because that will happen. Nobody will go ahead and format your code before applying changes to it.
Finally, someone who writes that, would also write this, sooner or later:
if (failure) return;
...and then some idiot comes along and says:
if (failure) puts("Failure"); return;
Or even worse:
if (failure) return;
do_work();
while fixing a bug sometime at 2 AM. All because this, to someone, was somehow unbearable:
if (failure) {
puts("Failure!");
return;
}
Re: (Score:2)
Re: (Score:2)
No solution is perfect, sadly. But in my decades of work, such merge issues have been rare.
On the other hand, a clause ending up outside of a scope by mistake in curly brace languages has happened a lot more often, often with results which are not caught until (at best) integration test, or worse, deployment. Yes, unit tests SHOULD handle that, in a perfect world.
Re: (Score:2)
Re: It's about time! (Score:2)
I put the curly braces where I need them, sometimes making several new if-blocks or removing/merging several for loops, and then I have emacs auto-indent the whole file for me...using the curly braces I put in there.
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Making whitespace, which a very large fraction of IDEs and copy-paste buffers will casually and automatically change, syntactically significant is a terrible idea, since it means your code can be "quietly and invisibly corrupted."
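A minimal illustration of the concern, with hypothetical function names: `sum_all` and `sum_first` differ only in the indentation of their `return`, both are valid Python, and they compute different things. The third helper shows the narrow safety net CPython 3 does provide: indentation that mixes tabs and spaces ambiguously is rejected with `TabError`, but a line consistently moved to the wrong level, as in `sum_first`, is not caught.

```python
from typing import Optional


def sum_all(values: list[int]) -> int:
    total = 0
    for v in values:
        total += v
    return total  # after the loop: adds up every value


def sum_first(values: list[int]) -> Optional[int]:
    total = 0
    for v in values:
        total += v
        return total  # one level deeper: returns after the first value


def rejects_mixed_indentation(source: str) -> bool:
    # CPython 3 raises TabError when a block's indentation depends on tab width.
    try:
        compile(source, "<snippet>", "exec")
    except TabError:
        return True
    return False


print(sum_all([1, 2, 3]), sum_first([1, 2, 3]))  # 6 1
print(rejects_mixed_indentation("if True:\n\tx = 1\n        y = 2\n"))  # True
```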
Re: (Score:3)
Performance is only half the problem. Python's memory footprint is insane, and it's many times forced me to rewrite programs in C. (In some situations numpy is a solution, but in far too many I've found myself in, it wasn't)
Re: (Score:2)
Performance is only half the problem. Python's memory footprint is insane...
Well no it isn't, not for my last Python project, anyway. I initially over-engineered the job, using an SQLite database. This created a need for SQL to access the data, which was a bit of a pain in the butt. It turned out that I could write much simpler code using pipe-separated-values format. It was less efficient than the SQL approach I suppose, but who cares? I did not notice any significant processing delay. The stupid simple approach got the job done. I did some calculations to work out if the dumb lin
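For the pipe-separated-values approach described above, the standard library's `csv` module handles the parsing with a custom delimiter. A minimal sketch; the column names, the `load_counts` helper and the in-memory sample data are hypothetical, and a real file would be opened with `open(path, newline="")`.

```python
import csv
import io

# Hypothetical sample data standing in for a .psv file on disk.
PSV = "name|count\nalpha|3\nbeta|5\n"


def load_counts(stream) -> dict[str, int]:
    # The first row supplies the field names; "|" is the column separator.
    reader = csv.DictReader(stream, delimiter="|")
    return {row["name"]: int(row["count"]) for row in reader}


counts = load_counts(io.StringIO(PSV))
print(counts)  # {'alpha': 3, 'beta': 5}
```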
Re: (Score:3)
Literally. Now if they would also use curly braces instead of indentation for logical blocks, it would finally be a language worth using. Messed-up indentation is constantly screwing up logic when code is merged or refactored, and editors/IDEs that auto-"fix" indentation silently mess up program logic.
Huh? That's one of my favourite features. You should be indenting already for readability, and once your code is indented the curly braces are just extra characters reducing readability.
Sure, once in a while the wrong indentation breaks the logic. But the other side is when the indentation goes wrong and it doesn't change the logic. In that case what the dev sees (indents are easier to see than braces) and what the program does are completely different, which in general turns out worse.
Re: (Score:2)
I also love the whitespace syntax. It feels more comfortable to me and I feel like being close to executable pseudocode makes development a bit faster. But I can certainly understand why people don't like it, and why there can be issues at scale, especially when you're dealing with different developers on a team, cutting and pasting, etc. It's a trade-off. One I happen to like, but it's not for everyone.
It may seem like a joke, but a braces dialect might not be so bad for those that want to use it. I once s
Re: (Score:2)
Huh? That's one of my favourite features. You should be indenting already for readability, and once your code is indented the curly braces are just extra characters reducing readability.
There is a pernicious kind of typo that afflicts a language that makes indentation syntactically significant. When you refactor Python code by cut and paste, or copy in stuff from existing code, you have to take extra care that you adjust the indentation correctly, or you can end up with code that is syntactically valid, but does nothing like what you expected.
I have found this with programming in C. In conditionals, the body can be a single statement, without the need to create a block enclosed in curlies.
Re: (Score:2)
> and once your code is indented the curly braces are just extra characters reducing readability.
This is false. The curly braces provide extra readability - they clearly show where a block starts and ends instead of having to infer that from whitespace. And significant whitespace can absolutely become a problem when merging code.
There's no rule saying you can't indent C code meaningfully, and most C code I see is indented perfectly. It's lack of discipline that leads to not indenting code and I don't need an interpreter to bug out if I don't indent code perfectly every time, because I have curly braces to keep the compiler from crashing if I merge two pieces of code with slightly different indentation. The indentation shouldn't be necessary, the program logic is what is key.
More code isn't necessarily clearer code. When the code is properly indented the braces become redundant, and redundant code is generally clutter you need to mentally filter out when reading.
Re: (Score:2)
I guess it is a matter of taste, but there are many conditions in Python where bracketing would make things much more readable.
The lack of brackets leads to a hodge-podge of ways to denote the start and end of code blocks in many situations; even PEP 8 has no consensus for multi-line 'if' statements, function calls, or function definitions:
Double-indent, to prevent it from blurring into your actual code;
or add a comment;
or add parens (and then match the function indenting).
Maybe just put a bracket there. and it doesn't
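The three options listed above match the alternatives PEP 8 gives for a multi-line `if` condition, where it explicitly takes no position on which to prefer. A minimal sketch with hypothetical function names; all three behave identically and differ only in layout.

```python
def no_extra_indent(a: bool, b: bool) -> str:
    # Continuation line lines up with the opening parenthesis.
    if (a and
        b):
        return "both"
    return "not both"


def with_comment(a: bool, b: bool) -> str:
    if (a and
        b):
        # A comment here visually separates the condition from the body.
        return "both"
    return "not both"


def extra_indent(a: bool, b: bool) -> str:
    # An extra indent level on the continuation line.
    if (a
            and b):
        return "both"
    return "not both"


for f in (no_extra_indent, with_comment, extra_indent):
    print(f.__name__, f(True, True), f(True, False))
```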
Re: (Score:2)
Multithreading in Python is so pathetic (Score:2)
that I never use it. I spawn processes and use pipes and queues and shared memory like in ancient times, essentially trading memory (massive amounts of it) for performance. It's one of my main beefs with it.
It's high time this got fixed.
Re: (Score:2)
It's high time this got fixed.
as far as i can tell, it won't. this change will only add some performance, which is good, but it can't make python magically thread safe, nor can it add any real threading paradigm, meaning you will have to continue doing the same.
which isn't a bad approach at all, imo. kids these days! ;-)
Re: (Score:2)
I was so surprised when I read someone write that python lacked effective multithreading. How could such a mature language not do something so fundamental? I tested it and found that indeed, a program with 4 threads doing work took 4x as long. (I don't remember what kind of data they were accessing in the work.)
Apparently fixing this is a big challenge [python.org], though it's hard to not be judgmental that the language designer didn't think this important enough to fix a decade ago.
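A rough way to reproduce the experiment described above, as a sketch with a hypothetical pure-Python workload `busy`: time the work done four times in a row, then in four threads. On a standard GIL build of CPython the two timings tend to come out similar rather than four times faster, because only one thread executes Python bytecode at a time; the numbers depend on the machine, so none are claimed here.

```python
import threading
import time


def busy(n: int) -> int:
    # Pure-Python CPU work: the GIL is only handed over at the
    # interpreter's periodic switch interval.
    total = 0
    for i in range(n):
        total += i * i
    return total


def compare(num_threads: int = 4, n: int = 300_000) -> tuple[float, float]:
    # Baseline: the same amount of work, one call after another.
    start = time.perf_counter()
    for _ in range(num_threads):
        busy(n)
    sequential = time.perf_counter() - start

    # The same work split across threads.
    threads = [threading.Thread(target=busy, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threaded = time.perf_counter() - start
    return sequential, threaded


seq, thr = compare()
print(f"sequential {seq:.3f}s, 4 threads {thr:.3f}s, speedup {seq / thr:.2f}x")
```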
Re: (Score:3)
Multithreading on a single core is not so efficient in any language anyway. So when you need performance in Python, you want to use multiprocessing anyway. On Linux there's really very little difference in overhead between a thread and a process, so maybe it doesn't matter as much. And many times multi-threading is used when asynchronous programming might be the better option, such as when scaling up server programs. Multi-threaded programming has really fallen out of favor of late because it doesn't scale pas
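A minimal sketch of the asynchronous alternative mentioned above, with a hypothetical `fetch` coroutine that simulates network latency with `asyncio.sleep`: ten 0.1-second waits overlap on a single thread, so the whole batch takes about 0.1 seconds instead of one second.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # Simulated network call: awaiting hands control back to the event loop.
    await asyncio.sleep(delay)
    return name


async def main() -> list[str]:
    # Run ten requests concurrently on one thread; results keep input order.
    return await asyncio.gather(*(fetch(f"req{i}", 0.1) for i in range(10)))


start = time.perf_counter()
names = asyncio.run(main())
elapsed = time.perf_counter() - start
print(names[:3], f"{elapsed:.2f}s")
```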
Re: (Score:2)
I'm aware that async is a good paradigm for network communications, and for ease of use. Though I tried it for a program that was I/O bound (in Rust) and found it hurt performance compared to using a small number of worker threads. As for threads and cores--except in python, single process multi-threaded programs I've written have always seemed to be able to utilize multiple cores. At least they maxed out my CPU. If cores aren't being used efficiently, would I still see Task Manager or top claiming high uti
Re: (Score:2)
You are correct. In CPython, a multi-threaded program can only run Python code on one core at a time, because of the GIL. This work will hopefully change that a bit. In other languages, or when talking about OS-level threads, those are farmed out to many cores, making them truly run simultaneously, which is why multi-threading on multiple cores is a good way of increasing computational performance. Totally agreed. The question is, does Python need multi-threading performance? On Linux, that could be no because on Linux there's vi
Re: (Score:1)
There is an enormous difference in overhead between a thread and a process. Startup times for a thread are typically hundreds of cycles, versus hundreds of thousands or more for a process. More importantly, data access times across threads are only a few cycles to perhaps a hundred. For processes this is many thousands at least.
The simplest, safest and lowest hanging fruit for multiple cores is to split up a big loop across threads. This can typically be done with one OpenMP directive, has virtually no overhea
Re: (Score:2)
If you're looking for high performance, you're probably better off using one of the multithreaded libraries or writing your intensive code in C, where you can use threads just fine, and calling it from Python.
It would be nice to have a Python interpreter that utilized multiple cores better, but there are lots of more effective options already.
Re: (Score:3)
that I never use it. I spawn processes and use pipes and queues and shared memory like in ancient times, essentially trading memory (massive amounts of it) for performance. It's one of my main beefs with it.
It's high time this got fixed.
Try out the multiprocessing library [python.org]. That's essentially what it does. It's definitely got some limitations in how you pass memory around, but you can take full advantage of multiple cores.
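A minimal sketch of the library's process pool, with hypothetical `square` and `parallel_squares` functions: `Pool.map` splits the input across worker processes, pickling arguments and results on the way, which is the memory-passing limitation mentioned above. The `if __name__ == "__main__":` guard matters on platforms that start workers with the "spawn" method, which re-imports the main module in each child.

```python
import multiprocessing as mp


def square(n: int) -> int:
    # Runs in a separate worker process; arguments and results are pickled
    # and copied between processes rather than shared.
    return n * n


def parallel_squares(values: list[int], processes: int = 4) -> list[int]:
    with mp.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on import.
    print(parallel_squares(list(range(10))))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```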
Re: (Score:2)
Badly thought out multi-threading is pathetic in any language. I have found this in C++. I had a whole bunch of threads rendering real time audio for a synth, but managing these threads and collecting the results clobbered the performance, probably because I divided the task too finely. Multi-threading is not a magic potion.
Re: (Score:1)
You want to use something mature, like OpenMP, that can manage this kind of scheduling and data collation for you. Using whatever threading library is being pushed this week just makes these issues harder.
Does the proposed change... (Score:2)
Finally! (Score:4, Funny)
Proposed Change Could Speed Python Dramatically
They're going to re-write it in Perl. :-)
Finally, Death To Multiprocessing! (Score:5, Insightful)
Python performance limitations have led too many people to desperate hacks, the foremost of which is the completely misapplied multiprocessing.
As the name suggests, it is multiple processes. That means expensive launch time per process and, more importantly, expensive communication across totally different memory spaces. That is acceptable, and at times necessary, for multiple nodes. Most languages have a mechanism for those cases (MPI for the pros). While Python has MPI bindings, the community has largely decided to pretend that multiprocessing = multithreading out of embarrassment that there is no multi-threading.
But, it is no substitute for multi-threading - for when codes should be sharing data on a single node, in a single memory space. Multi-threading is orders of magnitude faster to communicate data, and the thread coordination and creation mechanisms are critical for performance. More importantly, what might be a single directive to multi-thread a loop instead becomes a totally unnatural, contrived code to accommodate the wrong, distributed memory, paradigm.
We are not talking minor differences. This is truly multiple orders of magnitude kind of stuff, and it is fortunate that most of the important Python libraries can be written in C or other languages (Fortran, lol) that can access multi-threading, usually via OpenMP.
And as every major processor these days is very multi-core, this would be a huge step forward in making Python more efficient.
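Within Python, the closest analogue to "one directive on a loop" is mapping the loop body over an executor. A minimal sketch with a hypothetical `parallel_sum` helper: a thread pool shares the parent's memory, so every worker reads the same list object instead of receiving a pickled copy. Sketch only: under the GIL, a pure-Python loop body like this one still won't run in parallel; the sketch shows the shared-memory shape, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values: list[int], workers: int = 4) -> int:
    """Sum `values` in `workers` chunks on a thread pool (illustrative only)."""
    if not values:
        return 0
    step = -(-len(values) // workers)  # ceiling division
    bounds = [(lo, min(lo + step, len(values))) for lo in range(0, len(values), step)]

    def chunk_sum(b: tuple[int, int]) -> int:
        # Every thread reads the same `values` object; slicing copies the chunk
        # locally, but nothing is pickled or sent to another address space.
        lo, hi = b
        return sum(values[lo:hi])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, bounds))


data = list(range(1_000_000))
print(parallel_sum(data))  # 499999500000
```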
Re: (Score:3)
I think that it depends a lot on your workload. If each execution unit is mostly independent and only needs to coordinate with other units so that work isn't duplicated, then multiprocessing will probably be as fast as multithreading. It might even be faster because you might be able to avoid some underlying locks. In addition, it allows for easier migrating to running on multiple hosts and also prevents a crash in one execution unit from taking down the entire system.
Unless you are starting a new thread
Re: (Score:1)
That startup time for a new process is always much, much more than for a new thread. In any circumstance. Perhaps you are saying that you don't care because of how long they live relative to the compute time, which could be true.
Even if each unit is 100% independent, MT will still be faster than MP. No locks, mutexes, semaphores, etc. are necessary in that case, so all you do is increase the context switch time. As the dependencies increase, the thread coordination mechanisms (mutexes, etc.) will alw
Re: (Score:2)
There could be locks in your lower-level libraries that could make MT slower than MP. For example, calls to malloc() could take a lock (depending on your libc implementation). Those locks could add up in your MT implementation due to contention that wouldn't happen in the MP implementation. But you're right that without knowing the specifics of a workload, either solution could be better.
The MP solution that I am most familiar with was called the Sun Grid Engine. Before their demise, Sun open sour
Oh now they care? (Score:2)
So decades ago when I wanted multithreading performance from python and was literally kicked off to C++ land (love it btw), they now change their tune?
Thought the "science was settled" about how the GIL is the best thing since sliced bread and I am just a hipster who, oh wait, correctly spotted what would be important in the future and asked for it earlier... like pre-Python 3, you tools.
Go rewrite your rewrite because you didn't listen to us last time...
Re:Oh now they care? (Score:4, Insightful)
No, the GIL has always been seen as at best a necessary evil, not a desirable feature. The GIL is not considered part of Python but only part of the CPython implementation - other implementations are free to (and do) use other methods.
There have been many attempts to remove the GIL in the past, but they've all suffered from one major flaw - significant performance reduction on single-threaded code. This new implementation appears to not suffer from that same flaw, so for the first time in a long time people are getting excited about the possibility of finally fixing what most consider an undesirable (but practical) implementation in CPython.
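Since the GIL belongs to CPython rather than to the language, a quick way to check what a script is running on is the standard `platform` module. A minimal sketch; `sys._is_gil_enabled()` is an underscore-prefixed function that only exists on CPython 3.13 and later, so the sketch reports `None` ("cannot tell") where it is missing.

```python
import platform
import sys

# "CPython", "PyPy", "IronPython", "Jython", ...
impl = platform.python_implementation()

# Underscore-prefixed and CPython 3.13+ only; None means "cannot tell".
_check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = _check() if _check is not None else None

print(impl, "GIL enabled:", gil_enabled)
```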
Re: (Score:2)
Not on my planet. For the last 20 years or so, the most praise I have heard about the GIL is that it's here and it works.
Wake me when it's 5x faster (Score:2)
Yes, we could compensate for Python's abysmal performance by using more processors. But first off, that's not possible for all types of problems, and secondly, that's just a horrifyingly wasteful solution.
Python is dog slow. A 10% improvement is not lipstick on a pig, it's chapstick.
Re: (Score:2)
Because it's very, very important to wait for the database quickly.
Re: (Score:2)
Python is excellent at being glue-code. That it is not suitable for heavy lifting is pretty obvious. On the other hand, doing that heavy lifting in C and embedding that in Python is pretty easy to do.
Jython was always multi-threaded (Score:2)
Do not expect too much (Score:2)
One of the frequently made mistakes by non-CS experts is to expect multithreading to give massive speed ups. In reality, there are few things where that is true. Most real-world software can benefit from a few parallel activities (say, up to 3 or 4), but that is it. For some software, single-threading is the fastest option. In addition, even where parallelizing is beneficial, this is not a beginner's game. Doing multi-threaded software right is tricky and you need to deal with some new problems
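One of those new problems, as a minimal sketch: even under the GIL, an update like `counts[w] += 1` is not atomic, because it is a separate read, add and write, and another thread can run in between. The hypothetical `tally` function below takes an explicit `threading.Lock` around each update; without it, the final totals are not guaranteed.

```python
import threading
from collections import Counter

counts: Counter[str] = Counter()
lock = threading.Lock()


def tally(words: list[str]) -> None:
    for w in words:
        # Read, add and write happen as one step while the lock is held.
        with lock:
            counts[w] += 1


words = ["spam", "eggs", "spam"] * 10_000
threads = [threading.Thread(target=tally, args=(words,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counts["spam"], counts["eggs"])  # 80000 40000
```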
Ugh (Score:2)