Python 3.11 vs 3.10 performance (github.com/faster-cpython)
715 points by hyperbovine on July 6, 2022 | 444 comments


There was always a refusal to remove the Global Interpreter Lock because it would decrease single-threaded Python speed, which most people didn't care about anyway.

So I remember a guy recently came up with a patch that removed the GIL, and, to make it easier for the core team to accept it, he also added an equivalent set of optimizations.

I hope this release isn't a case of taking the optimizations but ignoring the GIL part.

If anyone more knowledgeable can review this and give some feedback, I think they'll be here on HN.


I use a beautiful hack in the Cosmopolitan Libc codebase (x86 only) where we rewrite NOPs into function calls at runtime for all locking operations as soon as clone() is called. https://github.com/jart/cosmopolitan/blob/5df3e4e7a898d223ce... The big ugly macro that makes it work is here https://github.com/jart/cosmopolitan/blob/master/libc/intrin... An example of how it's used is here: https://github.com/jart/cosmopolitan/blob/5df3e4e7a898d223ce... What it means is that things like stdio go 3x faster if you're not actually using threads. The tradeoff is it's architecture-specific and requires self-modifying code. Maybe something like this could help Python?
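
For anyone curious what that looks like from a higher level, here is a rough Python-only analogue of the idea (entirely hypothetical names, not the Cosmopolitan mechanism itself): locking starts out as a no-op and gets upgraded to a real lock the moment the first thread is spawned.

    import sys
    import threading

    class _NopLock:
        # Zero-cost stand-in used while the program is still single-threaded.
        def __enter__(self):
            return self
        def __exit__(self, *exc):
            return False

    _io_lock = _NopLock()

    def spawn(target, *args):
        # Upgrade to a real mutex before the first thread ever runs.
        global _io_lock
        if isinstance(_io_lock, _NopLock):
            _io_lock = threading.Lock()
        t = threading.Thread(target=target, args=args)
        t.start()
        return t

    def locked_write(f, data):
        with _io_lock:  # effectively free before spawn(), a real lock afterwards
            f.write(data)

    locked_write(sys.stdout, "single-threaded, no locking cost\n")
    spawn(locked_write, sys.stdout, "now with a real lock\n").join()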


I would love to see this implemented purely for curiosity's sake, even if it's architecture-specific.

Personally, cosmo is one of those projects that inspires me to crack out C again, even though I never understood the CPU's inner workings very well, and your work in general speaks to the pure joy that programming can be as an act of creation.

Thanks for all your contributions to the community, and thanks for being you!


That's a pretty clever hack, nicely done!


The GIL removal by that guy reverted some of the improvement done by other optimisations, so the overall improvement was much smaller.

And most people do care about single-threaded speed, because the vast majority of Python software is written as single-threaded.


> the vast majority of Python software is written as single-threaded.

This is a self-fulfilling prophecy, as the GIL makes Python's (and Ruby's) concurrency story pretty rough compared to nearly all other widely used languages: C, C++, Java, Go, Rust, and even Javascript (as of late).


Getting rid of the GIL will also immediately expose all the not-thread-safe stuff that currently exists, so there's a couple of waves you would need before it would be broadly usable.


Cool, they should start now.

As a python dev, Python's multiprocessing/multithreading story is one of the largest pain points in the language.

Single threaded performance is not that useful while processors have been growing sideways for 10 years.

I often look at elixir with jealousy.


Or maybe keep things the way they are. If you really need performance, Python is not the language you should be looking for.

Instead of breaking decades of code, maybe use a language like Go or Rust for performance instead.


Python is also the dominant language for machine learning, which does care about performance. The person who did the recent nogil work is one of the core maintainers of a key ML library. The standard workaround in ML libraries is that the performance-sensitive stuff is written in C/C++ (either manually or with Cython) and then exposed via Python bindings. But it would be much friendlier if we could just use Python directly.

It's also a commonly used language for numerical work in general. Most of the time numpy is enough, and occasionally you'll need something not already implemented and have to do your own bindings.


> Python is also the dominant language for machine learning, which does care about performance. The person who did the recent nogil work is one of the core maintainers of a key ML library. The standard workaround in ML libraries is that the performance-sensitive stuff is written in C/C++ (either manually or with Cython) and then exposed via Python bindings. But it would be much friendlier if we could just use Python directly.

Multithreading is not really the reason why things get written in cython etc., you can easily see 100x improvements in single threaded performance (compared to maybe a factor of 2-8x for multithreading). If you care about performance you'd definitely write the performance critical stuff in cython/pythran/c.
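
For a rough feel of the single-threaded gap being talked about (numpy standing in here for "the work happens in C"; exact ratios are machine- and workload-dependent):

    import timeit
    import numpy as np

    xs = list(range(1_000_000))
    arr = np.arange(1_000_000, dtype=np.int64)

    # Same reduction, interpreted loop vs. a single vectorized call into C.
    py = timeit.timeit(lambda: sum(x * x for x in xs), number=10)
    vec = timeit.timeit(lambda: int((arr * arr).sum()), number=10)
    print(f"pure Python: {py:.3f}s  numpy: {vec:.3f}s")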


Nope, C++ and Fortran are.

The bindings available in Python can also be used from other languages.


I would have thought convincing people they’ll just have to use Go or Rust or Elixir would have been an easy sell around here.

Turns out they just want a better Python.


>Turns out they just want a better Python.

That's Go. It gives actual types[1] and structs (so you don't have to wonder about dict, class, class with slots, dataclasses, pydantic, attrs, cattrs, marshmallow, etc). It removes exceptions and monkeypatching. It's async-first (a bit like gevent). It's inherently multicore. And you can package and distribute it without marking a pentagram on the ground and sacrificing an intern to Apep.

You just need to stop being friends with C FFI. Which is fine for web and sysadmin tools. For data science and ML/AI/DL, it's less ok.

[1] And the types are actually checked! I think a lot of people aren't really using python type checking given how slow it is and no one seems to complain. Or maybe everyone is using pyright and pyre and are satisfied with how slow these are.


Going to the very unexpressive Go from the expressivity of python is a goddamn huge jump though.

Going to JS or even TS for performance would be saner, and it has a same-ish object model even.


The expressivity in Python is a problem that needs to be solved though. Moving to JS goes the wrong way.


It already exists, but they don't want to learn other languages.


> Instead of breaking decades of code

Pin your version.


Concurrency isn't needed solely for performance. I'm designing a small tool (a file ingester/processor) for myself, which is going to need several truly concurrent threads. I love Python, but I can't use it for this, so I'm learning Go.


Why put Go and Rust in the same category? I never really understood that.

Either include like almost every language from JS, Java, C# to Haskell, or just list C++ and Rust. But Go is in the former category.


Python has basically already done exactly that with 2.7 to 3 and we came out of that relatively fine.

I say bring it.


So you are essentially saying Python is obsolete. It's used for decade old code and for new code you should use go or rust.


> As a python dev, Python's multiprocessing/multithreading story is one of the largest pain points in the language.

Hmm, how is that so?

As a python dev as well, I don't have much complaint with multiprocessing.

The API is simple, it works OK, the overall paradigm is simple to grok, you can share transparently with pickle, etc.


Multiprocessing is fine, but the cost of starting another interpreter is pretty visible, so you need to create a pool, and it may not be an overall speedup if the run time is short.

It takes more careful planning than async in JS, say, or goroutines.
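
A minimal sketch of that pattern (toy workload): pay the worker startup cost once by keeping the pool around, then push many small tasks through it.

    from multiprocessing import Pool

    def crunch(n):
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # Workers are spawned once here, not once per task.
        with Pool(processes=4) as pool:
            results = pool.map(crunch, [50_000] * 200)
        print(len(results), results[0])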


Yeah, but for JS-style async you'd probably use an event loop in Python, not multiprocessing.


Yes. But, frankly, async is also simpler in JS than in Python: e.g. no need to start a reactor loop.


Starting the event loop is no worse than any setup of a main function, it's a one-liner: asyncio.get_event_loop().run_until_complete(my_async_main())


Errr no, that has been replaced with asyncio.run quite some time ago.
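
For reference, the modern entry point is roughly:

    import asyncio

    async def main():
        await asyncio.sleep(0.1)
        return "done"

    # asyncio.run() (3.7+) creates the loop, runs the coroutine, and closes the loop.
    print(asyncio.run(main()))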


Pickle isn't transparent though, custom objects that wrap files or database sessions need to override serialization.

The ProcessPoolExecutor is nice but shouldn't be necessary.
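
The usual workaround looks something like this (class and file are made up for illustration): drop the unpicklable handle in __getstate__ and rebuild it in __setstate__ on the receiving side.

    import pickle, tempfile

    _, path = tempfile.mkstemp()

    class LogReader:
        # Hypothetical wrapper around a file handle, for illustration only.
        def __init__(self, path):
            self.path = path
            self._fh = open(path)          # file objects are not picklable

        def __getstate__(self):
            state = self.__dict__.copy()
            state["_fh"] = None            # drop the unpicklable handle
            return state

        def __setstate__(self, state):
            self.__dict__.update(state)
            self._fh = open(self.path)     # reopen after unpickling

    clone = pickle.loads(pickle.dumps(LogReader(path)))
    print(clone.path == path)              # True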


On the flip side, if your workload can be parallelized across thousands of cores, python has about the best CUDA support anywhere.


That would be C++ and Fortran actually.


But the python bindings are great and useful to many, so Python gets to be added to the list.


Just like any language with FFI capabilities to call the same libraries.


As another python dev, it has basically never been a painpoint for me.

Your anecdote adds little.


Pretty sure it is actually a statement that comes from the "python is a scripting language" school, and not because the huge horde of programmers that craves concurrency when they write a script to reformat a log file to csv keeps being put off by the python multiprocessing story.


Not sure I understand your point, can you clarify? Python is used across many different domains, being able to really take advantage of multiple cores would be a big deal.

I'd really appreciate if python included concurrency or parallelism capabilities that didn't disappoint and frustrate me.

If you've tried using the thread module, multiprocessing module, or async function coloring feature, you probably can relate. They can sort of work but are about as appealing as being probed internally in a medical setting.


I'm not the person you responded to, but I think the gist of it is: what it is is not defined by how it is used. Python, at its core, is a scripting language, like awk and bash. The other uses don't change that.

Occasionally, a technology breaks out of its intended domain. Python is one of these - it plays host to lots of webservers, and even a filesystem (dropbox). Similarly, HTML is a text markup language, but that doesn't stop people from making video games using it.

The developers of the technology now have to make a philosophical decision about what that technology is. They can decide to re-optimize for the current use cases and risk losing the heart of their technology as tradeoffs get made, or they can decide to keep the old vision. All of the design choices flow down from the vision.

They have decided that Python is a scripting language. Personally, I agree (https://specbranch.com/posts/python-and-asm/). In turn, the people using the language for something other than its intended purpose have choices to make - including whether to abandon the technology.

If instead Python moves toward competing with languages like go, it is going to need to make a lot of different tradeoffs. Ditching the GIL comes with tradeoffs of making Python slower for those who just want to write scripts. Adding more rigorous scoping would make it easier to have confidence in a production server, but harder to tinker. Everything comes with tradeoffs, and decisions on those tradeoffs come from the values of the development team.

Right now, they value scripting.


Python has already become a lot more than a scripting language. To say today that scripting is its core identity seems naive at best. Yes, it has those roots, but it also has object-oriented and functional facets which do not exist in awk or bash.

Pandas, numpy, scipy, tensorflow. All of these go way beyond what is possible with a scripting language.

Since when is the runtime performance of a script a serious concern? Why is it a problem if this aspect gets slightly slower if it brings real concurrency support?


Awk is a functional language with a surprising number of features. Bash, not so much. Developers really do care about runtime performance of scripting languages: They often wait for scripts to finish running before doing other work, and if the sum of the time it takes to write and execute a script is too long, they will look for an alternative.

All of the libraries you have cited are from the ML and stats communities, and they are not core language features. From what I understand, ML folks like Python because it is fast to play with and get results. In other words, they like it because it is a scripting language.

Personally, I like that Python has kept the GIL so far because I would never run a 24/7 server in Python and I am happy to use it very frequently for single-threaded scripting tasks.

Edit: I didn't decide that Python was a scripting language. The Python maintainers did. The point is that the identity of a project doesn't flow down from its use cases.

Edit 2: I should have said "the identity of a project doesn't flow down from its users."


"Personally, I like that Python has kept the GIL so far because I would never run a 24/7 server in Python and I am happy to use it very frequently for single-threaded scripting tasks."

Just as a side-note - my prior gig used Python on both the Server and Data Collection industrial systems. It was very much a 24x7x365 must-never-go-down type of industrial application, and, particularly when we had a lot of data-sources, was very multi-processing. Was not unusual to see 32 processes working together (we used sqlite and kafka as our handoff and output of processes) running on our data collection appliances.

Our core data modelling engine would routinely spin up 500 worker pods to complete the work needed to be done in 1/500th of the time, but we would still see some of the long term historian runs take upwards of a week to complete (many hundreds of machines for multiple years with thousands of tags coming in every 5 seconds is just a lot of data to model).

I say this mostly to demonstrate that people and companies do use python in both large-processing intensive environments as well as industrial-must-never-stop-24x7 mission critical appliances.

I don't ever recall any of our engineers looking at Python as "a scripting language" - it was no different to them than Java, C#, Rust, Go, C++ or any other language.


I know that people do use python for 24/7 "must not fail" applications. I'm just not smart enough to write python that I would trust like that. Python comes with a tremendous number of foot guns and you have to find them because there is no compiler to help, and it can be a real pain to try to understand what is happening in large (>10,000 line) python programs.


I think the key here is to see using the Python language as an engineering discipline like any other, and just take the classes, read the literature, learn from more Senior Engineers, and projects, before attempting to develop these types of systems on your own.

I don't think anybody expects a recent graduate from computing science, with maybe 4 or 5 classes that used python under their belt (and maybe a co-op or two) to be writing robust code (perhaps in any language).

But, after working with Sr. Engineers who do so, and understanding how to catch your exceptions and respond appropriately, how to fail (and restart) various modules in the face of unexpected issues (memory, disk failures, etc...) - then a python system is just as robust as any other language. I speak from the experience of running them in factories all over the planet and never once (outside of power outages - and even there it was just downtime, not data loss) in 2+ years seeing a single system go down or in any way lose or distort data. And if you want more performance? Make good use of numpy/pandas and throw more cores/processes at the problem.

Just being aware of every exception you can throw (and catching it) and making robust use of type hinting takes you a long way.

Also - and this may be more appropriate to Python than other languages that are a bit more stable - an insane amount of unit and regression testing helps defend quite a bit against underlying libraries like pandas changing the rules out from under you. The test code on these projects always seemed to outweigh the actual code by 3-5x. "Every line of code is a liability, every test is a golden asset." was kind of the mantra.

I think that what makes python different from other languages is that it doesn't enforce guardrails/type checking/etc... As a result, it makes it trivial for anyone who isn't an engineer to start blasting out code that does useful stuff. But, because those guardrails aren't enforced in the language, it's the responsibility of the engineer to add them in to ensure robustness of the developed system.

That's the tradeoff.


It's not that hard, we have compute-intensive servers running 24/7 in production and written entirely in Python on our side, using C++ libraries like PyTorch.

You just have to isolate the complicated parts, define sensible interfaces for them, and make sure they are followed with type hints and a good type checker.


>> To say today that scripting is its core identity seems naive at best. Yes, it has those roots, but it also has object-oriented and functional facets which do not exist in awk or bash. Pandas, numpy, scipy, tensorflow. All of these go way beyond what is possible with a scripting language.

No, it is not. It's what you can do with a scripting language, which is a fairly straightforward functional categorization. If you are very familiar with it and really want to just ignore the performance issues and don't mind strange workarounds (eg elaborate libraries like Twisted), you probably use python.

> The point is that the identity of a project doesn't flow down from its use cases.

Use cases are what determines the identity of a language. It takes a massive and ongoing marketing campaign to convince people otherwise, with limited success without use-cases. Python is popular because it's a scripting language with a relatively low learning curve (one true way to do things philosophy). That's it. It will improve over time, but it's been slow going...and that's in comparison to the new Java features!

Haskell is a shining example of how the language identity is foisted upon the public until you are convinced enough to try it. It doesn't take long to learn it's a nightmare of incomplete features and overloaded idioms for common tasks, so it isn't used. There aren't good use-cases for a language with haskell's problems, so the developer community and industry-at-large avoids it.


But all the “magic” that makes the scientific stack so great is largely due to numpy (and a lot of other Fortran- or C-driven code) being fantastic. Python is the glue scripting language for organizing and calling procedures. That's, IMO, its original and core purpose.

Now the fact that you can do some fun metaprogramming shenanigans in Python just speaks to how nice it is to write.


Except most of them are bindings written in native languages, being scripted from Python.


If we’re going to leave Python as a scripting language (fine by me), can we get the machine learning community to swap to something better suited?

It strikes me as a bit of a waste of resources to keep stapling engineering effort into the Python ML/data ecosystem when it’s basically a crippled language capable of either: mindlessly driving C binaries, or scripting simple tasks.

What other performance, feature and technique advancements are we leaving on the table because the only “viable” ecosystem is built on a fundamentally crippled language?


From what I can tell, the ML community is moving toward Julia. I don't think anyone predicted that they would end up locked into Python so heavily.


That's not been my experience much at all work or research wise. Tensorflow, pytorch, jax are still very dominant. I've worked at several companies and interviewed at several dozen for ML roles. They have 100% been python/c++ for ml. I'd be impressed if even 2% of ML engineers used Julia.


I feel like Julia will take more of the R people than the Python people, to be honest.


I wanted to use Julia for some experiments but it's so confusing. I would call a function with VSCode and get "potential function call error" or something, with no details. Is it valid or not?

Also, I hate the idea of re-building all the code when the program starts. Python's JIT can at least ignore the performance-critical code that's written in C++.


IMHO the majority of Python software doesn't use threads because it is easier to write single-threaded code (for many reasons), not because of the GIL.


In Golang you can spawn a green thread on any function call with a single keyword: `go'.

The ergonomics are such that it's not difficult to use.

Why can't or shouldn't we have a mechanism comparably fantastic and easy to use in Python?


Because making it easy to write C/C++ extensions that work the way you expect (including for things like passing a Python callback to a C/C++ library) has always been a priority for Python in a way that it isn't for Golang?


Any C/C++ extension that wants to enable more efficient Python has to learn about the GIL and how to manipulate it as well, including but not limited to: how to give up the GIL (so that other Python code can progress), and how to prepare your newly initiated threads to be GIL-friendly.

Personally, the GIL is the more surprising part to me when interoperating with Python.


> Any C/C++ extension that wants to enable more efficient Python has to learn about the GIL and how to manipulate it as well, including but not limited to: how to give up the GIL (so that other Python code can progress), and how to prepare your newly initiated threads to be GIL-friendly.

Sure, but the easy cases are easy (in particular, you can usually do nothing and it will be correct but slow, which is much better than fast but incorrect) and the hard cases are possible.

> Personally, the GIL is the more surprising part to me when interoperating with Python.

Any GCed language, including Go, will oblige you to integrate with its APIs if you want to handle complex cases correctly.


https://docs.python.org/3/library/concurrent.futures.html sort of gives you that; the same syntax works with threads or processes.
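
A minimal sketch of the shape it gives you (swap in ProcessPoolExecutor for CPU-bound work, since the GIL limits what plain threads can do):

    from concurrent.futures import ThreadPoolExecutor

    def fetch(n):
        # stand-in for some I/O-bound work
        return n * n

    # The closest spelling to "go fetch(i)" plus collecting the results:
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch, i) for i in range(8)]
        print([f.result() for f in futures])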


@Twirrim How would you rate the user experience of the concurrent.futures package compared to the golang `go' keyword?

It is architecturally comparable to async Javascript programming, which imho is a shoehorn solution.


I agree with your point, but the vast majority of C, C++, Java, and Javascript code is also written as single-threaded. It's fair to acknowledge that the primary use case of most languages is single-threaded programming, and also that improving the ergonomics of concurrency in Python would be a huge boon.


A lot of python runs in celery and possibly even more python runs parallelized cuda code. Not sure at all the majority of python code is single threaded, especially for more serious projects.


I think you mean the work by Sam Gross:

https://github.com/colesbury/nogil/

Interesting article about it here:

https://lukasz.langa.pl/5d044f91-49c1-4170-aed1-62b6763e6ad0...


Removing GIL also breaks existing native packages, and would require wholesale migration across the entire ecosystem, on a scale not dissimilar to what we've seen with Python 3.


They believe that the current nogil approach can allow the vast majority of c-extension packages to adapt with a re-compile, or at worst relatively minor code changes (they thought it would be ~15 lines for numpy for example).

Since c-extension wheels are basically built for single python versions anyways, this is potentially manageable.


The various trade-offs and potential pitfalls involved in removing the GIL are by now very well known, not just here on HN but among the Python core developers who will ultimately do the heavy lifting when it comes to doing this work.


> on a scale not dissimilar to what we've seen with Python 3.

If only people had asked for it before the Python 3 migration so it could have been done with all the other breaking and performance harming changes. But no, people only started to ask for it literally yesterday so it just could not ever be done, my bad. /sarcasm

If anything, the whole Python 3 migration makes any argument against the removal of the GIL appear dishonest to me. It should have been gone decades ago and still wasn't included in the biggest pile of breaking changes the language went through.


I think probably the reason is that in 2008, consumer chips were still mostly single or dual core. Today we have consumer grade chips that have dozens of cores, so the calculus has changed.


We didn't get any performance improvements, but at least now we can call print from lambda functions; that was certainly worth the massive breakage.


I agree, please don't just accept the optimization and sweep the GIL removal under the rug again.


> I hope this release isn't a case of taking the optimizations but ignoring the GIL part.

I would not be surprised. It is highly likely that the optimizations will be taken, credit will go to the usual people and the GIL part will be extinguished.


One of the biggest features I'm looking forward to is the more specific error messages. I can't tell you how much time I've wasted with cryptic errors that just point to a line that has a list comprehension or similar.


I've been learning Rust recently. There are a number of things about the language that I dislike, but its error messages are an absolute joy. They clearly put an incredible amount of effort into them, and it really shows.


I wrote 2-3 of those and I really wish more languages had a similar approach with their standard library.

When you get an error there's a verbose explanation you can ask for, which describes the problem, gives example code and suggests how you can fix it. The language has a longer ramp-up period because it contains new paradigms, so little touches like this help a lot in onboarding new devs.


I am a huge fan of ravendb's founder ayende. He always stresses how useful good error messages are, especially for self service. And he is certainly spot on with that, e.g. "could not execute Mode None" is much less helpful than "environment variable X not set, could not determine Mode. Please set this as MODE=Aggressive or MODE=Passive".


The exception is std::io::Error, which doesn't provide any information about which file failed to be opened/closed/read/whatever. I know this is because doing so would require an allocation, but it's still painful.


Your parent is thinking about the compiler errors, whereas std::io::Error (which I'm guessing you mean) is an Error type your software can get from calling I/O functions at runtime.

To be fair, if decorating the error with information about a filename is what you need: since Rust's Error types are just types, nothing stops you from making your function's Error type the tuple (std::io::Error, &str) if you have a string reference, or (though this seems terribly inappropriate in production code) leaking a Box to make a static reference from a string whose lifetime isn't sufficient.


The GP comment is referring to the compile time diagnostics and not the runtime diagnostics, which certainly could do with some out-of-the-box improvements but that you can extend in your own code (as opposed to compile errors which crate authors have little control over, at least for now).


huh a stack trace should tell you that.


This is the first time I ever heard someone complain about Python's error messages in the 10+ years I've been using it. Even people who just learned it, pick up reading tracebacks after about a day of practice. The only problem I ever see is if a library swallows too much, and gives a generic error, but that's not something the language can fix. I really hope they don't change things too much.


The new error messages are much better. Take a look at this example:

  Traceback (most recent call last):
    File "calculation.py", line 54, in <module>
      result = (x / y / z) * (a / b / c)
                ~~~~~~^~~
  ZeroDivisionError: division by zero

In this new version it's now obvious which variable is causing the 'division by zero' error.

https://docs.python.org/3.11/whatsnew/3.11.html#enhanced-err...


For a lot of folks dealing with getting something working, it's way more useful to have the most likely spot where the error is occurring spit out for them like this.

Stack trace is useful, especially in understanding how the code is working in the system.

But if the goal is to solve the problem with the code you've been working on, existing traces are way too verbose and if anything add noise or distract from getting productive again.

I could see tracebacks get swifty terminal UI that shows only the pinpointed error point that can be accordion'd out to show the rest.


You've clearly been working with the right people. I constantly get screenshots of error messages (not even the text of the traceback so that I could copy it) with questions about what that means. It takes some training for people to be able to read tracebacks correctly, especially if they have no other programming experience.


I only rarely delve into the python world, but as a .net developer I always found it odd and confusing that the error message comes after the stack trace in python. I don’t see people complaining about it, so maybe it’s just a matter of habit?


I guess it's just a habit even if I think it's mildly confusing too.

But: it's kind of a virtue that the most important error message comes last. Far too many (ahead of time) compilers output too many errors. The most likely error to matter is the first one, but it is often scrolled way off screen; the following errors might even just be side effects that mislead or confuse the new users.


On the CLI, you'll see the error just above the prompt and then the stack trace above that. It makes sense since the user is always reading up the terminal in reverse order.


I don’t think they’re that bad, but I think the bar has (fortunately) become higher.


Exactly right.

The old Python stack trace, circa 2015, was the best in the world.

The new ones are even better.

The gap between Python and Rust, and the JVM languages/C/C++, is increasingly widening.

Stack trace is one of the areas Python and Rust are making the case for why they're the languages of the future.


Not saying that there isn't room for improvement but I wouldn't call the Python error messages "cryptic". C++, on the other hand...


I program in both C++ and Python every day and I'm fluent in both. I would prefer 10000 lines template instantiation backtraces every time to the average python error message.

Clang and GCC error messages have come a loooong way in the last 10 years or so and their quality is impressive.

To be fair, I'm currently stuck at python 3.6 ATM and I hear that python has also improved a lot since.


When is that coming? Yes the error messages are confusing in a big list comprehension.



Whenever I nest a list comprehension I feel dirty anyway, I wonder if a better error message will help solve something that is just hard to grasp sometimes.


"Debugging is twice as hard as writing the code in the first place. Therfore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Rajanand

Although, personally, I enjoy python list comprehensions.


I believe that quote about debugging being twice as hard as coding is by Brian Kernighan.

https://en.m.wikiquote.org/wiki/Brian_Kernighan


Maybe, I grabbed the first matching quote I found. I can't attribute it on my own.


No sweat. I’m a quote nerd so it jumped out at me


Not even with nesting— as soon as it's long enough that I'm breaking it into multiple lines, I'm immediately like, okay this is long enough to just be a normal for-loop, or maybe a generator function. But it doesn't need to be a list comprehension any more.


All list comprehensions should error with message "FATAL PARSE ERROR: Will you really understand this in six months?"


Something like this I find ok (just a filter):

    [x for x in y if x is not z]
Sometimes though, facilitated by Jupyter's notebooks making executing as you build the code very easy, I create a beast like this:

    [os.path.join([x for x in y if x is not z]) + '/bin' for a in b if 'temp' not in a]
Yes this is fictional and probably contains an error but you get the point, you can filter lists of lists like this, but it's really unfriendly to anyone trying to understand it later.
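
For comparison, here is the same (equally fictional, data made up) logic as a plain loop with named intermediates, which is usually where I end up once it stops fitting on one line:

    import os

    b = ['app1', 'temp_dir', 'app2']           # stand-ins for the fictional inputs
    y, z = ['usr', 'local', None], None

    paths = []
    for a in b:
        if 'temp' in a:                        # same filter as the comprehension
            continue
        kept = [x for x in y if x is not z]    # the intermediate now has a name
        paths.append(os.path.join(*kept) + '/bin')

    print(paths)                               # ['usr/local/bin', 'usr/local/bin']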


List comprehensions should be used for making quick lists that contain simple calculations or filters.

Anything more will come back to bite you later.


If people complain about Rust's borrow checker already...


Big list comprehensions cause confusion in many ways. KISS is crucial with them.


I agree. The rule of thumb I follow is that if the list comp doesn't fit on one line, whatever I'm doing is probably too complicated for a list comp.


But why? The only reason is that error messages are awful and intermediate values can't be named. There is in principle no reason why list comprehensions need to be worse than straight-line for loops. The deciding factor to choose between the two should be whether I'm writing something for side effects as opposed to for its value.


You just listed two very good reasons: Intermediate values can't be named, and error messages are awful.


In a way it's a nice incentive not to nest too much (I have that tendency too)



Hold mine :D https://github.com/anchpop/genomics_viz/blob/master/genomics...

That's one expression because it used to be part of a giant comprehension, but I moved it into a function for a bit more readability. I'm considering moving it back just for kicks though.

My philosophy is: if you're only barely smart enough to code it, you aren't smart enough to debug it. Therefore, you should code at your limit, to force yourself to get smarter while debugging


Yours is nice and readable. The parent's one is not, but it feels like the indentation is deliberately confusing. I'd lay it out like so:

   tags = list(set([
     nel
     for subli in [
       mel
       for subl in [
         [jel.split('/')[2:] for jel in el]
         for el in classified
       ]
       for mel in subl 
     ]
     for nel in subli
     if nel
   ]))
Still not very readable, tho. But that's largely due to Python's outputs-first sequence comprehension syntax being a mess that doesn't scale at all.

Side note: one other thing I always hated about those things in Python is that there's no way to bind an intermediary computation to a variable. In C# LINQ, you can do things like:

   from x in xs
   let y = x.ComputeSomething()
   where y.IsFoo && y.IsBar
   select y
In Python, you have to either invoke ComputeSomething twice, or hack "for" to work like "let" by wrapping the bound value in a container:

   y
   for x in xs
   for y in [x.ComputeSomething()]
   if y.IsFoo and y.IsBar
It's not just about not repeating yourself or not running the same code twice, either - named variables are themselves a form of self-documenting code, and a sequence pipeline using them even where they aren't strictly needed can be much more readable.


> But that's largely due to Python's outputs-first sequence comprehension syntax being a mess that doesn't scale at all.

Choices like wrapping a set constructor around a list comprehension rather than just using a set comprehension don't help, neither does using multiple nested no-transformation comprehensions laid out that way.

By hand, on mobile, I think this reduces to (assuming “from itertools import chain” in the file header):

   tags = list({
     nel
     for subli in chain.from_iterable(
       [jel.split('/')[2:] for jel in el]
       for el in classified
     )
     for nel in subli
     if nel
   })


Since it's all generator expressions internally anyway, an intermediate doesn't actually add appreciable (any?) runtime overhead but does make the code more readable:

    ys = (x.ComputeSomething() for x in xs)
    result = [y for y in ys if y.isFoo and y.isBar]


Does the new walrus := in python solve your problem?


The official name is assignment expression, which can be good to know to find documentation. Here's the PEP: https://peps.python.org/pep-0572/


Curiously, searching the docs for "walrus" (https://docs.python.org/3/search.html?q=walrus) yields exactly the results you need, but searching for "assignment expression" yields the previous results mixed a lot of other results due to the word "expression".


Not really; where would you put it inside the []?


In the example from the end of the earlier post:

   y
   for x in xs
   if (y:=x.ComputeSomething()).IsFoo and y.IsBar


I suppose that works, although readability is not great, to put it mildly.


Much better would be

[(y:=x.ComputeSomething()) for x in xs if y.IsFoo and y.IsBar]


That doesn't work, because that first subexpression is evaluated last, after "if".


For another example of how to use the walrus operator, I use it multiple times in the code I linked a few comments up



That code is fine in my book: there's more static code than for-comp, and it's at worst 3 levels deep with no hard coupling between levels. Also well spaced and worded.


I don't think this is bad code. Maybe not the most Pythonic in the imperative sense, but certainly you could come across such constructs in languages that are more oriented towards functional programming. This could also be solved in a more readable manner by having a generator pipeline, though it would be a good idea to see what the performance of chained generators is like in your version and flavour of Python.


It's “not even” using for nesting in list comprehensions. This kind of thing:

   [x for stop in range(5) for x in range(stop)]


This is really incredible work.

The "What's New" page has an item-by-item breakdown of each performance tweak and its measured effect[1]. In particular, PEP 659 and the call/frame optimizations stand out to me as very clever and a strong foundation for future improvements.

[1]: https://docs.python.org/3.11/whatsnew/3.11.html


Yesterday, I watched Emery Berger's "Python performance matters" talk, whose message is that you should not bother to write fast Python, just delegate all heavy lifting to optimized C/c++ and the likes. His group has a Python profiler whose main goal is to point out which parts should be delegated. Of course optimized code is better, but the numeric microbenchmarks in this post are mostly examples of code that is better delegated to low-overhead compiled languages.

https://youtu.be/LLx428PsoXw


> just delegate all heavy lifting to optimized C/c++ and the likes

The more I used Python, and the more Python I saw written, the more I am convinced this is not a good or reasonable argument. A good chunk of the Python devs I’ve interacted with are mystified by the concept of virtual environments, regularly abuse global variables, struggle to read the docs, and structure their code poorly.

Telling these people “oh just write this section in C” will go nowhere. It’s asking them to figure out: installation, compilation, packaging, FFI, tool chains, and foot-guns of language(s) that are arguably more dangerous and whose operation varies from “not super straightforward” to “arcane” based on your dev and deployment environment and application needs.


The only "FFI" I've seen people actually use is writing a command line tool with C++ and calling it with the subprocess library.


Very much yes, if you are fluent in both C++ and python, it is just easier to rewrite in the former than deal with FFI.

Unfortunately sometimes you have to provide a library for consumption to programmers that only speak python so you don't have many options.


Plus something like C++20 and hot code reload (VC++ and Live++), make it quite easy to do so.


> you should not bother to write fast Python, just delegate all heavy lifting to optimized C/c++ and the likes

Certainly that's something you can do, but unfortunately for Python it opens the door for languages like Julia, which are trying to say that you can have your cake and eat it too.


I like Julia but it's easy to write slow Julia unless you keep the performance tips in mind. Arrays are horribly slow, tuples are much faster, but having tuples be a multiple of 128 bytes adds 10% or more to speed. I honestly don't understand how much slower arrays are. It's like 30x or similar.


> I honestly don't understand how much slower arrays are. It's like 30x or similar.

In what context, under what operation? Depending on context, the difference makes sense and is what one would expect - tuples are immutable, with fixed size known at compile time, and stack-allocated; arrays are mutable, dynamic in size, and usually heap-allocated. That's why StaticArrays.jl [1] exists, for when you need something in between/the best of both worlds.

> I like Julia but it's easy to write slow Julia unless you keep the performance tips in mind.

I very much agree with this statement though. But following a very few basic ones like "avoid non-constant global variables", "look out for type instabilities", "use @views and @inbounds where it makes sense" gets you most of the way there, for eg. about 3-5x of the time a C program would take. Most of the rest of the tips on the Performance Tips [2] page are to squeeze out the last bits of performance, to go from 5x of C to near-C performance.

[1] https://github.com/JuliaArrays/StaticArrays.jl [2] https://docs.julialang.org/en/v1/manual/performance-tips/


I had been using arrays the way I'd use vectors in c++ or arrays in matlab. For DSP type tasks you typically allocate a tensor on the heap, the size can be constant and even hardcoded, which allows lots of optimization. But when I modified the nbody implementation from the benchmark game from tuples to arrays for the position and velocity vectors it went from 5.3s to 3min+. That was with specifying the type and initializing them as part of a struct all at once.... And I just realized my mistake, every time I made a new struct it was allocating a new array with all the overhead that entails. Simply ditching the structs for a single array and allowing them to mutate would have been more idiomatic to the way I use matlab and it would have avoided all that reallocation.


I'm not really saying Julia is going to eat Python's lunch, because I also don't think they've cracked the code exactly. But some language will eventually.


That’s bizarre, are they implemented as linked lists or something? Why would arrays be that slow?


They are not guaranteed to be contiguous, but they supposedly are if you specify a primitive type and initialize them all at once. Tuples have a constant size, are contiguous, and immutable. So lots of optimization opportunity.


It is guaranteed to be contiguous. It might be an array of pointers if you put heap-allocated objects in there (like an array of arrays). I think what you instead meant to say is that it's not guaranteed to be stored inline; it is indeed the case that values can only be stored inline if `Base.isbits(eltype(A))`, i.e. if the bit length of the element type can be deduced as constant and the objects are stack-allocated. But of course that's just a requirement for storing values in an array even in C(++) (nice description here: https://stackoverflow.com/a/54654958/1544203); it's just a basic requirement due to having to know the memory size of what is being stored, and so it's not necessarily a limitation of Julia but of all languages.


How is it an array if it's not guaranteed to be contiguous?


They aren't, which is why the statement doesn't make sense haha. The difference is really just that tuples of isbits types can be stack allocated (and tuples of isbits types are isbits, so they can nest, etc.), and so in some cases with sufficiently small amounts of data, creating stack allocated objects is much faster than creating heap allocated objects.

But if you compare concretely typed heap allocated arrays in Julia and C, there's no real performance difference (Steven Johnson has a nice notebook displaying this https://scpo-compecon.github.io/CoursePack/Html/languages-be..., and if you see the work on LoopVectorization.jl it makes it really clear that the only major difference is that C++ tends to know how to prove non-aliasing (ivdep) in a bit more places (right now)). So the real question is, did you actually want to use a heap allocated object? I think this really catches people new to performance optimization off guard since in other dynamic languages you don't have such control to "know" things will be placed on the stack, so you generally heap allocate everything (Python even heap allocates numbers), which is one of the main reasons for the performance difference against C. Julia gives you the tools to write code that doesn't heap allocate objects, but also makes it easy to heap allocate objects (because if you had to `malloc` and `free` everywhere... well you'd definitely lose the "like Python" and and be much closer to C++ or Rust in terms of "ease of use", which would defeat the purpose for many cases). But if you come from a higher level language, there's this magical bizarre land of "things that don't allocate" (on the heap) and so you learn "oh I got 30x faster from Python, but then someone on a forum showed me how to do 100x better by not making arrays?", which is somewhat obvious from a C++ mindset but less obvious from a higher level language mindset.

And FWIW, this is probably the biggest performance issue newcomers run into, and I think one of the things to which a solution is required to make it mainstream. Thankfully, there's already prototype PRs that are well underway, for example https://github.com/JuliaLang/julia/pull/43573 automatically stack-allocates small arrays which can prove certain escape properties, https://github.com/JuliaLang/julia/pull/43777 is looking to hoist allocations out of loops so even if they are required they are minimized in the loop contexts automatically, etc. The biggest impediment is that EscapeAnalysis.jl is not in Base, and it requires JET.jl which is not in Base, and so both of those need to be made "Base friendly" to join the standard library and then the compiler can start to rely on its analysis (which will be nice because JET.jl can do things like throw more statically-deducible compile time errors, another thing people ask for with Julia). There's a few people who are working really hard on doing this, so it is a current issue but it's a known one with prototyped solutions and a direction to get it into the shipped compiler. When that's all said and done, of course "good programming" will still help the compiler in some cases, but in most people shouldn't have to worry about stack vs heap (it's purposefully not part of the user API in Julia and considered a compiler detail for exactly this reason, so it's non-breaking for the compiler to change where objects live and improve performance over time).


My background is over 28 years of c++ programming, 20 of that as my profession. My mistake in retrospect was using small arrays as part of a struct, which being immutable got replaced at each time step with a new struct requiring new arrays to be allocated and initialized. I would not have done that in c++, but julia puts my brain in matlab mode. If I had gone full matlab and just used a matrix or tensor instead of an array of structs of arrays then I would have still taken a hit from the mutability, but not the one I did take. After successfully doing some DSP in julia I became arrogant and took on the nbody benchmark at benchmark game. I did discover that targeting skylake instead of ivybridge made no change. It doubled the speed of the c++ and rust solutions. Removing @inbounds was less than a 10% hit. And the iterative fastmath inverse sqrt was absolutely neutral on the skylake machine I used. Sqrt was just as fast. I did get a 10% bump by making the 3tuples into 4tuples for position and velocity. Alignment I'd assumed, but padding the struct instead of the tuple did nothing, so probably extra work to clear a piece of an simd load. Any insight on why avx availability didn't help would be appreciated. I did verify some avx instructions were in the asm it generated, so it knew, it just didn't use.


> My mistake in retrospect was using small arrays as part of a struct, which being immutable got replaced at each time step with a new struct requiring new arrays to be allocated and initialized. I would not have done that in c++, but julia puts my brain in matlab mode.

I see. Yes, it's an interesting design space where Julia makes both heap and stack allocations easy enough, so sometimes you just reach for the heap like in MATLAB mode. Hopefully Prem and Shuhei's work lands soon enough to stack-allocate small non-escaping arrays so that users don't need to think about this.

> Alignment I'd assumed, but padding the struct instead of the tuple did nothing, so probably extra work to clear a piece of an simd load. Any insight on why avx availability didn't help would be appreciated. I did verify some avx instructions were in the asm it generated, so it knew, it just didn't use.

The major differences at this point seem to come down to GCC (g++) vs LLVM and proofs of aliasing. LLVM's auto-vectorizer isn't that great, and it seems to be able to prove 2 arrays are not aliasing less reliably. For the first part, some people have just improved the loop analysis code from the Julia side (https://github.com/JuliaSIMD/LoopVectorization.jl), forcing SIMD onto LLVM can help it make the right choices. But for the second part you do need to do `@simd ivdep for ...` (or use LoopVectorization.jl) to match some C++ examples. This is hopefully one of the things that the JET.jl and other new analysis passes can help with, along with the new effects system (see https://github.com/JuliaLang/julia/pull/43852, this is a pretty huge new compiler feature in v1.8, but right now it's manually specified and will take time before things like https://github.com/JuliaLang/julia/pull/44822 land and start to make it more pervasive). When that's all together, LLVM will have more ammo for proving things more effectively (pun intended).


My experience writing TCL native extensions made me realize that these kinds of languages are only good for scripting purposes.

Anything where performance matters should be written in languages where JIT and AOT compilers come as standard options on the reference implementations.


The key wording is "optimized C".

CPython already executes as pure C; the only difference is that it's effectively very slow, unoptimized C.

The code for simply adding two numbers is insane. It jumps through enough hoops to make you wonder how it even runs everything else.
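
You can get a feel for it from the bytecode alone; every one of these opcodes goes through the eval loop's dispatch, reference counting and dynamic type checks before any actual addition happens:

    import dis

    def add(a, b):
        return a + b

    dis.dis(add)
    # LOAD_FAST a; LOAD_FAST b; BINARY_ADD (a specializing BINARY_OP in 3.11); RETURN_VALUE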


Somewhat related, has anyone used Nim? These days I've been using it for its Python-like syntax yet C-like speed, especially if I just need to script something quickly.


The problem with any new language is libraries and tooling.


I use it for scripting mainly when I don't need Python specific libraries like pandas. The tooling is pretty good as well.


Seems like just going with bash would be more straightforward?


Why would I want to subject myself to writing in a language like bash? Bash has many disadvantages compared to Python, Nim, Go, nearly anything that isn't brainfuck.


Yeah I mean it's pretty bad, but the upside is that it will run everywhere without having to install anything and there are lots and lots of answered questions out there if you run into issues. Which I doubt is true for Nim. Using niche stuff means you're on your own in terms of support.

And with how dependency happy everything is these days, I avoid trendy projects like the plague.


Nim is compiled, so a dependency on a runtime or a VM is not an issue there, at least.


So I can write, test, compile a NIM script on my x64 machine, copy it over to an ARM device and it'll work? Right...? And even if that worked, what if I need to edit one line because reasons? Can't really do that for compiled stuff, generally.

Now if it got transpiled to bash for example, that might be actually practical. Cuts dev time, still enables multiplatform usage and random edits.


Is it worth it to learn? Especially when you have python?


Yeah it's a nice language, it has optional static types as well which is always a plus in my experience. The speed though is what's really interesting.


These speedups are awesome, but of course one wonders why they haven't been a low-hanging fruit over the past 25 years.

Having read about some of the changes [1], it seems like the python core committers preferred clean over fast implementations and have deviated from this mantra with 3.11.

Now let's get a sane concurrency story (no multiprocessing / queue / pickle hacks) and suddenly it's a completely different language!

[1] Here are the python docs on what precisely gave the speedups: https://docs.python.org/3.11/whatsnew/3.11.html#faster-cpyth...

[edit] A bit of explanation of what I meant by low-hanging fruit: One of the changes is "Subscripting container types such as list, tuple and dict directly index the underlying data structures." Surely that seems like a straightforward idea in retrospect. In fact, many python (/c) libraries try to do zero-copy work with data structures already, such as numpy.


A few people tried to excuse the slow python, but as far as I know the story, excuses are not necessary. Truth is that python was not meant to be fast, its source code was not meant to be fast, and its design was not optimized with the idea of being fast. Python was meant as a scripting language that was easy to learn and work with on all levels and the issue of its slowness became important when it outgrew its role and became an application language powering large parts of the internet and the bigger part of the very expensive ML industry. I know that speed is a virtue, but it becomes a fundamental virtue when you have to scale and python was not meant to scale. So, yes, it is easy for people to be righteous and furious over the issue, but being righteous in hindsight is much easier than useful.


Meta has an active fork to optimize Instagram and make it faster, and they just open-sourced it to have the optimizations merged back into CPython.

When you have so many large companies with a vested interest in optimization, I believe that Python can become faster by doing realistic and targeted optimizations. The other strategies to optimize didn't work at all or just served internal problems at large companies.


Oh, I agree. As I said, when python and its uses scaled, it became quite necessary to make it fast. I like that it will be fast as well, and I am not happy that it is slow at the moment. My point is that there are reasons why it was not optimized in the beginning and why this process of optimization has started now.


Back when Python was started, there was really only C or C++ for optimized programs, plus scripting languages like Python and Perl. But since Python had the ability to use C extensions, it could bypass those problems. Since Python was easy to learn, both web developers and scientists picked it up. Then financial organizations started to get interested, and that's really how Python cemented itself.

What exactly do you do with Python that slows you down?


I worked on Python almost exclusively for maybe five years. Then I tried go. Each time I wrote a go program, I am giddy with excitement at how fast my first attempt is, scaling so smoothly with the number of cores. I also wrote a lot of fairly performant C in the 90s, so I know what computers can do in a second.

I still use Python for cases when dev time is more important than execution time (which is rarer now that I'm working adjacent to "big data") or when I'm doing things like writing python to close gaps in the various arrays of web apps provided for navigating the corporate work flow, and if I went as fast as a 64 core box let me, we'd have some outages in corp github or artifactory or the like, so I just do it one slow thing at a time on 1 core and wait for the results. Maybe multiprocessing with 10 process worker pool once I'm somewhat confident in the back end system I am talking to.

(edit: removed my normal email signoff)


You should try Nim, it's Python-like but compiled, so it's as fast as C. These days if I want to script something (and don't need Python-specific libraries like pandas) I use Nim.


He's already using Go. Does Nim provide any "killer features" that Go doesn't?


Well, good generics and sane error handling, for two examples.


Go recently got pretty decent generics. I'm with you on the error handling though. At least it's explicit. Go also has a plethora of really useful libraries, and that along with good editor and build tooling is probably the real draw of the language for me.


Version 1.0 generics, still better than nothing, though.


Okay, you are using Python for gluing web services together, which is what you deem acceptable for Python to do, but can you comment on the things you no longer use Python for due to it being slow?

Don't take this the wrong way, but I think you could be more specific. Are you saying that, similar to Go, it should just be faster in general?


Scheme and Lisp already existed.


> python was not meant to scale

So many successful projects/technologies started out that way. The Web, JavaScript, e-mail. DNS started out as a HOSTS.TXT file that people copied around. Linus Torvalds announced Linux as "just a hobby, won't be big and professional like gnu". Minecraft rendered huge worlds with unoptimized Java and fixed-function OpenGL.


Indeed, and that's why I love it. When I need extra performance, which is very rarely, I don't mind spending extra effort outsourcing it to a binary module. More often, the problem is an inefficient algorithm or data arrangement.

I took a graduate level data structures class and the professor used Python among other things "because it's about 80 times slower than C, so you have to think hard about your algorithms". At scale, that matters.


Global Interpreter Lock (GIL) is also an issue affecting Python execution speed:

https://en.wikipedia.org/wiki/Global_interpreter_lock


Not really, no.

It prevents you from taking advantage of multiple cores. Doesn't really impact straight-line execution speed.

A data structures course is primarily not going to be concerned with multithreading.


>> It prevents you from taking advantage of multiple cores. Doesn't really impact straight-line execution speed.

Most computers running Python today will have multiple cores. If your program can only use a fraction of its available computing power, it affects the program's execution speed. Python is too widely used in compute-intensive applications (e.g. machine learning) for GIL-related issues to be ignored in the long term.

Ideally, Python should have automatic load scaling and thread-safe data structures in the standard library to take advantage of all available CPU cores.

Java has had concurrent data structures in its standard library for years and is adding support for "easy" multi-threaded execution with Project Loom.

Python needs to add its own Pythonic versions of thread-safe data structures and compute load scaling to take advantage of the multi-core CPUs that it runs on.

>> A data structures course is primarily not going to be concerned with multithreading.

A graduate-level data structures course might include concurrent and parallel data structures.


> Python was meant as a scripting language that was easy to learn and work with on all levels.

Being fast isn't contradictory with this goal. If anything, this is a lesson that so many developers forget. Things should be fast by default.


When you only have so many hours to go around, you concentrate on the main goals.


My point is that you can write fast code just as easily as you can write slow code. So engineers should write fast code when possible. Obviously you can spend a lot of time making things faster, but that doesn't mean you can't be fast by default.


> you can write fast code just as easily as you can write slow code

I think some people can do this and some can't. For some, writing slow code is much easier, and their contributions are still valuable. Once the bottlenecks are problems, someone with more performance-oriented skills can help speed up the critical path, and slow code outside of the critical path is just fine to leave as-is.

If you somehow limited contributions only to those who write fast code, I think you'd be leaving way too much on the table.


You usually need more tricks for fast code. Bubble sort is easy to program (it's my default when I have to sort manually and the data has only like 10 items).

There are a few much better options like mergesort or quicksort, but they have their tricks.

But to sort real data really fast, you should use something like timsort, that detects if the data is just the union of two (or a few) sorted parts, so it's faster in many cases where the usual sorting methods don't detect the sorted initial parts. https://en.wikipedia.org/wiki/Timsort

Are you sorting integers? Strings? ASCII-only strings? Perhaps the code should detect some of them and run a specialized version.
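A minimal sketch of the contrast (the function name and data sizes are just illustrative; sorted() really is Timsort in CPython):

  import random

  def bubble_sort(items):
      # easy to write from memory, but O(n^2): fine for ~10 items, hopeless at scale
      items = list(items)
      for i in range(len(items)):
          for j in range(len(items) - 1 - i):
              if items[j] > items[j + 1]:
                  items[j], items[j + 1] = items[j + 1], items[j]
      return items

  data = random.sample(range(100), 10)
  assert bubble_sort(data) == sorted(data)  # sorted() uses Timsort under the hood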


Being fast requires effort. It's not always about the raw performance of the language you use; it's about using the right structures, algorithms, tradeoffs, solving the right problems, etc. It's not trivial, and I've seen so many bad implementations in "fast" compiled languages.


This isn't true in general, and it is especially not true in the context of language interpreters / VMs.


Not true. Premature optimization is the root of all evil. You first write clean code, and then you profile and optimize. I refer you to the internals of dicts through the years (https://www.youtube.com/watch?v=npw4s1QTmPg) as an example of that optimization taking years of incremental changes. Once you see the current version, it's easy to claim that you would have gotten to the current and best version in the first place, as obvious as it looks in hindsight.


CPython and the Python design in general clearly show that writing clean code and optimizing later is significantly harder and takes much more effort than keeping optimizations in mind from the start. It doesn't mean you need to write optimal code from day one, just that you need to be careful not to program yourself into a corner.


> Being fast isn't contradictory with this goal. If anything, this is a lesson that so many developers forget. Things should be fast by default.

It absolutely is contradictory. If you look at the development of programming languages interpreters/VMs, after a certain point, improvements in speed become a matter of more complex algorithms and data structures.

Check out garbage collectors - it's true that Golang keeps a simple one, but other languages progressively increase its sophistication - think about Java or Ruby.

Or JITs, for example, which are the latest and greatest in terms of programming languages optimization; they are complicated beasts.


Yes, you can spend a large amount of time making things faster. But note that Go's GC is fast, even though it is simple. It's not the fastest, but it is acceptably fast.


Funny you should pick that example in a sub-thread that you started with an assertion that code should be fast by default.

Go’s GC was intentionally slow at first. They wanted to get it right THEN make it fast.

No offense but you’re not making a strong case. You’re sounding like an inexperienced coder that hasn’t yet learned that premature optimization is bad.


Far from it; Go was designed to be optimizable from the start. The GC was obviously not optimal, but the language semantics were such that the GC could be replaced with a better one with relatively minimal disruption.

Of course one can't release optimal code from version one, that would be absurd.

Also your last sentence is extremely condescending.


Just FYI, Go's GC mantra is "GC latency is an existential threat to Go."


> Things should be fast by default.

In over 90% of my work in the SW industry, being fast(er) was of no benefit to anyone.

So no, it should not be fast by default.


> no, it should not be fast by default.

Maybe better to elaborate on what it should be, if not fast? Surely you aren’t advocating things should be intentionally slow by default, or carelessly inefficient?

There’s a valid tradeoff between perf and developer time, and it’s fair to want to prioritize developer time. There’s a valid reason to not care about fast if the process is fast enough that a human doesn’t notice.

That said, depends on what your work is, but defaulting to writing faster, more efficient code might benefit a lot of people indirectly. Lower power is valuable for server code and for electricity bills and at some level for air quality in places where power isn’t renewable. Faster benefits parallel processing, it leaves more room for other processes than yours. Faster means companies and users can buy cheaper hardware.


> Maybe better to elaborate on what it should be, if not fast?

It should satisfy the needs of the customer and it should be reasonably secure.

Everything else is a luxury.

My point was that in most of my time in the industry being faster would not have benefited my customers in any manner worth measuring.

I'm not anti-performance. I fantasize about how to make my software faster to maintain my geek credentials. But neither my bosses nor my customers pay for it.

If a customer says they want it faster we'll oblige.


Clean code that is easy to maintain is not a luxury when talking about a programming language implementation.


Being first to market is usually more important than being faster than those who got there first.


Ha! Good luck convincing end-users that that's true.


Client doesn't care. As long as their advertising spend leads to conversion into your "slow" app, they're happy.


I don't need to. Less than 10% of end users have put in requests for performance improvements.


Fast typically comes with trade offs.

Languages that tried to be all things to all people really haven't done so well.


Fast in which regard? Fast coding? Fast results after hitting "run"? ;)


Or at the very least they should be designed in such a way that optimizing them later is still possible.


"Premature optimization is the root of all evil."

-- Donald Knuth


You should probably read the full context around that quote, I'm sick and tired of everyone repeating it mindlessly:

https://softwareengineering.stackexchange.com/a/80092

> Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.


> You should probably read the full context around that quote, I'm sick and tired of everyone repeating it mindlessly:

I'm confused. What do you think the context changes? At least as I read it, both the short form and full context convey the same idea.


That quote has been thrown around every time in order to justify writing inefficient code and never optimizing it. Python is 10-1000x slower than C, but sure, let's keep using it because premature optimization is the root of all evil, as Knuth said. People really love to ignore the "premature" word in that quote.

Instead, what he meant is that you should profile what part of the code is slow and focus on it first. Knuth didn't say you should be fine with 10-1000x slower code overall.


The thing is, when I see people using this quote, I don't see them generally using it to mean you should never optimize. I think people don't ignore the premature bit in general. Now, throwing this quote out there generally doesn't contribute to the conversation. But then, I think, neither does telling people to read the context when the context doesn't change the meaning of the quote.


Right but if they post just that part, they're probably heavily implying that now is not the time to optimize. I've seen way more people using it to argue that you shouldn't be focussing on performance at this time, than saying "sometimes you gotta focus on that 3% mentioned in the part of the quote that I deliberately omitted"


No, they don't deliberately omit the part of the quote. They are either unaware of that part of the quote or don't think it matters to the point they are making.

Yes, if you quote Knuth here (whether the short quote or a longer version) you are probably responding to someone whom you believe is engaged in premature optimization.

It remains that the person quoting Knuth isn't claiming that there isn't such a thing as justified optimization. As such, pointing to the context doesn't really add to the conversation. (Nor does a thoughtless quote of Knuth either)


I dunno, I guess it's hard to say given we're talking about our own subjective experiences. I completely believe there are people who just know the "premature optimization is the root of all evil" part and love to use it because quoting Knuth makes them sound smart. And I'm sure there are also people who know it all and who quote that part in isolation (and in good faith) because they want to emphasise that they believe you're jumping the gun on optimization.

But either way I think the original statement is so uncontroversial and common-sense I actually think it doesn't help any argument unless you're talking to an absolutely clueless dunce or unless you're dealing with someone who somehow believes every optimization is premature.


You certainly can accept that slowdown if the total program run-time remains within acceptable limits and the use of a rapid prototyping language reduces development time. There are times when doing computationally heavy, long-running processes where speed is important, but if the 1000x speedup is not noticeable to the user, then is it really a good use of development time to convert that to a more optimized language?

As was said, profile, find user-impacting bottlenecks, and then optimize.


I would note that the choice of programming language is a bit different. Projects are pretty much locked into that choice. You've got to decide upfront whether the trade off in a rapid prototyping language is good or not, not wait until you've written the project and then profile it.


> Projects are pretty much locked into that choice.

But, they aren't.

I mean, profile, identify critical components, and rewrite in C (or some other low-level language) for performance is a normal thing for most scripting languages.

> You've got to decide upfront whether the trade off in a rapid prototyping language is good or not, not wait until you've written the project and then profile it.

No, you absolutely don't.


Yes, it's true, if you use Python you can rewrite portions in C to get improved performance. But my point was rather that you couldn't later decide you should have written the entire project in another language like Rust or C++ or Java or Go. You've got to make the decision about your primary language up-front.

Or to look at it another way: Python with C extensions is effectively another language. You have to consider it as an option along with Pure Python, Rust, Go, C++, Java, FORTRAN, or what have you. Each language has different trade-offs in development time vs performance.


Certainly, but Python is flexible enough that it readily works with other binaries. If a specific function is slowing down the whole project, an alternate implementation of that function in another language can smooth over that performance hurdle. The nice thing about Python is that it is quite happy interacting with C or go or Fortran libraries to do some of the heavy lifting.


The quote at least contextualises "premature". As it is, premature optimisation is by definition inappropriate -- that's what "premature" means. The context:

a) gives a rule-of-thumb estimate of how much optimisation to do (maybe 3% of all opportunities);

b) explains that non-premature opimisation is not just not the root of all evil but actually a good thing to do; and

c) gives some information about how to do non-premature optimisation, by carefully identifying performance bottlenecks after the unoptimised code has been written.

I agree with GP that unless we know what Knuth meant by "premature" it is tempting to use this quote to justify too little optimisation.


I agree with you, the context changes nothing (and I upvoted you for this reason). However programming languages and infrastructure pieces like this are a bit special, in that optimizations here are almost never premature.

  * Some of the many applications relying on these pieces, could almost certainly use the speedup and for those it wouldn't be premature
  * The return of investment is massive due to the scale
  * There are tremendous productivity gains by increasing the performance baseline because that reduces the time people have to spend optimizing applications
This is very different from applications where you can probably define performance objectives and define much more clearly what is and isn't premature.


I don't know about that. Even with your programming language/infrastructure you still want to identify the slow bits and optimize those. At the end of the day, you only have a certain amount of bandwidth for optimization, and you want to use that where you'll get the biggest bang for your buck.


This certainly does not mean, "tolerate absurd levels of technical debt, and only ever think about performance in retrospect."


Python was meeting needs well enough to be one of, if not the single, most popular language for a considerable time and continuing to expand and become dominant in new application domains while languages that focussed more heavily on performance rose and fell.

And it's got commercial interests willing to throw money at performance now because of that.

Seems like the Python community, whether as top-down strategy or emergent aggregate of grassroots decisions made the right choices here.


Python had strengths that drove its adoption, namely that it introduced new ideas about a language's accessibility and readability. I'm not sure it was ever really meeting the needs of application developers. People have been upset about Python performance and how painful it is to write concurrent code for a long time. The innovations in accessibility and readability have been recognized as valuable - and adopted by other languages (Go comes to mind). More recently, it seems like Python is playing catch-up, bringing in innovations from other languages that have become the norm, such as asyncio, typing, even match statements.

Languages don't succeed on their technical merit. They succeed by being good enough to gain traction, after which it is more about market forces. People choose Python for its great ecosystem and the availability of developers, and they accept the price they pay in performance. But that doesn't imply that performance wasn't an issue in the past, or that Python couldn't have been even more successful if it had been more performant.

And to be clear, I use Python every day, and I deeply appreciate the work that's been put into 3.10 and 3.11, as well as the decades prior. I'm not interested in prosecuting the decisions about priorities that were made in the past. But I do think there are lessons to be learned there.


> tolerate absurd levels of technical debt

In my experience it's far more common for "optimizations" to be technical debt than the absence of them.

> only ever think about performance in retrospect

From the extra context it pretty much does mean that. "but only after that code has been identified" - 99.999% of programmers who think they can identify performance bottlenecks other than in retrospect are wrong, IME.


Well it's entirely possible that Knuth and I disagree here, but if you architect an application without thinking about performance, you're likely going to make regrettable decisions that you won't be able to reverse.

It is not possible to predict bottlenecks in computation, no. But the implications of putting global state behind a mutex in a concurrent application should be clear to the programmer, and they should think seriously before making a choice like that. If you think of a different way to do that while the code is still being written, you'll avoid trapping yourself in an irreversible decision.


Python isn't premature. Python is more than 30 years old now; Python 3 was released more than 10 years ago.


It's been at least 5 years since I read an angry post about the 2 to 3 version change, so I guess it's finally been accepted by the community.


I would say this quote does not apply here. VM implementations are squarely in the critical 3% Knuth says we should not pass up.


Because Guido van Rossum just isn't very good with performance, and when others tried to contribute improvements, he started heckling their talk because he thought they were “condescending”: https://lwn.net/Articles/754163/ And by this time, we've come to the point where the Python extension API is as good as set in stone.

Note that all of the given benchmarks are microbenchmarks; the gains in 3.11 are _much_ less pronounced on larger systems like web frameworks.


Yeah breaking compatibility kills a language. He did the right thing.


yet breaking compatibility in the 2->3 transition so that we can use print in lambdas was perfectly fine.


People learn from mistakes


> one wonders why they haven't been a low-hanging fruit over the past 25 years

Because the core team just hasn't prioritized performance, and have actively resisted performance work, at least until now. The big reason has been about maintainership cost of such work, but often times plenty of VM engineers show up to assist the core team and they have always been pushed away.

> Now let's get a sane concurrency story

You really can't easily add a threading model like that and make everything go faster. The hype of "GIL-removal" branches is that you can take your existing threading.Thread Python code, and run it on a GIL-less Python, and you'll instantly get a 5x speedup. In practice, that's not going to happen, you're going to have to modify your code substantially to support that level of work.

The difficulty with Python's concurrency is that the language doesn't have a cohesive threading model, and many programs are simply held alive and working by the GIL.


Yup. Sad but true. I just wanted C++-style multithreading with unsafe shared memory, but it's too late. I was impressed by the recent fork that addresses this in a very sophisticated way, but realistically GIL CPython will keep finding speedups to stave off the transition.


Wondering how that interacts with C/C++ extensions that presume the existence of the GIL.


A good place to start learning about this https://www.backblaze.com/blog/the-python-gil-past-present-a...


> one wonders why they haven't been a low-hanging fruit over the past 25 years.

From the very page you've linked to:

"Faster CPython explores optimizations for CPython. The main team is funded by Microsoft to work on this full-time. Pablo Galindo Salgado is also funded by Bloomberg LP to work on the project part-time."


Hmm, looks like corporate money gets sh*t done.


Yep. That's the thing about the open source dream; especially for work like this that requires enough time commitment to understand the whole system, and a lot of uninteresting grinding details such that very few people would do it for fun, you really need people being funded to work on it full-time (and a 1-year academic grant probably doesn't cut it), and businesses are really the only source for that.


> Now let's get a sane concurrency story

This is in very active development[1]! And seems like the Core Team is not totally against the idea[2].

[1] https://github.com/colesbury/nogil

[2] https://pyfound.blogspot.com/2022/05/the-2022-python-languag...


So we have:

* Statically allocated ("frozen") core modules for fast imports

* Avoid memory allocation for frames / faster frame creation

* Inlined python functions are called in pure python without needing to jump through C

* Optimizations that take advantage of speculative typing (Reminds me of Javascript JIT compilers -- though according to the FAQ Python isn't JIT yet)

* Smaller memory usage for frames, objects, and exceptions

Dang that certainly does sound like low hanging fruit. There's probably a lot more opportunities left if they want Python to go even faster.


Things may seem like low hanging fruit in the abstract bullet point, but the work needed may be immense.


> Inlined python functions are called in pure python without needing to jump through C

Given that Python is interpreted, it's quite unclear what this could mean.

Also, what does it mean to "call" an inlined function?? Isn't the point of inline functions that they don't get called at all?


It's a little confusing, but I don't think they meant inlining in the traditional sense. It's more like they inlined the C function wrapper around Python functions.

> During a Python function call, Python will call an evaluating C function to interpret that function’s code. This effectively limits pure Python recursion to what’s safe for the C stack.

> In 3.11, when CPython detects Python code calling another Python function, it sets up a new frame, and “jumps” to the new code inside the new frame. This avoids calling the C interpreting function altogether.

> Most Python function calls now consume no C stack space. This speeds up most of such calls. In simple recursive functions like fibonacci or factorial, a 1.7x speedup was observed. This also means recursive functions can recurse significantly deeper (if the user increases the recursion limit). We measured a 1-3% improvement in pyperformance.
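A rough sketch of the recursion-depth part of that claim (the function and numbers are made up for illustration; how deep you can actually go on 3.10 depends on the platform's C stack size):

  import sys

  def depth(n):
      # pure Python-to-Python recursion; in 3.11 these calls avoid the C stack
      return 0 if n == 0 else 1 + depth(n - 1)

  sys.setrecursionlimit(100_000)
  print(depth(50_000))  # tends to exhaust the C stack on 3.10, completes on 3.11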


It seems to just be that previously, it did:

  switch (op) {
    case "call_function":
      interpret(op.function)
    ...
  }
Now it does:

  switch (op) {
    case "call_function":
      ... setup frame objects etc ...
      pc = op.function
      continue
    ....
  }
Not sure if they just "inlined" it or used a tail-call elimination trick.


> Given that Python is interpreted, it's quite unclear what this could mean.

Python is compiled. CPython runs bytecode.

(If Python is interpreted, then so is Java without the JIT).


Fair enough, but that doesn't help understanding the original sentence.


I don't think that calling non-JITed Java 'interpreted' is in any way controversial.


> Of course one wonders why they haven't been a low-hanging fruit over the past 25 years.

Because it's developed and maintained by volunteers, and there aren't enough folks who want to spend their volunteer time messing around with assembly language. Nor are there enough volunteers that it's practical to require very advanced knowledge of programming language design theory and compiler design theory as a prerequisite for contributing. People will do that stuff if they're being paid 500k a year, like the folks who work on V8 for Google, but there aren't enough people interested in doing it for free to guarantee that CPython will be maintained in the future if it goes too far down that path.

Don't get me wrong, the fact that Python is a community-led language that's stewarded by a non-profit foundation is imho its single greatest asset. But that also comes with some tradeoffs.


> Because it's developed and maintained by volunteers

Who is working on python voluntarily? I would assume that, like the Linux kernel, the main contributors are highly paid. Certainly, having worked at Dropbox, I can attest to at least some of them being highly paid.


Raymond Hettinger gets his Python Contributor salary doubled every couple of years.



first, there are lots of volunteers that want to mess with assembly language.

second, CPython is just an interpreter written in C; there isn't much assembly, if any.

third, contributing to CPython is sufficiently high-profile you could easily land a 500k-job just by putting it on your CV.

No, those are not the real reasons why it hasn't happened before.


Go on then - what are the real reasons, in your mind?


Unfriendly core team?


Why should major technology projects be "friendly" to random people?

I think the word you're looking for is "politics".


Being friendly is inclusive and generally a winning strategy compared to others.

The underlying source of the Politics of Python and associated perceptions stems from the core team's culture of not being "friendly".


People act according to their interests.

They have no obligation to go out of their way to cater to yours.

The entitlement of the new generation of open-source contributors to require political correctness or friendliness is so destructive it's ridiculous. I wouldn't want to be involved with any project that prioritizes such zealotry on top of practicality.


It's not about entitlement so much as about ensuring the project can reach its full potential and continue to stay relevant and useful in perpetuity as people come and go.

The world constantly changes, projects adapt or become less useful.


The answer is very simple. The number of people who got paid to make Python fast rounded to 0.


Not really. There were a couple of engineers working at Google on a project called Unladen Swallow, which was extremely promising, but it eventually got canceled.

The developer who worked at Microsoft on IronPython - I think that was his full-time project as well, and it was definitely faster than CPython at the time.


"The developer who worked at Microsoft to make iron python" - Jim Hugunin [1] (say his name!) to whom much is owed by the python community and humanity, generally.

[1] https://en.wikipedia.org/wiki/Jim_Hugunin


I'm sorry, I had forgotten at the time. I almost got to work with him at Google. I agree he's a net positive for humanity (I used Numeric heavily back in the day).


Those people were not being paid to speed up CPython, though, but to build mostly-source-compatible (but not at all native-extension-compatible) alternative interpreters.


That's because the core team wasn't friendly to them making changes in CPython.

Not because they deliberately only wanted to build an alternative interpreter.


That's absolutely not the case with IronPython, where being tied to the .NET ecosystem was very much the point.

But if it was just core-team resistance and not a more fundamentally different objective than just improving the core interpreter performance, maintaining support for native extensions while speeding up the implementation would have been a goal even if it had to be in a fork.

They were solving a fundamentally different (and less valuable to the community) problem than the new faster CPython project.


>That's absolutely not the case with IronPython, where being tied to the .NET ecosystem was very much the point.

That might very well be, but I was talking about Unladen Swallow and such "native" attempts.

Not attempts to port to a different ecosystem like Jython or IronPython or the js port.

>maintaining support for native extensions while speeding up the implementation would have been a goal even if it had to be in a fork.

That's not a necessary goal. One could very well want to improve CPython, the core, even if it meant breaking native extensions. Especially if doing so meant even more capabilities for optimization.

I, for one, would be fine with that, and am pretty sure all important native extensions would have adapted quite soon - like they adapted to Python 3 or the ARM M1 Macs.

In fact, one part of the current proposed improvements includes some (albeit trivial) fixes to native extensions.


If I remember correctly, IronPython ran on Mono; it just didn't have all of the .NET bits. I remember at the Python conference when it was introduced, he actually showed how you could bring up Clippy or a wizard programmatically from Python, talking directly to the operating system through its native APIs.

Really the value came from the thread safe container/collections data types which are the foundation of high performance programming


Too bad .NET was designed for Windows, and on Linux the APIs feel like Windows APIs that kinda sorta work on Linux.


I've used a few .NET applications on Linux and they work pretty well. But I've never actually tried to develop using Mono. I really wish Microsoft had gone all in on Linux .NET support years ago.


Core team wasn't friendly to breaking C API.

Current changes don't really break it.


Neither of those projects were ever going to be accepted upstream though.


My hope was that IronPython would replace CPython as the standard, but for a number of reasons that was never to be.


Was the project run by a team based in Africa or Europe?


I don't know! [Whoosh]


That’s something that has always puzzled me as well.

Given the popularity, you’d think Python would have had several generations of JITs by now and yet it still runs interpreted, AFAIK.

JavaScript has proven that any language can be made fast given enough money and brains, no matter how dynamic.

Maybe Python's C escape hatch is so good that it’s not worth the trouble. It’s still puzzling to me though.


> JavaScript has proven that any language can be made fast given enough money and brains,

Yeah but commercial Smalltalk proved that a long time before JS did. (Heck, back when it was maintained, the fastest Ruby implementation was built on top of a commercial Smalltalk system, which makes sense given they have a reasonably similar model.)

The hard part is that "enough money" is...not a given, especially for noncommercial projects. JS got it because Google decided JavaScript speed was integral to its business of getting the web to replace local apps. Microsoft recently developed sufficient interest in Python to throw some money at it.


Yes, I’m aware of the Smalltalk miracle.

Google wasn't alone in optimizing JS; it actually came late. Safari and Firefox were already competing and improving their runtime speeds, though V8 did double down on the bet of a fast JS machine.

The question is why there isn’t enough money, given that there obviously is a lot of interest from big players.


> The question is why there isn’t enough money, given that there obviously is a lot of interest from big players.

I'd argue that there wasn't actually much interest until recently, and that's because it is only recently that interest in the CPython ecosystem has intersected with interest in speed that has money behind it, because of the sudden broad relevance to commercial business of the Python scientific stack for data science.

Both Unladen Swallow and IronPython were driven by interest in Python as a scripting language in contexts detached from, or at least not necessarily attached to, the existing CPython ecosystem.


Probably. I’m not a heavy Python user.

A better question would be to ask where Python is slow today and if it matters to big business, then.

It rules in AI, for instance. Is it still mainly glue code for GPU execution and performance isn’t at all critical there?


> Maybe Python's C escape hatch is so good that it’s not worth the trouble.

Even if it wasn't good, the presence of it reduces the necessity of optimizing the runtime. You couldn't call C code in the browser at all until WASM; the only way to make JS faster was to improve the runtime.

> JavaScript has proven that any language can be made fast given enough money and brains, no matter how dynamic.

JavaScript also lacks parallelism. Python has to contend with how dynamic the language is, as well as that dynamism happening in another thread.

There are some Python variants that have a JIT, though they aren't 100% compatible. PyPy has a JIT, and iirc, IronPython and Jython use the JIT from their runtimes (.NET and Java, respectively).


JavaScript doesn't really have C extensions like Python has.

PyPy did implement a JIT for Python, and it worked really well, until you tried to use C extensions.


That’s what I meant, if JavaScript hadn’t been trapped in the browser and allowed to call C, maybe there wouldn’t have been so much investment in making it fast.

But still, it’s kind of surprising that PHP has a JIT and Python doesn’t (official implementation I mean, not PyPy).


Python has a lot more third-party packages that are written in native code. PHP has few partly because it just isn't used much outside of web dev, and partly because the native bits that might be needed for web devs, like database libraries, are included in the core distro.


Because it has to; there is no other way to make it fast. It's the typical two-language dilemma.


I suspect that the amount of people and especially companies willing to spend time and money optimizing Python are fairly low.

Think about it: if you have some Python application that's having performance issues you can either dig into a foreign codebase to see if you can find something to optimize (with no guarantee of result) and if you do get something done you'll have to get the patch upstream. And all that "only" for a 25% speedup.

Or you could rewrite your application in part or in full in Go, Rust, C++ or some other faster language to get a (probably) vastly bigger speedup without having to deal with third parties.


> you can either dig into a foreign codebase ...

> Or you could rewrite your application

Programmers love to save an hour in the library by spending a week in the lab


Everyone likes to shirk their jobs; engineers and programmers have ways of making their fun (I'm teaching myself how to write parsers by giving each project a DSL) look like work. Lingerie designers or eyebrow barbers have nothing of the sort; they just blow off work on TikTok or something.


TikTok’s got some fun coding content, if you can get the algorithm to surface it to you.


> Or you could rewrite your application in part or in full in Go, Rust, C++

Or you can just throw more hardware at it, or use existing native libraries like NumPy. I don't think there are a ton of real-world use cases where Python's mediocre performance is a genuine deal-breaker.

If Python is even on the table, it's probably good enough.


You are right, that's usually not how it works.

Instead, there are big companies who are running, let's say, the majority of their workloads in Python. It's working well, it doesn't need to be very performant, but together all of the workloads represent a considerable portion of their compute spend.

At a certain scale it makes sense to employ experts who can for example optimize Python itself, or the Linux kernel, or your DBMS. Not because you need the performance improvement for any specific workload, but to shave off 2% of your total compute spend.

This isn't applicable to small or medium companies usually, but it can work out for bigger ones.


There was some guarantee of result. It has been a long process but there was mostly one person who had identified a number of ways to make it faster but wanted financing to actually do the job. Seems Microsoft is doing the financing, but this has been going on for quite a while.


Python has actually had concurrency via asyncio for years (it has been in the standard library since 3.4, with async/await syntax since 3.5): https://docs.python.org/3/library/asyncio.html. Having used it a few times, it seems fairly sane, but tbf my experience with concurrency in other languages is fairly limited.

edit: ray https://github.com/ray-project/ray is also pretty easy to use and powerful for actual parallelism
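For anyone who hasn't used it, a minimal sketch of the asyncio model (the names and delays are made up): coroutines that would otherwise block on I/O yield to the event loop, so they overlap on a single thread.

  import asyncio

  async def fetch(name, delay):
      # stand-in for an I/O-bound call (HTTP request, DB query, etc.)
      await asyncio.sleep(delay)
      return f"{name} done"

  async def main():
      # both "requests" run concurrently, so this takes ~1s rather than ~2s
      print(await asyncio.gather(fetch("a", 1), fetch("b", 1)))

  asyncio.run(main())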


I find asyncio to be horrendous, both because of the silliness of its demands on how you build your code and also because of its arbitrarily limited scope. Thread/ProcessPoolExecutor is personally much nicer to use and universally applicable...unless you need to accommodate Ctrl-C and then it's ugly again. But fixing _that_ stupid problem would have been a better expenditure of effort than asyncio.
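For comparison, a minimal concurrent.futures sketch (the worker function is just a placeholder): plain synchronous code gets parallelism across processes without being rewritten in the async style.

  from concurrent.futures import ProcessPoolExecutor

  def cpu_bound(n):
      # placeholder for real CPU-heavy work; runs in a separate process, so no GIL contention
      return sum(i * i for i in range(n))

  if __name__ == "__main__":
      with ProcessPoolExecutor() as pool:
          print(list(pool.map(cpu_bound, [5_000_000] * 4)))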


>I find asyncio to be horrendous, both because of the silliness of its demands on how you build your code and also because of its arbitrarily limited scope.

Do you compare it to threads and pools, or judge it on its merits as an async framework (with you having experience of those that you think are done better elsewhere, e.g. in Javascript, C#, etc)?

Because both things you mention, "demands on how you build your code" and "limited scope", are par for the course with async in most languages that aren't async-first.


> Because both things you mention "demands on how you build your code" and "limited scope" are par for the course with async in most languages

I don't see how "asyncio is annoying and can only be used for a fraction of scenarios everywhere else too, not just here" is anything other than reinforcement of what I said. OS threads and processes already exist, can already be applied universally for everything, and the pool executors can work with existing serial code without needing the underlying code to contort itself in very fundamental ways.

Python's version of asyncio being no worse than someone else's version of asyncio does not sound like a strong case for using Python's asyncio vs fixing the better-in-basically-every-way concurrent futures interface that already existed.


>I don't see how "asyncio is annoying and can only be used for a fraction of scenarios everywhere else too, not just here" is anything other than reinforcement of what I said.

Well, I didn't try to refute what you wrote (for one, it's clearly a personal, subjective opinion).

I asked what I've asked merely to clarify whether your issue is with Python's asyncio (e.g. Python got it wrong) or with the tradeoffs inherent in async io APIs in general (regardless of Python).

And it seems that it's the latter. I, for one, am fine with async APIs in JS, which have the same "problems" as the one you've mentioned for Python's, so don't share the sentiment.


> I've asked merely to clarify whether your issue is with Python's asyncio (e.g. Python got it wrong) or with the tradeoffs inherent in async io APIs in general (regardless of Python)

Both, but the latter part is contextual.

> I, for one, am fine with async APIs in JS

Correct me if you think I'm wrong, but JS in its native environment (the browser) never had access to the OS thread and process scheduler, so the concept of what could be done was limited from the start. If all you're allowed to have is a hammer, it's possible to make a fine hammer.

But

1. Python has never had that constraint

2. Python's asyncio in particular is a shitty hammer that only works on special asyncio-branded nails

and 3. Python already had a better futures interface for what asyncio provides and more before asyncio was added.

The combination of all three of those is just kinda galling in a way that it isn't for JS because the contextual landscape is different.


>1. Python has never had that constraint

Which is neither here nor there. Python had another big constraint, the GIL, so threads there couldn't take you as far as async would. But even environments with threads (C#, Rust) also got big into async in the same style.

>2. Python's asyncio in particular is a shitty hammer that only works on special asyncio-branded nails

Well, that's also the case with C#, JS, and others with similar async style (aka "colored functions"). And that's not exactly a problem, as much as a design constraint.


What has GIL to do with the thread model vs asyncio? asyncio is also single threaded, so cooperative (and even preemptive) green threads would have been a fully backward compatible option.

JS never had the option since, as far as I understand, callback-based async was already the norm, so async functions were an improvement over what came before. C# wants to be a high-performance language, so using async to avoid allocating a full call stack per task is understandable. In Python the bottleneck would be elsewhere, so scaling would in no way be limited by the amount of stack space you can allocate, so adding async is really hard to justify.


>What has GIL to do with the thread model vs asyncio?

Obviously the fact that the GIL prevents efficient use of threads, so asyncio becomes the way to get more load from a single CPU by taking advantage of the otherwise blocking time.


How would the GIL prevent the use of "green" threads? Don't confuse the programming model with the implementation. For example, as far as I understand, gevent threads are not affected by the GIL when running on the same OS thread.


Try C# as a basis for comparison, then. It also has access to native threads and processes, but it adopted async - indeed, it's where both Python and JS got their async/await syntax from.


Asyncio violates every aspect of compositional orthogonality; just like decorators, you can't combine it with anything else without completely rewriting your code around its constrictions. It's also caused a huge number of pip installation problems around the AWS CLI and boto.


Having both Task and Future was a pretty strange move; and the lack of static typing certainly doesn't help: the moment you get a Task wrapping another Task wrapping the actual result, you really want some static analysis tool to tell you that you forgot one "await".


Concurrency in Python is a weird topic, since multiprocessing is the only "real" concurrency. Threading is "implicit" context switching all in the same process/thread, asyncio is "explicit" context switching. On top of that, you also have the complication of the GIL. If threads don't release the GIL, then you can't effectively switch contexts.


> Concurrency in Python is a weird topic, since multiprocessing is the only "real" concurrency.

You are confusing concurrency and parallelism.

> Threading is "implicit" context switching all in the same process/thread

No, threading is separate native threads but with a lock that prevents execution of Python code in separate threads simultaneously (native code in separate threads, with at most one running Python, can still work.)


Threading IS concurrency. When you say "real" concurrency, you actually mean parallelism.


Not in CPython it isn't. Threading in CPython doesn't allow 2 threads to run concurrently (because of GIL). As GP correctly stated, you need multiprocessing (in CPython) for concurrency.


They're emphasizing a precise distinction between "concurrent" (the way it's structured) and "parallel" (the way it runs).

Concurrent programs have multiple right answers for "Which line of computation can make progress?" Sequential execution picks one step from one of them and runs it, then another, and so on, until everything is done. Whichever step is chosen from whichever computation, it's one step per moment in time; concurrency is only the ability to choose. Parallel execution of concurrent code picks steps from two or more computations and runs them at once.

Because of the GIL, Python on CPython has concurrency but limited parallelism.
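A small sketch of that limitation in practice (the function and iteration counts are arbitrary): two threads of pure-Python, CPU-bound work take about as long as doing the same work serially, because only one thread holds the GIL at a time.

  import threading, time

  def count_down(n):
      # pure-Python CPU-bound loop; never releases the GIL for long
      while n:
          n -= 1

  N = 20_000_000

  t0 = time.perf_counter()
  count_down(N); count_down(N)
  print("serial:     ", time.perf_counter() - t0)

  t0 = time.perf_counter()
  threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
  for t in threads: t.start()
  for t in threads: t.join()
  print("two threads:", time.perf_counter() - t0)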


> Threading in CPython doesn't allow 2 threads to run concurrently (because of GIL)

It does allow threads to execute concurrently. It doesn't allow them to execute in parallel if they are all running Python code (if at least one is running native code and has released the GIL, then those plus one that has not can run in parallel.)


I have used asyncio in anger quite a bit, and have to say that it seems elegant at first and works very well for some use cases.

But when you try to do things that aren't a map-reduce or Pool.map() pattern, it suddenly becomes pretty warty. E.g. scheduling work out to a processpool executor is ugly under the hood and IMO ugly syntactically as well.
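For context, this is roughly the pattern being described (a sketch; blocking_work is a made-up placeholder): you fetch the running loop and hand the heavy work off to a process pool explicitly.

  import asyncio
  from concurrent.futures import ProcessPoolExecutor

  def blocking_work(n):
      # CPU-bound placeholder that would otherwise stall the event loop
      return sum(i * i for i in range(n))

  async def main():
      loop = asyncio.get_running_loop()
      with ProcessPoolExecutor() as pool:
          # hop out of the event loop for the heavy part, await the result
          result = await loop.run_in_executor(pool, blocking_work, 5_000_000)
      print(result)

  asyncio.run(main())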


> scheduling work out to a processpool executor is ugly under the hood and IMO ugly syntactically as well.

Are you talking about this example? https://docs.python.org/3/library/asyncio-eventloop.html#asy...


I love asyncio! It's a very well put together library. It provides great interfaces to manage event loops, io, and some basic networking. It gives you a lot of freedom to design asynchronous systems as you see fit.

However, batteries are not included. For example, it provides no HTTP client/server. It doesn't interop with any synchronous IO tools in the standard library either, making asyncio a very insular environment.

For the majority of problems, Go or Node.js may be better options. They have much more mature environments for managing asynchrony.



This is exactly why Go is a better option for async use cases.


Until you need to do async FFI. Callbacks and the async/await syntactic sugar on top of them compose nicely across language boundaries. But green threads are VM-specific.


It does indeed, but personally, I believe with async/await the main pain point of this post (callback hell) is essentially gone.


I’m a fan of asyncio, the parent probably meant to say parallelism though, since that’s what getting rid of GIL unlocks.


asyncio is still single-threaded due to the GIL.


While not ideal, this can be mitigated with multiprocessing. Python asyncio exposes interfaces for interacting with multiple processes [1].

[1] https://docs.python.org/3/library/asyncio-eventloop.html#asy...


True, but it's not trying to be multi-threaded, just concurrent.


I don't wonder. For me it seems pretty clear.

I believe the reason is that python does not need any low-hanging fruits to have people use it, which is why they're a priority for so many other projects out there. Low-hanging fruits attract people who can't reach higher than that.

When talking about low-hanging fruits, it's important to consider who they're for. The intended target audience. It's important to ask oneself who grabs for low-hanging fruits and why they need to be prioritized.

And with that in mind, I think the answer is actually obvious: Python never required the speed, because it's just so good.

The language is so popular, people search for and find ways around its limitations, which most likely actually even increases its popularity, because it gives people a lot of space to tinker in.


> Low-hanging fruits attract people who can't reach higher than that.

Do we have completely different definitions of low-hanging fruit?

Python not "requiring" speed is a fair enough point if you want to argue against large complex performance-focused initiatives that consume too much of the team's time, but the whole point of calling something "low-hanging fruit" is precisely that they're easy wins — get the performance without a large effort commitment. Unless those easy wins hinder the language's core goals, there's no reason to portray it as good to actively avoid chasing those wins.


> is precisely that they're easy wins — get the performance without a large effort commitment.

Oh, that's not how I interpret low-hanging fruit. From my perspective, a "low-hanging fruit" is like a cheap pop in wrestling: things you say knowing they will get a positive reaction, like saying the name of the town you're in.

As far as I know, the low-hanging fruit isn't named like that because of the fruit, but because of those who reach for it.

My reason for this is the fact that the low-hanging fruit is "being used" specifically because there's lots of people who can reach it. The video gaming industry as a whole, but specifically the mobile space, pretty much serves as perfect evidence of that.

Edit:

It's done for a certain target audience, because it increases exposure and interest. In a way, one might even argue that the target audience itself is a low-hanging fruit, because the creators of the product didn't care much about quality and instead went for that which simply impresses.

I don't think python would have gotten anywhere if they had aimed for that kind of low-hanging fruit.


Ah, ok. We're looking at the same thing from different perspectives then.

What I'm describing, which is the sense I've always seen that expression used in engineering, and what GP was describing, is: this is an easy, low-risk project that has a good chance of producing results.

E.g. if you tell me that your CRUD application suffers from slow reads, the low-hanging fruit is stuff like making sure your queries are hitting appropriate indices instead of doing full table scans, or checking that you're pooling connections instead of creating/dropping connections for every individual query. Those are easy problems to check for and act on that don't require you to grab the hard-to-reach fruit at the top of the tree, like completely redesigning your DB schema or moving to a new DB engine altogether.


https://en.wiktionary.org/wiki/low-hanging_fruit

Easily obtained gains; what can be obtained by readily available means

https://www.merriam-webster.com/dictionary/low-hanging%20fru...

the obvious or easy things that can be most readily done or dealt with in achieving success or making progress toward an objective

"Maria and Victor have about three months' living expenses set aside. That's actually pretty good …. But I urged them to do better …. Looking at their monthly expenses, we found a few pieces of low-hanging fruit: Two hundred dollars a month on clothes? I don't think so. Another $155 for hair and manicures? Denied."

"As the writers and producers sat down in spring 2007 to draw the outlines of Season 7, they knew, Mr. Gordon said, that most of the low-hanging fruit in the action genre had already been picked."

"When business types talk about picking low-hanging fruit, they don't mean, heaven forbid, doing actual physical labor. They mean finding easy solutions."

https://dictionary.cambridge.org/dictionary/english/low-hang...

something that is easy to obtain, achieve, or take advantage of

"The easy changes have all been made. All the low-hanging fruit has been picked."

"When cutting costs, many companies start with the low-hanging fruit: their ad budgets."

"For the beauty-care industry, the teen demographic is a new category for them - low-hanging fruit."

"I'm a great believer in picking low-hanging fruit. Start with what's easy, and go higher later."

"This legislation is some of the low-hanging fruit - the issues that we can agree upon across parties and regions."


It can also be a "low-hanging fruit" in a sense that it's possible to do without massive breakage of the ecosystem (incompatible native modules etc). That is, it's still about effort - but effort of the people using the end result.


I see your point, but it directly conflicts with the effort many people put into producing extremely fast libraries for specific purposes, such as web frameworks (benchmarked extensively), ORMs and things like json and date parsing, as seen in the excellent ciso8601 [1] for example.

[1] https://github.com/closeio/ciso8601


I disagree that it conflicts. There's an (implied) ceiling on Python performance, even after optimizations. The fear has always been that removing the design choices that cause that ceiling, would result in a different, incompatible language or runtime.

If everyone knows it's never going to reach the performance needed for high performance work, and there's already an excellent escape hatch in the form of C extensions, then why would people be spending time on the middle ground of performance? It'll still be too slow to do the things required, so people will still be going out to C for them.

Personally though, I'm glad for any performance increases. Python runs in so much critical infrastructure, that even a few percent would likely be a considerable energy savings when spread out over all users. Of course that assumes people upgrade their versions...but the community tends to be slow to do so in my experience.


Isn't this the point made in the last paragraph, about how people find ways around the limitations?


> I believe the reason is that python does not need any low-hanging fruits to have people use it, which is why they're a priority for so many other projects out there. Low-hanging fruits attract people who can't reach higher than that.

Ah! So they are so tall that picking the low-hanging fruit would be too inconvenient for them.

Talk about stretching an analogy too far.


> Now let's get a sane concurrency story (no multiprocessing / queue / pickle hacks) and suddenly it's a completely different language!

Yes, and I would argue it already exists and is called Rust :)

Semi-jokes aside, this is difficult and is not just about removing the GIL and enabling multithreading; we would need better memory and garbage collection controls. Part of what makes Python slow and dangerous in concurrent settings is the ballooning memory allocation on large and fragmented workloads. A -Xmx would help.


Perhaps they took a lesson from Perl; its code base was so complex as to be near unmaintainable.

In addition to the other point here about speed not being a target in the first place.


There is a race right now to be more performant. .NET, Java, and Go already participate; Rust/C++ are there anyway. So to stay relevant, Python has to start participating. .NET went through the same thing some years ago.

And why they were not addressed: because starting at a certain point, the optimizations are e.g. processor-specific or non-intuitive to understand, making them hard to maintain vs a simple, straightforward solution.


Guido stepping away from the language will have a lot of impact on changes that were culturally guided.


Guido is one of the people leading the current performance efforts.


That's interesting. Thanks.


Is this speed a feature of python now, or is this performance something that could regress in future if needed for a different feature? As in, will python continue to run this fast (or faster)?


I think it’s hard to answer your question. The development mentality of Python seems to have changed a little, going into 3.11, and you see sort of a shift in the priority of clean vs fast. I think you can assume that Python will continue to care about speed, but really, it sort of always has. Just like it’s going to continue to care about clean implementation. Then there is the part where Python will likely never become a “fast” language in the sense that C is a fast language, but the flip side of this is that Python was probably already fast enough for most uses. I tend to like to use the “we aren’t Netflix” or the “stackoverflow runs on IIS and always has” arguments when it comes to worrying about speed in anyone who isn’t Netflix, but that doesn’t really apply to Python since the backend of Instagram is django, and if Python can power Instagram then it’s probably fast enough for you, unless you’re already working with C/C++/Rust and know exactly why you are working with those.

I know I’m a bit more focused on the business side of things than a lot of techies here on HN, it’s a curse and a gift, but what I read the 10-60% speed increase as is money. The fewer resources you consume, the less money you burn. Which is really good news for Python in general, because it makes it more competitive with languages like C# for many workloads.

This comes from the perspective of someone who actually thinks running TypeScript on your backend is a good idea, because it lets you share developer resources more easily, so the people who are really good at React can cover for the people who are really good at the backend, and the other way around, in smaller teams in non-software-engineering organisations.


Is there a website which tracks which of the major libraries (pandas, requests, django, etc.) support each major version? I remember there was one years ago for Python 3.6? Been a long time!


I think this is what you’re looking for: https://pyreadiness.org/3.11/


Thanks!


I’d be curious to see the speed up compared to python 2.4. To see how far it’s come since I first got started with it.


it means you are not following PEP 8

if you get non-descriptive errors, it means you haven't followed proper exception handling/management

it's not the fault of Python but of the developer. Go ahead and downvote me, but if you mentioned the parent's comment in an interview, you would not receive a call back, or at least I hope the interviewer would recognise the skill gap.


I think this is both insulting and not even responding to the right comment.


Ah yes, the "why make things easier when you could do things the hard way" argument.


How so?


I wish there were something like LLVM for scripting languages. Imagine if Python, PHP, JavaScript, Dart or Ruby didn't interpret the code themselves, but compiled to a common interpretable language, where you could just plug in the fastest interpreter there is for the job.


The RPython language PyPy is written in is designed to enable writing JIT-compiled interpreters for any language:

https://rpython.readthedocs.io/en/latest/

There is also the Dynamic Language Runtime, which is used to implement IronPython and IronRuby on top of .NET:

https://en.wikipedia.org/wiki/Dynamic_Language_Runtime


rpython is designed to write new interpreters. Not aware of many apps written in the rpython dialect.

There are a bunch of transpilers which might provide a path from statically typed python3 to native binaries. py2many is one of them.

The downside is that all of the C extensions that python3 uses become unusable and need to be rewritten in the subset of python3 that the transpiler accepts.


There have been many attempts to do this, but the problem seems to be that a good target for one scripting language isn't necessarily a good target for another one with a different model, and the interest in a common backend isn't strong enough to overcome that.

(Though JS had some potential to do that by the power of its browser role, before compiling interpreters to WASM became a better way of getting browser support than compiling another scripting language's source to JS.)


Arguably the JVM has been the most successful variant of this type of thing. It's not a bad generic higher level language target overall.

It's just it also has sooo much historical baggage and normative lifestyle assumptions along with it, and now it's passed out of favour.

And historically support in the browser was crap and done at the wrong level ("applets") and then that became a dead end and was dropped.


JVM and GraalVM?

https://www.graalvm.org/


https://github.com/mypyc/mypyc

> Mypyc compiles Python modules to C extensions. It uses standard Python type hints to generate fast code. Mypyc uses mypy to perform type checking and type inference.
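
A minimal sketch of what that looks like in practice (the module name is made up; assumes mypy/mypyc are installed):

  # fib.py -- ordinary Python, just with type hints
  def fib(n: int) -> int:
      if n <= 1:
          return n
      return fib(n - 1) + fib(n - 2)

Running "mypyc fib.py" builds a C extension module in place, and a subsequent "import fib" picks up the compiled version instead of the interpreted one.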


You can't compile to an executable with mypyc, just native C extensions.


My bad, thanks for clarifying.


This is called the JVM and everyone has spent the last twenty years getting really upset about it.


> everyone has spent the last twenty years getting really upset about it

Could you expand on this?


Is Perl's Parrot not meant to be something like that?


Parrot [0] was meant to be that and the main VM for Perl6 (now Raku) and failed at both.

[0] http://www.parrot.org/ states:

The Parrot VM is no longer being actively developed.

Last commit: 2017-10-02

The role of Parrot as VM for Perl 6 (now "Raku") has been filled by MoarVM, supporting the Rakudo compiler.

[...]

Parrot, as potential VM for other dynamic languages, never supplanted the existing VMs of those languages.


Raku uses MoarVM


Yes, the quote from the Parrot website I provided says exactly that: the role Parrot intended to fill for Perl6 has been filled by MoarVM supporting the Rakudo compiler.


Is that not where WebAssembly is headed? Taken from their website "Wasm is designed as a portable compilation target for programming languages [...]"


Not really. Scripting languages implemented on WASM are doing it by compiling their main interpreter to WASM, not the source that the interpreter would execute.


The issue is that Python is sort of necessarily slow. Most code translates 1:1 to straightforward C, but of course it is possible to create some monstrosity where, after the 3rd time an exception is thrown, all integer arithmetic changes and string addition turns into 'eval'.

All approaches to "make Python fast again" tend to lock down some of that flexibility, so they aren't really general purpose.


In a sense the XLA compiler for TensorFlow and other Python-based machine learning systems is exactly this, and MLIR is based on LLVM. I predict there will be a second generation of such systems that are more general than MLIR and compile large-scale computations over out-of-core data into efficient dataflow applications that run at native speed.


This is already a thing: LLVM has a JIT interface, which is what Julia uses.



That's why they are called scripting languages.


The Java VM has been that in the past with projects like Jython being the bridge. But it's never worked out very well in practice.


Webassembly can fulfill that requirement for certain use cases


I don't think that was the original intent of WASM nor what its VM is designed for, even if people are now attempting to make it work in this fashion.

It was designed for "native-like" execution in the browser (or now other runtimes.) It's more like a VM for something like Forth than something like Python.

It's a VM that runs at a much lower level than the ones you'd find inside the interpreter in e.g. Python:

There is no runtime data model beyond simple scalar types: no strings, no collections, no maps. There is no garbage collection either; in fact, it's just a simple "machine-like" linear memory model.

And in fact this is the point of WASM. You can run compiled C programs in the browser or, increasingly, in other places.

Now, are people building stuff like this on top of WASM? Yes. Are there moves afoot to add GC, etc.? Yes. Personally I'm still getting my feet wet with WebAssembly, so I'm not clear on where these things are at, but I worry that trying to make something like that generic enough to be useful for more than just one or two target languages could get tricky.

Anyways it feels like we've been here before. It's called the JVM or the .NET CLR.

I like WASM because it's fundamentally more flexible than the above, and there's good momentum behind it. But I'm wary about positioning it as a "generic high level language VM." You can build a high-level language VM on top of it but right now it's more of a target for portable C/C++/Rust programs.


Is there a reason python hasn't already done this?


There have been a number of release blockers recently, and the list keeps growing even as others get fixed:

https://mail.python.org/archives/list/python-dev@python.org/...

Folks not desperate for the improvements might want to wait before jumping in.


This looks like incremental performance work rather than a ground-up new approach, like an optimising compiler or JIT...


Python JITs were tried before, more than once. The usual problem there is that the gains are very modest unless you can also break the ABI for native modules. But if you do the latter, most users won't even bother with your version.


I believe a JIT is planned for 3.12


Did anyone try running this benchmark with other python implementations? (Pypy etc)


Too many things with Pypy (and other supposedly faster implementations) don't work, so it's a no-go for a lot of projects to begin with.

In the last 10 years I looked into PyPy, I think, three or four times to speed things up; it didn't work even once. (I don't remember exactly what it was that didn't play nice... but I'm thinking it must have been pillow|numpy|scipy|opencv|plotting libraries).


This use of "faster" in the text; I think they mean "as fast".

"1x faster" to me says twice the speed of the original, but I believe they use it to mean "the same speed".


That's not actually a bad observation, even though you were downvoted. I think in the end it's technically ambiguous, but most people can see what is meant, especially since the previous performance is shown alongside (i.e. for the "deltablue" benchmark we have the columns "12.4 ms" and "6.35 ms (1.96x faster)").

But you're right, I think "1.96x as fast" sounds more correct.

edit: but now that I think about it "X as fast" or "X faster" in terms of a duration will always sound a bit weird


It's an OCD pet peeve of mine. Worse yet is "2x slower!" To me, 1x slower is "stopped". I know they MEAN "1/2 the speed", but it still bugs me.


Oh that's like the US thing of saying something as "10 times less than ..." rather than "one tenth of ..."


Exactly that. I'm in the US and see it, but didn't know it was a US "thing". But yes, that.


It could also just be a new thing for English overall that I've associated more with the USA because I listen to or watch a lot of American podcasts, videos and streamers.

I'm not the type who gets worked up at American vs British English :-)


What happened to multithreaded Facebookthon?


Is this project sponsored by Microsoft? It seems that the devs in the GitHub org all work for Microsoft.


Yes


That’s wild, do we typically see such gains in dot releases? I don’t remember the last time this happened.

Great news.


Python dot releases are significant releases. Perhaps you recall the last python integer version number change?

As a result, it will be quite some time before we see a version 4.


Programming language dot releases can tend to be significant.

Semver often means that major is the language version, and minor is the runtime version


To nitpick a little bit, Python does not follow semver and does not claim to follow semver. Minor versions are mostly backwards compatible, but do include breaking changes.


Python's versioning has been misleading ever since 3.0 shipped.

There was a heap of features added between 3.5 and 3.11, for example. Enough to make it a completely different language.


Guido’s group at Microsoft have been busy.


It's good but not good enough.


Next we need some standardization around package management. I recently tried installing the Metal optimized pytorch and descended into package and Python path hell.


Just use conda or docker


Cool improvement, but it changes very little when Python is 100x slower than other GC languages.


This is probably the wrong comparison to make: Python is two orders of magnitude slower than compiled GC languages, but it's in the same order of magnitude as most other interpreted GC languages.

(It actually changes a whole lot, because there's a whole lot of code already out there written in Python. 25% faster is still 25% faster, even if the code would have been 100x faster to begin with in another language.)


> 25% faster is still 25% faster, even if the code would have been 100x faster to begin with in another language.)

And that's even assuming that the code would have existed at all in another language. The thing about interpreted GC languages is that the iterative loop of creation is much more agile, easier to start with and easier to prototype in than a compiled, strictly typed language.


I think this is the best explanation for its popularity. Sure, it runs slow, but it's very fast to write. The latter is usually more important in the world of business, for better or worse.


For the majority of code I write, efficiency is mostly a secondary concern; it will be run only a few times at most, and the fact that it's a few thousand times slower than something written in C or C++ means waiting a few more seconds (at most) for my results. Most of my time is spent writing it, not running it.


Not just in the world of business - a lot of developers play with these things in their spare time before promoting them to production tech; if it takes ages to write a silly mp3 player or some other hobby project, they'll use something else instead.


Python is compiled to bytecode for a VM (that VM is CPython). The semantics and implementation of the VM haven't prioritized performance as much as some other systems. Here we're seeing improvements in the implementation, but the semantics will be the hard part, as those semantics limit the performance. For instance, "a + b" is (I believe) compiled into bytecodes that pretty much follow the expression, but the implementation of "add" has to take into account __add__() and the possible function calls, exceptions and return types that implies. If you compiled that all down to machine code (which people have done) you'd still have to run all that code to check types and the existence of methods and so on.
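
You can see this with the dis module from the standard library; a minimal sketch (the exact opcodes vary by version -- e.g. 3.11 folds the arithmetic opcodes into a single BINARY_OP):

  import dis

  def add(a, b):
      return a + b

  # On CPython 3.10 this prints roughly LOAD_FAST a, LOAD_FAST b,
  # BINARY_ADD, RETURN_VALUE; all the __add__ dispatch, type checks
  # and possible exceptions hide inside that one BINARY_ADD.
  dis.dis(add)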


Dynamic languages like Javascript have solved this problem by essentially caching the resolution of a particular expression that is executed very often. I don't see why this cannot be done in Python.


The expressions are dynamic, so they have to be evaluated every time.

Python is excessively dynamic, so it can't (conventionally) be sped up as easily as many other languages, unfortunately. Finally some folks are being paid and allowed to work on it.


I don't buy this. There are many contexts in which a smart JIT compiler can detect that an expression cannot be modified. Especially since, due to the GIL, python code is mostly non-threaded. They just didn't spend enough time to do the hard work that people spent on Javascript.


> can detect that an expression cannot be modified

Can you give some examples? I use Python a lot and it's absurdly dynamic. I've sometimes wondered if there'd be some way to let me as the programmer say "dynamicness of this module is only 9 instead of the default 11" but I'm skeptical of a tool's ability to reliably deduce things. Here's just one example:

  a = 0
  for i in range(10):
      a += 1
      log('A is', a)
In most languages you could do static analysis to determine that 'a' is an int, and do all sorts of optimizations. In Python, you could safely deduce that 99.9999% of the time it's an int, but you couldn't guarantee it because technically the 'log' function could randomly be really subversive and reach into the calling scope and remap 'a' to point to some other int-like object that behaves differently.

Would "legitimate" code do that? Of course not, but it's technically legal Python and just one example of the crazy level of dynamic-ness, and a tool that automatically optimizes things would need to preserve the correctness of the program.

EDIT: tried to fix code formatting
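
For the morbidly curious, here is a concrete (CPython-specific, deliberately pathological) version of such a subversive log. This particular stunt only works because 'a' is a module-level global here, but it illustrates the kind of thing an optimizer has to assume can happen:

  import sys

  class SneakyInt(int):
      def __add__(self, other):
          # "int-like", but with broken arithmetic
          return SneakyInt(int(self) + other * 2)

  def log(*args):
      print(*args)
      # reach back into the caller's frame and swap out its global 'a'
      caller = sys._getframe(1)
      caller.f_globals['a'] = SneakyInt(caller.f_globals['a'])

  a = 0
  for i in range(3):
      a += 1
      log('A is', a)
  print(type(a), a)   # a SneakyInt with value 5, not the plain int 3 you'd expect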


I think the way it is usually done is that the JIT assumes, for example after profiling, that an expression has a specific type, generates code according to that assumption, and adds strategic guards to check that the assumption is still valid, with a fallback path to the original unoptimized code. Thanks to the magic of branch prediction the guards have little runtime cost.
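
A toy illustration of that guard-plus-fallback idea, sketched in plain Python (a real JIT of course emits machine code instead of calling helper functions; all names here are made up):

  def generic_add(a, b):
      # fully dynamic path: honours __add__/__radd__, exceptions, subclasses...
      return a + b

  def specialized_add(a, b):
      # guard: are both operands the exact type we profiled?
      if type(a) is int and type(b) is int:
          return a + b              # fast path the JIT would compile to a machine add
      return generic_add(a, b)      # guard failed: deoptimize to the generic path

  print(specialized_add(2, 3))      # takes the fast path
  print(specialized_add("a", "b"))  # guard fails, falls back, still correct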


Python is harder to optimize than Javascript, as I mentioned. The subject is well explored.

It is true that they haven't tried very hard, or had the resources to until now. But Python will never be as fast as .js due to the reason above.


This is exactly what a lot of the current round of optimization work focuses on. Things like optimistic dispatch on hot loops that fall back to fully dynamic on a miss.


I thought Python was slower than many other interpreted languages but looking at some benchmark it turns out I was wrong.

It's about on par with Ruby, only Lua and JIT compiled runtimes beat it (for very understandable reasons).


Javascript spanks other scripting languages.


JS is interpreted, and it's much much faster than Python.


Python being interpreted is the main reason why it's slow, but that's no excuse not to compare it to similar, faster programming languages.


One of the biggest contributors to Python's (lack of) speed is dynamic typing. While a jump is a jump and an assignment is an assignment, an addition is more like "hmm... what is the left operand... is it a double? what is the right operand? wow, it's also a double! okay, cpu, add eax, ebx".


There are a lot of dynamically typed languages that are significantly faster than python. Late binding issues can be effectively worked around.


Do you have an example of a dynamically typed language where, say, addition of two lists of doubles would be significantly faster than in Python?


Assuming you mean pairwise addition, Pharo achieves over twice Python's speed on my laptop.

Python version:

  from random import randrange
  from time import time
  
  def main():
      L = [float(randrange(2**52, 2**53)) for _ in range(20000000)]
      M = [float(randrange(2**52, 2**53)) for _ in range(20000000)]
  
      t0 = time()
      N = [x + y for x, y in zip(L, M)]
      print('Concluded in', round(1000*(time() - t0)), 'millisec.')
  
  main()
Results:

  Python 3.10.5 (main, Jun  9 2022, 00:00:00) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)] on linux
  Type "help", "copyright", "credits" or "license()" for more information.
  
  ============ RESTART: /run/media/user/KINGSTON/benchmark_doubles.py ============
  Concluded in 1904 millisec.
Pharo 10 version:

  | L M N t0 |
  
  Transcript clear.
  L := (1 to: 2e7) collect: 
   [ :each | (( 2 raisedTo: 52 ) to: ( 2 raisedTo: 53 )) atRandom asFloat ].
  M := (1 to: 2e7) collect: 
   [ :each | (( 2 raisedTo: 52 ) to: ( 2 raisedTo: 53 )) atRandom asFloat ].
  t0 := DateAndTime now.
  M := L with: M collect: [ :x :y | x + y ].
  Transcript
    show: 'Concluded in '
        ,
            ((DateAndTime now - t0) asMilliSeconds asInteger ) asFloat asString
        , ' millisec.';
    cr.
Results:

  Concluded in 914.0 millisec.


You mean concatenating the lists or pairwise addition?

I don't expect the former to be slow in python as it would be a primitive implemented in C (although even a simple lisp interpreter can have an advantage here by just concatenating the conses).

For the latter, any language runtime capable of inference should be able to optimize it.


I would expect this microbenchmark in particular to be as fast in LuaJIT as it would be in C, without the risk of undefined behavior if the array boundaries are improperly calculated.


Doesn't that sort of operation become very fast in JS after a few runs?


Probably most Common Lisp implementations.


LuaJIT.


Racket


Also "what is equals?". That's why people will do:

    def fooer(out: list[str], strings: list[str]):
        push = list.append          # look the method up once, outside the loop
        for s in strings:
            push(out, s)
Apparently this helps because the interpreter can't assume `append` isn't getting overwritten somewhere else, so hoisting the lookup out of the loop avoids paying for it on every iteration.


This is one of the most common optimizations in Python, simply because the lookup code for modules/class instances is horrible and slow.
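
If anyone wants to see the effect, a quick timeit sketch (exact numbers depend on the version and machine, but the hoisted variant is consistently faster on CPython):

  from timeit import timeit

  def plain(n=100_000):
      out = []
      for i in range(n):
          out.append(i)        # attribute lookup on every iteration
      return out

  def hoisted(n=100_000):
      out = []
      push = out.append        # look the bound method up once
      for i in range(n):
          push(i)
      return out

  print(timeit(plain, number=100))
  print(timeit(hoisted, number=100))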


It'd be nice if there were a tool you could run over your code that did this sort of thing. Even if it can technically break shit in absurd cases, I'd totally run it on my code, which isn't absurd.


If it's double, it probably would be addss xmm0, xmm1 :-) (add eax, ebx would be for 32-bit integers.)


You're welcome to make the comparison. I'm just pointing out that it's a sort of category error, beyond the relatively bland observation of "interpretation is slow."


You can completely bypass the interpreter by running plain, unannotated Python code through Cython. You get a speedup but it is still slow from manipulating PyObjects and attribute lookups.
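
For reference, a minimal sketch of that workflow, assuming Cython is installed and fib.py is an ordinary, unannotated Python module (the module name is made up):

  # setup.py
  from setuptools import setup
  from Cython.Build import cythonize

  setup(ext_modules=cythonize("fib.py", language_level=3))

Running "python setup.py build_ext --inplace" then produces a native extension that you import exactly like the original module.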


>Python being interpreted is the main reason why it’s slow

Common Lisp and Java beg to disagree.


The JVM at least is JITted, not interpreted. Dunno anything about Common Lisp.


Common Lisp has several compilers generating machine code AOT. See for example http://www.sbcl.org


I feel this is a bit nihilistic.

Python is used in so many different areas, in some very critical roles.

Even a small speedup is going to be a significant improvement in metrics like energy use.

The pragmatic reality is that very few people are going to rewrite their Python code that works but is slow into a compiled language when speed of execution is trumped by ease of development and ecosystem.


This was my immediate thought when I looked at the numbers, but I didn't want to be "that guy". I think if all you ever work in is Python, it's still nice even if it is a tiny improvement.


The goal with faster cpython is for small compounding improvements with each point release[0]. So in the end it should be much more than a tiny improvement.

[0] https://github.com/markshannon/faster-cpython/blob/master/pl...


It will still improve the data center power bill of thousands of companies, while saying to them "just Rewrite it in Rust" will not.


> other GC languages.

Does not sounds like Rust.

(But other than that I agree with your point.)


It changes a lot when that doesn't matter because an org only uses python for their ops automation.

Doing side-by-side comparisons between golang and python on Lambda last year, we halved the total execution time on a relatively simple script. A factor of 100, I assume, is an absolute best case.


12 ms to 6 ms is not 2x faster. It's 1x faster, or 2 times as fast.


I just learned today that python is 100x slower than C. I have severely lost respect for python.


C is faster than Python, but it's much more nuanced than saying Python is "100x slower". Try writing a program to open and parse a CSV file line by line in both languages. You'll see why people like Python.

Also, because of FFI, it's not like the two languages are opposed to each other and you have to choose. They can happily work together.
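
To make the CSV point concrete, the entire Python side of that exercise is roughly this (a sketch assuming a file named data.csv; the C equivalent is a lot of code before you even get quoting and escaping right):

  import csv

  with open("data.csv", newline="") as f:
      for row in csv.reader(f):
          print(row)           # each row arrives as a list of strings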


If you just learned this today, then HN has severely lost respect for you! lol


Yes it’s deserved but I’ve been in management for a while.


From your perspective then, python is 100x faster than c


We don’t use python but you’re very articulate, Helen.


And oranges contain 10x the vitamin C of apples :)


If it's 25% faster then the point versions should be (rounded) 3.88.


If performance is relevant to you, Python is the wrong language for you.


If you think this is the kind of statement that it's possible to make, software might be the wrong field for you.



