I’ve tried optimizing some Python that’s in the hot path of a build system: a few dozen operations, out of over 2K nodes in the Ninja graph, account for 25-30% of the total build time.
I’ve found python optimization to be nearly intractable. I’ve spent a significant amount of time over the past two decades optimizing C, Java, Swift, Ruby, SQL and I’m sure more. The techniques are largely the same. In Python, however, everything seems expensive. Field lookup on an object, dynamic dispatch, string/array concatenation. After optimization, the code is no longer “pythonic” (which has come to mean slow, in my vernacular).
Are there any good resources on optimizing python performance while keeping idiomatic?
"Are there any good resources on optimizing python performance while keeping idiomatic?"
It is essentially impossible, with "essentially" used not in the modern sense of "mostly" but as in "essence": the slowness is baked into the essence of the language. It has been the advice in the Python community pretty much since the beginning that the solution in that case is to move to another language. There are a number of solutions to the problem, ranging from trying PyPy, to implementing an API in another language, to Cython/Pyrex, up to traditionally embedding a C/C++ program into a Python module.
However this is one of those cases where there are a lot of solutions precisely because none of them are quite perfect and all have some sort of serious downside. Depending on what you are doing, you may find one whose downside you don't care about. But there's no simple, bullet-proof cookbook answer for "what do I do when Python is too slow even after basic optimization".
Python is fundamentally slow. Too many people still hear that as an attack on the language rather than as an engineering fact that needs to be kept in mind. Speed isn't everything, and Python is suitable for a wide variety of tasks even so. But it is, still, fundamentally slow, and those who read that as an attack rather than an engineering assessment are more likely to find themselves in quite a pickle (pun somewhat intended) one day, stuck with a mass of Python code that isn't fast enough and no easy solutions, than those who understood the engineering considerations when choosing Python.
I agree with this. People advocate "pick a better algorithm", and sometimes that can help dramatically, but at times the best algorithm implemented in Python is still too slow, yet can be made 500x faster by reimplementing the exact same algorithm in Cython, C, Fortran, and so on.
Python is a great language for rapidly bashing out algorithmic code, glue scripts, etc. Unfortunately, because of how dynamic it is, it fundamentally doesn't translate well to operations CPUs can perform efficiently. Hardly any Python programs ever need to be as dynamic as what the language allows.
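To make that dynamism cost concrete, here's a small sketch (the class and numbers are made up for illustration): hoisting an attribute lookup out of a hot loop is a classic, if un-pythonic, micro-optimization, because every `p.x` is a dictionary lookup the interpreter redoes on each iteration.

```python
import timeit

class Point:
    def __init__(self):
        self.x = 0

p = Point()

def attr_loop():
    for _ in range(100_000):
        p.x          # instance-dict lookup repeated on every iteration

def hoisted_loop():
    x = p.x          # looked up once, then reused as a fast local
    for _ in range(100_000):
        x

print("attribute lookup each time:", timeit.timeit(attr_loop, number=100))
print("hoisted to a local:        ", timeit.timeit(hoisted_loop, number=100))
```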
I've had very good experiences applying Cython to Python programs that need to do some kind of algorithmic number crunching, where NumPy alone doesn't get the job done.
With Cython you start with your Python code and incrementally add static typing, which strips away the layers of interpreter abstraction and wrapping. Cython has a very useful and amusing output mode: it spits out an HTML report of your annotated source, with lines highlighted in yellow in proportion to the amount of Python overhead. You click on any line in that report and it expands to show how many CPython API operations are required to implement it; then you add more static type hints and recompile until the yellow goes away and the compute-heavy kernel of your script is effectively a C program that compiles down to operations real-world CPUs can execute efficiently.
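Here's a minimal sketch of that workflow, written in Cython's "pure Python" mode so the file stays valid Python (function and variable names are illustrative; `cython -a` is the flag that produces the annotated HTML report):

```python
# squared_sum.py -- runs as plain Python, but once the cython annotations are
# in place, `cython -a squared_sum.py` compiles the loop to C and the report
# shows the yellow disappearing line by line
import cython

def squared_sum(xs: cython.double[:]) -> cython.double:
    total: cython.double = 0.0
    i: cython.Py_ssize_t
    for i in range(xs.shape[0]):
        total += xs[i] * xs[i]  # plain C arithmetic once typed, no CPython API
    return total
```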
The downside of Cython is the extra build toolchain and deployment concerns it drags in: if you previously had a pure-Python module, now you've got native modules, so you need to bake platform-specific wheels for each deployment target.
For Python that's being used as a glue scripting language rather than for number crunching, it's worth considering rewriting the script in something like Go. Go has a pretty good standard library that handles many tasks without needing third-party packages, and the build, test, and deploy story is very nice.
Among the solutions, Numba and Nuitka are probably also worth mentioning.
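For completeness, a minimal Numba sketch (assumes `numba` is installed; the function is illustrative): decorate a numeric hot loop and the first call JIT-compiles it to machine code.

```python
import numpy as np
from numba import njit

@njit
def sum_squares(xs):
    total = 0.0
    for x in xs:         # this loop is compiled, not interpreted
        total += x * x
    return total

xs = np.arange(1_000_000, dtype=np.float64)
sum_squares(xs)          # first call triggers compilation for float64 arrays
print(sum_squares(xs))   # subsequent calls run the compiled code directly
```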
And just from looking those up, I now also know of Shedskin and ComPyler.
I do feel like one nice thing about so many people working on solutions to every problem in Python is that when you do encounter a serious downside, you have a lot more flexibility to move forward along a different path.
Depending on the nature of your code, just throwing PyPy at it instead of CPython can be a huge win. It very much depends on what you're doing, though. For example, I have a graphics editor I wrote that is unbearably slow under CPython; with PyPy and no other changes it's usable.
If what you're trying to do involves tasks that can run in parallel, the threading (if I/O-bound) or multiprocessing (if CPU-bound) modules can be very useful.
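A minimal sketch of the CPU-bound case (the task function is a made-up stand-in for real work):

```python
from multiprocessing import Pool

def simulate(seed):  # illustrative CPU-heavy task
    total = 0
    for i in range(1_000_000):
        total += (i * seed) % 7
    return total

if __name__ == "__main__":
    with Pool() as pool:                        # one worker per CPU by default
        results = pool.map(simulate, range(8))  # separate processes, so no GIL contention
    print(sum(results))
```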
If what you're doing isn't conducive to either, you probably need to rewrite at least the critical parts in something else.
PyPy is great for performance. I'm writing my own programming language (it transpiles to C), and for that purpose I converted a few benchmarks to some popular languages (C, Java, Rust, Swift, Python, Go, Nim, Zig, V). Most languages have similar performance, except for Python, which is about 50 times slower [1]. But with PyPy, performance is much better. I don't know the limitations of PyPy, because these algorithms are very simple.
But even though Python is very slow, it is still very popular. So the language itself must be very good, in my view; otherwise fewer people would use it.
>> optimizing python performance while keeping idiomatic?
That's impossible[1].
I think it is impossible because when I identify a slow function using cProfile and then use dis.dis() on it to view the instructions executed, most of the overhead (by that I mean time spent doing something other than the calculation the code describes) is spent determining what each "thing" is. It's all trails of calls trying to determine "this thing can't be __add__()-ed to that thing, but maybe that thing can be __radd__()-ed to this thing instead". Which is a long way of saying that most of the time-wasting instructions I see can be attacked by providing ctypes or some other approach like that (mypyc, maybe Cython, etc.), but at that point you're well beyond "idiomatic".
[1] I'm really curious to know the answer to your question, so I'll post a confident (but hopefully wrong) answer so that someone feels compelled to correct me :-)
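You can watch that dispatch machinery for yourself with a tiny sketch (the exact opcode name varies by CPython version: 3.11+ shows BINARY_OP, older versions BINARY_ADD):

```python
# a + b compiles to a single bytecode instruction, but at runtime CPython
# still probes type(a).__add__ and, if that returns NotImplemented,
# type(b).__radd__ -- the "trails of calls" described above
import dis

def add(a, b):
    return a + b

dis.dis(add)
```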
> I’ve found python optimization to be nearly intractable.
Move your performance-critical kernels outside Python: NumPy, PyTorch, external databases, a Golang microservice, etc.
It's in fact an extremely successful paradigm. Python doesn't need to be fast to be used in every domain (though async/nogil is still worth advancing to avoid idle CPUs).
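A tiny sketch of what that offloading looks like in practice (sizes are illustrative):

```python
import numpy as np

data = list(range(1_000_000))

# pure Python: every +, *, and index goes through the interpreter loop
total = 0
for x in data:
    total += x * x

# same kernel pushed into NumPy's C internals
arr = np.asarray(data, dtype=np.int64)
total = int(arr @ arr)   # one call, executed in optimized native code
```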
However, note that (and this isn't just true for Python) if your "performance critical kernel" is "the whole program", then this does have the implication you think it does: you should not write Python.
If your Python software is spending 99% of its time doing X, you can rewrite X in Rust and now the Python software using that is faster, but if your Python software is spending 7% of its time doing X, and 5% doing Y, and 3% doing Z then even if you rewrite X, and Y and Z you've shaved off no more than 15% and so it's probably time to stop writing Python at all.
This is part of why Google moved to Go as I understand it.
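The arithmetic above is just Amdahl's law; a quick sketch of the bound, using the fractions from the comment above:

```python
# even infinite speedups on the rewritten parts can't save more time
# than the fraction of the runtime those parts account for
fractions = [0.07, 0.05, 0.03]   # time spent in X, Y, Z
max_savings = sum(fractions)     # 0.15 -> at most 15% shaved off
best_speedup = 1 / (1 - max_savings)
print(f"best possible speedup: {best_speedup:.2f}x")  # ~1.18x
```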
Sure, but there's also an argument for wrapping your pure Rust or Go executable in the thinnest layer of Python so that your users can optionally `pip install` it.
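A hypothetical sketch of that "thinnest layer" (the path and tool name are made up; real projects usually reach for maturin or scikit-build rather than hand-rolling this): a pip-installable entry point that just execs a bundled native binary.

```python
import os
import subprocess
import sys

def main() -> None:
    # hypothetical location of a binary shipped inside the wheel
    binary = os.path.join(os.path.dirname(__file__), "bin", "mytool")
    raise SystemExit(subprocess.call([binary, *sys.argv[1:]]))
```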
At that point you can use the OS package manager, which is more robust, installs programs in standard locations, and incorporates things like process isolation.
> they have completely different package managers.
Exactly; that's one case where the OSes completely differ. Just because the OSes share the same kernel, which prioritizes userspace compatibility, doesn't mean they aren't different OSes. A kernel doesn't make an OS.
I think you are needlessly conflating terms. My original comment was correct that you need to explicitly package your builds for many different distro package managers if you want to go that route. That's a lot of work to maintain.
Or you can just put it on PyPI which is everywhere nowadays.
Which I, as a user, won't use. I trust my OS out of necessity, but I don't like to download and run random code from the internet, and I don't have the time to review it. My OS's packages, by contrast, often carry patches that strip out features I don't want and integrate the software with the OS.
> Are there any good resources on optimizing python performance while keeping idiomatic?
At the risk of sounding snarky and/or unhelpful, in my experience, the answer is that you don't try to optimize Python code beyond fixing your algorithm to have better big-O properties, followed by calling out to external code that isn't written in Python (e.g., NumPy, etc).
But, I'm a hater. I spent several years working with Python and hated almost every minute of it for various reasons. Very few languages repulse me the way Python does: I hate the syntax, the semantics, the difficulty of distribution, and the performance (memory and CPU, and is the GIL disabled by default yet?!)...
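To make the big-O advice above concrete, here's a minimal sketch (the data is made up) of the classic fix of replacing list membership inside a loop with a set:

```python
needles = list(range(0, 20_000, 7))
haystack = list(range(10_000))

# O(len(needles) * len(haystack)): `in` scans the whole list each time
slow_hits = [n for n in needles if n in haystack]

# O(len(needles) + len(haystack)): build a set once, then O(1) lookups
haystack_set = set(haystack)
fast_hits = [n for n in needles if n in haystack_set]

assert slow_hits == fast_hits
```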
I spent a decade optimizing Python data science projects for a Fortune 100 company. I agree with the other sentiments that you can't really optimize Python itself, because of these weird behaviors that seem to change the more you look at them. Almost all of my optimization work was generic computer-science stuff across the entire system; in the data science world, that usually meant optimizing the database queries or the I/O throughput, reshaping the kind of work being done so it could be parallelized, or figuring out how to simply do way less.

If someone actually came to me with a problem where it was the Python code itself that wasn't performing, I would pretty immediately say: let's use something else. I think the efforts to make a more efficient Python are really great, but the extreme utility of Python as a light glue pulling together lots of other things that perform well in other languages is a widely recognized strength of the Python ecosystem.
Idiomatic Python code can be faster than naive C/C++ code. The secret is to offload hot paths outside pure Python, and most such things are written that way already: you don't try to optimize the inner loop of a matrix multiplication in pure Python, you call numpy.dot() instead (PyTorch if GPUs can help).
Otherwise, optimizing code in Python is the same as in any other language: profile first, then attack the hot spots.
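For reference, the numpy.dot() route mentioned above looks like this (matrix sizes are illustrative):

```python
import numpy as np

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)

c = np.dot(a, b)   # dispatched to an optimized native BLAS kernel
# equivalently: c = a @ b
```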
I was taught a long time ago that if you want to optimize Python for CPU usage, you write C and call it using an FFI. Or you find libraries like NumPy that already do that for you.
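A minimal sketch of that FFI route with the stdlib's ctypes, calling libm's sqrt directly (library lookup shown for POSIX; on Windows you'd load a different DLL):

```python
import ctypes
import ctypes.util

# locate and load the C math library
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# declare the C signature so ctypes converts arguments and results correctly
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # 1.4142135623730951
```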