Scalene: A high-performance, high-precision CPU, GPU, memory profiler for Python

infocollector · 2024-10-22T02:24:07 1729563847

I've used Scalene at various times in the past few years, and always liked using it when I want to dig deeper compared to cProfile/profile. You might also want to look at:

1] https://github.com/joerick/pyinstrument

2] https://github.com/benfred/py-spy

3] https://github.com/P403n1x87/austin

4] https://github.com/bloomberg/memray

5] https://github.com/pyutils/line_profiler

1st1 · 2024-10-22T02:31:17 1729564277

For profiling memory consider far more advanced memray.

https://github.com/bloomberg/memray

codedokode · 2024-10-22T12:33:26 1729600406

In my experience, memory profilers that show line or function which allocated memory, are not very useful. What I usually want instead is to get a graph of all objects in memory with their size and path from root, to see who is using most memory. To see for example, that there is some cache or log that is never cleared.

For PHP and for browser's developer tools there is such kind of profiler. But judging by screenshots, this profiler cannot produce such a graph. So its usability to solve memory leak/high consumption issues is limited.

To summarize, what I usually need is not a line number, but a path to objects using/holding most memory.

PathOfEclipse · 2024-10-22T16:06:00 1729613160

You are comparing two different types of things:

1. Allocation profiler

2. Heap analyzer

Allocation profilers will capture data about what is allocating memory over time. This can be captured in real time without interrupting the process and is usually relatively low-overhead.

Heap analyzers will generally take a heap dump, construct an object graph, do various analyses, and generate an interactive report. This generally requires that you pause a program long enough to create a heap dump, which is often multiple GB or more in size, write it to disk, then do the subsequent analysis and report generation.

I agree that 2) is generally more useful but I assume both types of profilers have their place and purpose.

Loranubi · 2024-10-22T03:13:11 1729566791

Note: The Windows version currently only supports CPU and GPU profiling, but not memory or copy profiling.

That's a problem with many of the profiling tools around Python. They often support Windows badly or not at all.

solarkraft · 2024-10-22T04:01:29 1729569689

Isn’t the modern Windows runtime just Linux anyway?

emeryberger · 2024-10-22T06:01:57 1729576917

(Scalene author here) Nope, but WSL2 (Windows Subsystem for Linux) is, and Scalene works great with it.

pjc50 · 2024-10-22T14:33:50 1729607630

Er .. no? The modern Windows runtime, for various definitions of those words, is the confusingly named "WinRT" https://learn.microsoft.com/en-us/windows/uwp/cpp-and-winrt-...

Underneath it's still substantially similar to good old Windows NT.

There's a Linux "subsystem". Well, two of them. WSL1 is an API translation layer that ends up being cripplingly slow. Don't use it. WSL2 is more of a VM that just runs a Linux distro. This is before you get into third party compatibility layers like cygwin and mingw.

gloflo · 2024-10-22T10:36:49 1729593409

* with AI-powered optimization proposals

Sigh, why infest everything with "AI".

kristianp · 2024-10-21T23:21:20 1729552880

This profiler was mentioned in the context of rewriting js tools in faster languages here:

https://lobste.rs/s/ytjc8x/why_i_m_skeptical_rewriting_javas...

The rewrite discussion is here: https://news.ycombinator.com/item?id=41898603

tomrod · 2024-10-22T01:26:09 1729560369

Oh! Lobsters is neat!

enygmata · 2024-10-22T23:27:40 1729639660

It's great for my side projects but I have never been able to use it at work. I gave up and used pyinstrument instead.

I wanted to submit a ticket for my use case but I can't find a minimal program & setup to reproduce the issue. I just know it in my gut it has to do with the mix of multiprocessing (fork) + async + threading.

motbus3 · 2024-10-22T10:04:25 1729591465

I used many times and sometimes I used py-spy. That helped me to improve several projects where people told me there were network problems but it was actually not

embeng4096 · 2024-10-22T14:52:24 1729608744

+1 for py-spy. I would love to try out Scalene but the application I have does something funky with pickling for multiprocessing, breaking Scalene in the process (my fault, not Scalene's, I'll be looking into that). Py-spy worked, including catching all the sub-processes. Feeding the py-spy JSON output into https://speedscope.app makes for a very easy way to profile in a time crunch if you don't have time to get familiar with CLI tools or can't install stuff locally but have a browser and internet connection.

motbus3 · 2024-10-25T12:38:35 1729859915

Scalene is great too. I actually use to find some memory allocation problems that were quite hard do pin.

emeryberger · 2024-10-22T15:02:49 1729609369

Scalene author here - please file an issue!

embeng4096 · 2024-10-22T15:18:42 1729610322

I will! I'll try to pare down my customer's code into a minimal example I can post.

carlmr · 2024-10-22T15:06:59 1729609619

>where people told me there were network problems but it was actually not

Always ask if they assume it's network bound or they have measurements. Measurements may sometimes be wrong, but assumptions are more often wrong than right in performance engineering.

alfons_foobar · 2024-10-22T10:46:00 1729593960

kudos for actually looking what the problem is.

As a (former) NetEng, it bothers me to no end that so many people claim "it's the network" when their application is slow / broken, without understanding the actual problem.

bbstats · 2024-10-23T12:58:29 1729688309

Anyone find the optimizer prompt(s)?

ancplrt · 2024-10-22T08:49:16 1729586956

If people wonder why there are so many tools for the slowest language on the planet:

In addition to being slow and unsuited for abstractions, people write horrible code with layers and layers of abstractions in Python. These tools can sometimes help with that.

People who do write streamlined code that necessarily uses C-extensions in Python will probably use cachegrind/helgrind/gprof etc.

Or switch to another language, which avoids many categories of other issues.

codedokode · 2024-10-22T12:35:19 1729600519

Python might be not great on using CPU time, but is saves a lot of human time for writing the code compared to "fast" languages.

blackbear_ · 2024-10-22T10:21:11 1729592471

Opening throwaway accounts just to rage against things you don't understand is really shameful.