I highly recommend this talk by Armin Ronacher as an accompaniment (or prologue, or epilogue) to the article: "How Python was shaped by leaky internals", https://www.youtube.com/watch?v=qCGofLIzX6g
There's also this wonderful book [1] covering similar ground for Ruby. An extract from a review gives a flavour:
"Ruby Under a Microscope" does something fairly ambitious. It attempts to write a system internals book in a language that non-computer scientists can readily understand. While there are numerous code snippets and examples to try and examine, the ability to look at the various Ruby internals and systems and see how they fit together can be accomplished by someone with general skills and basic familiarity with programming at the script level (which for many of us is as far as we typically get). An old saying says you can’t tell where you are going if you don’t know where you’ve been. Similarly, we can’t expect to get the most out of languages like Ruby without having a clearer idea of what’s happening under the hood. It’s entirely possible to work with Ruby and never learn some of this stuff, but having a guide like "Ruby Under a Microscope" opens up a variety of avenues, and does so in a way that will make the journey interesting and, dare I say it, even a little fun.
I highly recommend this series from personal experience, since it was one of my primary references when working on https://docs.microsoft.com/en-us/visualstudio/python/debuggi.... Jumping straight into the CPython source code for the bytecode interpreter loop without a higher-level overview like this can be overwhelming.
I gather that BINARY_SUBTRACT in this context means type-independent subtraction of two operands. Not subtraction of two binary integers. So we aren't yet seeing the dispatching which decides what type of subtraction to do. Integers? Floats? Strings? Some object that defines subtraction?
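A rough way to check this from Python itself: disassemble a subtraction and then call the same function with different operand types. This is only a sketch. `Money` is a hypothetical class made up for illustration, and the opcode name depends on the interpreter version (older CPython releases emit BINARY_SUBTRACT, while 3.11 and later emit a generic BINARY_OP).

```python
import dis


def sub(a, b):
    return a - b


# Opcode names for the function body. The same single opcode handles every
# operand type; its exact spelling depends on the CPython version.
opnames = [ins.opname for ins in dis.get_instructions(sub)]


class Money:
    """Hypothetical type that defines its own subtraction via __sub__."""

    def __init__(self, cents):
        self.cents = cents

    def __sub__(self, other):
        return Money(self.cents - other.cents)


# Identical bytecode, three different subtraction implementations chosen at
# run time from the operand types.
int_result = sub(5, 3)
float_result = sub(5.5, 0.5)
money_result = sub(Money(500), Money(125))

# str defines no subtraction, so the dispatch fails with TypeError.
try:
    sub("abc", "b")
    str_supported = True
except TypeError:
    str_supported = False
```

So the opcode only says "subtract the top two stack entries"; which subtraction actually runs is decided later, by looking at the types of the operands.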
An important thing to realize about CPython's implementation is that almost everything (module globals, class namespaces, instance attributes) is really a key/value map underneath, which is a large part of why number-crunching has to be done in C.
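A small sketch of what "key/value map underneath" looks like in practice, using a hypothetical `Point` class. It assumes an ordinary class without `__slots__`, which is the case where instance attributes live in a plain dict.

```python
class Point:
    """Hypothetical class; ordinary instances keep attributes in a dict."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Instance attributes are entries in the instance's dict.
instance_attrs = dict(vars(p))

# Writing to that dict directly is visible as a normal attribute.
p.__dict__["z"] = 3
z_via_attribute = p.z

# Methods are entries in the class namespace, which is also a mapping.
init_in_class_namespace = "__init__" in vars(Point)
```

Every `p.x` in a hot loop is therefore at least one hash lookup, which is the overhead the comment is pointing at.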
More importantly, even small integers are heap-allocated. Some are pre-allocated and reused (-5 through 256 in CPython), but outside of that range, every numeric result requires an allocation and object initialization.
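The caching can be observed with identity checks. This is a sketch that relies on CPython implementation details, not on anything the language guarantees; the values are built with `int(...)` from strings so the compiler cannot fold them into one shared constant.

```python
import sys

# A value inside the pre-allocated range: both names refer to one object.
a = int("100")
b = int("100")
small_shared = a is b

# A value well outside the cached range: each call builds a new object.
c = int("1000000")
d = int("1000000")
large_shared = c is d

# Even a tiny integer is a full object with a header, not a bare machine word.
int_size = sys.getsizeof(1)
```

On CPython, `small_shared` comes out true and `large_shared` false, and `int_size` is several times the size of a raw 64-bit integer.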