Applying Mypy to real-world projects

thundergolfer · on Feb 11, 2020

> Two very common initial problems I've seen are ... mypy is not running as part of the build ...

If MyPy isn't running in your build/CI, it's possibly worse than useless.

Until recently, type-annotated Python code wasn't checked in the build. The only time you'd be able to notice a problem was when the IDE's MyPy plugin showed a red squiggly.

Once I got MyPy integrated with Bazel we could run it over our codebase, and lo and behold there were at least a dozen errors that were actively misleading developers about the code they were reading.

If something (MyPy, Pyre) isn't checking the type-annotations all the time, they're going to decay into something worse than untyped Python.

dharmab · on Feb 12, 2020

We've been running mypy on our project for about a year now and it's one of the best decisions we've made. Code is easier to read and write.

We implemented it progressively. At first I added it as a make target but didn't make it mandatory in CI so I could learn how to use it. Then I made it mandatory for a few files that I was the only active contributor to. Then I slowly added more and more files across the project, sometimes as I touched them for other reasons and other times as independent changes. Eventually, as mypy caught more and more bugs in other contributors' changes, they started getting on board and adding type hints as well, until the vast majority of the project was hinted (we'll be getting to 100% within a few weeks).

blumomo · on Feb 12, 2020

Thanks for sharing your experience. In the project you're talking about, what role do automated (unit or integration) tests play? Was it part of your project policy to include tests for fixed bugs/new features? How did that change when you introduced mypy? What role do tests play today where you're using mypy widely now?

dharmab · on Feb 12, 2020

We also use unit and integration tests heavily, although neither test suite has enforced type checks. Some parts of the code have enforced unit and integration tests for changes, but others are things that are very difficult to unit test because they interact with systems outside of our control that don't document their low level API responses or have bugs/behaviors that we only observe when running integration tests against the real thing (and subsequently have to report issues or contribute fixes for).

We also use flake8, shellcheck, yamllint and black in our automated tests, as well as a couple of custom scripts (e.g. one that makes sure if you change the documentation you also remembered to regenerate and commit an updated index)

We also still write our tests to assume no type checking in parts dealing with external data, because we can still get badly typed data at runtime. But we also use validation code to assert those types at the edges of the system so internal modules can assume types are correct.
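
To make the "validate at the edges" idea concrete, here is a minimal sketch. The `Host` type and the `parse_host` and `describe` helpers are hypothetical names made up for illustration, not code from the project described above. Untrusted data is checked once at the boundary, and everything past that point works with a typed value.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Host:
    # Hypothetical internal type, used only for this illustration.
    name: str
    port: int


def parse_host(raw: Mapping[str, Any]) -> Host:
    """Check untrusted data once, at the boundary, and return a typed value."""
    name = raw.get("name")
    port = raw.get("port")
    if not isinstance(name, str):
        raise ValueError(f"name must be a str, got {type(name).__name__}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise ValueError(f"port must be an int, got {type(port).__name__}")
    return Host(name=name, port=port)


def describe(host: Host) -> str:
    # Internal code relies on the annotations without re-checking them.
    return f"{host.name}:{host.port}"
```

Tests for `parse_host` would still feed it badly typed input, while tests for `describe` can assume a well-formed `Host`.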

blumomo · on Feb 12, 2020

Thanks

mikeholler · on Feb 11, 2020

This is a good read. I did not know about typing.cast and was just putting # type: ignore on things.

So far, we've had a pretty positive experience with mypy and it's helped prevent a few bugs. Do be prepared to find weird edges you can cut yourself on along the way, but all-in-all I think it's a positive value add.

eire1130 · on Feb 11, 2020

I would advise being cautious with cast. It can be powerful and mask bugs if you aren't careful (i.e., type checker said it's ok, so it must be ok!). You may also want to drop a comment if you use it.
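
A small sketch of how that masking can happen. `read_port` and the config dict are made-up names for illustration. `typing.cast` does nothing at runtime; it only changes what the type checker believes.

```python
from typing import Any, Dict, cast


def read_port(config: Dict[str, Any]) -> int:
    # cast() performs no runtime check or conversion; it only tells the
    # type checker to treat the value as an int.
    return cast(int, config["port"])


port = read_port({"port": "8080"})
# The function above type-checks, yet the returned value is still a str.
print(type(port).__name__)
```

The mismatch only shows up later, wherever the value is first used as a number.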

stavros · on Feb 11, 2020

Same here, this article is worth it for cast alone. I also have the same experience as you, it's sometimes wrong but nothing an ignore won't fix, and the bugs it's found have saved me lots of time.

jesseryoung · on Feb 11, 2020

Good call on the ignore_missing_imports. I just moved that from the [mypy] section into a [mypy-pyspark.*] section and found 4 errors that I didn't notice before.

The first result in Google for this error points here https://github.com/python/mypy/issues/3905 which explains the issue. I'm ashamed to say I probably just read the first reply, added it to my config and carried on with my day without actually reading what it did.
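
For reference, the change described above would look roughly like this in a mypy config file. This is a sketch; the file name (`mypy.ini` or `setup.cfg`) and any other settings are assumptions.

```ini
[mypy]
# ignore_missing_imports is no longer set globally here.

[mypy-pyspark.*]
# Only imports from pyspark are silenced when stubs are missing.
ignore_missing_imports = True
```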

j88439h84 · on Feb 12, 2020

Good writeup. A couple points.

`zope.interface` is more explicit and scalable than `typing.Protocol`s, and more flexible than `abc.ABC`. There's a mypy plugin for it: https://github.com/Shoobx/mypy-zope

> The drawback is that code that changes the representation of its data a lot tends not to be fast code.

That's not a very convincing reason to avoid dataclasses except in the most performance-constrained environments -- and even then I'm doubtful it'd help. Especially with `slots=True`, dataclasses can take less resources.

calpaterson · on Feb 12, 2020

(Hi, author of article here)

I'm glad you liked the article :) Interesting tip on zope.interface - hadn't heard of that and will be looking into it

> > The drawback is that code that changes the representation of its data a lot tends not to be fast code.

> That's not a very convincing reason to avoid dataclasses except in the most performance-constrained environments -- and even then I'm doubtful it'd help. Especially with `slots=True`, dataclasses can take less resources.

re this - slotted (data)classes are useful when you want to reduce peak memory usage but don't help with the problem of programs that instantiate (and teardown) objects excessively.

IME this is a common pitfall that slows down Python programs (often by a couple of orders of magnitude) and contributes to an unjustified perception that Python is "too slow". The "excessive changes in data representation" anti-pattern is slow in every language, but because the syntactic/correctness overhead of creating a mass of objects is higher in (e.g.) C than in Python, the problem arises much less often there. My worry is that as the syntactic burden of this is lowered even further, it becomes even more common.

The problem is so severe in the Java community that it has had a huge influence on JVM design. I haven't kept up with every development in the JVM but last I was using it (maybe 6 years ago) it already had a huge number of techniques to reduce the burden of frequent object creation. CPython of course doesn't have any of these tricks.

duckerude · on Feb 12, 2020

> Especially with `slots=True`, dataclasses can take less resources.

`slots=True` doesn't exist, though there's an experimental implementation at https://github.com/ericvsmith/dataclasses/blob/master/datacl... .

The main issue is that slots have to exist at class definition, and @dataclass takes an already defined class, so you'd have to create a clone of the class, which is nasty.

On top of that you can't always add __slots__ manually, because it conflicts with default field values (and field()s). __slots__ wants to add a descriptor as a class attribute but it can't do that if the default value is already an attribute.
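
A short sketch of that conflict, using made-up class names. Adding `__slots__` by hand works when no field has a default, but a default value becomes a class attribute with the same name as the slot, and class creation fails.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # No defaults: __slots__ and the annotations coexist without trouble.
    __slots__ = ("x", "y")
    x: int
    y: int


try:
    @dataclass
    class PointWithDefault:
        __slots__ = ("x",)
        x: int = 0  # the default becomes a class attribute named "x"
except ValueError as exc:
    # Raised while the class itself is being created,
    # before @dataclass ever runs.
    conflict = str(exc)

p = Point(1, 2)
print(hasattr(p, "__dict__"))  # False: instances carry no per-object dict
print(conflict)
```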

calpaterson · on Feb 12, 2020

The GP is probably referring to the attrs library, which is another (older) way of creating dataclasses than the stdlib way. It has an easy way to create slotted dataclasses.

http://www.attrs.org/en/stable/examples.html#slots

gonational · on Feb 12, 2020

I know performance is not an explicit goal of Mypy, but I’ve been curious whether or not this project could lead to something much faster in the future (compiled Mypy with Nim-like speed, something else?). Any thoughts?

Also, does anybody know of any benchmarks of typical programs in Mypy vs CPython?

dfee · on Feb 12, 2020

The types are ignored at runtime; they’re annotations. I guess they could still theoretically have some impact, but it should be negligible.

Compiled mypy is a stated potential goal - but outside the scope of this project itself. Read more here: https://mypy.readthedocs.io/en/stable/faq.html#will-static-t...
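
A quick illustration of the "ignored at runtime" point, with a made-up function. CPython stores the annotations on the function object but never checks them when the function is called.

```python
def double(n: int) -> int:
    return n * 2


# A type checker would reject this call; the interpreter runs it anyway.
result = double("ab")
print(result)  # abab

# The annotations are just metadata attached to the function.
print(sorted(double.__annotations__))  # ['n', 'return']
```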

hsaliak · on Feb 12, 2020

There’s an experimental mypyc that can compile typed code to extensions, it’s not complete but it is used in mypy itself if I understand correctly.

More practically, I’ve found that it’s much easier to translate well typed code to cython. It still requires work, and ymmv.

anentropic · on Feb 12, 2020

eventually there may be... https://github.com/python/mypy/tree/master/mypyc

nimmer · on Feb 19, 2020

"compiled Mypy with Nim-like speed" is Nim, essentially.

hiccuphippo · on Feb 12, 2020

I think cython is already using type annotations to compile python code.

anentropic · on Feb 12, 2020

I just wish they'd get recursive types working

(I know they're working on it, but I seem to hit this on nearly every project, would be the number one missing feature ATM IMHO)

hereisdx · on Feb 12, 2020

Zulip (https://zulip.com/) has been doing this for a long time.

See this interesting blog post on how it was done: https://blog.zulip.org/2016/10/13/static-types-in-python-oh-...

luord · on Feb 12, 2020

> Try not to lose sight of the fact that type checking is supposed to be an aid to correctness and not an intellectually satisfying end in itself.

Indeed, loved that closing sentence.

gnusty_gnurc · on Feb 12, 2020

I recently discovered the Dropbox PyCharm plugin doesn't follow imports by default (dmypy doesn't support it apparently - there are GitHub issues raised). Spent an unnecessary amount of time figuring out why TypedDict was raising mypy errors.