Applying Mypy to real-world projects (calpaterson.com)
127 points by rbanffy on Feb 11, 2020 | 23 comments



> Two very common initial problems I've seen are ... mypy is not running as part of the build ...

If MyPy isn't running in your build/CI, it's possibly worse than useless.

Until recently, our type-annotated Python code wasn't checked in the build. The only time you'd notice a problem was when the IDE's MyPy plugin showed a red squiggly.

Once I got MyPy integrated with Bazel we could run it over our codebase, and lo and behold there were at least a dozen errors which were actively misleading developers about the code they were reading.

If something (MyPy, Pyre) isn't checking the type-annotations all the time, they're going to decay into something worse than untyped Python.


We've been running mypy on our project for about a year now and it's one of the best decisions we've made. Code is easier to read and write.

We implemented it progressively. At first I added it as a make target but didn't make it mandatory in CI so I could learn how to use it. Then I made it mandatory for a few files that I was the only active contributor to. Then I slowly added more and more files across the project, sometimes as I touched them for other reasons and other times as independent changes. Eventually, as mypy caught more and more bugs in other contributors' changes, they started getting on board and adding type hints as well, until the vast majority of the project was hinted (we'll be getting to 100% within a few weeks).


Thanks for sharing your experience. In the project you're talking about, what role do automated (unit or integration) tests play? Was it part of your project policy to include tests for fixed bugs/new features? How did that change when you introduced mypy? What role do tests play today where you're using mypy widely now?


We also use unit and integration tests heavily, although neither test suite has enforced type checks. Some parts of the code have enforced unit and integration tests for changes, but others are things that are very difficult to unit test because they interact with systems outside of our control that don't document their low level API responses or have bugs/behaviors that we only observe when running integration tests against the real thing (and subsequently have to report issues or contribute fixes for).

We also use flake8, shellcheck, yamllint and black in our automated tests, as well as a couple of custom scripts (e.g. one that makes sure if you change the documentation you also remembered to regenerate and commit an updated index)

We also still write our tests to assume no type checking in parts dealing with external data, because we can still get badly typed data at runtime. But we also use validation code to assert those types at the edges of the system so internal modules can assume types are correct.
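To make the "validate at the edges" idea concrete, here is a minimal sketch. The `User` type and the `parse_user` and `greeting` functions are made up for illustration and are not from our project; the point is only that untrusted data is checked once at the boundary, so internal functions can rely on their annotations.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class User:
    # Hypothetical internal type, used only for this illustration.
    name: str
    age: int


def parse_user(raw: Mapping[str, Any]) -> User:
    """Validate untrusted data once, at the boundary of the system."""
    name = raw.get("name")
    age = raw.get("age")
    if not isinstance(name, str):
        raise TypeError(f"name must be str, got {type(name).__name__}")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise TypeError(f"age must be int, got {type(age).__name__}")
    return User(name=name, age=age)


def greeting(user: User) -> str:
    # Internal code can rely on the annotations because parse_user checked them.
    return f"{user.name} is {user.age}"
```

Tests for `parse_user` would still feed it badly typed input, while tests for `greeting` can assume a well-formed `User`.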


Thanks


This is a good read. I did not know about typing.cast and was just putting # type: ignore on things.

So far, we've had a pretty positive experience with mypy and it's helped prevent a few bugs. Do be prepared to find weird edges you can cut yourself on along the way, but all-in-all I think it's a positive value add.


I would advise being cautious with cast. It is powerful, but it can mask bugs if you aren't careful (i.e., "the type checker said it's OK, so it must be OK!"). You may also want to drop a comment wherever you use it.
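A small sketch of why: `typing.cast` returns its argument unchanged at runtime and only changes what the type checker believes. The function names and the config dict below are hypothetical, chosen just to show the difference from an `isinstance` check.

```python
from typing import Mapping, cast


def port_with_cast(config: Mapping[str, object]) -> int:
    # cast() does nothing at runtime: it returns its argument unchanged and
    # only tells the type checker to treat the value as an int.
    return cast(int, config["port"])


def port_with_check(config: Mapping[str, object]) -> int:
    # An isinstance check narrows the type for the checker and also
    # verifies the value at runtime.
    value = config["port"]
    if not isinstance(value, int):
        raise TypeError(f"port must be int, got {type(value).__name__}")
    return value


bad_config = {"port": "8080"}  # a str, e.g. read from an env var or a file
leaked = port_with_cast(bad_config)  # passes type checking, yet is the str "8080"
```

Here `leaked` is annotated as `int` as far as the checker is concerned, but it is really a `str`, and the error only shows up later, somewhere far from the cast.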


Same here, this article is worth it for cast alone. I also have the same experience as you, it's sometimes wrong but nothing an ignore won't fix, and the bugs it's found have saved me lots of time.


Good call on the ignore_missing_imports. I just moved that from the [mypy] section into a [mypy-pyspark.*] section and found 4 errors that I hadn't noticed before.

The first result in Google for this error points here https://github.com/python/mypy/issues/3905 which explains the issue. I'm ashamed to say I probably just read the first reply, added it to my config and carried on with my day without actually reading what it did.
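For reference, the per-package setup described above looks roughly like this. It is a sketch that assumes the config lives in a `mypy.ini` file; the same sections can also go in `setup.cfg`.

```ini
[mypy]
# No global ignore_missing_imports here, so an unresolved import
# anywhere else is reported instead of silently becoming Any.

[mypy-pyspark.*]
# Silence missing-stub errors only for this one third-party package.
ignore_missing_imports = True
```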


Good writeup. A couple points.

`zope.interface` is more explicit and scalable than `typing.Protocol`s, and more flexible than `abc.ABC`. There's a mypy plugin for it: https://github.com/Shoobx/mypy-zope

> The drawback is that code that changes the representation of its data a lot tends not to be fast code.

That's not a very convincing reason to avoid dataclasses except in the most performance-constrained environments -- and even then I'm doubtful it'd help. Especially with `slots=True`, dataclasses can take less resources.


(Hi, author of article here)

I'm glad you liked the article :) Interesting tip on zope.interface - hadn't heard of that and will be looking into it

> > The drawback is that code that changes the representation of its data a lot tends not to be fast code.

> That's not a very convincing reason to avoid dataclasses except in the most performance-constrained environments -- and even then I'm doubtful it'd help. Especially with `slots=True`, dataclasses can take less resources.

re this - slotted (data)classes are useful when you want to reduce peak memory usage but don't help with the problem of programs that instantiate (and teardown) objects excessively.

IME this is a common pitfall that slows down Python programs (often by a couple of orders of magnitude) and contributes to an unjustified perception that Python is "too slow". The "excessive changes in data representation" anti-pattern is slow in every language, but because the syntactic/correctness overhead of creating a mass of objects is higher in (e.g.) C than in Python, the problem arises much less often there. My worry is that as the syntactic burden of this is lowered even further it becomes even more common.
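A toy illustration of what I mean by changing representation, with made-up `Row` and `LineItem` classes. Both functions compute the same total, but the first allocates two extra objects per record along the way. I'm not quoting timings here; measuring with `timeit` on realistic data is the only way to know how much it matters in a given program.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Row:
    # Hypothetical intermediate representation.
    price: float
    qty: int


@dataclass
class LineItem:
    # A second hypothetical representation of the same record.
    total: float


def total_via_conversions(raw: List[Dict[str, float]]) -> float:
    # Three representations per record: dict -> Row -> LineItem.
    rows = [Row(price=r["price"], qty=int(r["qty"])) for r in raw]
    items = [LineItem(total=row.price * row.qty) for row in rows]
    return sum(item.total for item in items)


def total_direct(raw: List[Dict[str, float]]) -> float:
    # Same arithmetic on the original representation, no intermediate objects.
    return sum(r["price"] * int(r["qty"]) for r in raw)


raw_records = [{"price": 2.5, "qty": 4}, {"price": 1.0, "qty": 3}]
```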

The problem is so severe in the Java community that it has had a huge influence on JVM design. I haven't kept up with every development in the JVM but last I was using it (maybe 6 years ago) it already had a huge number of techniques to reduce the burden of frequent object creation. CPython of course doesn't have any of these tricks.


> Especially with `slots=True`, dataclasses can take less resources.

`slots=True` doesn't exist, though there's an experimental implementation at https://github.com/ericvsmith/dataclasses/blob/master/datacl... .

The main issue is that slots have to exist at class definition, and @dataclass takes an already defined class, so you'd have to create a clone of the class, which is nasty.

On top of that you can't always add __slots__ manually, because it conflicts with default field values (and field()s). __slots__ wants to add a descriptor as a class attribute but it can't do that if the default value is already an attribute.
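The conflict is easy to reproduce without dataclasses at all. In this sketch the class is defined inside a helper function (a name I made up) only so that the error CPython raises at class-creation time can be caught and shown.

```python
def define_slotted_class_with_default():
    # Defining the class inside a function lets us catch the error that
    # CPython raises while the class body is being turned into a class.
    class Point:
        __slots__ = ("x",)
        x = 0  # a default stored as a class attribute clashes with the slot

    return Point


try:
    define_slotted_class_with_default()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # 'x' in __slots__ conflicts with class variable
```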


The GP is probably referring to the attrs library, which is another (older) way of creating dataclasses than the stdlib way. It has an easy way to create slotted dataclasses.

http://www.attrs.org/en/stable/examples.html#slots


I know performance is not an explicit goal of Mypy, but I’ve been curious whether or not this project could lead to something much faster in the future (compiled Mypy with Nim-like speed, something else?). Any thoughts?

Also, does anybody know of any benchmarks of typical programs in Mypy vs CPython?


The types are ignored at runtime; they’re annotations. I guess they could still theoretically have some impact, but it should be negligible.
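A quick way to see this for yourself (the `double` function is just an illustration): the hints are stored on the function object, but the interpreter never checks them when the function is called.

```python
def double(x: int) -> int:
    return x * 2


# CPython stores the hints on the function object but never enforces them,
# so passing a str is accepted at runtime and repeats the string.
result = double("ab")
hints = double.__annotations__

print(result)  # abab
```

mypy would flag the `double("ab")` call, but that happens entirely before the program runs.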

Compiled mypy is a stated potential goal - but outside the scope of this project itself. Read more here: https://mypy.readthedocs.io/en/stable/faq.html#will-static-t...


There’s an experimental mypyc that can compile typed code to extensions. It’s not complete, but it is used in mypy itself if I understand correctly.

More practically, I’ve found that it’s much easier to translate well-typed code to Cython. It still requires work, and YMMV.



"compiled Mypy with Nim-like speed" is Nim, essentially.


I think Cython is already using type annotations to compile Python code.


I just wish they'd get recursive types working

(I know they're working on it, but I seem to hit this on nearly every project, would be the number one missing feature ATM IMHO)


Zulip (https://zulip.com/) has been doing this for a long time.

See this interesting blog post on how it was done: https://blog.zulip.org/2016/10/13/static-types-in-python-oh-...


> Try not to lose sight of the fact that type checking is supposed to be an aid to correctness and not an intellectually satisfying end in itself.

Indeed, loved that closing sentence.


I recently discovered the Dropbox PyCharm plugin doesn't follow imports by default (dmypy doesn't support it apparently; there are GitHub issues raised). Spent an unnecessary amount of time figuring out why TypedDict was raising mypy errors.



