For those looking to validate dictionaries / JSON responses in Python, the voluptuous library works quite well: http://github.com/alecthomas/voluptuous. It also works for lists and other data types.
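To show the flavor of schema-based dict validation, here's a stdlib-only sketch of the core idea (this is NOT voluptuous's actual API, which is much richer — `Schema`, `Required`, `All`, etc. — just an illustration of validating nested dicts against a declarative schema):

```python
# Minimal sketch of schema-style dict/JSON validation, in the spirit of
# voluptuous. Illustrative only, not the library's real API.

def validate(schema, data, path=""):
    """Recursively check `data` against `schema` (types, dicts, or [item] lists)."""
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            raise ValueError(f"{path or 'value'}: expected dict, got {type(data).__name__}")
        for key, sub in schema.items():
            if key not in data:
                raise ValueError(f"{path}{key}: required key is missing")
            validate(sub, data[key], f"{path}{key}.")
        return data
    if isinstance(schema, list):  # homogeneous list: [item_schema]
        if not isinstance(data, list):
            raise ValueError(f"{path or 'value'}: expected list")
        return [validate(schema[0], item, f"{path}{i}.") for i, item in enumerate(data)]
    if isinstance(schema, type):
        if not isinstance(data, schema):
            raise ValueError(f"{path or 'value'}: expected {schema.__name__}")
        return data
    raise TypeError(f"unsupported schema node: {schema!r}")

schema = {"name": str, "port": int, "tags": [str]}
validate(schema, {"name": "web", "port": 8080, "tags": ["prod"]})  # passes
```

The win over ad-hoc `isinstance` checks is that the schema doubles as documentation of the expected shape, and errors come with a path to the offending key.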
I use a fork/successor library, good[0], for configuration validation. I've especially liked the data transformation it can do (I can easily allow a configuration entry to be a single value or a list of values, and transform it to always be a list).
Recently I had a similar need and wrote a library that creates (and dumps) your typed NamedTuples, datetimes and similar objects from plain JSON, using type annotations:
"JSON support for named tuples, datetime and other objects, preventing ambiguity via type annotations"
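The basic mechanism can be sketched with the stdlib alone — read the NamedTuple's annotations and coerce each JSON field accordingly (`Job` and `from_json` here are illustrative names, not the library mentioned above):

```python
# Sketch: build a typed NamedTuple from a plain JSON dict using its
# type annotations. Illustrative names, not a real library's API.
from datetime import datetime
from typing import NamedTuple, get_type_hints

class Job(NamedTuple):
    name: str
    started: datetime

def from_json(cls, data: dict):
    kwargs = {}
    for field, typ in get_type_hints(cls).items():
        value = data[field]
        # datetimes arrive as ISO-8601 strings in JSON, so coerce them back
        if typ is datetime and isinstance(value, str):
            value = datetime.fromisoformat(value)
        elif not isinstance(value, typ):
            raise TypeError(f"{field}: expected {typ.__name__}, got {type(value).__name__}")
        kwargs[field] = value
    return cls(**kwargs)

job = from_json(Job, {"name": "build", "started": "2017-06-01T12:00:00"})
```

The annotation removes the ambiguity: a bare string in the JSON is rejected or coerced depending on the declared type, rather than silently passed through.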
I've recently been using attrs as an easy way to make simple datatypes, but its only gesture towards validation is an arbitrary callback per field. Hooking into Python 3 type annotations is a great idea!
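The "hook into annotations" idea can be sketched in a few lines of plain Python — a class decorator that type-checks attributes on assignment (purely illustrative; this is neither attrs nor pydantic code):

```python
# Sketch of annotation-driven validation: a class decorator that checks
# attribute assignments against the class's type annotations.
def checked(cls):
    hints = dict(getattr(cls, "__annotations__", {}))

    def __setattr__(self, name, value):
        expected = hints.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
        object.__setattr__(self, name, value)

    cls.__setattr__ = __setattr__
    return cls

@checked
class Point:
    x: int
    y: int
    def __init__(self, x, y):
        self.x = x
        self.y = y
```

Compared to a per-field callback, the annotation is already there for readers and type checkers, so the validation comes for free.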
Does/will Pydantic handle all the standard dunder fields like __eq__, __lt__, __hash__, __cmp__ and faux-immutability like namedtuple and attrs do?
"faux-immutability" is a good way to describe it. My problem is the "faux"; proper immutability is virtually impossible in Python, and I'm not convinced about providing partial immutability and thereby giving people a false sense of security. That said, there's an issue about it: https://github.com/samuelcolvin/pydantic/issues/38. I'll consider it if we can find a performant and elegant way of doing it.
__eq__ makes sense, I'll do it when I get round to it
__hash__ would be nice but far from simple to do in a performant way.
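A sketch of why `__hash__` is awkward for mutable models: generating `__eq__` from field values is mechanical, but a hash computed from mutable fields goes stale the moment a field changes, corrupting any dict or set the object already sits in (illustrative code, not pydantic's):

```python
# Sketch: field-based __eq__ is easy; field-based __hash__ is a trap
# for mutable objects. Illustrative only.
class Model:
    __fields__ = ()

    def _values(self):
        return tuple(getattr(self, f) for f in self.__fields__)

    def __eq__(self, other):
        return type(other) is type(self) and other._values() == self._values()

    def __hash__(self):
        # recomputed on each call: mutating a field silently changes the
        # hash, which breaks dicts/sets the object is already stored in
        return hash(self._values())

class User(Model):
    __fields__ = ("id", "name")
    def __init__(self, id, name):
        self.id, self.name = id, name
```

That's why hashability and (faux-)immutability tend to come as a package: only frozen instances can safely cache a hash.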
In the small, it's just using attrs the way it's described in its documentation.
In the large, I've been learning about Rust recently and wrapping my head around the design-patterns of static typing. For internal data-structures the benefit is not as clear, but for serialising and deserialising external data (like from config files or JSON APIs) I really prefer having specific, named types instead of a generic bucket of dicts.
API documentation can be more concise. You can say "this argument must be an instance of BuildArtifact" rather than "this argument must be a dict with an 'href' key whose value is the URL to a build artifact and a 'hash' key whose value is the SHA256 of that artifact" in every relevant API.
Debugging is easier when inspecting a variable starts with "<BuildArtifact ...>" rather than just dumping a dict at you.
If you need to operate on a particular kind of data, a named class gives you an obvious place to hang a method, instead of having a loose function rattling about. For operations between two data-types (like 'merge' or 'intersection'), a loose function might still be the most appropriate, but operations like searching or summarizing are naturally methods.
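All three points above fit in a few lines. Here's a sketch using the `BuildArtifact` example (the `matches` method is a hypothetical operation, just to show where behavior naturally hangs):

```python
# Sketch of a named type instead of a "generic bucket of dicts".
from typing import NamedTuple

class BuildArtifact(NamedTuple):
    href: str   # URL to the build artifact
    hash: str   # SHA256 of the artifact

    def matches(self, digest: str) -> bool:
        """Case-insensitive compare against an expected SHA256 digest."""
        return self.hash.lower() == digest.lower()

art = BuildArtifact(href="https://example.com/build.tar.gz", hash="ABC123")
# The repr leads with the type name -- the debugging win mentioned above:
#   BuildArtifact(href='https://example.com/build.tar.gz', hash='ABC123')
```

Docs can now say "takes a BuildArtifact", debuggers print the type name, and hash-comparison logic has an obvious home.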
Marshmallow is still the best lib in town. Indeed, most of these libs fall short when you start to use them in the real world: where validation is more than a type check, where fields are dependent on each other, where data is generated on the fly post-validation, and where you need all of that to cascade down your nested, sometimes recursive, data-structure validation, which then should produce equally complex error messages.
Pydantic is not just a toy; I built it having used and abused numerous other libraries and found them wanting in one way or another.
Because it reuses Python's typing system, it should have the most pythonic and flexible description of types possible.
I agree about the need for complex validation chains relating to numerous fields, that's already partially possible with pydantic (although not documented). I'll add support for this stuff as well as documentation over the next few weeks.
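The shape of whole-model validation — rules that span several fields — can be sketched in plain Python (this is an illustration of the concept, not pydantic's actual API, and `Booking` is a made-up example):

```python
# Sketch of cross-field ("whole model") validation: rules that need to
# see several fields at once, run after per-field checks.
class Booking:
    def __init__(self, start, end):
        self.start, self.end = start, end
        self.validate()

    def validate(self):
        errors = []
        # per-field checks first
        if not isinstance(self.start, int) or not isinstance(self.end, int):
            errors.append("start and end must be integers")
        # cross-field rule only runs once the fields individually pass
        elif self.end <= self.start:
            errors.append("end must be after start")
        if errors:
            raise ValueError("; ".join(errors))
```

The key design point is ordering: field-level validation has to succeed before rules relating multiple fields can run, which is exactly what makes cascading errors through nested structures hard.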
Having used both colander and marshmallow extensively, I prefer colander, mainly because it has first-class explicit handling of null, missing and required values, and its support for nesting and inheritance is also much nicer than marshmallow's.
I've recently been using good[0], which also allows for minor data transformation. Looking at Marshmallow, it doesn't seem to allow inline declaration of nested schemas: each level of the schema needs its own data type.
I agree. Marshmallow is the best game in town and I love using it. That said, it's not perfect and I'm eagerly awaiting a successor. I use it frequently and have run into a lot of cruft.
Sure, my complaints are mostly around error handling. My comments might not be accurate since they're off the top of my head.
When a field raises an error, I've noticed the value gets replaced with a "None" type (or maybe it's removed from the passed data object) when using @validates_schema. This is annoying because I have to pass the original data and check whether the value is actually null or not. This can suck when using JSONAPI because you have to be super careful with your data extraction (e.g. data.get('data', {}).get('relationships', {})...etc).
I would like more control over how validation is executed: the ordering of validation, and the ability to stop or continue validation at arbitrary points. Maybe in my validator I could do something like "raise ValidationError(msg, stop_validation=True)".
I would like more control over pre- and post-dump/load order, sort of like a z-index in CSS. So when using marshmallow-jsonapi, I could specify pre_loads that access the data before and after the jsonapi pre_load formatting.
I would like a better way of using "class Meta". Right now it's annoying to inherit from a base and then define an additional "class Meta". I think I ended up subclassing SchemaOpts and setting my defaults that way on my base schema.
I would like a way to replace error messages so instead of "Missing data for required field", it would say "Please specify data" (or whatever). I want to define this at the schema level too. I don't want to have to constantly define fields with the same behavior everywhere. It would be cool if Marshmallow had a dictionary of error codes and messages. So it would look like {'1': 'Missing data for required field.'} and I could override that error with "errors['1'] = msg".
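The error-table wish above could look something like this — a hypothetical design sketch, not Marshmallow's actual API, with a schema-level mapping of error codes to messages that a subclass overrides in one place:

```python
# Hypothetical sketch of schema-level, overridable error messages.
DEFAULT_MESSAGES = {
    "required": "Missing data for required field.",
    "type": "Invalid type.",
}

class MiniSchema:
    fields = {}    # field name -> expected type
    messages = {}  # per-schema overrides of DEFAULT_MESSAGES

    def error(self, code):
        return {**DEFAULT_MESSAGES, **self.messages}.get(code, code)

    def load(self, data):
        errors = {}
        for name, typ in self.fields.items():
            if name not in data:
                errors[name] = self.error("required")
            elif not isinstance(data[name], typ):
                errors[name] = self.error("type")
        if errors:
            raise ValueError(errors)
        return data

class UserSchema(MiniSchema):
    fields = {"email": str}
    # one override, applied to every field in this schema
    messages = {"required": "Please specify data."}
```

One table per schema means the behavior doesn't have to be repeated on every field definition.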
If I had to sum it up: I like to write small functions and classes that have limited uses. A lot of the time I feel like Marshmallow pushes me into more monolithic work so I can control the flow.
I wish the people who designed APIs would constrain themselves to data structures that can be validated as types. I rarely see insane data structures in APIs built in static languages, and I say this because my company has some absurd APIs which would not be so bad if the developers were constrained a bit.