I'm surprised you're hitting that boundary at just 5-10K lines. I work on a team...

bsaul · on July 11, 2013

I guess it really depends on what you're building. Just to give you a taste here's the kind of thing i'm doing:

- comparing tree-structured, SQLAlchemy-managed objects, generating diffs, then applying those diffs to the trees and persisting everything. I'm using SQLAlchemy's declarative approach.
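
As a rough illustration of that diff-then-apply cycle, here is a minimal sketch on plain nested dicts rather than SQLAlchemy objects. The names `diff_tree` and `apply_diff` and the `(op, path, value)` tuple format are made up for the example; persistence and the ORM are left out entirely.

```python
import copy


def diff_tree(old: dict, new: dict, path: tuple = ()) -> list:
    """Return (op, path, value) tuples that turn `old` into `new`.

    Hypothetical format: op is "set" or "remove", path is a tuple of keys.
    Assumes keys are strings so they can be sorted for a stable order.
    """
    ops = []
    for key in sorted(old.keys() | new.keys()):
        here = path + (key,)
        if key not in new:
            ops.append(("remove", here, None))
        elif key not in old:
            ops.append(("set", here, new[key]))
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            # Both sides are subtrees: recurse instead of replacing wholesale.
            ops.extend(diff_tree(old[key], new[key], here))
        elif old[key] != new[key]:
            ops.append(("set", here, new[key]))
    return ops


def apply_diff(tree: dict, ops: list) -> dict:
    """Apply the ops to a deep copy of `tree` and return the copy."""
    result = copy.deepcopy(tree)
    for op, path, value in ops:
        node = result
        for key in path[:-1]:
            node = node[key]
        if op == "remove":
            del node[path[-1]]
        else:
            node[path[-1]] = copy.deepcopy(value)
    return result


old = {"a": 1, "b": {"c": 2, "d": 3}}
new = {"a": 1, "b": {"c": 5}, "e": {"f": 1}}
ops = diff_tree(old, new)
patched = apply_diff(old, ops)
```

Nothing here checks that the `ops` list matches what `apply_diff` expects, which is the kind of mismatch that only shows up at run time.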

That diff applying is performed in a celery background task, reusing my flask configuration.

So, in the worst case, i have to deal at the same time with:

- business logic on a bunch of SQLAlchemy ORM objects (declarative approach)
- a Flask request context
- a celery task context
- and a SQLAlchemy session

At that point, i'm changing the signature of a function that takes pieces of those three parts to perform some business logic. Now i'm telling you that the IDE (PyCharm, the best one) and python's "compile phase" don't give you a CLUE about what you're doing.

You're dealing with so much "magic" that it becomes unmanageable. You don't need that many lines of code to reach that point.

EDIT : you've got 6 people working on 60K LOC spread over 8 discrete apps. That's about the same amount of isolated LOC per person as me (one person having to deal with a group of 5-10K LOC).

aidos · on July 12, 2013

I'm not sure the complexity breaks down that way. More people + more code means things need to be better organised because things might change under you without warning (or you may have to work with an unfamiliar part of the system).

How is your app structured? I'm guessing that Flask is the top level glue and everything else is scattered around the Flask app. That's the general approach (in most modern MVC frameworks) and I think it's also the root cause of complexity.

Celery and Flask (and SQLAlchemy too) should really be asides to the main codebase. The code should be layered, with discrete libraries handling different parts of the system. If you have 6k loc that all cross-reference one another then you have problems in any language. Presumably there are a number of different components in there. Each should stand on its own with as simple an API as possible. As ever, too much coupling is going to make it impossible to reason about your code.

If you're about to change the signature of a function, it should already be fairly obvious where it is called from. If not, you need to ask yourself why. What is this function that's so fundamental to the system that it could be called by any module? Why is it buried in another module and being accessed from elsewhere?

My current app has about 4k loc in python and the same again in js (angular). It's broken into dozens of parts that I only connect where needed through a simple api.

At the core is a sort of image processing library (that itself contains lots of different components). On top of that is a system that works with the image processing. Above that another system that interacts with the data models, uses the system below and farms out processing to picloud (though could use celery). Finally, the Flask layer just provides a web interface to talk to the system that handles that business processing. I can tap into any of those layers to drive them. The point is that I can operate at a high level without needing to consider any details of deeper parts of the system.

These are the layers of abstraction that make a system understandable and stop it from being brittle.
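
A minimal sketch of that kind of layering, with every name hypothetical and the image processing reduced to toy functions on lists of pixel values. It assumes nothing about the real app beyond the four layers described above, and uses no Flask or job runner so it stays self-contained.

```python
# Hypothetical names throughout; a sketch of the layering, not the real app.

# Layer 1: core library. Pure functions, no knowledge of anything above.
def invert(pixels):
    return [255 - p for p in pixels]


def threshold(pixels, cutoff=128):
    return [255 if p >= cutoff else 0 for p in pixels]


# Layer 2: processing system built on the core library.
class Pipeline:
    def __init__(self, steps):
        self.steps = steps

    def run(self, pixels):
        for step in self.steps:
            pixels = step(pixels)
        return pixels


# Layer 3: business layer. Owns the data store and decides how work is run.
class ImageService:
    def __init__(self, pipeline, store, submit=None):
        self.pipeline = pipeline
        self.store = store  # any dict-like mapping of image id -> pixels
        # `submit` stands in for a job runner (a task queue, say);
        # by default the work just runs inline.
        self.submit = submit or (lambda fn, *args: fn(*args))

    def process(self, image_id):
        result = self.submit(self.pipeline.run, self.store[image_id])
        self.store[image_id] = result
        return result


# Layer 4: web layer. A thin adapter that only talks to the service.
def handle_request(service, image_id):
    return {"id": image_id, "pixels": service.process(image_id)}


service = ImageService(Pipeline([invert, threshold]), {"a": [0, 200, 255]})
response = handle_request(service, "a")
```

Each layer only calls the one directly below it, so the web adapter can be swapped or skipped without touching the pipeline.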

bsaul · on July 12, 2013

The problem lies in the interfaces. Components, even when they are independent, expose interfaces. Those interfaces act as a contract between the component and its users.

Python needs a way to make those interfaces automatically verifiable.

It's even worse once you start to use big libraries. If you're using top level functions, then maybe your IDE can help you, but as soon as you're dealing with magical properties or parameters, that becomes a mess.

Take for example the "desc" magical function in SQLAlchemy, on things like "order_by" on relationships. That's extremely useful and clever, but i'd really like python to give me some "ok, you're not doing things wrong" message as I'm typing.

Even better, once i enter a relationship declaration, it should give me a list of all the parameters i can use, along with the values those parameters accept. This way i wouldn't have to check the documentation every time i'm writing one. It could also let me discover new things as i'm typing ("hey what's that property doing ? that looks interesting..."). Autocompletion is another way of discovering APIs.

But for that, you need type declarations.
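
For illustration only, here is a sketch of what a declared, checkable interface can look like using `typing.Protocol` (Python 3.8 and later). The `Storage` protocol and both classes are hypothetical names invented for the example. The runtime `isinstance` check only looks at whether the method exists; matching the argument types is left to a static checker such as mypy.

```python
from typing import Dict, Protocol, runtime_checkable


@runtime_checkable
class Storage(Protocol):
    """Hypothetical contract between a component and its users."""

    def save(self, key: str, payload: bytes) -> None: ...


class MemoryStorage:
    """Satisfies Storage structurally; no inheritance needed."""

    def __init__(self) -> None:
        self.items: Dict[str, bytes] = {}

    def save(self, key: str, payload: bytes) -> None:
        self.items[key] = payload


class NotAStorage:
    """Has no save() method, so it does not satisfy the contract."""

    def store(self, key: str) -> None:
        pass


def archive(storage: Storage, key: str, payload: bytes) -> None:
    # A static checker can flag calls that pass the wrong kind of object here.
    storage.save(key, payload)


backend = MemoryStorage()
archive(backend, "report", b"data")
```

With annotations like these, an editor has enough information to autocomplete `save` and its parameters as you type.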

EDIT : as for my api, it's really nothing fancy. It's structured in three big parts: "admin / common / public". They each have their "model / business / service" layers, and each have their modules. Only the service layer is impacted by flask. I have some "utils" modules for very low-level stuff (json serialization, etc). The Flask configuration is used a bit everywhere, because i want my api to have only one configuration file. Nothing special, really.

boothead · on July 12, 2013

It's not a total solution but I find the zope.interface and zope.component libraries really good for this. I believe that zope.component provides some stuff to build unit tests that verify if an interface is correctly implemented/provided too.

aidos · on July 12, 2013

I definitely understand where you're coming from. On the discovery aspect, maybe that's just not something I notice because I work in ipython so it's interactive anyway.

boothead · on July 12, 2013

Not to come across as too much of a Haskell fanboy, but this is almost the exact sweet spot for Haskell: typesafe operations over recursive structures. I'd say C# and F# would also be good choices. I completely understand and agree with your concerns about Microsoft, but there's always Mono :-)

On a separate note: sqlalchemy is one of the best pieces of software I've ever used in any language!

tome · on July 12, 2013

Could you explain briefly what you like so much about SQLAlchemy? I've never been able to understand why it's so good.

boothead · on July 12, 2013

Wow, where to start :-)

Biggest benefit is probably the flexibility. SA allows every pattern from raw table access right up to complex object hierarchies mapping to joins or views. If you've already got an existing database structure it's invaluable to be able to hide the "implementation" in this way (I know that's not really the right term) and just present the API that better reflects the domain.

On the other hand it's also a really good tool for creating the tables yourself from the python table declarations.

I've used SA in both enterprise environments mapping really hairy old database schemas to greenfield web apps and I've never found anything that it can't do well. I'd go so far as to say If you're doing relational database stuff in python and you're not using SA - you're doing it wrong :-)

tome · on July 12, 2013

Great, thanks for the summary :) I'm not using Python but I'm writing an alternative HaskellDB API, so it's good to know the strengths of other systems.

boothead · on July 12, 2013

Would definitely like to hear more about that! I'd like to build the backend of whatever I build next in Haskell. Is it on github or accessible anywhere?

tome · on July 12, 2013

It's currently internal only, but hopefully I'll be able to release it to the public soon. I'm happy to correspond by email about it. How should I get in touch with you?

nostrademons · on July 12, 2013

What's your test coverage? It's usually not that hard to reach 100% if you have a decent coverage analyzer. Or even if you don't - one easy way is to never write a line of code unless you have a failing test that exercises it.

Changing a function signature is the kind of thing that tests can & should catch. You need them anyway to catch edge cases in the logic and document the code, and then once you've done that you usually get pretty good coverage for free.
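
A minimal sketch of a test that pins down a signature, using only the standard library. `apply_discount` and the test names are hypothetical. The first test fails with a `TypeError` as soon as the call no longer matches the function, and the second fails when the parameter names change.

```python
import inspect
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business function whose signature we want to protect."""
    return price * (1 - percent / 100)


class ApplyDiscountTest(unittest.TestCase):
    def test_call_with_current_signature(self):
        # Breaks with a TypeError if a parameter is added or removed.
        self.assertAlmostEqual(apply_discount(200.0, 10), 180.0)

    def test_parameter_names(self):
        # Breaks if a parameter is renamed or reordered.
        params = list(inspect.signature(apply_discount).parameters)
        self.assertEqual(params, ["price", "percent"])


def run_discount_tests() -> unittest.TestResult:
    """Run the tests programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

This only protects call sites that some test actually exercises, which is why the coverage question matters.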

bsaul · on July 12, 2013

My unit tests did catch it, but if you've ever (seriously) coded in a statically typed language, you really like to have those errors show almost as you type.

To give you an example on how i do small refactoring in objective-c :

I change the function (or class) signature i know i need to change. I compile, then xcode shows me every single line of code i need to change.

Most of the time, it shows me places i didn't remember also used that function (i'm talking "utils"-like functions).
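
For comparison, a rough Python approximation of that "show me every call site" step, sketched with the standard `ast` module. `find_call_sites` and the sample source are hypothetical. It matches by name only, so it cannot tell two unrelated functions with the same name apart and it misses calls made dynamically (through `getattr`, for example).

```python
import ast
from typing import List


def find_call_sites(source: str, func_name: str) -> List[int]:
    """Return the line numbers where something named `func_name` is called."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        if isinstance(target, ast.Name):
            name = target.id                      # plain call: slugify(...)
        else:
            name = getattr(target, "attr", None)  # attribute call: utils.slugify(...)
        if name == func_name:
            lines.append(node.lineno)
    return sorted(lines)


SAMPLE = '''\
def slugify(text):
    return text.lower().replace(" ", "-")

title = slugify("Hello World")
print(utils.slugify("Other"))
'''

call_lines = find_call_sites(SAMPLE, "slugify")
```

A compiler resolves the actual symbol, whereas this only compares names, so it will report false positives and miss dynamic calls.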

nostrademons · on July 12, 2013

There's nothing that prevents an IDE from running all the tests associated with a project and then highlighting the line numbers of any exceptions that get thrown. Infinitest will do this for Java in Eclipse and IntelliJ, though I don't think it works with Python.

http://infinitest.github.io/

Python has an unfortunate culture of "just an editor, please", so the main IDEs for it are light-years behind Java IDEs, but you could probably easily whip up a vimscript or .el that does this and highlights the line in the editor.

zcrar70 · on July 12, 2013

"whip up a vimscript"...

runT1ME · on July 11, 2013

It's also possible you have a 60k loc python app because you aren't able to get near the amount of code reuse or abstractions due to the lack of static typing, and this is why you haven't run into issues with refactoring.

Buttons840 · on July 11, 2013

If Wikipedia is to be believed: "Dynamic typing typically allows duck typing, which enables easier code reuse."

I would expect to see more code reuse in Python than in any statically typed language.
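
A small sketch of the kind of reuse duck typing gives you. `word_count` and `Greeting` are hypothetical names. The function works with any object that has a `read()` method returning text, with no shared base class or declared interface.

```python
import io


def word_count(readable) -> int:
    """Count words in anything that exposes read() returning a string."""
    return len(readable.read().split())


class Greeting:
    """Unrelated to io.StringIO, but it has read(), so it works too."""

    def read(self) -> str:
        return "hello duck typed world"


from_buffer = word_count(io.StringIO("a b c"))
from_custom = word_count(Greeting())
```

The flip side is that passing an object without `read()` only fails when the line actually runs.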

coolsunglasses · on July 12, 2013

It's a true statement if you qualify it properly.

It wins you more code reusability with less code written to accomplish aforementioned reusability.

I always run into silly inheritance chains, type coercion or adapter fns/methods to handle reusing code for unlike-types, rather than relying on a common base of functionality regardless of where it was derived from (function API, inheritance, mixin, etc).

Best way I've seen this done is in Clojure. Good mix of the best of both worlds in terms of static and dynamic typing, multimethods, protocols, essentially structurally typed arguments, etc.

spamizbad · on July 11, 2013

If that was the case our codebase would be a serious pain to work on and I'd be much worse off than the OP.

Also I should clarify, the 60K LOC is not all 1 app: it's spread across 8 discrete apps, an API, and a data layer (models).