I was initially surprised by the pushback this article is getting.
Then I remembered that this is data-oriented design advice, and I imagine most people on this forum (myself included most of the time) are writing line-of-business web apps where this advice seems like nonsense. I had already internalised the context, and wasn't planning to go apply this to my Laravel backend code.
A heuristic: if in your usual daily work you don't need to think about the instruction cache, then you should probably ignore this advice.
If you haven't yet and want to get a taste of when this advice matters, go find Mike Acton's "Typical C++ Bullshit" and decipher the cryptic notes. This article is like an understandable distillation of that.
Despite what Casey Muratori is trying to argue (and I'm largely sympathetic to his efforts) most line-of-business software needs to optimise for changeability and correctness ("programming over time") not performance.
I think this is still very much applicable in OOP.
Developers tend to break complex business logic within classes down into smaller private methods to keep the code DRY. The “push ifs up” mantra is really useful here to ensure the branching doesn’t end up distributed amongst those methods.
The “push fors down” is also very relevant when most call threads end up at an expensive database query. It’s common to see upstream for loops that end up making many DB calls downstream somewhere when the looping could have been replaced by a “where” clause or “join” in the SQL.
In fact, the “push fors down” mantra is useful even at the architecture level, as it’s usually better to push looping logic for doing aggregations or filtering into a DAO where it can be optimized close to the DB, rather than pulling a bunch of objects out of the database and looping through them.
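To sketch the difference (Python and sqlite3 here purely for illustration; the table names are made up, and any DB-API driver looks similar):

    import sqlite3

    # "conn" is assumed to be a sqlite3 (or any DB-API) connection to a database
    # with hypothetical tables customers(id, region) and orders(id, customer_id, total).

    def regional_revenue_loop(conn: sqlite3.Connection, region: str) -> float:
        # Loop kept upstream: one query per customer (the classic N+1 shape).
        total = 0.0
        customers = conn.execute(
            "SELECT id FROM customers WHERE region = ?", (region,)
        ).fetchall()
        for (customer_id,) in customers:
            (subtotal,) = conn.execute(
                "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
                (customer_id,),
            ).fetchone()
            total += subtotal
        return total

    def regional_revenue_pushed_down(conn: sqlite3.Connection, region: str) -> float:
        # Loop pushed down into the database: one join plus aggregation.
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(o.total), 0) "
            "FROM orders o JOIN customers c ON o.customer_id = c.id "
            "WHERE c.region = ?",
            (region,),
        ).fetchone()
        return total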
I love simple and clear design principles like this!
Though, as with all design principles, one needs to consider it deliberately vs applying it as dogma!
I've been experimenting with a similar idea to "pushing ifs up" in OOP, too: Question ifs in the code of a class and just use more classes.
For example - python terminology, as it came from there - don't have a class "BackupStatus", and have the BackupStatus use 2-3 Booleans to figure out how to compute a duration, a status message and such. Instead, have a protocol "BackupStatus" to declare what a BackupStatus has to do, and figure these out for a FailedBackup, a PartialBackup, a SuccessfulBackup and so on.
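A minimal sketch of what I mean (field names invented for illustration):

    from typing import Protocol

    class BackupStatus(Protocol):
        def duration(self) -> float: ...
        def message(self) -> str: ...

    class SuccessfulBackup:
        def __init__(self, started_at: float, finished_at: float) -> None:
            self.started_at = started_at
            self.finished_at = finished_at

        def duration(self) -> float:
            return self.finished_at - self.started_at

        def message(self) -> str:
            return "Backup completed successfully"

    class FailedBackup:
        def __init__(self, started_at: float, failed_at: float, error: str) -> None:
            self.started_at = started_at
            self.failed_at = failed_at
            self.error = error

        def duration(self) -> float:
            return self.failed_at - self.started_at

        def message(self) -> str:
            return f"Backup failed: {self.error}"

Each class handles exactly one case, so there is no combination of booleans to reason about.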
It's different from what I did for a long time, and some people are weirded out by it at first. But at the same time, it allows me to create a bunch of small, complete and "obviously correct" classes. Or at the least, very concrete cases to reason about with somewhat technical stakeholders. And once your backup status classes / enum members / whatever are just correct, then you can go and discuss when to create and use which of these states.
It's not for everything naturally, but it quickly gives a solid foundation to build more complex things upon.
It does enable proxy shenanigans though, which is something to be mindful of:
I can be reasonably certain of what a reasonably complicated piece of logic is doing/capable of doing.
Comprehending a function using an object implementing a protocol, on the other hand, will end up requiring me to cross reference logic across 4 or more files before I can feel confident knowing what to expect: the protocol definition and/or the basic implementation, the calling context, the factory context that was responsible for creating the object, and the actual definition of the current object. All of which may be trivial, but you need to actually look at them all.
It's the difference between a breaker box and a house of light switches: the switches are all trivial (you hope), but you still hit the breaker when you need to be sure that you are not going to be surprised.
Yeah. I'm kinda thinking about writing a blog post about that project, because I kinda like how it turns out.
Imo, the method I described is mostly suited to implementing closed and fairly complete domain objects. In this case, understanding the protocol is more akin to understanding the concept from the problem domain, and the implementing classes are the different cases or states the thing can be in. In your image, I'd compare the main thing I use this for to fuses: simple, encapsulated and fulfilling a purpose, just implemented in different ways depending on the context.
On the other hand, if you're dealing with implementing a flow chart, I've grown to very much like a decently structured imperative function - supported by these smaller objects.
Yeah, data-oriented design (DOD) always seems to get people riled up. I think it largely gets this type of reaction b/c DOD implies that many aspects of the dominant object-oriented approach are wrong.
> most line-of-business software needs to optimise for changeability and correctness, not performance.
It's a shame that so many see changeability and performance in opposition with each other. I've yet to find compelling evidence that such is the case.
> It's a shame that so many see changeability and performance in opposition with each other. I've yet to find compelling evidence that such is the case.
I think I agree with some of the sibling comments that you can write in a default style that is both maintainable and performant. Regularly though, when profiling and optimising, the improvement can reduce maintainability.
Two examples from the last 6 months:
A key routine is now implemented in assembly for our embedded target because we couldn't reliably make the compiler generate the right code. We now have to maintain that in parallel with the C code implementation we use on every other target.
We had a clean layering between two components. We have had to "dirty" that layering a bit to give the lower layer the information needed to run fast in all cases.
With thought you can often minimise or eliminate the impact, but sometimes you take the trade-off.
If you had good tooling, you could annotate the assembly routine as being equivalent to the C implementation; then maintenance would be a lot easier, since the compiler would yell at you if they ever diverged.
Depending on how you've dirtied your abstractions, language tooling might help with that (e.g. if you could decouple the abstraction from the implementation, and map multiple abstractions to the same data), but I don't know whether that could work, or if it'd even be an improvement if it did.
> If you had good tooling, you could annotate the assembly routine as being equivalent to the C implementation; then maintenance would be a lot easier, since the compiler would yell at you if they ever diverged.
Is that something actual compilers can do right now? I don't think I have heard of anything like that, though I don't work that closely with compilers. Furthermore, if the compiler can recognise that the assembly routine is equivalent to the C implementation, wouldn't it also be able to generate the same routine?
>Is that something actual compilers can do right now?
it is not (outside of research)
>if the compiler can recognise that the assembly routine is equivalent to the C implementation, wouldn't it also be able to generate the same routine?
If you're willing to provide the equivalence proof yourself, it is much easier to verify such a proof than to produce one. The more work you're willing to put in when writing it, the simpler the proof verifier becomes. You could probably write a basic school arithmetic proof verifier within an hour or so (but it would not be pleasant to use).
I've thought about this idea a lot myself and I think it should be feasible (maybe not with assembly right away). You could write an easily readable function in C for code review/verification purposes and then specify a list of transformations. Each transformation must preserve the semantics of the original function, for example "unroll this loop by a factor of 4" or "fuse these 4 additions into a single vector instruction". It would be a pain to write such a transformation list (AI assistance maybe?) but once written you can rely on the compiler to make sure there are no mistakes.
I guess you could run roughly the same set of unit tests on both implementations even if you can't formally prove equivalence. The unit tests would likely need to be adapted a little for each implementation. Of course, you then need to prove that the two sets of unit tests are equivalent :)
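Something like this, assuming both routines are exposed to the test suite (the `checksum_c` / `checksum_asm` names are invented):

    import pytest

    # Hypothetical bindings to the two implementations, e.g. via cffi or ctypes.
    from mylib import checksum_asm, checksum_c

    @pytest.mark.parametrize(
        "data",
        [b"", b"\x00" * 64, bytes(range(256)), b"hello world"],
    )
    def test_implementations_agree(data):
        # Not a proof of equivalence, but the same inputs must produce the same
        # outputs from the assembly routine and the portable C version.
        assert checksum_asm(data) == checksum_c(data)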
Well it's hard to argue about that tradeoff in general, but I think the existence of languages like Python, Ruby and PHP is compelling. Though I'd accept the argument that they help optimise for neither performance nor changeability!
My perspective is necessarily limited, but I often see optimisation as a case of "vertical integration" and changeability as about "horizontal integration".
To make something fast, you can dig all the way down through all the layers and do the exact piece of work that is required with the minimum of useless faffing about for the CPU[0]. But to make something robust, you might want to e.g. validate all your inputs at each layer since you don't know who's going to call your method or service.
Regarding the DOD/OOP wars, I really love this article, which argues that OOP doesn't have to be bad[1]. I also think that when performance is a requirement, you just have to get more particular about your use of OOP. For example, the difference between Mesh and InstancedMesh[2] in THREE.js. Both are OOP, but have very different performance implications.
[0] Casey Muratori's "simple code, high performance" video is an epic example of this. When the work he needed to do was "this many specific floating point operations", it was so cool to see him strip away all the useless layers and do almost exactly and only those ops.
> Well it's hard to argue about that tradeoff in general, but I think the existence of languages like Python, Ruby and PHP is compelling. Though I'd accept the argument that they help optimise for neither performance nor changeability!
I see the point you're making, and I don't disagree with it. But I should be a bit more clear in what I really meant in my parent comment:
Given whatever language a developer is working in, whether it is fast like C++ or a slow language like Python (or perhaps even Minecraft Redstone), I think the programmer who takes a data-oriented approach (meaning they write their code thinking about the kinds of data the program can receive and the kind it will likely receive, along with what operations will be the most expensive) will have better code than a programmer who makes a nice object model following all the SOLID principles. The majority of the OOP code I've worked with spends so much time caring about abstractions and encapsulation that performance is lost and the code is no better to work with.
> Regarding the DOD/OOP wars, I really love this article, which argues that OOP doesn't have to be bad[1]. I also think that when performance is a requirement, you just have to get more particular about your use of OOP. For example, the difference between Mesh and InstancedMesh[2] in THREE.js. Both are OOP, but have very different performance implications.
Absolutely agree here. Classic gamedev.net article. ECS != DOD and I think the next parts of the article illustrate how DOD isn't necessarily in opposition with programming paradigms like OOP and FP.
With that said, I think it can be argued that common patterns within both OOP and FP circles are a hurdle at times to utilizing hardware to its fullest. Here's Casey Muratori's argument against SOLID principles[0] for instance.
---------------------
I think the point still stands: performance isn't in opposition to making a maintainable/changeable program.
A missing element to the conversation is another interpretation of DOD, that is, domain-oriented design. My favorite writing on the matter is "Programming as if the Domain (and Performance) Mattered" (https://drive.google.com/file/d/0B59Tysg-nEQZSXRqVjJmQjZyVXc...)
When OOP has bad performance, or is otherwise another instance of ball of mud architecture, it often stems from not modeling the domain correctly. Only using or being forced to use an inferior conception of OOP (i.e. OOP as merely fancy structs + function pointers) doesn't help either, and ends up encouraging the sorts of patterns found in GOF even preemptively before they can maybe earn their keep. And what's especially insidious about Kingdom of Nouns style OOP is that just because you have named something, or created a type hierarchy, does not actually make it a particularly good model of anything. If you interview enough people, you'll find some thinking it's entirely reasonable to do the equivalent of making a Car a subclass of a Garage just because garages contain cars. When bad modeling infects production code, it's difficult to remove, especially when it's so overgrown that a bunch of other code is created not to fix the modeling but to try and wrangle a bit of sanity into localized corners (often at the further expense of performance -- frequently from a lot of data copying and revalidating).
On the other hand, modeling things too close to the hardware, or (as the linked paper goes through) too close to one ideal of functional programming purity (where you want to just express everything with map and reduce on sequences of numbers if you can, in theory giving high performance too), can severely get in the way of changing things later, because again you're not actually modeling the domain very correctly, just implementing one clever and specific mapping to raw numbers. When the domain changes, if your mapping was too coupled to the hardware, or too coupled to a nice map/reduce scheme, the change throws a big wrench in things. Sometimes, of course, such direct hardware and code exploitation is necessary, but we live in a fat world of excess and most business software isn't so constrained.
An interesting example from the past could be Wolfenstein 3D, where its doors and especially its secret push walls are handled as special cases by the raycaster. Carmack initially refused to add push wall functionality to his engine; it seemed like it would be too big of a hack and possibly hinder performance too much. Big special cases as ifs down in the fors usually suck. After much needling and thinking though, and realizing the game really did need such a feature, he eventually got it done, surprising everyone when, at a final attempt at convincing him to add them, he revealed he had already done it.
> But to make something robust, you might want to e.g. validate all your inputs at each layer since you don't know who's going to call your method or service.
If you actually have layers, then repeated guarding is not necessary. The guards are necessary when the same function appears at multiple levels of abstraction - when calls reach across multiple layers or even back up the hierarchy.
In Angular you had services. Everything was meant to talk to those to get to the data, and I think a lot of people misunderstood the value this provides. If you want user data to look a certain way to the front end, you can do cleanup at insertion time (which can be spread out over years when you didn’t know it was a problem) or at read time. Ideally you want this in the backend, but it can be a challenge because you don’t know what you need until you need it, and expecting the backend people to drop everything to tackle your latest epiphany RFN is unreasonable.
But you can make the Service work the way you would like the backend to work, and make a pitch for that logic to move into the backend. It’s generally self explanatory code, without a lot of tendrils elsewhere. Airlifting it across process boundaries is far easier than surgically removing a set of assumptions and decisions smeared across the code.
“Beyond this point, all foo have bar, and you don’t need to worry about it.” This takes architectural discipline but the dividends are great.
There is a point where performance optimizations get in the way of clarity, but that's after a long plateau where simple software == fast software. And simple software is the most amenable to changeability. It might not be the fastest way to initially write the software though, as leveraging existing large complex frameworks can give a head start to someone familiar with them.
Somewhere, somewhen, we, as software developers, started thinking that other programmers would rather extend code rather than modify it. This has led us to write code that tries to predict the use cases of future programmers and to pre-emptively include mechanisms for them to use or extend our code. And because it has seeped so deeply into our culture, if we don't do this -- engage in this theater -- we get called out for not being good software engineers.
Of course, the extra hooks we put in to allow re-use and extensibility usually results in code that is slower and more complex than the simple thing. Worse, very often, when a customer needs a new feature, the current extension hooks did not predict this use case and are useless, and so the code has to be modified anyway, but now it's made 10x more difficult because of the extra complexity and because we feel that we have to respect the original design and not rip out all the complexity.
I like Knuth's quote [1] on this subject:
> I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you’re totally convinced that reusable code is wonderful, I probably won’t be able to sway you anyway, but you’ll never convince me that reusable code isn’t mostly a menace.
Writing generally "reusable code", aka a library, warrants a different approach to software development than application code in many areas.
1. Application code = Fast-changing, poorly specified code. You need to have a rapid development cycle that supports "discovering" what the customer wants along the way. Your #1 job is pleasing the customer, as quickly, and as reliably, as possible.
2. Library code = Slow-changing, highly specified code. You have a long, conservative development cycle. Your #1 job is supporting application programmers (the customers of your library).
> Somewhere, somewhen, we, as software developers, started thinking that other programmers would rather extend code rather than modify it.
That was when stuff like "proper testing" was deemed to be too expensive. It's unlikely to break existing workflows with extending something, but very easy to do so during a modification.
Companies used to have hordes of manual testers/QA staff, that all got replaced by automated tools of questionable utility and capability.
The tools are very useful, and they have well-known capability. That capability is strictly less than the capability of most manual testers / QA staff, but it's a lot faster at it, and gets much closer to being exhaustive.
Automation should mean you can do a better job, more efficiently, more easily. Unfortunately, ever since the Industrial Revolution, it seems to mean you can do a quicker job with less money spent on labour costs.
> That capability is strictly less than the capability of most manual testers / QA staff, but it's a lot faster at it, and gets much closer to being exhaustive.
That's if you put the effort in to write good tests. When I look at the state of gaming in general, it's ... pretty obvious that this hasn't worked out. Or the GTA Online JSON debacle - I'm dead sure that this was known internally for a long time, but no one dared to modify it.
And even then: an automated test can't spot other issues unrelated to the test that a human would spot immediately. Say, a CSS bug causes the logo to be displayed in grayscale. The developer who has accidentally placed the filter on all img elements writes a testcase that checks if an img element in content is rendered in greyscale, the tests pass, the branch gets merged without further human review... and boom.
Simplest is definitely not more amenable to changes.
We've just implemented a large feature where some devs tried to "hardcode" all the logic to apply a kind of rules engine. I was horrified because the whole thing was coupled to the rules we currently needed, but we all know this is just the beginning: we plan to add more rules, and even to allow custom rules to be defined by our customers. So, even though what they were trying to do is often lauded on HN and other forums because it applies KISS and YAGNI, in this case adding a new rule would mean, basically, changing everything - the engine, the data that the engine persists, potentially the end result... everything! Now, perhaps this was indeed simpler. However, it's the opposite of modifiable (and by the way, implementing it with abstract rules which store their own data, which the engine need not know about, is actually much cleaner, and the de-coupling comes almost for free).
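To sketch what I mean by rules that store their own data (names made up; the real engine obviously has more to it):

    from typing import Protocol

    class Rule(Protocol):
        def applies_to(self, order: dict) -> bool: ...
        def apply(self, order: dict) -> dict: ...

    class PercentageDiscount:
        # The rule owns its own data; the engine never needs to look inside it.
        def __init__(self, percent: float) -> None:
            self.percent = percent

        def applies_to(self, order: dict) -> bool:
            return order.get("total", 0) > 0

        def apply(self, order: dict) -> dict:
            return {**order, "total": order["total"] * (1 - self.percent / 100)}

    def run_rules(order: dict, rules: list[Rule]) -> dict:
        # The engine only knows the Rule interface: adding a rule means adding
        # a class (possibly built from customer config), not changing the engine.
        for rule in rules:
            if rule.applies_to(order):
                order = rule.apply(order)
        return order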
This doesn't sound like a simple solution from your fellow devs. It appears to have been an easy solution. If the distinction isn't familiar to you, there is a great talk by Rich Hickey [0] that explains that distinction and some more. The talk is definitely not only applicable to Clojure.
YAGNI is a great slogan, but it must not be a dogma. If you already know that you are going to need custom rules, then prepare for it. But if --- for example --- the current rules are almost trivial and you don't know yet which rule engine will be the right fit later on, then it might be a good idea to postpone the decision about the rule engine until you know more. In the meantime a hard-coded solution could be enough.
I know that talk very well. And I still disagree with you that hardcoding the rules is not simple, just easy. It's always going to be simpler to implement code that's more specific (i.e. has less room for the unknown) than less. Or do you think there is anything Rich said that shows that not to be always true?
Rich didn't say it, if I remember correctly. But there are problems where a more general algorithm is simpler than a more specific straight-forward algorithm. Usually because you change the way you model the problem.
Otherwise, I have to take your word for it, because I cannot see your specific example.
YAGNI definitely doesn't apply in cases where you do actually know that you are gonna need it.
If the task is "build a rules engine with X, Y, and Z rules, that can add A, B, and C rules next year" then delivering hardcoded XYZ rules engine is an absolute failure to engineer and a very braindead use of "YAGNI"
> It's a shame that so many see changeability and performance in opposition with each other. I've yet to find compelling evidence that such is the case.
In the article, when I saw this
> For f, it’s much easier to notice a dead branch than for a combination of g and h!
My first thought was "yes, but now if anyone _else_ calls h or g, the checks never happen (because they live in f)". I'd much rather have h and g check what _they_ need in order to run correctly. That way, if another call to one of them is added, we no longer need to rely on _that_ call correctly checking the conditions. Plus it avoids duplication.
But... and this goes back to the original point from your post... this is a matter of code being correct over time; changeability. If you're worried about performance, then having the same check in 2 different places is a problem. If you're not (less) worried, then having the code less likely to break later as changes are made is helpful.
They don't need to be in opposition: it's enough that the changeability, correctness, and performance of solutions is uncorrelated for you to frequently need to tradeoff between them, especially given the inevitable tradeoff of all of these against time to write.
Yeah. Is there an article showing how a clever one-liner that is hard to read and an "inefficient looking" but easy-to-understand multi-line version in explanation style, with proper variable names etc., will result in the same compiled code?
I would assume compilers would be sufficiently advanced nowadays...
> most line-of-business software needs to optimise for changeability and correctness ("programming over time") not performance
These are not mutually exclusive, in fact, more often than not, they are correlated.
Maybe the most important aspect of performance is to make things small. Small code, small data structures, small number of executed instructions. Writing small code is what "thinking about the instruction cache" essentially is, btw.
And as it turns out, the smaller the code, the less room there is for bugs, the more you can understand at once, and the easier it is to get good coverage, good for correctness. As for changeability, the smaller the code, the smaller the changes. The same applies to data.
Now, some optimization techniques can make the code more complicated, for example parallelization, caching, some low level optimization, etc... but these only represent a fraction of what optimizing for performance is. And no serious performance conscious programmer will do that without proper profiling/analysis.
Then there are things that make the code faster with limited impact (positive and negative), and this is what the article is about. Functionally, if/for is not really different from for/if, but one is faster than the other, so why pick the slow one? And even if the compiler optimizes that for you, why rely on the compiler if you can do it properly at no cost. Just like looping over 2D arrays, it is good to know that there are two ways of doing it, and while they look equivalent, one is fast and one is slow, so that you don't pick the slow one by accident.
Is for/if faster because the loop gets started right away, whereas the ifs need to check conditionals constantly on top of whatever action they are supposed to execute?
Loops are fastest when they fit in the processor's instruction cache (and preferably, only touch data that fits in the data cache). Similarly, code is fastest when it has to execute the least amount of instructions. In the first example, the walrus(Option) function is designed to be executed unconditionally, only to return early when there is no walrus. That's an unnecessary function call that can be removed by changing the method signature (in Rust, because it has non-nullable types. In other languages you would need to do the non-null check anyway for safety reasons).
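To make the for/if vs if/for point concrete (Python pseudocode with invented names; in an interpreted language the interpreter overhead dominates, but the shape is what the article is describing):

    # if inside the for: a loop-invariant condition is re-tested on every item
    def total_price(items, include_discount):
        total = 0
        for item in items:
            if include_discount:
                total += item.discounted_price
            else:
                total += item.price
        return total

    # if hoisted above the for: branch once, then run a tight loop over the batch
    def total_price_hoisted(items, include_discount):
        if include_discount:
            return sum(item.discounted_price for item in items)
        return sum(item.price for item in items)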
> Despite what Casey Muratori is trying to argue (and I'm largely sympathetic to his efforts) most line-of-business software needs to optimise for changeability and correctness ("programming over time") not performance.
Then consider listening to John Ousterhout instead: it turns out that in practice, changeability and correctness are not nearly as at odds with performance as we might initially think. The reason being, simpler programs also tend to run faster on less memory, because in practice simpler programs have shorter call stacks and avoid convoluted (and often costly) abstractions. Sure, top-notch performance will complicate your program. But true simplicity will deliver reasonable performance most of the time.
By the way, while pushing fors down is mostly data-oriented advice, pushing ifs up is more about making your program simpler. Or more precisely, increasing its source code locality (https://loup-vaillant.fr/articles/source-of-readability). That’s what concentrating all the branching logic in one place is all about.
Honestly, both of these I think are pretty applicable to line-of-business apps as well.
The "push loops down" advice most especially: for any CRUD app, handling creation and updates in bulk when possible will typically save huge amounts of time, much more than in CPU bound use cases. The difference between doing `items.map(insertToDb/postToServer) ` vs doing `insertToDb/postToServer(items)` is going to be orders of magnitude in almost all cases.
I have seen optimizations of this kind take operations from seconds or minutes down to milliseconds. And often the APIs end up cleaner and the logs are much easier to read.
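A sketch of the difference with a DB-API connection (sqlite3-style; the table name is made up):

    def insert_items_one_by_one(conn, items):
        # One statement (and often one round-trip) per item.
        for name, price in items:
            conn.execute("INSERT INTO items (name, price) VALUES (?, ?)", (name, price))
        conn.commit()

    def insert_items_in_bulk(conn, items):
        # The for is pushed down into the driver/database: one batched statement.
        conn.executemany("INSERT INTO items (name, price) VALUES (?, ?)", list(items))
        conn.commit()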
Great point. The n+1 sql queries antipattern comes to mind as a very common "push loops down" application in line of business stuff. Let the db do the work/join.
Maybe - although most of the time, especially in the last few years, I write web apps, and I push ifs up all the time because it allows for early returns like the article says.
Determining that you need to do nothing should be done as soon as possible, especially in any system where performance is essential and web apps make more money the better their performance as a general rule.
Not even strictly DOD, this trivial principle produces better semantics and drives better code separation down the line.
Several years ago I was exposed to DOD (and then this principle) when working on complex JS/TS-based long-running systems. It results in better code navigability, accurate semantic synthesis, and easier subsequent refactors.
The side effect: some people remarked that the code looked like C.
I almost agree except that I may have in mind a different kind of engineer than you do: people who don't think about the instruction cache even though they are writing performance-critical code. This post won't change that, but I hope that people writing about performance know that audience exists, and I dearly hope that people in this situation recognize it and strive to learn more.
At a large enough scale, even line of business code can become a bottleneck for real-world business activity. And unsurprisingly, engineers who don't think about performance have a way of making small scales feel like large ones because they use architectures, algorithms, and data structures that don't scale.
This happens even in the FAANG companies famed for their interviewing and engineering. I've seen outages last for hours longer than they should have because some critical algorithm took hours to run after a change in inputs, all the while costing millions because one or more user-facing revenue-critical services couldn't run until the algorithm finishes (global control, security, quota, etc. systems can all work like this by design and if you're lucky the tenth postmortem will acknowledge this isn't ideal).
I've had to inherit and rework enough of these projects that I can definitively say the original developers weren't thinking about performance even though they knew their scale. And when I did inherit them and have to understand them well enough to make them as fast as they needed to be, some were at least written clearly enough that it was a joy, and others were a tangled mess (ironically, in some cases, "for performance" but ineffectively).
See also: the evergreen myth that "you don't need to learn algorithms and data structures, just learn to find them online" resulting in bullshit like a correct but exponential algorithm being put into production when a simple linear one would have worked if the engineer knew more about algorithms.
There's so much wrong with how many people do performance engineering that I don't think pushing fors down is even in the top 10 tips I would give, I just think that folks posting and commenting in this space recognize how large and impactful this section of the audience is.
I have practiced this advice in line-of-business software to great effect. Popular ORM libraries operate on individual records, but that is quite slow when software needs to import a large number of objects. Restructuring the code to operate on batches—push fors down—results in significantly fewer database queries and network round-trips. The batched routines can resolve lookups of foreign keys in batches, for example, or insert multiple records at once.
Granted, that is more difficult to maintain. I only batched the import routines. The rest still uses the more maintainable ORM.
> Despite what Casey Muratori is trying to argue (and I'm largely sympathetic to his efforts) most line-of-business software needs to optimise for changeability and correctness ("programming over time") not performance.
In investment banking (markets / trading), much of the most economically valuable software could easily run on a Raspberry Pi. I always call it "10,000 if-thens" (all the business rules that build up over 10+ years). Excel with VBA can still do a lot. A very tiny fraction of markets / trading software needs to be performant, yet, those tiny parts dominate most of the conversations (here and at work). It is tiring. Keep writing simple if-thens with a few for loops. Keep making money...
Pushing ifs up is one of my most common pieces of code review feedback on frontend PRs. I think it's very applicable to standard web apps, especially on large teams where people are reticent to refactor.
The more experience I get, the more I feel that too many programmers worry about making things pretty "in the small", but not enough care about proper design of an entire codebase.
I'll admit that I like a concise and well-crafted function just as much as many others, but articles like this one are probably the things that can lead to the kind of unproductive bikeshedding that is sometimes experienced in PRs and other discussions. I don't care that much about whether your function is messy - or about where you put your ifs and fors (or if you use maps and filters instead), as long as the function is properly named, has a good interface (including expressive types), a clear purpose, is properly documented, doesn't make excessive use of side effects etc.
The advice about ifs up is not bikeshedding though, it is the exact kind of architectural choice you're saying one should decide on. Don't believe me? Well, imagine you have inputs: where should you validate them? According to this rule of thumb it's at the topmost level, when they are received. That seems super sensible, and it's typically something that helps with understanding the code (rather than checking them at the very last moment). Also, for proofs, it's technically necessary to allow the preconditions to "percolate up", which has the same effect of moving the if up.
So the first advice is definitely not bike shedding, the second one I'm not so clear though ;)
> it is the exact kind of architectural choice you're saying one should decide on.
While I can't speak to what the OP had in mind, architectural concerns are definitely not inside a function. Even connecting individual functions/procedures barely registers at the smallest, most granular scale of architecture.
Of course, our programming languages for the most part don't let us express architectural concerns...so here we are.
But the article wasn't about "validate inputs at the boundaries" (which is a semantic concern), it was about "push your ifs up" (which is a syntactic concern).
I agree that in the provided example, those two seem to somewhat coincide (although it's hard to say, given that the author makes an effort to give their functions names such as "frobnicate" that don't indicate their purpose), but in the general case that doesn't have to be true.
Are we reading the same article? There is zero concern for syntax.
The first point is literally what you said. Write functions with clear interfaces. If your function doesn't handle None, make its argument type reflect that (Walrus instead of Option<Walrus>).
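In Python terms, that first point looks roughly like this (mypy or any type checker plays the role the Rust compiler plays in the article; names borrowed from the article):

    from typing import Optional

    class Walrus: ...

    # The if lives inside: the signature doesn't say whether None means anything,
    # and every caller pays for the check.
    def frobnicate_maybe(walrus: Optional[Walrus]) -> None:
        if walrus is None:
            return
        ...  # actual work

    # The if is pushed up: the signature states the precondition, and the caller
    # that actually knows whether there is a walrus does the branching.
    def frobnicate(walrus: Walrus) -> None:
        ...  # actual work

    def caller(maybe_walrus: Optional[Walrus]) -> None:
        if maybe_walrus is not None:
            frobnicate(maybe_walrus)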
The second point is about performance. Hoist branches out of loops, and process batches so any constant cost is amortized over the batch. Is that even controversial?
> the author makes an effort to give their functions names such as "frobnicate"
Yes, and that's good, because it keeps the focus on his actual points rather than the minutiae of a contrived example.
> Are we reading the same article? There is zero concern for syntax.
By "syntactical" I mean "concerned with the surface level appearance of low-level code structure".
> Yes, and that's good, because it keeps the focus on his actual points rather than the minutiae of a contrived example.
Only if you think that guidelines about how to write code should be devoid of the context of the problem you're trying to solve. I happen to think that this is mostly wrong.
Well, let's consider the three examples the author gives for "pushing ifs up":
The first one isn't even a proper example, because they write "push the if to the caller" but don't actually show the caller, so the good and bad examples aren't even straightforwardly comparable. For the second example, calling g() has exactly the same behaviour as calling f(), and for the third example, the behaviour also remains identical with or without the enum.
I don't see how this is any more than just pushing code around without having more context about these methods and who calls them.
I work in security, and get to interact with a lot of area tech lead and architect types (L7-L9 in FAANG terms). I have found that the people who most concern themselves with “proper design” are generally about 5-10 years into their career and trying to prevent juniors from shooting themselves in the foot, but don’t yet have a proper appreciation for the cost of complexity and how code changes over the long run. Conversely, after you’ve been doing this for 20+ years you learn to value simplicity over most other technical factors.
Questions like “how early do we branch” and “what is the actual job this has to do” end up being the ones whose answers tend to have the most long-term value. Questions about abstractions and encapsulation tend to lead to the type of argument you describe. And the “big picture” people are the ones with the most security issues in their code, because they don’t really understand what the properly architected codebase is really doing.
This resonates with my experience. Apparently the "inversion of control" and "dependency injection" patterns are now in vogue. So my Sr. Developers are building new code with all that crap. So much so that when you want to make a simple change, you have to hunt down 20 files of `xxxCreator`s, `xxxHandler`s and `xxxProvider`s that are glorified functions invoked by `.execute` or `.call`.
And this is in TypeScript ... we were laughing about that kind of complexity in Java 20 years ago... in Slashdot.
I always deplore this shallow influencing of design patterns without proper examination of whether said design pattern fits or is even correctly described by the popularizer. People fail to realize that inversion of control is necessary ONLY if some controls need to be inverted.
To be fair, "inversion of control" is a great concept, but it has unfortunately been told in a very specific way over time, so its meaning has now changed into an inferior form of the original.
I do personally think it's preferable for components to have their dependencies injected in some way, and that it's better to have initialisation logic (especially related to heavy weight components) in one place.
That doesn't mean that you need some huge ass meta-framework for it though. Just have your main function assemble the components together, done.
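A rough sketch of what I mean (invented class names; no framework at all):

    class Database:
        def __init__(self, dsn: str) -> None:
            self.dsn = dsn

    class UserRepository:
        def __init__(self, db: Database) -> None:
            self.db = db

    class SignupService:
        def __init__(self, users: UserRepository) -> None:
            self.users = users

    def main() -> None:
        # All of the wiring lives in one place, is plain code you can step
        # through, and is trivial to swap out in tests.
        db = Database("postgres://localhost/app")
        users = UserRepository(db)
        signup = SignupService(users)
        ...

    if __name__ == "__main__":
        main()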
Even this is an over simplification of inversion of control.
Dependency injection is one form, but even a library that exposes parameters is also a form of inversion of control.
The degree of control is what varies, whether it is a mere flag (a boolean), a variant directive (a complex struct), a Turing-complete construct (a pointer to an instantiated function) or anything in between.
What I am saying is that this concept has been reduced as it is popularized and it isn't doing justice to the original powerful concept.
> Questions like [...] “what is the actual job this has to do” end up being the ones whose answers tend to have the most long-term value.
How is that not "big picture", proper design and abstraction/encapsulation?
You seem to be arguing against a strawman. I didn't write "complexity". But if you want to fix security issues in your database access patterns, you better not have SQL distributed across your whole codebase.
I don't think I am. You seem to disagree that things like where to branch are worth talking about, but you haven't given examples of what you mean by the big picture that is worth talking about. In this thread, you've told a few people they don't understand what you're saying - so what are you saying?
Re: SQL. Maybe? Maybe not? It's a lot easier to do bad things to a database if you let an ORM generate the query. Writing SQL directly is not an anti-pattern, as long as you parametrize the query. But then again, maybe an ORM is not what you're suggesting - once again, you're saying what you don't like, and others are left to draw conclusions about what you think is good style.
1. Yes, I happen to think that the difference between this:
    fn f() {
        if foo && bar {
            if foo {
            } else {
            }
        }
    }
and this:
    fn g() {
        if foo && bar {
            h()
        }
    }

    fn h() {
        if foo {
        } else {
        }
    }
is not worth talking about.
2. If we're talking about things like validating user input at the boundaries (and if you're a library developer, "user input" might refer to what the library users will pass in to your code), then I think that is a valid and useful guideline. If we talk about "don't spend time executing a bunch of code when you can exit early", then I'll also agree, although in most contexts this doesn't really matter at the granularity of functions - but it does matter that you e.g. reject invalid input early on in your request handling (assuming a web app here) and not start processing it before deciding that you don't need to (with the notable exception of password validation, which should always be constant-time).
3. What I mean by "big picture" is things like clearly delineating which parts of the codebase are responsible for what - for example, which parts of the code contain logic related to the database, which parts are related to business logic, which parts to interfacing with external APIs, which parts about handling responses. It's things such as "functional core, imperative shell" (Gary Bernhard) or other means to try to manage side effects. It's deciding whether to write blocking or non-blocking code, and in the latter case, how exactly to go about it (instead of making a total mess by mixing both styles with a lot of back and forth). It's about deciding on how you want to structure your test suite so it gives you the maximum amount of confidence.
4. I think ORMs provide mostly false confidence and outside of prototype scenarios, I wouldn't personally use them anymore - especially not ones like Hibernate and ActiveRecord - but, alas, it's not always my decision. Their biggest flaw is IMHO that they pollute your domain model (and sometimes even your presentation logic - looking at you, ActiveRecord) with database logic, just for the convenience of being able to call "save()" directly on an object. I don't terribly mind writing raw SQL (as long as you properly parameterize inputs), although I think that generally, type-safe query builders with a 1:1 relationship to SQL (such as jOOQ) are preferable. I do think that all of your SQL-related logic in your application shouldn't be scattered randomly across 10,000 files - it should be in a dedicated place (a number of files grouped in a common folder/package/module, maybe) so that if you need to review certain aspects of it (related to security or performance, e.g. avoiding N+1 queries), you don't have to hunt that logic down throughout your whole application.
> but articles like this one are probably the things that can lead to the kind of unproductive bikeshedding that is sometimes experienced in PRs and other discussions
I'd counterargue that if these trivial principles are established beforehand, unproductive bikeshedding of this type will happen less on PRs and other discussions.
> as long as the function is properly named, has a good interface (including expressive types), a clear purpose, is properly documented, doesn't make excessive use of side effects etc.
In performance-sensitive software where DOD thrives, more care needs to be put into these "in the small" things because that is how compiler optimization works. Then, it changes the rule of the game: statement semantics become crucial, self-documenting code makes more sense, in-code comments are for reasoning exclusively, and "documentation" becomes "specification and user manual".
> The more experience I get, the more I feel that too many programmers worry about making things pretty "in the small", but not enough care about proper design of an entire codebase.
I've seen the reverse a lot, too. People who enjoy designing over-engineered cathedrals and can't be bothered to think about the low-level efficiency of their algorithms.
"Over-engineered cathedrals" aren't what I mean by "good, proper design".
I hate huge, messy meta-frameworks where everything happens implicitly and you can debug nothing. But I hate piles of ad-hoc code where everything calls everything and you can't change a single thing without breaking 2000 other things just as well. You usually don't need complicated code constructs to organise your code well - but you do need to think.
Designing maintainable code is a difficult skill and one that involves careful thinking about tradeoffs. Unfortunately, mostly we as an industry never seem to be able to have time for it.
In general, I feel like too many developers get caught up in dogma and "best practices" that are often unrealistic, and tend to adopt extremist ways of thinking about the big and/or the small picture.
A lot of "dev culture" feels like it lacks a sense of pragmatism.
> People who enjoy designing over-engineered cathedrals and can't be bothered to think about the low-level efficiency of their algorithms.
When that's deliberate, I'd say it's good.
Quite often we need to optimize for maintenance. Or for high churn in developers. For readability. Adaptability. And so on. Each optimization is a trade-off. It's not always "performance vs readability", but quite often it is.
If we spend time and effort on the low-level efficiency of algorithms, AND that comes at the cost of readability, adaptability, maintainability or any other -ility, we should make a very deliberate choice: does performance matter more than X? The answer is surprisingly often "no". And it's one of the many reasons why today's software is slow, bloated, battery-eating and unstable.
Well, the times when that choice is not made deliberately are probably more to blame.
> ... but not enough care about proper design of an entire codebase.
One reason is that we don't have ways of meaningfully expressing components much larger than functions or that connect differently from functions in our programming languages. At most we can aggregate functions together. After that we're on our own.
To go along the same line of thought, too many think about the beauty of the flower instead of the whole forest.
The hours occasionally wasted discussing this in PRs add zero value to what the customer actually cares about, e.g. whether pressing the button prints their monthly invoice or not.
Sure, there is a balance to be had between completely unmaintainable code and a piece of art to expose in the Louvre; however, too many focus on the latter instead of on what the customer is getting out of their beautiful flower.
Often the people who overfocus on the beauty of the flower are those who are too far detached from the customers, behind too many indirections that hide failures and pressure.
Poor choice of words in TFA and a cursory read by a reader generated this misunderstanding. "Up" means "to the caller" for the author, but was probably understood as "towards the beginning (of the function or file)".
Like there's the stuff that can be refactored because the changes are local, or the places to change are clear.
And then the stuff that is a lot harder to refactor because you would need to change code in a lot of places, and the places that need to be changed aren't clear. That's probably more the proper design of a codebase stuff.
Pushing "ifs" up has the downside that the preconditions and postconditions are no longer directly visible in the definition of a function, and must then be checked at each call site. In bigger projects with multiple contributors, such functions could end up getting reused outside their intended context. The result is bugs.
One solution is some kind of contract framework, but then you end up writing the conditions twice, once in the contract and once in the code. The same is true with dependent types.
One idea I haven't seen before is the idea of tagging regions of code as being part of some particular context, and defining functions that can only be called from that context.
Hypothetically in Python you could write:
    from contextlib import contextmanager

    @requires_context("VALIDATED_XY")
    def do_something(x, y):
        ...

    @contextmanager
    def validated_xy(x, y):
        if abs(x) < 1 and abs(y) < 1:
            with context("VALIDATED_XY"):
                yield x, y
        else:
            raise ValueError("out of bounds")

    with validated_xy(0.5, 0.5) as (x_safe, y_safe):
        do_something(x_safe, y_safe)

    # Error!
    do_something(0.5, 0.5)
The language runtime has no knowledge of what the context actually means, but with appropriate tools (and testing), we could design our programs to only establish the desired context when a certain condition is met.
You could enforce this at the type level in a language like Haskell using something like the identity monad.
But even if it's not enforced at the type level, it could be an interesting way to protect "unsafe" regions of code.
> Pushing "ifs" up has the downside that the preconditions and postconditions are no longer directly visible in the definition of a function
You're missing the second part of the author's argument:
"or it could push the task of precondition checking to its caller, and enforce via types"
The precondition is therefore still directly visible in the function definition - just as part of the type signature rather than in an if statement.
The "enforce preconditions via types" is a common pattern in Rust (the language used in the article), and unlike checking with if statements, it's a strict precondition that is checked at compile time rather than at runtime and you won't even be able to compile your program if you don't meet the pre-condition.
Definitely, that's a pattern in many programming languages (including, increasingly, in Python). And in dependently-typed languages, you actually can obtain a proof of validity that gets erased at runtime, leaving nothing but basic unboxed data to be executed.
However that's not always an option in all languages or situations, and it's hard to encode conditions like "must only be called while an event loop is running" that way.
I'm envisioning this partly as a kind of catch-all/fallback mechanism that can be bolted onto a language that doesn't have these kinds of features. But it's not always the case that you can effectively communicate contextual requirements through the types of function parameters, and this would cover that case as well.
If you want to reduce this to anything, it's a workaround for not having monads in your language, with some tradeoffs around composability and dynamic extent.
> "or it could push the task of precondition checking to its caller, and enforce via types"
Or the enforcement could be done via a design-by-contract discipline. Either using classical assertions, or something like reusable specifications in Clojure's spec library.
    def do_something(position: ValidatedPosition):
        ...

    position = Position(x, y)
    if abs(position.x) > 1 or abs(position.y) > 1:
        raise ValueError("out of bounds")
    position = ValidatedPosition(position.x, position.y)
    do_something(position)
In practice I'd probably have ValidatedPosition encapsulate that validation in its constructor, but you get the idea. If you attempt to pass a Position to do_something then mypy is going to shout at you for passing an object of the wrong type to it.
Python's type checking definitely isn't as comprehensive as something like Rust, but I have found it increasingly useful for situations like this where you want to ensure the data being passed in has been appropriately handled.
Sure, but now you have to rewrite your code to accept a ValidatedPosition as input and not two floats. In Rust that might be "zero cost", but in Python it certainly is not, even if you implement all this in Cython. And even in the context of a Cython/Rust/whatever extension, what if this is calling a BLAS function or some other external procedure that has specific preconditions?
It's usually fine to use the type state pattern as described by sibling commenters, but sometimes it's expensive or infeasible.
Another situation is less related to performance and more related to managing dynamic state. Consider the following annoyance that arises when we try to attach static types to an old-school Python idiom:
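Something like this (simplified sketch of the idiom; the details don't matter):

    from typing import Any, Optional

    class AppClient:
        def __init__(self, url: str) -> None:
            self.url = url
            self._auth_token: Optional[str] = None

        def login(self, username: str, password: str) -> None:
            # ...call the auth endpoint...
            self._auth_token = "token-from-auth-endpoint"

        def _post(self, payload: dict) -> Any:
            # Implicit precondition: login() was called first, so _auth_token
            # is not None - but nothing in the types says so.
            headers = {"Authorization": f"Bearer {self._auth_token}"}
            ...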
Users will end up with a difficult-to-debug auth failure if they forget to call `login` first. Also, Mypy won't catch this without contorting your code.
We naturally would want to add a test case like this:
    def test_error_if_not_logged_in():
        app_client = AppClient(url="https://api.example.net/v1")
        with pytest.raises(RuntimeError):
            app_client._post({})
Thus we'll need to guard every single usage of `self._auth_token` with:
    if self._auth_token is None:
        raise RuntimeError(...)
And there are three usual approaches for handling this:
1. Disregard the article's advice and push the "if" down, into the _post method, to ensure that we only actually need to write that guard condition once.
2. Write a _check_logged_in method that encapsulates the if/raise logic, use that to at least make it easier to apply the guard where needed.
3. Refactor the internals to use something akin to the type state pattern, which is a funky and cool idea that I support. But it could easily end up turning into an "enterprise fizzbuzz" exercise where your little client library expands into a type-driven onion that obfuscates the business logic.
I'm proposing a 4th way:
4. Declare that _post may only be called from inside a specific dynamic context, and ensure that that dynamic context is only established by login().
Is it better? I'm not sure. But I've never seen it before, and it's kind of appealing as I continue to think about it.
Something I've seen a few times in Go and Java: make login() a constructor for the AppClient, or declare what's essentially a factory called "AppClientConfig" with a login() function that creates the AppClient.
I've only recently started to think about using factories like that, so I'm very sure there are patterns of dynamic state I wouldn't have a recipe for, but it's a pretty appealing idea to me.
The “push ifs up” advice is not about checking preconditions, it’s about selecting the right code path. If you have a function with a precondition, by all means assert it at the beginning of that function. For example, the Java type system allows nulls, so a function that requires an object should throw an exception if a null is passed.
Each call site is absolutely responsible for ensuring that it only calls a function if its precondition is true. This follows from the definition of precondition. Nothing can change that.
Calling a function in violation of its precondition is a bug (in the caller). And, sure, functions often need code that checks that (to prevent undefined behaviour). But we should distinguish these assertions from our program’s actual control flow. The article is about the latter.
> In bigger projects with multiple contributors, such functions could end up getting reused outside their intended context. The result is bugs.
Yes, but in this particular example, `fn frobnicate(walrus: Walrus)`, if you pass anything other than an owned Walrus the program will not compile. Even if it were something generic, the argument passed would have to satisfy the trait bounds, and the compiler would require those bounds to be stated in the function definition based on how the argument is used inside the function.
The first example is bad for reasons not related to ifs and fors. In general, if you can, when you have a "container" for something, you should write functions on the contained, domain-level "Thing" rather than on the container that holds it.
As an example I work with - Clojure. Sometimes I use agents. I don't write functions for agents, I write functions for things agents might contain.
Similar rules for Elixir. My primary domain level functions don't work off a PID. They work off the underlying, domain-level data structures. GenServer calls delegate to that where necessary.
This makes them more flexible, and tends to keep a cleaner distinction between a core domain (frobnicate the Walrus) and broader application concerns (maybe the Walrus is there, maybe not... oh yeah, also frobnicate it).
Yeah. I think the given advice probably takes validation logic and floats it too high. It is of course nice to have early validation logic, but it is also nice when your functions don't mysteriously crap out with some weird error but instead shout a validation error at you.
Haskell helps with this via newtypes: “here is a transparent container that certifies that you did the appropriate validation already.”
The advice that I really want to hammer into people's heads is, prefer “sad ifs.” That is, I will almost always find this
if (something_is_wrong_in_way_1) {
    // fix it or abort
}
if (something_is_wrong_in_way_2) {
    // fix it or abort
}
if (something_is_wrong_in_way_3) {
    // fix it or abort
}
more readable and maintainable than this
if (things_are_ok_in_way_1) {
    if (things_are_ok_in_way_2) {
        if (things_are_ok_in_way_3) {
            // do the happy path!
        } else {
            // fix or abort thing 3
            // if fixed, do the happy path
        }
    } else {
        // fix or abort thing 2
        // if fixed, test way 3 again
        // if way 3 is good do the happy path, else fix it
        // ...
    }
} else {
    // ...
}
I feel like it's in human nature to focus on the expected case; I want everyone whose code I meet to do the exact opposite and focus primarily on the unexpected. Every “if” imposes a mental burden I have to keep track of. If you have to go to an external system to fetch some information, or you need to exit early with an error, I can discharge that burden the moment I learn about it, provided the handling and the detection sit right next to each other.
I once had a CS teacher who hated early returns and break statements; they would always prefer the "nested ifs" approach. And they're far from being alone - many others believe that path is cleaner.
I'm with the parent poster, though: if you can use control flow to keep the happy path the unindented path, and you have syntax highlighting to help you show that each guard clause actually does interrupt the control flow, it can often be significantly cleaner.
I also like to think of it in terms of "railway oriented programming" (https://fsharpforfunandprofit.com/rop/ which is a must-read for anyone trying to wade into monads and functional programming IMO) - the "happy path" should be the path that it's easiest for the train to go down, namely a straight line down the page. Only give yourself the headache of a sharp turn when you're interrupting that path!
I don't mind early returns, but I'm infamously anti "break" and "continue" statements (and all my colleagues to date have disagreed with me and have found my stance hilarious and/or infuriating). My rationale is that they turn a loop into a mutant abomination that's a loop and conditionals all mushed up together. I also can't shake this gut feeling that they're crude and antiquated control flow mechanisms, only one step away from the fiery inferno of "goto". So instead of this:
for thing in things:
    if thing.is_yellow():
        continue
    if thing.is_rainbow():
        break
    thing.thingify()
I'll do:
things_iter = iter(things)
thing = next(things_iter, None)
is_rainbow_thing_found = (
    thing.is_rainbow()
    if thing is not None
    else False
)
while (
    thing is not None
    and not is_rainbow_thing_found
):
    if not thing.is_yellow():
        thing.thingify()
    thing = next(things_iter, None)
    is_rainbow_thing_found = (
        thing.is_rainbow()
        if thing is not None
        else False
    )
I almost never say this: you are wrong. Code needs to be maintained. Everyone can understand the first example; everyone has to slow down and reason about your second example. It is objectively worse. And your argument is flawed: break and continue are not the goto-like behaviour that goto is criticized for, as they are limited to the scope of that block of code.
The problem traditionally cited with goto is it can go anywhere (even into the body of another function where expectations on which variables are set may not hold).
If you don't like break and continue, how do you justify exceptions? The handling code for those may be in a different file or nowhere. Much closer to goto.
As others have said here, if you want to do it nicely and avoid break / continue, then do it in a proper FP way with chained limit / filter calls, rather than doing it per my example.
I accept your criticism, that my example is much harder to understand, and I admit that it's probably not the way to go in terms of maintainability. However, I'd still argue that my example is closer to an FP approach (even though it still uses a procedural loop), and that it communicates the logic in the same way that an FP approach does. In my example, it's clear that we're filtering out yellow (whereas with "continue", the logic is written in an inverted way); and it's clear just from reading the loop's header that we stop looping when rainbow is found (whereas with "break", logic to stop looping is being added at any arbitrary spot inside the loop's body).
Re: exceptions. I know that many more people object to them than to break / continue (and whole languages make a point of not using them, most notably Golang). The pros and cons of exceptions are a whole topic in themselves that I'd rather not digress into right here. Anyway, personally I'm OK with them. Firstly, they're for errors, and I feel that having weird control flow is sort-of justified for errors, whereas it's not justified for expected behaviour. And secondly, they're at least consistent in that they can occur anywhere, whereas break / continue can only occur inside loops (in Python, at least - in other C-family languages, "break" has different behaviour in different contexts, which is IMHO even worse).
Surely you are joking? Your version is four times longer and repeats a large section of itself.
The way to avoid control flow here is to simply find the rainbow first, take the sequence up until then, filter by yellow, and thingify the remaining things.
rainbow_idx = next((i for i, t in enumerate(things) if t.is_rainbow()), len(things))
non_yellow = (t for t in things[:rainbow_idx] if not t.is_yellow())
for t in non_yellow:
    t.thingify()
Excuse formatting, cellphone.
Not sure this is any better than the obvious solution either.
Yes, I agree, that's the way to do it. Although it isn't always so obvious how to break it up into limit / filter steps, my example was quite trivial. But there should always be a way somehow.
Considering break and continue to be too powerful as control flow statements is a somewhat common opinion, in my experience. (I still use them, but only sparingly, and where I am quite sure that they're not confusing.)
That said, your second code block is unreadable to me -- even knowing what it does, it's just confusing. No way that that's more readable than making use of break here. If you don't want so many conditionals in the loop body, you can factor the first check out:
no_yellow_things = (t for t in things if not t.is_yellow())
for t in no_yellow_things:
    if t.is_rainbow():
        break
    t.thingify()
That's the nicest counter-example to my example, thanks for that! I wasn't familiar with take_while() (looks like that's Rust, and looks like Python has a similar itertools.takewhile() ), TIL a neat FP version of break. My example was quite trivial, it's not always so obvious how to break up a convoluted loop, but it should always be possible.
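For the record, the takewhile version of my original example would look something like this (assuming nothing is both yellow and rainbow, which the original loop handles slightly differently):

from itertools import takewhile

# Stop at the first rainbow thing, skip yellow things, thingify the rest.
for thing in takewhile(lambda t: not t.is_rainbow(), things):
    if not thing.is_yellow():
        thing.thingify()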
I’m with your colleagues I’m afraid. The intent of the all-mushed-up-together version is really clear. I had to spend a long time looking at the longer version to figure out what was going on. I know which I’d prefer to maintain if I had to.
I value being able to grasp what the code does at a glance while scrolling through. Your first snippet enters my brain all at once, almost without my reading it at all.
Your second example absolutely does not and I'm sure you know why.
Do they come from the times when structured programming was hot and new, SESE was the buzzword of the year, and that doodad called the "transistor" had just been invented?
Even better, if you're using a language with pattern matching (rust, any ML, haskell), just put it in a single match with the happy case on top, and the error cases below.
To do this, you need to encode the distinction between happy and sad path in your types. That's good for all sorts of other reasons, so it's not really a drawback.
Because otherwise you can end up with lots of nested conditions which are very confusing to follow, even though they boil down to "do the thing if these are true, otherwise don't do the thing".
Personally I only either use if statements that wrap a single line of logic, or a switch/case statement where each branch is a single line of logic.
Most situations are handled by guard clauses, exhaustive type switching, or switching on logical cases. Very occasionally do I have to break this practice for a legitimate reason, but this keeps functions simple to read and to reason about.
I occasionally am unable to avoid an else, but I will always avoid else if. It puts too much reliance on remembering how this condition interacts with the one on the if statement above it, which can often be many lines away. I’d infinitely rather use a case statement where all the alternatives are in an easily-understood table with every condition aligned on successive lines.
In an ETL process I usually move this up into a function I call "Pre_Flight_Checklist." Over the years this has gotten its own helper functions. For example, if I am going to use a file, I have a function that not only checks for the existence of the file, but that it is a file. This function can also be fed an expected size range and date if "freshness" is expected. If something is out of whack, an error message will mention that a file was expected at /Adirectory/Bdirectory/Cdirectory/filename.ext, but only /Adirectory/Bdirectory was found, with nothing under it. I believe in overly-explanatory error messages.
Now, I do run the risk of having things change out from under me, but putting all of the defensiveness up front allows the happy path and some alternatives to be more succinct in the code.
I keep finding new things to be paranoid about, though!
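A trimmed-down sketch of the kind of helper I mean (names and the exact checks vary from project to project):

import os
import time

def check_file(path, min_size=None, max_size=None, max_age_seconds=None):
    """Pre-flight check: fail loudly, with an explanation, before the real work starts."""
    if not os.path.exists(path):
        # Report how far down the path actually exists, to aid debugging.
        parent = os.path.dirname(path)
        while parent and not os.path.exists(parent):
            parent = os.path.dirname(parent)
        raise FileNotFoundError(
            f"Expected a file at {path}, but only {parent or '.'} exists."
        )
    if not os.path.isfile(path):
        raise ValueError(f"{path} exists but is not a regular file.")
    size = os.path.getsize(path)
    if min_size is not None and size < min_size:
        raise ValueError(f"{path} is {size} bytes, expected at least {min_size}.")
    if max_size is not None and size > max_size:
        raise ValueError(f"{path} is {size} bytes, expected at most {max_size}.")
    if max_age_seconds is not None:
        age = time.time() - os.path.getmtime(path)
        if age > max_age_seconds:
            raise ValueError(f"{path} is {age:.0f}s old, expected under {max_age_seconds}s.")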
I at first had the opposite reaction, which is that I spend a ton of time fighting inverted ifs! My reaction is not incompatible with yours, though - it's related: I HATE reading:
if (!someCondition) {
    // short amount of code
} else {
    // longer happy path
}
The contextual overhead of having to invert a negative (i.e. !!someCondition) is just annoying.
I do agree that if (happy) { /* tons of code */ } else { /* I forgot how we got here */ } can also be an issue.
The key to the idea above is that you don't have any "else" clause. It's an early return, raise an exception, or modify the data so it's no longer bad.
Yes, that's why I said I realized they weren't quite the same situation, while adding another example of a situation where people create hard to follow structures.
I recall avoiding multiple method exits (in Java) like the plague. They were hell to get through in a debugging session. But I guess we don't use debuggers much any more.
When I began writing C# I recall my other team members "refactoring" my code to be shot through with multiple exits instead of my collecting variable, because "it looked nicer." When they asked me why I wrote it the way I did, I mentioned the Java debugger thing, and they kind of looked at me blankly and said, "Huh, never thought of that."
Why is multiple method exits hell to get through with a debugger? I use a debugger for my C++ code all the time and I’ve never run into an issue because of multiple returns.
Setting breakpoints to inspect local state means you have to catch every single exit. Back then, if you had a thirty second to one minute round trip to get back to the method you were inspecting, it became... frustrating. You probably didn't have to deal with those kind of start-up round trips in C++. With C++ you'd pay on the compile time.
If you wrote the code to have a single exit, the cognitive load was also typically lower, as the control flow could be thought of as one way in, and two ways out (return or exception). Good discipline also followed "the jolly good idea of Demeter" or whatever it came to be called; that being to only operate on the state (or preferably, values) passed in, where possible.
I know this is a late response, but this still doesn’t really make sense to me. What stopped you from placing the break point at the start of the function, before the return values were hit and stepping through the code?
In my experience, having multiple exits is even better because I can check which edge case is being hit by placing multiple breakpoints, or a breakpoint at a specific exit I want to check. I understand the pain of having to step through the code to get back to a certain execution point, but I’ve never been in a situation where multiple exits caused such cognitive load as to hinder that process further.
Correct me if I am misunderstanding, you are saying the first example would be better if it was `walrus.frobnicate()`? Isn't that a syntax preference more than an issue with the point the author is trying to make?
(def walrus (agent {}))
; good
(defn frobnicate [w]
  (assoc w :frobnicated true))
(send walrus frobnicate)
; bad
(defn frobnicate [w]
  (send w #(assoc % :frobnicated true)))
(frobnicate walrus)
It might make sense to also have a function that executes something like (send walrus frobnicate). But IMHO code should basically never look like the bad example.
I wouldn't quite say this is bad advice, but it isn't necessarily good advice either.
I think it's somewhat telling that the chosen language is Rust. The strong type system prevents a lot of defensive programming required in other languages. A C programmer who doesn't check the validity of pointers passed to functions and subsequently causes a NULL dereference is not a C programmer I want on my team. So at least some `if`s should definitely be down (preferably in a way where errors bubble up well).
I feel less strongly about `for`s, but the fact that array arguments decay to pointers in C also makes me think that iteration should be up, not down. I can reliably know the length of an array in its originating function, but not in a function to which I pass it as an argument.
> A C programmer who doesn't check the validity of pointers passed to functions and subsequently causes a NULL dereference is not a C programmer I want on my team.
I disagree. Interfaces in C need to carefully document their expectations and do exactly that amount of checking, not more. Documentation should replace a strong type system, not runtime checks. Code filled with NULL checks and other defensive maneuvers is far less readable. You could argue for more defensive checking at a library boundary, and this is exactly what the article pushes for: push these checks up.
Security-critical code may be different, but in most cases an accidental NULL dereference is fine and will be caught by tests, sanitizers, or fuzzing.
I agree with that. If a function "can't" be called with a null pointer, but is, that's a very interesting bug that should expose itself as quickly as possible. It is likely hiding a different and more difficult to detect bug.
Checking for null in every function is a pattern you get into when the codebase violates so many internal invariants so regularly that it can't function without the null checks. But this is hiding careless design and implementation, which is going to be an even bigger problem to grapple with than random crashes as the codebase evolves.
Ultimately, if your problem today is that your program crashes, your problem tomorrow will be that it returns incorrect results. What's easier for your monitoring system to detect, a crashed program, or days of returning the wrong answer 1% of the time? The latter is really scary, depending on what the program is supposed to do. Charge the wrong credit card, grant access when something should be private, etc. Those have much worse consequences than downtime. (Of course, crashing on user data is a denial of service attack, so you can't really do either. To really win the programming game, you have to return correct results AND not crash all the time.)
Yeah but, not checking for null in C can cause undefined behavior. One possible outcome of undefined behavior is that your program doesn't even crash, but rather continues running in a weird state. So such a bug doesn't always "expose itself".
If we accept that bugs are inevitable, and that accidentally passing a null pointer to a function is a possible bug, then we also conclude that your code really should include non-null assertions that intentionally abort() the program. (Which run in debug/staging mode but can be disabled in release/production mode.)
Indeed, Rust's own standard library uses this method. There are lots of public-facing unsafe functions that can result in undefined behavior if called incorrectly. But if the standard library is compiled in debug mode (which currently requires the unstable flag -Zbuild-std), then it will activate assertions on many of these unsafe functions, so that they will print a message and abort the program if they detect invalid input.
That raises a more general point. When you can't or don't have compile-time checks, removing run-time checks in production amounts to wearing your seat belt only when driving around a parking lot and then unbuckling when you get on the highway. It's very much the Wrong Thing.
I wouldn't really characterize it that way. You (ideally) shouldn't be hitting code paths in production that you didn't ever hit in testing.
But, in any case, if you are fine with the slight performance hit (though many C/C++ projects are not), you can always just keep assertions enabled in production.
I used to ask this same question in interviews: should C code always check for NULL? My favorite answer was that the code should have a notion of boundaries, runtime checks should happen at the boundaries, and debug-only assertions are nice but not required inside the boundaries.
In C a NULL is often a valid non-dereferencable pointer value. If you're checking for invalid pointer values you need to check for any of the 0xfffffffffffffffe possible invalid values. If all you're doing is checking for NULL you're not checking for invalid values at all.
If the precondition of your function is "parameter p can not be NULL" that's fine, go ahead and check for it. If the precondition is "parameter p must be a valid pointer" well then, good luck to you finding the appropriate assertion condition.
My rule of thumb is: if the type system doesn't prevent an invalid value, it's your responsibility to prevent it at run-time.
I've been writing a lot of T-SQL lately, which doesn't let you declare a parameter or variable as NOT NULL. So it's a good idea to check for NULLs as early as reasonable - usually at the top of the stored procedure (for parameters). Otherwise, a NULL might propagate unexpectedly deep into the call hierarchy and cause less-than-obvious problems.
Fortunately, the data in the table can be declared as NOT NULL, so these kinds of bugs will usually not corrupt the data, but catching them as early as possible makes life easier. However, if there is a piece of logic that writes something to the database depending on the value of some parameter, and that parameter is unexpectedly NULL, that might lead to the wrong thing being written, or a necessary thing not being written at all, effectively corrupting the data.
Without a proper context, this is fairly strange, and possibly even bad advice.
For loops and if statements are both control flow operations, so some of the arguments in the article make little sense. The strongest argument seems to be about performance, but that should typically be one of the latest concerns, especially for rule-of-thumb advice.
Unfortunately, the author has managed to create a catchphrase out of it. Let's hope that doesn't catch on.
> The strongest argument seems to be about performance
It may be an argument, but it's not a strong one. If the improved code can be written like the author puts it in their example (see below), the condition is constant over the runtime of the loop. So unless you evaluate an expensive condition every time, you are good. Branch prediction will have your back.
If condition is just a boolean expression using const values, I'd even guess the compiler will figure it out.
if condition {
    for walrus in walruses {
        walrus.frobnicate()
    }
} else {
    for walrus in walruses {
        walrus.transmogrify()
    }
}
Branch prediction should have you covered here. And if you can easily rewrite it in this hoisted form by hand, chances are the compiler can perform the same transformation for you.
I'm surprised by how often programmers coming from software engineering background do this wrong. I started programming in science and there it's absolutely necessary to think about this stuff. Doing for loops in a wrong order can be the difference between running your simulation in one hour instead of one week.
With this background, I instinctively do small-time optimization to all my codes by ordering for's and if's appropriately. Code that doesn't do this right just looks wrong to me.
I try to avoid the premature optimisation police just as much as the regular police. Neither of them tend to be serving the public interest, whatever they may say.
I'm not convinced that such general rules can really apply to real-world code. I often see this kind of rule as an ill-placed dogma, because sadly, even if this particular blog post starts by saying these are rules of thumb, they're not always taken that way by young programmers. A few weeks ago YouTube was constantly pushing a video at me called "I'm a never-nester", apparently of someone arguing that one should never nest ifs, which is, well, kind of ridiculous. Anyway, back to the specific advice from this post; for example, take this code from the article:
// GOOD
if condition {
    for walrus in walruses {
        walrus.frobnicate()
    }
} else {
    for walrus in walruses {
        walrus.transmogrify()
    }
}

// BAD
for walrus in walruses {
    if condition {
        walrus.frobnicate()
    } else {
        walrus.transmogrify()
    }
}
In most cases where code is written in the "BAD"-labeled way, the `condition` depends on `walrus`, so the `if` cannot actually be pushed up. And when it doesn't depend on `walrus`, it is quite obvious to anyone that you'd be re-evaluating the same expression over and over in the loop, and programmers have a natural tendency to avoid that already. But junior programmers or students reading dogmatic, wise-sounding rules may produce worse code just to strictly follow this kind of advice.
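For instance, the shape the "BAD" code usually actually has is more like this (hypothetical), and here there is nothing to push up:

for walrus in walruses:
    if walrus.is_hungry():   # the condition depends on the individual walrus
        walrus.frobnicate()
    else:
        walrus.transmogrify()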
I really prefer the flat variant because it helps me ensure exhaustiveness. In the end, people have probably trained their brains to read the nested variant to the point that it's transparent to their neurons.
Oh no, hard disagree. The match implementation is far easier to reason about. I can see at a glance that if both condition_a and condition_b are true, call fn_a(). For the nested if version I have to trace each expression by hand.
I think this is actually a great example of why `if` should be "pushed up." The objective of the code is to perform a particular operation on the walrus, given a condition. This is actually ifs being used instead of polymorphism and a type system. Why does the walrus have these two functions, which must be called in different scenarios? Why not have a single function, and two types, and the correct type is passed in? Even given the current structure, we could accomplish this:
// There are better ways to accomplish this, depending on the language
func frobnicate(walrus) {
    return walrus.frobnicate();
}

func transmogrify(walrus) {
    return walrus.transmogrify();
}

// Somewhere high in the code
if condition {
    processing_function = frobnicate
} else {
    processing_function = transmogrify
}

// Somewhere else in the code
for walrus in walruses {
    processing_function(walrus)
}
If the decisions are made as early as possible, they do not need to be littered throughout the code. The guts of the code can run without branches, performing the same operations every time, the output only modified through the construction graph.
That pattern is not zero-cost: It breaks inlining and other compile-time optimizations. It also can hurt maintainability and readability. If the type of processing_function() is significantly generic, figuring out what is being called in the loop can be hard (as in undecidable; it is equivalent to the halting problem).
In an extreme case that I've seen in real code, processing_function() is called send(t: Object) -> Object or recv(u: Object), and all non-trivial control flow in the system is laundered through those two function names.
So, you have 100's or 1000's of send, recv pairs, and grep (or even a fancy IDE) can't match them up.
For a non-trivial codebase, this style is incredibly valuable for readability. With it, decisions about execution are made in a single place, rather than distributed throughout the code. If the conditionals are embedded deeply in the call graph, sure a reader may be able to understand a single function when looking at it, but they won't know the implications of that function or be able to reason about the program as a whole. I've seen a huge class of bugs that are eliminated by properly using polymorphism rather than embedded conditionals.
That being said, yeah, its possible to just layer unnecessary abstraction and way over complicate things, making the layers utterly incomprehensible. But that seems to be the result of a misunderstanding- the point here isn't to _abstract_, it's to properly model polymorphic behavior. Any abstraction that takes place to do that is suspicious at best.
I think the performance aspect is negligible for 98% of programs. The compiler may or may not be able to inline the underlying function, depends on the compiler. Depends on the optimizations. And in almost all cases, if you're relying on compiler optimizations to fix your performance problems, you're looking in the wrong place. Not to say such cases don't exist, and there's a slew of people who work in those areas, but I trust they know their business well enough to evaluate the trade offs here and make the right decisions.
If you're using a language with proper static typing and generics support (so C++, Rust, Swift, Go, etc), disabling the optimizer is typically a 10x hit. Most of what the optimizer does relies on statically analyzing call graphs, which it can't do if you're slinging function pointers around all over the place.
Godbolt.org has a great example for Swift. Switch the editor to swift mode, load the "Sum over Array" example, and compile it with defaults. It produces reams and reams of assembly code. Now, pass -O in as a compiler flag to enable optimization, and it produces extremely tight code. (Swift 5.0 even noticed the two functions are semantically identical, and deduped them. 5.9 doesn't, but it also produces more compact code.)
It's a similar story for rust, since the example is written in a functional style with an iterator. The C and C++ versions only have an imperative for loop variant, so the optimizer doesn't matter as much for them. (Though you could easily produce a C++ version of the code that takes a sum() function, and has similar bloat until optimization is enabled.)
The GOOD refactor would only work if the condition didn't depend on `walrus` and helps to make that fact explicit. If you apply "push fors down" again you end up with:
if condition {
    frobnicate_batch(walruses)
} else {
    transmogrify_batch(walruses)
}
> But junior programmers or students reading dogmatic-like wise-sounding rules may produce worse code to strictly follow these kind of advices.
Yes.
I for one am happy such articles exist. Thus one articulated something I've run into quite a few times without being able to fully articulate the issues. It seems like a nice mental model to have in my back pocket.
That said, I appreciate comments like your because I hope the dogmatic junior dev comes across it and hopefully becomes a little more nuanced.
I’m close to giving up on dogmatic junior devs. They don’t ever seem to give up their dogma, even in the face of direct damage caused by their dogma. The closest I’ve seen is a dogmatic junior dev saying they understand and will relax on the dogma, but turning around and enforcing it when they think you aren’t looking.
I’ve seen dogmatic junior devs turn into dogmatic senior devs, but I’ve never seen a dogmatic junior dev turn into a pragmatic senior dev.
> saying these are rule of thumbs they're not always taken this way by young programmers.
The important thing to learn about all “rules of thumb” and “best practices” is the WHY behind them. Programmers especially should not copy/paste these things and apply them by rote.
RoT and BP blindly applied may well not be a good idea. As with everything “it depends”.
Agree. Also, most of the time, the form that is easier to modify is preferred, and even if `condition` does not currently depend on `walrus`, it is preferable for it to be easy to make it depend on `walrus` in the future.
This kind of rule of thumb usually contains some mote of wisdom, but generally just creates the kind of thing I have to de-dogmatize from newer programmers.
There’s just always going to be a ton of cases where trying to adhere to this too rigidly is worse. And “just know when not to listen to this advice” is basically the core complexity here.
De-dogmatizing needs to happen, so what? I think these kinds of rules are helpful to play with; adopt them, ride them as far as they go, invert them for a day or a year, see where that takes you. You learn their limits, so what? More grist for the palimpsest.
I wrote some batch (list) oriented code for a static analyzer recently.
It was great until I decided to change my AST representation from a tuple + discriminated union to a generic type with a corresponding interface, i.e. the interface handled the first member of the tuple (graph data) and the generic type the second member (node data).
This solved a bunch of annoying problems with the tuple representation but all list-oriented code broke because the functions operating on a list of generics types couldn't play nice with the functions operating on lists of interfaces.
I ended up switching to scalar functions pipelined between list functions because the generic type was more convenient to me than the list-oriented code. The reality is you often need to play with all the options until you find the "right" one for your use case, experience level and style.
Basically we want a "Yes" or a "No" when the family has children:
let kidsYN = family |> numberOfChildren |> yesOrNo
But we get:
error FS0001: Type mismatch. Expecting a
INode list
but given a
Node<Person> list
The type 'INode' does not match the type 'Node<Person>'
Forcing us to do:
let familyAsINode = family |> List.map (fun n -> n :> INode)
Sure you can wrap this up in a function but it's ugly and annoying to have to use this everywhere and takes away from your logic. It ends up being better to split your "batch" and "scalar" operations and compose them e.g. by introducing a "mapsum" function:
let kidsYN2 = family |> mapsum numberOfChildrenV2 |> yesOrNo
I'm glad many people have identified why "pushing ifs up" is often bad advice. This article should give examples of when and why you should use either approach. Furthermore, I would argue that there's far too little information and context presented here to even make a decision like that. What do `frobnicate` and `transmogrify` do? How many callers would need to perform these conditional checks? Do these if statements convey domain logic that should actually belong in the walrus class? If these checks need to be made often, would it make better sense to capture the call as a lambda and then only perform the check once instead of having a conditional for loop? Etc etc.
The pushing-ifs-up advice assumes you are using a language where objects aren't nullable, which doesn't apply to most languages. Otherwise we would just get a potential null pointer exception when we don't check for null inside the function.
The idea of hoisting precondition ifs into the caller is terrible! Sure, there are special cases where it's a good idea (if nothing else it skips a function call), but in the common case you don't want to do this.
In a library you want to check preconditions at the external boundary so the actual implementation can proceed knowing there are no dangling pointers, or negative numbers, or whatever the internal assumptions may be. Depending on the caller to do the check defeats the purpose.
Also in many cases you would need to violate encapsulation/abstraction. Consider a stupid case: `bool cache_this (T obj)`. Let the cache manager itself check to see if the object is already there as it can probably touch the object fewer times.
I agree on the `for` case but it's so trivial the article barely talks about it. Basically it's the same as the encapsulation case above.
> In a library you want to check preconditions at the external boundary so the actual implementation can proceed knowing there are no dangling pointers, or negative numbers, or whatever the internal assumptions may be. Depending on the caller to do the check defeats the purpose.
I think the idea is to instead address this with a type-safe interface, designed so that the external boundary physically cannot receive invalid input. The caller would then be responsible for its own if statements when constructing the input types from possibly-invalid raw values.
> Also in many cases you would need to violate encapsulation/abstraction. Consider a stupid case: `bool cache_this (T obj)`. Let the cache manager itself check to see if the object is already there as it can probably touch the object fewer times.
I don't see the suggestion as encouraging such a thing: the "cache_this" check should only ever be performed when it's known for certain that the user wants to access the cached object, so the entry point of the cache abstraction acts as a kind of boundary that the if statement depends on. And the if statement clearly shouldn't be pushed above its own dependency.
> I think the idea is to instead address this with a type-safe interface, designed so that the external boundary physically cannot receive invalid input.
Ah, an idea that people have broken their picks on for 50 years.
We will pay out bonuses to employees who worked at least 9 months last year. Actually current employees and former employees. I should create a special type for this case and then depend on the caller to construct such types only for qualifying recipients? That’s just “moving the if” back into the caller. Any responsible developer would validate those entries at the interface…and is it complete?
The idea is, we wouldn't be "depending on the caller": the constructor itself would indicate failure instead of producing an object if the data is invalid, so the caller can't mess up without completely violating the abstraction.
So we would still be doing the validation "at the interface". It's just that the interface would be split into two separate functions, one checking for qualification and another operating on qualified employees.
Also, an alternative would be to push the for loop down into the interface, so that it enumerates each employee, immediately filters them for qualification, and then performs whatever other work is needed for paying out. (It could even include a caller hook to perform additional filtering before paying out.)
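A sketch of that alternative, with made-up names:

def pay_bonuses(employees, extra_filter=lambda e: True):
    # The loop and the qualification check live together inside the interface;
    # callers only supply data and, optionally, additional filtering.
    for e in employees:
        if e.months_worked_last_year >= 9 and extra_filter(e):
            pay_bonus(e)  # hypothetical payout helper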
let f = Some(get_a_u16());
foo(f);
...
fn foo(f: Option<u16>) -> u16 {
    match f {
        None => 0,
        Some(f) => f * 1234,
    }
}
I'd expect any reasonable compiler to include enough link time optimization and dead code elimination to compile the whole mess down to a single multiply instruction.
I don't see what you're trying to show with your example? The typical case is, you start with a raw type in the caller (u16), and then you create a derived type (Option<MultipliedU16>) as a result of validation. Also, you'd ideally want to handle overflow errors in the multiplication, especially with something as small as a u16.
(And in case it helps anyone, for a trivial function to be inlined across crates in Rust, either the function must be marked with #[inline], or a special LTO option must be explicitly set in the Cargo profile. So often it's a good idea to mark all public trivial utility functions in a crate with #[inline].)
In my example, passing in none returns a value, not an error, so the intent is that it makes sense for that API to handle the None case in normal operation. However, in situations where the compiler can infer the argument is never None, it can remove the branch.
(Yes, that function should probably handle overflow explicitly, but that wasn't the point.)
A similar example of this is OpenAI's API calls which don't do response validation when you do function calling. Essentially, validating the response against the given function(s) is left to the user, leading to various implementations that just make the code noisy.
As an alternative, OpenAI could just make sure the true function call is run (first, validate that the response is a JSON, then make sure it's a valid JSON against the provided JSON schema) in n tries, after which the API raises an error or returns None.
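The shape of that wrapper, sketched generically in Python (call_model stands in for the actual API request, and jsonschema is just one way to do the schema check):

import json
import jsonschema  # any JSON Schema validator would do

def call_with_validation(call_model, schema, max_tries=3):
    # Retry until the model returns JSON matching the schema, else give up.
    for _ in range(max_tries):
        raw = call_model()  # placeholder for the real API request
        try:
            parsed = json.loads(raw)
            jsonschema.validate(parsed, schema)
            return parsed
        except (json.JSONDecodeError, jsonschema.ValidationError):
            continue
    return None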
This is kinda "just" predicate push-downs, for imperative code. Makes sense that the author is thinking about it given he is working on databases (tigerbeetle) and part of the motivation is performance.
Interesting that we push the ifs up but we push the predicates down! (And a "predicate pushup" sounds like you are adding some randomness to your exercise routine - one, two, skipping this one, four, ...).
One variation on this theme is to use subclass or function polymorphism. This lets you decouple (in time) (a) the code that decides what to do based on the condition from (b) the code that actually does what was decided. In TFA’s enum example, instead of the enum values, you could pass the foo/bar functions around as values (or instances of different subclasses implementing a common interface method as either foo or bar), and the place where the operation finally needs to be performed would invoke the passed-around function (or object). I.e., f() would return foo or bar as a function value, and g() would be passed the function value and simply invoke it, instead of doing the `match` case distinction.
The drawback is that it’s then less clear in some parts of the code which implementation will be invoked. But the alternative is having to perform the specific operation (directly or indirectly) immediately when the condition is determined. It’s a trade-off that depends on the situation.
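In Python-ish pseudocode (hypothetical names), the enum version's f/g pair becomes something like:

def foo():
    ...

def bar():
    ...

def f(condition):
    # Decide once, here, and hand back the behaviour itself as a value.
    return foo if condition else bar

def g(operation):
    # No match/case needed: just invoke whatever was decided upstream.
    operation()

g(f(condition=True))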
I thought that too. I think the point there was that you don't have to push this notion as far as in/out of functions: just flipping them within a function can be beneficial.
A good example of this I see a lot in a code base I'm currently working in is React components that conditionally render or not. I really can't stand this pattern, and whenever I can I refactor that into having the component ALWAYS render, but have the caller decide whether or not to call the component.
The real answer is "it depends", but in the vast majority of cases it is better to "push down" all control flow structures: both conditional branching and loops. This will make it easier to write the code as a pipeline, and therefore more readable.
Some argue that higher-order functions (HoFs) should be "pushed up" even if they make the code more verbose (that's a standard code style in Elixir), while the alternative is more readable - it creates lots of microfunctions, which also clutter the code.
For programming languages with either succinct control structures (the APL family), or with syntactic sugar for them, I see no problem with "pushing them up".
Of course there are exceptions, e.g. when the code needs to be optimized for speed, memory usage, security, or formal correctness.
A beneficial side effect of this strategy (operating in batches with control logic out of the loop) is that you can also relatively easily distribute the work on many worker threads without touching the interface or the code structure.
In other words: write small reusable functions, and an orchestrator "script" function to glue them all together. The orchestrator has all the "ifs" that are domain/application specific.
Interesting that the dependency inversion principle [1] is like an extreme case of pushing ifs up, by encoding the if into the type interface implementation. Ultimately what you get with DI is pushing an if up to ... somewhere.
# Somewhere:
walruses = [new TransWalrus(), new FrobWalrus(), ...]
...
for (walrus in walruses) {
    walrus.transfrobnicaterify()
}
> The GOOD version is good, because it avoids repeatedly re-evaluating condition, removes a branch from the hot loop, and potentially unlocks vectorization.
Compilers can do that: notice that condition is not affected within the loop and hoist it outside.
Speaking of which, if the condition is affected by the loop, then the transformation is not correct. E.g. maybe the walruses appear in the list in a specific order, and an earlier walrus can determine a condition affecting the treatment of a later walrus.
Don't go into your code randomly swapping for and ifs.
This usually culminates with the patient trying to cram all if/else logic into a type system, which effort eventually leads them to have an aneurysm and they have to be carried away on a stretcher.
I agree with them and with you. It looks like they work in some poor language that doesn't allow overloading. Their example of `frobnicate`ing an optional being bad made me think: why not both? `void frobnicate(Foo&); void frobnicate(std::optional<Foo>& foo) { if (foo.has_value()) { frobnicate(*foo); } }`. Now you can frobnicate `Foo`s and optional ones!
condition is an invariant. Unless you're using the Cranelift or GCC backends, LLVM is going to optimize it away, unless rustc hands it some non-invariant constraints to solve for. Most compilers, JITs, and interpreters can and do perform loop-invariant optimization.
Another way to look at the class of problem: if you're using too many conditionals too similarly in many places, you may have created a god type or god function with insufficient abstraction and too much shared state that should be separated.
---
Prime directive 0. Write working code.
Prime directive 1. Choose appropriate algorithms and data structures suitable for the task being mindful of the approximate big O CPU and memory impact.
Prime directive 2. Write maintainable, tested code. This includes being unsurprisingly straightforward.
Prime directive 3. Exceed nonfunctional requirements: Write code that is economically viable. If it's too slow, it's unusable. If it's somewhat too slow, it could be very expensive to run or will cost N people M time.
Prime directive 4. If it becomes too slow, profile and optimize based on comparing real benchmark data rather than guessing.
Prime directive 5. Violate any rule for pragmatic avoidance of absurdity.
To me, this rule seems fairly natural, so I'm part of the "surprised at the backlash" camp here for sure. At a high level, "if" is a way of skipping unneeded code, and "for" is a way of repeating code, so why wouldn't you want to skip as much unneeded code as possible and repeat as little code as necessary? This isn't meant to diminish the concerns raised in the comments here, but if the tools we're using to express what we want the computer to do end up working better when we can't use simple heuristics like "doing less stuff to achieve the same results is better than doing more stuff", that seems like a fundamental issue with the tools themselves.
“Put ifs where they minimize the net total cyclomatic complexity”
This is exactly what the factory design pattern is trying to achieve. Figure out the type of object to create and then use it everywhere vs a million different switch statements scattered around.
Also don’t create batch functions unless you need to. Functions that work on a single item compose better with map-reduce.
I find this to be really interesting advice, but I wonder when it is right and when it would be wrong.
Especially in C, where a lot of data is passed through pointers, you'd want to make sure the data you are given isn't pointing to nothing, and you can't always rely on the caller doing that for you.
What I noticed too, for pushing the for loops down, is that more functions start taking a list of things instead of one thing. This makes it easier down the road when a requirement changes and we need to handle multiple things instead of one thing with the same code logic. It decreases overall change in the codebase.
> If there’s an if condition inside a function, consider if it could be moved to the caller instead
Haha, it is important to have logic where it is relevant. If performance is more relevant than semantics or maintainability do that. In all other cases favor locality, filter early and fscking kiss. Why is this news?
The short answer is absolutely not, even when you are sure that it should. Even something as simple as a naive byteswap function might wind up generating surprisingly suboptimal code depending on the compiler. If you really want to be sure, you're just going to have to check. (And if you want to check, a good tool is, of course, Compiler Explorer.)
No, compilers (correctly) prefer correctness over speed, so they can optimise “obvious” things, but they cannot account for domain knowledge or inefficiencies further apart, or that “might” alter some global state, so they can only make optimisations where they can be very sure there’s no side effects, because they have to err on the side of caution.
They will only give you micro-optimisations, which can add up sometimes, but the burden of holistic program efficiency is still very much on the programmer.
If you're emptying the swimming pool using only a glass, the compiler will optimise the glass size and your arm movements, but it won't ask "are you emptying the correct pool?" or "should you be using a pump instead?" - and a correct answer to those two questions could be 100,000 times more efficient than the micro-optimisations a compiler can actually make.
Some of the moves seemed to change what an individual function might do. For example they suggested pulling an if from a function to the calling function.
Could the compiler figure it out? My gut says maybe; maybe if it started by inlining the callee? But inlining happens based on some heuristics usually, this seems like an unreliable strategy if it would even work at all.
It's a real issue in most compiled languages too if you're not careful (also a sort of opposite issue; too few functions causing unnecessary bloat and also killing performance).
The way he explains "pushing `for`s down" is obvious. Restated: only do in a for loop what actually needs to be done n times, not something you could do once. But that has nothing to do with what he means by "pushing `if`s up", which is about making `if`s appear as early as possible in the control flow. It doesn't matter whether `for`s are early or late in the control flow; what matters is that only things that need to be done n times (n being the loop count) are inside the loop, and everything else is not. And I disagree with "push ifs up" stated unqualified.
In his (for he is a 'he') defense, I believe much of the whole industry has been doing so over the past five years as well. Everything old is new again!
I love this advice, moving if statements "up" is something I've observed makes a big difference between code that's fun to work with and easy to maintain, and code that quickly gets unmaintainable. I'm sure everyone's familiar with the function that takes N different boolean flags to control different parts of the behavior.
I think it really comes down to: functions that have fewer if statements tend to be doing less, and are therefore more reusable.
I like my ifs down. In fact, after decades of forcing myself to use parameter checklists and avoiding any nesting, I started to appreciate code that is nested just a couple of times intentionally to get rid of a bunch of conditionals that become implied as a result. It all depends on what feels natural and easy to read at the given stage of your life.
No one likes 'if/else'. Moving 'if/else' inside or outside a function is not a solution; if you are writing business logic or UI logic, you should try to avoid it as much as possible, except when you are writing a complex algorithm where that complexity is genuinely required.
This advice falls flat in case of validations. If a function is given some input and is supposed to work with it, how can it avoid if/else and how can we move this logic one level up to the caller to ask the caller to verify every parameter before calling the function?
And if we keep pushing (thus pending the decision making) up, wouldn't the top most function become a lot more complicated having a lot more logic pushed up from far down below?
That's bad and impractical advice but now will pollute many pull requests with needless arguments.
> If a function is given some input and is supposed to work with it, how can it avoid if/else and how can we move this logic one level up to the caller to ask the caller to verify every parameter before calling the function?
The usual way in idiomatic Rust would be to use type safety for this purpose: have the function accept special types for its input, and provide the caller secondary interfaces to construct these types. The constructors would then be responsible for inspecting and rejecting invalid input. This way, the caller can continue pushing the construction, and thus the if/else statements for validation errors, upward to the ultimate source of the possibly-invalid values.
(This is also possible in C/C++/Java/C#/..., if not as idiomatic.)
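For instance, a hypothetical Python sketch of that pattern:

from dataclasses import dataclass

@dataclass(frozen=True)
class Percentage:
    value: float

    @classmethod
    def parse(cls, raw):
        # The if/else for validation lives here, at the source of the raw value...
        if not 0.0 <= raw <= 100.0:
            raise ValueError(f"{raw} is not a valid percentage")
        return cls(raw)

def apply_discount(price, discount: Percentage):
    # ...so the worker function assumes a valid value and needs no guard.
    return price * (1 - discount.value / 100)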
No, the advice is good. If the wrong function was called, or a function is called with the wrong input, it’s just a bug and no amount of ‘validations’ can fix it. The article is written in the context of Rust, which has a type system, so the compiler can help check.
In cases where a function’s precondition can’t be expressed in the type system, the function should check at the start and bail. For example, it’s reasonable in Java to check if parameters are null (since the compiler cannot do this).
The rule of thumb is put ifs and fors where they belong -- no higher or lower. And if you're not sure, think about a little more.
I don't think these rules are really that useful. I think this is a better variation: as you write ifs, fors and other control flow logic, consider why you're putting it where you are and whether you should move it to a higher or lower level. You want to think about the levels in terms of the responsibility each has. If you can't think of what the demarcations of responsibility are, or they are tangled, then think about it some more and see if you can clarify, simplify, or organize it better.
OK, that's not a simple rule of thumb, but at least you'll be writing code with some thought behind it.
This reply is akin to "just think about it and do it perfectly dummy" which might sound "obvious" and "smart" but is really just unhelpful commentary. The ideas provided in this blog are actually a good heuristic for improving code in general, even if they don't apply in all cases. Simple and practical rules of thumb can go a long way for those without enough experience yet to have internalized these kinds of rules.
I don't agree it is unhelpful commentary. I think it makes an important point that just following some rules you found won't be enough because the right answer isn't a result of something arbitrary/universal but is a side effect of a better understanding of the responsibility of each piece of code you're writing.
If by “don’t” you mean “do the opposite” then I agree. The third option is “don’t follow the rule blindly but think about the situation”, and for that case it depends entirely on the person and how well they think.
I do think it is shortsighted, and equally helpful and harmful. Helpful in that it reminds the reader this is only one of many considerations that should be taken into account. Harmful because it borders on implying that this shouldn't even be a consideration taken into account because it is irrelevant.
Trying to help too much, and give shortcuts to people who cannot otherwise evaluate when these shouldn't be applied has ill effects.
On the same note, I find the reasoning behind the advice far more interesting than the advice itself. It gives a good framing of the different issues to consider when writing conditional and iterative code.
Basically it should help newcomers to identify the issues they're facing and build their own personal heuristics.
A lot of replies in this tree read to me as "don't concisely express a general heuristic principle, because someone might read it and apply it without nuance".
This way of thinking is infantilizing. It is also self-defeating, because it is itself a failure to apply with nuance a general heuristic principle (don't oversimplify). It's doing what it tells you not to do.
Heuristics are only heuristics, but you have to articulate what they are before you can append "and this is only a heuristic, don't apply it blindly and universally".
Appending "everything depends on context" is the trivial part. The heuristics are the substantial contribution.
I see your side, but take it the other way: the basic message is really "you'll have to think for yourself anyway, heuristics from random strangers won't help", which I don't see as infantilizing.
I see post mortems and issue discussions on public projects as a better resource and contribution than sharing personal heuristics.
Better phrasing might be: this is a really good point, and while you can't rely on it alone, the more of these you learn, the better a programmer you'll be.
There isn't an easy shortcut. Experts exist for reasons. Just like Doctors and other specialists, find a good expert you trust and evaluate their evaluation of the topic.
Doctors rely on an enormous number of heuristics and shortcuts exactly like this.
My wife teaches doctors, and so much of what she does is giving them rules of thumb much like this one.
edit: I want to note that I'm pretty ambivalent on the actual advice of the article, just commenting that doctors in my experience have a truly astounding collection of rules of thumb
An expert in a field tends to be an expert of the rules of thumb, the references to consult when the rule of thumb doesn't deliver the desired results, and the nuances and exceptions that are the art part of 'science and useful arts' for their field.
"Keep at it, it can be frustrating at times but if you keep studying and practicing how to become a better developer, you will become one"
There are no shortcuts in life. You need to work at it. Cognizant practice got me there, and I think it will get you there, too.
I get people asking me how to become a programmer all the time -- and I always say the same thing: You can't talk about it, you need to do it. You need to write programs, writing software will make you a better programmer.
Have opinions on how to write good software, be open about these opinions, but also be aware that they could be bad opinions, misinformed, or lacking greater context.
In reality these ‘rules of thumb’ become dogma and people forget why the rules of thumb existed in the first place, or make up other bullshit rules to justify it.
This industry is full of dogma and self-appointed experts and it’s utterly tedious.
Because easy counterexamples to both of these rules are:
1) I'd much rather have a function check a condition in a single place, than have 20 places in the code which check the same condition before calling it -- the whole point of functions is to encapsulate repeated code to reduce bugs
2) I'd often much rather leave the loop to the calling code rather than put it inside a function, because in different parts of the code I'll want to loop over the items only to a certain point, or show a progress bar, or start from the middle, or whatever
Both of the "rules of thumb" in the article seem to be motivated by increasing performance by removing the overhead associated with calling a function. But one of the top "rules of thumb" in coding is to not prematurely optimize.
If you need to squeeze every bit of speed out of your code, then these might be good techniques to apply where needed (it especially depends on the language and interpreted vs. compiled). But these are not at all rules of thumb in general.
I think a key thing software engineers have to deal with, as opposed to physical engineers, is an ever-changing set of requirements.
Because of this we optimize for different trade-offs in our codebase. Some projects need it, and you see them dropping down to handwritten SIMD assembly for example.
But for the most of us the major concern is making changes, updates, and new features. Being able to come back and make changes again later for those ever changing requirements.
A bridge engineer is never going to build abstractions and redundancies into a bridge "just in case gravity changes in the future". They "drop down to assembly" for this and make assumptions that _would_ cause major problems later if things do change (they won't).
I guess my point is: optimizing code can mean multiple things. Some people want to carve out of marble - it lasts longer, but is harder to work with. Some people want to carve out of clay - it's easier to change, but it's not as durable.
I've been impressed watching the crews meticulously replace each cable assembly of the George Washington Bridge over the past year or so. All the work that doesn't require disconnecting cables is done in parallel, so you can get a sense of the whole process just driving across once (they've finished the north side, so you'll want to be driving into Manhattan for the current view).
It's more or less the same as code migrations we're doing on a regular basis, done far more diligently.
Part of my point though was that the bridge builder of today does not need to take into consideration that the person maintaining it 20 years from now will have to deal with gravity changing. So they can make certain assumptions that will be impossible for future maintainers to ever undo.
Software doesn't have these set-in-stone never-changing requirements. I think we are making similar points.
> 1) I'd much rather have a function check a condition in a single place, than have 20 places in the code which check the same condition before calling it -- the whole point of functions is to encapsulate repeated code to reduce bugs
That's fine, but it's often a good idea to separate "do some work on a thing" and "maybe do work if we have a thing". Using the example in the article, it is sometimes useful to have multiple functions for those cases:
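Something along these lines, sketched in Rust with the article's Walrus/frobnicate names (the wrapper's name is my own invention):

    struct Walrus;

    // Does the work; callers must already have a Walrus in hand.
    fn frobnicate(walrus: Walrus) {
        // ... actual frobnication ...
    }

    // Thin wrapper for callers that may or may not have one.
    fn frobnicate_if_present(walrus: Option<Walrus>) {
        if let Some(walrus) = walrus {
            frobnicate(walrus);
        }
    }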
I think the argument here could be stated sort of as push "type" ifs up, and "state" ifs down. If you're in rust you can do this more by representing state in the type (additionally helping to make incorrect states unrepresentable) and then storing your objects by type.
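Roughly, with made-up names (not from the article): the "type" if happens once, up at the boundary, and the per-state code stops branching on booleans.

    // The state lives in the type, so an invalid combination of flags
    // simply cannot be constructed.
    enum Connection {
        Disconnected,
        Live(LiveConnection),
    }

    struct LiveConnection {
        peer: String,
    }

    impl LiveConnection {
        // No `if connected` checks needed down here.
        fn send(&self, msg: &str) {
            println!("to {}: {}", self.peer, msg);
        }
    }

    fn handle(conn: &Connection, msg: &str) {
        // The "type" if is pushed up into one match.
        match conn {
            Connection::Disconnected => eprintln!("dropping: {}", msg),
            Connection::Live(live) => live.send(msg),
        }
    }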
I have a feeling this guide is written for high performance, while it's true that premature optimization is the devil, I think following this sort of advice can prevent you from suffering a death from a thousand cuts.
Conditions inside the function are also in line with Postel's law, if we drag it from networking to API design. And in large parts of programming the entire "enforce it with types" (say null check without saying null check) isn't a thing at all. It only gets worse with languages where api evolution and compatibility is done by type-sniffing arguments. Those will just laugh at the idea of pushing an if up.
But it's an interesting discussion nonetheless. What I picked up, even if it wasn't directly mentioned (or I might have missed it?), is that a simple check on the caller side can be nice for the reader. It costs almost nothing to read at the call site because the branch is a short one, and chances are the check provides some context that helps explain what the call is all about:
if(x is Walrus) frobnicate(x);
is not just control flow, it doubles as a friendly reminder that frobnication is that thing you do with walruses. So my takeaway is the check stays in the function (I also don't agree with the article), but make it a part of the naming consideration. Perhaps "frobnicateIfWalrus" wouldn't be so bad at all! I already do that occasionally, but perhaps it could happen more often?
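Roughly, rendering the `is Walrus` check as a Rust match (all names are made up for illustration):

    enum Animal {
        Walrus(Walrus),
        Seagull,
    }

    struct Walrus;

    fn frobnicate(_w: &Walrus) {
        // the thing you only do with walruses
    }

    // The check stays inside, and the name carries the reminder that
    // frobnication only applies to walruses.
    fn frobnicate_if_walrus(x: &Animal) {
        if let Animal::Walrus(w) = x {
            frobnicate(w);
        }
    }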
> OK, that's not a simple rule of thumb, but at least you'll be writing code with some thought behind it.
This reflects one of my answers to the question: What separates an engineer from a developer?
Engineers are intentional, or at least hope to be as often as possible. On the other hand, developers may arrive at similar or same ends but they're generally more reactive and less intentional.
tl;dr Branch as early as possible (move ifs up) and as infrequently as possible (move loops down, to minimize the number of loops calling something that branches)
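A rough illustration of that summary, loosely after the article's Walrus example (names made up):

    struct Walrus;

    // The for lives down here: one call, one tight loop.
    fn frobnicate_batch(walruses: &[Walrus]) {
        for _walrus in walruses {
            // per-item work, no branching in the hot loop
        }
    }

    // The if lives up here: one branch, taken once, instead of
    // re-checking `enabled` on every iteration.
    fn run(walruses: &[Walrus], enabled: bool) {
        if enabled {
            frobnicate_batch(walruses);
        }
    }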
It probably actually is a good rule of thumb, in that it will naively force you into some local maxima of simplified logic. But eventually it becomes equivalent to saying "all of programming is loops and branches," which is not a very useful statement once you need to decide where to put them...
I am just really tired of seeing articles like this. Sure, you find some rules that are helpful in some specific cases, but those rules almost never generalize (yeah the irony of me making a generalizing statement here is not lost on me, but I did say "almost").
Imagine you got a corporate programming job, and your manager comes to you and says "here, in this company, we keep _all_ the if statements in one function, and no ifs are allowed anywhere else". I would just walk out on the spot.
Just stop, stop writing these articles and please stop upvoting them.
This is mostly true, but sometimes the cost of evaluating the condition itself is non-trivial. For example, if a and b are complex objects, even something as trivial as `if (a.equals(b)) ...` might take a relatively long time if the compiler/runtime can't prove a and b won't be modified between calls. In the worst case, a and b only differ in the last field checked by the equality method, and contain giant collections of some sort that must be iterated recursively to check equality.
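A rough Rust rendering of that worst case (the `Snapshot` type is made up):

    // `==` here is a deep, field-by-field comparison.
    #[derive(PartialEq)]
    struct Snapshot {
        id: u64,
        label: String,
        entries: Vec<Vec<u8>>, // possibly millions of elements
    }

    fn skip_if_unchanged(a: &Snapshot, b: &Snapshot) {
        // If `a` and `b` differ only in the last entry, this innocent-looking
        // branch walks both collections end to end before deciding anything.
        if a == b {
            return;
        }
        // ... expensive work ...
    }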
It's not just the direct cost of a branch, it's the downstream costs as well. Removing a branch (or having the compiler fold it automatically) can unlock optimizations in the code after it.
E.g.
    if (foo > 0) {
        x = 3;
    } else {
        x = 7;
    }
    return x * 9;
If the compiler (or programmer) knows foo is greater than zero (even if we don't know what it actually is), then the whole thing folds into:
return 27;
That also means that foo is not even used, so it might get dead-code eliminated.
If that gets inlined, then the optimizations just keep stacking up.
(not that the article was about that, it's just one implication of removing branches: downstream code can be optimized knowing more).
A conditional move is not guaranteed to be faster than a predictable branch. It varies a lot by microarchitecture. It was just an example. Imagine if you had a store in the else branch. Now optimizing away a store means that load elimination may kick in. Folding branches in the compiler unlocks tons of downstream benefits.