Data Still Dominates

MikeOfAu · on July 10, 2019

LISPs have this idea so deeply etched into them that code itself is data - homoiconicity.

In the Clojure community, for example, you will find that the primacy of data is well understood and used. Data-Oriented design and all that. You'll see the aphorism "Data is the ultimate in late binding" mentioned.

You will also find that it is immutable data that reigns supreme.

Notice also how, in a reactive system, it is the data (or the arrival of it) that coordinates the functions/processing, not the other way around.

I was exposed to the "data first" (and non-OO) way of thinking about 5 years ago, after being in the OO world since the mid-80s. I haven't looked back. When the penny drops for you on this it is a profound and liberating moment (or at least it was for me).

hackits · on July 10, 2019

There was two competing viewpoints of OO. A object is nothing more than an animated data structure, and the opposing view saw objects as behaviors and you didn't map them explicitly to your data model.

As most people viewed object's as simple extension over the data, it's not surprising to see most developer opt out for more basic form of data representation and that doesn't need object's at all.

i_v · on July 9, 2019

I've once designed "The Fantastically Modular and Configurable Ball of Mud." Scratch that. I've definitely succeeded many times even if that wasn't the goal. I now take the approach of writing single-purpose throwaway tools until I've gotten a firm grasp on exactly the use-cases I'm not targeting. Few open source projects list anti-features, but when they do, I'm extremely appreciative.

js8 · on July 10, 2019

Good grief, the three examples sound like a project I am working on..

I have a feeling that microservices are the new OOP, just a layer higher. Just like in OOP, where you keep data in lots of tiny objects, you keep the data across a bunch of little databases instead of a big central one.

It is a very seductive idea (and as Alan Kay explains, inspired by biology), unfortunately, it is really not that useful for designing real-world computer systems, where we want data to be consistent and aggregate them in different ways.

dkersten · on July 10, 2019

> it is really not that useful for designing real-world computer systems, where we want data to be consistent

I believe it was "Life Beyond Distributed Transactions"[1] that showed an alternative approach that often works without providing globally consistent data. I've done a few thought experiments in the past, where I tried to design some systems without transactional consistency. Its not necessarily easy, but it often (although, I assume not always) can be done. Whether the extra effort is worth it or not, depends on your specific needs and goals, I guess.

[1] https://queue.acm.org/detail.cfm?id=3025012

js8 · on July 11, 2019

I think this is a very different approach than microservices. You're not sidestepping the problem of consistency, you're aware of the trade-off.

dkersten · on July 11, 2019

Well, you're designing around the limitations of not having consistency. But yes, its not really related to microservices.

joeblubaugh · on July 10, 2019

In many real-world distributed systems, you actually trade off data consistency for scalability or the ability to independently change components among engineering teams.

I would argue that if you need strong consistency in very large datasets, distributed designs should more often be "sharded" designs where strong consistency is maintained in data shards and not necessarily between data shards.

js8 · on July 11, 2019

I think I don't have a problem with inconsistent shards. My point is, I think, more about the consistency of the data schema rather than the actual data. (You could also say that inconsistency is bearable as long as you can describe it.)

It's similar to a debate in OOP vs. FP about encapsulation. In OOP, data (schemas really) are hidden from other objects (services). In FP, data are "naked" and visible to everyone.

So the FP approach forces you to deal with the changes to the schema (which are often a consequence of resolving inconsistencies) on global level. OOP doesn't and it is eventually going to bite you.

rossdavidh · on July 10, 2019

I have heard it said that the ur-problem, the source of most other problems, in computer programming is that every lesson needs to get relearned, the hard way, every 5 years, because the average age is so young, and the excuses for ignoring the lessons of the past are so easy.

adrianratnapala · on July 10, 2019

Is it because people or so young, or that the pipeline to become a junior-but-still-responsible person goes through schools rather than something like an apprenticeship.

rossdavidh · on July 10, 2019

Both, I think. The field has grown in size, so that alone makes the young-to-middle-aged ratio skewed. Then, many middle-aged programmers drop out to become managers.

I totally agree that programming would be better taught by an apprenticeship model than the college model, and I say this as a person with two college degrees.

jackyinger · on July 9, 2019

I wish I’d realized this much earlier. But perhaps the journey to independently arriving at this idea was worth the delay.

Along similar lines, there’s so much noise about new frameworks and languages on HN that it may inspire a sort of cargo cult in less experienced developers (it took me a while to see through that fog).

Anyone have more suggestions for useful aphorisms?

sadness2 · on July 9, 2019

For an opposing perspective, I cite this article, entitled Process first, not data first

https://scalablenotions.wordpress.com/2015/09/30/process-fir...

TheOperator · on July 10, 2019

I can tell you process first thinking is hugely important in large business because there is usually already a process and it is incredibly costly in terms of political capital and job security to deviate from a long existing enterprise process.

lincpa · on July 10, 2019

In a large business, `process first thinking` is based on the credential(evidence). This credential is data.

Credentials (data) flow through the nodes in the process to form a dataflow.

Every node in the process is a pipe-function (pure function).

This is [The Pure Function Pipeline Data Flow](https://github.com/linpengcheng/PurefunctionPipelineDataflow).

sadness2 · on July 23, 2019

Thanks for sharing this classical perspective. When I'm kicking off a system for a complex organisation, I start by focusing on what processes/changes occur in the organisation. I find that users can intuitively reason about what they are trying to get done, which leads to clear pipe-functions, from which one can infer and discover correct inputs and outputs. If you begin with a focus on inputs/outputs/data structures, you tend to end up with a lot of disagreement and omissions. The idea is to get to a more suitable Pure Function Pipeline Data Flow sooner.

lincpa · on July 24, 2019

In large enterprises and legal societies, procedural justice pays more attention to the traces of procedures on data (auditable evidence)，So the input and output data is the most important.

Addition， I agree with the following view:

```

Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. ... Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data.

   ---- Eric Steven Raymond, The Art of Unix Programming, Basics of the Unix Philosophy

Show me your flow charts and conceal your (data) tables and I shall continue to be mystified, show me your (data) tables and I won’t usually need your flow charts; they’ll be obvious.

    ---- an early edition of The Mythical Man-Month

```

chrisweekly · on July 10, 2019

Awesome article, thanks for sharing.

noobermin · on July 9, 2019

I'm not a systems designer, so I am somewhat ignorant to make this statement, but this sounds almost like "nouns-first" is the right thing, but almost every single one of those people cited in the article are anti-oop people. Moreover, OOP has generally fallen by the way side and been blamed for unnecessary complexity in some legacy systems from the 90s/00s. The moment you conceptualize your system as the interaction of concrete data types, OOP starts to seem like a logical paradigm to choose.

May be this is just in my head because I recently discovered this talk (ignore the incendiary title): https://www.youtube.com/watch?v=QM1iUe6IofM

crimsonalucard · on July 9, 2019

OOP is not data first. Or noun first. Objects are different from data in the sense that objects are a combination of data and methods.

In OOP the line between data and verbs blur into a combination of the two called an object. The object is a primitive and if your primitive is a combination of data and function you cannot program things in a way that has data first. At least you can't do it without making the code awkward.

FP is not, despite the name, a verb first language. It separates verb and noun allowing you to choose to build a program in a way that is noun or data oriented while OOP does not.

Data is a noun that cannot do anything except exist. Functions are verbs that cannot do anything but change things. Objects are a combination that are neither and thus hard to do data oriented programming if objects in your code exist as primitive building blocks.

All this article is saying is make the data a primitive building block.

mamcx · on July 9, 2019

> Objects are different from data in the sense that objects are a combination of data and methods.

So data are models, functions views and objects controllers?

crimsonalucard · on July 9, 2019

Data are structs, arrays, tuples, variables, sets, enums, etc. Data structures.

wellpast · on July 9, 2019

(The talk you link to is strongly anti-OOP. I assume you linked it as an example demonstrating your statement that anti-OOP is on the rise these days.)

To respond to your point, though:

Traditional OOP wants to meld behavior into the noun. Which should make the data-first (noun-first?) POV more apparent -- data isn't first-class when it's coupled to behavior as traditional OOP usually has it. So functional-oriented program is more data oriented as data remains first-class (ie decoupled from behavior).

noobermin · on July 9, 2019

I'm sympathetic to your POV as it makes sense, but the author of OP even calls out FP at the end along with a list of what they seem to imply are not "data first" ways of designing systems. I'll go ahead and admit my bias here: rules are overrated, especially ones that propose "X first" at the expense of everything else because there are no one size fits all paradigms. You can find situations where orienting around types makes sense but you can always find situations were it doesn't make sense.

I also don't imply it's "50/50" as I really don't know what works or doesn't, it could be data-types first design works in 85% of the cases and so it's good to promote it or think in terms of design that way. I just don't like "X is the right way" talk in general, although it doesn't seem like the author is proposing this be a hard fast rule.

wellpast · on July 10, 2019

I agree and I think rules are pointless. Everything is a tool and you pick the right tool for the job and that requires context and thinking.

However, having built systems in OO and then in functional, I've come to realize that the OO tools always have a surface of familiarity (`cat.meow()` vs `meow(cat)`) but are highly coupled (inflexible) while the latter, functional composition is not.

And if there's any objective measure to complexity it's a measure of degrees of coupling. And OO distinctly loses to functional there.

mpweiher · on July 11, 2019

Hmm...I would say that cat.meow() has higher cohesion and lower coupling than meow(cat). Which is exactly what you want.

Maybe a different example would better illustrate the point?

0815test · on July 9, 2019

Data is defined by behavior, even for "plain old" types. The point of "objects" is that sometimes you can implement the very same behavior in ways that are essentially isomorphic from the POV of your outside code, and want the freedom of switching out the underlying implementation at any time, and perhaps of validating complex invariants about the behavior your object has been designed for-- without letting implementation details dictate what sorts of behaviors you're going expose (the way, e.g. a "record" datatype exposes the equivalent of getters and setters, or a "variant record" exposes pattern matching, etc.).

This ("objects-based" programming; or programming with "abstract types") actually works fine. The part where OOP leads to real problems that make it inimical to true modularity is all about the tacked-on features of inheritance and polymorphism; specifically, implementation inheritance. Because that means you've started relying on the very interface you were supposed to define in order to implement some other behaviors implied in it, and then for good measure you're allowing that interface to change in practically arbitrary ways as new "derived" classes are defined. It's not surprising that this fails to work well.

wellpast · on July 10, 2019

> Data is defined by behavior, even for "plain old" types.

This is the POV of the OOP-ist, but it is not necessary and it is limiting. It's actually the "debate" we are having, so asserting it isn't proving it!

Functional programming has a complete story for polymorphism so OOP does not win on that account contrary to what you're implying.

I do think we have common ground in shunning class inheritance which is useless and a complete disaster.

However even without inheritance it seems to be that "objects" are provably replaceable by functions. (E.g., promote the implicit `this` reference to a first-class reference provided as an explicit arg to functions.)

I see no reason why data can't have opaque parts so data encapsulation doesn't seem to be a unique OOP claim either.

So then if objects aren't necessary for polymorphism and data encapsulation, then what are they good for?

0815test · on July 10, 2019

> This is the POV of the OOP-ist, but it is not necessary and it is limiting.

How is it "limiting"? And if you want proof, look at the untyped lambda calculus - there you find data types defined entirely in terms of functions - pure behavior! (For example, the Church natural numbers are defined by the behavior of iterating some arbitrary function exactly n times; the Church booleans by taking two arguments and returning either the first or the second argument (which in turns makes it possible to define if-then-else, a sort of pattern matching); and so on and so forth.) It just so happens that this behavior-focused encoding is enough to express arbitrary programs - which is the opposite of limiting!

wellpast · on July 10, 2019

> there you find data types defined entirely in terms of functions - pure behavior!

In every programming environment that I am aware of, such data-less functions you describe would be happily deleted without any worry to customers & stakeholders.

For example:

   add : Int -> Int -> Int

This may look nice on paper but on a real computer there are bounds to this purity. And anyway my program only becomes useful when actual integers are instantiated and appearing on stacks and the heap.

The "data-first" ideas we're discussing here ask one to stop obsessing over the functions and model the data soundly. You'll find any PL will do when operating over sound data expressions. This approach ime brings clarity and power to problem solving.

Theory divorced from practice is limiting. This is not a philistinic take, btw -- theory is supremely powerful when applied successfully for outcomes. But the "pure behavior!" you're talking about here seems too excitedly far away from practitioner-space.

meheleventyone · on July 10, 2019

The fact you can make function like constructs in languages that don’t have them as a first class concept doesn’t mean functions are good for nothing. Likewise objects. If you can see benefits to making object like things in a different paradigm those still hold for paradigms that make objects a first class concept. If you’re asking why use a specific language with that concept over one you can implement similar concepts in then usually the argument is going to be about the ergonomics and expressivity of doing so. Similar arguments were had as functions/procedures crawled out of the primordial ASM soup.

chrisweekly · on July 10, 2019

Yeah. It's _classical_ inheritance, in particular, that is typically associated with overly-broad criticisms of "OOP".

combatentropy · on July 10, 2019

What makes it clear to me is a fuller quote from Eric Raymond:

"Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. To see this, compare the expressiveness and explanatory power of a diagram of (say) a fifty-node pointer tree with a flowchart of a fifty-line program. . . .

"Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data."

--- http://www.catb.org/~esr/writings/taoup/html/ch01s06.html

To paraphrase with a modern example, imagine a thousand lines of JSON, with deep nesting and all kinds of fields. It would still be easy to understand. Even a nonprogrammer could read it and find the part he's looking for (say, someone's address).

Compare that to a thousand lines of procedural C code, or even a thousand lines of Python. That would be much harder, and your nonprogrammer friend would throw up his hands and walk away.

The thing you hopefully learn after years of programming is that you can trade the length of one for the other. That is to say, you could simplify the JSON into half its length, if you make the code that processes it more complicated. Or you could simplify your procedural code by storing more complexity in your JSON data structure. The advice is to do the second thing, "to shift complexity from code to data."

Also, from another chapter:

"Data-driven programming is sometimes confused with object orientation, another style in which data organization is supposed to be central. There are at least two differences. One is that in data-driven programming, the data is not merely the state of some object, but actually defines the control flow of the program. Where the primary concern in OO is encapsulation, the primary concern in data-driven programming is writing as little fixed code as possible. Unix has a stronger tradition of data-driven programming than of OO."

--- http://www.catb.org/~esr/writings/taoup/html/ch09s01.html

noobermin · on July 10, 2019

Thanks. Usually thanks comments are in bad taste on HN but this reply, especially the last quote is very enlightening. There is overlap but "data" here is not merely state of an "object" (which itself is an entity of abstraction).

dexwiz · on July 9, 2019

Data Structures are more like Types than Classes. Any useful program is going to have Data Structures and Algorithms that operate on it. OP v FP are just different opinions on how that should be organized, either combined into a single object or separated into types and functions.

vageli · on July 9, 2019

> Data Structures are more like Types than Classes. Any useful program is going to have Data Structures and Algorithms that operate on it. OP v FP are just different opinions on how that should be organized, either combined into a single object or separated into types and functions.

What is the difference between a type and a class though? Classes have modes of interaction through their interfaces, types can only be interacted with using specific operators. To me they are two sides of the same exact coin.

marcosdumay · on July 9, 2019

And there's the problem, data types shouldn't interact. Verbs are a much better place to put your interactions.

I think the simplification of OOP as "noums first" loses too much. That "first" there isn't chronological, but mere visual weight.

There are patterns of OOP design that put data first, but there are also way too many that put behaviors first, but that behavior is owned by some data. At the other way around, people doing "verbs first" programing are even more likely to start with their data design than the OOP ones, because they lack of some boilerplate structure and need to get something to fundament their design.

jinfiesto · on July 9, 2019

You can be nouns first and still be anti-OOP. I think the objection anti-OOP people have with OOP is the business of folding your verbs up with your nouns, not being "noun oriented."

i_v · on July 9, 2019

That's an interesting take on it. I'm curious what gave you the impression that the example projects were written by anti-OOP developers. I feel like it could go either way. Even as I was reading the article, I was actually thinking to myself how I first observed discipline in data-first thinking from my functional-programming friends.

lincpa · on July 10, 2019

The Pure Function Pipeline Data Flow, based on the philosophy of Taoism and the Great Unification Theory, In the computer field, for the first time, it was realized that the unification of hardware engineering and software engineering on the logical model. It has been extended from `Lisp language-level code and data unification` to `system engineering-level software and hardware unification`. Whether it is the appearance of the code or the runtime mechanism, it is highly consistent with the integrated circuit system. It has also been widely unified with other disciplines (such as management, large industrial assembly lines, water conservancy projects, power engineering, etc.). It's also very simple and clear, and the support for concurrency, parallelism, and distribution is simple and natural.

There are only five basic components:

1. Pipeline (pure function)

2. Branch

3. Reflow (feedback, whirlpool, recursion)

4. Shunt (concurrent, parallel)

5. Confluence.

The whole system consists of five basic components. It perfectly achieves unity and simplicity.It must be the ultimate programming methodology.

This method has been applied to 100,000 lines of code-level pure clojure project, which can prove the practicability of this method.

[The Pure Function Pipeline Data Flow](https://github.com/linpengcheng/PurefunctionPipelineDataflow)

wodenokoto · on July 10, 2019

lincpa · on July 10, 2019

In The Pure Function Pipeline Data Flow,

- programming is the process of designing a data model that is simple and fluent in manipulation.

- Data and logic are strictly separated, Element level separation of data and logic, data stream processing.

- Its Warehouse / Workshop Model is ideal for data programming.

heavenlyblue · on July 10, 2019

If I make an another comment, will you add more buzzwords in your next reply?

lincpa · on July 10, 2019

I don't remember having commented on you. There are many people who like my article. Their skills are very good. I hope that you can have a technical comment.

codeisawesome · on July 10, 2019

So I love the quotes at the top of the article, and evidence bears out on those. HOWEVER, is it really that big a problem that the Search Index goes out of sync with the Database? I suppose it depends on the use-case, but without that spec info we can't make a judgement. Additionally, deploying a beefier task queue and increasing the size of the search cluster might have also helped... I feel it's a bad example.

rossdavidh · on July 10, 2019

It depends upon the process which was doing the search. For example, does some automated process search for possible duplicates of a given field, nearly immediately, and need to know whether the returned set includes the new record (or not), so it can know whether "1" means "1 duplicate" or "no duplicates". That's just an example I made up on the spot, but there are more possibilities. If storing a record sets off a chain of events, and one of those might do a search, then the sync between index and database might be critical.

Or, of course, depending on the system, it might not.

combatentropy · on July 10, 2019

> is it really that big a problem that the Search Index goes out of sync with the Database?

No. My takeaway is that they should have used an SQL database as the main storage from the beginning, since it was "basically responses to standard forms." Then the ability to search would have been included. Instead they oversimplified their primary storage as something that only handled "get" and "put" --- I imagine something like Berkeley DB.

robohamburger · on July 9, 2019

It is always worth taking a look at what you need to accomplish and the data you have and the tools for manipulating it then build (or not build) a system.

If you just hit everything with the REST, OOP or db hammer you end up with things that more complicated than they need to be.