Starting Over: A FOSS proposal for a new type of OS for a new type of computer (liam-on-linux.livejournal.com)
157 points by lproven on Feb 8, 2021 | 119 comments


It's not that new an idea. Keeping everything in persistent memory has been tried quite a few times. LISP went down that road, with Interlisp and Symbolics. Everything is in one persistent LISP image, and you save entire images if desired. Symbolics had some extra hardware support to tag objects in memory and get some level of pointer safety.

OS/400, and to a lesser extent the B5500 and successor machines, went down that road too. That model was "everything is a relational database." OS/400 had a 20 year run; released in 2008, end of life in 2018.

Other ideas that never really caught on include hardware key/value stores, or "Smart RAM". That may come back, since accessing non-volatile RAM through disk drivers is inefficient.

The talk seems mostly focused on GUIs, which is a separate issue.

If you want to think about a new generation of operating systems, think about what to do in the GPU space. That's the future of hardware. Single CPUs have hit the wall on speed, but there's plenty of room for GPUs with more functional units.

GPU programs are almost stateless. Very functional. But at some level you need state. How should that interact with a near-stateless GPU? Big unsolved problem.


I was stuck in the planning phase of a pet project until a week ago, for the same reason that Lisp and Smalltalk have always made me nervous.

Bugs can put your persistent state into uncharted territory, and there may be no clear path back. There's a reason we still have 'turn it off and back on again' in our bag of tools. Often it's the only thing that reliably works. This makes systems designed never to be shut down and turned back on again deeply unimpressive to the more practically minded members of the audience.

One of the places persistent state crashes and burns is when the system of record and the source of truth combine. Once the source of truth breaks we can't know what's true anymore.

My breakthrough was figuring out that since the thing I'm getting data from is already a pretty good system of record, I can just keep letting it do that job indefinitely. My source of authority needs to transform that data, not own it. As long as I can detect out-of-band changes to the system of record (which I can), I can always rebuild my models from scratch. That gives me some excellent test fixtures. It also gives me the option to do manual 'surgery' on the system of record rather than sinking my roadmap into implementing a full feature for something I might not need to do again for another year, or that no customer will ever see.

I am just on the same continuum as everybody else. My code needs to make a lot of decisions. I can't afford to make them all from first principles every time. I need to store intermediate values. I also need to identify intermediate values and question them. If I don't get help with this from my architecture, my coworkers and I will blur the lines every time a problem seems a little beyond our abilities, and eventually nobody will know what's true anymore except the delusional ones. How do I know this? Because I have never seen any other outcome. The only differences are in how much surprise the team exhibits.


> There's a reason we still have 'turn it off and back on again' in our bag of tools. Often it's the only thing that reliably works.

An alternative perspective—at the very large scale, the opposite may be true. Turning something off and then on again may be one of the worst ways to fix problems with certain large systems.

At a large scale, you would document and manage the state of the system, make changes which put that system into a new state, and build tools that monitor and enforce system state (declaratively where possible, but it’s not always possible to be declarative).

Sometimes there will be duplicated effort between making sure a system runs smoothly and making changes while it’s running, and making it so the system can turn off and on again. Think about the duplicated effort between setting up a new database and making schema changes to a live production database.

YES, you want to have a source of truth. But the “turn it off and on again” source of truth is usually a sequence of imperative commands rather than a description of the desired system state, and comes with its own reliability problems. A description of system state as source of truth has a different set of reliability problems but I think it’s absolutely necessary to explore the solution space between these two extremes—shooting for a completely declarative system is futile, but an imperative system can be very hard to reason about.


> Turning something off and then on again may be one of the worst ways to fix problems with certain large systems.

From an overall system-wide perspective, sure. But from an individual component perspective, it seems to work pretty darn reliably. There's a reason why Erlang/OTP (as one example) has such a reputation for fault-tolerance and robustness.

And on that note, Erlang's model of supervised preemptive processes seems like it'd be a good fit for unified-memory computing, especially if taken to an "everything is a process" extreme.
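
For anyone who hasn't seen OTP-style supervision in action, here's a toy Python sketch of the "let it crash and restart" idea. The names (`worker`, the restart limit) are just placeholders; real OTP supervisors add restart strategies, backoff, and supervision trees.

    import multiprocessing as mp
    import time

    def worker(name):
        # Stand-in for a long-lived, stateful component; here it just
        # crashes after a moment so the supervisor has something to restart.
        print(f"{name} starting")
        time.sleep(1)
        raise RuntimeError(f"{name} crashed")

    def supervise(name, restarts=3):
        # "Let it crash": restart the child a bounded number of times
        # instead of trying to repair its state in place.
        for attempt in range(restarts):
            p = mp.Process(target=worker, args=(name,))
            p.start()
            p.join()
            if p.exitcode == 0:
                return
            print(f"{name} exited with {p.exitcode}, restarting ({attempt + 1}/{restarts})")

    if __name__ == "__main__":
        supervise("cache-manager")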


Not only that, but in order to build a fault tolerant system you __WANT__ your components to fail fast and hard. Anything else runs the risk of your system doing the wrong thing while you have no idea it's happening.

Failing is noisy; it's a very clear signal to the system to pay attention.


Yeah I think the whole “need to reboot to get to a clean state” is maybe a bug not a feature.

It is certainly possible to build systems that can run for years or decades.

I’m guessing that persistent memory will require us to think about changes to state in memory as being more like ACID transactions in databases. We should think about rollback instead of reboot.
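
A minimal sketch of what "rollback instead of reboot" could look like, using SQLite's transactions as a stand-in for transactional memory (the table and the simulated bug are made up for illustration):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
    con.execute("INSERT INTO settings VALUES ('mode', 'stable')")
    con.commit()

    try:
        with con:  # commits on success, rolls back automatically on error
            con.execute("UPDATE settings SET value = 'experimental' WHERE key = 'mode'")
            raise RuntimeError("simulated bug while mutating state")
    except RuntimeError:
        pass

    print(con.execute("SELECT value FROM settings WHERE key = 'mode'").fetchone())
    # ('stable',) -- the bad change was rolled back, no reboot needed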


The problem isn't so much about the correctness of a record, or being able to recover from changes in state, it's more about ensuring the consistency of the states of multiple different parts of the overall system. How do you recover consistency?

Even our brains have not entirely solved this problem. Sometimes when you hear a stimulus such as a phone ringing different parts of your brain will receive and process the stimulus at different times, so it can seem as though you become conscious that the phone has rung before you become consciously aware of hearing the sound of it ringing.

In a computer system you might have some parts working from values derived from a record in the source of truth. Other parts may be working from values derived from a later version of that record, after it has changed, so the two end up disagreeing.


It is possible to build such reliable systems, but it is not economical for the vast majority of software that gets written — at least with current tools.

If getting something production-ready takes X time, making sure that thing doesn’t ever need to be restarted takes many times X, not to mention the huge investment in testing to ferret out those rare edge cases that cause hangs or inconsistent state.

Someday when software engineering comes into its own as an actual profession, maybe our tools will also achieve a higher level of consciousness, as it were. But nothing mainstream today comes close to enabling this.


> is maybe a bug not a feature.

Once you start thinking of safety as a feature you've ruined your ability to think responsibly about anything else. A reset is a safety mechanism. You shouldn't need it, but only a sociopath would try to get rid of it.

Almost all of the software we write is now affecting people's lives. That your bug may eat someone's term paper has been a problem for decades, and nobody ever internalizes how that affects that person's life. The avalanche of cause and effect may never settle down. Now we can get people fired. Ostracized. We are trying to take the power but not the responsibility, and it's bullshit. No, hypocritical bullshit, because we spend a lot of time here complaining about others doing the same thing.


> You shouldn't need it, but only a sociopath would try to get rid of it.

> Almost all of the software we write is now affecting people's lives.

Don’t twist a technical disagreement into some kind of moral outrage. We both have the same goal of providing reliable software to people.

The problem is that for the past decades, resetting the machine has deleted countless term papers. Nowadays, people generally put their term papers in cloud services that are always on. This is a huge step forward. We can bring this kind of reliability to offline usage. I think this requires taking lessons learned from cloud/server farm/mainframe world (always-on) and figuring out ways to apply them to personal computers.


The main lesson we can take is to design our software with the assumption that the machine can be turned off (or crash) at any time. That is obvious when you’re in a browser or running in a VM, but it was very far from engineers’ minds 20-30 years ago.

Most note taking software I’ve used doesn’t even have a save command. That’s a huge step forward.


[flagged]


If you’re just commenting in order to stir up trouble, please leave.


I think that there are non-sociopathic reasons to explore interesting areas of designspace. It might turn out to be safer in the long run, and until it is, nobody's forced to use experimental software.


Sometimes the solution is different because the scale is different.

Sometimes, the solution is different because we think we understand a problem that we really don't. I'm not ready to allow that the current wisdom on system design is not what Tony Hoare was thinking of when he said, "a system so complex that there are no obvious problems".

Arguing such things with people only seems to stick when we are collectively in the Trough of Disillusionment, which may be coming up pretty soon. So ask me again in 18 months.


> the “turn it off and on again” source of truth is usually a sequence of imperative commands rather than a description of the desired system state

Not quite sure if it's related, but this is why I've been hesitant to go down the Docker rabbit hole that so many devs are pursuing these days.

On the one hand, I absolutely see the value of 'disposable' containers: discarding flaky instances and rebuilding a pristine one from a version-controlled config (i.e. "turn it off and on again").

However, the mechanisms to do that (re)building seem to consist of an unreproducible opaque binary blob (a "base image"), which gets augmented by a sequence of highly imperative commands; usually doing completely unpredictable things like 'apt-get update && apt-get install -y foo'.

I want the former, but the latter seems like a massive step backwards. I've seen some work on building Docker "images" using Nix, so that's the path I ultimately want to try. Colleagues may take some convincing though!


I fear you are suffering from misconceptions about how containerization works. It's as if you were claiming that a compiler is invoked every time a process starts.

Most Docker images consist of multiple superimposed filesystems. When a container is started, a new empty layer is stacked on top which captures all filesystem writes during the container's execution.

This allows the following features:

- After the container stops running, these layers can be combined into a new image. This is what happens when you build images.

- The original image is never changed and can be used at any time to start a new container.

- When a new version of the application is released, it is not necessary to rebuild the whole image from scratch. You just start from the layers below.
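
A toy model of that layering in Python, using a ChainMap of dicts to stand in for the stacked filesystems (paths and contents are made up; real images use overlay/union filesystems, not dicts):

    from collections import ChainMap

    # Each "layer" is a read-only dict of path -> contents.
    base_layer = {"/bin/sh": "busybox", "/etc/os-release": "alpine 3.13"}
    app_layer = {"/app/server.py": "v1"}

    # Starting a container stacks a fresh writable layer on top of the
    # read-only image layers; writes land only in that top layer.
    container_writes = {}
    container_fs = ChainMap(container_writes, app_layer, base_layer)

    container_fs["/app/server.py"] = "v1-patched"   # copy-on-write style update
    container_fs["/tmp/cache"] = "scratch data"

    print(app_layer["/app/server.py"])     # 'v1' -- the image layer is untouched
    print(container_fs["/app/server.py"])  # 'v1-patched' -- seen through the stack
    print(container_writes)                # only the container's own writes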


Building Docker images is now supported in nixpkgs; I think the function is named "dockerTools.buildImage".

If you're deploying to a system under systemd, you can run anything that looks like a sysroot using nspawn. I find this much more enjoyable than going through the huge sidecar that is the Docker runtime.


As systems become very large, ‘turn it off and back on again’ is often counter-productive to the point of being the most destructive thing you can do. Repairing state is slow and expensive, and meanwhile the world is happening. This is the reason why all ultra-scale systems use some kind of “k-safety” durability model. The system would effectively be unrecoverable if you actually rebooted it because it would struggle to get back online. This is increasingly the rule rather than the exception for data infrastructure.

Even in the handful of recent systems I worked on where there was an explicit external system of record and rebooting was an option operationally and technically achievable, we still had to do semi-exotic custom engineering to make sure it was practically achievable.

When I started in the industry, ‘turn it off and back on again’ was the default strategy. These days it is rightly reserved for the dwindling number of cases, outside of small systems, where it still works. The computational cost of restarting is prohibitive.


Wifi routers have this notorious problem with NVRAM. Sometimes some state in the NVRAM gets corrupted and the router stops working. Turning it off and on again won't help. A factory reset to wipe everything out is the only way to get out of the corrupted state.


Around the time of Randall Monroe's "shibboleet" comic, I had an ISP connection that would flake out. If nobody kicked the cord on my end, the same series of events played out every time:

Check the cables.

Turn it off for 30 seconds.

Check the connection to the router.

Check the lights.

Try random other shit.

ISP reboots their modem instead of mine.

Networking restored.

Now, in theory, when a commercial modem loses carrier it should reset the connection completely. This they kept telling me, every four or five months for two years. And every time I got to sit patiently while their support person learned that this is not always the case, over the course of the next 20 minutes. If I tried to tell them about last time, they would verbally pat me on the head and keep wasting my time. Whatever cooperative game or the-killer-is-about-to-reveal-himself moment I had been in the middle of was by then a distant memory.


Right, the problem is really about maintaining consistency: how do you checkpoint the entire system and all your different components with their derived values?

I used to work on a system with a distributed hierarchical object store. We had copies of the data in various geographic regions, with copies of the same or different applications running in each region connected to a local copy of the store. When you write to your local copy, the storage system would propagate changes out to the other copies.

Maintaining consistency was a constant problem. One way to handle it was to give each region running a copy of a given application its own regionally named folders in the object store.

The problem is similar to when you have application or database clusters with nodes that lose contact with each other, so called split brain syndrome, and strategies for determining which node should become authoritative. This is where the term STONITH comes from - Shoot The Other Node In The Head.

You might not want to restart the whole system, but you do need to be able to tell subsystems that they need to refresh their state. A reboot is simply a manual implementation of that, but in a continuously running system you need to build algorithms and strategies for deciding when to do such refreshes in the system itself.


> This makes systems designed never to be shut down and turned back on again

Not necessarily; you just need a way to bootstrap the memory. With Plan 9 it's pretty easy to run a fully functional terminal purely from memory and network access. Live CDs work pretty well, too.


> OS/400 had a 20 year run; released in 2008, end of life in 2018

Err, setting aside the math inconsistency there, AS/400 was released in 1988, and the 2 most recent versions are still supported.

[0] https://www.ibm.com/support/pages/release-life-cycle


They are not only still supporting it, it is a current product under a new name (IBM i). We just bought an S924 last year running IBM i.


OS/400 and AS/400 are some of the most underrated and least-known machines.

AS/400s (I only used them back in those days; I don't know enough about the newer series) would run for years with no attention given to them. I saw people stuff one in a closet and forget about it.

Some did have the smarts to rotate backup tapes; then at least they knew where the machine was.

If the machine needed help it would "call" IBM and report it, and there would be a technician there to help it before the customer knew there was a problem.

As with many things, I wish IBM would make it GPL, but the chances of that happening are very low. Also, the programming languages are not familiar to many.


The current machine we have (an S924) calls IBM by itself. It had a bad RAID card, so an IBM technician showed up and replaced it. It is a bit strange, as the system admin, to be the second one to find out.

We do a daily tape backup. LTO-7 is big enough to hold all the financial data. There are also a couple of companies that will do automatic offsite backup for a very reasonable price. They will even stage the data to a hosted environment until you can get a replacement machine (and, I suppose, a replacement building).

I do wish IBM had a simulator or small machine that programmers could use. The cost of entry to write software for the things is way too high.

I would like a BSD / MIT license as I am a BSD fan. Honestly, I wish I had time to work on a non-UNIX non-Windows Clone Operating System. Ideas like the AS/400 / IBM i architecture are amazing.


Sorry, I meant 1988. My bad. Wikipedia has old info about versions and end of life.


Real applications don't hit CPU walls; they're bottlenecked by I/O more often, and the biggest CPU wall is memory access. There is a lot of room for innovation there if you shrug off POSIX compatibility.

Also, state is not a problem for functional semantics. Remember that "functional" is a semantic distinction; it doesn't say much about optimized implementations of those semantics. For a stateful system, the only difference with a functional approach is that the programmer cannot express a program where yielding the next state invalidates the current state. Most of the time a sound functional program can be compiled into in-place (mutable) operations on the state; the difference is that it's free of that class of logical errors.


You are making the author's point for them :-). Now add in that some of the stuff that was "custom hardware" supporting the concept of "always resident" OS state is now off the shelf (specifically the Optane/3D XPoint stuff). Of course, Intel killed off consumer Optane SSDs, so one wonders what is going on there.

The second point, about how to think about GPUs in the context of "modern" computing is, for me, the more interesting one. As mobile processors have for years been essentially a GPU with a CPU accessory, laptops and desktop computers seem to be headed that way as well. To be honest, I'm surprised there isn't a GPU "motherboard" that you can plug one or two CPU daughter cards into.


I think the answer as to why Optane consumer SSDs were killed off is fairly clear.

They were 5 or 6 times the price per gigabyte of NAND-based SSDs and didn't offer more performance in most consumer applications. The SOHO community didn't like them all that much either, because the consumer Optane SSDs lacked most of the low-queue-depth performance that the enterprise disks are known for. Most consumers are also replacing their SSDs long before they actually wear out, so the extreme durability of Optane is kind of moot there.

It's a fantastic enterprise product, but I don't think most consumers are really getting any extra value out of them over NAND at this point. The only guy I knew who bought one wanted to put their Gentoo build directory (/var/tmp/portage) on it.


I wasn't aware that intel killed Optane-Only SSDs for consumers so I looked up and here's a recent article about it: https://www.tomshardware.com/news/intel-kills-off-all-optane...


[Author of the piece here]

Thanks for the comment.

You do make me wonder if you listened to the whole talk, or read the script (https://docs.google.com/document/d/1wM1-c7euvQaRaCL4hKCaE8VR...)

I specifically addressed and discussed Lisp, Symbolics, & OS/400 (which is still alive and well).

IBM i (as it's now called) seems to more or less treat everything as a single giant disk rather than as a single memory space, though. However, it's been hard to find solid info for non-IBM-specialists.

It is not at all focussed on GUIs, no. However, I needed something pretty to put on the slides.

The focus is explicitly, as discussed, on end-user general-purpose computers, not servers. Servers are very well-served (pun intended) already.

Yes, I _know_ that this is not all-new. Indeed, that was one of my core points. The reasons are:

[1] I think there is more significance to something obscure or niche that has survived for decades and is still actively maintained and used than almost any flashy new offering which could fizzle out again just as fast.

[2] Using existing tech means that there are people out there with experience in it. It is not all-new for everyone. That's helpful.

[3] I personally am not very interested in GPU tech, and I think the new Apple M1 ARM chips bear out why: the standalone super-powerful GPU with its own RAM is probably a temporary artifact of PC designs and the gaming market.


I believe there are only two problems in computing: how data is divided up, and how data is bound together. They function nearly the same because you can use one to get the other, but the choice of which one to apply is critical and influences the shape of the media built on top. Where division is easy, things get shoved into countless categories. Where binding is easy, everything gets glued together.

With respect to the issues of operating systems, "data" maps to "hardware resources". With execution logic, "data" maps to "software architecture". And so on.


[Author of the piece here]

Those are not things of central concern to me currently, but another project that has some similar central axioms is looking at just that.

This talk is long but goes into quite a lot of depth. You might enjoy it.

https://www.youtube.com/watch?v=bSNda9EzNOI


>If you want to think about a new generation of operating systems, think about what to do in the GPU space. That's the future of hardware. Single CPUs have hit the wall on speed, but there's plenty of room for GPUs with more functional units.

Will it be more like the web with HTTP?


As long as the problem you are trying to solve is shaped like rendering pixels on a screen, GPUs are great! The future is multicore processors not bottlenecked by the bus to a GPU. And coroutines.


OS/400 is very much alive; it's called System i now.


Wait, OS/400 is going to be EOL-ed? How, why? :-/


It's not; a bad source apparently misled the commenter.


I feel like I have so many problems with this. Like, look, I have some huge projects in Adobe Illustrator and they are quite deliberately spread out among a ton of files because anything over a certain size is just asking for working on it to become super-slow, and for some obscure edge case to completely trash the file.

I'm skimming the PDF of his slides and I'm just seeing a long list of quirky environments that have continued to be dead-ends. Maybe he talks about how this would actually work with huge power tools if I watch the talk, maybe he grapples with issues like "programs crash" and "iterative backups are good". Maybe he even deals with things like "files are a metaphor based in the physical world and perhaps the reason we keep on coming back to them is because they provide a good way to say that this particular piece of information is over here". I dunno, I'm not seeing suggestions of anything that makes me want to invest the time in listening to the talk.

I wish him the best of luck, maybe he will be the one to finally find a way out of the world of files to something so clearly better that it's worth dealing with the raw edges of it only being a few years old vs a modern GUI atop a filesystem, but I sure ain't gonna hold my breath.


I can say this. I highly doubt we’re done innovating with how software is made. I feel like systems development has gone extremely out of fashion in the past couple decades. The OS’s and tools we have are good enough to get a lot of stuff done.

However, there's still quite a few problems, and we don't even know the productivity level we could get to, because we haven't seen the future. There's nothing wrong with not being a dreamer or not trying to progress the industry. But there's also nothing wrong with doing those things, either.


[Author of the piece here]

The slides are just decoration and can be safely ignored.

You would be much better off reading the script: https://docs.google.com/document/d/1wM1-c7euvQaRaCL4hKCaE8VR...


I applaud the drive and vision to experiment with a fundamentally different operating system paradigm. But even if your persistent storage is RAM, I don't see why you wouldn't still want a filesystem. Filesystems arose as a method of organizing data. However fast your access time, you still want that. Maybe you could have some kind of database-backed tagging system instead of a directory hierarchy, but the benefits and drawbacks of such a system should apply to an SSD or even an HDD as much as to something in RAM. This goes for user data (e.g. photos, documents) as well as the organization of the operating system.

Anyway, I'm curious to see what this project comes up with. I suspect it will be something that looks and functions a lot like a filesystem, but I'd be happy to be proven wrong.


I see what you are saying but I think it misses the point. We need a filesystem right now because data disappears from RAM when electricity goes away.

With persistent memory you don't need that extra serialization/deserialization step to/from the filesystem. Look, think of it like this:

1. In memory we store our program code and data as blobs in various parts: on the stack, in a heap, some only for the current program, other stuff like shared libraries linked at runtime into various programs. It's a rat's nest of pointers.

2. When we "save" or "restore" we take that rat's nest, pull out some of the important parts (e.g. we ignore the shared libraries in memory and the app code), and serialize that state out to disk.

3. Then the file system takes that logically contiguous "file" and breaks it up into pieces and stores those across the disk in various blocks and writes metadata which describe how to put the pieces back together again.

With persistent memory we can ignore steps 2 and 3 most of the time, but they still might be useful for sending data to other computers, especially step 2.
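
Here's a tiny Python sketch of step 2 above, using pickle as the serializer just for illustration; the point is the flatten/rebuild round trip that persistent main memory would make mostly unnecessary:

    import pickle

    # The in-memory "rat's nest": objects pointing at each other, even in a cycle.
    document = {"title": "draft", "sections": []}
    section = {"heading": "Intro", "parent": document}
    document["sections"].append(section)

    blob = pickle.dumps(document)     # serialize: pointers become a flat byte string
    restored = pickle.loads(blob)     # deserialize: rebuild the object graph

    assert restored["sections"][0]["parent"] is restored  # the cycle survives the round trip
    # With persistent main memory, the original graph could simply stay where it is,
    # and this flatten/rebuild step would only be needed to send data elsewhere.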

Disclaimer: I read the slides but didn't watch the talk


Sending data to other programs in the same computer too. Example:

1. Download pictures from camera. You need a program to transfer them somewhere you can find them.

2. Choose 12 of them to make a calendar for next year (OK, let's assume we're already in November). You need a program to view them.

3. Edit them with your favorite. Save as copies, not to overwrite the originals.

4. Upload them to a web service using a browser.

A file system is useful to support all of these steps. Android's share-to-other-apps kind of hides the file system, but it doesn't succeed 100%. Sometimes no app is registered to handle some file type and we end up picking the file in a file-explorer interface. Sometimes we even have to know where that app stores its files. My most common case for this is the GPX files from OsmAnd, when I transfer them from the small phone I use as a tracker to my main phone.


I'm very interested in getting rid of "Files" and making programs use/manipulate databases more, but the issue of sharing things has always been a problem. Like you say in Android, but also iOS, it's annoying.

However, you've given me an idea: what if, instead of registering to handle a file type and giving the user a way to select a file, we made sharing better by having applications provide a kind of "Files" view into their data, which may simply be items in their databases? On iOS, the app ShellFish provides a seamless view of the files on my servers inside the Files app and the OS file picker. In macOS, the Finder's file picker has a section for Photos which lets you select a photo without needing to export it to a specific file. We could select from applications on our system instead of going to each application and sharing each piece of data we need.


When getting rid of files remember that there will be legacy OSes and media forever.

In my example, that camera is storing pictures on removable media, maybe FAT32 even in 2050. And there are good reasons for that: I saw more than one person fill more than one 64 GB SD card with pictures on a 3-week journey.

And at the end of the example, the machine that prints the calendar might want individual files and reset its state at the end of each print. It can even be a way to resist attacks that upload malicious pictures.


I think the main benefits would be rethinking the core of persistence to be more about atomic operations on datastructures as opposed to POSIX filesystem IO.

I personally believe that files are still a useful metaphor, and this hypothetical OS that we're discussing would likely still provide them, at least for interop.

An example being a database. Provided we didn't have to batch writes, be aware of dirty pages, write things in a certain order to maintain coherence in the event of a crash, etc., we could build something really simple. Ultimately it would depend a lot on details like memory bandwidth and latency between DRAM and NVRAM.


Even AWS S3 is a key-value storage system. Meaning your key can be anything.

But over time, people tend to name the key using a path structure, separated by forward slashes, to give a logical separation of data into subdirectories.
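
For example, with boto3 you can treat those slash-separated keys as if they were directories by listing with a prefix and a delimiter. The bucket name and keys below are placeholders, and you'd need AWS credentials configured:

    import boto3

    s3 = boto3.client("s3")  # assumes credentials are already configured

    # Keys are flat strings, but a path-like naming convention...
    s3.put_object(Bucket="example-bucket", Key="photos/2021/cat.jpg", Body=b"...")
    s3.put_object(Bucket="example-bucket", Key="photos/2020/dog.jpg", Body=b"...")

    # ...lets you list "subdirectories" by prefix + delimiter.
    resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="photos/", Delimiter="/")
    print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])
    # ['photos/2020/', 'photos/2021/']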


Advances in performance are largely about impedance matching, from sequential-access tapes to registers in a CPU. You want to use an efficient interface and abstraction for the level you're at. The direction of progress is toward faster, more randomly accessible media with less seek cost: HDD -> SSD -> XPoint, etc. This totally makes sense: using RAM through a disk interface (e.g. a RAM disk) is useful if you want a faster disk; to get something faster, change the interface to take advantage of the medium.


This part:

> A possible next evolutionary step for computers is persistent memory: large capacity non-volatile main memory. With a few terabytes of nonvolatile RAM, who needs an SSD any more? I will sketch out a proposal for how to build a versatile, general-purpose OS for a computer that doesn't need or use filesystems or files, and how such a thing could be built from existing FOSS code and techniques, using lessons from systems that existed decades ago and which inspired the computers we use today.

reminds me a bit of something I've long wanted to experiment with (or see someone else experiment with). Here's how I described it in a comment here on HN a while back [1]:

> I've toyed with the idea of replacing files with processes. If you have some data that you want to keep, you have a process that holds it in its process memory, and can give it to other processes via an IPC mechanism (if the other process is local) or over the network (if remote, although you could of course also use the network locally).

> I never got around to trying it out. I think I may have tried to start some discussion on usenet along these lines maybe 10-15 years ago, but no one seemed interested.

> A "directory" would simply by a process that provides some kind of lookup service to let other processes find the data storage processes that contain the data they are looking for.

> You'd still have disks on your computer, but they would be mostly used as swap space.

> The system would include some standard simple data holding and directory processes that implement a Unix-like namespace and permission system, but it would be easy to override this for data that needs special treatment. Just write a new data holding program that implements the special treatment you want and knows how to register with the standard directory processes.

[1] https://news.ycombinator.com/item?id=8311532
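
If it helps make the idea concrete, here's a rough Python sketch of one data-holding process plus a trivial "directory" (here just a dict; in the scheme above it would itself be a lookup process). All the names are invented for illustration:

    import multiprocessing as mp

    def data_holder(conn, initial):
        # A long-lived process standing in for a "file": it owns one piece of
        # data in its memory and serves get/set requests over a pipe.
        value = initial
        while True:
            op, payload = conn.recv()
            if op == "get":
                conn.send(value)
            elif op == "set":
                value = payload
                conn.send("ok")
            elif op == "stop":
                conn.send("bye")
                break

    if __name__ == "__main__":
        directory = {}  # name -> pipe endpoint; a real system would make this a process too
        parent_end, child_end = mp.Pipe()
        holder = mp.Process(target=data_holder, args=(child_end, "term paper, draft 1"))
        holder.start()
        directory["~/docs/term-paper"] = parent_end

        conn = directory["~/docs/term-paper"]
        conn.send(("set", "term paper, draft 2")); print(conn.recv())  # ok
        conn.send(("get", None)); print(conn.recv())                   # term paper, draft 2
        conn.send(("stop", None)); print(conn.recv()); holder.join()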


> If you have some data that you want to keep, you have a process that holds it in its process memory, and can give it to other processes via an IPC mechanism.

Basically this is how Erlang / Elixir programs store state. Look for Elixir's GenServer behavior.


I may be misunderstanding your proposal, but wouldn't this make privilege escalation vulnerabilities both more severe and more common?


You should definitely look into Erlang then.


This is basically how the Palm Pilot worked. The RAM in the system was a developer-invisible cache that sat on top of a record-oriented access layer on top of the nonvolatile flash and system ROM.

I'm simplifying slightly but it's the same idea. It worked okay but had its quirks.


[Author of the piece here]

Yes, indeed - I specifically discussed that in the talk.

The Palm platform sold over 50 million units, with no filesystem at all. This shows the tech is valid for end-user devices, IMHO.


In fact, the Palm Pilot didn't have proper non-volatile memory; instead, it had static RAM, which would preserve data for some time even while you swapped the batteries.


Didn't the Newton also have something of that kind?


The Newton did have a storage system called Soups. It was extremely fun to program and I miss it dearly.


The author is conflating hardware and logic. A hard drive is hardware but a file system is a logical tree. The author provided simplistic reasoning for abandoning a type of hardware but no reasoning for abandoning file systems.

If not a file system then what? Don’t say RAM, because that is still hardware.


Yeah, a hierarchical system for organizing data seems useful even if it's stored on the same physical medium as main memory. Your profile is Highly Appropriate, btw.


It is useful but limiting. The point of the logical tree is to index files, but a tree is a subset of more powerful indexing methods. Relational and graph structures come to mind.

Incidentally, databases exist to marshal indexed data so that it's not visible or managed by the file system. This seems like something a non-hierarchical file system would supplant, which would allow for interoperability with file-oriented tools.


Decades ago I was following a kid's graduation project: an OS with a full-persistence design. That was before Android and other always-on "computers".

It was called something like Unununium (yeah, same name as the element, for extra hard-to-search points). I noticed the project when he suggested unununium-time as an alternative to Linux time on a mailing list I was on, to fix some time-skew problems... But what caught my interest was his vision that in the near future (remember, before Android/iOS) computers would not care about offline data storage and an OS should be optimized for always-on and RAM only.

I can still find some of the assembly versions, but the fun stuff and interesting ideas showed up in a rewrite in Python(!), and of that I can't find anything any more.

edit: here's the best i could find https://web.archive.org/web/20060208191407/http://en.wikiped...

Seems the vision was to treat the persistent storage as the only system memory.


NVRAM then?


The visionary aspect was that it didn't need new hardware concepts. Just drop the ephemeral RAM (or treat it like an L4 cache, so to speak) and hope permanent storage gets faster, which it did.


I also don't quite get why you would ditch files/directories which are just a way of organizing data. You still have to organize them if everything is in main memory. Also, sharing data between programs has to be possible somehow. You will need some kind of reference/pointer to the data you talk about. Why are directories, files, and paths such a bad idea for that?


To take the Smalltalk angle on this, instead of files you simply have everything in the system described as live objects. These are different from files because they are not just data. A system comprised of such objects has no "applications" or "programs" in the conventional sense -- you just have certain arrangements of objects interacting with each other. This is much more flexible, dynamic, and explorable than just having files for data and stovepiped programs that read those files.


Apps would still appear in a system like this as soon as you have third-party developers, as a natural consequence of Conway’s law. And once you have that, there’s also security and principles like the rule of least power that motivate the current design which you haven’t gotten rid of.


As a matter of user experience, rather than implementation, the 'filesystem' in iOS is just an app. It's one way to handle data sharing and transfer between apps, but it isn't always the best, nor is it the main one.

This has some tradeoffs, everything does, but I'm glad someone is exploring the OS space without considering filesystems as we currently understand them to be an inevitable part of that.

I can imagine an object capabilities framework, in which a 'filesystem' is just an object which owns data, organizes it in the familiar way, and shares capabilities with other objects, being quite useful and powerful.


I'm not a kernel dev, or anything remotely close to it, but it seems to me that purely nonvolatile memory would have the unhappy side effect of not being able to just reboot the machine when things aren't running quite right. It's amazing how often this fixes problems, even in 2021.


[Author of the piece here]

I did actually consider this.

When your memory contents are persistent, there is no "boot process". Booting means loading OS code from secondary storage into RAM and then executing it.

If there's no secondary storage, this is never normally needed.

However, there is absolutely no reason why you cannot _re-initialise_ the system. Start again at first run, reset temporary data and run it from scratch.

Secondarily, you could of course dump your data and work in progress to (say) a reserved memory partition first, or over the network to a remote server. Then you re-start and choose what you want to reload.


Surprised no one mentioned https://en.wikipedia.org/wiki/Phantom_OS


I went through the slides and saw a bunch of pictures of old computers, but where is the proposal?


The Google Doc has the proposal at the bottom. Basically, take the bottom half of Oberon and slap on the top half of Squeak*, bara-bim bara-boom - you've a next-gen OS.

*Or SBCL + Dylan

The whole spiel is to make a high-level interactive OS / computing environment that straddles the line between interpreter and compiler. The whole thing is run in-memory, and can be snapshotted and restored quickly. No filesystem, just RAM.

Interesting idea, I guess :)


[Author of the piece here]

Thanks!

I should have expected that so many people wouldn't listen or read it, just look at the pictures... >_<



Looks to me like an outdated idea, at least on the concept level :) . I had similar thoughts around 2002; the closest thing to it was Ousterhout's RAMCloud. Today an approach would be to look at merging CPU and RAM.


I don't see what's so groundbreaking here. It looks like a lot of back patting.

You've basically described a classical von Neumann architecture. The only reason hard drives were added to that is because volatile memory is expensive and the only reason why we don't use flash memory instead is because flash is not performant enough.

The proposal has never been impossible. Just impractical and non performant in the real world. There's no such thing as "terabytes of non volatile memory" without either spending hundreds of thousands of dollars on RAM or settling for slow flash RAM.


[Author of the piece here]

Back patting? Whose?

This is entirely orthogonal to stuff like von Neumann architectures.

The core change is eliminating the distinction between primary and secondary or auxiliary storage: https://en.wikipedia.org/wiki/Computer_data_storage#Secondar...

> There's no such thing as "terabytes of non volatile memory" without either spending hundreds of thousands of dollars on RAM or settling for slow flash RAM.

You are technologically out of date. This is commercial, shipping technology now.

https://www.storagereview.com/news/intel-optane-dc-persisten...


I think this is really interesting, but I don't see any reason why you'd abandon the idea of a filesystem. UNIX has been consistently useful for all sorts of purposes largely because of the everything-is-a-file concept.

Are we going to keep only a single copy of everything in memristor storage or whatever? Someone will reinvent RAID, and someone will reinvent the disk controller.

i'm sure this tech will change all kinds of things but why wouldn't i want a simple hierarchical naming system to refer to all of the different blobs i have?

that said, i'm sure i'll watch this!


The filesystem, at least the Unix concept of it, is nice in theory but surprisingly ill-suited for anything besides, say, batch processing, once you start to look more closely at it.

If you look at projects like SQLite or PostgreSQL, you’ll see all the fantastic, precise logic you need in order to ensure expected levels of consistency when you are using real filesystems to store data in a database. These problems come with serious enough pitfalls and traps that it’s common to see blanket recommendations to avoid directly interacting with the filesystem altogether, and simply do everything through SQLite or another library when possible.

At the other end, if you look at GUI applications, the filesystem is a bit lower-level than it should be. At least, the Unix concept of filesystem is too low-level. If I open a file in a word processor, save it, shouldn’t I be able to move around the file and rename it in the file browser, without affecting the relationship of that file to the word processor? This is possible on a Mac, but this is done with APIs inherited from the pre-Unix days of macOS.


I think the most important thing about files is being able to take a unit of work - a picture or a document - and interact with it as a single object, doing things like copying it or sharing it with others. Any replacement system that doesn't let me take my work and simply drag it to a flash drive is flawed. This doesn't mean we can't use a database, but it is something important to remember.


> I think the most important thing about files is being able to take a unit of work - a picture or a document - and interact with it as a single object, doing things like copying it or sharing it with others.

Yes, we want this, but the filesystem is actually quite bad at this! We have just gotten used to how bad files and filesystems are, and ignore the fact that they are not good matches for our metaphors about how they work.

Just to be clear—I completely disagree with your claims about how files work. You can’t really interact much with a file as a single object. When you open a document, a copy of the file is loaded into memory. This separate, invisible copy does not show up on the filesystem and will be out of synch with the filesystem copy.

For example,

1. Open a text file in Vim, Nano, or Emacs.

2. In another terminal window, rename the file.

3. Make changes to the text file and save it.

What happens? You end up with two different files! This is not some weird hackery, this is just renaming a file, one of the most boring things you can do. Something I do all the time. It shows that files are not single objects, and you do not interact with them that way. This is one of the failings of the Unix filesystem model—files are not single objects. There is a ton of room for improvement.

Just as a point of comparison, Windows is a bit worse (usually won’t even let you rename the file) and the Mac is a bit better (when you rename a file, the open document will follow the rename).
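
You can reproduce the effect from the example above without any editor, e.g. with a few lines of Python that model an editor saving back to the path it originally opened (the filenames are arbitrary):

    import os, tempfile

    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "notes.txt")

    with open(original, "w") as f:
        f.write("first draft\n")

    # Another process (or terminal) renames the file while the "editor" has it open.
    renamed = os.path.join(workdir, "notes-renamed.txt")
    os.rename(original, renamed)

    # A typical editor "save" writes back to the path it originally opened,
    # recreating a file at the old name rather than following the rename.
    with open(original, "w") as f:
        f.write("second draft\n")

    print(sorted(os.listdir(workdir)))  # ['notes-renamed.txt', 'notes.txt'] -- two files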


I'm speaking more of files as a metaphor than as a file system implementation detail.

I'm pointing this out because the mobile first and web first world dropped the ball on this incredibly badly. Every webapp is an isolated silo with weird logins and no proper offline usage whatsoever.

To share a small anecdote, I recently asked a non-technical acquaintance how they would share a Word document from their desktop computer: they said they'd email the file. Then I asked how they would share a note from their phone's notes app: they had no clue. Yes, implementation details bleed through: I can tell that Word operates on files and the notes app probably has some sqlite database under the hood, but ultimately it's about how we interact with data. Word operates on content you have, while the notes app operates on content it has. And this is where my flash drive test came from, not because I'm attached to inodes or directories, but because of the interaction model that decouples apps and data.

If tomorrow I wake up in a world where ext4 has been replaced by Postgres as its file system, I wouldn't miss chmod or fsck. But I'd want to be sure I can still take a single object (a file!) and do actions with it - send it to somebody, delete it, copy it, all without it having some inseparable link to an app or a website.


> I'm speaking more of files as a metaphor than as a file system implementation detail.

Ah, you are definitely not responding to my comment, then. Never mind.


A file is more than simply a unit or collection of data in a local hierarchy. It's an abstraction that supports the copying, exchange and even possession of data. How is one supposed to share part of a database with another system or user without ultimately resorting to the file metaphor?


> How is one supposed to share part of a database with another system or user without ultimately resorting to the file metaphor?

It is exceptionally difficult to share a database using files. This is probably one of the worst ways to share a database. Most of the time, the best you can do is take a snapshot of the database, dump it to a file, and import it into another system.

Generally you would want to share a database by granting multiple users access to a single running instance.


> How is one supposed to share part of a database with another system or user without ultimately resorting to the file metaphor?

With an access-controlled view, perhaps?

Your question isn't entirely academic, incidentally, the Open Data movement prompted a lot of organizations (particularly government or government-like ones) to address this sort of issue (especially with geodata) en masse starting about a decade ago, and providing snapshots in the form of downloadable files is generally seen as an important but generally inferior when-all-else-fails solution for most use-cases that don't involve forking the data and/or bulk-processing it into a new form. Live read-only access to the system of record or a DMZ'd proxy or cache is usually preferred.

And the Semantic Web / Linked Open Data movement addressed a similar set of issues two decades ago.

I suppose it's time for another turn of the wheel, perhaps with a distributed web twist.


There are so many screenshots of GUIs. But there is heavy path dependence in customer-facing GUI systems. Starting over on an OS seems more likely to happen with a cloud OS. In any case, the question is: do we really need the existing OS abstractions if we implement everything in managed languages?

[1] https://www.microsoft.com/en-us/research/publication/singula...


[Author of the piece here]

Ignore the slides. The slides are decoration.

Read the script, or listen to the talk. https://docs.google.com/document/d/1wM1-c7euvQaRaCL4hKCaE8VR...


Intel actually discontinued Optane for consumers so this won't really fly... At least not yet.

I do think a completely out-of-the-box rethink of computing could lead to very interesting results though!


[Author of the piece here]

No, they did not. They discontinued the SSDs. The NVDIMMs are not affected.


But they were never a consumer product in the first place :)

Not saying that they couldn't be useful to consumers, but Intel doesn't sell them for consumer hardware. They have some strange preconceptions about what consumers need, unfortunately. The same way we still don't have ECC memory in 2021 :/


OK, yes, fair points. :-)

I discovered something that I didn't know in the Q&A afterwards: that flash SSDs can be permanently damaged if the power goes out unexpectedly.

https://www.atpinc.com/blog/how-industrial-SSDs-handles-powe...

Optane fixed this, or rather, was not affected.

This is not such a bad issue in laptops, but it's a potential killer in desktops with SSDs.

Perhaps if Intel had marketed Optane SSDs better, they'd have sold to enthusiasts for just this reason...


True, I was definitely interested myself. Not really for this reason (my important stuff is triple-backed-up including offsite :) ) but for the great performance. But I was waiting for the price to drop a bit. It didn't really happen. I totally agree, Intel is very poor at marketing.


I love people experimenting with new OS paradigms. Although I was expecting something else.

But getting 8GB of super-fast memory + an SSD will forever be cheaper than persistent memory. Not to mention persistent memory is nowhere near as fast as DRAM. There are far more benefits in CPU and GPU sitting next to each other sharing the same memory address space, and that requires high-bandwidth memory. Which basically means there is no cost incentive unless this shift provides some gigantic leap forward in value.


[Author of the piece here]

This is technologically out of date, I'm afraid.

You appear not to have read about Intel 3D XPoint and related tech. https://en.wikipedia.org/wiki/3D_XPoint

This is retail tech now, under the brand name Optane: https://www.storagereview.com/news/intel-optane-dc-persisten...

SSD is cheaper than RAM. Optane is orders of magnitude cheaper, faster and longer-lived than SSD.

It is here now.

The value is discussed in the talk; I suggest reading the script. https://docs.google.com/document/d/1wM1-c7euvQaRaCL4hKCaE8VR...


I am aware of Optane.

But even at its current best (the 2nd-generation Optane memory DIMM, not the Optane SSD), neither its latency nor its bandwidth comes close to DRAM. Even IBM's innovation of OMI (Open Memory Interface), allowing petabytes of memory while only adding 10ns of latency over DRAM, was met with some scepticism. And the Optane DIMM adds at least 50ns at best, and hundreds at worst.

Our multi-core CPUs, and the SoC roadmap of merging CPU, GPU and VPU, already run into memory bandwidth bottlenecks.

Intel has also stopped all consumer / retail sales of Optane. It will only be available in servers, and probably from selected partners, from now on.

Optane is cheaper than DRAM per GB, but it isn't cheaper than NAND SSD unless you take TCO into account. Which I would argue still leaves it more expensive than SSD, given that the cost of SSD has declined quite a bit (or gone back to normal, depending on how you look at it). But Optane (both DIMM and SSD) is faster than SSD, especially in IOPS-heavy operations, so it is still worth the money depending on the application.

Lastly, no one has a clear roadmap on Optane or 3D XPoint cost reduction. There is a difference between "we want to reduce our cost" and "we know how to reduce our cost". The 3D XPoint fab has been underutilized from day one and is currently making a small loss even with Intel's guaranteed purchase agreement. (And Intel isn't selling all of it either.) There is nothing in the next 3 years that changes this landscape, and considering how technology moves, I would bet that 5 years down the line things will be about the same.

I don't want to be too dismissive. I would surely want persistent memory and a computer that just turns on with a press of the power button, like in the old days. And the revival of a Smalltalk-like OS! But right now it doesn't seem feasible in the foreseeable future without some massive trade-offs.


I am aware that it does not come close to DRAM, but it is faster than Flash, can be rewritten between thousands and tens of thousands of times more than Flash, and it is also byte-rewritable, unlike flash memory.

(I also specifically address some of the comparative issues in the talk.)

It is a significant step forwards. One of the problems with it is that current OSes can't really make good, efficient use of it.

That is what I am proposing a fix for – and one that should be relatively easy to implement.


I have had this thought about much web software these days: requesting a lot of data from the DB instead of just keeping a persistent object system in memory.


Persistent memory sounds a whole lot like OS/400.


OS/400 under its newish name IBM i https://www.ibm.com/it-infrastructure/power/os/ibm-i


[Author of the piece here]

Up to a point, yes. Also, Multics.

But this is a different take.


I've never used Multics myself, but from what I understand it was also based on the idea of mapping files in memory. When I first heard this, I thought the idea was pretty neat, but then the person I was speaking to remarked that it wasn't all that great in practice. I'm not really sure why, but I'm curious.


`mmap()` sucks if you don't have a lot of addresses. 64-bit computers now have a lot of addresses.
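
For anyone who hasn't played with it, a minimal Python sketch of the idea: map a file into the address space and mutate it with ordinary byte operations instead of read()/write() calls (the path and sizes are arbitrary):

    import mmap, os, tempfile

    path = os.path.join(tempfile.mkdtemp(), "persistent_region.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * 4096)              # reserve one page of "persistent" storage

    with open(path, "r+b") as f:
        region = mmap.mmap(f.fileno(), 0)    # map the whole file
        region[0:5] = b"hello"               # stores go straight into the mapping
        region.flush()                       # ask the OS to write dirty pages back
        region.close()

    with open(path, "rb") as f:
        print(f.read(5))                     # b'hello' -- the change persisted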


I'm skeptical this was the reason, though (at least in and of itself). Going off of https://en.wikipedia.org/wiki/Multics#Novel_ideas, it seems that the address space (36 address bits ⇒ 36 bits/word × 2³⁶ ÷ 8 bits/byte = 288 GiB) was much larger than memory of the machine Multics was designed for (about 2 MB), and the size of a segment was limited to about 1 MB (36 bits/word × 2¹⁸ ÷ 8 bits/byte), nearly half of all memory – so I doubt this was troublesome at the time.


because now you've turned failed IO operations from an errno into a segfault.


Doesn't iOS have some kind of persistent memory abstraction? It seems more like a single level to me.

Safari on iOS saves tons of tab state going back months, but it's not using all your memory, etc. I have never really done iOS programming so I don't know the details.


That’s called restorable state and it’s better than assuming everything can be always persistent, like Lisp/Smalltalk images did.

For one thing, restarting all the time makes it less fragile. You don’t have to untangle your state that’s been persistent for months if there’s a bug in it.


Doesn't this almost say no to the von Neumann architecture? Is this a good idea? Isn't the point of having multiple files to simplify management and reduce the risk of corruption?


The concepts are orthogonal. Von Neumann machines can be used without file systems and vice versa.

Persistent state has its pros and cons - for me the ability to reset the machine's state is an important feature. For the author this doesn't seem to be the case.

As for abandoning the concept of a file system, that's something that has been worked on for decades. Microsoft even wanted to include this as the main feature for Windows Vista, back in 2003 (dubbed "WinFS"). They basically wanted to replace traditional files with a single relational database backed by schemas for describing objects.

This would've been a truly revolutionary concept, as it would've allowed for features like searching, sorting, grouping, and versioning of objects without any complex application-level code.

Another advantage of such system would be the ability to exchange data between apps by just passing an id. All required meta-information would be available through the schema and the app could choose a view that suits its needs: providing the id of an audio file could result in the title and artist for a word processing program, the audio data for a music player, or the associated cover art in case of an image editing program.

There are many options for replacing the concept of files and directories that are just as robust and provide extra features. The general architecture of the CPU is not affected by or related to this.
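
A toy sketch of that idea in Python with SQLite standing in for the object store: every item has an id, and each "app" pulls only the attributes its view of the schema needs (the schema and attribute names here are invented for illustration):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE objects (id INTEGER PRIMARY KEY, kind TEXT);
        CREATE TABLE attributes (object_id INTEGER, name TEXT, value BLOB);
    """)
    db.execute("INSERT INTO objects VALUES (1, 'audio')")
    db.executemany("INSERT INTO attributes VALUES (1, ?, ?)", [
        ("title", "Blue in Green"),
        ("artist", "Miles Davis"),
        ("cover_art", b"<png bytes>"),
        ("audio_data", b"<pcm bytes>"),
    ])

    def view(object_id, names):
        # Each "app" asks only for the attributes it cares about, by object id.
        rows = db.execute(
            "SELECT name, value FROM attributes WHERE object_id = ? AND name IN (%s)"
            % ",".join("?" * len(names)), [object_id, *names])
        return dict(rows)

    print(view(1, ["title", "artist"]))    # what a word processor might pull
    print(view(1, ["cover_art"]).keys())   # what an image editor might pull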


Paragraph five before mentioning Lisp and Psion. Pretty restrained for Liam!


Ahahaha! OK, it's a fair cop, but the industry's to blame. :-D


The problem with starting over is that you don’t understand how the current state came to be, so you’ll probably just reinvent it instead of getting anywhere new. See Chesterton’s fence.


[Author of the piece here]

I suggest reading the script. https://docs.google.com/document/d/1wM1-c7euvQaRaCL4hKCaE8VR...

There is a lot of history in it, and that is entirely intentional.


Since I design processors for Smalltalk I am certainly in favor of the ideas you present.

I think the main thing that kept Plan 9 from becoming popular was that AT&T tried to make money on it at a time when Linux was already a big thing. By the time they gave up and open sourced it several of its unique features had been included in Linux.

Squeak is not implemented in C but in a restricted subset of Smalltalk (called Slang) which can be easily translated to C. But since it is Smalltalk it can be tested using Squeak's nice development tools instead of gdb or similar.

Although Lisp machines were more commercially successful, there were many interesting Smalltalk machine projects as I pointed out in these slides:

http://www.merlintec.com/download/2019_slides_jecel_fast1v2....


To me things that are worth starting over would be the CPUs that we no longer understand and the OSes that are debugged into existence. Not a computer that has no SSD because memory is persistent.


Both OSes and complex CPUs are a direct result of their capabilities.

I wish physics ended at Newtonian mechanics and the Maxwell equations, but alas, special and general relativity cannot be avoided in some circumstances, and neither can quantum mechanics in others.

The same is true for CPUs and operating systems - if you need high performance and the ability to do real-time high-resolution media playback, digital content creation, high-bandwidth networking, high-colour and high-resolution graphic displays with font smoothing, auto-scaling, multi-monitor support, pluggable peripherals, etc. etc. you'll get complexity.

There's only so much that can be achieved with simple TTL and core memory...

Microkernels and provably correct OSes are great and available, too. But things start to get messy quick once you add support for various protocols, devices, and capabilities. Not just because it gets harder and harder to do, but also because there's diminishing returns: the vast majority of users aren't OS developers and just don't care.

The same way the vast majority of people don't care about the layout, build quality, and logical soundness of the plumbing in their homes (as long as it works sufficiently well) or the technical details of their refrigerator.

That's why there's very little incentive to build small, "clean", and provably correct OSes for the general public. You will find them, though, and they are in use.


For me at least this link breaks the browser back button and prevents me from coming back to HN (iOS Safari)



