
It is good that these "water is wet" statements get written down so we can point humidity-skeptical people to them from time to time.

The deeper problem is the sad state of affairs of distributed computing for the end user:

* Application instances expect to be the only ones modifying the files that underlie the document being edited. Most of them simply bail out when the files get modified by another application.

* The default is "one device = one (local) filesystem" which is the exact opposite to what everyone needs: "one person = one (distributed) filesystem."

* The case for local-only filesystems only addresses corner cases, or deficient distributed file systems that fail to uphold basic security constraints (such as "my data is only in my devices" or "no SPOF" for my data).

* Whatever gets pushed to the cloud becomes strongly dependent on devices and vendors. Users end up handcuffed to a specific hardware (iCloud) or software (Android) if they want to have any chance of interacting with their own documents from their own devices.

* What we need is not cloud desktops, or cloud storage. We need local desktops with a decent distributed filesystem, and vendor agnostic access to that filesystem from all our devices.



I couldn't agree more. I've been working on CRDTs for the last few years, and there's a huge opportunity here if we can reinvent the concept of the filesystem. Ideally, we'd replace files with CRDT-backed objects in the operating system. Then, instead of fread / fwrite calls (where saving typically rewrites the entire file), applications could express semantic changes which get saved in a log.

Those changes can be transparently replicated between applications, between devices and between users. We'd get better performance on-device, and automatic, transparent device-to-device replication. And we could trivially enable realtime collaborative editing between users. Better still, if it happened at the OS level, we could make it work in every application on the system.
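As a rough sketch of that idea (all names here are hypothetical, not any existing API): each application appends semantic operations to a per-document log, and replicas merge by taking the union of entries they haven't seen yet.

```python
# Hypothetical sketch: instead of rewriting a whole file on save, an app
# records semantic operations in an append-only log that replicas exchange.
class OpLog:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counter = 0
        self.ops = []  # append-only list of {"id": ..., "op": ...} entries

    def record(self, op):
        """Append a semantic change, e.g. {'insert': 'hello', 'pos': 0}."""
        self.counter += 1
        entry = {"id": (self.replica_id, self.counter), "op": op}
        self.ops.append(entry)
        return entry

    def merge(self, other_ops):
        """Union of logs: take any entries we haven't seen yet."""
        seen = {e["id"] for e in self.ops}
        for e in other_ops:
            if e["id"] not in seen:
                self.ops.append(e)

a = OpLog("laptop")
b = OpLog("phone")
a.record({"insert": "hello", "pos": 0})
b.record({"insert": "!", "pos": 5})
b.merge(a.ops)  # device-to-device, no server involved
a.merge(b.ops)
assert len(a.ops) == len(b.ops) == 2
```

Because each entry carries a unique (replica, counter) id, merging is idempotent and order-independent, which is what makes the replication "transparent" in the way described above.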

Right now "linux on the desktop" is slowly and inevitably dying in the face of cloud services. How would OpenOffice even compete with Google Docs? Do opensource application authors need to run their own web servers? (And if so, who pays for that?). If we replaced the filesystem with CRDTs, openoffice (and every other program on the desktop which edits "documents") could have best-in-class collaboration features, right there out of the box.

There's an opportunity here to build a really amazing computing system.


I think we're very much in line. I wrote a wall of text some time ago to gather my thoughts:

https://personalfilesystem.org/


>What we need is not [...] cloud storage. We need [...] a decent distributed filesystem,

Distributed files to where exactly? You need to be more concrete about the remote location of non-local data that normal people can use. Ok, so you want "distributed filesystem" to not mean "cloud storage" ... So is it p2p? Something else?

In other words, we want Windows, macOS, Linux, iPhone, Android, etc operating systems... to have a file system that all points to the same "distributed filesystem" and see the same files -- and for other collaborators to see those files.

But we don't want those os configurations to point to DropBox / MS OneDrive / Google Drive / Backblaze, etc. So, we need to be concrete on the alternative common remote location that those file system APIs would point to. What would the topology of that solution look like?


> Distributed files to where exactly?

My desktop computer, my laptop computer, my tablet computer, my pocket computer. Whether that’s cloud or p2p doesn’t matter to me, the user. I should be able to start working on a spreadsheet or presentation on one and, without the ceremony of “save to a shared location, close the app, switch devices, open the app … now where’s that file again?” switch to another and continue editing.

First we need to specify a distributed fs. THEN we can decide the “to where” bit.


That sounds like syncthing. Dunno about phones and tablets, but I have that functionality among my computers.


I have no affiliation but want to second this.

If you want to keep your filesystem in sync across many devices Syncthing fully enables this.

It is a use case where you would expect a paid service to be easier or more reliable, but with Syncthing it is the exact opposite. Just install it on your devices and select the folders you want to keep in sync ... done.

I have never had any problems with it, something I cannot say about Dropbox, which can be terribly slow, hogs my PC, and has resulted in lost files on occasion.


My filesystem consists of [checks du -h . | tail -1] 189 GB.

I don't think Syncthing (which I love) can cram 189 GB on my 64 GB phone.

Yet I expect to have access to my filesystem from my phone.

Syncthing is a nice "pump-hose system" between reservoirs of data. What I was arguing above is to stop having separate reservoirs of data to begin with.


I use Seafile (similar to Syncthing) and it lets you browse your data libraries without requiring a local copy, for these cases.


The issue is not so much "can I browse a virtual file system" but more "why should I depend on one local file system sitting on a remote server, probably owned by a third party, as being the single source of truth for my own files."


? With seafile the data is only on computers I physically own


I use syncthing. It's awesome for computers. Coming from Dropbox, then Nextcloud, I find it solves all my needs much, much better, at least on well-supported platforms.

I love how I can decide what to sync where, and even create my own topology of sync-devices if I like. That may sound like crazy complex stuff and over-engineering and what not, but it was a solution I landed on organically, just through normal use.

That said, it's not entirely smooth on iOS and you sometimes need to manually launch the (third party) app to force a sync after changing some files.


Syncthing doesn't solve the problem OP is talking about. It's amazing software that works as long as you don't have to sync the same file edited on two machines before they have a chance to sync. There is no logic, besides something using CRDTs, that can reliably resolve the conflicts in every situation when you just have two sets of bytes and nothing else.

Even if you maintain "last synced" copies plus the current latest version and use those to compare, there are still simple situations where the conflict resolution doesn't work and/or requires user input. Anything that requires user input like that can't properly sync binary files without resorting to making a new "File (Conflicted 1).sqlite" which you then need to compare manually.
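The limitation being described is the classic three-way merge dead end: with only the last-synced base and two diverged byte blobs, there is exactly one case a sync tool cannot resolve safely. A sketch:

```python
# Sketch of the three-way situation a byte-level sync tool faces.
# 'base' is the last-synced copy; both sides edited the file offline.
def three_way(base, ours, theirs):
    if ours == theirs:
        return ours    # both sides made the same edit
    if ours == base:
        return theirs  # only the other side changed
    if theirs == base:
        return ours    # only we changed
    return None        # both changed: no safe automatic merge

base = b"balance=100"
ours = b"balance=150"   # edited on the laptop while offline
theirs = b"balance=90"  # edited on the phone while offline
merged = three_way(base, ours, theirs)
assert merged is None   # tool must emit "File (Conflicted 1)" instead
```

Everything above the `return None` is mechanical; that last case is where either format-aware merging (CRDTs) or a human has to step in.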

It just isn't the same thing


But that's pretty much a fundamental issue - you can't create information about the system that just isn't there.

I think the only way around that limitation is to have a node that's always on - make sure it's always known what order the changes were made in.
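For what it's worth, a logical clock gets you causal ordering without an always-on node; what it can't do is order truly concurrent edits, which is exactly the gap being described. A minimal Lamport-clock sketch:

```python
# Sketch: a Lamport clock orders causally related events without a central
# node; truly concurrent events still tie, which is the fundamental limit.
class Clock:
    def __init__(self):
        self.t = 0

    def tick(self):
        """Advance on a local event and return its timestamp."""
        self.t += 1
        return self.t

    def recv(self, remote_t):
        """Advance past a remote timestamp on message receipt."""
        self.t = max(self.t, remote_t) + 1
        return self.t

a, b = Clock(), Clock()
t1 = a.tick()     # a edits the file
t2 = b.recv(t1)   # b syncs and sees a's edit
t3 = b.tick()     # b edits afterwards
assert t1 < t2 < t3  # the causal chain is correctly ordered
```

An always-on node is one way to force a total order; logical clocks are the decentralized alternative, at the cost of leaving concurrent edits for the application (or a CRDT) to reconcile.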


I just wish Syncthing would allow for deferred sync, i.e. you see the file, but it only gets fetched once you access it.

That's, imo, the only way to sync large folders. I don't need all my Documents/Photos/Movies/Whatever on my phone at all times, but I do wish I could access them when I need them.


OneDrive does that on Windows and Mac.


i have it running on an ancient android (4.4) phone, perfectly


The "to where" bit is really important when specifying the fs. If it includes at least one high-bandwidth high-storage high-uptime device (like a server), the requirements and capabilities change drastically compared to if it's composed of a bunch of battery-powered portable devices on limited data plans.


Funny you use a spreadsheet as an example. That's been the default for Excel for years. Save to SharePoint/Teams/OneDrive (whatever MS is calling it these days, it's all the same backend) is the default option - and multi-user live editing (or one user in multiple sessions) just works.



To where exactly: my devices. And if I don't want to buy my own devices, then to a cloud service that offers opaque storage of binary blobs with an API that my filesystem can abstract for me.

So the topology is a mesh network of my devices, and perhaps optionally a few defined remote endpoints that the opaque blob storage service provides me, and that I enter as part of the config of my filesystem.


That sounds like Cloud Storage to me (Dropbox, Google Drive Backup/Restore/Sync/whatever it's called this year).


Cloud storage is the opposite of distributed.

With cloud storage, you must have one single fixed central location (often a third party) that contains the real data, many satellite locations with a partial replica of the data, and hit-or-miss mechanisms to notice changes in replicas and propagate them to the central location. If the central location is down there is no more synchronization. If the central location is not yours, they can shut you down anytime.

A distributed filesystem does away with the need for a fixed central storage by storing data across all locations with a configurable level of replication. A strong, consistent cascading of changes (eg a crdt semantic) brings all replicas in sync whenever connectivity allows. No third parties need to be involved, no single device is a point of failure.
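A minimal illustration of that "no central copy" property is a last-writer-wins register, one of the simplest state-based CRDTs (a toy sketch, not production code):

```python
# Toy last-writer-wins register: any two replicas can merge directly,
# in any order, with no central location holding "the real data".
class LWWRegister:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.value = None
        self.stamp = (0, replica_id)  # (logical time, replica tiebreak)

    def set(self, value, t):
        self.value, self.stamp = value, (t, self.replica_id)

    def merge(self, other):
        """Keep whichever write has the higher stamp; commutative."""
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp

laptop, phone = LWWRegister("laptop"), LWWRegister("phone")
laptop.set("draft v1", t=1)
phone.set("draft v2", t=2)
laptop.merge(phone)  # peer-to-peer, no server in the middle
phone.merge(laptop)
assert laptop.value == phone.value == "draft v2"
```

Because merge is commutative, associative, and idempotent, replicas converge in whatever order connectivity allows, and no single device is special.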


This is the problem that needs to be solved. Cloud storage and p2p are solutions looking for a problem, but it would be nice not to let them distract us too much.


> What we need is not cloud desktops, or cloud storage. We need local desktops with a decent distributed filesystem, and vendor agnostic access to that filesystem from all our devices.

That's absolutely spot on. The problem is: who is going to pay for it?

No vendor will do this because it would break lock-in, and building something like this and making it polished enough for widespread adoption is far beyond what pure volunteer open source can reasonably accomplish.

The problem is economic, not technical. There is no business model for user-empowering software anymore.

Software is extremely costly to produce but we pretend it's free and won't pay for it directly, so instead the industry has deeply wrapped itself around business models in which we are the product or that use lock-in to force payment eventually.


Quite a few complex solutions (mostly unknowingly) used by vast amounts of people have come from pure volunteer open source work.

How much we can expect that to continue is a whole other matter though.


If you dig deeply you'll see that a large fraction of that is actually employees at big companies, universities, and governments. In other words, it's subsidized. Any on-the-clock OSS work is a subsidy. It's not pure volunteer work.

This tends to be done when there is a strong common interest, but it’s almost always for deep tech and dev tooling stuff. I have never seen an open source consumer product subsidized in this way because consumer lock in is where the money is.

You will never see an open Uber, Ring, or Alexa unless a way can be found to charge for it. As it stands free means “as in beer” more than freedom and nobody would pay for such a thing.

I have played with stuff like Home Assistant. It’s not bad if you are technical. A non-techie could never deploy it.


> We need local desktops with a decent distributed filesystem, and vendor agnostic access to that filesystem from all our devices.

I am very happy with pCloud. One of the reasons I got it is: it works very well on Linux. It works on Android. It works on Windows.

And it works in the browser, for things like video and photos.

Also: no risk of trigger-happy account deletion like with Google, if pCloud dies my email still works.

Previously I used OVH online drive service, but it was EOL and pCloud is the replacement.


> Previously I used OVH online drive service, but it was EOL and pCloud is the replacement.

So a vendor has the power to disrupt you whenever they feel like EOLing the service you depend on. I understand that's as good as it gets today, but it's not good enough.

For me, the only acceptable level of impact is as follows:

* vendor X sunsets their service by date D

* before date D, I sign-up for a new account with vendor Z and configure it in my filesystem settings

* I set vendor X as "deprecated,EOL-date=D" in my filesystem settings.

Then my filesystem takes care of everything else for me transparently, with zero downtime and zero effort. Date D comes and goes and I haven't noticed a thing.
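Such a setup might look something like this purely hypothetical configuration (every name, endpoint, and field here is invented; nothing like it exists in today's sync clients):

```python
# Hypothetical FS config: two interchangeable blob-storage backends,
# one marked deprecated so the FS can re-replicate data off it before D.
backends = {
    "vendor_x": {
        "endpoint": "https://blobs.vendor-x.example",
        "deprecated": True,
        "eol_date": "2025-06-01",
    },
    "vendor_z": {
        "endpoint": "https://blobs.vendor-z.example",
        "deprecated": False,
    },
}

def active_targets(cfg):
    """Backends the filesystem should still replicate new data to."""
    return [name for name, b in cfg.items() if not b.get("deprecated")]

assert active_targets(backends) == ["vendor_z"]
```

The point of the sketch: vendors are just dumb, swappable replication targets behind the filesystem, so an EOL becomes a config change rather than a migration project.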


That's how it happened. OVH announced the EOL about a year before the system was shut down.

pCloud is a replacement I chose, nothing to do with the old OVH service.


Do you imply that the change was fully transparent to you, except for a config change?


I had to remove the old client, install the new one, and log in to the different account.

In all devices.

Apart from that, yes.


How does pCloud differ from Dropbox?


I paid for a lifetime subscription.

One payment and so far no complaints at all.


It’s incredible how many systems are basically two silos that sometimes somehow sync in a totally custom manner, when we have so many ways of keeping distributed systems in sync.

Especially true for mobile.


> * What we need is not cloud desktops, or cloud storage. We need local desktops with a decent distributed filesystem, and vendor agnostic access to that filesystem from all our devices.

While I agree, this by itself doesn't solve the problem when you depend on such a FS for your work in a way that when you lose network connectivity, you can no longer work.


Ideally that would be handled on the FS layer and completely transparent to all apps. Things would get synchronized once connection is restored.


You can't just synchronise things without knowing the file formats. You can't do a seamless distributed FS which allows offline changes.

Or more precisely, you can, but only by picking the freshest file, and people have lost work that way. There are a few "Dropbox ate my files" stories out there.


That is usually handled by exposing a "driver" API where the relevant programs can install merging components.

And yes, defaulting to choosing one version, with an extended interface for displaying and managing conflicts.
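A sketch of what such a driver registry could look like (the API and the toy line-union driver are both invented for illustration; real systems would dispatch on file type):

```python
# Hypothetical merge-driver registry: programs install a format-aware
# merge function; unregistered formats fall back to flagging a conflict.
merge_drivers = {}

def register_driver(suffix, fn):
    merge_drivers[suffix] = fn

def merge(path, base, ours, theirs):
    driver = merge_drivers.get(path.rsplit(".", 1)[-1])
    if driver:
        return driver(base, ours, theirs)
    return None  # default: flag the conflict, keep both versions

def line_union(base, ours, theirs):
    """Toy driver for line-oriented files: union of newly added lines."""
    base_lines = set(base.splitlines())
    merged = base.splitlines()
    for side in (ours, theirs):
        for line in side.splitlines():
            if line not in base_lines and line not in merged:
                merged.append(line)
    return "\n".join(merged)

register_driver("txt", line_union)
assert merge("notes.txt", "a", "a\nb", "a\nc") == "a\nb\nc"
assert merge("image.png", b"", b"x", b"y") is None  # no driver: conflict
```

This mirrors how git's `.gitattributes` merge drivers work, but lifted to the filesystem layer so every application benefits.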


I assume that "decent" in the comment meant to address that?


It would have to be magical not just decent. This is not solvable on the level of the file system. You can't add a blue line to an image on one system and a red line on another system and expect the filesystem to somehow figure out how to handle that on its own. The best you can count on is the conflict being flagged with both versions exposed.


CRDTs?

Perhaps the idea of plain text/binary files is a little outdated too.


CRDTs are extremely format-specific. File systems don't operate at that level. And that's before we even decide if the merged edit is what you actually want.


I mean, isn't that a potential use case for Syncthing? If I go offline on one of my devices, my files are still locally on the system. When it comes back online, it re-syncs the other devices to use my latest files.


This is a very simple use case when you are working alone. Think about a team of people and hundreds of potentially conflicting changes to review manually.* Sometimes a tangible divide between online and offline is extremely useful.

*) Unless you believe this can be resolved by software - I'm afraid we're very far from that point yet.


But then you have to deal with conflicts in a sensible way that won't lose users files and make it simple enough for people to choose which files to synchronise.


That would have been nice. Cached 9P perhaps.

Instead we got OneDrive, Dropbox and iCloud. Ugh.


Couple of buzzwords that address this point:

* CRDTs

* "Intelligent Edge Platforms" as Ditto [1] calls them

[1] https://www.ditto.live/



