Hacker News
Kart: DVC for geospatial and tabular data. Git for GIS (kartproject.org)
133 points by starkparker on Oct 30, 2023 | 33 comments



Hi everyone, I'm Hamish, PM for KartProject here. If you want to learn more about Kart:

  * Our CTO Rob Coup presenting on Kart at FOSS4G 23: https://www.youtube.com/watch?v=1B-HB2Z3Vlc
  * Docs are available at https://docs.kartproject.org/en/latest/
  * We also have a QGIS plugin! This gives you visual diffs of vector feature changes. https://plugins.qgis.org/plugins/kart/
Happy to answer any questions!


Hi, thanks for sharing!

OSM got vandalized recently [0][1] and apparently the community is having a hard time restoring the data / reverting the edits.

Would using KartProject enable storing edits as commits which could then get easily reverted? Also localized commits?

[0] https://www.openstreetmap.org/#map=17/32.09438/34.77448

[1] https://www.openstreetmap.org/#map=16/32.0914/34.7746&layers...


I touched on this a little in this comment https://news.ycombinator.com/item?id=38075723 - but tl;dr: OSM has its own versioning model, and its data model isn't a great fit for the flat-table structure of most of the GIS data we commonly work with.

That said, the general issue of "data supply chains" and keeping track of who did what, and when, is largely unsolved outside of OSM. If you're looking after your own GIS data and you're concerned with tracking and detecting unwanted changes, Kart is a great option. You get all the cryptographic verifiability of git for free.
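For anyone wondering where that verifiability comes from: Git names every object by a hash of its content, so altering even one stored row changes that object's id, and with it every tree and commit id above it. Here is a minimal sketch of how Git computes a blob id. This is plain Git behaviour; the JSON rows are hypothetical stand-ins, not Kart's actual feature encoding.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a blob with this content."""
    # Git hashes a header of the form b"blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical serialized rows: a one-word edit yields a different object id.
original = b'{"id": 1, "name": "Main St"}'
tampered = b'{"id": 1, "name": "Main Rd"}'
print(git_blob_id(original) != git_blob_id(tampered))  # True
```

Running `git hash-object` on the same bytes should give the same ids, which is what lets anyone independently check that history hasn't been rewritten.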


Why is this going to succeed when something like Geogig never took off? I was super interested in the project at the time of its initial release but it’s been dead for years. What did that project fundamentally do wrong vs what Kart is doing? Or is it just a super niche thing?


Having worked on a Geogig-as-a-Service proof-of-concept for a moment in time, the problems there were: no way to see commits and changes on a map (GitHub's later addition of a GeoJSON view and diff map being the final knell), and no server to host your repos (i.e. you would host your own Java/Maven server). The word was that a big government or military customer wanted this technology and didn't need any of that consumer-facing stuff, so the open source aspect was more of a "let's see if it catches on".

My project never took off because it was my first cloud and Maven project, but it was fun to tinker with the idea until GitHub got around to its GeoJSON stuff.


This is a great question, and one of the first things we had to answer before deciding to build Kart. These are my top 3:

1. We found Geogig hard to get running. I'm a software engineer but I struggled to get Geogig installed correctly. Typical GIS people are technical, but not necessarily comfortable with command lines or building software. Kart is designed to be 'batteries included' and installs easily on Windows, Mac and Linux.

2. Geogig was "inspired by" git, but isn't built "on top of" git. That meant rebuilding a lot of things that git is already _really_ good at. Kart teaches git how to work with spatial data, so we get the benefit of many thousands of human-hours of optimization work in git. It also means a lot of git tooling works with Kart today, leaving us free to focus on GIS-specific enhancements. A few later additions to git, like git filters, have made critical features possible (e.g. spatially filtered clones).

3. Geogig was a great project largely sponsored by Boundless Geo before they got bought by Planet, and it more-or-less died at that point. Ultimately, these projects have a big bootstrapping problem. You need data, the tool, and willing users. Kart is sponsored by Koordinates.com - a platform that's been doing GIS data delivery for over 10 years. We have a lot of data that we've been 'mirroring' into repos and will make available soon, we have a lot of users with specific use cases (fetching updates to large, regularly updated datasets) who are already using it, and we're making a long-term commitment to Kart as an open source project.


I wish this project every success. GIS is an undervalued and underserved technical field.


If you're wondering where to find an architecture document, the nearest to such may be:

https://docs.kartproject.org/en/latest/pages/development/tab...

The top takeaway is that it's not just versioned geo feature items, but versioned per-feature formats. Various popular GIS database formats are supported as the "checked-out" representation, analogous to a git local-filesystem tree. It may also handle conversions between standard GIS formats well, though that wasn't obvious.

One question I'm left with is performance:

> Every database table row is stored in its own file. ...


One of the benefits of building on Git is that a lot of people have put a lot of time into making it work really well with lots of objects. And even though we say "files", Git abstracts them into packfiles, etc., very efficiently.

So, we're seeing pretty good performance. We're maintaining a number of repositories with several million features, with a decade of weekly updates of ~10,000+ rows. It _does_ take some time to push that data around, but it's _vastly_ better than the old ways, and once you have your clone, keeping it up to date becomes trivial - a _major_ unsolved problem in the GIS/data world.
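Why one file per row scales: because Git stores content-addressed objects, a weekly update only adds objects for the rows that actually changed (plus the few tree objects on their paths); every unchanged row is already in the object store. A minimal sketch of that effect, assuming a hypothetical JSON serialization in place of Kart's real feature encoding:

```python
import hashlib
import json


def git_blob_id(data: bytes) -> str:
    # Git's blob id: SHA-1 over b"blob <size>\0" + content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def row_blobs(rows: dict[int, dict]) -> dict[int, str]:
    """Map each primary key to the blob id of its serialized row.

    JSON with sorted keys is a stand-in; Kart's on-disk encoding differs."""
    return {
        pk: git_blob_id(json.dumps(row, sort_keys=True).encode("utf-8"))
        for pk, row in rows.items()
    }


def new_objects(before: dict[int, dict], after: dict[int, dict]) -> set[str]:
    """Row blob ids in `after` that the object store does not already hold."""
    return set(row_blobs(after).values()) - set(row_blobs(before).values())


week1 = {pk: {"name": f"road {pk}", "lanes": 2} for pk in range(1000)}
week2 = dict(week1)
week2[7] = {"name": "road 7", "lanes": 4}  # a single edited row
print(len(new_objects(week1, week2)))  # 1
```

Tree objects are ignored here for brevity; in real Git the trees along the edited row's path also get new ids.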

I'd add - Kart has GIS-specific features that mitigate some of these issues. The ability to spatially index the objects, then filter them on clone, means I can rapidly clone a tiny subset of the data to work with.
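The core of a spatially filtered clone is an envelope test: keep only features whose bounding box touches the area of interest. A minimal sketch, assuming a hypothetical in-memory index of per-feature envelopes; Kart's actual spatial index and filter specification aren't shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "BBox") -> bool:
        # Two axis-aligned boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.max_x < other.min_x
            or other.max_x < self.min_x
            or self.max_y < other.min_y
            or other.max_y < self.min_y
        )


def spatial_filter(envelopes: dict[int, BBox], area: BBox) -> set[int]:
    """Primary keys (hypothetical index) whose envelope touches the requested area."""
    return {pk for pk, env in envelopes.items() if env.intersects(area)}


envelopes = {
    1: BBox(0, 0, 1, 1),
    2: BBox(5, 5, 6, 6),
    3: BBox(0.5, 0.5, 5.5, 5.5),
}
print(sorted(spatial_filter(envelopes, BBox(0, 0, 2, 2))))  # [1, 3]
```

The envelope test is deliberately conservative: it may include a feature whose box touches the area even though its exact geometry doesn't, which is harmless for deciding what to fetch.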


Is there a public git repo available somewhere that represents a Kart repository?

Are the raw files in the working repository GeoPackages? How is it tracking the changes made inside the GeoPackages? What happens if it's replaced with an updated copy of the GeoPackage that was edited via some other application? How does it diff the changes?


Good questions

> Are the raw files in the working repository GeoPackages?

The working copy for a vector/table dataset can be in a GeoPackage or a SQL database like PostGIS. For rasters/point-clouds they're flat files.

> How is it tracking the changes made inside the geopackages?

In general, triggers which store RowIDs/PKs of inserts/updates/deletes. Then when you ask for a diff or make a commit Kart figures out any actual row-level (or schema) differences.
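A minimal SQLite sketch of the trigger approach, runnable with Python's built-in `sqlite3` (a GeoPackage is an SQLite database). The `roads` table, the `_dirty` tracking table and the trigger names are all hypothetical; Kart's real tracking tables and triggers are named and structured differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE roads (fid INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical tracking table: one row per primary key touched since the last commit.
    CREATE TABLE _dirty (fid INTEGER PRIMARY KEY);

    CREATE TRIGGER roads_ins AFTER INSERT ON roads BEGIN
        INSERT OR REPLACE INTO _dirty (fid) VALUES (NEW.fid);
    END;
    CREATE TRIGGER roads_upd AFTER UPDATE ON roads BEGIN
        -- Record both keys in case the primary key itself was changed.
        INSERT OR REPLACE INTO _dirty (fid) VALUES (OLD.fid);
        INSERT OR REPLACE INTO _dirty (fid) VALUES (NEW.fid);
    END;
    CREATE TRIGGER roads_del AFTER DELETE ON roads BEGIN
        INSERT OR REPLACE INTO _dirty (fid) VALUES (OLD.fid);
    END;
    """
)

# Simulate a fresh checkout: load rows, then clear the tracking table.
conn.executemany(
    "INSERT INTO roads VALUES (?, ?)",
    [(1, "Main St"), (2, "High St"), (3, "Mill Ln")],
)
conn.execute("DELETE FROM _dirty")

# Edits made by any other application that writes to the database.
conn.execute("UPDATE roads SET name = 'High Street' WHERE fid = 2")
conn.execute("DELETE FROM roads WHERE fid = 3")
conn.execute("INSERT INTO roads VALUES (4, 'New Rd')")

dirty = {fid for (fid,) in conn.execute("SELECT fid FROM _dirty")}
print(sorted(dirty))  # [2, 3, 4]
```

Because the triggers live in the database, edits from QGIS, a SQL CLI or a script are all captured. Replacing the whole file discards the triggers and the tracking table, which matches the answer below.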

> What happens if it's replaced with an updated copy of the geopackage the was edited via some other application?

If it's edited by something else (QGIS, ArcGIS, python/go/whatever application, SQL CLI, whatever) it'll work: you do edits where you want to. If it's replaced by something else, it won't work.

> How does it diff the changes?

Comparing the features/rows in the repository (and their schemas) against the rows in the working copy database. It uses the stored list of modified rowids to make this fast.
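Given the set of touched primary keys, the diff only has to compare those rows rather than the whole table. A minimal sketch, assuming rows are plain comparable values keyed by primary key (hypothetical; Kart also compares schemas, which is omitted here):

```python
from typing import Hashable


def diff_rows(
    committed: dict[Hashable, object],
    working: dict[Hashable, object],
    dirty: set[Hashable],
) -> tuple[dict, dict, dict]:
    """Classify each touched primary key as an insert, update or delete."""
    inserts: dict = {}
    updates: dict = {}
    deletes: dict = {}
    for pk in dirty:
        in_old, in_new = pk in committed, pk in working
        if in_new and not in_old:
            inserts[pk] = working[pk]
        elif in_old and not in_new:
            deletes[pk] = committed[pk]
        elif in_old and in_new and committed[pk] != working[pk]:
            updates[pk] = (committed[pk], working[pk])
        # Otherwise the row was touched but ended up unchanged: no diff entry.
    return inserts, updates, deletes


committed = {1: "Main St", 2: "High St", 3: "Mill Ln"}
working = {1: "Main St", 2: "High Street", 4: "New Rd"}
# Key 1 was edited and then edited back, so it is dirty but unchanged.
inserts, updates, deletes = diff_rows(committed, working, {1, 2, 3, 4})
print(inserts, updates, deletes)
```

The tracked keys are only a hint for speed; the actual comparison is what decides whether a row really changed.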


> The ability to spatially index the objects, then filtering them on Clone, means I rapidly clone a tiny subset of the data to work with.

Okay, so is the "--depth=N" filtering option to git-clone supported as well? And does it remain useful in the context of Kart applications?


Yes, you can do shallow clones with `--depth` as well. This is incredibly useful - it means we can publish massive Kart repositories of spatial data with lots of versioning info, but still allow users to work with small subsets of the most recent changes. Very important for typical GIS use cases.


Great to see stuff like this! Data in general, and not just code, gets updated or corrected constantly. Given that data is used collaboratively in a distributed setting, it should be a first-class citizen wrt diffing and merging, just as line-wise text is. Anybody working in geophysics or other data-heavy scientific fields should see the value in this approach.


I was wondering about that. How DOES diffing work with this, like in a Geopackage?


I assume it just displays rows edited.

I don't see how they'd display anything other than points. That leaves XYZM for diffing.

They might be showing summary stats like length, perimeter, area, volume, but that's usually not easy to generalize.


Hi there,

Kart supports points, lines & polygons, as well as GeoTIFFs for imagery and LAZ for point clouds.

Kart is a CLI tool, but provides fully machine readable outputs. You can use the QGIS plugin to get a visual diff of vector feature changes though.


> You can use the QGIS plugin to get a visual diff of vector feature changes though.

This sounds like an amazing opportunity for a screenshot, btw :)


Neat!

Does this improve Git's support for large binaries generally, or is it necessary to have introspection into any filetype you want to support?

Is there good interoperability with existing Git repos?


Copied from their site:

> Because Kart uses Git for data transfer and storage, you can host a Kart repository anywhere you can host a Git repository - for example, GitHub, Bitbucket...

ref: https://docs.kartproject.org/en/latest/pages/basic_usage_tut...


Kart repositories are also Git repositories - they're 'interoperable' in the sense that a lot of Git tooling will work, but the storage structure for vector data differs, and using plain Git commands on a Kart repository won't work very well.

Kart serializes vector/tabular data into datasets in the repository, and manages the process of writing them out to useful working copies (GeoPackages, or into databases).

For large binaries - rasters and point clouds - we're using LFS. We include some additional spatial information in the pointer files to enable some very useful GIS functionality, like spatially filtered clones (this works for vector data too).
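A Git LFS pointer file is a short text file of `key value` lines (`version`, `oid`, `size`), so extra spatial metadata can sit alongside the standard keys and be checked without downloading the binary. A minimal sketch of deciding whether to fetch a tile. The `x-bbox` key, its comma-separated format and the all-zero oid are hypothetical placeholders, not Kart's actual pointer fields.

```python
LFS_VERSION = "https://git-lfs.github.com/spec/v1"


def parse_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer file into a key -> value dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    if fields.get("version") != LFS_VERSION:
        raise ValueError("not a Git LFS pointer")
    return fields


def wanted(fields: dict[str, str], area: tuple[float, float, float, float]) -> bool:
    """True if the (hypothetical) x-bbox extent overlaps area (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = (float(v) for v in fields["x-bbox"].split(","))
    a_min_x, a_min_y, a_max_x, a_max_y = area
    return not (max_x < a_min_x or a_max_x < min_x or max_y < a_min_y or a_max_y < min_y)


pointer = "\n".join([
    f"version {LFS_VERSION}",
    "oid sha256:" + "0" * 64,  # placeholder object id
    "size 12345",
    "x-bbox 174.70,-36.90,174.80,-36.80",  # hypothetical extent key
]) + "\n"

fields = parse_pointer(pointer)
print(wanted(fields, (174.75, -36.85, 175.0, -36.0)))  # True
print(wanted(fields, (0.0, 0.0, 1.0, 1.0)))  # False
```

Only tiles whose pointer passes the test would then need their LFS objects downloaded.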


>Does this improve Git's support for large binaries generally

No, this still uses LFS for larger binary formats (i.e. raster or point cloud datasets).


So how does it work for OpenStreetMap? I mean, git is not very good at handling large repositories; fsck and all that takes ages. So what performance do you get with a small geographical area?


Hi there,

OpenStreetMap has its own versioning mechanisms (and a fairly OSM-specific data model), and Kart isn't really designed to work with OSM data as such. Kart adds version control to the GIS data that planners, academics, architects, civil engineers, etc. use day-to-day. There's a lot of data out there!

"Large" is relative, but Kart works well with quite big vector datasets for these typical use cases. For example, we're regularly working with datasets that have over 2 million features, with a decade of weekly data changes.

Kart includes some features specifically for working with small geographic areas. We can spatially filter cloned data so you're working with a small subset of a much larger dataset, but you still retain the ability to commit/merge/push to the source repo.


Have you tried out https://underhive.in/ for this?


I haven't, but it looks like it's a Git repo hosting solution? The issue with using Git with data directly is that you generally lose the per-row/feature change information. With common binary GIS data formats, just putting them into Git loses a lot of the utility and will blow out the size of the repo as you apply changes.

Kart gives you row-level tracking, so you can see who made what change and when, and diffs are small and fast to apply.


Cool product! How does this compare to (e.g.) dolt, which is pitched as 'git for data'?


Dolt is a neat project, but it's tightly coupled to MySQL. Kart supports MySQL as a working-copy format but MySQL has some limitations around geometry support that make it unsuitable for most GIS usage - see our docs for more info: https://docs.kartproject.org/en/latest/pages/wc_types/mysql_...

Kart works with working copies that are more familiar to GIS people - e.g. GeoPackage, Postgres/PostGIS and MSSQL databases. Different users can use different working copies and still collaborate.


Wow, cool. I needed something like this 10 years ago for a project.


This looks super cool! I will for sure be testing this out and keeping an eye out for future additions.


Cool project but homepage needs two things:

* Docs should not be hidden in small font and as disabled link color, make it big button in features list or make features clickable to relevant docs.

* Add some screenshots

I spent way too much time clicking every heading to figure out what this is all about until I found the Docs link.


Hi! It's neat to see Kart making HN. These are great points, we'll get that Docs link much more visible.


Just saw this, which might be a better home page: https://koordinates.com/products/kart/

