The fundamental problem of programming language package management (ezyang.com)
134 points by route66 on Aug 26, 2014 | 72 comments



Nix, NixOS, Nix ... a thousand times Nix.

I can't believe the article doesn't mention it.

I've been using NixOS as my OS for development, desktop and we're in the middle of transitioning to using it for production deployments too.

Nix (the package manager not the distribution) solves so many of the discussed problems. And NixOS (the linux distribution) ties it all together so cleanly.

I keep my own fork of the Nixpkgs repository (which includes everything required to build the entire OS and every package). It's like having your own personal Linux distribution, but with the simplest possible way of merging changes from upstream or contributing back.

I use it like I'd use virtualenv. I use it like I'd use chef. I use it like I'd use apt. I use it like I'd use Docker.

http://www.nixos.org
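
To give a flavour of the virtualenv-style use, here is roughly what a per-project environment looks like (a minimal sketch only; the exact package attribute names depend on your Nixpkgs revision):

    # shell.nix - running `nix-shell` in this directory drops you into a shell
    # with these packages on PATH, independent of what's installed globally
    with import <nixpkgs> {};    # <nixpkgs> can point at your own Nixpkgs fork
    stdenv.mkDerivation {
      name = "dev-env";
      buildInputs = [ python27 nodejs redis postgresql ];
    }

Check that file into the project and everyone on the team gets the same environment from one command.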


Yes! The Haskell community has been using Nix to great effect and I would like to see other programmers catch on as well. Here is a great talk about using Nix targeted at Python programmers: http://pyvideo.org/video/3036/rethinking-packaging-developme...

In addition to Nix, there is also a newer project: GNU Guix. Guix is built on top of Nix but replaces the custom package configuration language with Scheme, among other differences. https://gnu.org/software/guix/

When package management is solved at the system level, our deployment situation becomes a whole lot better. I used to do a lot of Ruby programming. Wrestling with RVM and bundler was a real pain, especially since bundler was incapable of helping me with the non-Ruby software that I needed as well like libmysqlclient, imagemagick, etc. Using Nix/Guix, you can throw out hacky RVM (that overrides bash built-ins like cd!) and simply use a profile that has the right Ruby version.
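
To make that concrete, a per-project environment instead of RVM looks something like this (a rough sketch; the attribute names are assumptions that vary between Nixpkgs versions):

    # shell.nix for a Ruby project - `nix-shell` provides the interpreter
    # plus the native libraries that bundler alone cannot manage
    with import <nixpkgs> {};
    stdenv.mkDerivation {
      name = "ruby-app-env";
      # swap in a specific interpreter attribute (e.g. ruby_2_1) to pin the
      # Ruby version; add whichever attribute provides libmysqlclient as well
      buildInputs = [ ruby bundler imagemagick ];
    }

No overriding of cd, no shell hooks - just an expression you can pin alongside the code.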

Bye pip, bundler, composer, CPAN, puppet, ansible, vagrant, ..., and hello Nix/Guix!


> In addition to Nix, there is also a newer project: GNU Guix. Guix is built on top of Nix but replaces the custom package configuration language with Scheme, among other differences. https://gnu.org/software/guix/

Personally, I'm rather more keen on Nix; the language is pretty much designed for writing JSON-style configuration, except it's a full programming language, and that's what the vast majority of Nix code is (both package definitions and system configurations).

Additionally, with Nix, you can be close to certain that if you build something twice, you'll get the same result, because it can't access impure resources.

Finally, because Guix is a GNU project, the official repositories are going to go nowhere near non-free software. Nixpkgs contains non-free software, although disabled from installation by default. You might be a little less likely to have people help you get non-free software working on the GNU Guix mailing lists, if you happen to use any.


>the language is pretty much designed for writing JSON-style configuration, except it's a full programming language, and that's what the vast majority of Nix code is (both package definitions and system configurations).

Guix uses an embedded domain specific language that is also designed for easily writing package recipes, but it uses s-expressions instead of something that is "JSON-style". Also, Nix build scripts are written in Bash, whereas Guix build scripts are written in Scheme. I think that makes Guix more consistent in its programming style.

>Additionally, with Nix, you can be close to certain that if you build something twice, you'll get the same result, because it can't access impure resources.

Guix has this same certainty because it uses the Nix daemon, and the defaults are a bit stricter than Nix.

>Finally, because Guix is a GNU project, the official repositories are going to go nowhere near non-free software.

That doesn't mean that you can't host your own non-free packages or use someone else's non-free packages. But yes, Guix does not ship with packages that ask the user to give up their freedom. To me, that's an advantage.


> Also, Nix build scripts are written in Bash, whereas Guix build scripts are written in Scheme. I think that makes Guix more consistent in its programming style.

On the other hand, in order to write a Guix build script, you have to know Scheme (and whatever libraries Guix provides for this task) rather than utilising your existing knowledge of writing shell scripts.

> Guix has this same certainty because it uses the Nix daemon, and the defaults are a bit stricter than Nix.

Really? So you don't actually get access to any of the I/O Scheme libraries from Guix? My understanding (and it seems the understanding of several other people) is that while Guix uses the Nix daemon and thus derivations (and thus build processes) are pure once generated, the process for generating them from the Scheme code is not guaranteed to be so, given that the Scheme code can do practically anything.

Of course, you might not actually write non-deterministic Scheme code, but it's nice to have the guarantee that given a .nix file and a specific version of nixpkgs, the build will always come out to the same result no matter what the creator of that file has done.


> On the other hand, in order to write a Guix build script, you have to know Scheme (and whatever libraries Guix provides for this task) rather than utilising your existing knowledge of writing shell scripts.

To learn Nix you need to learn both how to write shell scripts and how to write Nix expressions. How is that better than just learning Scheme, whose basics are trivial to pick up? For most packages you don't really need to learn much anyway, because you can reference other packages - it's really just like a configuration file.

One of the goals of GNU is to really make Guile ubiquitous - used for configuration of packages, build processes, service configuration (via DMD) and software configuration/extension. There should be no need to learn dozens of different configuration formats and languages; Scheme is the only language you'll need to be able to fully drive your OS. (Well, perhaps not strictly true - you'll probably still need to use the shell, but you'd preferably write Guile scripts rather than plain Bash.)

> My understanding (and it seems the understanding of several other people) is that while Guix uses the Nix daemon and thus derivations (and thus build processes) are pure once generated, the process for generating them from the Scheme code is not guaranteed to be so, given that the Scheme code can do practically anything.

You can do anything from the shell too (which Nixpkgs can invoke) - you can even invoke Guile from a shell script. The guarantee given by both systems is that the build happens in an isolated environment (via chroot), and it doesn't matter what general-purpose computation happens inside the environment.

Neither Nix nor Guix makes guarantees about the resulting binary from a build process - we do not yet have reproducible builds [https://wiki.debian.org/ReproducibleBuilds]. The only guarantee made by both package managers is that packages have an identity which is a hash of their source, dependencies and build instructions. Changing the build instructions results in a new derivation, so unless you have some crazy package that deliberately tries to make itself non-reproducible, you should get approximately/functionally identical binaries from building, even if they are not bit-exact. Obviously we'd like reproducible builds in both systems, to be able to authenticate the actual build via its hash.
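
To make the identity point concrete, here's a sketch of what goes into that hash (not a real, buildable package - attribute names are illustrative):

    with import <nixpkgs> {};
    stdenv.mkDerivation {
      name = "example-1.0";
      src = ./.;                  # the source is hashed into the package identity
      buildInputs = [ zlib ];     # so is the exact set of dependencies...
      buildPhase = "make";        # ...and the build instructions themselves
      # the output lives at /nix/store/<hash>-example-1.0; change any of the
      # inputs above and you get a different hash, i.e. a new derivation
    }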


> To learn Nix you need to learn both how to write shell scripts and how to write Nix expressions. How is that better than just learning Scheme, whose basics are trivial to pick up? For most packages you don't really need to learn much anyway, because you can reference other packages - it's really just like a configuration file.

Except that as a Linux admin, you probably already know how to write shell scripts anyway, and you'd be rather hard-pressed to manage a Linux system while never once having to write or read one. There's also a lot more information on writing shell scripts than Guile scripts for various common purposes.

> Scheme is the only language you'll need to be able to fully drive your OS.

With a different DSL for each use, meaning you have to learn the restrictions of each DSL anyway.

> You can do anything from the shell too (which Nixpkgs can invoke) - you can even invoke Guile from a shell script.

That's true, although in Nix's case, the derivations map directly to the source, as Nix code itself can't call out to anything (only return derivations which might) - and in theory, the build could then happen in a sandbox, making it even more likely that the result would be the same.

We might not have reproducible builds yet, but Nix is closer to having them than Guix if somebody wanted to make a research project out of it.


You could of course use a platform that doesn't depend on OS-dependent binaries (like the JVM) and a package manager that likes ad-hoc and easily created repositories and that has lots of plugins available (Maven, or derivatives like Gradle, SBT or Leiningen).

I've worked with a lot of platforms, such as PHP, Perl, Ruby, Python, Node.js and .NET. I felt the pain of pip, easy_install, setup-tools, virtualenv, bundler, gems, cpan, pear, rvm, rbenv, npm, bower, apt-get or whatever else I used at some point or another. And I swear, in spite of all the criticism that Java or Maven get and in spite of all the warts, in terms of packaging and deployment it has been by far the sanest for me. I mean, it's not without warts - heaven forbid you end up with classpath issues due to transitive dependencies - but at the very least it is tolerable.


>You could of course use a platform that doesn't depend on OS-dependent binaries (like the JVM) and a package manager that likes ad-hoc and easily created repositories and that has lots of plugins available (Maven, or derivatives like Gradle, SBT or Leiningen).

I don't think being locked into the JVM is a very good solution. Java libraries can depend on other non-Java components.


There are Java jars with native binaries, just sayin. Though I do like Maven and Java quite a bit.

https://bitbucket.org/xerial/sqlite-jdbc

https://github.com/twall/jna


I would be very interested in reading a detailed post about your experiences with Nix/NixOS.

I've been hearing a lot about this project, but I always thought it was just an academic experiment. I'm in the process of packaging and maintaining a Python+Javascript+Redis+PostgreSQL application and Nix certainly is something I should learn more about.


But is it hard getting packages into the Nix ecosystem? If there's too much friction or if it's too hard to resolve all the dependencies yourself before pushing your pkg, I fear it may suffer from lagging behind the latest available versions of most packages.


Everyone has to resolve their immediate dependencies anyway to push it to pip or whatever else, or nobody can install it. The only additional dependency to push it to Nix would be the language interpreter.

Nix doesn't require you to specify the entire dependency tree; each dependency specifies its own dependencies, and those are resolved during the build process.
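
For example, a package expression in Nixpkgs only names its immediate inputs - something like this hypothetical recipe (not a real package):

    # pkgs/mypackage/default.nix - hypothetical, not a real Nixpkgs package
    { stdenv, openssl, zlib }:        # immediate dependencies only

    stdenv.mkDerivation {
      name = "mypackage-1.0";
      src = ./.;                      # stand-in for a real source fetch
      buildInputs = [ openssl zlib ];
      # openssl and zlib declare their own inputs in their own expressions;
      # the full transitive closure is worked out when the package is built
    }

Nixpkgs supplies those arguments (via callPackage), so nobody ever writes out the whole tree by hand.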

(For the record, at least Haskell, Python, and node.js packages are pulled into the nixpkgs tree from their respective package repositories regularly, albeit with many missing native dependencies; there's a separate file that you can edit and send a pull request against for packages which have native dependencies.)


Yes, thank you! But people still want to "solve" the problem the hard way, I guess.

Also, "Java" is mentioned twice in the article but I can't find mention of Ocaml Functors. I thought they solved most package problems even before Nix was around?


There's a bit of missing context: the author is most likely writing this as part of his research on implementing Backpack - a system which would bring OCaml-functor-like things to Haskell.


Functors are part of the OCaml module system, but they don't really have any relation to package management with version numbers and dependencies.


This topic reminded me of some very interesting thoughts from Joe Armstrong, that I remember seeing posted somewhere (HN?) some time ago -- "Why do we need modules at all?":

    [...]
    The basic idea is
    
        - do away with modules
        - all functions have unique distinct names
        - all functions have (lots of) meta data
        - all functions go into a global (searchable) Key-value database
        - we need letrec
        - contribution to open source can be as simple as
          contributing a single function
        - there are no "open source projects" - only "the open source
          Key-Value database of all functions"
        - Content is peer reviewed
    
    These are discussed in no particular order below:
    [...]
Full thread: http://thread.gmane.org/gmane.comp.lang.erlang.general/53472


> all functions have (lots of) meta data

This is the crux of the problem. Before too long, the amount of metadata dwarfs the thing it describes and it's easier to rewrite the function than it is to find or describe it.


I already deal with a large library of functions like that, and it's totally mind-numbing. I had to create tools just to try to group the functions in different ways so I could figure out if there was anything like what I wanted to use, what it was called, and how to use it.

Modules and a sub-module hierarchy offer a better, simpler organizational methodology.


I think the developers of QuickUtil independently thought of the same thing and are pursuing this idea.

http://quickutil.org/


I don't see the difference between this and how module systems in dynamic languages work; the key-value database is accessed with require(), include(), import() or whatever. I suppose you'd need to write a shim to invoke the package manager when an unexpected module is requested, but that wouldn't be hard.


> - all functions have unique distinct names

Seriously? Is that his solution to package management?


Well, if you're using packages and modules, the full name of your function is still packagename/modulename::functionname; it might as well be packagename.modulename.functionname.

Keep in mind that Joe Armstrong is talking about Erlang here, which is a functional language - most of the functions in libraries are sort-of kind-of independent from each other; they especially don't share state.


I can't see how that can be better than a cohesive module API.


When I sit around and think "what's the biggest improvement I could make personally to the computing world?" there's always this voice in my head saying "kidnap whoever is building the next package management system and lock them in a deep dark box."

There seems to be a fundamental disconnect between the people making languages and how people actually use their languages. I don't have time to follow your Twitter feed, because I'm working on a lot of different things. I know it's important to you, the Language Developer, and so you think it should be important to me, the Language User. But I have dozens of things to keep track of, and all of them imagine that they're the most important thing in my world.

It's like the old office culture mocked in "Office Space" where the guy has 7 different bosses, each imagining their own kingdom is the most important.


Completely misses the real fundamental problem: people make assumptions about their target. Almost all of this discussion is centered around Linux systems. What about Windows? What about Solaris? HPUX? AIX? VMS? Tru64? Plan9? BeOS? Android?

Package management itself is not a solved problem, so you can't very well expect programming languages to be any different. The existing systems work quite well and make total sense: your package manager is tailored to your specific use case. Centralized/decentralized is a red herring. First figure out how to package every single thing for every single system and use case in the world, and then come back to me about organizational systems.


Agree completely. The fundamental problem of package management is the dependencies of the package manager itself.

Every programming language has its own package manager because it's written in that language. No language maintainer is going to say something like, "Hey, want to use Ruby? Just install Perl first so you can install some Ruby packages!"

Likewise, every OS-level package manager assumes an OS. I'm sure apt-get, and yum and Nix are great. I'm also sure their greatness isn't very helpful to Windows users.

There's also the dependency between those two. An OS-level package manager can't easily be written in a high-level language, because one of its core jobs is to install high level languages. A language-level package manager doesn't want to re-invent the OS stack.

Bootstrapping is hard. Package managers sit very very low on the software stack where any dependencies are very difficult to manage and where consolidation is nigh impossible.


The downside he mentions to "pinned versions" actually applies to everything on this page. If you don't pay attention to security updates, you will be vulnerable whether you forgot about your pinned versions or you forgot about your stable distribution.

"Stable" distributions have an additional downside he doesn't mention: when you upgrade every package all at once it's a LOT more effort than if you had upgraded them slowly over time. Dealing with multiple library changes at once is an order of magnitude more difficult than dealing with them one-at-a-time.

And also, to some extent, if all the libraries you are using have a long term stable API, then it doesn't actually matter which one you pick - anything is painless.


> Dealing with multiple library changes at once is an order of magnitude more difficult than dealing with them one-at-a-time.

Curious... I have exactly the opposite experience. I find that a certain amount of time is required to carefully regression-test my application code after upgrading a library. Doing this 23 times for my 23 different dependencies that need to be upgraded can be quite costly. If I, instead, upgrade all of the libraries at once and perform my extensive regression testing just once, I save a great deal of effort.

That's if everything goes smoothly. If something does NOT go smoothly and I encounter an error, then I need to determine which upgrade caused the problem. Most of the time (85% perhaps?), that turns out to be easy and obvious just by looking at the error that presents itself. In the remaining cases, I simply roll back half of the package upgrades and start binary-searching to identify the culprit (or culprits in the case of a conflict between libraries).


Personally I agree with you, except that sometimes you find out that a library you depend on hasn't been upgraded and cannot be used in combination with your other libraries, due to some conflict somewhere. At which point one needs to drop the library from the project, and that can prove to be costly, so it is better to identify libraries that aren't well maintained earlier rather than later. This isn't a problem for well-established popular libraries, but it is a problem if you're using newer, more cutting-edge stuff.


Do you have an automated testing suite? That might explain the difference in experience. I use a lot of automated tests.


I don't think it's the testing that's work-intensive, it's fixing things that broke. The more often you upgrade, the more often things will break.


Maybe, but s/he did explicitly mention the regression testing taking a long time.


> Dealing with multiple library changes at once is an order of magnitude more difficult than dealing with them one-at-a-time.

From my experience exactly the opposite is true. Compare upgrading Slackware to keeping an Arch Linux installation running. With Slackware, I have to sit down for an hour, do the upgrade, read the notices that come along with it, and maybe see if it will break any of my custom packages. This happens once or twice a year (security upgrades are completely painless as they don't break things). With Arch Linux I need to do that every day. If I don't have time to do it for a month, the system is basically broken beyond recognition...


What I'm advocating is not analogous to the Arch Linux model, partly because I don't think we should stop supporting old versions, and I don't think upgrades should be essentially required the moment something is released.

I'm fine with having the odd out of date version of something, I'm just saying: be incremental about keeping your stuff up to date.


Every program with plugins of any sort will eventually include a sketchy rewrite of apt-get. Not just languages - WordPress, MediaWiki ...

If you're very lucky, the packaging in question will not conflict horribly with apt or yum. So you probably won't be lucky.


As if apt-get would solve the problems that we have. Good luck installing multiple versions of the same package with apt-get.


Inability to install multiple versions of the same package can be a feature. It encourages the creation of stable, backwards-compatible packages.


Sure, it kind of makes sense for managing the software repository of your operating system. But when doing software development, you end up with multiple projects using multiple versions of the same dependency. I mean I tried going the apt-get route back when I was working with Perl - a lot of people in the Perl community like apt-get and there are tools for easily packaging a library into a deb, but it was a pain working on or deploying multiple projects on the same machine.


> But when doing software development, you end up with multiple projects using multiple versions of the same dependency.

That's what I wish the language designers had avoided. It would be convenient for users if there were never any reasons to want anything but the most recent version.

I know Perl and similar languages tend to value rapid development over complete reliability, but I'd prefer not to think about version numbers or worry about updates that break things. Maybe if the package management systems had been designed with less rope, users would more rarely get hung.


Yes, there's that too.


yeah yeah, I read the previous post (http://www.standalone-sysadmin.com/blog/2014/03/just-what-we...) too.

Maybe this time we can talk about how to meaningfully solve these problems, instead of just fighting pointlessly about whether the old tools are so great that they should be used for everything.

Decentralized package management huh?

How would that work?

A way of specifying an ABI for a package instead of a version number? A way to bundle all your dependencies into a local package to depend on, pushing changes from that dependency tree automatically to builds off of it, but only manually updating the dependency list?

I'm all for it. Someone go build one.


"Decentralized package management huh? How would that work?"

http://0install.net/ does this (sad to see it wasn't mentioned in the article). Basically:

1. Use URIs rather than short names to identify packages.

2. Scope dependencies so different applications can see different versions of the same library where necessary.

Here's an OSNews article from 2007 about such things:

http://www.osnews.com/story/16956/Decentralised-Installation...


> A way of specifying an ABI for a package instead of a version number?

Technically impossible for many languages (have fun figuring out what it would look like in Perl...). And even when it's possible, it's not a guarantee: you can have a semantic change without an ABI change. Cargo, Rust's newfangled package manager, presupposes semantic versioning, and I think that's a sane attitude.


Stronger and stronger typing makes this ABI guarantee stronger and stronger. Dependent typing could package theorems about the properties of your interface and then ensure that all matches satisfy those properties. You could then depend upon their theorems to prove things about your own program, knowing that nothing can break.


Yes, but something like Idris isn't mainstream. 99% of the languages out there don't allow you to express this kind of invariant.


Certainly, but it's going to become more and more so. The right way to do an API is to expose your invariants. Some day that will be a thing—and the distributed problem will still be here.


That or we will all have to write Javascript in Javascript :)


That or we will all have to write Javascript in Javascript.


Just from reading the title I expected this would be about fixing the dependency diamond problem, i.e. when library A needs library C 1.0, library B needs library C 1.1 which is incompatible with C 1.0, and then libraries A and B meet in the same project :(


In Java-land, OSGi was invented to solve this very problem. Every module has its own classloader so module A can load C 1.0 and module B can load C 1.1. Modules are registered and other modules can look them up in the registry and call them so A can look up and then call B without conflicts.


OSGi is good in theory but too much ado for most real-world projects.


Agreed. The Spring support helps a lot though. But there's no question that OSGi is one of the final big xml holdouts in Java-land.


The author is actually working on implementing a solution to that in ghc/cabal as well!


Even reducing the problem to one language you run into problems. All serious R developers have run into issues with CRAN (which is mostly centralized) causing problems with different version installs using install.packages(). There are mechanisms for dealing with it, but the best solution is usually to maintain your own distribution of packages and a build script: further centralization. And R has a relatively good/simple package management system compared to something like pip, luarocks or Maven. Two which I never had problems with (maybe because I didn't use them enough): Leiningen and go get.


> The Git of package management doesn't exist yet

We've taken a pretty good shot at this in the OCaml ecosystem via the OPAM package manager (https://opam.ocaml.org).

* OPAM composes its package universe from a collection of remotes, which can be fetched either via HTTP(S), Git, Hg or Darcs. The resulting package sets are combined locally into one view, but can be separated easily. For instance, getting a view into the latest XenAPI development trees just requires "opam remote add xapi-dev git://github.com/xapi-project/opam-repo-dev".

* The same feature applies to pinning packages ("opam pin add cohttp git://github.com/avsm/ocaml-cohttp#v0.6"). This supports local trees and remote Git/Hg/Darcs remotes (including branches).

* OCaml, like Haskell, is statically typed, and so recompiles all the upstream dependencies of a package once it's updated. This lets me work on core OCaml libraries that are widely used, and just do an "opam update -u" to recompile all dependencies to check for any upstream breakage. We did not go for the very pure NixOS model due to the amount of time it takes to compile distinct packages everywhere. This is a design choice to balance composability vs responsiveness, and Nix or 0install are fine choices if you want truly isolated namespaces.

* By far the most important feature in OPAM is the package solver core, which resolves version constraints into a sensible user-facing solution. Rather than reinvent the (rather NP-hard) solver from scratch, OPAM provides a built-in simple version and also a CUDF-compatible interface to plug into external tools like aspcud, which are used by other huge repositories such as Debian to handle their constraints.

This use of CUDF leads to some cool knobs and utilities, such as the OPAM weather service to test for coinstallability conflicts: http://ows.irill.org/ and the solver preferences that provide apt-like preferences: https://opam.ocaml.org/doc/Specifying_Solver_Preferences.htm...

* Testing in a decentralized system is really, really easy by using Git as a workflow engine. We use Travis to test all incoming pull requests to OPAM, much like Homebrew does, and can also grab a snapshot of a bunch of remotes and do bulk builds, whose logs are then pushed into a GitHub repo for further analysis: https://github.com/ocaml/opam-bulk-logs (we install external dependencies for bulk builds by using Docker for Linux, and Xen for *BSD: https://github.com/avsm/docker-opam).

All in all, I'm very pleased with how OPAM is coming along. We use it extensively for the Mirage OS unikernel that's written in OCaml (after all, it makes sense for a library operating system to demand top-notch package management).

If anyone's curious and wants to give OPAM a spin, we'd love feedback on the 1.2beta that's due out in a couple of weeks: http://opam.ocaml.org/blog/opam-1-2-0-beta4/


I am a Haskell programmer who got to use OPAM when installing Coq (HoTT version) recently. It was surprisingly nice.

Also, you can pick which version of the compiler to run, and have it manage switching everything.

It seemed like it was years ahead of cabal, but that might just be because I only used it a little, I don't know. But there are some things to learn from OPAM.

Do you have a blog post like this, or something I could post to the Haskell subreddit?


We're planning a post after the ICFP rush next week dies down, but there's a rather specific one on the new pinning workflow in OPAM 1.2 here:

http://opam.ocaml.org/blog/opam-1-2-pin/

(How to pin a development version is central to the day-to-day development workflow of OCaml/OPAM users and quite annoying to change after the fact, so we're eager for feedback on this iteration before we bake it into the 1.2.0 release.)

The OPAM blog is only about two weeks old, so there are quite a few more posts coming up as our developers discover there's quite a lot to write about :)


(It looks like your rss feed is broken somehow. If I subscribe, and then try and view your site I get about:blanks, at least in InoReader. But I can view it in the reader fine.)

I'm quite excited.


Does OPAM work on Windows? I've read it doesn't which has kept me from further investigations.


OPAM itself compiles on Windows, but most of the package repository doesn't. That's next on our list after the 1.2.0 release comes out (along with cross compilation for targets like iOS, Android and Java, due to the availability of compiler backends for all of these systems now).


Apache Ivy is built on Java and could be a starting point for the uber package manager. It is extremely extensible and can resolve dependencies against Maven-style repos, file systems, and really any other storage mechanism. It can resolve against multiple repos simultaneously.

The real problem is that it's so powerful and hard to ramp up on... The docs aren't sufficient for its overall complexity. That all aside, if the will were there, it could be the Git of package managers.


From a dumb-user perspective, I feel like most of the effort of package managers is spent on resolving these fundamentally incompatible philosophical dilemmas instead of common-denominator solvable problems like:

* Quality and Trust mechanisms. If there are 14 different postgres clients, which do I choose?

* Package Metadata management. Where can I send bug reports? Who is the maintainer? How can I contact someone? Is there an IRC channel?

* Documentation and Function/Class Metadata. Why should I go to the Github README for one package, and to a random domain for another package?

* Linking compile and runtime error messages to documentation or bug reports. Why is google still the best way to track down the cause of an obscure error message?

* Source data linking and code reviews. I should be able to type in a module/namespace qualified function name and view the source without having to scour a git repository. I should also be able to comment directly on that source in a way that is publicly visible or privately visible.


The situation with packages and dependency hell today is horrendous, particularly if you work in a highly dynamic environment like web development.

I want to illustrate this with a detailed example of something I did just the other day, when I set up the structure for a new single page web application. Bear with me, this is leading up to the point at the end of this post.

To build the front-end, I wanted to use these four tools:

- jQuery (a JavaScript library)

- Knockout (another JavaScript library)

- SASS (a preprocessor to generate CSS)

- Jasmine (a JavaScript library/test framework)

Notice that each of these directly affects how I write my code. You can install any of them quite happily on its own, with no dependencies on any other tool or library. They are all actively maintained, but if what you’ve got works and does what you need then generally there is no need to update them to newer versions all the time either. In short, they are excellent tools: they do a useful job so I don’t have to reinvent the wheel, and they are stable and dependable.

In contrast, I’m pretty cynical about a lot of the bloated tools and frameworks and dependencies in today’s web development industry, but after watching a video[1] by Steven Sanderson (the creator of Knockout) where he set up all kinds of goodies for a large single page application in just a few minutes, I wondered if I was getting left behind and thought I’d force myself to do things the trendy way.

About five hours later, I had installed or reinstalled:

- 2 programming languages (Node and Ruby)

- 3 package managers (npm with Node, gem with Ruby, and Bower)

- 1 scaffolding tool (Yeoman) and various “generator” packages

- 2 tools that exist only to run other software (Gulp to run the development tasks, Karma to run the test suite) and numerous additional packages for each of these so they know how to interact with everything else

- 3 different copies of the same library (RequireJS) within my single project’s source tree, one installed via npm and two more via Bower, just to use something resembling modular design in JavaScript.

And this lot in turn made some undeclared assumptions about other things that would be installed on my system, such as an entire Microsoft Visual C++ compiler set-up. (Did I mention I’m running on Windows?)

I discovered a number of complete failures along the way. Perhaps the worst was what caused me to completely uninstall my existing copy of Node and npm — which I’d only installed about three months earlier — because the scaffolding tool whose only purpose is to automate the hassle of installing lots of packages and templates completely failed to install numerous packages and templates using my previous version of Node and npm, and npm itself whose only purpose is to install and update software couldn’t update Node and npm themselves on a Windows system.

Then I uninstalled and reinstalled Node/npm again, because it turns out that using 64-bit software on a 64-bit Windows system is silly, and using 32-bit Node/npm is much more widely compatible when its packages start borrowing your Visual C++ compiler to rebuild some dependencies for you. Once you’ve found the correct environment variable to set so it knows which version of VC++ you’ve actually got, that is.

I have absolutely no idea how this constitutes progress. It’s clear that many of these modern tools are only effective/efficient/useful at all on Linux platforms. It’s not clear that they would save significant time even then, compared to just downloading the latest release of the tools I actually wanted (there were only four of those, remember, or five if you count one instance of RequireJS).

And here’s the big irony of the whole situation. The only useful things these tools actually did, when all was said and done, were:

- Install a given package within the local directory tree for my project, with certain version constraints.

- Recursively install any dependent packages the same way.

That’s it. There is no more.

The only things we need to solve the current mess are standardised, cross-platform ways to:

- find authoritative package repositories and determine which packages they offer

- determine which platforms/operating systems are supported by each package

- determine the available version(s) of each package on each platform, which versions are compatible for client code, and what the breaking changes are between any given pair of versions

- indicate the package/version dependencies for a given package on each platform it supports

- install and update packages, either locally in a particular “virtual world” or (optionally!) globally to provide a default for the whole host system.

This requires each platform/operating system to support the concept of the virtual world, each platform/operating system to have a single package management tool for installing/updating/uninstalling, and each package’s project and each package repository to provide information about versions, compatibility and dependencies in a standard format.

As far as I can see, exactly none of this is harder than problems we are already solving numerous different ways. The only difference is that in my ideal world, the people who make the operating systems consider lightweight virtualisation to be a standard feature and provide a corresponding universal package manager as a standard part of the OS user interface, and everyone talks to each other and consolidates/standardises instead of always pushing to be first to reinvent another spoke in one of the wheels.

We built the Internet, the greatest communication and education tool in the history of the human race. Surely we can solve package management.

[1] http://blog.stevensanderson.com/2014/06/11/architecting-larg...


Sure we can. I imagine first we'll need the ISO to create a project to begin the standardization of software management. Then there'll probably be a few years of research to identify all the kinds of software, platforms they run on, interoperability issues, levels of interdependencies, release methodologies, configuration & deployment models, maintenance cycles, and expected use cases. Then the ISO can create an overly-complex standard that nobody wants to implement. Finally somebody will decide it's easier to just create smaller package managers for each kind of software and intended use case and write layers of glue to make them work together.

So now that we know what to do, the big question is: who's going to spend the next 5-10 years of their life on that project?


> So now that we know what to do, the big question is: who's going to spend the next 5-10 years of their life on that project?

But this is my point: We are already solving all of those problems, and doing almost all of the work I suggested.

All of the main package managers recognise versions and dependencies in some form. Of course the model might not be perfect, but within the scope of each set of packages, it is demonstrably useful, because many of us are using it every day.

All of the people contributing packages to centralised package repositories for use with npm and gem and pip and friends are already using version control and they are already adding files to their projects to specify the dependencies for the package manager used to install their project — or in many cases, for multiple package managers, so the project can be installed multiple different ways, which is effectively just duplicated effort for no real benefit.

All major operating systems already come with some form of package management, though to me this is the biggest weak point at the moment. There are varying degrees of openness to third parties, and there is essentially no common ground across platforms except where a few related *nix distributions can use the same package format.

All major operating systems also support virtualisation to varying degrees, though again there is plenty of scope for improvement. I’ve suggested before that it would be in the interests of those building operating systems to make this kind of isolation routine for other reasons as well. However, even if full virtual machine level isolation is too heavyweight for convenient use today, usually it suffices to install the contents of packages locally within a given location in the file system and to set up any environment accordingly, and again numerous package managers already do these things in their own ways.

There is no need for multi-year ISO standardisation processes, and there is no need to have everything in the universe work the same way. We’re talking about tools that walk a simple graph structure, download some files, and put them somewhere on a disk, a process I could have done manually for the project I described before in about 10 minutes. A simple, consolidated version of the best tools we have today would already be sufficient to solve many real world problems, and it would provide a much better foundation for solving any harder problems later, and it would be in the interests of just about everyone to move to such a consolidated, standardised model.


The problem you cited before is supposed to be easy, but in practice, software development that uses 3rd-party software is destined to run into conflicts. You found portability issues, platform-dependent issues, environment/configuration issues, and multi-layer software dependency issues.

These all happen regularly when OS maintainers have to package software for release. They spend thousands of hours to resolve [by hand] each one in order to support the various use-cases of their end users. If you are imagining some automated process just magically makes all your software come together to build you a custom development environment, you are mistaken. It's all put together by humans, and only for the use cases that have been necessary so far.

So yes, all these things exist. In small, bespoke, use-case-specific solutions. What you're asking for - universal software management standardization - can't practically be achieved in more than one use case. This is why we are all constantly stuck in dependency hell, until a bug is filed, and the system is once again massaged into a working state by a human. Frustrating, sure. But it works most of the time.


I think it’s a stretch to call a tool like npm, which currently offers 90,000+ packages, a “small, bespoke, use-case-specific” solution. I’m also fairly sure most people publishing their code via npm’s index aren’t spending “thousands of hours” resolving conflicts with other packages by hand; certainly no-one is manually checking over 4 billion pairwise combinations of those packages to make sure they don’t conflict.

And yet npm remains a useful tool, and mostly it does what it should do: download a bunch of files and stick them somewhere on my disk. The same could be said for gem, pip, Bower, and no doubt many other similar tools. They just all do it a bit differently, which leads to a huge amount of duplicated effort for both the writers/maintainers and the users of these packages.

I’m not arguing for magic or for orders of magnitude more work to be done. I’m just arguing for the work that is mostly being done already to be co-ordinated and consolidated through standardisation. To some extent I’m also arguing for operating systems that include robust tools to navigate the modern software landscape as standard, mainly because installing things with tools like apt has an unfortunate way of assuming there should be one global copy of everything, which is frequently not the case for either development libraries or end user software on modern systems, and because if the OS doesn’t provide good universal package management tools then someone else will immediately invent new tools to fill the gaps and now we are back to having overlapping tools and redundancy again.


Again, nothing you use works without it being designed specifically to work that way. You can't use Visual C++ to build software that was designed for Linux without writing portable abstractions and host targets for both platforms, and it definitely won't work on two different architectures without being designed for the endianness and memory width of each. It's bespoke because it's designed for each use case. It simply will not work on anything it wasn't designed for.

And no, it isn't code publishers that spend thousands of hours resolving broken and incompatible builds, it's release maintainers. Go look at bug lists for CentOS. Look at the test trees for CPAN. It is literally mind numbing how much shit breaks, but it makes total sense when you realize it's all 3rd party software which largely is not designed with each other in mind. Somebody is cleaning it all up to make it work for you, but it sure as shit ain't the software authors.

Once you develop enough things or maintain enough things you'll see how endlessly complex and difficult it all is. But suffice to say that the system we have now is simpler than the alternative you are proposing.


> You can't use Visual C++ to build software that was designed for Linux...

Sure you can. Projects of all scales do this all the time. Have you never heard C described as being portable assembly language?

Unless you are writing low-level, performance-sensitive code for something like an operating system or device driver, usually details like endianness matter only to the extent that they specify external protocols and file formats. I would argue that this sort of detail is normally best encoded/decoded explicitly at the outer layers of an application anyway.

Obviously if you rely on primitive types like int or long in C or C++ having a specific size or endianness, or if you assume that they will be equivalent to some specific external format, you’re probably going to have problems porting your code (and any package containing it) across some platforms.

However, that issue does not contradict what I proposed. It’s perfectly viable — indeed, it’s inevitable — to have packages that are only available on some platforms, or packages which depend on different things across platforms. That’s fine, as long as your packaging system doesn’t assume by default that the same thing works everywhere.

> And no, it isn't code publishers that spend thousands of hours resolving broken and incompatible builds, it's release maintainers.

Who is the “release maintainer” who made those JavaScript libraries I mentioned in my extended example above play nicely together?

Again, this issue does not contradict what I proposed anyway. In my ideal world, if packages are incompatible or don’t have sufficient dependencies available on a certain platform, you just don’t list them as available for that platform in whatever package index they belong to. Once again, this is no harder than what a bunch of different package management tools do (or fail to do) right now.


Dependency management (i.e. class/script/resource loading) is too tightly coupled to the programming language's execution environment.

It's not something you can make generic like a file/folder based version control tool. It's like asking for the Git of unit testing/continuous integration or whatever, not going to happen.


You can still build your language package manager on top of an existing one. For example, the ebox installer (for the E language) uses 0install to download metadata and package archives, cache things, solve version constraints, etc, but it takes care of actually wiring the language-level modules together:

http://0install.net/ebox.html

It needs to do this because each application is sandboxed. For most uses a generic packager is fine, though. After all, most languages also have RPM packages, Deb packages, etc.



